XmlDocument + StringWriter = EVIL
ok, you can proly mark this one up for me just being lazy/dumb. but, after months of nagging problems w/ string encodings for XSL-transformed results, it finally dawned on me how stoopid i've been.
XmlDocument + StringWriter = EVIL
cuz it's all about the encoding, folks.
since i do mostly web apps, i do lots of XSLT work in C#. this usually goes just great, but occasionally i end up w/ goofy encoding problems. for example, sometimes MSIE will refuse to render results as HTML and will instead just belch the XML onto the client window. sometimes, even though i *know* i indicate UTF-8 in my XSL documents, the result displayed in the browser shows UTF-16. it really gets bad when i start putting together XML pipelines mixing plain XML w/ transformed docs. sometimes i just pull my hair out.
and it's all because i'm lazy/dumb. cuz StringWriter has no business being involved in an XML pipeline. we all know that, right? and we all know why, right? do we?
i did. but i forgot.
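see, strings are stored internally as UTF-16 (Unicode) in C#. that's cool. makes sense. but not when you want to ship around the string results of an XML pipeline. that's when you usually want to stick w/ UTF-8. but StringWriter don't play dat: its Encoding property always reports UTF-16, so anything that asks the writer which encoding to declare gets told UTF-16.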
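here's a minimal sketch of that behavior, so you can see it w/out dragging a whole transform into it. the class and method names (StringWriterEncodingDemo, WriteWithStringWriter) are made up for illustration and are not part of the original code; it only assumes the stock System.IO and System.Xml classes.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class (illustration only, not from the original post).
public static class StringWriterEncodingDemo
{
    // Returns the encoding name a plain StringWriter advertises to its callers.
    public static string ReportedEncodingName()
    {
        return new StringWriter().Encoding.WebName;
    }

    // Serializes a one-element document through a plain StringWriter.
    // The XmlWriter takes the encoding for the XML declaration from the writer.
    public static string WriteWithStringWriter()
    {
        StringWriter sw = new StringWriter();
        using (XmlWriter xw = XmlWriter.Create(sw))
        {
            xw.WriteStartElement("root");
            xw.WriteEndElement();
        }
        return sw.ToString();
    }
}
```

calling StringWriterEncodingDemo.WriteWithStringWriter() should hand back a document whose declaration says encoding="utf-16", even though nobody ever asked for UTF-16.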
so i just stopped using StringWriter to hold output from XML/XSL work. instead i use MemoryStream and make sure to set the encoding beforehand. here are some examples:
first, the wrong/dumb/Old-Mike way:
    private string Transform(XmlDocument xmldoc, XmlDocument xsldoc, XsltArgumentList args)
    {
        XPathNavigator xdNav = xmldoc.CreateNavigator();
        XslTransform tr = new XslTransform();
        tr.Load(xsldoc);

        StringWriter sw = new StringWriter();
        tr.Transform(xdNav, args, sw);

        return sw.ToString();
    }
the above code will always return a result that declares itself as UTF-16, no matter what encoding the XSL document asks for. bummer.
now the proper/sane/New-Mike way:
    private string Transform(XmlDocument xmldoc, XmlDocument xsldoc, XsltArgumentList args)
    {
        XPathNavigator xpn = xmldoc.CreateNavigator();
        XslTransform tr = new XslTransform();
        tr.Load(xsldoc);

        System.IO.MemoryStream ms = new System.IO.MemoryStream();
        tr.Transform(xpn, args, ms);

        System.Text.Encoding enc = System.Text.Encoding.UTF8;
        return enc.GetString(ms.ToArray());
    }
this will return a result that declares UTF-8 every time (as long as the XSL document asks for UTF-8). much better.
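if you really want to keep a TextWriter in the pipeline, one commonly used workaround (not what i do above, just an alternative) is to subclass StringWriter and override its Encoding property. the sketch below assumes a hypothetical Utf8StringWriter class and a made-up demo method; neither is part of the original code.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper (illustration only): a StringWriter that reports UTF-8
// instead of the default UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

// Hypothetical demo class showing the effect on the XML declaration.
public static class Utf8StringWriterDemo
{
    // Serializes a one-element document through the UTF-8-reporting writer.
    public static string WriteWithUtf8StringWriter()
    {
        Utf8StringWriter sw = new Utf8StringWriter();
        using (XmlWriter xw = XmlWriter.Create(sw))
        {
            xw.WriteStartElement("root");
            xw.WriteEndElement();
        }
        return sw.ToString();
    }
}
```

the returned string is still UTF-16 in memory, like every C# string; the only thing that changes is the encoding the output declares. so you still have to encode it as UTF-8 when you actually write the bytes to the response or to disk.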