It took me a bit of time, but I finally found a solution for loading an XML file that contains accented characters (like áéíóúâä) so that it parses as UTF-8. I'm loading data for a client, and one of the names in it had an accented e. For the first time on this project, System.Xml.XmlDocument.Load() was blowing up, presumably because the file was not valid UTF-8. After a lot of Googling I found an article that gave me what I hope is the solution to this problem. For now it is working, so I'll go with it. The link to the article is in the code sample below. The magic happens by reading the file in with an explicit single-byte encoding (ISO-8859-1), then saving it out in an encoding (ISO-8859-8) that has no accented Latin letters, which leaves the plain characters we, in the US, would expect. Note that this replaces the accented characters rather than preserving them. Hope this helps someone else.
#region Fix Character Encodings
// Need to drop accented characters back to normal characters...
// reference: http://www.codeproject.com/KB/cs/EncodingAccents.aspx
// Read the whole file as ISO-8859-1 (Latin-1), a single-byte encoding.
string fileContents;
using (StreamReader sr = new StreamReader(_pfInfo.WorkingFile, Encoding.GetEncoding("iso-8859-1")))
{
    fileContents = sr.ReadToEnd();
}

// Overwrite the file as ISO-8859-8, which cannot represent accented Latin
// letters; per the referenced article, the encoder substitutes plain ones.
using (StreamWriter sw = new StreamWriter(_pfInfo.WorkingFile, false, Encoding.GetEncoding("iso-8859-8")))
{
    sw.Write(fileContents);
}
#endregion
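If you would rather not rely on how a particular encoding handles characters it cannot represent, another common way to drop accents is Unicode normalization. This is only a minimal sketch and not what the referenced article does: the `AccentStripper` class and `RemoveAccents` method are hypothetical names I made up for illustration, and it assumes the text has already been read into a string with the correct encoding.

```csharp
using System;
using System.Globalization;
using System.Text;

static class AccentStripper
{
    // Hypothetical helper (not from the referenced article): removes
    // combining accent marks after decomposing each character.
    public static string RemoveAccents(string input)
    {
        // FormD splits "é" into "e" followed by a combining acute accent.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep everything except the combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever is left into the usual precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Calling `AccentStripper.RemoveAccents("José")` returns `"Jose"`. Like the encoding trick above, this throws the accents away, so it only makes sense when losing them is acceptable for the data.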