Have you ever seen an HTML page or email where everything looks fine, except instead of apostrophes there are odd question marks, or square blocks? You might also see other characters replaced similarly.
Most commonly, this occurs when importing HTML that has been created by Microsoft Word. When generating HTML, Word uses a specific character set called “Windows Latin 1” (also known as Windows-1252) that has special characters like ‘smart quotes’ and trademark symbols.
When you view the email on your own machine, those characters will show up, but then when imported into Campaign Monitor they might disappear or be converted into incorrect characters.
Character encoding makes the difference
The reason is that Campaign Monitor sends in UTF-8 encoding (which covers a wide range of languages), and the special characters are not stored as the same byte values in UTF-8 as they are in Windows Latin 1.
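To make the mismatch concrete, here is a minimal Python sketch that encodes one smart apostrophe both ways. It assumes Python's `cp1252` codec as the stand-in for Windows Latin 1; the variable names are purely illustrative.

```python
# The right single quotation mark (the "smart" apostrophe), U+2019.
apostrophe = "\u2019"

# Windows-1252 stores it as a single byte; UTF-8 stores it as three bytes.
cp1252_bytes = apostrophe.encode("cp1252")
utf8_bytes = apostrophe.encode("utf-8")

# Reading the Windows-1252 byte as if it were UTF-8 is invalid, so a decoder
# that substitutes on error yields U+FFFD, the replacement character that
# often renders as a question mark or a square block.
misread = cp1252_bytes.decode("utf-8", errors="replace")

print(cp1252_bytes.hex())  # 92
print(utf8_bytes.hex())    # e28099
print(ascii(misread))      # '\ufffd'
```

The same character ends up as different bytes in each encoding, which is why text saved in one and read as the other comes out wrong.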
So what to do about it? Well, the first (and most thorough) option is simply not to use MS Word to generate HTML. Word not only tends to cause character problems, but also adds vast amounts of unnecessary HTML to even simple pages.
If you view the source you will see rampaging hordes of span tags and CSS with oddly named classes everywhere. Word also tends to break tags that Campaign Monitor uses, like <unsubscribe></unsubscribe>, by inserting other tags inside them.
Of course, you can go right up to tools like Dreamweaver if you have the need.
Another alternative is to do some ‘find and replace’ work in Notepad or a similar editor to remove Word’s smart characters and replace them with the correct Unicode characters. Some common ones to look out for are:
- For “ Left double quotes: Use &#8220;
- For ” Right double quotes: Use &#8221;
- For ’ Apostrophe: Use &#8217;
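If there are many files to clean up, the find-and-replace step could be scripted rather than done by hand. The following is a rough sketch, not a Campaign Monitor tool: the `SMART_CHARS` table and the `replace_smart_chars` function are hypothetical names, and it assumes you want numeric HTML character references, which survive regardless of how the file itself is encoded.

```python
# Word's "smart" characters mapped to numeric HTML character references.
# This table only covers the three characters listed above; extend as needed.
SMART_CHARS = {
    "\u201c": "&#8220;",  # left double quote
    "\u201d": "&#8221;",  # right double quote
    "\u2019": "&#8217;",  # apostrophe (right single quote)
}


def replace_smart_chars(html: str) -> str:
    """Return html with each smart character swapped for its reference."""
    for char, reference in SMART_CHARS.items():
        html = html.replace(char, reference)
    return html


sample = "\u201cIt\u2019s fine,\u201d she said."
print(replace_smart_chars(sample))
# prints: &#8220;It&#8217;s fine,&#8221; she said.
```

The function expects text that has already been read correctly into a string, so open the Word-generated file with the encoding it was actually saved in.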
That way you can have the typographically correct characters show up in your email. Character encoding can be a tricky area, and you have to keep an eye on it in your HTML, in your subscribe form pages and in the subscriber lists you import.
Always keep in mind that Campaign Monitor will send in UTF-8 no matter what, so it is best to import everything in UTF-8 to begin with. That way no conversion occurs.
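One way to get a file into UTF-8 before importing it is to re-save it yourself. Here is a minimal sketch, assuming the file really was saved as Windows-1252 (`cp1252` in Python); the `convert_to_utf8` helper and the file names are made up for illustration.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Read src in its original encoding and write the same text as UTF-8."""
    text = src.read_text(encoding=source_encoding)
    dst.write_text(text, encoding="utf-8")


# Demonstration using a throwaway file containing a Windows-1252 apostrophe.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "word_export.html"
    dst = Path(tmp) / "utf8_export.html"
    src.write_bytes(b"<p>It\x92s here</p>")
    convert_to_utf8(src, dst)
    print(dst.read_bytes())
    # prints: b'<p>It\xe2\x80\x99s here</p>'
```

If the HTML carries its own charset declaration, it would need to be updated to match, otherwise the file and its label disagree.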
For more information on HTML and character encoding, read The Definitive Guide to Web Character Encoding at SitePoint.