Remove odd characters from PDF prints in VB.NET

By Aug 17, 2011

I had an odd bug at work the other day. In one of our data reporting systems, the text content for one report rendered in the browser just fine. But when the same report was exported to PDF using ABCpdf from WebSupergoo, some characters came out mangled, as shown in the following screenshots:

The messed up PDF output

The correct browser rendering

So, what to do? I had no idea what the bad characters were, but I suspected they were some sort of Unicode abomination of double/single quotes or something of the sort. So here's what I did...

  • Copied a portion of text containing the offending character from the browser and pasted it into Notepad++. Notepad++ showed me immediately that the character was an odd apostrophe-looking thing.
  • Browsed over to a nice little Unicode converter, pasted the odd apostrophe-looking thing into the input field, and clicked Convert.
  • The conversion showed me that the decimal code for this particular character is 8217 (U+2019, the right single quotation mark). There were a couple of other characters with decimal codes of 8220 and 8221 (the Unicode left and right double quotation marks).
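If you would rather skip the online converter, the same lookup can be done in code. The following is only a minimal sketch, not what I actually ran: AscW is the standard VB.NET function, but the module and function names (InspectCharacters, DescribeNonAscii) are made up for illustration.

```vbnet
Imports System
Imports System.Text

Module InspectCharacters
    ' Hypothetical helper: returns one line per non-ASCII character,
    ' showing its decimal code and its Unicode code point in hex.
    Function DescribeNonAscii(ByVal source As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In source
            Dim code As Integer = AscW(c) ' Unicode code point of the character
            If code > 127 Then
                sb.AppendLine(String.Format("{0} = {1} (U+{2:X4})", c, code, code))
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Sample text built with ChrW so the source file stays plain ASCII.
        Dim sample As String = "It" & ChrW(8217) & "s a " & ChrW(8220) & "test" & ChrW(8221)
        Console.Write(DescribeNonAscii(sample))
    End Sub
End Module
```

Depending on the console's encoding, the characters themselves may not display properly, but the decimal codes next to them will.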

Ah, now we have decimal equivalents for our strange characters. This might be good, so where to next? Well, the Replace function sounds like a good bet, so I tried...

Text = Text.Replace(Chr(8217), "'") which should have replaced the Unicode apostrophe with the ASCII apostrophe. It didn't ... WTH???!?

After a little more poking around, I found out that Chr() only accepts codes from 0 to 255 when the system uses a single-byte code page, so it can't produce character 8217. ChrW() takes a Unicode code point instead and accepts anything from -32768 to 65535 (it only fails for values < -32768 or > 65535), as described here. Now that we have the right function, our line becomes...

Text = Text.Replace(ChrW(8217), "'")

Success!!!! Yeah! Our funky Unicode apostrophe character now prints as an ASCII apostrophe in our PDF prints.
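The double quotes with codes 8220 and 8221 can be handled the same way. Here is a minimal sketch that wraps all three replacements in one function; the names QuoteCleanup and ReplaceSmartQuotes are hypothetical, and the choice to map both double quotes to the plain ASCII double quote is my assumption about what you would want in the PDF.

```vbnet
Imports System

Module QuoteCleanup
    ' Hypothetical helper: swaps the three typographic quote characters
    ' mentioned above for their plain ASCII equivalents.
    Function ReplaceSmartQuotes(ByVal source As String) As String
        If source Is Nothing Then Return Nothing

        Dim result As String = source
        result = result.Replace(ChrW(8217), "'"c)   ' right single quote -> apostrophe
        result = result.Replace(ChrW(8220), """"c)  ' left double quote  -> "
        result = result.Replace(ChrW(8221), """"c)  ' right double quote -> "
        Return result
    End Function

    Sub Main()
        Dim sample As String = "It" & ChrW(8217) & "s a " & ChrW(8220) & "test" & ChrW(8221)
        Console.WriteLine(ReplaceSmartQuotes(sample))
    End Sub
End Module
```

Call it on the report text just before handing it to the PDF export. Other typographic characters (dashes, ellipses, and so on) would need their own Replace lines.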

Thanks for reading!

 
