I came across this funny bug when one of my friends forwarded a message to me describing it; it was also posted here in the forum. Unfortunately, the message left out the technical aspects of the bug. The reason for the bug is interesting enough to blog about.
Here are the steps to see what the bug does.
1. Open Notepad
2. Type in this sentence exactly (without quotes): "this app can break"
3. Save the file
4. Close Notepad
5. Open the saved file by double clicking it
Most users will see 9 boxes instead of that string.
A similar thing happens with other strings, such as:
1. "Bush hid the facts"
2. "Bill hid the facts"
3. "aa aaa aaa"
4. "bb bbb bbb"
There are many more. You can even craft such strings, if you understand what is going on.
Let's take "this app can break" as an example and try to understand what's going on.
The hex codes for the string are:
74 68 69 73 20 61 70 70 20 63 61 6e 20 62 72 65 61 6b
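As a quick check, the byte values above can be reproduced with a few lines of Python. This is only a sketch: it assumes the file holds exactly the 18 ASCII bytes of the sentence, with no trailing newline, and the variable names are made up for illustration.

```python
# Encode the sentence as plain ASCII, one byte per character.
text = "this app can break"
data = text.encode("ascii")

# Render the bytes as space-separated hex pairs (bytes.hex with a separator needs Python 3.8+).
hex_dump = data.hex(" ")

print(len(data))  # number of bytes in the file
print(hex_dump)
```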
Now let us assume that these 18 bytes do not represent ANSI or ASCII characters. Instead, let us assume they represent Unicode characters stored as 16-bit little-endian (UTF-16LE) code units, and try to interpret the text again.
Taking the bytes two at a time and putting the second byte of each pair first (little-endian order), we get these 9 codes:
6874 7369 6120 7070 6320 6e61 6220 6572 6b61
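The regrouping can be sketched in Python with the standard struct module. The assumption here is that each pair of bytes is read as one unsigned 16-bit little-endian value, which is how the codes above were formed; the names data, units and codes are illustrative only.

```python
import struct

data = b"this app can break"

# "<9H" means: little-endian, nine unsigned 16-bit integers (9 * 2 = 18 bytes).
units = struct.unpack("<9H", data)

# Format each 16-bit value as four lowercase hex digits.
codes = [format(u, "04x") for u in units]
print(" ".join(codes))
```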
Look up the codes in a Unicode chart to find out what characters they represent. Each code represents a CJK ideograph! (CJK stands for Chinese, Japanese, and Korean.)
So, the whole confusion is that the codes for those 18 ASCII characters also happen to represent 9 valid Unicode characters.
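To see this without looking anything up by hand, the same bytes can be decoded as UTF-16LE in Python. This is a sketch, not Notepad's own logic: the range check assumes that "CJK ideograph" means the CJK Unified Ideographs block, U+4E00 to U+9FFF.

```python
import unicodedata

data = b"this app can break"

# Reinterpret the 18 ASCII bytes as nine UTF-16 little-endian code units.
decoded = data.decode("utf-16-le")

# List each resulting character with its code point and Unicode name.
for ch in decoded:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))

# Assumption: CJK Unified Ideographs block is U+4E00..U+9FFF.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in decoded)
print(all_cjk)
```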
When Notepad opens a text file, it tries to guess whether the byte stream represents Unicode characters. If it decides they aren't Unicode characters, it interprets them as ASCII characters and displays the content of the file. In this particular case, Notepad wrongly guesses that the byte stream is Unicode and hence displays it as Unicode characters.
If you find 9 boxes, it's because you don't have CJK fonts installed on your system and hence you can't see the CJK ideographs. Instead, Notepad displays them as boxes.
Tags: bug, notepad, unicode, windows