The complete answer to "How did Bush hide the facts?"

Discussion in 'The ChitChat Lounge' started by slashboyin, Jun 26, 2006.

  1. slashboyin

    slashboyin New Member

    I came across this funny bug when a friend forwarded me a message describing it; it was also posted here in the forum. Unfortunately, the message left out the technical side of the bug. The reason behind it is interesting enough to write up.

    Here are the steps to see what the bug does.

    1. Open Notepad
    2. Type in this sentence exactly (without quotes): "this app can break"
    3. Save the file
    4. Close Notepad
    5. Open the saved file by double clicking it

    Most users will see 9 boxes instead of that string.

    A similar thing happens with other strings, such as:

    1. "Bush hid the facts"
    2. "Bill hid the facts"
    3. "aa aaa aaa"
    4. "bb bbb bbb"

    There are many more. You can even craft such strings, if you understand what is going on.

    Let's take "this app can break" as an example and try to understand what's going on.

    The hex codes for the string are:

    74 68 69 73 20 61 70 70 20 63 61 6e 20 62 72 65 61 6b

    Now let us assume that these 18 bytes do not represent ANSI or ASCII characters. Instead, let us assume they are Unicode text stored as 16-bit units (UTF-16, little-endian, the form Windows calls "Unicode") and interpret them that way.

    Pairing the bytes up and swapping each pair (in little-endian order the low byte comes first) gives these nine 16-bit codes:

    6874 7369 6120 7070 6320 6e61 6220 6572 6b61
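
    The re-arrangement above can be reproduced with a few lines of Python. This is only a sketch for checking the numbers; the variable names are mine and it has nothing to do with how Notepad itself is written.

```python
# The 18 ASCII bytes of the example sentence.
text = "this app can break"
raw = text.encode("ascii")

# Hex dump of the bytes, one byte at a time.
hex_bytes = " ".join(f"{b:02x}" for b in raw)
print(hex_bytes)

# Read the same bytes as little-endian 16-bit units:
# each pair (low byte, high byte) becomes one code.
units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
hex_units = " ".join(f"{u:04x}" for u in units)
print(hex_units)
```

    The second line printed should match the nine codes listed above.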

    Look each code up in the Unicode charts (reference 3) to find out what character it represents. Each one is a CJK ideograph! (CJK stands for Chinese, Japanese, and Korean.)

    So the whole confusion comes from the fact that the byte values of those 18 ASCII characters also happen to form 9 valid Unicode characters.
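
    If you don't want to look the codes up by hand, Python's standard unicodedata module can name them. A small sketch, assuming the bytes are decoded as UTF-16 little-endian as described above:

```python
import unicodedata

raw = b"this app can break"

# Interpret the ASCII bytes as UTF-16 little-endian text.
decoded = raw.decode("utf-16-le")

# Look up the official Unicode name of each resulting character.
names = [unicodedata.name(ch, "<unnamed>") for ch in decoded]
for ch, name in zip(decoded, names):
    print(f"U+{ord(ch):04X}  {name}")
```

    Every line of the output should carry a "CJK UNIFIED IDEOGRAPH" name.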

    When Notepad opens a text file, it tries to guess whether the byte stream represents Unicode characters. If it decides it doesn't, it interprets the bytes as ASCII characters and displays the content of the file. In this particular case, Notepad guesses that the byte stream is Unicode and hence displays it as Unicode characters.

    If you see 9 boxes, it's because you don't have CJK fonts installed on your system and hence can't see the CJK ideographs. Instead, Notepad displays them as boxes.
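
    For anyone who wants to try crafting such strings, here is a rough checker. Note the assumptions: decodes_to_cjk is a hypothetical helper of my own, it only tests whether every byte pair lands in the main CJK Unified Ideographs block (U+4E00 to U+9FFF), and it is NOT Notepad's actual detection logic, which is more involved. A string that passes here is a good candidate, not a guaranteed trigger.

```python
def decodes_to_cjk(text: str) -> bool:
    """Rough test (not Notepad's real heuristic): do the ASCII bytes of
    `text`, read as UTF-16 little-endian, all land in U+4E00..U+9FFF?"""
    raw = text.encode("ascii")
    # An empty or odd-length byte string cannot be split into 16-bit pairs.
    if len(raw) == 0 or len(raw) % 2 != 0:
        return False
    decoded = raw.decode("utf-16-le")
    return all(0x4E00 <= ord(ch) <= 0x9FFF for ch in decoded)


candidates = [
    "this app can break",
    "Bush hid the facts",
    "Bill hid the facts",
    "aa aaa aaa",
    "bb bbb bbb",
    "hello world!",  # the pair "o " puts a space in the high byte
]
results = {s: decodes_to_cjk(s) for s in candidates}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

    All the strings from the list above should report True, while "hello world!" should report False because one of its byte pairs falls outside the CJK block.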

    References:

    1. https://www.wincustomize.com/articles.aspx?aid=117870&c=1
    2. https://blogs.msdn.com/michkap/archive/2006/06/14/631016.aspx
    3. https://www.unicode.org/charts/

    Tags: bug, notepad, unicode, windows
     
  2. ambush

    ambush _RASTA_man_

    Me feeling stupid right now:eek::

    Put it in easier language
     
  3. g0g0l

    g0g0l ! SpAm

    good info. Reps..
     
  4. slashboyin

    slashboyin New Member

    @ambush: dont worry. i too didnt understand the full thing. as a matter of fact i didnt even read it.
     
