It’s up to interpretation

Yesterday my friend Khan posted this:

(plain text source code)

Sure looks like JavaScript, so pick your favorite browser console or Node.js. Execute the statements one by one. The first one works and the second does not. What gives?

TL;DR – the second semicolon is not really a semicolon.
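To see the failure without hunting for the original snippet, here is a minimal sketch. The two statements are made-up stand-ins, not Khan's actual code, and `parses` is a helper name invented for this example. The lookalike is written as the escape `\u037e` so it stays visible on the page.

```javascript
// Returns true if the JavaScript engine can parse the statement.
// new Function() compiles its body without running it, so a bad
// token surfaces as a SyntaxError right away.
function parses(statement) {
  try {
    new Function(statement);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected, so re-throw it
  }
}

// Hypothetical sample statements (assumptions for illustration only).
console.log(parses("var a = 1;"));      // true: ends in a real semicolon
console.log(parses("var b = 2\u037e")); // false: ends in U+037E instead
```

Paste the same thing into a browser console or Node.js and the second call reports a parse failure, even though both statements look identical once printed.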

Source code for programs is stored in plain text files. With few exceptions, all words are in English and all characters come from the set of printable 7-bit ASCII characters. If you are crafty, or if you copy and paste from a WYSIWYG editor (like Microsoft Word), you can end up inserting special characters that look just like the valid ones.
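If you would rather let the computer do the squinting, a small scanner can flag every character that falls outside printable ASCII. This is only a sketch: `findNonAscii` is a name made up for this example, the sample input is hypothetical, and it treats tabs and carriage returns as acceptable.

```javascript
// List every character outside printable 7-bit ASCII (0x20..0x7e),
// with its 1-based line and column and its Unicode code point.
function findNonAscii(source) {
  const hits = [];
  source.split("\n").forEach((line, lineIndex) => {
    let column = 0;
    for (const ch of line) { // iterates by code point, not UTF-16 unit
      column += 1;
      const cp = ch.codePointAt(0);
      const printable = cp >= 0x20 && cp <= 0x7e;
      const allowedControl = cp === 0x09 || cp === 0x0d; // tab, CR
      if (!printable && !allowedControl) {
        hits.push({
          line: lineIndex + 1,
          column,
          codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        });
      }
    }
  });
  return hits;
}

// Hypothetical two-line sample; the second line ends in a lookalike.
const sample = "var a = 1;\nvar b = 2\u037e";
console.log(findNonAscii(sample));
```

Clean ASCII source yields an empty list, so anything that shows up is worth a second look.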

If you run into this yourself, first try this:

  1. re-type the broken line of code by hand
  2. if the re-typed line works, delete the broken line
  3. keep calm and carry on

Works like a charm, every time. Eric Kolve taught me that trick for solving these kinds of problems (almost 20 years ago!). But of course you want to know why, and typing is a pain, so keep reading.

Whenever every character looks perfect but the code won’t compile or run, I first open the file in Vim, or copy and paste the text into it. I did this with the two lines Khan presented. Right away I noticed a difference between the semicolons, thanks to the fixed-width font my console uses (“Monospace, 9pt”, apparently). I confirmed it by running the Vim ga command on both semicolons: the first is 0x3b (a proper 7-bit ASCII semicolon, as any interpreter or compiler would expect) and the second is 0x037e (U+037E, GREEK QUESTION MARK), a character far outside the 7-bit ASCII range that looks exactly like your usual semicolon.
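No Vim handy? The same check can be sketched in JavaScript itself. The helper name `describe` is made up for this example, and its output is only similar in spirit to what ga prints.

```javascript
// Report a character's code point in decimal and hex, and whether
// it fits in 7-bit ASCII.
function describe(ch) {
  const cp = ch.codePointAt(0);
  return {
    decimal: cp,
    hex: "0x" + cp.toString(16),
    ascii: cp <= 0x7f,
  };
}

console.log(describe(";"));      // { decimal: 59, hex: '0x3b', ascii: true }
console.log(describe("\u037e")); // { decimal: 894, hex: '0x37e', ascii: false }
```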

This Vim screenshot shows two windows: my notes to self on the top and the source code from Khan on the bottom. Look closely at both semicolons.

Other tools I sometimes use for these types of problems: od (“octal dump”), xxd (a hex dumper), diff (put in two different files or do the diff using Vim windows), or the unicode command (when I’m looking for more information on a single character).
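For a rough idea of what a hex dumper shows, here is a sketch in Node.js that prints the UTF-8 bytes of a string. `hexBytes` is a hypothetical helper, the inputs are made up, and real xxd output has offsets and a text column that this leaves out.

```javascript
// Return the UTF-8 bytes of a string as space-separated hex pairs.
function hexBytes(text) {
  return Array.from(Buffer.from(text, "utf8"), (byte) =>
    byte.toString(16).padStart(2, "0")
  ).join(" ");
}

console.log(hexBytes("a;"));      // 61 3b
console.log(hexBytes("a\u037e")); // 61 cd be
```

The real semicolon is the single byte 3b, while the lookalike takes two bytes in UTF-8, which is why a byte-level dump makes the difference obvious.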

The GNU/Linux command “unicode” reveals to us (after some careful escaping, copying, and pasting) that the second semicolon-looking thing is not a semicolon at all.

I really like the book CODE by Charles Petzold for learning about character codes and computers in general. This book is something of a gentle introduction to the kinds of material covered in computer science machine architecture classes.

Original post by Khan on Facebook (not public).
