Cory Doctorow on Word Processors and Text Editors

Via Mogadalai, I came upon a piece by Cory Doctorow on writing. This point of his particularly appealed to me:

  • Kill your word-processor
  • Word, Google Office and OpenOffice all come with a bewildering array of typesetting and automation settings that you can play with forever. Forget it. All that stuff is distraction, and the last thing you want is your tool second-guessing you, “correcting” your spelling, criticizing your sentence structure, and so on. The programmers who wrote your word processor type all day long, every day, and they have the power to buy or acquire any tool they can imagine for entering text into a computer. They don’t write their software with Word. They use a text-editor, like vi, Emacs, TextPad, BBEdit, Gedit, or any of a host of editors. These are some of the most venerable, reliable, powerful tools in the history of software (since they’re at the core of all other software) and they have almost no distracting features — but they do have powerful search-and-replace functions. Best of all, the humble .txt file can be read by practically every application on your computer, can be pasted directly into an email, and can’t transmit a virus.

    Spot on! This particularly appeals to me because I spent over 20 minutes yesterday trying to format a 5-page Word document that had been created piecemeal by several authors, each with their own style of headings, figure captions, and personal quirks that did not flow together. Those were 20 minutes not spent creating content. I have no objection to markup, but I have every objection to using presentational markup as a proxy for structural markup.

    “I know, I’ll use regular expressions”

    Plain language: The snark can approach to within 3% of the boojum.

    Here the “3” in “3%” is an instance-specific result, hence it makes sense to code something like the following:

    The snark can approach to within %d% of the boojum.

    But of course, the trailing % sign is not displayed, because % introduces a format specifier in printf-like functions in several languages. What do we do? Double it, of course!

    The snark can approach to within %d%% of the boojum.

    Now it displays correctly! We can now copy-paste the result into LaTeX for publishing.
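As a quick check of the doubling rule, here is a minimal sketch in Python, whose % operator follows printf conventions. Python is only a stand-in here (the post does not say which language was used), and the function name format_distance is made up for illustration.

```python
def format_distance(percent: int) -> str:
    # "%d" is replaced by the integer; "%%" collapses to one literal "%".
    return "The snark can approach to within %d%% of the boojum." % percent


print(format_distance(3))
# The snark can approach to within 3% of the boojum.
```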

    But wait a minute. Why not have the program print the text with LaTeX markup directly, instead of us having to copy and paste? If we send the output of the preceding format string straight into the LaTeX file:

    The snark can approach to within %d%% of the boojum.

    We get the LaTeX output:

    The snark can approach to within 3

    What happened? Of course, in addition to being a format specifier in printf and friends, % is also a comment character in LaTeX, and blocks everything downstream on that line. To make LaTeX display a % character, it needs to be escaped with a backslash. Let’s escape it then:

    The snark can approach to within %d\%% of the boojum.

    Still doesn’t work, and it’s a new mode of failure now — we need both the backslash and the percent symbols in the output, and they keep getting in each other’s escape route. OK, so we do both: escape the backslash and double the percent:

    The snark can approach to within %d\\%% of the boojum.

    Finally, it works! The first backslash protects the second backslash from the printf function, and the doubled percent is read as a single percent sign, so printf emits “\%”. That output is, of course, LaTeX’s input, where the backslash prevents the % from being read as a comment character and causes it to be displayed.
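The two layers of escaping can be traced with a small sketch. It assumes Python's % operator as the printf-like stage, and it models LaTeX very roughly: an unescaped % starts a comment, and \% prints a literal %. The names render and latex_visible are hypothetical helpers, not part of any library, and real LaTeX does far more than this.

```python
def render(template: str, value: int) -> str:
    """Stage 1: printf-style substitution (Python's % operator as a stand-in)."""
    return template % value


def latex_visible(line: str) -> str:
    """Stage 2: rough model of what LaTeX typesets from one line of input.

    Only two rules are modelled: a backslash-percent pair prints "%",
    and a bare "%" discards the rest of the line as a comment.
    """
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == "%":
            out.append("%")  # escaped percent: typeset a literal %
            i += 2
        elif ch == "%":
            break  # bare percent: the rest of the line is a comment
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Doubled percent only: printf is satisfied, but LaTeX sees a comment.
bad = render("The snark can approach to within %d%% of the boojum.", 3)
print(latex_visible(bad))   # The snark can approach to within 3

# Escaped backslash plus doubled percent: both stages are satisfied.
good = render("The snark can approach to within %d\\%% of the boojum.", 3)
print(good)                 # The snark can approach to within 3\% of the boojum.
print(latex_visible(good))  # The snark can approach to within 3% of the boojum.
```

Reading the format string from the outside in helps: each tool strips exactly one layer of escaping, so the source has to carry one layer per tool the text passes through.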

    Where we wanted to be: The snark can approach to within 3% of the boojum.

    How we got there: The snark can approach to within %d\\%% of the boojum.

    For those of you who don’t know Zawinski’s thoughts on regular expressions:

    Some people, when confronted with a problem, think “I know, I’ll use regular expressions”. Now they have two problems.