What I Learned from Icon

by Glenn Vanderburg

[This is a short essay I wrote for a local meetup, where attendees were asked to come prepared to share their experiences (good or bad) with non-mainstream programming languages.]

Icon is a language designed in the late ’70s and early ’80s, primarily by Ralph Griswold. In the ’60s, Griswold was responsible for Snobol, the first programming language that was specialized for dealing with text. Icon was an effort to design a next-generation language that built on the Snobol ideas, but was more uniform and better integrated.

In many ways, Icon was the first “scripting language.” It’s a very high-level language with excellent facilities for manipulating textual data and integrating with its environment. Perhaps one reason it was never very popular is that it was so far ahead of its time; it predated the rise of Perl and Tcl (the languages that started the scripting language craze) by several years.

For a guy raised on BASIC, Fortran, PL/I, Pascal, and C, Icon was downright weird. But it taught me a lot of things that are still useful to me today.

Nullology

When you call a function in Icon, it can do one of two things: it can return a value, or it can fail. Failure sounds similar to modern exception handling facilities, but Icon failure is different in a couple of ways. First, when an Icon function fails, there’s no indication of why it failed. Second, failure is expected. It happens all the time, and it’s an important part of the way Icon facilities work. (There is a separate, primitive error handling facility for genuine exceptional conditions.)

Many Icon functions and operators are generators: they can produce more than one value. In certain contexts, Icon will keep asking an expression for values until it fails. So in Icon, failure really means “no more values.” For example, here is a complete Icon program that copies its input to its output:

every write(!&input)

The every clause means “keep going until it fails.” write() writes its argument to standard output, and !&input generates successive lines from standard input (&input is Icon’s name for the standard input file), finally failing when end-of-file is reached.
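For readers more at home in a mainstream language, here is a rough Python analogy, offered as a sketch rather than a faithful model of Icon: a Python generator stands in for an Icon generator, and a generator that simply runs out of values stands in for failure. The find function below is a hypothetical illustration loosely modelled on Icon’s built-in find(), which generates each position where one string occurs in another, counting from 1 as Icon does.

```python
from typing import Iterator


def find(needle: str, haystack: str) -> Iterator[int]:
    """Yield every position where needle occurs in haystack, counting from 1
    like Icon. When the positions run out, the generator just ends, which
    plays the role of Icon failure in this analogy."""
    for i in range(len(haystack) - len(needle) + 1):
        if haystack.startswith(needle, i):
            yield i + 1  # Icon string positions start at 1


for pos in find("ab", "abcab"):
    print(pos)  # prints 1, then 4

for pos in find("zz", "abc"):
    print(pos)  # never runs: with no match, the generator "fails" immediately
```

The caller never checks for a reserved value such as C’s -1 or NULL; when there is nothing to report, the loop body simply doesn’t run.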

I had come to Icon straight from C, where failure was typically indicated by some special, reserved return value or other ad hoc mechanism (and where those failure indicators were all too easy to miss), so the concise, expressive beauty of that little program woke me up. I learned

Lesson 1: It’s important to distinguish between having something and having nothing. Whether it’s the value returned from a function or the value held in a variable, it’s important to be able to say “There isn’t one.”

That lesson helped me appreciate exceptions when I encountered them a few years later, and it also helped me appreciate other languages like Lisp, Smalltalk, Ruby, and (to some degree) Java, where variables contain references that can be null instead of holding the data directly.

Uniformity

Icon was the first language I did any serious work with that was an expression language. That is, Icon has no statements, only expressions that have result values. The first time I saw something like this:

sign := if count > 0 then 1 else -1

I was really confused at first, but I soon understood what was going on: what I was used to thinking of as an “if statement” was actually, in Icon, just an “if expression,” and it could have a result value just like anything else. And why shouldn’t it? Once I thought about it, the distinction between statements and expressions in the languages I was familiar with seemed artificial and arbitrary. Further experience with Icon proved that point. It’s certainly possible to misuse constructions like that, yielding impenetrable code. But sometimes they’re exactly what you need to do things right.

Lesson 2: Often, the things you think are fundamentally different are in fact exactly the same. Don’t assume that the things you learned in one context are universal truths.

This lesson has helped me, over the years, to quickly wrap my brain around a lot of new things, including languages, tools, programming paradigms, and platforms.

Truth

I’ve already mentioned generators and failure semantics. But once you have expressions that can produce multiple values and even fail, there’s a problem: what does an expression like e1 | e2 mean? In contexts where multiple values would be useful, you might want one definition, but in traditional contexts you might want a different one, so that familiar-looking constructs behave the way people are accustomed to. I won’t go into details (this isn’t an Icon class, after all), but the Icon designers were able to find a single alternative model for such expressions that works the way you want in both contexts, so that traditional constructs and new Icon ideas coexist without special cases. So you can do all of these things:

if (i = 1 | i = 0) then ...
if i = (1|0) then ...
every write(!open("header") | !open("body") | !open("footer"))   # !f generates the lines of file f

and they do what you would expect. (In the last example, | works like a concatenation operator, so the result is to concatenate those three files. But the operator’s semantics are unchanged: all three examples use a single definition of |, combined with generators, expression failure, and goal-directed evaluation.)
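Goal-directed evaluation is easier to see with a small model. The Python sketch below is only an analogy, not how Icon is implemented: each Icon expression becomes a Python generator, failure is a generator that yields nothing, and the hypothetical helpers alt, eq, and succeeds stand in for |, =, and an if condition. The three in-memory lists at the end are assumptions standing in for the header, body, and footer files.

```python
from typing import Iterable, Iterator, TypeVar

V = TypeVar("V")


def alt(*exprs: Iterable[V]) -> Iterator[V]:
    """Model of e1 | e2 | ...: every result of e1, then every result of e2, and so on."""
    for expr in exprs:
        yield from expr


def eq(left: int, right: Iterable[int]) -> Iterator[int]:
    """Model of left = e: each result of e that equals left (Icon's = produces
    its right operand on success); yields nothing, i.e. fails, otherwise."""
    for value in right:
        if value == left:
            yield value


def succeeds(expr: Iterable[object]) -> bool:
    """Model of an if condition: true when expr produces at least one result."""
    return any(True for _ in expr)


i = 0
print(succeeds(alt(eq(i, [1]), eq(i, [0]))))  # if (i = 1 | i = 0) ...  prints True
print(succeeds(eq(i, alt([1], [0]))))         # if i = (1|0) ...        prints True

# every write(... | ... | ...): in-memory lists stand in for the three files
for line in alt(["header line"], ["body line 1", "body line 2"], ["footer line"]):
    print(line)
```

Here alt is written once and knows nothing about comparisons or files; the different behaviors come entirely from what consumes its results. Unlike Icon, Python will not automatically resume an expression nested deep inside another one, so this model covers only these three shapes.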

Lesson 3: What we’re taught about programming is just like what we’re taught about science: it’s convenient to think that it’s how the world really works, but the truth is we just don’t know. What science gives us is theories that seem to explain the world because they match all the experiments we can think of. But sooner or later our knowledge of the world expands, and we realize that the previous theory was just an approximation. Once things got fast enough and large enough, Newton wasn’t enough, and we needed Einstein. (And then they got small enough and we needed Planck and Bohr.) With single-valued functions, Boolean logic was enough, but throw generators into the mix, and you need something more.

This lesson has helped me many times when I’ve tried to deal with complicated, confusing, and sometimes contradictory business rules and requirements. Inspired by Icon’s example, I’ve often been successful at finding deeper, simpler, and more general rules that are capable of supporting all of those surface complications as mere variations. Not always, but often.

Representation

When I encountered Icon, I was a Unix user, and I had become fairly proficient with regular expressions as text patterns. Icon has an extremely sophisticated text-matching facility, called string scanning, that is not based on regular expressions and is actually more powerful. (With Perl 6, after several layers of additions to classic regular expressions, Perl’s regexps have finally achieved power equivalent to Icon patterns. But that’s another story.)

Languages and protocols are often specified using grammars that look like this simple grammar for arithmetic expressions:

X := T | T "+" X
T := E | E "*" T
E := "x" | "y" | "z" | "(" X ")"

(Only addition and multiplication are supported, with variables x, y, and z.)

As powerful as classic regular expressions are, it’s all but impossible to use them to implement a parser for a grammar like that, because the grammar lets parentheses nest to any depth. (And if you somehow succeeded, it would be very difficult to maintain and extend that parser.)

But using Icon’s programmable pattern facility, you can implement such a parser like this:

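# Each procedure generates every way its rule can match at the current
# scanning position; ="s" matches the literal s and moves past it.
procedure X()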
    suspend [T()] | [T(), ="+", X()]
end

procedure T()
    suspend [E()] | [E(), ="*", T()]
end

procedure E()
    suspend [="x" | ="y" | ="z"] | [="(", X(), =")"]
end

Wow! It looks almost exactly like the grammar! In fact, it would be really easy to write a program to generate the parser from the grammar!

You can call it like this:

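# := is assignment; 1(X(), pos(0)) yields X()'s result only once pos(0) confirms the whole line was parsed
parseTree := line ? 1(X(), pos(0))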

yacc in Icon is a one-day hack.
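For readers who don’t know Icon, here is roughly the same parser transcribed into Python generators. It is a sketch under stated assumptions, not a canonical translation: each nonterminal becomes a generator function that takes the text and a position and yields a (tree, new position) pair for every way its rule can match there; the helper lit plays the role of Icon’s ="s"; and parse keeps the first parse that consumes the whole line, returning None if there is none. The names X, T, and E are kept from the grammar even though they are unusual Python style.

```python
from typing import Iterator, List, Optional, Tuple, Union

# Parse trees are nested lists of matched strings, mirroring the Icon lists.
Tree = Union[str, List["Tree"]]
Result = Iterator[Tuple[Tree, int]]


def lit(text: str, pos: int, s: str) -> Iterator[Tuple[str, int]]:
    """Like Icon's ="s": match s at pos and move past it, or yield nothing."""
    if text.startswith(s, pos):
        yield s, pos + len(s)


def X(text: str, pos: int) -> Result:
    # X := T | T "+" X
    for t, p in T(text, pos):
        yield [t], p
    for t, p in T(text, pos):
        for plus, p2 in lit(text, p, "+"):
            for x, p3 in X(text, p2):
                yield [t, plus, x], p3


def T(text: str, pos: int) -> Result:
    # T := E | E "*" T
    for e, p in E(text, pos):
        yield [e], p
    for e, p in E(text, pos):
        for star, p2 in lit(text, p, "*"):
            for t, p3 in T(text, p2):
                yield [e, star, t], p3


def E(text: str, pos: int) -> Result:
    # E := "x" | "y" | "z" | "(" X ")"
    for name in ("x", "y", "z"):
        for m, p in lit(text, pos, name):
            yield [m], p
    for lp, p in lit(text, pos, "("):
        for x, p2 in X(text, p):
            for rp, p3 in lit(text, p2, ")"):
                yield [lp, x, rp], p3


def parse(line: str) -> Optional[Tree]:
    """Return the first parse that consumes the whole line, or None."""
    for tree, end in X(line, 0):
        if end == len(line):
            return tree
    return None


print(parse("x+y"))  # [[['x']], '+', [[['y']]]]
print(parse("x+"))   # None
```

The transcription is mechanical: each grammar rule maps to one generator function. Like the Icon version, it backtracks by retrying alternatives in order, so it can do a lot of repeated work on some inputs; a production parser would memoize results or use a different algorithm.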

Lesson 4: Syntax matters. Programs are easier to understand if their syntax fits the problem domain. That’s why, even though it’s a dangerous language facility that’s prone to abuse, operator overloading is essential if a language is going to be good for heavy numerical processing. Regular expressions, as powerful as they are, don’t look like anything except regular expressions.

This lesson helps me know when to write a little domain-specific language instead of trying to wedge the domain into my existing language. (Usually this is a very easy thing to do.) It pushes me toward dynamic, malleable languages like Ruby (and, dare I say it, Lisp) over more mainstream ones.

(The lesson that syntax matters is also strongly reinforced — but in a negative way — every time I use XSLT. Fortunately, I may soon be able to use XQuery for all the jobs where XSLT now reigns.)