Tcl does not support linked lists
Adam Sah
UC Berkeley Dept of Computer Science
Berkeley, CA 94720-1776
asah@cs.berkeley.edu
Introduction
--------------------------------------------------
One facet of the Tcl-is-evil-no-it's-not-because-a-million-frenchmen-can't-be-
-wrong flame war has focused on whether or not Tcl supports linked lists.
Having looked very carefully at its semantics over the past two years, I
can authoritatively say that Tcl does not support linked lists, and that
while any "Turing Complete" language can be extended to support
something that behaves like a linked list, such an extension in Tcl
would either violate the spirit of Tcl (ie. hemmorage in compatibility
problems or poor performance), or violate the spirit of linked lists
(ie. incorrect big-O running-time properties).
Note that "linked lists" are not the same as "arrays" or "vectors", which
is important if you are following previous commentary I've made on Tcl
in published papers and in previous USENET posts. For example, Tcl
"list"s require O(n) lookup time, which "violates the spirit" of arrays,
but which is fine for linked lists. This article is strictly about
linked lists and their properties.
As we will see, asking whether Tcl supports linked lists is a silly
question to ask, on par with whether elephants can juggle more than 7
chainsaws. First, you'd like to ask whether it's important if elephants
can juggle chainsaws (or anything else for that matter) before getting
worked up over whether it makes sense to worry how well they perform
this task. Rather than argue the more meaningful points directly, I'm
going to stick to the stated question.
What are linked lists?
----------------------------------------
For this conversation, I wish to define linked lists as _any_ data structure
having the behavior of linked lists, regardless of how they're actually
implemented. The standard pointer-cell diagrams of Lisp-like languages
are not applicable in a language lacking references, so if you want to
restrict your definition this narrowly, then Tcl obviously fails to
provide this service. More importantly, as VHLL users, we don't care
HOW a semantic is provided, so long as the abstraction holds.
The behavior of linked lists (in a dynamically typed language) can be
defined as follows:
1. Linked lists are ordered collections of zero or more items. There is no
limit on the size of the collection.
2. Items may be of any data type, including other linked lists, etc.
3. Under insertion and deletion, the order of the (existing) elements
remains the same. Insertion and deletion require O(1) time given a
suitable offset in the ordering.
4. Lookup may require O(n) time.
5. Although some pointer-based linked-lists implementations allow more
general data structures to be built out of what appear to be linked
lists (ie. circular "lists" in Lisp/Scheme, cons-stream-based
"inifinite" lists in Scheme), I am _not_ requiring linked list
implementations to support such structures.
However, given a suitable offset 'i' in the collection, a sublist can be
created in O(1) time that fully specifies a new linked list containing
all of the items whose offsets are >= i and less than n (the number of
items in the original list). Let us call this new structure a "sublist"
and the source list the "original list".
6. sublists must share their data with the original list: if an element in
a sublist is deleted, the deletion must be reflected in the original; if
an element is modified, again it must be reflected in the original; if a
a new element is inserted into a sublist anywhere except before the
head, its insertion must likewise be made into the original (and the
remainder of the list stay the same).
One can argue that this sublist behavior is a poor choice of semantics
for data sharing in collections, but this is not the point of the
discussion (unfortunately).
Does Tcl currently provide linked lists? (eg. in its "list" type)
----------------------------------------------------------------------
No. Tcl "list"s are best described by their implementation, which is as
finite arrays of characters whose length is stored alongside the data
structure. List elements are those items demarked by continuous
non-whitespace characters in the list (or sublists demarked by
balanced curly braces). Although the strings are fixed in length, the
Tcl implementation tends to overallocate storage space, and (at least)
doubles the allocated space when the current space fills, leading to
exponential space usage at the tradeoff of exponentially fewer realloc's,
assuming the programs are filling list space at a subexponential rate.(*)
Of the checklist, Tcl violates three of the properties:
1. Insertion and deletion require O(1) time given a suitable offset in the
ordering.
FALSE - to insert into a Tcl list requires copying the items further
down the list, whose total work is proportional to the insertion
location and the list length. Furthermore, linsert and lreplace are
functional- they do not modify the original list; in fact, I'm not sure
how to modify the middle of a list without re-assigning the base list
(eg. [set mylist [linsert $mylist ...]])
note: such a limitation only seems bizarre until you ask how a
list-as-string implementation can destructively modify a mid-list
element _in_place_ without disturbing the other items. Unless you get
Really Lucky(tm), and the replacement element(s) are exactly the same
total string length of the element(s) they are replacing, then you have
to perform a lot of copying anyway.
2. Given a suitable offset 'i' in the collection, a sublist can be
created in O(1) time that fully specifies a new linked list containing
all of the items whose offsets are >= i and less than n (the number of
items in the original list). Let us call this new structure a "sublist"
and the source list the "original list".
FALSE - A sublist in Tcl requires O(n) time to create, where n is
related to the size of the desired sublist, the string lengths of the
elements in it, including any nested lists. Again, this is because Tcl
requires lists to store the _string_ data _values_ of their contents.
3. sublists must share their data with the original list: if an element in
a sublist is deleted, the deletion must be reflected in the original; if
an element is modified, again it must be reflected in the original; if a
a new element is inserted into a sublist anywhere except before the
head, its insertion must likewise be made into the original (and the
remainder of the list stay the same). Such shared updates should happen
in O(1) time once the insertion has been made in the sublist.
FALSE - sublists in Tcl do not share data with the original list.
Although creative minds might ponder using variable traces to implement
rules, and use this to update the lists which share the sublist, in
practice such solutions involve nonlinear amounts of time, primarily
due to (a) the inability for Tcl to destructively modify elements in place
and (b) the O(n) time required to look up an element in a Tcl list.
(*) Note: Since Tcl lists can only contain string values, such an assumption
is not obvious- for example, Steve Chen here at Berkeley wrote a fractal
tree growing program in Tcl/Tk that held the current tree in a Tcl list.
Since fractalized trees grow at an exponential rate if you store them
entirely as values (and not a pointer references to common
subexpressions), his program quickly bogged down in both space and time.
Can Tcl lists be made to act like linked lists?
----------------------------------------------------------------------
Not really, although you could try.
The first step would be to make a C implementation of linked lists and map
it on top of the existing list functions. For each function, you'd
parse the given "list" store it as C-style linked lists. A sublist
would then be a new wrapper with a "head" pointer to the middle of the
original's linked list.
Second, so that this structure doesn't leak memory, you'd need to add a
garbage collector to Tcl, which is fine- the Boehm-Weiser collector is
proving to be an efficient, practical mechanism [ZORN92]. However, it
should be noted that the collector would have to be modified to not look
for list references in the general heap, lest it bog down in scanning
overhead.
Third, you'd need to deal with the case where a user wants to convert
from a linked list into a normal "list", which sounds easy until you
consider dollar-sign substitution. If dollar-sign substitution
causes linked lists to be converted to normal lists, then you can't pass
them to functions except by name (ie. using upvar), because if you
passed the "value", then the sharing properties would be lost in transit.
Fundamentally, there is no obvious way to represent linked lists as
strings when items can themselves be references (say, to themselves).
The obvious hack would be to represent the links by their addresses, as
converted to string form- but then you couldn't use dollar sign
substitution to get the "value" of a list. At the end, I advocate a
similar scheme- but do understand that this falls under the category of
"extension hacks" and not something anyone would add to the base language.
If you are willing to severely restrict how users could use your "linked
lists", I'm pretty sure that some or all of these difficulties can be
surmounted. However, adding crippled features to programming languages
is generally frowned upon by users- they get a "feature" that they have
to use with great care, and which doesn't buy them very much additional
power.
Do linked lists make sense in the context of Tcl?
----------------------------------------------------------------------
Linked lists are a little strange in the context of Tcl. Tcl is what I
call a "value" language- users manipulate only values, and never worry
about anything like "pointers". Sure, it's not like you see C-style
"pointers" in Scheme, but you still have to worry about references to
shared data. For example, in
(define mylist (list 1 2 3 4 5 6 7 8 9 0))
(define mylist2 (cddr mylist))
(set-cdr! mylist2 '())
(display mylist) ; prints "(1 2 3)"
the update to mylist2 causes the transparent update of mylist. I
realize that such semantics can be useful, but remember that pointers,
even Pascal- and Scheme- style pointers, are among the trickiest aspects
of these language for new users to learn. For advanced, programmers, of
course, such semantics are very handy; one could argue that Tcl isn't
designed for newbies, but for real apps developers, in which case the
lack of references in the language is a flaw.
In some cases, it is possible to implement your data structure in C and
export it into Tcl as a set of method calls that manipulate values
keyed on the input object "name" (as "lappend" currently does for Tcl
lists). For example, for data structures requiring mid-list
insertions (one case where no clever usage of arrays can mask O(n)
insertion time), I would endorse a Tcl extension that added such linked
lists; the difference between this and the linked lists discussed above
are their sharing properties, which necessitate garbage collection, and
which make linked lists difficult to pass to functions or represent as
strings.
It doesn't make sense to push Tcl to do something that it wasn't intended
to do- even John admits that Tcl wasn't designed to write 100,000 line
scripts, and seems to disclaim its value against the user who insists on
abusing the tool. Part of these self-imposed limits is that Tcl does not
support complex data structures, and its lacking of linked lists is the
very least of its "problems" in this regard.
Don't try to teach a pig to sing; you'll frustrate yourself and annoy the pig.
References
--------------------
[ZORN92] Benjamin Zorn. "The Measured Cost of Conservative Garbage
Collection." Univ. of Colorado at Boulder Technical Report #573-92
ftp://cs.colorado.edu:/pub/techreports/zorn/CU-CS-573-92.ps.Z
-- Thanks again, -A.Sah'94 ...Adam Sah...asah@cs.Berkeley.EDU... "Things are going, well..."