Gilb’s Trap

I remember reading project management literature in the early ’90s, and really enjoying the works of Tom Gilb. The thing that sticks out the most in my mind is Gilb’s Law:

Anything you need to quantify can be measured in some way that is superior to not measuring it at all. Programmers are fond of saying “you can’t really measure that,” and Gilb’s law says that’s just a cop-out: it might be hard to measure, and the measurements might not be 100% accurate, but anything is better than nothing!

It’s hard to argue with that (and I don’t really intend to). But like so many simple, absolute statements, it hides a lot of messy complexity. So much complexity that you (or your project) can fall into it and drown. There are three big traps lying behind Gilb’s law.

  1. Difficulty equates to cost. People have to work to get those measurements, the procedures may hurt productivity, etc. And if the measurements are of low quality, do they really offset the cost? Gilb’s law says that measurement is possible, and it’s always better than not measuring, all else being equal. But the cost of the measurement may nullify its benefit.
  2. Whether you call it Heisenberg’s Uncertainty Principle or the Hawthorne Effect, the result is the same: measurement nearly always has secondary effects. And they may or may not be desirable. Are the people on your team optimizing for the measurement? Is that going to help or hurt your real goal?
  3. Finally: few people deal well with ambiguous numbers. No matter how many times you say that the numbers have a margin of error, or may reflect many other factors, most people look at measurements and behave as if they’re absolute.

Gilb’s law is a law, but when it hits the complexities of the real world, the result can be messy. Use it carefully.

Helplessness

Today, NASA is thinking it probably wasn’t a problem with the insulating tiles on Columbia, or at least, not one that was caused by the main tank debris that hit the shuttle on liftoff.

Nevertheless, the most disturbing thing to me about the whole affair is learning that, even if they knew there was a serious problem with the tiles, there would be nothing they could do. In such a case, I suppose there’s always the chance of a rescue mission by one of the other shuttles or by a Russian craft. But you can also think of dozens of reasons why such a thing wouldn’t be able to happen in time. So NASA says they didn’t even do an EVA to check the tiles, presumably on the assumption that it would be better not to know. It’s hard to argue with that, assuming that they really couldn’t do anything about it.

I don’t guess I’m really surprised by this. Working in vacuum and microgravity is difficult, and putting those tiles on is notoriously tricky under the best circumstances. I know they use a special adhesive, and who knows whether that would adhere or dry properly in vacuum, or in the cold? And maybe they have to be applied under high pressure. But even though I’m not surprised, it’s very troubling to think about that situation: there’s something badly wrong, and it seems trivial (missing tiles!), but the crew will die, and there’s nothing they or anyone else can do.

It points to the next big challenges of spaceflight. Somehow we have to have a cheaper, simpler way out of the gravity well, so that we can have ships that are simple enough to be repaired in space. And we need to work on technologies that make it easier to work in microgravity and vacuum: lighter, less constricting and more flexible spacesuits, as well as thrusters or other tools that make it easier to get around, and ways to gain leverage in the absence of weight and friction.

(Oh, and I’ll second what Rael said.)

Inside the mind of Jason Hunter

Jason’s too-hip AIM icon and iChat combined to give me a good laugh yesterday:

Intellectual Accuracy

I’ve been gradually coming to understand the impact of the Eldred decision, and it’s been fascinating to read Lessig’s blog during the past week or so. He points to a great piece by Doc Searls arguing that many people completely miss the point because they think of copyright as a property right. And in calling for a return to the original 14-year copyright term, The Economist makes the related point that the originators of the idea of copyright didn’t see it as a property right at all.

In this country, at least, calling copyright a property right creates some strange contradictions. In the U.S., property rights are nearly sacred, and can’t be violated by the government except in very limited circumstances, and then only on a case-by-case basis for specific items. The idea of limited terms for intellectual property rights simply doesn’t fit well into the overall view of property rights.

I’m tilting at windmills, I suppose, but I’m going to stop using the term “intellectual property.” (I’m curious, by the way, about the origin of that term and how it came into common use.) For the moment, until I hear a better alternative, I think I’ll call it “creative output” instead.

Data Is

(via my O’Reilly blog)

I know that “data” is technically the plural of “datum”. But I find it jarring when I read that “the data are transmitted” somewhere. In common usage (both speech and informal writing), the data “is transmitted.”

It’s not that “data” is singular; it’s more like a nonspecific collective noun, like “air”. It has come to mean (and I’m going to really massacre the language here, just to emphasize the distinction I’m trying to make) “some datums”. We say “the data is corrupt” in the same way we say “the air is polluted.”

At this point you may be thinking I’m just upset that the dictionary doesn’t agree with the way I do things. For the record, though, I’m a careful speaker and writer who usually argues for the rules people have forgotten rather than the common, often sloppy usage. This time I think the change in usage has happened for good reasons.

One reason, I think, is that “datum” is so rarely a useful word. I’m not sure why, but we rarely need to distinguish between singular and plural with respect to data; it’s almost never important to talk about a single datum.

A related reason is that it’s unclear what constitutes a datum. Is it always a bit? Or some larger group of data? (See how slippery it is? Is it reasonable to say that a datum is composed of a group of smaller data?)

My “air” analogy illustrates that problem quite well. Is a molecule of oxygen also an “air molecule”? Air is a mixture, so identifying the smallest unit of air is a tricky thing.

There are contexts, perhaps, where data are discrete and well structured so that the distinction makes sense. But in most cases, data is complex, with an almost fractal structure, and the line between data and datum is almost impossible to draw. (This paragraph is a test, by the way. Which of those sentences seemed most natural to you?)

I think it’s time to acknowledge that the old rule, in this case, is obsolete. Circumstance and usage have turned “data” into a collective, singular noun. It refers to “some data”, and in the tradition of computer science, “some” can mean “zero or more”. “Datum” can still be useful on the rare occasions where you need to emphasize a singular unit that can’t be described as a bit, byte, octet, scalar, etc.

Update: a respondent, “gojomo”, points out that the correct linguistic term for the common usage of “data” is “mass noun”. Other examples of mass nouns include water, blood, light, money, and cheese.

Update 2 (2017-05-15): I recently re-watched Guy Steele’s brilliant talk from 1998, Growing a Language (transcript here). In that talk, Steele defines “data” like this:

A datum is a set of bits that has a meaning; data is the mass noun for a set of datums.

So in 1998, it was accepted usage (accepted by Guy Steele, anyway, which is good enough for me) to treat “data” as a mass noun.
