RE: [Sigia-l] RE: Data vs. Information (LONG)

From: Jonathan Broad
Date: Fri Jan 10 2003 - 00:20:01 EST

Hello all,

Ah...I finally have both a copy of Wittgenstein's "Philosophical
Investigations" at hand and a bit of free time. Sorry for the length,
but I've been too busy to drop my .02 in here and there.

So now you get .50. :)

Anyone still following this thread will appreciate this choice passage
from the "Investigations" (para. 106):

"Here it is difficult as it were to keep our heads up,--to see that we
must stick to the subjects of our everyday thinking, and not go astray
and imagine that we have to describe extreme subtleties which in turn we
are after all quite unable to describe with the means at our disposal.
We feel as if we had to repair a torn spiderweb with our fingers."

This is in the midst of a great description of how much "philosophy"
(by which he usually means confusion) stems from thinking that words
have "essential meanings" that transcend their simple utility. ("data",
"knowledge", "information"...)

Which isn't of course to say that this SIGIA-L discussion is pointless,
because we're really exploring why we have two different words for what
seems to be the same "thing"--informata, datamation. Why two? What's
the use of that? Why not one, or three?

I'd like to suggest that this...confusion...stems simply from the
ambiguity between the ideas of "signifier" and "signified" that has been
thoroughly explored during the twentieth century by the descendants of
linguist Ferdinand de Saussure. (If you have trouble sleeping--go for it.)

We should agree that data and information, as we're talking about them,
both refer to "signs" of some sort, either intentional (created by
humans for a reason) or accidental.

I'm just suggesting that it makes sense to characterize signs as "data"
when we are interested in their properties as signifiers, and
"information" when we use them to learn about what they signify.

The word *apple*. Qua signifier, I note that it has five letters. Qua
signified, I'm reminded that I'm on the Atkins diet and would sorely
like to eat one. One word? No, silly--one *apple*!
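
To put the apple example in code terms, here is a minimal Python sketch.
The `Sign` class and the `lexicon` dictionary are made up for
illustration; nothing like them appears in this thread. Treating the
sign as data means inspecting the string itself; treating it as
information means following it to whatever it stands for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    """A sign held as a bare signifier: just the characters."""
    signifier: str

    def as_data(self) -> dict:
        # Properties of the signifier itself; nothing about its referent.
        return {
            "length": len(self.signifier),
            "first_letter": self.signifier[0],
        }

    def as_information(self, lexicon: dict) -> str:
        # "Dereference" the sign: look up what it signifies.
        # The lexicon is a hypothetical stand-in for a reader's knowledge.
        return lexicon.get(self.signifier, "<no idea what this refers to>")


# Hypothetical lexicon with a single entry.
lexicon = {"apple": "a round fruit that grows on trees"}
apple = Sign("apple")

print(apple.as_data())               # the sign qua signifier
print(apple.as_information(lexicon))  # the sign qua signified
```

The same object answers both questions; only the use differs.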

I think this is more or less in keeping with the definition we've
cobbled together so far, with some critiques:

> "Data is something a computer understands, which may or may not be readable
> by someone who doesn't know binary/octal/hex/etc.

Computers deal *exclusively* in data because they don't know how to
dereference signs. The magical idea that all these sound waves coming
out of our mouths "actually" refer to constellations of "real-world"
facts and feelings would never occur to a computer.

They deal with signs entirely on the basis of their "virtual"
characteristics (can't really say physical), aided by their programming
and facility with encodings and math. They can't see past this. When
they can, quickly find a man named John Connor and a bomb shelter.

But I don't think you can say "machine-readability" is a definition of
data, because it ignores the fact that just about every utterance can be
treated as data. "Everything readable by a computer is data" doesn't
mean "All data is readable by computer". It just means that data is
what computers were built to handle.

> Information is something
> humans understand, whether it comes from the Internet, a newspaper, your
> cousin Joe, or a Donald Rumsfeld press briefing on CSPAN. The usefulness of
> the information is determined by context and subjectivity. Information that
> a comet will collide with Earth in three hours and wipe out all life forms
> is probably useful for every human. Information that a man in Gabon was
> slightly injured when he was hit by a car is probably useful to the man's
> family, the driver, and possibly some Gabonian lawyers, but not to the rest
> of us who read about it."

Usefulness is indeed necessary, because information becomes informative
only when used. But so does data, doesn't it? So human use isn't the
right way to distinguish between information and data. It must just be
that data is used in a different way.

For example: A bunch of scientific measurements (say, mouse body temps),
recorded in a table.

Is a single datum like "Body temperature of x at time y" information?
No, because we don't have enough context to make it informative. A
hypothesis guided the data gathering, and determined what measurements
to take--what "metadata" to collect. In that context, we see that the
information we seek isn't in the individual datum at all, but in how the
aggregated data validates our hypothesis.
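
A rough Python sketch of the mouse example, under stated assumptions:
the readings, the control/treated grouping, and the 0.5 degree threshold
are all invented here for illustration and are not from any real study.
The point is only that a single row says little on its own, while the
aggregate, read against a hypothesis, says something.

```python
from statistics import mean

# Hypothetical readings in degrees Celsius; values invented for illustration.
# "mouse" and "group" are the metadata the hypothesis told us to collect.
readings = [
    {"mouse": "m1", "group": "control", "temp_c": 36.9},
    {"mouse": "m2", "group": "control", "temp_c": 37.1},
    {"mouse": "m3", "group": "control", "temp_c": 37.0},
    {"mouse": "m4", "group": "treated", "temp_c": 37.8},
    {"mouse": "m5", "group": "treated", "temp_c": 38.0},
    {"mouse": "m6", "group": "treated", "temp_c": 37.9},
]


def group_mean(rows, group):
    """Average temperature for one group of mice."""
    temps = [row["temp_c"] for row in rows if row["group"] == group]
    return mean(temps)


def supports_hypothesis(rows, min_difference=0.5):
    """Assumed hypothesis: treated mice run warmer than controls
    by at least min_difference degrees."""
    difference = group_mean(rows, "treated") - group_mean(rows, "control")
    return difference >= min_difference


# One datum in isolation: a number with no context to make it informative.
print(readings[0]["temp_c"])

# The aggregate, read against the hypothesis.
print(supports_hypothesis(readings))
```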

But what if your doctor says, after the same kind of measurement: "You
have a temperature of 101.3F."

*Now* we're talking about important information. Because we're not
concerned with the structure of the signification or aggregating it with
similar signs but with *what it refers to*. The signified. In this case,
your body.

So data is very informative. If that sounds funny, I think it's just
because the vast majority of the signs we deal with are informational.
We only treat signs as data in particular situations. When computers
supersede us and take over the world, they would probably say that
"signification is very dataful".

So "in short", data and information refer to the same thing (signs or
whatever it pleases you to call them), but describe two modes of that
thing's use. One use being a fairly special case, but which people who
work with IT have to deal with quite often.

I know I'm already pushing the bandwidth limit here, but I'd like to
close with another favorite Wittgenstein quote (which says it all):

" 'After he had said this, he left her as he did the day before.'--Do I
understand this sentence? Do I understand it just as I should if I
heard it in the course of a narrative? If it were set down in isolation
I should say, I don't know what it's about. But all the same I should
know how this sentence might be used; I could myself invent a context
for it.
(A multitude of familiar paths lead off from these words in every
direction.)"

"Philosophical Investigations", para. 525

Jonathan Broad

