SIGIA-L Mail Archives: Re: [Sigia-l] Information Visualization
From: Karl Fast (karl.fast_at_pobox.com)
Date: Tue Nov 11 2003 - 10:51:45 EST
> Likewise, IV has a few technical barriers, chief among them is
> scalability.....Unfortunately, IV feedback on large data sets is
> impossible now and will remain so in the near future.
Here is a counterexample to the claim that scaling is impossible.
A recent paper in IPM (Information Processing & Management)
describes a visualization system that generates author co-citation
maps in real-time.
- the database has 1.26 million citation records supplied by ISI
(i.e., Web of Science, the world's largest scientific citation
database).
- maps are generated using either Kohonen feature maps or
Pathfinder networks, the two most common algorithms used for
visualizing co-citation networks
- it is all done in real-time and works over the web. It's not
public (you need a password), but I have used it and it spits
back new maps as fast as Google returns results.
- I have heard that this paper actually represents work completed
two years ago....and they're way beyond this now
Certainly scaling is a challenge in many cases, but it's been
vanquished in others.
Xia Lin pioneered the use of self-organizing Kohonen maps in
information science. In his JASIS paper ('97, I think) he required a
Cray to generate maps for a few hundred documents. Now he's dealing
with over a million records. That's a huge leap forward.
For those who want the gory details:
Lin, X., White, H. D., & Buzydlowski, J. (2003). Real-time author
co-citation mapping for online searching. Information Processing &
Management, 39, 689-706.
Author searching is traditionally based on the matching of name
strings. Special characteristics of authors as personal names and
subject indicators are not considered. This makes it difficult to
identify a set of related authors or to group authors by subjects
in retrieval systems. In this paper, we describe the design and
implementation of a prototype visualization system to enhance
author searching. The system, called AuthorLink, is based on
author co-citation analysis and visualization mapping algorithms
such as Kohonen's feature maps and Pathfinder networks. AuthorLink
produces interactive author maps in real time from a database of
1.26 million records supplied by the Institute for Scientific
Information. The maps show subject groupings and more fine-grained
intellectual connections among authors. Through the interactive
interface the user can take advantage of such information to refine
queries and retrieve documents through point-and-click
manipulation of the authors' names.
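For anyone unfamiliar with author co-citation analysis, the counting
step that sits underneath maps like these can be sketched in a few
lines of Python. This is only an illustration, not AuthorLink's code:
the function name cocitation_counts and the toy records with authors
"A" to "D" are made up for the example, and the Kohonen or Pathfinder
mapping that would consume these counts is not shown.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_records):
    """Count how often each pair of authors is cited together.

    citing_records: iterable of lists, one per citing paper, each
    holding the names of the authors that paper cites.
    (Hypothetical input shape, chosen for illustration.)
    """
    counts = Counter()
    for cited_authors in citing_records:
        # An author cited several times by one paper counts once.
        unique = sorted(set(cited_authors))
        # Every unordered pair in the same reference list is one co-citation.
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Toy data: three citing papers and the authors each one cites.
records = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "D", "C"],
]
counts = cocitation_counts(records)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Pairs with high counts are the ones a mapping algorithm would place
close together. At 1.26 million records the counting would presumably
be restricted to the authors related to a query rather than run over
every pair, but that is an assumption on my part about how any such
system would need to work, not something the paper's abstract states.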
> I've written on this here before, but bandwidth, database access and
> app server latency are insurmountable barriers when serving IV in
> large numbers, with no solution in sight.
It's not so insurmountable. At least not in this case.
That doesn't mean that the scalability problem has been universally
licked, but the problem isn't universally intractable either.
--karl