SIGIA-L Mail Archives




Re: SIGIA-L: Popularity in Relevance Ranking Was: understanding metadata

From: Avi Rappoport (analyst_at_searchtools.com)
Date: Wed Sep 05 2001 - 14:42:34 EDT


If Google were as unsophisticated as you say, it would not return very
good results, but it does! Here is a summary of the PageRank
algorithm:

PageRank relies on the uniquely democratic nature of the web by using
its vast link structure as an indicator of an individual page's
value. In essence, Google interprets a link from page A to page B as
a vote, by page A, for page B. But, Google looks at more than the
sheer volume of votes, or links a page receives; it also analyzes the
page that casts the vote. Votes cast by pages that are themselves
"important" weigh more heavily and help to make other pages
"important."

For more technical information on the PageRank algorithm, see the
original article at
<http://www-db.stanford.edu/~backrub/pageranksub.ps> (PostScript) or
this HTMLized PowerPoint presentation:
<http://hci.stanford.edu/~page/papers/pagerank/> and the related
articles at CiteSeer:
<http://citeseer.nj.nec.com/page98pagerank.html>.
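
The voting idea summarized above can be sketched in a few lines of
Python. This is only a rough illustration, not Google's actual
implementation: the function name, the toy link graph, the damping
factor of 0.85, and the way pages with no outgoing links are handled
are all assumptions made for the example and do not come from this
message.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated vote passing (power iteration).

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally "important".
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # vote evenly over all pages so that no rank is lost.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (hypothetical page names): X and Y each receive exactly one
# link, but X's link comes from a page that is itself heavily linked.
toy_links = {
    "P1": ["Hub"],
    "P2": ["Hub"],
    "P3": ["Hub"],
    "Hub": ["X"],
    "Obscure": ["Y"],
}

ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:8s} {ranks[page]:.4f}")
```

In this toy graph X ends up ranked above Y even though each has a
single incoming link, because X's one vote is cast by a page that
other pages vote for.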

Interestingly, this algorithm works best on the whole web or a big
subsection, such as a country. It doesn't work as well within a more
confined space such as a single web site or an Intranet, because
there is less scope to gather definitive pages on specific topics. I
guess if the Intranet is large enough...

Avi

At 10:43 AM -0700 9/5/01, Laura Norvig wrote:
>At 12:13 PM +1000 8/31/01, Rosenberg, Maryanne wrote:
>>As I understand it Google uses the indexing principles used in Science
>>Citation Indexing whereby the "value" of an item is increased by the number
>>of times it is referenced by a "peer". I personally see this fraught with
>>problems in the Internet search environment, providing only limited quality
>>information on any given subject.
>
>This approach has its good points and its bad points. Clearly, it
>gets around the problem of web developers who used to try to force
>pages to the top of results by endlessly repeating keywords in
>metadata tags or in "hidden" portions of the pages themselves. And,
>by bringing popular sites to the top of the results, users do get
>less frustrated because they can find *something* relevant quicker.
>
>Now for the bad. A scary kind of dilution to common denominators. If
>I want a good burger, I'm not going to rely on choosing the most
>popular burger place in the whole country. We all know what
>restaurant that would be and how the burger would taste. Yuk. So,
>the popularity approach can sometimes give false legitimacy to
>mediocre information. This exact thing happens with citation
>frequency in the context of scholarly papers. Authors feel they
>*must* cite a popular article that has been cited by everyone else
>who wrote on the topic, even if the article is not necessarily all
>that great. If they *don't* cite it, people will think they haven't
>done their homework or haven't read this "seminal" article. This
>artificially jacks up the original article's popularity even further.
>

-- 
________________________________________________________________
Search Server Industry Analysis from Search Tools Consulting
    (510) 845-2551  -- <mailto: analyst_at_searchtools.com>
Complete Guide to Search Engines for Web Sites, Intranets,
       and Portals: <http://www.searchtools.com>



This archive was generated by hypermail 2.1.2 : Sun Nov 23 2003 - 22:54:48 EST
