Searching and Crawling

Dec 31st 2008

When I talk to my non-library friends about the federated search technology so many libraries are starting to use, they all say, “Oh, I get it, it’s just like Google.” It makes perfect sense to them.

But then I have to explain that it’s not at all like Google. As most everyone knows, Google sends out “robots” or “spiders” to “crawl” the web, looking at every web page, document, and image they can find, then reporting back to headquarters where a super-sophisticated program indexes all the words on all the pages and figures out how to rank each page for quality and relevance.
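To make the "index ahead of time" idea concrete, here is a minimal sketch in Python. It is not how Google actually works; the page texts, the function names, and the word-count score are all assumptions chosen for illustration. The point is only that the index is built before anyone searches, so ranking at search time is a cheap lookup.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for pages a crawler has already fetched.
PAGES = {
    "page-a": "federated search sends a query to many databases",
    "page-b": "a crawler visits pages and builds an index ahead of time",
    "page-c": "search engines rank pages using an index built ahead of time",
}


def build_index(pages):
    """Map each word to a Counter of {page_id: occurrences}."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word][page_id] += 1
    return index


def search(index, query):
    """Rank pages by total occurrences of the query words (a toy score)."""
    scores = Counter()
    for word in query.lower().split():
        for page_id, count in index.get(word, {}).items():
            scores[page_id] += count
    # Highest score first; ties are broken by page id so output is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page_id for page_id, _ in ranked]


# The index is built once, before any search arrives.
INDEX = build_index(PAGES)
```

Calling `search(INDEX, "federated")` consults only the prebuilt index; no page is fetched at query time. A real engine would replace the word count with a far more sophisticated relevance score.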

Federated search systems sold by library vendors don’t work that way at all. They don’t do anything ahead of time. Instead, they wait for someone to type in a search, then they translate the search into the simplest terms they can and send it out to a variety of different, discrete sources such as commercial databases, open access repositories, and library catalogs. Sometimes the results stream back in and sometimes they just trickle. Some sources don’t send back any results at all. It’s no wonder then that ranking the results in terms of relevance isn’t something these products even attempt.
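The query-time fan-out described above can be sketched roughly as follows. This is an illustration, not any vendor's implementation: the three source functions, their names, and the 0.2-second timeout are hypothetical stand-ins for a fast catalog, a slow database, and a repository that never answers.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fast_catalog(query):
    """Hypothetical source that answers right away."""
    return [f"catalog record for {query}"]


def slow_database(query):
    """Hypothetical source whose results only trickle in."""
    time.sleep(0.5)
    return [f"database article for {query}"]


def silent_repository(query):
    """Hypothetical source that fails to respond."""
    raise ConnectionError("no response")


SOURCES = {
    "catalog": fast_catalog,
    "database": slow_database,
    "repository": silent_repository,
}


def federated_search(query, sources, timeout=0.2):
    """Send the query to every source at once and keep what returns in time."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, not_done = wait(futures, timeout=timeout)

    results, missing = [], []
    for future in done:
        if future.exception() is None:
            results.extend(future.result())
        else:
            missing.append(futures[future])  # the source errored out
    missing.extend(futures[future] for future in not_done)  # too slow

    pool.shutdown(wait=False)  # do not block on the stragglers
    return results, sorted(missing)


results, missing = federated_search("open access", SOURCES)
```

Nothing is known about the records until they arrive, and some sources never deliver, so the merged list has no shared basis for relevance ranking. Here `results` holds only the catalog record, while `missing` names the two sources that failed or timed out.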

So my question is this: Why can’t the database vendors, from whom we license so much content, allow us to crawl their data and index it ourselves? If we could index the data ahead of time, analyze it, and rank it for relevance, we would have something much more useful to show to library users. I’d love to be able to say, when explaining our federated search product, “It’s just like Google, but you always get high quality results vouched for by major publishing companies.”

This post was written by Eric Hinsdale

One Response

  1. Sol Lederman says:

    Thank you for your effort in educating people about the difference between crawling and federated search. I am the primary author of the Federated Search Blog and I run into lots of people who don’t know the difference between the two. To help in the education effort, I wrote a primer on federated search which has gotten good reviews:

    http://federatedsearchblog.com/2009/01/14/federated-search-primer-at-altsearchengines/

    The blog also has many articles that cover many aspects of federated search.
