The word 'supposed' is one of many that have changed in meaning over time. (It
used to mean 'accepted as true', but now tends to mean 'thought to be true',
although it is commonly used sarcastically.)
The word 'index' has suffered a spectacular change of meaning in the space of
a few years, and this change has been brought about by ignorance. The change
also illustrates the danger of placing too much faith in technology.
Communication is a human activity, not a technological exercise.
The Macquarie Dictionary still defines the noun 'index' as:

    'a detailed alphabetical key to names, places, and topics in a book
    with reference to their page number.'
Microsoft, on the other hand, defines an index as a computer-generated list
of words within a collection of documents, where each word is mapped back to its
document. Further, Microsoft claims the indexing process requires no human
input.
In fact, the 'index' that Microsoft defines used to be called a
'concordance'.
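To make the distinction concrete, here is a minimal sketch (in Python; the
file names and text are invented, and this is not Microsoft's actual
implementation) of the kind of 'index' a search engine builds: an automatic
concordance mapping every word back to the documents that contain it.

    # Build a concordance: a mechanical word list mapping each word
    # to the set of documents in which it appears.
    from collections import defaultdict

    documents = {
        "intro.html": "An index helps the reader find information",
        "guide.html": "The reader searches the index for a topic",
    }

    concordance = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            concordance[word].add(name)

    print(sorted(concordance["reader"]))  # ['guide.html', 'intro.html']

Note that the process involves no judgement at all: every word is catalogued,
whether or not a reader would ever look it up.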
But how significant is this blurring of meaning? If you are a technical
communicator, or a professional indexer, the change is extremely important. It
marks a shift away from traditional writing skills towards a reliance on
technology. Yet a traditional index is something quite different from an
electronic 'concordance'.
Let's look at it a different way. There are two distinct methods of finding
content in electronic text:
- author-defined keyword (traditional) indexes, and
- text searching.
The traditional index method involves the author defining keywords or
phrases, and linking those keywords to the topics or pages for which they are
relevant. Traditional indexing is labour-intensive, and requires skill in
indexing: predicting what a reader may be looking for, and choosing which
topics to present to the user.
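By way of contrast with the automatic concordance sketched earlier, a
hand-crafted index might be represented as a mapping from the terms a reader
is likely to look up to the places that answer them (again a Python sketch
with invented entries, not any particular product):

    # A hand-crafted index: a human chooses the entries. An entry may
    # point to a page where the term itself never appears, which is
    # something no automatic word list can do.
    index = {
        "indexes, traditional": ["intro.html"],
        "searching, full-text": ["guide.html"],
        # Readers will look up 'concordance' even if the documents
        # never use that word; the indexer knows this.
        "concordance": ["intro.html", "guide.html"],
    }

    for entry in sorted(index):
        print(entry + ": " + ", ".join(index[entry]))

The value of such an index lies in the choices: which terms to include, which
to leave out, and where to point the reader.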
There is a professional association for indexers in Australia: the Australian
Society of Indexers (AUSSI). That organisation defines indexing as 'the
provision of locators which make it as easy as possible for someone to find what
they are looking for in a large collection of information'. Further, AUSSI makes
the point that indexing usually involves some kind of semantic analysis:
the point that indexing usually involves some kind of semantic analysis:
that is, the indexer determines the meaning of the material in the collection,
and finds ways to summarise and represent this meaning in an easy-to-use form
which is linked to the original information.
Microsoft's Index Server software, like all search engines, does no semantic
analysis. It
makes no attempt to determine the meaning of the content. It just creates a
catalogue of all the words, and provides an interface to allow the user to
locate words (or combinations of words) in the collection of pages. Human
indexers bring human reasoning and intelligence to an index. Software cannot
(yet) do anything more than provide an unintelligent list of words.
The attraction of search engines is that the 'indexing' process is entirely
automatic. It is therefore cheap to implement. But if it yields a product
inferior to a human-crafted index, then the economy of cheap indexing may be
nothing more than an illusion.
In 2001, IDC published a whitepaper called 'The High Cost of Not Finding
Information'. It highlighted the problem of information being as good as
invisible if it cannot be located. The software response to that problem seems
to be to provide more comprehensive word lists. Search engines now collect words
from HTML documents, PDF files, PowerPoint slides, Word documents, spreadsheets,
and even video and other multimedia content! And the words can be translated
into other languages to make an even bigger catalogue! Surely this serves only
to add more hay to the stack where the poor user is trying to find the needle!
One of the biggest and most commonly repeated mistakes of the Information Age
is to fail to learn from the past. Indexes have served readers for hundreds of
years, helping people find the information they are looking for. So when the
huge volume of Web publishing (both Internet and intranet) makes it even more
difficult to find information, shouldn't we be looking at creating better
indexes, not inferior (but cheap!) indexes?
The solution to the problem of locating information is a combination of
reducing the amount of useless content and improving the classification and
indexing of the useful content.
IDC's 'The High Cost of Not Finding Information' whitepaper can be read in
PDF format at http://www.inktomi.com/pdfs/whitepapers/Search_IDC.pdf
Meta Tags:
HTML itself does not include any index tags. The convention is to use
<META> tags, which appear in the page's code but are not displayed to the
user. In particular, a 'keywords' Meta tag is used to nominate index keys.
Some search engines can search through these Meta keywords instead of
searching through the entire body of the HTML files.
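As an illustration only (the keyword values here are invented), a keywords
Meta tag sits in the <head> of a page and is never seen by the reader:

    <html>
    <head>
      <title>Traditional Indexing</title>
      <!-- Nominates index keys for search engines that honour the
           keywords convention; invisible to the reader. -->
      <meta name="keywords" content="indexing, concordance, search engines">
    </head>
    <body>
      ...
    </body>
    </html>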