The word 'supposed' is one of many that have changed in meaning over time. (It
used to mean 'accepted as true', but now tends to mean 'thought to be true',
although it is commonly used sarcastically.)
The word 'index' has suffered a spectacular change of meaning in the space of
a few years, and this change has been brought about by ignorance. The change
also illustrates the danger of placing too much faith in technology.
Communication is a human activity, not a technological exercise.
The Macquarie Dictionary still defines the noun 'index' as:

    'a detailed alphabetical key to names, places, and topics in a book
    with reference to their page number.'
Microsoft, on the other hand, defines an index as a computer-generated list
of words within a collection of documents, where each word is mapped back to its
document. Further, Microsoft claims the indexing process requires no human
input.
In fact, the 'index' that Microsoft defines used to be called a
'concordance'.
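To make the distinction concrete, here is a minimal sketch (in Python; the
file names and text are invented, and this is not Microsoft's actual
implementation) of the kind of 'index' a search engine builds: an automatic
concordance mapping every word back to the documents that contain it.

    # Build a concordance: a mechanical word list mapping each word
    # to the set of documents in which it appears.
    from collections import defaultdict

    documents = {
        "intro.html": "An index helps the reader find information",
        "guide.html": "The reader searches the index for a topic",
    }

    concordance = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            concordance[word].add(name)

    print(sorted(concordance["reader"]))  # ['guide.html', 'intro.html']

Note that the process involves no judgement at all: every word is catalogued,
whether or not a reader would ever look it up.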
But how significant is this blurring of meaning? If you are a technical
communicator, or a professional indexer, the change is extremely important. It
marks a shift away from traditional writing skills towards a reliance on
technology. Yet a traditional index is something quite different from an
electronic 'concordance'.
Let's look at it a different way. There are two distinct methods of finding
content in electronic text:
- author-defined keyword (traditional) indexes, and
- text searching.
The traditional index method involves the author defining keywords or
phrases, and linking those keywords to the topics or pages for which they are
relevant. Traditional indexing is labour-intensive, and requires skill in
indexing: predicting what a reader may be looking for, and choosing which
topics to present to the user.
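By way of contrast with the automatic concordance sketched earlier, a
hand-crafted index might be represented as a mapping from the terms a reader
is likely to look up to the places that answer them (again a Python sketch
with invented entries, not any particular product):

    # A hand-crafted index: a human chooses the entries. An entry may
    # point to a page where the term itself never appears, which is
    # something no automatic word list can do.
    index = {
        "indexes, traditional": ["intro.html"],
        "searching, full-text": ["guide.html"],
        # Readers will look up 'concordance' even if the documents
        # never use that word; the indexer knows this.
        "concordance": ["intro.html", "guide.html"],
    }

    for entry in sorted(index):
        print(entry + ": " + ", ".join(index[entry]))

The value of such an index lies in the choices: which terms to include, which
to leave out, and where to point the reader.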
There is a professional association for indexers in Australia: the Australian
Society of Indexers (AUSSI). That organisation defines indexing as 'the
provision of locators which make it as easy as possible for someone to find what
they are looking for in a large collection of information'. Further, AUSSI makes
the point that indexing usually involves some kind of semantic analysis:
the point that indexing usually involves some kind of semantic analysis:
that is, the indexer determines the meaning of the material in the collection,
and finds ways to summarise and represent this meaning in an easy-to-use form
which is linked to the original information.
Microsoft's Index Server software, like all search engines, does no semantic
analysis. It
makes no attempt to determine the meaning of the content. It just creates a
catalogue of all the words, and provides an interface to allow the user to
locate words (or combinations of words) in the collection of pages. Human
indexers bring human reasoning and intelligence to an index. Software cannot
(yet) do anything more than provide an unintelligent list of words.
The attraction of search engines is that the 'indexing' process is entirely
automatic. It is therefore cheap to implement. But if it yields a product
inferior to a human-crafted index, then the economy of cheap indexing may be
nothing more than an illusion.
In 2001, IDC published a whitepaper called 'The High Cost of Not Finding
Information'. It highlighted the problem of information being as good as
invisible if it cannot be located. The software response to that problem seems
to be to provide more comprehensive word lists. Search engines now collect words
from HTML documents, PDF files, PowerPoint slides, Word documents, spreadsheets,
and even video and other multimedia content! And the words can be translated
into other languages to make an even bigger catalogue! Surely this serves only
to add more hay to the stack where the poor user is trying to find the needle!
One of the biggest and most commonly repeated mistakes of the Information Age
is to fail to learn from the past. Indexes have served readers for hundreds of
years, helping people find the information they are looking for. So when the
huge volume of Web publishing (both Internet and intranet) makes it even more
difficult to find information, shouldn't we be looking at creating better
indexes, not inferior (but cheap!) indexes?
The solution to the problem of locating information is a combination of
reducing the amount of useless content and improving the classification and
indexing of the useful content.
IDC's 'The High Cost of Not Finding Information' whitepaper can be read in
PDF format at http://www.inktomi.com/pdfs/whitepapers/Search_IDC.pdf
Meta Tags:
HTML itself does not include any index tags. The convention is to use
<META> tags, which appear in the page's code but are not displayed to the
user. In particular, a 'keywords' Meta tag is used to nominate index keys.
Some search engines can search through these Meta keywords instead of
searching through the entire body of the HTML files.
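As an illustration only (the keyword values here are invented), a keywords
Meta tag sits in the <head> of a page and is never seen by the reader:

    <html>
    <head>
      <title>Traditional Indexing</title>
      <!-- Nominates index keys for search engines that honour the
           keywords convention; invisible to the reader. -->
      <meta name="keywords" content="indexing, concordance, search engines">
    </head>
    <body>
      ...
    </body>
    </html>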