So you want to write procedures in XML...

This article examines the things to consider when making the decision to move from "traditional" narrative-based authoring (in rich text or HTML) to structured XML-based authoring using DITA or DocBook.

By Tony Self

Introduction

So you realise that XML is what you should be writing your new procedure manual in, but how do you go forward from this realisation? Should you be writing in DITA? Or perhaps DocBook? Or isn't XHTML a type of XML? And what tool should you be using? If AuthorIT supports DITA, should you use that? Or should you bite the bullet and spend big dollars on Arbortext Epic? Or maybe you should be at the other end of the tool spectrum, and start with Notepad?

With this many questions, the inertia to overcome in even starting an XML authoring project is great, and the learning curve is going to be steep.

Which Schema?

It is fair to say that no XML schema is better than any other. The best schema will be the one that is most appropriate to the content. So your first analysis task will be to think carefully about the content itself - not the formatting, or how you intend to deliver it - and try to picture it in a tree structure.

There are currently two dominant XML schemas designed for documentation:

Darwin Information Typing Architecture (DITA), and
DocBook

The DITA format is most appropriate to structures where there are discrete blobs of information bound together by a common theme. If you picture your content as a non-linear collection of related topics, then DITA is for you (in this instance). The DocBook format is most appropriate to lengthier texts, where the blobs of information are sequential. If you picture your content as one large document made up of chapters and sections, then DocBook is for you (in this instance).

DITA is better suited to single-sourcing approaches, where, for example, you need to produce both a Help system and a PDF user guide from the same content.

If you can't make up your mind which schema is appropriate, then flip a coin. DITA and DocBook are quite similar, and it is possible to automatically transform content in one format to another with reasonable fidelity.

Note: Fidelity is a term describing the degree to which the integrity of the structure is maintained when transforming from one XML application to another. For example, transformations from DITA to XHTML have low fidelity, as the XHTML structure is too primitive to retain the structure meta-information. What used to be a <concept><title> in DITA is reduced to an <h1> in XHTML.
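
As a rough illustration of that loss, consider a minimal DITA concept. The topic id and wording are invented for this example:

```xml
<concept id="about_widgets">
  <title>About widgets</title>
  <conbody>
    <p>A widget is a small reusable part.</p>
  </conbody>
</concept>
```

A transformation to XHTML might produce something like the following. The exact output depends on the transformer, so treat this as a sketch rather than the output of any particular tool:

```xml
<body>
  <!-- Nothing here records that the heading was a concept title -->
  <h1>About widgets</h1>
  <p>A widget is a small reusable part.</p>
</body>
```

Going back the other way, a converter can only guess whether the <h1> was the title of a concept, a task or a reference topic.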

Over time, as you become more familiar with both schemas, it will become easier for you to identify which is the right one for your content.

What about XHTML?

It is true that XHTML is an XML application, so technically speaking, you can very easily convert your legacy HTML documents to XML by running them through an HTML to XHTML converter such as HTML Tidy. While XHTML is a much preferable format to the obsolete HTML format, it is not really an XML documentation format. It is a generic, document styling format. In other words, XHTML is cheating - it's not really what we mean when we talk about moving documentation to XML. (By the way, there's another way of cheating, and that is to move HTML into a generic DITA topic. But more on that later...)

Knowing the Schema

Once you have chosen a schema for your content, you have also chosen a path of further learning. In order to be productive in your chosen schema, you must become intimately familiar with that schema.

For example, DITA concept topics have dozens of elements, including author, note, index term, and image. Let's pick one of these, note. You can probably guess what the <note> tag is used for: marking up information differentiated from the main text, which expands on or calls attention to a particular point. But what if the note is more a warning than a note? In DITA, a <note> element can be a note, tip, attention, caution, danger, fastpath, important, or remember type. To work with DITA, you have to know that there are different note types, and understand the difference. (For example, it might be useful to know what a "fastpath" note is!) By comparison, DocBook also has a note element, but it is a container for a note title and note paragraphs.
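
The difference can be sketched with the same warning marked up both ways. The wording is invented for illustration. In DITA, the kind of note is carried by the type attribute:

```xml
<note type="caution">Back up the database before you upgrade.</note>
```

In DocBook, the <note> element wraps an optional title and one or more paragraphs. DocBook handles the other kinds of admonition with separate elements, such as <caution>, <warning>, <tip> and <important>, rather than with an attribute:

```xml
<caution>
  <title>Backups</title>
  <para>Back up the database before you upgrade.</para>
</caution>
```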

Fortunately, there are good references for both DITA and DocBook, in a number of formats, including CHM, incidentally!

Cheating with DITA Topics

DITA provides a cheat's way of importing HTML content, through the generic "topic" topic type. In DITA terminology, this topic type is "untyped"; it is a generic catch-all for topics that don't fit into another context-specific topic type. The elements in "topic" topics are quite bland. They include tags inherited from HTML, such as <body>, <p> and <ul>, which can be easily "mapped" to HTML equivalents.
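
A minimal sketch of such a generic topic follows. The id and content are invented for the example:

```xml
<topic id="printing_overview">
  <title>Printing</title>
  <body>
    <p>You can print from any screen.</p>
    <ul>
      <li>Reports print in portrait orientation.</li>
      <li>Charts print in landscape orientation.</li>
    </ul>
  </body>
</topic>
```

Apart from the <topic> wrapper and the <title>, the markup reads almost exactly like HTML, which is why it says very little about what kind of information the topic holds.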

AuthorIT's DITA support is currently limited to "topic" topic types, and consequently does not support the more powerful structuring capabilities of XML. No doubt this support will improve over time.

We won't mention DITA "topic" topic types again, as we will concentrate on the context-specific DITA topic types of "concept", "task" and "reference".

Specialisations in DITA

If you choose DITA, there's one further complication, and that's something called "specialisation". Specialisation allows DITA elements to be customised for specific purposes.

For example, DITA concept topics can be broken into section elements (<section>). If you need a second type of section, maybe called a <division>, you can modify the schema to add a new <division> element, and model it on the <section> element. If you write your own transformers, you can then treat the <division> element differently from the standard <section> element. But if you are using a stock-standard DITA transformer, the <division> will not be treated as an error, but will be processed as if it were a <section> tag.
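
The fallback behaviour relies on the DITA class attribute, which records the ancestry of each element. The fragment below is only a sketch: "myconcept" is a hypothetical specialised topic type, and in practice the class value is normally supplied as a default by the specialised schema rather than typed by the author.

```xml
<!-- Hypothetical specialised element; "myconcept" is an invented name -->
<division class="- topic/section myconcept/division ">
  <title>Regional variations</title>
  <p>Some regions use a different date format.</p>
</division>
```

A transformer that knows nothing about myconcept/division can still match on topic/section and process the element as an ordinary section.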

Choosing a Tool

The choice of XML authoring tool will depend upon resources available to you. There are two important resources involved here:

money, and
technical expertise.

Money

There are many authoring tools on the market, with varying degrees of functionality, and with an enormous variation in price. Some tools are free, and other tools may cost tens of thousands of dollars. And even if money is no object, the best tool for you isn't necessarily the most expensive. However, having an appropriate budget will make it easier to choose wisely.

Technical Expertise

XML authoring tools fit into the categories of text editors, WYSIWYG editors, and content management systems. Text editors require the user to be very familiar with XML tagging concepts, and to be comfortable working in a code-based environment. WYSIWYG editors (although WYSIWYG is not really an accurate term, because in XML, what you get depends on what you need to see!) are closer to conventional authoring tools such as FrameMaker, RoboHelp or Word. Content management systems attempt to take the XML out of the picture, and provide a form-filling, online authoring environment. Some content management systems, such as TEXTML Server, focus on file management and publishing, and don't have an inbuilt document editor; they rely on third party editors for maintaining the content.

If you have the technical expertise and orientation to work with text editors, then all to the good. If you don't, then you may need to find a technical support resource in your organisation, or abandon the idea of working with text editors. Some WYSIWYG editors and content management systems may also require the backup of people with the expertise to keep you on the right track. Most of the tools on the market don't have associated training courses yet, so be prepared to invest some time teaching yourself the tool. If this type of learning doesn't agree with you, then finding a tool with training support should be your objective.

XML Authoring Tool Options

The XML authoring tools (suitable for documentation) on the market include:

Arbortext Epic
Structured FrameMaker
Syntext Serna
Oxygen
Blast Radius XMetaL
Cladonia Exchanger XML Editor
xmlBlueprint
xmlMind

There are many more XML editing tools that are more focussed on manipulating data stored in XML format. (Remember, XML is a set of languages for storing all sorts of human knowledge, not just text!)

As the concept of writing procedures in XML is fairly new, the tools are largely immature. So don't expect rock-solid, reliable, slick and intuitive software programs! Expect a fair deal of frustration, trouble-shooting and "hacking".

If you don't want to dive right into the deep end, and have the luxury of time to learn and experiment, a good approach is to obtain a copy of Serna (if you think DITA might be best for your project) or xmlMind (for DocBook).

The Structure of a Document

You may have never thought specifically about the nitty-gritty detail of a document's structure, but now's the time to do it. A typical document contains quite a bit of metadata, and a lot more content. The metadata is information about the content, such as the author, the publication date, the title, or perhaps the price. The content is often divided into sections and chapters, and a chunk of content normally contains a heading, some paragraphs, some lists, a warning, maybe an illustration, and perhaps a table. At a granular level, a paragraph might contain the name of a control, an action that a user must perform, a system response, a technical term, and a shortcut key. Think for a moment about text that is normally bolded or italicised. Why is that text highlighted? It's not because it looked good that way, but because you wanted code blocks to look a certain way, and likewise for window names, buttons or menu items. When working in XML, you must think of what is special about a blob of text you want to highlight, rather than thinking about the formatting.

Both DITA and DocBook have tags intended for identifying such text. DITA has, for example, <cmdname>, <codeblock>, <cmd>, <keyword> and <option>. DocBook has <prompt>, <property>, <mousebutton> and <guimenuitem>.
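
As a small sketch of what this looks like in practice (the sentences themselves are invented), a DITA paragraph might identify a command name and one of its options:

```xml
<p>Run <cmdname>ping</cmdname> with the <option>-c</option> option to limit the number of requests.</p>
```

A DocBook paragraph might identify a mouse button and a menu item:

```xml
<para>With the <mousebutton>right</mousebutton> mouse button, click the file,
then choose <guimenuitem>Properties</guimenuitem>.</para>
```

Neither fragment says anything about bold or italics. How each kind of text is displayed is decided later, when the content is transformed for delivery.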

The table of contents (TOC) is a piece of metadata that is obvious but not often thought of as metadata. Similarly, index keywords, and the index itself, are more pieces of document metadata. The TOC is usually generated from the structure of the document, while the index is generated from the index keywords.

Planning a DITA Document

The basic unit of a DITA document is the topic. There are three topic types:

Concept
Reference
Task

The DITA documentation describes a "Concept" topic as one that answers the question "what is?" Concepts provide background information that users must know before they can successfully work with a product or interface. Often, a concept is an extended definition of a major abstraction such as a process or function. It might also have an example or a graphic, but generally the structure of a concept is fairly simple.

A "Reference" topic documents programming constructs or facts about a product. Reference topics are intended to provide quick access to facts, but no explanation of concepts or procedures. They are typically organised into one or more sections, property lists, and tables. The reference topic type provides general rules that apply to all kinds of reference information, using elements like <refsyn> for syntax or signatures, and <properties> for lists of properties and values.
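
A minimal sketch of a reference topic, using an invented configuration setting, might look like this:

```xml
<reference id="timeout_setting">
  <title>Timeout setting</title>
  <shortdesc>Controls how long the client waits for a response.</shortdesc>
  <refbody>
    <!-- Syntax or signature of the thing being documented -->
    <refsyn>timeout=<varname>seconds</varname></refsyn>
    <!-- Each property has an optional type, value and description -->
    <properties>
      <property>
        <proptype>Default</proptype>
        <propvalue>30</propvalue>
        <propdesc>Used when no value is set.</propdesc>
      </property>
    </properties>
  </refbody>
</reference>
```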

A "Task" topic is the main building block for task-oriented user assistance, and is one that answers the question "how do I?". Task topics generally provide step-by-step instructions that will enable a user to perform a task by telling the user precisely what to do and the order in which to do it.
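
A short sketch of a task topic follows. The procedure and control names are invented for illustration:

```xml
<task id="change_password">
  <title>Changing your password</title>
  <shortdesc>Change your password whenever you think it may be known to others.</shortdesc>
  <taskbody>
    <prereq>You must be logged in.</prereq>
    <steps>
      <step>
        <cmd>Choose <uicontrol>Change Password</uicontrol> from the <uicontrol>Account</uicontrol> menu.</cmd>
      </step>
      <step>
        <cmd>Type the new password twice.</cmd>
        <stepresult>A confirmation message is displayed.</stepresult>
      </step>
    </steps>
    <result>The new password takes effect the next time you log in.</result>
  </taskbody>
</task>
```

Each <step> must start with a <cmd>, which holds the action the user performs. The system response goes in a separate <stepresult> element, so the two can be styled or filtered independently.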

All topics have the same high-level structure, with a title, prolog, short description and body. But the body of each different topic type has different elements (and some common elements too).
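
That shared skeleton can be sketched using a concept topic. The id, names and text are invented, and note that in the markup the short description comes before the prolog:

```xml
<concept id="what_is_a_map">
  <title>What is a map?</title>
  <shortdesc>A map collects topics into a deliverable.</shortdesc>
  <!-- The prolog holds metadata about the topic -->
  <prolog>
    <author>Jane Doe</author>
  </prolog>
  <!-- The body element differs by type: conbody, taskbody or refbody -->
  <conbody>
    <p>Topics are written separately and assembled later.</p>
  </conbody>
</concept>
```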

With information stored at the topic level, there's a missing piece of the puzzle. If you have 100 individual topics, what brings them together into a "book"? The answer is the DITA map file, which is a simple list of the names and hierarchy of the topics to be included in a collection of topics. This idea suits re-use of topics for different purposes. One map file might specify the topics to be included in the new user guide, while another describes a different set of topics for the advanced user guide.
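
A map file for a small guide might look something like this sketch. The file names are hypothetical:

```xml
<!-- Map for a hypothetical new user guide -->
<map>
  <topicref href="about_widgets.dita" type="concept">
    <!-- Nesting topicref elements sets the hierarchy of the output -->
    <topicref href="installing_widgets.dita" type="task"/>
    <topicref href="widget_settings.dita" type="reference"/>
  </topicref>
</map>
```

A second map for the advanced user guide could point to some of the same files, which is how one topic ends up in more than one deliverable without being copied.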

So when planning your DITA document (or suite of documents), you might find it easier to start with a list of all the topics you want to cover, and then categorise those topics into "concept", "reference" and "task" types.

Planning a DocBook Document

In the big picture, DocBook is similar in structure to a traditional manual. There is typically just one large DocBook XML file per manual, and it follows a linear sequence. Your planning might therefore be similar to how you currently plan a manual, such as by creating a document skeleton.

DocBook book

<book>
  <bookinfo>
    <title>My First Book</title>
    <author><firstname>Jane</firstname><surname>Doe</surname></author>
    <copyright><year>1998</year><holder>Jane Doe</holder></copyright>
  </bookinfo>
  <preface><title>Foreword</title> ... </preface>
  <chapter> ... </chapter>
  <chapter> ... </chapter>
  <chapter> ... </chapter>
  <appendix> ... </appendix>
  <appendix> ... </appendix>
  <index> ... </index>
</book>
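
Within that skeleton, each chapter is broken into titled sections. As a sketch of how a procedure might sit inside one of them (the chapter, section and menu names are invented for illustration):

```xml
<chapter id="getting-started">
  <title>Getting Started</title>
  <section>
    <title>Changing Your Password</title>
    <para>Change your password whenever you think it may be known to others.</para>
    <procedure>
      <step>
        <para>Choose <guimenuitem>Change Password</guimenuitem> from the
        <guimenu>Account</guimenu> menu.</para>
      </step>
      <step>
        <para>Type the new password twice.</para>
      </step>
    </procedure>
  </section>
</chapter>
```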
