So you want to write procedures in XML...
This article examines the things to consider when making the
decision to move from "traditional" narrative-based authoring (in rich text
or HTML) to structured XML-based authoring using DITA or
DocBook.
By Tony Self
Introduction
So you realise that XML is what
you should be writing your new procedure manual in, but how do you go
forward from this realisation? Should you be writing in DITA? Or perhaps
DocBook? Or isn't XHTML a type of XML? And what tool should you be using?
If AuthorIT supports DITA, should you use that? Or should you bite the
bullet and spend big dollars on Arbortext Epic? Or maybe you should be at
the other end of the tool spectrum, and start with Notepad?
With this many questions, the inertia to overcome in even starting an XML
authoring project is great, and the learning curve is going to be
steep.
Which Schema?
It is fair to say that no XML
schema is better than any other. The best schema will be the one that is
most appropriate to the content. So your first analysis task will be to
think carefully about the content itself - not the formatting, or how you
intend to deliver it - and try to picture it in a tree structure.
There are currently two dominant XML schemas designed for
documentation:
- Darwin Information Typing Architecture (DITA), and
- DocBook
The DITA format is most appropriate to structures where there
are discrete blobs of information bound together by a common theme. If you
picture your content as a non-linear collection of related topics, then
DITA is for you (in this instance). The DocBook format is most appropriate
to lengthier texts, where the blobs of information are sequential. If you
picture your content as one large document made up of chapters and
sections, then DocBook is for you (in this instance).
DITA is better suited to single-sourcing approaches, where you need to
produce a Help system and a PDF user guide, for example.
If you can't make
up your mind which schema is appropriate, then flip a coin. DITA and
DocBook are quite similar, and it is possible to automatically transform
content in one format to another with reasonable fidelity.
Note: Fidelity is a term describing the degree to which the
integrity of the structure is maintained when transforming from one XML
application to another. For example, transformations from DITA to XHTML
have low fidelity, as the XHTML structure is too primitive to retain the
structural meta-information. What used to be a <concept><title>
in DITA is reduced to an <h1> in XHTML.
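To picture the loss of structure when DITA is turned into XHTML, here is a minimal sketch. The id and the wording are made up for illustration, and the two fragments are shown together only for comparison; they would not live in the same file.

```xml
<!-- DITA source: the markup says this is the title of a concept topic -->
<concept id="about_backups">
  <title>About backups</title>
  <conbody>
    <p>A backup is a copy of your data.</p>
  </conbody>
</concept>

<!-- XHTML output: only a generic heading and paragraph remain -->
<h1>About backups</h1>
<p>A backup is a copy of your data.</p>
```

Nothing in the XHTML tells you that the heading once belonged to a concept, so a transformation back to DITA would have to guess.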
Over time, as you
become more familiar with both schemas, it will become easier for you to
identify which is the right one for your content.
What about XHTML?
It is true that XHTML is an
XML application, so technically speaking, you can very easily convert your
legacy HTML documents to XML by running them through an HTML to XHTML
converter such as HTML Tidy. While XHTML is a much preferable format to
the obsolete HTML format, it is not really an XML documentation format. It
is a generic, document styling format. In other words, XHTML is cheating -
it's not really what we mean when we talk about moving documentation to
XML. (By the way, there's another way of cheating, and that is to move
HTML into a generic DITA topic. But more on that later...)
Knowing the Schema
Once you have chosen a
schema for your content, you have also chosen a path of further learning.
In order to be productive in your chosen schema, you must become
intimately familiar with that schema.
For example, DITA concept
topics have dozens of elements, including author, note, index term, and
image. Let's pick one of these, note. You can probably guess what the
<note> tag is used for: marking up information differentiated from
the main text, which expands on or calls attention to a particular point.
But what if the note is more a warning than a note? In DITA, a
<note> element can be a note, tip, attention, caution, danger,
fastpath, important, or remember type. To work with DITA, you have to know
that there are different note types, and understand the difference. (For
example, it might be useful to know what a "fastpath" note
is!) By comparison, DocBook also has a note element, but it is a container
for a note title and note paragraphs.
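As a rough illustration of that difference, the same warning might be marked up as follows in each schema. The wording of the note is invented for the example.

```xml
<!-- DITA: one element, with the kind of note set by the type attribute -->
<note type="caution">Back up your data before you upgrade.</note>

<!-- DocBook: the note is a container for a title and paragraphs -->
<note>
  <title>Before you upgrade</title>
  <para>Back up your data before you upgrade.</para>
</note>
```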
Fortunately, there are good
references for both DITA and DocBook, in a number of formats, including
CHM, incidentally!
Cheating with DITA Topics
DITA provides a
cheat's way of importing HTML content, through the generic "topic" topic
type. In DITA terminology, this topic type is "untyped"; it is a generic
catch-all for topics that don't fit into another context-specific topic
type. The elements in "topic" topics are quite bland, and include tags
inherited from HTML, such as <body>, <p> and <ul>, which can be easily
"mapped" to HTML equivalents.
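A generic topic might look something like the sketch below. The id and the content are hypothetical; the point is how closely the body resembles plain HTML.

```xml
<topic id="printing_tips">
  <title>Printing tips</title>
  <body>
    <p>Before you print:</p>
    <ul>
      <li>Check the paper tray.</li>
      <li>Choose the correct printer.</li>
    </ul>
  </body>
</topic>
```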
AuthorIT's DITA support is currently limited to "topic" topic types, and
consequently does not support the more powerful structuring capabilities
of XML. No doubt this support will improve over time.
We won't mention DITA "topic" topic types again, as we will concentrate
on the context-specific DITA topic types of "concept", "task" and
"reference".
Specialisations in DITA
If you choose DITA,
there's one further complication, and that's something called
"specialisation". Specialisation allows DITA elements to be
customised for specific purposes.
For example, DITA concept topics
can be broken into section elements (<section>). If you need a second
type of section, maybe called a <division>,
you can modify the schema to add a new <division> element, and model
it on the <section> element. If you write your own transformers, you
can then treat the <division> element differently from the standard
<section> element. But if you are using a stock-standard DITA
transformer, the <division> will not be treated as an error, but
will be processed as if it were a <section> tag.
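The fallback works because every DITA element carries a class attribute that records its ancestry. The fragment below is only a sketch: the module name "myconcept" is hypothetical, and in practice the class value would be supplied as a default by your specialised schema rather than typed by hand.

```xml
<!-- Hypothetical specialised element from an assumed "myconcept" module -->
<division class="- topic/section myconcept/division ">
  <title>Regional variations</title>
  <p>Prices differ between regions.</p>
</division>
```

A standard transformer that knows nothing about <division> can still find "topic/section" in the class value and process the element as a section.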
Choosing a Tool
The choice of XML authoring
tool will depend upon resources available to you. There are two important
resources involved here:
- money, and
- technical expertise.
Money
There are many
authoring tools on the market, with varying degrees of functionality, and
with an enormous variation in price. Some tools are free, and other tools
may cost tens of thousands of dollars. And even if money is no object, the
best tool for you isn't necessarily the most expensive. However, having an
appropriate budget will make it easier to choose wisely.
Technical Expertise
XML
authoring tools fit into the categories of text editors, WYSIWYG editors,
and content management systems. Text editors require the user to be very
familiar with XML tagging concepts, and to be comfortable working in a
code-based environment. WYSIWYG editors (although WYSIWYG is not really an
accurate term, because in XML, what you get depends on what you need to
see!) are closer to conventional authoring tools such as FrameMaker,
RoboHelp or Word. Content management systems attempt to take the XML out
of the picture, and provide a form-filling, online authoring environment.
Some content management systems, such as TEXTML Server, focus on file
management and publishing, and don't have an inbuilt document editor; they
rely on third party editors for maintaining the content.
If you
have the technical expertise and orientation to work with text editors,
then all well and good. If you don't, then you may need to find a
technical support resource in your organisation, or abandon the idea of
working with text editors. Some WYSIWYG editors and content management
systems may also require the backup of people with the expertise to keep
you on the right track.
Most of the tools on the market don't have associated training courses
yet, so be prepared to invest some time teaching yourself the tool. If
this type of learning doesn't agree with you, then finding a tool with
training support should be your objective.
XML Authoring Tool Options
The XML authoring
tools (suitable for documentation) on the market include:
- Arbortext Epic
- Structured FrameMaker
- Syntext Serna
- Oxygen
- Blast Radius XMetaL
- Cladonia Exchanger XML Editor
- xmlBlueprint
- xmlMind
There are many more XML editing tools that are more focussed on
manipulating data stored in XML format. (Remember, XML is a set of
languages for storing all sorts of human knowledge, not just text!)
As the concept of writing procedures in XML is fairly new, the tools
are largely immature. So don't expect rock-solid, reliable, slick and
intuitive software programs! Expect a fair deal of frustration,
trouble-shooting and "hacking".
If you don't want to dive right
into the deep end, and have the luxury of time to learn and experiment, a
good approach is to obtain a copy of Serna (if you think DITA might be
best for your project) or xmlMind (for DocBook).
The Structure of a Document
You may have never
thought specifically about the nitty gritty detail of a document's
structure, but now's the time to do it. A typical document contains quite
a bit of metadata, and a lot more content. The metadata is information
about the content, such as the author, the publication date, the title, or
perhaps the price. The content is often divided into sections and
chapters, and a chunk of content normally contains a heading, some
paragraphs, some lists, a warning, maybe an illustration, and perhaps a
table. At a granular level, a paragraph might contain the name of a
control, an action that a user must perform, a system response, a
technical term, and a shortcut key. Think for a moment about text that is
normally bolded or italicised. Why is that text highlighted? It's not
because it looked good that way, but because you wanted code blocks to
look a certain way, and likewise for window names, buttons or menu
items. When working in XML, you must think of what is special about a blob
of text you want to highlight, rather than thinking about the
formatting.
Both DITA and DocBook have tags intended for
identifying such text. DITA has, for example, <cmdname>,
<codeblock>, <cmd>, <keyword> and <option>.
DocBook has <prompt>, <property>, <mousebutton> and
<guimenuitem>.
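As a small sketch of what this looks like in practice, the sentences below use some of those tags. The command name, option and menu item are invented for the example.

```xml
<!-- DITA: mark up what the text is, not how it should look -->
<p>Run <cmdname>backup</cmdname> with the <option>-v</option> option.</p>

<!-- DocBook -->
<para>Click the <mousebutton>right</mousebutton> mouse button and choose
<guimenuitem>Properties</guimenuitem>.</para>
```

Whether a command name ends up bold, monospaced or coloured is then decided later, at publishing time.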
The table of contents (TOC) is a piece of
metadata that is obvious but not often thought of as metadata. Similarly,
index keywords, and the index itself, are more pieces of document
metadata. The TOC is usually generated from the structure of the document,
while the Index is generated from the index keywords.
Planning a DITA Document
The basic unit of a
DITA document is the topic. There are three topic types: concept,
reference and task.
The DITA documentation describes a "Concept" topic
as one that answers the question "what is?" Concepts provide background
information that users must know before they can successfully work with a
product or interface. Often, a concept is an extended definition of a
major abstraction such as a process or function. It might also have an
example or a graphic, but generally the structure of a concept is fairly
simple.
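A bare-bones concept topic could look like this sketch. The id and the content are hypothetical.

```xml
<concept id="about_passwords">
  <title>About passwords</title>
  <shortdesc>A password proves that you are the owner of an account.</shortdesc>
  <conbody>
    <p>Each account has one password, which only the account owner
    should know.</p>
    <example>
      <p>A long phrase makes a stronger password than a single word.</p>
    </example>
  </conbody>
</concept>
```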
A "Reference" topic documents programming
constructs or facts about a product. Reference topics are intended to
provide quick access to facts, but no explanation of concepts or
procedures. They are typically organised into one or more sections,
property lists, and tables. The reference topic type provides general
rules that apply to all kinds of reference information, using elements
like <refsyn> for syntax or signatures, and <properties> for lists of
properties and values.
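Here is a minimal sketch of a reference topic using both of those elements. The command and its option are made up for illustration.

```xml
<reference id="backup_command">
  <title>backup command</title>
  <shortdesc>Copies files to the backup volume.</shortdesc>
  <refbody>
    <refsyn>backup [-v] source destination</refsyn>
    <properties>
      <prophead>
        <proptypehd>Option</proptypehd>
        <propvaluehd>Value</propvaluehd>
        <propdeschd>Description</propdeschd>
      </prophead>
      <property>
        <proptype>-v</proptype>
        <propvalue>none</propvalue>
        <propdesc>Lists each file as it is copied.</propdesc>
      </property>
    </properties>
  </refbody>
</reference>
```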
A "Task" topic is the main building block for
task-oriented user assistance, and is one that answers the question "how
do I?". Task topics generally provide step-by-step instructions that will
enable a user to perform a task by telling the user precisely what to do
and the order in which to do it.
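Since procedures are what this article is about, here is a sketch of a short task topic. The id, window name and button names are hypothetical.

```xml
<task id="changing_password">
  <title>Changing your password</title>
  <shortdesc>Change your password from the Account window.</shortdesc>
  <taskbody>
    <prereq>You must be logged in.</prereq>
    <steps>
      <step>
        <cmd>Choose <uicontrol>Account</uicontrol>.</cmd>
        <stepresult>The Account window opens.</stepresult>
      </step>
      <step>
        <cmd>Type the new password and click <uicontrol>Save</uicontrol>.</cmd>
      </step>
    </steps>
    <result>Your new password takes effect the next time you log in.</result>
  </taskbody>
</task>
```

Notice that each step separates the action the user performs (<cmd>) from the system response (<stepresult>).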
All topics have the same
high-level structure, with a title, prolog, short description and body.
But the body of each different topic type has different elements (and some
common elements too).
With information stored at the topic level,
there's a missing piece of the puzzle. If you have 100 individual topics,
what brings them together into a "book"? The answer is the DITA map file,
which is a simple list of the names and hierarchy of the topics to be
included in a collection of topics. This idea suits re-use of topics for
different purposes. One map file might specify the topics to be included
in the new user guide, while another describes a different set of topics
for the advanced user guide.
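A map file for the new user guide might look like the sketch below. The file names are hypothetical, and the nesting of the <topicref> elements is what expresses the hierarchy.

```xml
<!-- Map for the new user guide -->
<map>
  <topicref href="about_widgets.dita" type="concept">
    <topicref href="installing_widgets.dita" type="task"/>
    <topicref href="widget_settings.dita" type="reference"/>
  </topicref>
</map>
```

The map for the advanced user guide would be a second file that points to a different selection of the same topic files.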
So when planning your DITA document
(or suite of documents), you might find it easier to start with a list of
all the topics you want to cover, and then categorise those topics into
"concept", "reference" and "task" types.
Planning a DocBook Document
In the big picture,
DocBook is similar in structure to a traditional manual. There is
typically just one large DocBook XML file per manual, and it follows a
linear sequence. Your planning might therefore be similar to how you
currently plan a manual, such as by creating a document skeleton.
DocBook book
<book>
  <bookinfo>
    <title>My First Book</title>
    <author><firstname>Jane</firstname><surname>Doe</surname></author>
    <copyright><year>1998</year><holder>Jane Doe</holder></copyright>
  </bookinfo>
  <preface><title>Foreword</title> ... </preface>
  <chapter> ... </chapter>
  <chapter> ... </chapter>
  <chapter> ... </chapter>
  <appendix> ... </appendix>
  <appendix> ... </appendix>
  <index> ... </index>
</book>
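Within that skeleton, one chapter containing a procedure could be filled in along the lines of the sketch below. The titles and step text are invented for the example.

```xml
<chapter>
  <title>Managing your account</title>
  <sect1>
    <title>Changing your password</title>
    <para>You must be logged in before you start.</para>
    <procedure>
      <step><para>Choose <guimenuitem>Account</guimenuitem>.</para></step>
      <step><para>Type the new password and click
      <guibutton>Save</guibutton>.</para></step>
    </procedure>
  </sect1>
</chapter>
```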
Software Reviews