Can you explain that again from the beginning? What is DITA?

DITA is a difficult thing to explain to the uninitiated. It is difficult to explain because we expect it to be a product or a technology, when it is actually a standard and a methodology. DITA provides an approach to technical writing that embraces best practice ideals such as modularity, single-sourcing, and content re-use. The reasons for moving to DITA are business-focussed.

By Tony Self

Modular documents are efficient

Wouldn't it be good if you could write documentation in components, and then build those components into different documents depending on requirements? Such a "modular document" approach can be applied effectively in many different scenarios. The approach is very efficient, because you only have to write and maintain a piece of information once. If you make a change in one component, the change flows through to every document that uses that component. The term "re-use" is sometimes used when describing this feature.

Modular documentation is not a new idea. It was even used before computerisation, in the "Typewriter Age", within a writing methodology known as "STOP: Sequential Thematic Organisation of Publications". The "STOP" methodology called the document components "topical units of discourse"; we now usually refer to those components as "topics".

Help systems have always been built upon the concept of topics, and as Help Authoring Tools became more sophisticated, modular document features were progressively introduced. The World Wide Web is also built around the concept of topics, and the unrestricted ability to link from one topic to another means that the Web also embraces the idea of modularity.

However, in the parallel universe of print-based documentation, modularity has not been as widely accepted. Document formats such as Microsoft Word's are based on the document being the primary unit, not the topic.

DITA is a methodology which includes a document format, and it is designed specifically for modular documents. In other words, DITA makes modularity really simple for all types of document delivery methods, including Web, Help and print-based.

DITA has two main types of information structures: "topics" (which we understand) and "maps". Maps are simple specifications for a document, listing the topics that make up the document in the order and hierarchy in which they are to appear.
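To make the idea of a map a little more concrete, here is a minimal sketch in Python. The map content, the file names and the outline function are all invented for illustration; only the map and topicref element names and the href attribute come from DITA itself. The sketch reads a small map and lists the topics in the order and hierarchy in which they would appear.

```python
import xml.etree.ElementTree as ET

# A hypothetical DITA map. The title and file names are invented for illustration.
MAP_XML = """
<map>
  <title>Widget User Guide</title>
  <topicref href="about_widgets.dita">
    <topicref href="how_widgets_work.dita"/>
  </topicref>
  <topicref href="installing_a_widget.dita"/>
  <topicref href="error_codes.dita"/>
</map>
"""


def outline(map_xml):
    """Return (depth, href) pairs for every topicref, in document order."""
    root = ET.fromstring(map_xml)
    result = []

    def walk(element, depth):
        # Only direct children are visited here, so nesting becomes depth.
        for child in element.findall("topicref"):
            result.append((depth, child.get("href")))
            walk(child, depth + 1)

    walk(root, 1)
    return result


# Print the outline, indenting each topic by its level in the hierarchy.
for depth, href in outline(MAP_XML):
    print("  " * (depth - 1) + href)
```

A second map could point at the same topic files in a different order or hierarchy, which is how one set of topics can be built into several documents.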

"Information typing" means more usable documents

A number of writing methodologies favour the idea of segmenting information based on its nature (and purpose). The underlying theory is that people read manuals to satisfy specific needs. In some cases, they might need to find out how to do something. In other cases, they might need to see how something works. In still other cases, they might need to look up a code to enter. Rarely will someone open a manual because they want something to read.

Satisfying a reader's particular need can be achieved by separating the "how to" information from the "how it works" information from the "pure facts" information. In the Information Mapping approach developed in the 1960s by Robert Horn, there were a handful of information types, including principle, process, procedure, concept, and structure. Years later, Microsoft was using seven information types in its documentation, comprising conceptual, FAQ, glossary, procedural, reference, troubleshooting, and tutorial.

In the DITA approach, there are three "base information types": task ("how to"), concept ("how it works"), and reference ("pure facts"). Perhaps surprisingly, the content of most manuals and Help systems fits easily into those three simple categories. However, when those three simple types are not appropriate for the content, DITA allows for the "evolution" of new information types. If you have nothing better to do, you could create new information types yourself, but in most cases, new types are created within industries or areas of interest.

Information typing in DITA also guides you towards consistent content that embraces best practice technical writing techniques. This is made possible by the application of rules in a document. For example, when documenting a task, you have to include at least one step. If you don't include a step, the topic won't pass validation! This enforcement of writing rules is in turn made possible by the fact that DITA is an XML-based document format, and XML was designed for this. The term used in XML for enforcing document rules is "validation".
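As a rough illustration of what validation does, the following Python sketch checks a task topic against three hand-written rules. This is an assumption-laden simplification: real DITA validation is performed by an XML parser against the DITA document type definitions or schemas, not by code like this, and the check_task function and the sample topics are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical task topics. The second has an empty <steps> element.
GOOD_TASK = """
<task id="install_widget">
  <title>Installing a widget</title>
  <taskbody>
    <steps>
      <step><cmd>Unpack the widget.</cmd></step>
    </steps>
  </taskbody>
</task>
"""

BAD_TASK = """
<task id="install_widget">
  <title>Installing a widget</title>
  <taskbody>
    <steps/>
  </taskbody>
</task>
"""


def check_task(task_xml):
    """Return a list of violations of a small, simplified subset of task rules."""
    root = ET.fromstring(task_xml)
    problems = []
    if root.find("title") is None:
        problems.append("<task> has no <title>")
    for steps in root.iter("steps"):
        # Rule: a <steps> element needs at least one <step>.
        if steps.find("step") is None:
            problems.append("<steps> contains no <step>")
        # Rule: every <step> needs a <cmd> stating the action to perform.
        for step in steps.findall("step"):
            if step.find("cmd") is None:
                problems.append("<step> has no <cmd>")
    return problems


print(check_task(GOOD_TASK))
print(check_task(BAD_TASK))
```

A validating DITA editor applies the full rule set of the standard in much the same spirit, usually flagging the problem as you type.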

Producing quality documentation within a DITA approach still relies heavily on your skills as an author; information types and "validation" make it easier for you to get it right every time.

Single-sourcing through semantic mark-up

The term "single-sourcing" means different things to different people. Fundamentally, most would agree that it means using the same source content to produce different deliverable products. It is an extension of the idea of modular documents to include different delivery modes; not only can the same content appear in different publications, it can also appear within entirely different media. An instruction might appear within a printed user guide, within a Help topic, on a Web page, and in an ePub. It could appear in the second level of a quick start guide, and in the fifth level of an administrator's guide.

For single-sourcing to be simple, content can't be marked up with formatting instructions. It's no good marking a topic title as Heading 2 if it might need to be marked as Heading 4 in a different publication. Text can't be marked in 12 point if it might end up appearing on a mobile phone screen where 12 point is too large. DITA bypasses this potential roadblock to effective single-sourcing through "semantic mark-up". Instead of marking up text based on how it should look, you mark up text based on what sort of text it is. Titles are marked up as titles. Pre-requisites are marked up as pre-requisites. Steps are marked up as steps. Warnings are marked up as warnings. File names are marked up as file names.

Semantic mark-up allows the separation of content and form. The form (or style) is added during the publishing process, based on rules that map semantic mark-up to presentational styles. The publishing process is automated; it is pretty much a one-click process, once the publishing mapping rules for the organisation have been created. Want a PDF? Click PDF. Want an ePub? Click ePub. Want Eclipse Help? Click Eclipse Help.
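The mapping from semantic mark-up to presentational style can be sketched in a few lines of Python. Everything here is a simplified assumption: the style names, the two style maps and the publish function are invented, and real publishing tools apply far richer rules (typically stylesheets). The point is only that the same topic, untouched, comes out with different styling depending on which set of rules is applied.

```python
import xml.etree.ElementTree as ET

# A hypothetical task topic containing semantic mark-up only, no formatting.
TOPIC = """
<task id="install_widget">
  <title>Installing a widget</title>
  <taskbody>
    <prereq>Switch off the power.</prereq>
    <steps>
      <step><cmd>Unpack the widget.</cmd></step>
      <step><cmd>Attach the widget to the bracket.</cmd></step>
    </steps>
  </taskbody>
</task>
"""

# Invented publishing rules: semantic element name -> presentational style.
USER_GUIDE_STYLES = {"title": "Heading 2", "prereq": "Body Italic", "cmd": "Numbered Step"}
ADMIN_GUIDE_STYLES = {"title": "Heading 4", "prereq": "Body Small", "cmd": "Numbered Step"}


def publish(topic_xml, styles):
    """Return (style, text) pairs for each element that has a style rule."""
    root = ET.fromstring(topic_xml)
    return [(styles[el.tag], el.text) for el in root.iter() if el.tag in styles]


for style, text in publish(TOPIC, USER_GUIDE_STYLES):
    print(f"{style}: {text}")
```

Swapping USER_GUIDE_STYLES for ADMIN_GUIDE_STYLES changes the title from Heading 2 to Heading 4 without a single change to the topic.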

One of the challenges for technical communicators is changing focus from form to content. It may sound easy, but it is quite a transition to move from style-based authoring to semantic authoring. The benefits are many. By automating the formatting process, you can spend more time on the words and phrasing, rather than on the fonts, alignment and numbering! This leads to better writing quality, and more consistent presentation.

DITA is community-owned

Are you sold on DITA? It will save you time, help you produce better quality documents, free up more time to spend on writing, and make your professional life easier! So where do you go to buy DITA?

This is where we need to have another shift in thinking. DITA is a standard, not a product. And it's an open standard. That means that DITA is free: free-as-in-freedom, and free-as-in-beer. Open standards are created and maintained by a community, rather than by a corporation. DITA is "owned", if that's the right word, by the technical writing community. The standard is managed through a not-for-profit standards body called OASIS, and is guided by a group of volunteers on the OASIS DITA Technical Committee.

To adopt DITA, we need to find authoring tools that support the DITA standard. Because the DITA standard is open, you can choose from dozens of authoring tools, including FrameMaker, Arbortext, XMetaL, oXygen, Serna, XXE, DITA Storm, Xopus, and many others. You can even switch from editor to editor, mid-topic if you like! But too many choices can be confusing, particularly to the newcomer. And although DITA is free, commercial DITA authoring tools are not. They can vary in price from less than USD100 to more than USD1000.

The separation of content and form in DITA has generally led to different types of tools for authoring and publishing. Rather than choose one tool for your DITA workflow, you might need to choose two or three.

Re-use, re-use, re-use

One of my favourite pieces of DITA jargon is "WOOO: Write Once and Once Only". Nearly all features in DITA aim to reduce your workload, and one way this is done is by eliminating repetitive work. Once you have written a particular phrase, block or topic (whether that be a product or company name, warning, set of steps, topic, or chapter), you should never have to write it ever again. DITA has plenty of mechanisms for content re-use, many with exciting names such as "transclusion" and "indirection". (The unfamiliar terms disguise the fact that these features are clever in their simplicity.) You might have encountered "variables" in your current authoring environment... you can think of DITA's re-use features as "variables on steroids"!

Any type of DITA content fragment can be re-used. Paragraphs can be re-used, notes can be re-used, phrases can be re-used, terms can be re-used, maps can be re-used, index terms can be re-used, and whole topics can be re-used. This means that the idea of modular documentation can be extended way beyond the simple re-use of topics in different publications, to re-using anything that would otherwise have to be re-typed or copied. Re-use makes it much easier to keep content up-to-date, because you only have to make any change once.
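One of those re-use mechanisms is the content reference, or "conref", where an element pulls in its content from an element elsewhere. The Python sketch below is a heavily simplified, assumed model of the idea rather than the behaviour defined in the standard: the two topics live in an in-memory dictionary instead of files, the file names and ids are invented, and the attribute handling is my own simplification. Only the conref attribute and its "file#topicid/elementid" form come from DITA.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sources, keyed by invented file names.
SOURCES = {
    "shared.dita": """
<topic id="shared">
  <title>Shared content</title>
  <body>
    <note id="power_warning" type="warning">Switch off the power first.</note>
  </body>
</topic>
""",
    "install.dita": """
<topic id="install">
  <title>Installing a widget</title>
  <body>
    <note conref="shared.dita#shared/power_warning"/>
  </body>
</topic>
""",
}


def find_by_id(root, wanted_id):
    """Return the first element at or below root whose id attribute matches."""
    for element in root.iter():
        if element.get("id") == wanted_id:
            return element
    raise KeyError(wanted_id)


def resolve_conrefs(filename, sources):
    """Return the parsed topic with each conref replaced by the referenced content."""
    root = ET.fromstring(sources[filename])
    for element in list(root.iter()):
        conref = element.attrib.pop("conref", None)
        if conref is None:
            continue
        target_file, fragment = conref.split("#")
        topic_id, element_id = fragment.split("/")
        topic = find_by_id(ET.fromstring(sources[target_file]), topic_id)
        target = find_by_id(topic, element_id)
        # Simplification: copy text, children, and any attributes not already set.
        element.text = target.text
        for name, value in target.attrib.items():
            if name != "id" and name not in element.attrib:
                element.set(name, value)
        element.extend([copy.deepcopy(child) for child in target])
    return root


resolved = resolve_conrefs("install.dita", SOURCES)
print(ET.tostring(resolved, encoding="unicode"))
```

If the wording of the warning in shared.dita changes, every topic that references it picks up the change the next time it is published.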

You might need a CCMS, whatever that is...

Once you embrace the modular document, heavy duty single-sourcing, and re-use approaches that are integral to DITA, managing your content can become a challenge. How do you know if someone else in your team has already written a similar topic? How do you know where the product name variables are stored? How do you know which author in your team wrote a particular topic? How do you know in which publications a particular topic appears?

That management challenge can be addressed with a software tool; in this case a "Component Content Management System". You may not have seen that extra "C" in front of "CMS" before, but it means a type of CMS that can work with modular document components.
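To give a feel for just one of those questions, "in which publications does a particular topic appear?", here is a small Python sketch of a "where used" index built from two maps. It is an illustration only: the map names, topic file names and the where_used function are invented, and a real CCMS does a great deal more (versioning, workflow, search, and so on).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical publications, each defined by a DITA map.
MAPS = {
    "user_guide.ditamap": """
<map>
  <topicref href="about_widgets.dita"/>
  <topicref href="installing_a_widget.dita"/>
  <topicref href="error_codes.dita"/>
</map>
""",
    "quick_start.ditamap": """
<map>
  <topicref href="installing_a_widget.dita"/>
</map>
""",
}


def where_used(maps):
    """Return a dictionary mapping each topic file to the set of maps that use it."""
    index = defaultdict(set)
    for map_name, map_xml in maps.items():
        # iter() finds topicrefs at every level of the map hierarchy.
        for topicref in ET.fromstring(map_xml).iter("topicref"):
            href = topicref.get("href")
            if href:
                index[href].add(map_name)
    return index


index = where_used(MAPS)
print(sorted(index["installing_a_widget.dita"]))
```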

DITA may seem complex

DITA is often said to be too complicated. The current DITA standard has over 500 semantic elements... how can you be expected to remember what they are all for? DITA is different in many ways to earlier documentation approaches. That difference is a barrier to adoption.

To take advantage of DITA, you need to re-think the way you've been approaching documentation. You need to understand the principles of the separation of content and form, and be prepared to let go of the "form" part! You need to write within supra-organisational standards, and embrace the ideals of open source. You need to let go of a one-tool-fits-all philosophy, and work with a set of tools appropriate to you. You need to learn the purpose of a small number of semantic mark-up elements (nowhere near 500, by the way), and when to apply them. You need to see a documentation project as part of a library, rather than as an individual publication. You need to work at a smaller level of granularity, and understand how that allows re-use to make your life easier.

Why, you might even need to learn a bit about XML, but that really depends on what tools you choose and whether XML interests you.

Some say that DITA is restrictive, because it is full of rules and standards and validation; DITA stifles creativity, they say. I think it's almost the opposite. Think about haiku poetry. It's full of rules about syllable weight, phrases and meter. Does anyone ever say that haiku stifles creativity? Like haiku, DITA promotes creativity.

What is DITA?

DITA is a methodology and an open standard, built on XML, and maintained by the technical writing community. It makes it possible to apply technical writing best practices such as modularity, single-sourcing, and content re-use, primarily through the separation of content and form. DITA allows the publishing process to be automated, reducing the author's workload. Although the DITA standard is free, many authoring and publishing tools are commercial. You may need to use several different tools to work in DITA. It may seem complicated at first, but when the ideas behind it start to click in your mind, it suddenly becomes simpler. Finally, when used as designed, DITA results in better quality writing, at a lower cost.

Oh, I forgot to mention one thing. DITA stands for "Darwin Information Typing Architecture". But you didn't really need to know that!