Introduction to DITA Conditional Processing
One of DITA's primary strengths is combining discrete data chunks
into cohesive documents. But it also excels at the other end of the spectrum -
separating data chunks when necessary. This feature, called conditional
processing, allows you to produce separate documents for different products,
platforms, audiences, and more, all from the same input. This article
introduces you to conditional processing and its control mechanism, metadata.
By Dave Gash
What Is DITA?
Just kidding! Every DITA-related article in the world seems to start
with this section, whether it's needed or not. I'm pretty sure that if you
don't know what DITA is, you aren't even reading this article. Movin' on.
DITA Metadata
Say that five times fast.
Let's first consider DITA's basic build process.
A build file collects information from a ditamap file,
which in turn references a group of topic files. The build file also locates a
set of XSL transforms appropriate to the requested output type, and sends all
this along to the DITA Open Toolkit, which collects the topics, applies the
transforms, and produces the output.
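To make that concrete, here's a minimal (and hypothetical) ditamap of the
kind the build reads; the file names and title are invented purely to show
the shape of the thing:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map title="User Guide">
  <!-- each topicref points the build at one topic file -->
  <topicref href="intro.dita"/>
  <topicref href="install.dita">
    <!-- nesting topicrefs creates the topic hierarchy in the output -->
    <topicref href="install-windows.dita"/>
  </topicref>
</map>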
That's fine when we want all the content in all the referenced topics
to be included in the output, but what if we don't want all of the content?
That's where conditional processing comes in, the goal being to intelligently
control which topics or parts thereof end up in the output. This control is
achieved using
metadata.
Metadata, often called
"data about data", is a characteristic or trait that helps
identify, clarify, or classify an informational element. For example, an HTML
paragraph tag might read
<p class="dropcap">.
Here, the
<p> tag is the data and the
class="dropcap" attribute/value pair is the metadata;
it classifies the type of paragraph (a CSS class in this case) so it can be
processed correctly. Or, in an XML document, a tag might read
<cost currency="aud">. Again, the
<cost> tag is the data, and the
currency="aud" attribute/value pair
is the metadata; it specifies that the cost element
should be taken as Australian dollars. Metadata is often coded as attributes,
as in these examples, but not always. More about that later.
Metadata has various uses, such as workflow support, search
assistance, and index preparation, but it's really good at one thing -
conditional processing. The primary function of conditional processing is
omitting undesired content, or "filtering." DITA provides four standard
attributes to control filtering:
audience,
product,
platform, and
rev. It also provides a fifth attribute you can use to specify
other properties, reasonably (if uncreatively) called
otherprops. Using these attributes, you can classify everything
from individual elements to entire topic groups, applying appropriate metadata
to the objects to drive the filtering process.
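Just for a quick, hypothetical taste of the attributes we won't revisit
below (product works exactly the same way):
<ph platform="linux">Run the installer script as root.</ph>
<note otherprops="internal">Do not publish this note outside the company.</note>
Nothing magic here; they're just more attribute/value pairs for the build to
match against.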
The big benefit in terms of editing and maintenance is that mutually
exclusive content elements don't have to be stored separately; you can put them
all together in a single topic or map, and leave out the pieces you don't need
at build time. This technique prepares the content so it can be conditionally
processed, while simplifying maintenance by keeping logically related items
physically together in a single source location. It's a great way to cram a lot
of stuff into a small space - sort of like the Kardashian sisters.
Put 'Er There
There are three standard places you can put metadata: on individual
elements, within topics, and on map references.
Element metadata is used at the tag level to apply properties by
which the elements can be identified and filtered during the build. Let's say
we want to customize the first step in a task by user experience level. We
could use the
audience attribute to attach the appropriate metadata
to three versions of the same task, like this:
<step audience="novice">Plug in your PC.</step>
<step audience="intermediate">Turn on your PC.</step>
<step audience="advanced">Boot up your PC.</step>
Using this markup, we can easily produce a task topic with steps
tailored to the specific audience we're trying to reach, regardless of PC
expertise. (An additional version,
<step audience="doofus">Box up your PC and take it back
to the store.</step> may be included if necessary.)
Prolog metadata is used in topics to specify characteristics
with which the topic can be filtered. If we wanted to produce a review document
containing all topics written by a given content provider, we could use the
<author> prolog element to identify the topic's
authors, like this:
<prolog>
<author>Otto Palindrome</author>
<author>Anna Graham</author>
</prolog>
The topic can now be identified by author and filtered appropriately
during the build. Note that this metadata is coded as tags, not attributes as
shown in the element metadata example.
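For context, the prolog sits between a topic's title and its body, so a
stripped-down (and entirely hypothetical) task carrying that author metadata
would look something like this:
<task id="setup-pc">
  <title>Setting Up Your PC</title>
  <prolog>
    <author>Otto Palindrome</author>
    <author>Anna Graham</author>
  </prolog>
  <taskbody>
    <!-- steps go here -->
  </taskbody>
</task>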
Map metadata is used at the top of the metadata food chain to
apply filtering characteristics to whole topics or topic groups within maps. We
could, for example, construct a single map that allows us to produce a user
guide for any of several product releases by adding rev metadata attributes to
the topic references, like this:
<map title="User Guide" id="userguide">
<topicref href="inst-demo.dita" rev="demo"/>
<topicref href="inst-std.dita" rev="1.x"/>
<topicref href="inst-upd.dita" rev="2.x"/>
...
</map>
We're now able to select the correct installation topic (or a set of
correct topics, regardless of number or hierarchical placement) for any current
product release, from the demo version to 1.x to 2.x, without creating - and
maintaining - separate map files.
Testing, 1 2 3...
Here's a great joke: "What do you call a musician with no
girlfriend?"
[crickets chirping] Wait, that's not funny, you say, and you're
right. But
why is it not funny? Because it's just a setup with no
punchline. In comedy, tech pubs, and most other worthwhile human endeavors,
preparation is useless unless you deliver the kicker - and that's the problem
with our examples so far.
Identifying unique elements, topics, and maps and applying metadata
to differentiate them is only half the job. Metadata itself doesn't
do anything; it just sits there patiently waiting until it's
needed. To make it useful, we have to tell the build process what to do with
it; that is, we have to define the filtering conditions for the build.
The
ditaval file is the mechanism we use for that purpose. Like the
map file and the XSL transforms, the ditaval file is read by the build and used
to drive the filtering process as the output stream is created. The ditaval
file essentially contains two things: conditions to be matched and actions to
be taken when they're found.
Ditaval conditions are defined with the
<prop> (property) element, which has three
attributes:
att, the metadata attribute to search for;
val, the metadata attribute value to match; and
action, the action to be taken when the metadata
attribute value is matched. Think of it rather like a CSS rule: look for
elements that contain the metadata attribute
att; if you find one, see if its value is equal to
val; if so, perform the specified
action. You can include as many
<prop> elements as you like, in any order; much
like CSS and XSLT, it's a wonderful demonstration of declarative processing at
work. Let's look at some examples.
Earlier, we added the
audience attribute as element metadata to some task
steps (and presumably to other elements, topics, and topic references in our
content repository). Now, if we want to produce a user guide for novices, we
might code conditions in the ditaval file like this:
<val>
<prop att="audience" val="intermediate" action="exclude" />
<prop att="audience" val="advanced" action="exclude" />
...
</val>
These conditions allow the novice audience elements through while
filtering out the intermediate and advanced audience elements.
Next, we added the
<author> element as prolog metadata to some
topics, naming two contributing authors. If we want to produce a review
document containing only those topics written by a single author, we can do it
by excluding the other with a ditaval condition, like this:
<prop att="author" val="Anna Graham" action="exclude" />
This will filter out Anna's topics and leave us with a complete
listing of topics written or contributed to by her colleague Otto.
Finally, we added the rev attribute to some topic references in a
ditamap, identifying installation topics for demo, 1.x, and 2.x software
versions. When we're ready to produce an installation guide for the 2.x
version, we can code ditaval conditions to exclude the others, like this:
<prop att="rev" val="demo" action="exclude" />
<prop att="rev" val="1.x" action="exclude" />
The result will be our desired document, an installation guide for
the 2.x product only, with the demo and 1.x topics filtered out. Thus, the
ditaval file's
<prop> element becomes the killer punchline for
the clever metadata setup.
Which reminds me: "Homeless."
A Hippo in the Ointment
Now if you're ahead of me on this, and you probably are, you'll note
that these examples seem to approach the document assembly process somewhat,
well, backward. We don't include the elements we want, we exclude the ones we
don't want. Odd as it seems, that's exactly how the "exclude" action works.
Gosh, wouldn't it be nice if there were also an "include" action? Well, there
is... sort of.
The pre-V1.4 DITA Open Toolkit only offered the "exclude" action (and
"flag", which is beyond the scope of this article); however, as of OT V1.4, a
new "include" action became available. But - if I may use a phrase with which
I'm painfully familiar - "it isn't what it looks like!" For example, you'd
think that the single
<prop> tag below is equivalent to the two
immediately above, including just 2.x content and excluding demo and 1.x
content:
<prop att="rev" val="2.x" action="include" />
But you'd be wrong. Yes, given that tag, the 2.x topics will be
included, but so will the demo and 1.x topics. That's because the default
action for all elements, marked or unmarked, is "include." Let's say that
again, because it's hugely important: the default action for all elements is
always "include." Since that's the case, you might be wondering if you could at
least add that third
<prop> tag to the first two, just to make your
intentions clear. Yes, but that's just like calling in your vote for American
Idol - you can do it, but it won't make any difference.
The reason "include" doesn't work quite as intuitively as we'd like
is because its primary use is for elements with multiple metadata values in the
same attribute. The filtering logic for multiple values can get sticky pretty
fast, so let's leave that for another article. Bottom line, "include" doesn't
really do us any good in ordinary, everyday filtering, but that's really not a
bad thing; read on.
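So the term doesn't just hang there: a multi-value attribute is simply
several space-separated values in one attribute, as in this hypothetical step:
<step audience="novice intermediate">Turn on your PC.</step>
It's that case - where a single prop can match some of an element's values but
not others - that "include" is really built for.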
For now, we can safely say there is just one absolute, immutable rule
for ditaval conditions. This rule is true regardless of your DITA OT version,
authoring tool, or processing environment. It's true for all maps, topic
references, full topics, and individual elements, whether marked with metadata
or not. It's true all the time, for all builds, in all cases. The rule is this:
Everything not explicitly excluded is included.
At first blush this rule seems restrictive, but in practical terms it
greatly simplifies the process of marking up content for conditional
processing. We can now approach our content with a simple plan: add metadata to
anything we might want to exclude later and leave everything else alone!
Because the vast majority of content in any documentation set is included in
most output formats (if not, you're doing it wrong), it's obviously easier to
mark up some content you want to exclude under certain circumstances than to
mark up all the content you want to include under most circumstances. Sweet.
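To see the rule in action, imagine a hypothetical topic fragment like this:
<p>Insert the battery.</p>
<p product="lite">The Lite edition ships without a charger.</p>
<p product="pro">Connect the fast charger.</p>
and a ditaval containing only this condition:
<prop att="product" val="lite" action="exclude" />
The unmarked paragraph and the "pro" paragraph both survive into the output -
neither was explicitly excluded - while the "lite" paragraph is filtered out.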
Loose Ends
But as you might guess, that's not quite everything. You can almost
hear that fellow with the glass eye, cigar butt, and rumpled trenchcoat say,
"There's just one more thing." (That's an old-guy joke; if you don't get it,
you're too young to remember most of the stuff we old guys think is funny. Now
get off our lawn.)
We know that we add metadata to DITA elements and that we add
<prop> conditions to a ditaval file so the build
can properly filter the elements. But there's our missing connection: how does
the build process know where our ditaval conditions are? The answer is simple,
if inelegant. We tell it where they are.
A build file contains a number of
<property> tags (not to be confused with
<prop> tags in the ditaval file) that provide
the build process with required information, such as the input file location,
the output file location, the output transformation
type, and so on. To specify the location of the
ditaval file containing the filtering conditions, we just add one more
<property> tag to the build file, like this:
<property name="dita.input.valfile"
value="${basedir}$
{fileseparator}myprojects
{fileseparator}UserGuide
{fileseparator}userguide.ditaval"/>
This tag tells the build, via the "dita.input.valfile" property, that
the ditaval conditions file is named "userguide.ditaval" and should be found in
the "myprojects\UserGuide" folder under the DITA base directory, "C:\DITAOT\" for
example. The build can now load the filtering conditions from the ditaval file
and apply them to the metadata attached to the various project elements.
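For the curious, a stripped-down Ant build file for an older (1.x) Toolkit
might look roughly like the sketch below. Everything except the
dita.input.valfile property - the paths, the project name, and the companion
properties args.input, output.dir, and transtype - is shown as a typical
example rather than a definitive recipe, so check the parameter reference for
your own OT version:
<project name="userguide" default="build" basedir=".">
  <!-- assume the Open Toolkit lives in C:\DITAOT -->
  <property name="dita.dir" location="C:${file.separator}DITAOT"/>
  <target name="build">
    <!-- hand the map, output settings, and ditaval location to the OT's own build.xml -->
    <ant antfile="${dita.dir}${file.separator}build.xml">
      <property name="args.input"
                value="${basedir}${file.separator}myprojects${file.separator}UserGuide${file.separator}userguide.ditamap"/>
      <property name="output.dir"
                value="${basedir}${file.separator}out${file.separator}UserGuide"/>
      <property name="transtype" value="xhtml"/>
      <property name="dita.input.valfile"
                value="${basedir}${file.separator}myprojects${file.separator}UserGuide${file.separator}userguide.ditaval"/>
    </ant>
  </target>
</project>
The only line that matters for this article is the dita.input.valfile
property; the rest is just the plumbing that hands everything to the Toolkit.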
Finally, although this article has included actual code snippets, I
realize that hand-coding is
so five minutes ago. Most good authoring tools now include
user-friendly interfaces to the nuts and bolts of metadata, build conditions,
and file locations, so that setting up and implementing conditional processing
is relatively easy.
Summary
DITA is a brilliant implementation of structured authoring,
incorporating single-sourcing, content sharing and reuse, and conditional
processing as core technological elements. Conditional processing is at the
heart of content specificity, and metadata is its control mechanism. Grasping
the relationship between metadata and filtering is one of the "aha!"
experiences we have along the road from linear narrative to structured
authoring, a little epiphany that suddenly propels us forward in our efforts to
get the most benefit from technology and makes our authoring jobs a bit easier,
a lot more productive, and yes, even fun.