Introduction to DITA Conditional Processing
One of DITA's primary strengths is combining discrete data chunks
into cohesive documents. But it also excels at the other end of the spectrum -
separating data chunks when necessary. This feature, called conditional
processing, allows you to produce separate documents for different products,
platforms, audiences, and more, all from the same input. This article
introduces you to conditional processing and its control mechanism, metadata.
By Dave Gash
What Is DITA?
Just kidding! Every DITA-related article in the world seems to start
with this section, whether it's needed or not. I'm pretty sure that if you
don't know what DITA is, you aren't even reading this article. Movin' on.
DITA Metadata
Say that five times fast.
Let's first consider DITA's basic build process.
A build file collects information from a ditamap file,
which in turn references a group of topic files. The build file also locates a
set of XSL transforms appropriate to the requested output type, and sends all
this along to the DITA Open Toolkit, which collects the topics, applies the
transforms, and produces the output.
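To make that concrete, here's a minimal (and hypothetical) ditamap of the
kind the build reads; the file names and title are invented purely to show
the shape of the thing:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<map title="User Guide">
  <!-- each topicref points the build at one topic file -->
  <topicref href="intro.dita"/>
  <topicref href="install.dita">
    <!-- nesting topicrefs creates the topic hierarchy in the output -->
    <topicref href="install-windows.dita"/>
  </topicref>
</map>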
That's fine when we want all the content in all the referenced topics
to be included in the output, but what if we don't want all of the content?
That's where conditional processing comes in, the goal being to intelligently
control which topics or parts thereof end up in the output. This control is
achieved using
metadata.
Metadata, often called
"data about data", is a characteristic or trait that helps
identify, clarify, or classify an informational element. For example, an HTML
paragraph tag might read
<p class="dropcap">.
Here, the
<p> tag is the data and the
class="dropcap" attribute/value pair is the metadata;
it classifies the type of paragraph (a CSS class in this case) so it can be
processed correctly. Or, in an XML document, a tag might read
<cost currency="aud">. Again, the
<cost> tag is the data, and the
currency="aud" attribute/value pair
is the metadata; it specifies that the cost element
should be taken as Australian dollars. Metadata is often coded as attributes,
as in these examples, but not always. More about that later.
Metadata has various uses, such as workflow support, search
assistance, and index preparation, but it's really good at one thing -
conditional processing. The primary function of conditional processing is
omitting undesired content, or "filtering." DITA provides four standard
attributes to control filtering:
audience,
product,
platform, and
rev. It also provides a fifth attribute you can use to specify
other properties, reasonably (if uncreatively) called
otherprops. Using these attributes, you can classify everything
from individual elements to entire topic groups, applying appropriate metadata
to the objects to drive the filtering process.
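Just for a quick, hypothetical taste of the attributes we won't revisit
below (product works exactly the same way):
<ph platform="linux">Run the installer script as root.</ph>
<note otherprops="internal">Do not publish this note outside the company.</note>
Nothing magic here; they're just more attribute/value pairs for the build to
match against.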
The big benefit in terms of editing and maintenance is that mutually
exclusive content elements don't have to be stored separately; you can put them
all together in a single topic or map, and leave out the pieces you don't need
at build time. This technique prepares the content so it can be conditionally
processed, while simplifying maintenance by keeping logically related items
physically together in a single source location. It's a great way to cram a lot
of stuff into a small space - sort of like the Kardashian sisters.
Put 'Er There
There are three standard places you can put metadata: on individual
elements, within topics, and on map references.
Element metadata is used at the tag level to apply properties by
which the elements can be identified and filtered during the build. Let's say
we want to customize the first step in a task by user experience level. We
could use the
audience attribute to attach the appropriate metadata
to three versions of the same task, like this:
<step audience="novice">Plug in your PC.</step>
<step audience="intermediate">Turn on your PC.</step>
<step audience="advanced">Boot up your PC.</step>
Using this markup, we can easily produce a task topic with steps
tailored to the specific audience we're trying to reach, regardless of PC
expertise. (An additional version,
<step audience="doofus">Box up your PC and take it back
to the store.</step> may be included if necessary.)
Prolog metadata is used in topics to specify characteristics
with which the topic can be filtered. If we wanted to produce a review document
containing all topics written by a given content provider, we could use the
<author> prolog element to identify the topic's
authors, like this:
<prolog>
<author>Otto Palindrome</author>
<author>Anna Graham</author>
</prolog>
The topic can now be identified by author and filtered appropriately
during the build. Note that this metadata is coded as tags, not attributes as
shown in the element metadata example.
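For context, the prolog sits between a topic's title and its body, so a
stripped-down (and entirely hypothetical) task carrying that author metadata
would look something like this:
<task id="setup-pc">
  <title>Setting Up Your PC</title>
  <prolog>
    <author>Otto Palindrome</author>
    <author>Anna Graham</author>
  </prolog>
  <taskbody>
    <!-- steps go here -->
  </taskbody>
</task>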
Map metadata is used at the top of the metadata food chain to
apply filtering characteristics to whole topics or topic groups within maps. We
could, for example, construct a single map that allows us to produce a user
guide for any of several product releases by adding rev metadata attributes to
the topic references, like this:
<map title="User Guide" id="userguide">
<topicref href="inst-demo.dita" rev="demo"/>
<topicref href="inst-std.dita" rev="1.x"/>
<topicref href="inst-upd.dita" rev="2.x"/>
...
</map>
We're now able to select the correct installation topic (or a set of
correct topics, regardless of number or hierarchical placement) for any current
product release, from the demo version to 1.x to 2.x, without creating - and
maintaining - separate map files.
Testing, 1 2 3...
Here's a great joke: "What do you call a musician with no
girlfriend?"
[crickets chirping] Wait, that's not funny, you say, and you're
right. But
why is it not funny? Because it's just a setup with no
punchline. In comedy, tech pubs, and most other worthwhile human endeavors,
preparation is useless unless you deliver the kicker - and that's the problem
with our examples so far.
Identifying unique elements, topics, and maps and applying metadata
to differentiate them is only half the job. Metadata itself doesn't
do anything; it just sits there patiently waiting until it's
needed. To make it useful, we have to tell the build process what to do with
it; that is, we have to define the filtering conditions for the build.
The
ditaval file is the mechanism we use for that purpose. Like the
map file and the XSL transforms, the ditaval file is read by the build and used
to drive the filtering process as the output stream is created. The ditaval
file essentially contains two things: conditions to be matched and actions to
be taken when they're found.
Ditaval conditions are defined with the
<prop> (property) element, which has three
attributes:
att, the metadata attribute to search for;
val, the metadata attribute value to match; and
action, the action to be taken when the metadata
attribute value is matched. Think of it rather like a CSS rule: look for
elements that contain the metadata attribute
att; if you find one, see if its value is equal to
val; if so, perform the specified
action. You can include as many
<prop> elements as you like, in any order; much
like CSS and XSLT, it's a wonderful demonstration of declarative processing at
work. Let's look at some examples.
Earlier, we added the
audience attribute as element metadata to some task
steps (and presumably to other elements, topics, and topic references in our
content repository). Now, if we want to produce a user guide for novices, we
might code conditions in the ditaval file like this:
<val>
<prop att="audience" val="intermediate" action="exclude" />
<prop att="audience" val="advanced" action="exclude" />
...
</val>
These conditions allow the novice audience elements through while
filtering out the intermediate and advanced audience elements.
Next, we added the
<author> element as prolog metadata to some
topics, naming two contributing authors. If we want to produce a review
document containing only those topics written by a single author, we can do it
by excluding the other with a ditaval condition, like this:
<prop att="author" val="Anna Graham" action="exclude" />
This will filter out Anna's topics and leave us with a complete
listing of topics written or contributed to by her colleague Otto.
Finally, we added the rev attribute to some topic references in a
ditamap, identifying installation topics for demo, 1.x, and 2.x software
versions. When we're ready to produce an installation guide for the 2.x
version, we can code ditaval conditions to exclude the others, like this:
<prop att="rev" val="demo" action="exclude" />
<prop att="rev" val="1.x" action="exclude" />
The result will be our desired document, an installation guide for
the 2.x product only, with the demo and 1.x topics filtered out. Thus, the
ditaval file's
<prop> element becomes the killer punchline for
the clever metadata setup.
Which reminds me: "Homeless."
A Hippo in the Ointment
Now if you're ahead of me on this, and you probably are, you'll note
that these examples seem to approach the document assembly process somewhat,
well, backward. We don't include the elements we want, we exclude the ones we
don't want. Odd as it seems, that's exactly how the "exclude" action works.
Gosh, wouldn't it be nice if there were also an "include" action? Well, there
is... sort of.
The pre-V1.4 DITA Open Toolkit only offered the "exclude" action (and
"flag", which is beyond the scope of this article); however, as of OT V1.4, a
new "include" action became available. But - if I may use a phrase with which
I'm painfully familiar - "it isn't what it looks like!" For example, you'd
think that the single
<prop> tag below is equivalent to the two
immediately above, including just 2.x content and excluding demo and 1.x
content:
<prop att="rev" val="2.x" action="include" />
But you'd be wrong. Yes, given that tag, the 2.x topics will be
included, but so will the demo and 1.x topics. That's because the default
action for all elements, marked or unmarked, is "include." Let's say that
again, because it's hugely important: the default action for all elements is
always "include." Since that's the case, you might be wondering if you could at
least add that third
<prop> tag to the first two, just to make your
intentions clear. Yes, but that's just like calling in your vote for American
Idol - you can do it, but it won't make any difference.
The reason "include" doesn't work quite as intuitively as we'd like
is because its primary use is for elements with multiple metadata values in the
same attribute. The filtering logic for multiple values can get sticky pretty
fast, so let's leave that for another article. Bottom line, "include" doesn't
really do us any good in ordinary, everyday filtering, but that's really not a
bad thing; read on.
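So the term doesn't just hang there: a multi-value attribute is simply
several space-separated values in one attribute, as in this hypothetical step:
<step audience="novice intermediate">Turn on your PC.</step>
It's that case - where a single prop can match some of an element's values but
not others - that "include" is really built for.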
For now, we can safely say there is just one absolute, immutable rule
for ditaval conditions. This rule is true regardless of your DITA OT version,
authoring tool, or processing environment. It's true for all maps, topic
references, full topics, and individual elements, whether marked with metadata
or not. It's true all the time, for all builds, in all cases. The rule is this:
Everything not explicitly excluded is included.
At first blush this rule seems restrictive, but in practical terms it
greatly simplifies the process of marking up content for conditional
processing. We can now approach our content with a simple plan: add metadata to
anything we might want to exclude later and leave everything else alone!
Because the vast majority of content in any documentation set is included in
most output formats (if not, you're doing it wrong), it's obviously easier to
mark up some content you want to exclude under certain circumstances than to
mark up all the content you want to include under most circumstances. Sweet.
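To see the rule in action, imagine a hypothetical topic fragment like this:
<p>Insert the battery.</p>
<p product="lite">The Lite edition ships without a charger.</p>
<p product="pro">Connect the fast charger.</p>
and a ditaval containing only this condition:
<prop att="product" val="lite" action="exclude" />
The unmarked paragraph and the "pro" paragraph both survive into the output -
neither was explicitly excluded - while the "lite" paragraph is filtered out.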
Loose Ends
But as you might guess, that's not quite everything. You can almost
hear that fellow with the glass eye, cigar butt, and rumpled trenchcoat say,
"There's just one more thing." (That's an old-guy joke; if you don't get it,
you're too young to remember most of the stuff we old guys think is funny. Now
get off our lawn.)
We know that we add metadata to DITA elements and that we add
<prop> conditions to a ditaval file so the build
can properly filter the elements. But there's our missing connection: how does
the build process know where our ditaval conditions are? The answer is simple,
if inelegant. We tell it where they are.
A build file contains a number of
<property> tags (not to be confused with
<prop> tags in the ditaval file) that provide
the build process with required information, such as the input file location,
the output file location, the output transformation
type, and so on. To specify the location of the
ditaval file containing the filtering conditions, we just add one more
<property> tag to the build file, like this:
<property name="dita.input.valfile"
value="${basedir}$
{fileseparator}myprojects
{fileseparator}UserGuide
{fileseparator}userguide.ditaval"/>
This tag tells the build, via the "dita.input.valfile" property, that
the ditaval conditions file is named "userguide.ditaval" and should be found in
the "myprojects\UserGuide" folder under the DITA base directory, "C:\DITAOT\" for
example. The build can now load the filtering conditions from the ditaval file
and apply them to the metadata attached to the various project elements.
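For the curious, a stripped-down Ant build file for an older (1.x) Toolkit
might look roughly like the sketch below. Everything except the
dita.input.valfile property - the paths, the project name, and the companion
properties args.input, output.dir, and transtype - is shown as a typical
example rather than a definitive recipe, so check the parameter reference for
your own OT version:
<project name="userguide" default="build" basedir=".">
  <!-- assume the Open Toolkit lives in C:\DITAOT -->
  <property name="dita.dir" location="C:${file.separator}DITAOT"/>
  <target name="build">
    <!-- hand the map, output settings, and ditaval location to the OT's own build.xml -->
    <ant antfile="${dita.dir}${file.separator}build.xml">
      <property name="args.input"
                value="${basedir}${file.separator}myprojects${file.separator}UserGuide${file.separator}userguide.ditamap"/>
      <property name="output.dir"
                value="${basedir}${file.separator}out${file.separator}UserGuide"/>
      <property name="transtype" value="xhtml"/>
      <property name="dita.input.valfile"
                value="${basedir}${file.separator}myprojects${file.separator}UserGuide${file.separator}userguide.ditaval"/>
    </ant>
  </target>
</project>
The only line that matters for this article is the dita.input.valfile
property; the rest is just the plumbing that hands everything to the Toolkit.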
Finally, although this article has included actual code snippets, I
realize that hand-coding is
so five minutes ago. Most good authoring tools now include
user-friendly interfaces to the nuts and bolts of metadata, build conditions,
and file locations, so that setting up and implementing conditional processing
is relatively easy.
Summary
DITA is a brilliant implementation of structured authoring,
incorporating single-sourcing, content sharing and reuse, and conditional
processing as core technological elements. Conditional processing is at the
heart of content specificity, and metadata is its control mechanism. Grasping
the relationship between metadata and filtering is one of the "aha!"
experiences we have along the road from linear narrative to structured
authoring, a little epiphany that suddenly propels us forward in our efforts to
get the most benefit from technology and makes our authoring jobs a bit easier,
a lot more productive, and yes, even fun.