totse.com | ICML Technical Addendum

by IC Metadata Sub-Working Group


ICML Technical Addendum

Prepared for the IC

Prepared by
IC Metadata Sub-Working Group

4 October 2001

Table of Contents
1 INTRODUCTION
2 BACKGROUND
3 COMPARISONS TO OTHER APPLICATIONS
3.1 SGML AND XML
3.2 HTML
3.3 INTELINK METADATA GUIDELINES
3.4 DOCBOOK
3.4.1 Easy to Understand
3.4.2 Customizing DocBook
3.4.3 Flexibility
3.4.4 Existing Infrastructure
3.4.5 ICML Applicability
3.4.5.1 Notation List
3.4.5.2 Character Sets
3.5 COMPARISON TO XHTML
4 ICML DESIGN
4.1 MODULARIZATION TECHNIQUES
4.2 EXPANSION/RESTRICTION
4.3 ELEMENT AND ATTRIBUTE NAMING CONVENTIONS
4.4 MARKED SECTIONS

1 Introduction
This Technical Addendum is provided to support the evaluators of ICML v0.5 during their evaluation. It covers some background material and documents some of the research and perspectives that affected key decisions made in the ICML development. This addendum, along with the release notes, will be expanded in the future to become formal ICML documentation.
2 Background
The endeavor to create an IC markup language has roots going back more than ten years. A loosely federated set of IC agencies and command projects was pursuing the use of the Standard Generalized Markup Language (SGML) to improve intelligence production and to achieve some level of business process automation, primarily in the publishing world. SGML, or ISO 8879, was introduced to the IC through two activities, and these served as the implementation drivers within the DoD for many years.
The first activity was the adoption of SGML by the Department of Defense (DoD) into the Continuous Acquisition and Life-cycle Support (CALS) initiative, which dealt with the logistics of passing information, mostly technical manuals, between the government and contractors working to build large military systems. The second was an Office of the Secretary of Defense (OSD) electronic publishing concept of operations that suggested the use of SGML as the basis for DoD-wide intelligence production compatibility. SGML and numerous related markup standards, such as the formatted output specification instance (FOSI) and MIL-SPEC 38784 which is an SGML application for interactive electronic technical manuals, owe their existence in large part to the DoD.
IC organizations like the Central Intelligence Agency (CIA), National Ground Intelligence Center (NGIC), and the Defense Mapping Agency (DMA), as well as the major publishing centers in the four military services pursued SGML implementations on a variety of projects. The Community Management Staff stood up an Electronic Publishing Resource center which hosted periodic SGML training and an SGML Registry for storing IC-developed SGML applications (DTDs). The resource center and the SGML applications are still available today on Intelink from the Intelink Management Office.
Over the years, numerous organizations, such as NGIC and the Joint Intelligence Center Pacific (JICPAC), continued the pursuit of SGML, but more organizations began using what remains the best-known SGML application even today: HTML. HTML and the World-Wide Web (WWW) changed the application of markup significantly. Markup was now built into the infrastructure of everyday desktop tools. However, HTML is a very specific tag set developed for a very specific purpose, which is the presentation of content in a web browser. This new technology clearly offered a new and improved way to get information to consumers, and Intelink became the primary means of disseminating intelligence content.
Many who had been using SGML recognized that its complexity and cost were inhibitors to successful implementation. Many who had been using HTML recognized that its simplicity was causing some major issues with content creation and that providing a richer set of metadata was nearly impossible. Despite these findings, HTML became a part of the intelligence production infrastructure and, to the credit of some very dedicated people within the IC, some guidelines for implementing HTML were created. These guidelines not only dealt with best practices, but also included metadata definitions from which an IC-wide search infrastructure was created and hosted on Intelink.
With the advent of the Extensible Markup Language (XML) in 1996, the World-Wide Web Consortium (W3C) was bridging the gap between full-blown SGML and HTML. The goal was to make a standard that was extensible, meaning that like SGML it could be used to define new markup languages, while also being easier to implement and focused much more on delivering content to the web. The W3C stopped the development of HTML with the 4.0 version and has switched its entire effort to the development of XML, related standards, and related applications of XML, such as XHTML (the newest XML-compliant version of HTML).
The introduction of XML has brought the IC to a strategic decision point about what standards and technologies to implement. It was felt that the next step in the process was the development of a standard, something that past efforts never had. To this end, the IC Metadata Sub-Working Group (MSWG) has developed ICML. ICML has superseded the JIVA KOM/GOM standards and is being developed and sponsored by the greater IC for the benefit of the greater IC.
3 Comparisons to Other Applications
3.1 SGML and XML
The W3C Extensible Markup Language (XML) Recommendation is a simplified form of its predecessor ISO 8879: Standard Generalized Markup Language (SGML). Both SGML and XML “standards” define the syntax and rules for developing your own markup language. These standards define the three components that make up an SGML or XML application: the declaration (used only for SGML and defaulted for XML), the document type definition (DTD), and the instance (the tagged document).
· The declaration has numerous settings that control the interpretation of the DTD and instance, such as: how many tags and attributes there can be, how many characters can make up a tag or attribute name, what special characters are used for what purposes, etc.
· The SGML or XML DTD defines the tags (elements) and attributes that can be used in an instance. The element names, the attribute names and values assigned to each element, and the relationships between elements are all declared in the DTD. When a DTD is parsed, it is checked against the DTD syntax rules defined in ISO 8879 or in the XML Recommendation. In the case of SGML, the declaration is also used to provide further configuration instructions, whereas the XML declaration is typically defaulted.
· The SGML or XML instance is the document itself containing both content and markup. The instance is validated against the DTD to ensure that the markup used within the document conforms to the rules for element and attribute usage. The instance parser also ensures that any markup that exists follows the syntax defined in ISO 8879 or the XML Recommendation as modified by the possible declaration.
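As a rough illustration of how the DTD and the instance relate, the sketch below uses the expat parser bundled with Python to read a small XML document whose DTD sits in the internal subset. The element names (report, title, para, note) are hypothetical and are not taken from ICML. Because XML fixes the settings that an SGML declaration would carry, only the DTD and the instance appear. expat is a non-validating parser, so the check is deliberately partial: it reports only those elements that the instance uses but the DTD never declares, and it does not check content models.

```python
import xml.parsers.expat

# Hypothetical document: a tiny DTD (internal subset) followed by the instance.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE report [
  <!ELEMENT report (title, para+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT para (#PCDATA)>
]>
<report>
  <title>Sample</title>
  <para>Body text.</para>
  <note>Not declared in the DTD.</note>
</report>
"""


def undeclared_elements(document):
    """Return element names used in the instance but not declared in the DTD."""
    declared = set()
    used = []

    parser = xml.parsers.expat.ParserCreate()
    # Called once per <!ELEMENT ...> declaration in the DTD.
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Called once per start tag in the instance.
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(document, True)

    return [name for name in used if name not in declared]


print(undeclared_elements(DOCUMENT))
```

A validating parser would go further and reject the instance outright, since the content model for report allows only a title followed by one or more para elements.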
Numerous SGML and XML applications have been built over the past 30 years for specific “vertical” industries (e.g., a group of users or organizations with a similar interest, such as the Air Transport Association or the DoD’s MILSPEC 38784 for Technical Manuals). These SGML and XML applications define the semantic tagging structure and controlled values for all types of metadata and markup within a content space. These standard applications promote interoperability within the industry or support a specific type of application, such as MathML or the OASIS Exchange Table Model.
While there are no industry standards for application naming, the most conventional approach is to add some descriptor to the front of “markup language”. An acronym is then created as the application reference form. There is no guidance for the length or format of the descriptor, but the common practice is to use the “ML” suffix at the end. There is also no distinction made in the application name that indicates whether the application is an SGML or XML application. The last consideration that goes into a name is marketing. Most of the applications use catchy, easy-to-remember acronyms that are easy to sell to the industry or to a user community.
ICML is an application of XML. IC stands for Intelligence Community. The obvious goal is to get content interoperability across the entire IC. There could be both SGML and XML versions of ICML, so the generalized recognition of ICML as simply a markup language (ML) is appropriate.
3.2 HTML
One of the most common SGML applications is the HyperText Markup Language (HTML). HTML was a simple, style-based application used solely for display of content within a web browser. Originally, HTML did not expose the declaration or the DTD. The styling information associated with the markup, along with the declaration and DTD, were embedded within the web browser. When the W3C was formed and HTML was brought under its control, the HTML DTD became part of the published standard. Today’s latest version is HTML 4.0. The version number is most commonly used to identify compliance level.
Different browsers support different versions of the HTML DTD and associated extensions added on by the browser vendor. An HTML file is often created in compliance with a particular web browser, a specific browser version, and browser-specific features. Browser-specific HTML extensions are one example of the problems that can arise if too many users of the application take liberties to customize the application for specific uses. The result can be a partial loss of interoperability.
HTML is an application of SGML. XHTML is an application of XML (the XML version of HTML). The Intelink Metadata Guidelines are an implementation of HTML. ICML is simply a new HTML-like standard for intelligence content using XML, except ICML is more content-based than the format-based HTML or its newer XML version, XHTML.
3.3 Intelink Metadata Guidelines
In 1994 the Intelligence Community identified a need to coordinate markup within the IC. Production centers utilized markup to improve processes and for creation and dissemination of their products. To support the automated indexing of content hosted on Intelink, it was determined that a metadata standard was needed. The method defined for content providers to communicate that metadata was to store that information within the HTML file using the META element's name/value pairs. Products are then registered with the Wer'zit search engine.

Another technique used on Intelink is to produce a metadata card. The card is a separate HTML file from the actual HTML document. The card contains the crawlable HTML metadata. Among that metadata is a pointer to the actual location of the HTML document. This technique is used to reduce the amount of crawling of actual documents and to simplify access control requirements for the web crawlers.

The Intelink Management Office defined a series of name/value pairs expected to be included within every available file on Intelink or defined with a corresponding metadata card. These META elements are crawled and indexed by the Intelink AltaVista search engine or the Wer'zit search engine. The Intelink name/value pairs are called the Intelink Metadata Guidelines. These guidelines are exactly that - guidelines.

The Intelink Metadata Guidelines use HTML’s facilities to provide guidance to the Community about how to capture Intelink metadata. The Guidelines do not define tags and attributes, but only provide guidance for using the META element. The loosely controlled META element within the HTML HEAD element provides a means to introduce a NAME attribute and a CONTENT attribute as shown below:
<meta name="IL.xxx" content="yyy">
e.g. <meta name="IL.title" content="(U) Arms Transfers and Technology">
The Intelink Metadata Guidelines cannot be thought of as a markup language. It is really a controlled vocabulary of sorts implemented within the HyperText Markup Language. The vocabulary is defined within the Intelink Metadata Guidelines document. The Intelink Metadata Guidelines take advantage of the only extensible part of HTML syntax – the META element. However, the contents of the name/value pairs can contain erroneous values and are not controlled within the markup language directly using real-time parsing techniques.
A true markup language implies some level of programmatic control and validation defined in the language itself. The HTML META element cannot be programmatically controlled or validated via the markup language during creation time or processing time without the development of extraneous programming. An SGML or XML application can build in the specification of required or optional named element or attribute constructs and requisite values. This level of specification and enhanced metadata structure is but one of ICML’s requirements.
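The kind of “extraneous programming” referred to above can be sketched as follows. This is only an illustration, assuming Python's standard html.parser module; the set of required names is hypothetical (only IL.title appears in this addendum, and the real list lives in the Intelink Metadata Guidelines document).

```python
from html.parser import HTMLParser

# Hypothetical required names; "IL.hypothetical" is invented for illustration.
REQUIRED_NAMES = {"IL.title", "IL.hypothetical"}

PAGE = """<html><head>
<meta name="IL.title" content="(U) Arms Transfers and Technology">
</head><body><p>Text</p></body></html>"""


class MetaCollector(HTMLParser):
    """Collects the IL.* name/content pairs found in META elements."""

    def __init__(self):
        super().__init__()
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name and name.startswith("IL."):
            self.pairs[name] = attributes.get("content") or ""


def missing_metadata(html_text, required=REQUIRED_NAMES):
    """Return the required names that are absent or have empty content."""
    collector = MetaCollector()
    collector.feed(html_text)
    return sorted(name for name in required if not collector.pairs.get(name))


print(missing_metadata(PAGE))
```

Nothing in HTML itself triggers this check; it runs only if someone writes and maintains a program like this one, which is the limitation the paragraph above describes.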
ICML is not the equivalent of the Intelink Metadata Guidelines. The Intelink Metadata Guidelines are implicit and the implementation is defined as guidance in a document. ICML, alternatively, was built from the rules defined in XML and has explicit definitions, not implicit suggestions, for capturing a more robust set of metadata.
The primary document or product metadata of ICML includes the Intelink Metadata Guidelines, but also includes a more robust model that is more significant and structurally useful. All of the Intelink Metadata Guidelines have been mapped into ICML’s document/product metadata block. The metadata block of ICML is an expanded, XML version of the Intelink Metadata Guidelines implemented in HTML. The difference is in the amount of information, the richness of the structure, and the explicit definition of required and optional metadata plus specified values, all of which can be parsed during creation to ensure valid completion.
3.4 DocBook
DocBook is a DTD for computer documentation. It is suitable for use in both books and papers, and for both computer software and hardware. The DTD was certified as an OASIS Standard on 2 February 2001 after a vote of the OASIS membership. The following is excerpted from the DocBook web site:
“Because it is a large and robust DTD, and because its main structures correspond to the general notion of what constitutes a "book"; DocBook has been adopted by a large and growing community of authors writing books of all kinds. DocBook is supported "out of the box" by a number of commercial tools, and there is rapidly expanding support for it in a number of free software environments. These features have combined to make DocBook a generally easy to understand, widely useful, and very popular DTD. Dozens of organizations are using DocBook for millions of pages of documentation, in various print and online formats, worldwide.”
DocBook v4.1.2, an XML version of the DTD, is found at http://www.oasis-open.org/DocBook/xml/4.1.2/, and v4.1, an SGML version of the DTD, is found at http://www.oasis-open.org/DocBook/sgml/4.1/. Documentation for the DTD is found at http://DocBook.org/tdg/.
3.4.1 Easy to Understand
DocBook is generally easy to understand. The mention that DocBook is gaining “rapidly expanding support … in a number of free software environments” is an important point. The DTD was and continues to be primarily focused on computer documentation and, within the free software environment, it is easily understood. However, to intelligence analysts who typically don’t know the terminology used in computer documentation or, more importantly, aren’t writing about computer documentation, the DTD can be a bit overwhelming. An intelligence analyst is likely to use about 10 percent of the DocBook DTD to create a product; wading through the other 90 percent is likely to reduce productivity.
3.4.2 Customizing DocBook
The DocBook Technical Committee has created two versions of the application: full DocBook and a simplified subset. The simplified version caters to those customers who are seeking a traditional document publishing model rather than a software documentation application. A summary follows from the DocBook web site:
[The "Simplified" DocBook XML DTD is a small subset of the DocBook XML DTD. Over the years, the notion of a "simple" DocBook subset has come up many times. It has been observed that the number of elements in DocBook can be a little overwhelming to the new user. This DTD is an attempt to make a small subset of DocBook. The goals of this subset are:
· Documents written in the subset must be 100% legal DocBook documents.
· This subset is for single documents (articles, white papers, etc.), so there's no need for books or sets, just 'articles'.
· The markup should be the smallest practical subset, if you need richly structured markup, use full DocBook. In particular, the subset often selects a single element from a family of related elements; for example, programlisting is provided, but screen is not.
· The DTD must work in online browsers (it's XML not SGML). It must be small enough to download more-or-less painlessly.]
The DocBook Technical Committee provides specific guidance for extending and restricting these models. Basically, the simplified version is not to be modified; instead the full version is to be used for any application customization. This means that a great deal of customization is typically required. The DocBook organization also provides guidance regarding referencing new models back to the original DocBook application. Once customizations are undertaken, the new application cannot be called DocBook anymore. If someone wants to find significant parallels between DocBook and ICML, then it's OK to do so, but understand that even if we made one change we couldn't call the new application DocBook. The following is an excerpt from DocBook: The Definitive Guide, Version 1.0.3, by Norman Walsh and Leonard Muellner:

Chapter 5. Customizing DocBook
For the applications you have in mind, DocBook "out of the box" may not be exactly what you need. Perhaps you need additional inline elements or perhaps you want to remove elements that you never want your authors to use. By design, DocBook makes this sort of customization easy.
This chapter explains how to make your own customization layer. You might do this in order to:
· Add new elements
· Remove elements
· Change the structure of existing elements
· Add new attributes
· Remove attributes
· Broaden the range of values allowed in an attribute
· Narrow the range of values in an attribute to a specific list or a fixed value
You can use customization layers to extend DocBook or subset it. Creating a DTD that is a strict subset of DocBook means that all of your instances are still completely valid DocBook instances, which may be important to your tools and stylesheets, and to other people with whom you share documents. An extension adds new structures, or changes the DTD in a way that is not compatible with DocBook. Extensions can be very useful, but might have a great impact on your environment.
Customization layers can be as small as restricting an attribute value or as large as adding an entirely different hierarchy on top of the inline elements.
5.1. Should You Do This?
Changing a DTD can have a wide-ranging impact on the tools and stylesheets that you use. It can have an impact on your authors and on your legacy documents. This is especially true if you make an extension. If you rely on your support staff to install and maintain your authoring and publishing tools, check with them before you invest a lot of time modifying the DTD. There may be additional issues that are outside your immediate control. Proceed with caution.
That said, DocBook is designed to be easy to modify. This chapter assumes that you are comfortable with SGML/XML DTD syntax, but the examples presented should be a good springboard to learning the syntax if it's not already familiar to you.
5.2. If You Change DocBook, It's Not DocBook Anymore!
If you make any changes to the structure of the DTD, it is imperative that you alter the public identifier that you use for the DTD and the modules you changed. The license agreement under which DocBook is distributed gives you complete freedom to change, modify, reuse, and generally hack the DTD in any way you want, except that you must not call your alterations "DocBook."
You should change both the owner identifier and the description.
If your DTD is a proper subset, you can advertise this status by using the Subset keyword in the description. If your DTD contains any markup model extensions, you can advertise this status by using the Extension keyword. If you'd rather not characterize your variant specifically as a subset or an extension, you can leave out this field entirely, or, if you prefer, use the Variant keyword.

By way of confirming the comments above, SAIC has evaluated the DocBook DTD numerous times for different customers. In every case, we have found that supporting a particular customer's application, especially one in the Intelligence Community (IC), requires that the DocBook model be modified significantly. Because the model is focused on software documentation and built as an interchange standard, it is too verbose and too flexible, so it must be restricted to provide more control and a smaller set of model constructs. To satisfy IC customers, the model must be expanded to support specific metadata requirements and security markup. These customizations invariably produce a highly modified version of DocBook that bears little resemblance to the original.
3.4.3 Flexibility
The DocBook model allows significant flexibility because it defines element and attribute models in a very repetitive manner and a majority of the tags are available at any time. This makes the model more complex than HTML in that it not only has the flexibility of HTML, but it also has ten times the available tags. Using the model within an authoring environment is very difficult and requires intimate knowledge of the meaning and purpose of each tag and what the formatted content might look like as a result of selecting certain tags.
3.4.4 Existing Infrastructure
One of the arguments for using DocBook is that a large supporting infrastructure is available within many COTS software products. Thus, it is a wonderful application for an organization to use to evaluate XML within their production processes. However, once the decision has been made to use XML and an authoring tool, it rarely makes sense to use DocBook "out-of-the-box". New stylesheets are commonly required to make the authoring environment easier to use, customizations to the model are typically required to reduce its confusing glut of selections, and output composers (stylesheets) must be modified or created to conform to the organization's look-and-feel. Although DocBook application infrastructures are available to help get started in this process, XML applications being developed for broad use across an enterprise commonly require a great deal of customization to satisfy the requirements of each subordinate organization. The amount of work that results usually overshadows the benefits inherent in the out-of-the-box application and frequently is more expensive in the long run than giving enterprise members an opportunity to drive a new model with their individual requirements, rather than feeling burdened to stay within DocBook's boundaries.
3.4.5 ICML Applicability
The IC MSWG XML Team developing ICML is selecting structure rules that would require significant modifications of the DocBook model in order to meet the writing style, security marking, and metadata standards. Therefore, the team dismissed the DocBook model for being too generalized and too difficult to mold into the desired form. Nevertheless, IC MSWG XML team members and the Intelink Management Office’s contractor (SAIC) have taken some architectural guidance from DocBook regarding DTD module construction where appropriate. There are a number of similarities between ICML and the DocBook application. For example, the Notation and Character Entity working modules were so similar that the XML team approved the wholesale adoption of selected DocBook DTD modules.
3.4.5.1 Notation List
The previous JIVA KOM/GOM models used the W3C’s notation list, which included a notation for every accepted MIME type. XML team members noted that the IC intended to reduce the number of MIME types (i.e., graphics will all be of type A or B) and that DocBook already had reduced the MIME list to a more reasonable set that presumably met a majority of customers' needs, since the lesser set was in the DocBook application. In the absence of an approved list or recommendations, XML team members adopted the DocBook list and used its file in ICML “as is” to simplify the maintenance process.
3.4.5.2 Character Sets
Because the DocBook application used the same ISO character sets that the JIVA KOM/GOM did, the MSWG XML Team adopted the DocBook character entity file as well. The DocBook character entity file includes the MathML character entities that exceed the standard publishing and special character entities. Hopefully, these files will remain synchronized with the DocBook application, because the same organization that hosts the committee for DocBook also facilitates the development of MathML. While the DocBook application in its entirety is quite verbose and overly flexible for the main content, borrowing DocBook components that make sense will make maintenance easier and more commercially viable.
3.5 Comparison to XHTML
XHTML is a W3C Recommendation intended to replace the current HTML 4.0 Recommendation with an XML application. Just as HTML is an SGML application and ICML is an XML application, XHTML is also an XML application. XHTML provides as a base set of tags the same tags available in HTML. However, the XHTML DTD requires that documents tagged with previously known HTML tags conform to XML document standards, most importantly well-formedness. Well-formedness is the requirement that all structures are identified with a start and end tag and are properly nested. Well-formedness is not currently a requirement of HTML, nor is it a requirement in existing web browsers unless they recognize the content as XML. If a browser does recognize the document as XML, it does require well-formedness and could possibly even require validation against a DTD.
XHTML, like DocBook and many other XML applications, does provide facilities for customizing the application (i.e., extending or restricting it). With this flexibility, XHTML could be drastically modified to support IC writing styles, metadata, and document model requirements. But once these levels of customization are required, one has to ask if retrofitting makes sense. As in the DocBook analysis, it was felt that the modifications that would be required to support the intent of ICML would be drastic and therefore retrofitting didn’t make sense. The inclusion of security requirements alone makes an XHTML retrofit more complicated than not.
The obvious thing that retrofitting XHTML would provide is a familiarity with existing HTML tags. Some who are familiar with HTML might appreciate this. However, if the models for those familiar tags have indeed changed, then they probably should not preserve the same tag names, to avoid confusion. For those not familiar with HTML tags, they can be a bit cryptic. The new ICML tag names are more descriptive and, if one had to interact with them, would presumably be easier to understand than P, OL, UL, LI, TD, B, H1, etc.
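The well-formedness requirement can be shown with a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the XML parser. The two list fragments are invented for illustration: the first omits end tags in the way HTML tolerates, and the second closes and nests every element.

```python
import xml.etree.ElementTree as ET

# HTML-style markup: the LI end tags are omitted, which HTML permits.
HTML_STYLE = "<ul><li>first<li>second</ul>"
# XHTML-style markup: every start tag has a matching, properly nested end tag.
XHTML_STYLE = "<ul><li>first</li><li>second</li></ul>"


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(HTML_STYLE))
print(is_well_formed(XHTML_STYLE))
```

The check says nothing about validity against a DTD; it covers only the start tag, end tag, and nesting rules described above.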
The next issue that needs to be addressed is compatibility. The intent behind XHTML is to provide some better validation of HTML, not to provide a platform for creating a completely different language. In the end, the goal of XHTML is to serve a more properly structured HTML file to the same web browsers currently in use and to have those same HTML tags recognized and displayed or processed accordingly. If XHTML is modified to support IC requirements, then the resulting file will have tags that are not HTML in nature, resulting in a need to provide some browser support in order to process them.
One of the MSWG XML team members also learned something from working with XHTML in some popular COTS products that support the Recommendation. Since XHTML is really an XML application and not an SGML application, and is clearly not to be processed the same way that HTML is processed, it requires its own file extension of “.xhtml”. Through MIME-type recognition, this is how tools know to process an XML file (.xml), an HTML file (.htm or .html), or an XHTML file (.xhtml). The point was made that this alone causes incompatibility with current tools and, since a new file extension is necessary, why not just make it XML and have your own application to support it.
4 ICML Design
4.1 Modularization Techniques
ICML is being developed as an XML application using a DTD. The DTD has been constructed using standard XML modularization techniques that allow for local extension or restriction and module reuse to the greatest extent possible. ICML uses a modularization technique similar to that used in DocBook. The following figure depicts the modularization of the baseline ICML:

Figure 1: ICML DTD Modules
4.2 Expansion/Restriction
As the MSWG XML Team debated the expansion or restriction of DocBook in the creation of ICML, the group also debated the ability for IC organizations to apply similar expansion or restriction of ICML for specific needs. It was determined that it made more sense to introduce a documented method for local overrides than let different organizations develop and possibly implement different XML applications that ultimately were not interoperable.
In ICML, extensions or restrictions must be done through organization-specific DTD overrides, a common DTD practice which exploits the parser’s “first-read, takes precedence” rule. This means that if DIA wants to override the model for a “title” element to not allow “emphasis”, then they just redeclare the title element in a special “dia-icml.dtd”. Any document that is parsed against the “dia-icml.dtd” will be validated against the override rule. Any part of the ICML model can be overridden in this fashion, including security model restrictions, metadata selections, or main content models.
It would be preferred that all overrides to ICML that are unique to an organization be captured in the one override DTD. The only time when this doesn’t make sense is when a new document or product DTD is being created which uses ICML components, but isn’t really an override to any of the provided document/product models in ICML. For example, if DIA wanted to create a MID product DTD, the new document DTD would be called “dia-mid-icml.dtd” and would contain the new constructs. This DTD would call the “dia-icml.dtd” to pick up any DIA-specific modifications, which ensures interoperability within DIA. The “dia-icml.dtd” would in turn call the “icml.dtd”.
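The precedence rule can be demonstrated with a minimal sketch, assuming Python's standard xml.etree.ElementTree module. In the XML Recommendation, the first-declaration-is-binding rule is stated for entity declarations and attribute definitions, which is why DTD customization layers are usually built on parameter entities. The sketch uses a general entity in an internal subset only because the standard library can expand it without fetching external DTD files; the entity name and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same entity is declared twice; the parser keeps the first declaration.
DOCUMENT = """<!DOCTYPE report [
  <!-- Hypothetical local override, read first. -->
  <!ENTITY owner "DIA override">
  <!-- Hypothetical baseline declaration, read second and ignored. -->
  <!ENTITY owner "ICML baseline">
]>
<report>&owner;</report>"""


def resolved_owner(document):
    """Return the text of the root element after entity expansion."""
    return ET.fromstring(document).text


print(resolved_owner(DOCUMENT))
```

An override DTD relies on the same ordering: its declarations are read before it pulls in the baseline, so the baseline's later declarations of the same names have no effect.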
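The calling chain described above can be modeled as an ordered list of layers in which the first layer to declare a name wins. The following is a simulation in plain Python, not a DTD parser; the file names come from the text, while the declaration names (mid.body, title.content, para.content) and their content models are hypothetical.

```python
# Layers in the order a parser would read them: most specific first.
# Declaration names and content models are invented for illustration.
LAYERS = [
    ("dia-mid-icml.dtd", {"mid.body": "(section+)"}),
    ("dia-icml.dtd", {"title.content": "(#PCDATA)"}),
    ("icml.dtd", {
        "title.content": "(#PCDATA | emphasis)*",
        "para.content": "(#PCDATA | emphasis)*",
    }),
]


def resolve(layers):
    """Map each declaration name to (model, source file); first reader wins."""
    effective = {}
    for filename, declarations in layers:
        for name, model in declarations.items():
            # setdefault keeps an earlier layer's value and ignores later ones.
            effective.setdefault(name, (model, filename))
    return effective


for name, (model, source) in sorted(resolve(LAYERS).items()):
    print(f"{name}: {model} (from {source})")
```

In this model the DIA restriction on title content takes effect for the MID product as well, because “dia-icml.dtd” is read before “icml.dtd”, while anything DIA leaves alone falls through to the baseline.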
The following figure depicts the modularization extension to the baseline ICML if needed:

Figure 2: ICML Local Module Override
4.3 Element and Attribute Naming Conventions
When naming elements and attributes there are many issues to consider, including case sensitivity, length, merging of multiple-word names, and contextual independence. Each of these should be decided up front in order to instill some level of consistency across the DTD, the Schema, and any customization done by users. Each is left to the preference of the DTD or Schema developer, and there is little to no guidance on how it should be handled.
Length: The typical guidance for length is to make the name useful and descriptive, but not verbose. Every character in the name (which may occur multiple times in a document) takes up one more character of space in the file, meaning that extremely verbose names will create a slightly larger XML file. This was probably the biggest driver in the use of one- or two-character names in HTML. HTML was created at a time when bandwidth was very limited, so a streamlined file was better. Bandwidth is usually not a problem any more, but the name should still be reasonable in length and meaningful to those interacting with the XML.
Merging: Several merging techniques are in use. Merging is needed when the real name of the element or attribute is made up of multiple words. An example might be naming an individual line in a mailing address. A merged name could be “addressline”, “address.line”, or “address-line”. Another example might be naming the date until which the content is valid. A merged name could be “validtil”, “valid.until”, or “valid-til-date”. The only restriction here is that certain characters cannot be used in an XML element or attribute name.
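All three merging styles named above produce legal XML names. The following declarations are an illustrative sketch only and are not taken from the ICML DTD:

<!-- Words run together. -->
<!ELEMENT addressline (#PCDATA)>
<!-- Period as separator. -->
<!ELEMENT address.line (#PCDATA)>
<!-- Hyphen as separator. -->
<!ELEMENT address-line (#PCDATA)>

A name may not contain a space and may not begin with a digit, a period, or a hyphen, so a name such as “address line” would be rejected by a parser.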
Context: Context independence must also be handled. In the example of the valid until date, there are multiple ways to handle the context issues. In ICML there are many different dates in the metadata block. We could have created elements for each date called <date.x>, <date.y>, and <date.z> or we could have created a <date> element which contains elements called <x>, <y>, and <z>. The <x>, <y>, and <z> method makes these tags reusable in other contexts, allows a reduction in the number of tags (if indeed they need to be used in other contexts), and allows for shorter names.
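The two approaches can be sketched as DTD declarations. The names x, y, and z are the placeholders used in the text, and the content models are assumptions chosen only to make the contrast visible:

<!-- Approach 1: one context-bound element per date. -->
<!ELEMENT date.x (#PCDATA)>
<!ELEMENT date.y (#PCDATA)>
<!ELEMENT date.z (#PCDATA)>

<!-- Approach 2: a date container with short, reusable children. -->
<!ELEMENT date (x?, y?, z?)>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
<!ELEMENT z (#PCDATA)>

In the second approach the meaning of x depends on its parent, so the same x element could be referenced from another container without declaring a new name.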
Case Sensitivity: XML is case sensitive, meaning that all element and attribute names and values must be specifically defined in the DTD and used in the instance in the proper case. Over the last 30 years, there has been no clear standard or guidance regarding this issue other than a switch from SGML not being case sensitive (unless you wanted it to be) to XML having to be case sensitive (with no options for changing it). Many of the best XML developers and trainers suggest that this issue is not necessarily a “religious” one, but is one that is very subjective and is one of complete personal preference on behalf of the DTD or Schema developer. When looking at an XML document in its raw form, the case of the tags is sometimes helpful in visually separating the tags from the text. When the name is a single word name, such as “date”, the naming options include the obvious: “date”, “DATE”, or “Date”. When the name is a multiple word name, such as “addressline”, options might also include: “ADDRESSLINE”, or “AddressLine”.
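A short illustration of the point, using the “date” name from the text (the declaration and the sample values are illustrative and are not taken from ICML):

<!-- DTD: only the lower-case name is declared. -->
<!ELEMENT date (#PCDATA)>

<!-- In a document instance:
     <date>4 October 2001</date>   valid
     <Date>4 October 2001</Date>   invalid, Date is not declared
     <date>4 October 2001</Date>   not well formed, tags do not match
-->
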
After evaluation of different options by the MSWG XML team, it was decided that ICML element and attribute names should be more descriptive than HTML, but should keep names as short as reasonably possible. Abbreviations would be used where there was common acceptance of the abbreviation. Abbreviations would not be used to simply save one or two characters from the name if it jeopardized the interpretation of the tag.
ICML names are all lower case, with the exception of those modules of ICML that were taken from other applications (e.g., the OASIS table model or the IC Security DTD). Where merging is required to clarify the tag’s meaning, the names are merged without punctuation or capitalization. Context independence was preserved where possible, as in the example of “dates”.
4.4 Marked Sections
Marked sections are a way of including or ignoring pieces of XML DTDs or content within a document. They are quite useful in making the XML application more flexible for use with different environments without having to have different DTDs or completely different models. Marked sections have been used at key points in the ICML XML application.
The first place these can be seen is in the “icml.dtd”. In this file, marked sections are used to enclose the callouts (references) to the other DTD modules. These sections look like the following:

<!ENTITY % dbnotn.module "INCLUDE">
<![%dbnotn.module;[
<!ENTITY % dbnotn PUBLIC
"-//OASIS//ENTITIES DocBook XML Notations V4.1.2//EN"
"dbnotnx.mod">
%dbnotn;
]]>

<!ENTITY % dbcent.module "INCLUDE">
<![%dbcent.module;[
<!ENTITY euro "€">
<!ENTITY % dbcent PUBLIC
"-//OASIS//ENTITIES DocBook XML Character Entities V4.1.2//EN"
"dbcentx.mod">
%dbcent;
]]>

<!ENTITY % icmlpool.module "INCLUDE">
<![%icmlpool.module;[
<!ENTITY % icmlpool PUBLIC
"-//USA-IC//ELEMENTS for XML Information Pool Module V0.5//EN"
"icmlpoolx.mod">
%icmlpool;
]]>

A marked section is controlled by a parameter ENTITY whose value is the keyword INCLUDE (the section is read) or IGNORE (the section is skipped). The section itself contains a DTD fragment. In the cases above, the fragments are ENTITY declarations that call in other DTD modules. If any of these is set to IGNORE, that section of the DTD will not be present. Removing the information pool elements would make the DTD itself invalid, because other element declarations in the DTD depend on declarations in the information pool; if they are not found, an XML parser will report fatal errors. Removing the notations or character entities might not be noticeable when the DTD is parsed. However, it would significantly affect the validity of any document that uses, for example, character entities that are then no longer defined in the DTD, and it would likely affect an authoring tool’s capabilities as well.
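Because the first declaration of a parameter entity is the one a parser uses, a module switch can in principle be changed from a local file that is read before “icml.dtd”, without editing “icml.dtd” itself. The sketch below assumes a hypothetical local driver file. It is shown only to illustrate the mechanism, and turning off the notation module carries the risks described above.

<!-- Hypothetical local driver file, read before icml.dtd. -->

<!-- Declared first, so the INCLUDE default in icml.dtd is not used. -->
<!ENTITY % dbnotn.module "IGNORE">

<!ENTITY % icml SYSTEM "icml.dtd">
%icml;
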
Another use of marked sections can be seen in the “infopoolx.mod” file. In this file the following marked section code is used to reference the IC XML Security Module. Due to security restrictions there is one tag in the security module that makes the model FOUO. With the tag spelled out, this file cannot be transferred via electronic means on the public Internet. So, an unclassified version was created that renamed that tag to a placeholder name.

<!ENTITY % icmlsecurity.fouo.module "IGNORE">
<![%icmlsecurity.fouo.module;[
<!ENTITY % ic-security-fouo-dtd
PUBLIC "-//USA-IC//DTD for Security FOUO, Ver.1.3, 20011003 XML//EN"
"ic-security-fouo-v13.dtd">
%ic-security-fouo-dtd;
]]>

<!ENTITY % icmlsecurity.unclass.module "INCLUDE">
<![%icmlsecurity.unclass.module;[
<!ENTITY % ic-security-unclass-dtd
PUBLIC "-//USA-IC//DTD for Security Unclass, Ver.1.3, 20011003 XML//EN"
"ic-security-unclass-v13.dtd">
%ic-security-unclass-dtd;
]]>

Using marked sections, we can reference both the FOUO and UNCLASS versions of the file and turn them on or off depending on what file you want to use. The ICML DTD is shipped with the UNCLASS file included and the marked sections set to IGNORE the FOUO version and INCLUDE the UNCLASS version. Once the FOUO version is downloaded from Intelink and installed in the ICML directory, the INCLUDE and IGNORE settings in the “infopoolx.mod” file can be edited and saved. When the DTD is parsed and used in an XML tool, the full security model will be followed.
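After the FOUO file has been installed, the two switch declarations shown earlier would read as follows. The rest of each marked section is unchanged.

<!-- FOUO security module now read. -->
<!ENTITY % icmlsecurity.fouo.module "INCLUDE">

<!-- Unclassified placeholder module now skipped. -->
<!ENTITY % icmlsecurity.unclass.module "IGNORE">
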
This method may be used more extensively in future releases to support referencing of different versions of DTD fragments. For example, there could be different levels of strictness applied to the metadata block at different points in a document’s life-cycle. This would allow the DTD used in an authoring environment to relax certain metadata requirements to enable the document to be produced without errors. This would be helpful in a scenario where the metadata is to be entered and validated by someone downstream and not the author. However, once the file is completed a different metadata block with more strictness could be used during the final validation step to ensure that all required metadata has been properly applied.
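One way such a life-cycle switch might be written is sketched below. Every name in it (the two switch entities, the “metadata” element, and its children) is hypothetical and is not part of ICML v0.5.

<!-- Authoring default: relaxed model on, strict model off. -->
<!ENTITY % metadata.strict.module "IGNORE">
<!ENTITY % metadata.relaxed.module "INCLUDE">

<![%metadata.strict.module;[
<!-- Final validation: every metadata item is required. -->
<!ELEMENT metadata (title, date, security)>
]]>

<![%metadata.relaxed.module;[
<!-- Authoring: every metadata item is optional. -->
<!ELEMENT metadata (title?, date?, security?)>
]]>

For the final validation step the two switch values would be swapped, so that exactly one declaration of the element is ever read.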