Technical Description

Design Considerations

XML

XML is quickly becoming the predominant format for encoding data, for several reasons: it stores its information in plain text files; well-designed XML data is human-readable and self-documenting; and many applications, tools, and APIs exist for manipulating it. These are all good reasons for implementing InfoML as an XML schema.

Java

Developers creating InfoML-based applications can, of course, do so in whatever computer language they wish. However, code written in Java simultaneously supports the Windows, Mac OS, and Linux platforms with one development effort. In addition, the Java platform hosts a wealth of XML-related APIs, which will increase InfoML programmers' productivity.

Simplicity

Though InfoML was designed to be versatile, it won't gain widespread acceptance if it's too hard to use. In general, people resent having to add information they aren't interested in. For this reason, the standard infocard formats require very little: the content being captured (called the body) and one or more keywords. (Every infocard must also contain a unique ID, but software should generate that automatically.) You can store much more information in an infocard, but most of it is optional.
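
As a rough illustration of how little a standard infocard asks of its author, the following Java sketch models a card that needs only a body and at least one keyword, with the ID supplied by software. The class name, field names, and the use of a random UUID as the generated ID are assumptions made for this example; they are not taken from the InfoML schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical in-memory model of a minimal standard infocard.
// Names are illustrative only and do not come from the InfoML schema.
final class MinimalInfocard {
    private final String id;
    private final String body;
    private final List<String> keywords;

    MinimalInfocard(String body, List<String> keywords) {
        if (body == null || body.trim().isEmpty()) {
            throw new IllegalArgumentException("an infocard needs a body");
        }
        if (keywords == null || keywords.isEmpty()) {
            throw new IllegalArgumentException("an infocard needs at least one keyword");
        }
        // The software, not the user, supplies the ID.
        // A random UUID is only a stand-in for a real InfoML ID string.
        this.id = UUID.randomUUID().toString();
        this.body = body;
        // Defensive copy so later changes to the caller's list do not affect the card.
        this.keywords = Collections.unmodifiableList(new ArrayList<>(keywords));
    }

    String getId() { return id; }
    String getBody() { return body; }
    List<String> getKeywords() { return keywords; }
}
```

A caller provides only the two pieces of information the text describes, for example `new MinimalInfocard("Water boils at 100 degrees Celsius at sea level.", Arrays.asList("water", "boiling point"))`.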

Combinability

InfoML would be interesting even if it only provided a standard format for storing and retrieving "units" of information. However, InfoML is far more interesting and powerful because of its mechanism for combining and commenting on existing infocards.

Since a given infocard can be pointed to by any number of other infocards, information can "exist" in several places simultaneously; when the original information is changed, all the infocards that point to it immediately have access to the new information. This is a powerful feature that makes any information stored using InfoML more useful and more valuable.
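
One way to picture this behavior is a store that resolves pointers only when a card is read, so every pointing card sees the current version of the original. The Java sketch below is a simplified assumption about how an application might do this; `CardStore` and its methods are hypothetical names, and real InfoML cards carry far more than a body string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store: content cards hold text, pointer cards hold only the IDs
// of the cards they refer to. Nothing is copied into the pointer card.
final class CardStore {
    private final Map<String, String> bodiesById = new HashMap<>();
    private final Map<String, List<String>> targetsByPointerId = new HashMap<>();

    // Adds a content card, or replaces the body of an existing one.
    void putContent(String id, String body) {
        bodiesById.put(id, body);
    }

    // Adds a card that points to other cards by ID.
    void putPointerCard(String id, List<String> targetIds) {
        targetsByPointerId.put(id, new ArrayList<>(targetIds));
    }

    // Looks up the targets at read time, so the result always reflects
    // the current body of each card that is pointed to.
    List<String> resolve(String pointerCardId) {
        List<String> result = new ArrayList<>();
        List<String> targets =
                targetsByPointerId.getOrDefault(pointerCardId, Collections.<String>emptyList());
        for (String targetId : targets) {
            String body = bodiesById.get(targetId);
            if (body != null) {
                result.add(body);
            }
        }
        return result;
    }
}
```

If two pointer cards refer to the same content card and that card's body is later replaced with `putContent`, both calls to `resolve` return the new text without any further work.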

Atomicity

The content of an InfoML content card is completely self-contained. This design choice makes each card somewhat larger than would otherwise be necessary (for example, many cards will share the same author and book name). However, this design choice also makes the InfoML design more robust and versatile. People will be able to publish collections of infocards selectively, a process that would be more complicated if individual infocards were to depend on shared data stored outside the card itself.

Global uniqueness

The InfoML design requires each infocard to have an identifier that is unique among all the infocards created anywhere in the world. To achieve this, the InfoML Specification requires that each ID string contain a user-provided signature string that is guaranteed to be globally unique. Users can base such a signature string on an e-mail address or domain name that they have authority over.
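
The idea can be sketched in Java as a small factory that combines the user's signature string with a locally unique number. The ID layout shown here (signature, underscore, counter) is purely an assumption for illustration; the actual ID format is defined by the InfoML Specification, and `InfocardIdFactory` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical ID factory. Global uniqueness comes from the signature string;
// the counter only has to be unique among cards created under that signature.
final class InfocardIdFactory {
    private final String signature;
    // In-memory counter for the sketch. A real application would have to persist
    // it (or use another local scheme) so that IDs are not reused after a restart.
    private final AtomicLong counter = new AtomicLong();

    InfocardIdFactory(String signature) {
        if (signature == null || signature.trim().isEmpty()) {
            throw new IllegalArgumentException("a signature string is required");
        }
        this.signature = signature.trim();
    }

    // Assumed layout: <signature>_<sequence number>.
    String nextId() {
        return signature + "_" + counter.incrementAndGet();
    }
}
```

With the signature `example.org`, successive calls to `nextId()` yield `example.org_1`, `example.org_2`, and so on. Two users with different signatures can never produce the same ID, whatever numbers they use locally.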

Appropriateness of classification

Not all pieces of information are created equal: it can be useful to distinguish an opinion from a fact, or an idea from either of the first two. For this reason, all infocards must be classified as one of six types: fact, opinion, definition, idea, narrative, or general. Users who do not wish to make this distinction can simply classify a given infocard as being of type "idea" or "general."

In addition, the body of an infocard can optionally be classified as either "exact" or not. This enables users who wish to do so to distinguish between a quotation (which is exact) and a paraphrase, synopsis, or overview (which are not). This optional classification adds to the expressiveness of InfoML.
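
In Java, these two classifications map naturally onto enumerations. The sketch below is one possible representation, not part of InfoML itself: the six type names come from the text above, while the `Exactness` enum (including its `UNSPECIFIED` value for cards that omit the optional classification), the `ClassifiedBody` class, and all method names are assumptions made for this example.

```java
import java.util.Locale;
import java.util.Objects;

// The six infocard types named in the text.
enum CardType {
    FACT, OPINION, DEFINITION, IDEA, NARRATIVE, GENERAL;

    // Parses a type name case-insensitively.
    // Throws IllegalArgumentException for anything outside the six types.
    static CardType parse(String text) {
        Objects.requireNonNull(text, "type name");
        return valueOf(text.trim().toUpperCase(Locale.ROOT));
    }
}

// Assumed three-state representation of the optional "exact" classification.
enum Exactness { EXACT, NOT_EXACT, UNSPECIFIED }

// Hypothetical pairing of a body with its classifications.
final class ClassifiedBody {
    final String text;
    final CardType type;
    final Exactness exactness;

    ClassifiedBody(String text, CardType type, Exactness exactness) {
        this.text = Objects.requireNonNull(text, "text");
        this.type = Objects.requireNonNull(type, "type");
        this.exactness = Objects.requireNonNull(exactness, "exactness");
    }

    // A body marked exact is a quotation; a paraphrase or synopsis is not.
    boolean isQuotation() {
        return exactness == Exactness.EXACT;
    }
}
```

Using an enumeration for the type means that a card with an unrecognized type is rejected when it is read, rather than silently accepted.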

Extensibility

No designer can anticipate all the uses to which a general-purpose design might someday be put. To increase the potential usefulness of InfoML, its design includes three levels of customization.

First, any standard infocard can be user-customized through the use of the "notes" and "special" fields standard on every infocard. People can use these fields to add information that is useful to them.

Second, developers can create enhanced infocards by attaching "add-on" fields to any standard infocard. These are fields that are permitted but not utilized by the standard infocard's format. All InfoML-friendly software should leave such fields attached to the parent infocard as it is being processed. If you imagine these add-on fields as "sticky notes" attached to a notecard, you can imagine the behavior just described as being the equivalent of leaving sticky notes on notecards even if you don't need what's written on the notes yourself.
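
The "leave the sticky notes attached" behavior falls out naturally when software edits a card through the standard Java DOM API and touches only the elements it understands. The following sketch shows this under simplifying assumptions: the element names `infocard`, `body`, and `rating` are hypothetical and are not taken from the InfoML schema, and `AddOnPreservingEditor` is an illustrative name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical editor that changes the one element it knows about ("body")
// and leaves every other element, including unknown add-on fields, in place.
final class AddOnPreservingEditor {
    static String replaceBody(String cardXml, String newBody) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(cardXml)));

            // Touch only the first "body" element; nothing else is modified.
            NodeList bodies = doc.getElementsByTagName("body");
            if (bodies.getLength() > 0) {
                bodies.item(0).setTextContent(newBody);
            }

            // Serialize the whole tree again, add-on fields included.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                 | IOException | TransformerException e) {
            throw new IllegalArgumentException("could not process infocard XML", e);
        }
    }
}
```

Given a card that contains an unfamiliar `rating` element next to its `body`, `replaceBody` returns XML in which the body text has changed and the `rating` element is still attached.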

Third, software developers will occasionally use the generalized structure of the InfoML specification (or, in technical terms, its schema) to create custom infocard formats. There's a penalty for doing this, because it is not guaranteed that an InfoML-friendly program will recognize infocards that use the custom format. In most cases, anyone who creates a custom infocard format must also create specialized software that knows how to recognize and manipulate infocards in that format. However, this job will be made easier by the availability of open-source libraries that support the creation and manipulation of both XML data and InfoML data.

Implementation Details

Sufficiently rich text

InfoML text elements do not contain styling information, that is, text attributes such as font name, size, and color. However, plain text alone is not enough to be of use in the real world. In most cases, InfoML text elements support any combination of the following attributes: emphasis (italics), strong (bold), code (fixed-width), and "pre" (preformatted). The text elements within an infocard typically express their content as one or more paragraphs.

Support for non-text information

Most text elements in an infocard can contain standard <a> anchor and hypertext-link tags, thus enabling infocards to point to any content that can be posted to the Internet. In this way, infocards can refer to graphics, sound, video, and other forms of data.
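
An application that wants to follow these references can collect the link targets from a text element with the standard Java DOM API. The sketch below assumes the text element arrives as a well-formed XML fragment and that links use the familiar `href` attribute; the wrapping `body` and `p` elements in the usage example and the `LinkExtractor` name are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper that lists the targets of all hypertext links
// found inside one text element.
final class LinkExtractor {
    static List<String> hrefs(String xmlFragment) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlFragment)));
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                // Named anchors without an href are not links, so skip them.
                if (anchor.hasAttribute("href")) {
                    result.add(anchor.getAttribute("href"));
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("text element is not well-formed XML", e);
        }
    }
}
```

For a fragment such as `<body><p>See <a href="http://example.org/photo.png">the photo</a>.</p></body>`, the method returns a list containing the single URL, which the application can then fetch or display as it sees fit.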

Keywords

Although any InfoML-based application can certainly be designed to search the text in the body of each card, the keyword mechanism provides a more robust method for tagging infocards with the most likely search terms. An InfoML keyword can be a word, phrase, or any arbitrary sequence of human-readable characters. An infocard can contain any number of keywords, although a standard infocard must contain at least one.

The InfoML schema does not require keywords to be present; however, the standard infocard format does. The keyword mechanism is sufficiently important to the design of InfoML that any InfoML-based application should be able to handle keywords, and the users of such applications should be encouraged to tag each infocard they create with at least one keyword. By adding keywords to a card, its creator enhances the content in the card's body, making that content more useful.
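
To show why keywords make retrieval straightforward, here is a minimal Java sketch of a keyword index that maps each keyword to the IDs of the cards tagged with it. Treating keywords as case-insensitive and trimming surrounding whitespace are assumptions of this example rather than rules stated by InfoML, and `KeywordIndex` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical index from keyword to the IDs of the infocards tagged with it.
final class KeywordIndex {
    private final Map<String, Set<String>> idsByKeyword = new HashMap<>();

    // Records that the given card carries the given keyword.
    void add(String cardId, String keyword) {
        idsByKeyword
                .computeIfAbsent(normalize(keyword), k -> new LinkedHashSet<>())
                .add(cardId);
    }

    // Returns the IDs of all cards tagged with the keyword, in insertion order.
    Set<String> find(String keyword) {
        Set<String> ids = idsByKeyword.get(normalize(keyword));
        return ids == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(ids);
    }

    // Assumption: keywords match regardless of case and surrounding whitespace.
    // A keyword may be a whole phrase, so inner spaces are kept.
    private static String normalize(String keyword) {
        return keyword.trim().toLowerCase(Locale.ROOT);
    }
}
```

A lookup is a single map access no matter how long the card bodies are, which is one reason tagging can serve search better than scanning body text.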
