MathML from Word in DITA For Publishers
Below is a guest post from Eliot Kimber, the main developer of DITA for Publishers. DITA is an XML standard for documents, well known for its architecture, which promotes modularization and re-use of content.
The DITA for Publishers (D4P) project provides a general Microsoft Word to DITA transformation framework. This framework makes it relatively easy to generate DITA documents from styled Word documents.
The upcoming D4P release, 0.9.19, adds support for getting MathML from Word documents in which the equations are stored as MathML text content, as generated by the MathType plugin from MathType equations.
This makes it easy for you to use MathType in Word and produce DITA documents that include the equations as MathML markup, rather than as images. You can then use the MathML support in XML editors, such as OxygenXML's integration with MathFlow, to edit the equations in the DITA source.
The DITA for Publishers project includes a domain for including inline MathML in DITA content. In addition, DITA 1.3 will include a standard domain for including MathML, either inline or by reference to separate XML documents.
Generating DITA with MathML from Word Documents
To get your MathML from Word into your DITA, you need to set up the Word-to-DITA style-to-tag mapping with the details of the tagging to use for the MathML. You must specify the name of the element to contain the <m:math> element and, optionally, a parent container that represents the equation the MathML produces.
When you convert MathType equations into MathML in the Word document, the MathML goes into text with the character style name "MTConvertedEquation". The D4P Word-to-DITA transform treats each converted equation as a single paragraph with the paragraph style "MTConvertedEquation", where the paragraph contains a single <mathml> element.
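To make the "MathML stored as text" idea concrete, here is a minimal Python sketch of pulling the converted equation text out of a Word paragraph and parsing it as XML. It is not the D4P implementation. The sample paragraph is hypothetical and simplified, and the way the text is split across runs is an assumption; only the WordprocessingML namespace and the "MTConvertedEquation" style name come from Word and from the description above.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"

# Hypothetical, simplified paragraph: the MathML is escaped text inside runs
# that carry the "MTConvertedEquation" character style.
SAMPLE_PARAGRAPH = f"""
<w:p xmlns:w="{W_NS}">
  <w:r>
    <w:rPr><w:rStyle w:val="MTConvertedEquation"/></w:rPr>
    <w:t>&lt;m:math xmlns:m='http://www.w3.org/1998/Math/MathML' display='block'&gt;&lt;m:mi&gt;x&lt;/m:mi&gt;</w:t>
  </w:r>
  <w:r>
    <w:rPr><w:rStyle w:val="MTConvertedEquation"/></w:rPr>
    <w:t>&lt;/m:math&gt;</w:t>
  </w:r>
</w:p>
"""


def converted_equation_text(paragraph, style="MTConvertedEquation"):
    """Concatenate the text of every run that uses the given character style."""
    parts = []
    for run in paragraph.iter(f"{W}r"):
        r_style = run.find(f"{W}rPr/{W}rStyle")
        if r_style is not None and r_style.get(f"{W}val") == style:
            parts.extend(t.text or "" for t in run.iter(f"{W}t"))
    return "".join(parts)


paragraph = ET.fromstring(SAMPLE_PARAGRAPH)
math_text = converted_equation_text(paragraph)

# The collected text is itself XML, so it can be parsed into a MathML element.
math = ET.fromstring(math_text)
print(math.tag)
```

Once the text has been parsed, the result is ordinary MathML markup.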
You can then use this style name in your style-to-tag map to put the appropriate wrappers around the MathML markup.
For example, to use the D4P markup, you would create a style map entry like this:
<style styleName="MTConvertedEquation" containerType="d4p_display-equation" tagName="d4p_MathML" level="1" structureType="block" topicZone="body" />
Which will result in markup like this:
<d4p_display-equation>
  <d4p_MathML>
    <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="block">
      <m:semantics>
        <m:mrow>
          <m:mfrac>
            <m:mrow>
              <m:mi>d</m:mi>
              <m:mi>x</m:mi>
            </m:mrow>
            <m:mrow>
              <m:mi>d</m:mi>
              <m:mi>t</m:mi>
            </m:mrow>
          </m:mfrac>
          <m:mo>=</m:mo>
          <m:mi>J</m:mi>
          <m:mo>.</m:mo>
          <m:mi>X</m:mi>
          <m:mo stretchy="false">(</m:mo>
          <m:mi>t</m:mi>
          <m:mo stretchy="false">)</m:mo>
        </m:mrow>
      </m:semantics>
    </m:math>
  </d4p_MathML>
</d4p_display-equation>
To generate the DITA 1.3 markup you would use this style mapping:
<style styleName="MTConvertedEquation" containerType="equation-block" tagName="mathml" level="1" structureType="block" topicZone="body" />
Which would produce this markup:
<equation-block>
  <mathml>
    <m:math display='block'>
      <m:semantics>
        <m:mrow>
          <m:mfrac>
            <m:mrow>
              <m:mo>−</m:mo><m:mi>b</m:mi><m:mo>±</m:mo><m:msqrt>
                <m:mrow>
                  <m:msup>
                    <m:mi>b</m:mi>
                    <m:mn>2</m:mn>
                  </m:msup>
                  <m:mo>−</m:mo><m:mn>4</m:mn><m:mi>a</m:mi><m:mi>c</m:mi>
                </m:mrow>
              </m:msqrt>
            </m:mrow>
            <m:mrow>
              <m:mn>2</m:mn><m:mi>a</m:mi>
            </m:mrow>
          </m:mfrac>
        </m:mrow>
      </m:semantics>
    </m:math>
  </mathml>
</equation-block>
The support for MathML in the D4P Word-to-DITA transform is generic in that the MathML markup could come from any source. The transform is implemented as a two-phase process. The first phase generates an intermediate file (the "simple wordprocessing markup" file), and the second phase converts the intermediate file into DITA using the style-to-tag mapping.
In the intermediate file, the MathML is just normal MathML markup. This means that the MathML could come from other sources, such as Word's built-in math markup language or other ways of converting MathType equations, once those are available. The D4P Word-to-DITA transform is completely extensible, so if you need to, you can customize the MathML generation process in whatever way you require.
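As a rough illustration of what the second phase does with a style map entry, here is a minimal Python sketch that wraps a MathML element in the tagName element and then in the containerType element. It is not the D4P implementation. The `<styles>` wrapper element, the function names, and the sample equation are hypothetical; only the `<style>` attributes are taken from the mapping shown earlier.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("m", MATHML_NS)  # serialize MathML with the m: prefix

# Hypothetical style map fragment; the <styles> root is invented for this sketch.
STYLE_MAP = """
<styles>
  <style styleName="MTConvertedEquation"
         containerType="equation-block"
         tagName="mathml"
         level="1"
         structureType="block"
         topicZone="body"/>
</styles>
"""

# Hypothetical equation as it might appear in the intermediate file.
SAMPLE_MATH = (
    '<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="block">'
    "<m:mi>x</m:mi></m:math>"
)


def load_style_map(xml_text):
    """Index the <style> entries by their styleName attribute."""
    root = ET.fromstring(xml_text)
    return {s.get("styleName"): dict(s.attrib) for s in root.iter("style")}


def wrap_equation(style_map, style_name, math_xml):
    """Wrap MathML in the tagName element, then in containerType if present."""
    entry = style_map[style_name]
    wrapper = ET.Element(entry["tagName"])
    wrapper.append(ET.fromstring(math_xml))
    container_type = entry.get("containerType")
    if container_type is None:
        return wrapper  # the container is optional
    container = ET.Element(container_type)
    container.append(wrapper)
    return container


style_map = load_style_map(STYLE_MAP)
result = wrap_equation(style_map, "MTConvertedEquation", SAMPLE_MATH)
print(ET.tostring(result, encoding="unicode"))
```

Because the wrapping depends only on the style name and the map entry, the same step would apply no matter where the MathML in the intermediate file came from.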
Rendering MathML Using the DITA Open Toolkit
Using the DITA Open Toolkit, you can render MathML to HTML and to PDF.
Rendering MathML to PDF with the Open Toolkit depends on the XSL-FO engine you use. Both Apache FOP and Antenna House XSL Formatter support direct rendering of MathML included in the FO file. FOP requires the separate JEuclid libraries (from the open-source JEuclid project). Antenna House's support is built in.
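For readers wondering what "MathML included in the FO file" looks like, the standard XSL-FO mechanism for embedding foreign XML is the fo:instream-foreign-object element. The Python sketch below builds such a fragment. It is only an illustration: the function name and sample equation are hypothetical, and whether a given Open Toolkit PDF transform emits exactly this structure is an assumption.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("fo", FO_NS)
ET.register_namespace("m", MATHML_NS)

# Hypothetical equation taken from the DITA source.
SAMPLE_MATH = (
    '<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="block">'
    "<m:mi>x</m:mi></m:math>"
)


def equation_to_fo(math_xml):
    """Place a MathML element inside fo:block/fo:instream-foreign-object."""
    block = ET.Element(f"{{{FO_NS}}}block")
    foreign = ET.SubElement(block, f"{{{FO_NS}}}instream-foreign-object")
    foreign.append(ET.fromstring(math_xml))
    return block


fo_block = equation_to_fo(SAMPLE_MATH)
print(ET.tostring(fo_block, encoding="unicode"))
```

An engine with MathML support can render the embedded markup directly; an engine without it needs the equation converted to an image or SVG first.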
The RenderX XEP product does not provide direct rendering of MathML, so you would need to generate images or SVG from the equations and then reference those from the FO. This wouldn't be that hard to implement (e.g., using either the Design Science MathFlow Document Composer or the JEuclid library).
Eliot Kimber is an independent consultant at Contrext Solutions focusing on DITA information analysis, markup design, and system implementation for publishers. Eliot is a founding member of the DITA Technical Committee, a founding member of the W3C XML Working Group, a co-editor of ISO/IEC 10744:1997, HyTime 2nd Edition, and a participant in the development of many other SGML- and XML-related standards. Eliot is also the founder and main developer of the DITA for Publishers open-source project. When not creating new DITA specializations, Eliot likes to skateboard, study Aikido, and feed his flock of urban chickens.