Overview
Driver for Tralics: Convert LaTeX math snippets to MathML elements
Note: Tralics is a LaTeX to XML Translator from
http://www-sop.inria.fr/marelle/tralics
This is only a driver.
Requirements
- Python 2.6 or greater
- Pexpect package
- Tralics installation
- lxml package
What does it do?
This project solves one problem: you have LaTeX math snippets (inline or block math) and you want to convert them to MathML.
For example, you can write code like the following. You write the function get_mathstring() yourself. See the file runner.py for details.
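from tralics_driver import driver

math_elems = list()
t = driver.TralicsDriver('/usr/local')
# get_mathstring() is your own function; it yields (fname, mathstring) pairs
for fname, mathstring in get_mathstring():
    elemstring = t.convert(fname, mathstring)
    math_elems.append(elemstring)
# Report any conversion errors collected by the driver
if t.errors:
    print 'ERRORS:'
    print t.errors
t.stop()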
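Since get_mathstring() is left to you, here is a minimal sketch of what it might look like. It assumes the LaTeX lives in the alt attribute of img tags with class="math", and that class appears before alt inside the tag. The regular expression, the target_dir parameter, and its default value are all hypothetical choices for illustration, not part of the driver.

```python
import os
import re

# Hypothetical pattern: assumes class="math" precedes alt="..." in the tag.
MATH_IMG = re.compile(r'<img[^>]*class="math"[^>]*alt="([^"]*)"')


def get_mathstring(target_dir='.'):
    """Yield (fname, mathstring) pairs for every math image found."""
    for fname in sorted(os.listdir(target_dir)):
        # Only HTML files are scanned; everything else is skipped.
        if not fname.endswith('.html'):
            continue
        with open(os.path.join(target_dir, fname)) as f:
            text = f.read()
        for match in MATH_IMG.finditer(text):
            yield fname, match.group(1)
```

A regular expression keeps the sketch free of dependencies, but it does not unescape HTML entities in the alt text; a real HTML parser, as used in runner.py, is the more robust choice.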
Assumptions
- You have installed Tralics
- If you have custom newcommands, you put them in a file called newcommands.tex and place the file in the Tralics conf directory
- You want to cache your conversions on disk
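To make the caching assumption concrete, the sketch below shows one way a cache-first lookup with a disk-backed store could work. It is not the driver's actual cache: the DiskCache class, its JSON file format, and the convert callback are all hypothetical names chosen for illustration.

```python
import json
import os


class DiskCache(object):
    """Hypothetical stand-in for the driver's cache; not its real format."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # Load earlier conversions if a cache file already exists.
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def get(self, mathstring, convert):
        """Return the cached result, or call convert() and remember it."""
        if mathstring not in self.data:
            self.data[mathstring] = convert(mathstring)
        return self.data[mathstring]

    def save(self):
        """Write all known conversions back to disk."""
        with open(self.path, 'w') as f:
            json.dump(self.data, f)
```

Keying the cache on the LaTeX string alone means a snippet that appears in many files is converted only once.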
Project Layout
The code is structured as follows:
__init__.py
    Contains helper functions unescape and escape to handle html/xml markup changes, and a string template to return the contents of an error.
driver.py
    Contains only the driver class, TralicsDriver. The class methods are as follows:
    - __init__(tralics_dir, options): Sets up locations for the tralics binary, the customized 'newcommands' file if it exists, and the cache file if it exists.
    - start: Spawns a pexpect process and reads in the customized 'newcommands' file if it exists.
    - stop: Closes the process and writes the cache to disk.
    - convert(fname, mathstring): Returns the MathML string of the LaTeX mathstring that was found in the file named fname. If possible, gets the converted MathML string from the cache; otherwise, passes the string to the Tralics process. Prints a dot ('.') for strings found in the cache, or a plus ('+') if the string was processed by Tralics.
    - getmath(expr): Passes the LaTeX string expr to the Tralics process and returns the MathML string or an element string containing the filename, error message, and LaTeX string that caused the error.
    - clean_formula(expr, result): Modifies the element string to conform to the current MathML spec (changes attributes) or returns an element string containing the filename, error message, and LaTeX string that caused the error.
    - handle_error(data): Records the error data in the instance's list of errors and returns an element string containing the filename, error message, and LaTeX string that caused the error.
runner.py
    Contains an example of how to use the driver on a tree of HTML files.
Example
This example parses each HTML file in a target directory. It assumes that some images (img elements with class="math") have alt text containing the LaTeX markup used to create the image.
For each image, it
- passes the LaTeX markup string to the Tralics driver to get the MathML version
- creates an element from that MathML string
- replaces the original image element with the MathML element
Once every image in a file has been processed, it writes the new tree (a modified copy of the original) to a new directory.
The run starts with one tree of HTML files that uses images for math. When it finishes, the original directory is untouched, and a parallel HTML directory contains MathML elements in place of the converted images.
import os

from lxml import etree

from tralics_driver import driver

target_dir = '/some/dir/with/html_files'
mathml_root = '/another/place/to/write/new/html_files'

# Script element that loads MathJax so browsers can render the MathML
d = {
    'type': 'text/javascript',
    'src': 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=MML_HTMLorMML'
}
mathjax_elem = etree.Element('script', attrib=d)

t = driver.TralicsDriver()
for fname in [x for x in os.listdir(target_dir) if x.endswith('.html')]:
    parser = etree.HTMLParser()
    tree = etree.parse(os.path.join(target_dir, fname), parser)
    head = tree.find('head')
    head.append(mathjax_elem)
    for image in tree.xpath('//img[@class="math"]'):
        # The alt text holds the LaTeX source for the image
        mtext = image.get('alt')
        if mtext:
            elemstring = t.convert(fname, mtext)
            try:
                math_elem = etree.fromstring(elemstring)
            except etree.XMLSyntaxError as e:
                print e
            else:
                # Swap the image for the parsed MathML element
                image.getparent().replace(image, math_elem)
        else:
            print 'No alt text %s:%s' % (fname, image.get('src'))
    # Write the modified tree under the same file name in the new directory
    with open(os.path.join(mathml_root, fname), 'wb') as f:
        f.write('\n%s' % etree.tostring(tree.getroot(),
                                        pretty_print=True,
                                        encoding='UTF-8',
                                        method='html'))

# Close the Tralics process and write the cache to disk
t.stop()
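In the example above, every string returned by convert is parsed with etree.fromstring, so even a failed conversion has to come back as a well-formed element string. The sketch below shows one way such an error element could be built from a string template. The template, the span and code element names, the math-error class, and the error_element function are all hypothetical; the driver's real template in __init__.py may look quite different.

```python
from xml.sax.saxutils import escape

# Hypothetical template; the element names and class are illustrative only.
ERROR_TEMPLATE = (
    '<span class="math-error">'
    '<b>%(fname)s</b>: %(message)s <code>%(latex)s</code>'
    '</span>'
)


def error_element(fname, message, latex):
    """Build a well-formed element string describing a failed conversion."""
    # Escape &, < and > so arbitrary LaTeX cannot break the markup.
    return ERROR_TEMPLATE % {
        'fname': escape(fname),
        'message': escape(message),
        'latex': escape(latex),
    }
```

Escaping matters because LaTeX sources routinely contain & and <, which would otherwise make the string unparseable.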