
Tralics driver

Driver for tralics to convert LaTeX snippets to MathML elements


Overview

Driver for Tralics: Convert LaTeX math snippets to MathML elements

Note: Tralics is a LaTeX-to-XML translator, available from http://www-sop.inria.fr/marelle/tralics
This project is only a driver for it.

Requirements

What does it do?

This project addresses the case where you have LaTeX math snippets (inline or block math) and want to convert them to MathML.

For example, you can write code like the following. The function get_mathstring() is one you write yourself; see the file runner.py for details.


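      from tralics_driver import driver

      # Collect the converted MathML element strings here
      math_elems = []
      t = driver.TralicsDriver('/usr/local')
      # get_mathstring() is your own function; it yields (fname, mathstring) pairs
      for fname, mathstring in get_mathstring():
        elemstring = t.convert(fname, mathstring)
        math_elems.append(elemstring)

      # Report any errors the driver collected during conversion
      if t.errors:
        print('ERRORS:')
        print(t.errors)
      t.stop()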
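
The loop above only needs get_mathstring() to yield (fname, mathstring) pairs. A minimal sketch of such a function follows, under assumptions that are not part of this project: the snippets live in .tex files in one directory, each snippet is delimited by single dollar signs, and the delimiters are passed through unchanged. The name INLINE_MATH and the source_dir parameter are hypothetical; check runner.py for what the driver actually expects.

```python
import os
import re

# Hypothetical pattern: one inline snippet delimited by single dollar signs.
INLINE_MATH = re.compile(r'\$([^$]+)\$')


def get_mathstring(source_dir='.'):
    """Yield (fname, mathstring) pairs for every $...$ snippet found.

    Assumption: only files ending in .tex are scanned, in sorted order.
    """
    for fname in sorted(os.listdir(source_dir)):
        if not fname.endswith('.tex'):
            continue
        with open(os.path.join(source_dir, fname)) as f:
            text = f.read()
        for match in INLINE_MATH.finditer(text):
            # group(0) keeps the dollar delimiters; use group(1) to drop them
            yield fname, match.group(0)
```

Writing it as a generator keeps memory use flat for large inputs, since each snippet is handed to the driver as soon as it is found.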
    

Assumptions

Project Layout

The code is structured as follows:

Example

This example parses each HTML file in a target directory. It assumes that some images (those with class="math") have alt text containing the LaTeX markup used to create the image.

For each image, it reads the alt text, converts it to a MathML element, and replaces the image with that element.

At the start there is one tree of HTML files that uses images for math. At the end, the original directory is untouched, and a second directory holds the same HTML files with MathML elements in place of the original images.

      import os

      from lxml import etree
      from tralics_driver import driver

      target_dir = '/some/dir/with/html_files'
      mathml_root = '/another/place/to/write/new/html_files'
      # Attributes for the MathJax <script> element added to each page
      d = {
        'type': 'text/javascript',
        'src':'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=MML_HTMLorMML'
        }

      t = driver.TralicsDriver()
      for fname in [x for x in os.listdir(target_dir) if x.endswith('.html')]:
        parser = etree.HTMLParser()
        tree = etree.parse(os.path.join(target_dir, fname), parser)
        # Build a fresh <script> per file; lxml moves an element that is appended twice
        head = tree.find('head')
        head.append(etree.Element('script', attrib=d))

        for image in tree.xpath('//img[@class="math"]'):
          mtext = image.get('alt')
          if mtext:
            elemstring = t.convert(fname, mtext)
            try:
              math_elem = etree.fromstring(elemstring)
            except etree.XMLSyntaxError as e:
              print(e)
            else:
              # Carry over the text that followed the image, then swap it out
              math_elem.tail = image.tail
              image.getparent().replace(image, math_elem)
          else:
            print('No alt text %s:%s' % (fname, image.get('src')))

        with open(os.path.join(mathml_root, fname), 'wb') as f:
          # tostring() returns bytes when an encoding is given
          f.write(b'\n' + etree.tostring(tree.getroot(),
                                         pretty_print=True,
                                         encoding='UTF-8',
                                         method='html'))

      # Shut the driver down, as in the first example
      t.stop()
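
The example relies on every class="math" image carrying alt text. Before converting a large tree, it may help to check that assumption. The following is a minimal sketch using only the standard library (Python 3); MathImageAudit and audit_html are hypothetical names and are not part of tralics_driver.

```python
from html.parser import HTMLParser


class MathImageAudit(HTMLParser):
    """Collect src values of <img class="math"> tags, split by alt text presence."""

    def __init__(self):
        super().__init__()
        self.with_alt = []
        self.without_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        attrs = dict(attrs)
        # Exact match on the class value, like the XPath //img[@class="math"]
        if attrs.get('class') != 'math':
            return
        # A missing or empty alt attribute counts as "no alt text"
        target = self.with_alt if attrs.get('alt') else self.without_alt
        target.append(attrs.get('src'))


def audit_html(text):
    """Return (with_alt, without_alt) lists of image src values for one page."""
    audit = MathImageAudit()
    audit.feed(text)
    audit.close()
    return audit.with_alt, audit.without_alt
```

Running audit_html() over each file and inspecting the second list shows which images the conversion loop would skip with a "No alt text" message.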