Visions of Aestia

05 Apr 2005

Implementing VTD-XML in Python

Filed under: General, Python, XML — JBowtie @ 11:13 am

I’ve been making decent progress in my implementation of VTD-XML.

Currently I can do the following:

  • Auto-detect UTF-8 and UTF-16, and switch encoding when the XML declaration names one.
  • Parse all node types except processing instructions (PIs).
  • Match elements on name and/or namespace.
  • Navigate through elements: go to root, parent, first child, next sibling. That’s enough to evaluate 9 of 13 XPath axes.
  • Correctly execute two of the four examples included in the Java package.
  • Get the first text or CDATA child of an element.
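
To illustrate why those four navigation moves go so far, here is a minimal sketch that builds several XPath axes from nothing but parent, first child and next sibling. The Node class and the function names are hypothetical stand-ins, not this implementation's API; VTD-XML itself navigates with a cursor over token records rather than node objects. One plausible reading of the nine axes is self, child, parent, ancestor, ancestor-or-self, descendant, descendant-or-self, following-sibling and following, though the list above does not name them.

```python
class Node:
    """Hypothetical tree node exposing only the basic navigation links."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.first_child = None
        self.next_sibling = None
        if parent is not None:
            parent._append(self)

    def _append(self, child):
        # Attach child as the last sibling under this node.
        if self.first_child is None:
            self.first_child = child
            return
        last = self.first_child
        while last.next_sibling is not None:
            last = last.next_sibling
        last.next_sibling = child


def root(node):
    """Walk parent links up to the top of the tree."""
    while node.parent is not None:
        node = node.parent
    return node


def child(node):
    """child axis: first child, then each next sibling."""
    current = node.first_child
    while current is not None:
        yield current
        current = current.next_sibling


def descendant(node):
    """descendant axis, in document order."""
    for c in child(node):
        yield c
        yield from descendant(c)


def ancestor(node):
    """ancestor axis, nearest ancestor first."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


def following_sibling(node):
    """following-sibling axis."""
    current = node.next_sibling
    while current is not None:
        yield current
        current = current.next_sibling


def following(node):
    """following axis: later nodes in document order, excluding descendants."""
    while node is not None:
        for sibling in following_sibling(node):
            yield sibling
            yield from descendant(sibling)
        node = node.parent
```

The self and parent axes are trivial, and the two "-or-self" variants just yield the context node before delegating to ancestor or descendant.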

Major pieces still missing:

  • Can’t enumerate attributes or their values (they’re parsed, just not available through the API yet).
  • Can’t handle mixed content gracefully.
  • Not yet correctly enforcing well-formedness.
  • No Pythonic interfaces yet; these will be needed for the “real” API.
  • No performance metrics yet; these are needed to determine whether the implementation is compelling.
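
For the performance item, a first measurement could be as simple as parse throughput against a standard-library baseline. The sketch below is only an assumption about how that might look: throughput_mb_per_s and make_document are hypothetical helpers, and xml.dom.minidom stands in as the parser under test until the VTD parser's entry point can be swapped in.

```python
import time
import xml.dom.minidom


def throughput_mb_per_s(parse, data, repeat=5):
    """Return the best-of-N throughput of parse(data) in MB/s."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        parse(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on a very coarse timer.
    return len(data) / max(best, 1e-9) / 1e6


def make_document(n_items):
    """Build a synthetic UTF-8 document with n_items child elements."""
    items = "".join(
        "<item id='%d'>value %d</item>" % (i, i) for i in range(n_items)
    )
    return ("<root>%s</root>" % items).encode("utf-8")


if __name__ == "__main__":
    data = make_document(10000)
    rate = throughput_mb_per_s(xml.dom.minidom.parseString, data)
    print("minidom baseline: %.1f MB/s" % rate)
```

Taking the best of several runs reduces noise from other processes; a fuller comparison would also track memory use, since compact in-memory representation is a large part of VTD's appeal.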

The code also needs actual unit tests instead of relying on the examples, and I need to look at Uche’s Python and XML torture tests for more useful examples and API idioms.
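
As a starting point for those unit tests, well-formedness lends itself to table-driven cases. This is a sketch under assumptions: parse is a placeholder for whatever entry point the VTD parser ends up with, here bound to xml.dom.minidom.parseString so the example runs, and the rejection test would need to catch the real parser's own exception type instead of ExpatError.

```python
import unittest
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Placeholder for the parser under test (an assumption, not the real API).
parse = xml.dom.minidom.parseString

WELL_FORMED = [
    b"<a/>",
    b"<a><b>text</b></a>",
    b"<a><![CDATA[<raw>]]></a>",
]

MALFORMED = [
    b"<a><b></a>",      # mismatched end tag
    b"<a>",             # unclosed root element
    b"<a></a><b></b>",  # more than one root element
    b"<a x=1/>",        # unquoted attribute value
]


class WellFormednessTests(unittest.TestCase):
    def test_accepts_well_formed(self):
        for doc in WELL_FORMED:
            with self.subTest(doc=doc):
                parse(doc)  # must not raise

    def test_rejects_malformed(self):
        for doc in MALFORMED:
            with self.subTest(doc=doc):
                with self.assertRaises(ExpatError):
                    parse(doc)
```

Saved as a module, this runs with python -m unittest, and new cases are one line each.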

One Response to “Implementing VTD-XML in Python”

  1. Jimmy Zhang Says:

    XimpleWare just released a new version of VTD-XML. The improvements in
    this version are:

    * Rewrote the core parsing routine for modularity and improved performance
    * Significantly improved XPath Evaluation performance
    * Increased maximum UTF-8 document size to 2GB (w/o namespace)
    * Added Buffer reuse option to further improve core XML parsing performance
    * Various bug fixes and code quality enhancements

    I would like to personally invite you to take a look at the new
    release and welcome any suggestions.

    Cheers,
    Jimmy Zhang
