Implementing VTD-XML in Python
I’ve been making decent progress on my implementation of VTD-XML.
Currently I can do the following:
- Auto-detect UTF-8 and UTF-16, and switch encodings when an encoding declaration is found.
- Parse all entity types except processing instructions (PIs).
- Match elements on name and/or namespace.
- Navigate through elements: go to root, parent, first child, next sibling. That’s enough to evaluate 9 of 13 XPath axes.
- Correctly execute two of the four examples included in the Java package.
- Get the first text or CDATA child of an element.
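For readers new to VTD-XML: the navigation features above operate on a flat array of tokens recorded in document order, where each token carries an offset, a length, a type and a nesting depth, rather than on a tree of node objects. The sketch below is a simplified, hypothetical illustration of that idea, not the code of this implementation or of the Java package. `Token`, `Navigator` and the type constants are invented names, tokens are plain dataclasses instead of packed 64-bit records, and text/CDATA tokens are assumed to carry the depth of their enclosing element. Under those assumptions, root, parent, first child, next sibling and first text child all reduce to linear scans that compare depths.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical token types for this sketch; VTD-XML defines its own, richer set.
START_TAG, TEXT, CDATA = range(3)


@dataclass(frozen=True)
class Token:
    type: int
    offset: int  # byte offset of the element name or character data
    length: int  # length in bytes
    depth: int   # root element = 0; character data uses its enclosing element's depth


class Navigator:
    """A cursor over a flat, document-ordered token list (simplified sketch)."""

    def __init__(self, doc: bytes, tokens: list[Token]) -> None:
        self.doc = doc
        self.tokens = tokens
        self.cur = 0
        if not self.to_root():
            raise ValueError("document has no root element")

    def _text(self, t: Token) -> str:
        return self.doc[t.offset:t.offset + t.length].decode("utf-8")

    def name(self) -> str:
        return self._text(self.tokens[self.cur])

    def to_root(self) -> bool:
        for i, t in enumerate(self.tokens):
            if t.type == START_TAG:
                self.cur = i
                return True
        return False

    def to_parent(self) -> bool:
        # The nearest preceding element one level up is the parent.
        d = self.tokens[self.cur].depth
        for i in range(self.cur - 1, -1, -1):
            t = self.tokens[i]
            if t.type == START_TAG and t.depth == d - 1:
                self.cur = i
                return True
        return False

    def to_first_child(self) -> bool:
        # The next element is a child only if it is exactly one level deeper.
        d = self.tokens[self.cur].depth
        for i in range(self.cur + 1, len(self.tokens)):
            t = self.tokens[i]
            if t.type == START_TAG:
                if t.depth == d + 1:
                    self.cur = i
                    return True
                return False
        return False

    def to_next_sibling(self) -> bool:
        # Skip deeper elements (descendants); give up once we climb above our level.
        d = self.tokens[self.cur].depth
        for i in range(self.cur + 1, len(self.tokens)):
            t = self.tokens[i]
            if t.type == START_TAG and t.depth <= d:
                if t.depth == d:
                    self.cur = i
                    return True
                return False
        return False

    def first_text(self) -> Optional[str]:
        # Character data at our depth, before the next element at or above our level, is ours.
        d = self.tokens[self.cur].depth
        for t in self.tokens[self.cur + 1:]:
            if t.type == START_TAG and t.depth <= d:
                break
            if t.type in (TEXT, CDATA) and t.depth == d:
                return self._text(t)
        return None


# Hand-built tokens for <a><b>hi</b><c><![CDATA[x]]></c></a>
EXAMPLE_DOC = b"<a><b>hi</b><c><![CDATA[x]]></c></a>"
EXAMPLE_TOKENS = [
    Token(START_TAG, 1, 1, 0),   # a
    Token(START_TAG, 4, 1, 1),   # b
    Token(TEXT, 6, 2, 1),        # hi
    Token(START_TAG, 13, 1, 1),  # c
    Token(CDATA, 24, 1, 1),      # x (content of the CDATA section)
]

if __name__ == "__main__":
    nav = Navigator(EXAMPLE_DOC, EXAMPLE_TOKENS)
    nav.to_first_child()
    print(nav.name(), nav.first_text())  # b hi
    nav.to_next_sibling()
    print(nav.name(), nav.first_text())  # c x
```

The linear scans keep the sketch short. The Java VTD-XML navigator keeps auxiliary location caches so it can avoid most of this scanning.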
Major pieces still missing:
- Can’t enumerate attributes or their values (they’re parsed, just not yet exposed through the API).
- Can’t handle mixed content gracefully.
- Doesn’t yet enforce well-formedness correctly.
- No Pythonic interfaces yet; these will be needed for the “real” API.
- No performance measurements yet; these are essential for deciding whether the implementation is compelling.
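Because attributes are already parsed, exposing them could be as simple as reading the tokens that follow a start tag. Here is a minimal sketch. It assumes, hypothetically, that each attribute is stored as an attribute-name token immediately followed by an attribute-value token, and that value tokens span only the text between the quotes. `Token`, `attributes` and the type constants are illustrative names, not this implementation’s API. Entity and character references in values are returned undecoded.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator

# Hypothetical token types for this sketch only.
START_TAG, ATTR_NAME, ATTR_VAL = range(3)


@dataclass(frozen=True)
class Token:
    type: int
    offset: int  # byte offset into the document
    length: int  # length in bytes (attribute values exclude the quotes)
    depth: int


def _slice(doc: bytes, t: Token) -> str:
    return doc[t.offset:t.offset + t.length].decode("utf-8")


def attributes(doc: bytes, tokens: list[Token], elem: int) -> Iterator[tuple[str, str]]:
    """Yield (name, raw value) pairs for the start tag at tokens[elem]."""
    if tokens[elem].type != START_TAG:
        raise ValueError("token is not a start tag")
    i = elem + 1
    # Assumed layout: attribute tokens directly follow their start tag as name/value pairs.
    while i < len(tokens) and tokens[i].type == ATTR_NAME:
        if i + 1 >= len(tokens) or tokens[i + 1].type != ATTR_VAL:
            raise ValueError("attribute name without a value token")
        yield _slice(doc, tokens[i]), _slice(doc, tokens[i + 1])
        i += 2


doc = b'<img src="a.png" alt="logo"/>'
tokens = [
    Token(START_TAG, 1, 3, 0),   # img
    Token(ATTR_NAME, 5, 3, 0),   # src
    Token(ATTR_VAL, 10, 5, 0),   # a.png
    Token(ATTR_NAME, 17, 3, 0),  # alt
    Token(ATTR_VAL, 22, 4, 0),   # logo
]
print(dict(attributes(doc, tokens, 0)))  # {'src': 'a.png', 'alt': 'logo'}
```

Using a generator keeps enumeration lazy. Callers that want a mapping can wrap it in `dict()`, as the last line does.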
The code also needs actual unit tests instead of relying on the examples, and I need to look at Uche’s Python and XML torture tests for more useful examples and API idioms.
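As a starting point for real unit tests, here is a sketch of how the encoding auto-detection could be tested with the standard `unittest` module. `detect_encoding` is a hypothetical stand-in for the implementation’s own detection routine. It follows the byte-order-mark and first-bytes rules from Appendix F of the XML 1.0 specification for UTF-8 and UTF-16. Otherwise it falls back to the `encoding` pseudo-attribute of the XML declaration and defaults to UTF-8.

```python
import re
import unittest

# encoding="..." inside an ASCII-compatible XML declaration.
_DECL_ENCODING = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*[\'"]([A-Za-z][A-Za-z0-9._-]*)[\'"]')


def detect_encoding(head: bytes) -> str:
    """Guess a document's encoding from its first bytes (after XML 1.0, Appendix F)."""
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if head.startswith((b"\xfe\xff", b"\x00<\x00?")):
        return "utf-16-be"
    if head.startswith((b"\xff\xfe", b"<\x00?\x00")):
        return "utf-16-le"
    # ASCII-compatible input: honour a declared encoding, else default to UTF-8.
    m = _DECL_ENCODING.match(head)
    return m.group(1).decode("ascii").lower() if m else "utf-8"


class DetectEncodingTest(unittest.TestCase):
    DECL = "<?xml version='1.0'?><a/>"

    def test_utf8_bom(self):
        self.assertEqual(detect_encoding(b"\xef\xbb\xbf" + self.DECL.encode("utf-8")), "utf-8")

    def test_utf16_with_bom(self):
        self.assertEqual(detect_encoding(b"\xfe\xff" + "<a/>".encode("utf-16-be")), "utf-16-be")
        self.assertEqual(detect_encoding(b"\xff\xfe" + "<a/>".encode("utf-16-le")), "utf-16-le")

    def test_utf16_without_bom(self):
        self.assertEqual(detect_encoding(self.DECL.encode("utf-16-be")), "utf-16-be")
        self.assertEqual(detect_encoding(self.DECL.encode("utf-16-le")), "utf-16-le")

    def test_declared_encoding(self):
        head = b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'
        self.assertEqual(detect_encoding(head), "iso-8859-1")

    def test_default_is_utf8(self):
        self.assertEqual(detect_encoding(b"<a/>"), "utf-8")
```

Saved under a test-style name (for example, a hypothetical `test_encoding.py`), this runs with `python -m unittest test_encoding`. It could then grow alongside cases borrowed from existing XML test suites.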
February 6th, 2006 at 9:20 pm
XimpleWare just released a new version of VTD-XML. The improvements in this version are:
* Rewrote the core parsing routine for modularity and improved performance
* Significantly improved XPath Evaluation performance
* Increased the maximum UTF-8 document size to 2 GB (without namespaces)
* Added a buffer-reuse option to further improve core XML parsing performance
* Various bug fixes and code-quality enhancements
I would like to personally invite you to take a look at the new release, and I welcome any suggestions.
Cheers,
Jimmy Zhang