Visions of Aestia

05 Apr 2005

Implementing VTD-XML in Python

Filed under: General, Python, XML — JBowtie @ 11:13 am

I’ve been making decent progress in my implementation of VTD-XML.

Currently I can do the following:

Auto-detect UTF-8 and UTF-16, and switch encoding when an XML declaration is found.
Parse all token types except processing instructions (PIs).
Match elements on name and/or namespace.
Navigate through elements: go to root, parent, first child, next sibling. That’s enough to evaluate 9 of 13 XPath axes.
Correctly execute two of the four examples included in the Java package.
Get the first text or CDATA child of an element.
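
For readers unfamiliar with the approach: VTD-XML keeps tokens as flat records (offset, length, depth, type) rather than as a tree of objects, so navigation comes down to comparing depths. The sketch below illustrates that idea only and is not the code from this implementation. The `Token` and `Cursor` names are hypothetical, the records hold a name instead of an offset and length, and the linear scans stand in for whatever lookup structures a real parser would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    depth: int  # nesting depth of the element; the root is 0
    name: str   # stand-in for the (offset, length) pair a real record stores


class Cursor:
    """Hypothetical cursor over element tokens kept in document order."""

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    @property
    def name(self) -> str:
        return self.tokens[self.pos].name

    def to_root(self) -> bool:
        self.pos = 0
        return True

    def to_first_child(self) -> bool:
        # The first child, if any, is the very next token, one level deeper.
        nxt = self.pos + 1
        if nxt < len(self.tokens) and self.tokens[nxt].depth == self.tokens[self.pos].depth + 1:
            self.pos = nxt
            return True
        return False

    def to_next_sibling(self) -> bool:
        # Scan forward; a shallower token means we have left the parent.
        depth = self.tokens[self.pos].depth
        for i in range(self.pos + 1, len(self.tokens)):
            if self.tokens[i].depth < depth:
                return False
            if self.tokens[i].depth == depth:
                self.pos = i
                return True
        return False

    def to_parent(self) -> bool:
        # Scan backward for the nearest token exactly one level up.
        depth = self.tokens[self.pos].depth
        for i in range(self.pos - 1, -1, -1):
            if self.tokens[i].depth == depth - 1:
                self.pos = i
                return True
        return False


# <a><b><c/></b><d/></a> flattened in document order
doc = [Token(0, "a"), Token(1, "b"), Token(2, "c"), Token(1, "d")]
cur = Cursor(doc)
cur.to_first_child()   # now on b
cur.to_next_sibling()  # now on d
cur.to_parent()        # back on a
```

Each method returns a boolean and leaves the cursor in place on failure, so a caller can chain moves without checking for a null node.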

Major pieces still missing:

Can’t enumerate attributes or their values (they’re parsed, just not available through the API yet).
Can’t handle mixed content gracefully.
Not yet correctly enforcing well-formedness.
No pythonic interfaces yet - this will be needed for the “real” API.
No performance metrics yet - this is really needed to determine if the implementation is compelling.
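
One possible starting point for the missing metrics is a small timing harness built on the standard library's `timeit`. This is a sketch under assumptions: `time_parse` is a hypothetical helper, the sample document is synthetic, and `xml.dom.minidom` is used only as a convenient baseline. The VTD-XML parser would be passed in the same way once it exposes a parse function.

```python
import timeit
import xml.dom.minidom


def time_parse(parse, data, repeat=5, number=20):
    """Return the best observed time in seconds for one call to parse(data)."""
    runs = timeit.repeat(lambda: parse(data), repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(runs) / number


# Synthetic document: 1000 small elements under a single root.
sample = b"<root>" + b"<item id='1'>text</item>" * 1000 + b"</root>"

baseline = time_parse(xml.dom.minidom.parseString, sample)
print(f"minidom: {baseline * 1e3:.3f} ms per parse")
```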

The code also needs actual unit tests instead of relying on the examples, and I need to look at Uche’s Python and XML torture tests for more useful examples and API idioms.
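
As an illustration of what such unit tests could look like, here is a sketch that exercises the encoding auto-detection. The `detect_encoding` helper is hypothetical and is not this implementation's API. It sniffs only the byte-order mark or the layout of the leading `<`, covers only UTF-8 and UTF-16, and does not attempt the switch on an encoding declaration. The tests are plain functions with bare asserts, so any runner that collects `test_*` functions can pick them up.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Guess UTF-8 or UTF-16 from the first bytes of an XML document."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: a leading '<' padded with a zero byte indicates UTF-16.
    if data.startswith(b"<\x00"):
        return "utf-16-le"
    if data.startswith(b"\x00<"):
        return "utf-16-be"
    return "utf-8"  # the default when nothing says otherwise


def test_utf8_with_bom():
    assert detect_encoding(codecs.BOM_UTF8 + b"<a/>") == "utf-8"


def test_utf8_without_bom():
    assert detect_encoding(b"<?xml version='1.0'?><a/>") == "utf-8"


def test_utf16_be_with_bom():
    data = codecs.BOM_UTF16_BE + "<a/>".encode("utf-16-be")
    assert detect_encoding(data) == "utf-16-be"


def test_utf16_le_without_bom():
    assert detect_encoding("<a/>".encode("utf-16-le")) == "utf-16-le"
```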

One Response to “Implementing VTD-XML in Python”

Jimmy Zhang Says:
February 6th, 2006 at 9:20 pm
XimpleWare just released a new version of VTD-XML. The improvements of this version are:

* Rewrote the core parsing routine for modularity and improved performance
* Significantly improved XPath Evaluation performance
* Increased maximum UTF-8 document size to 2GB (w/o namespace)
* Added Buffer reuse option to further improve core XML parsing performance
* Various bug fixes and code quality enhancements

I would like to personally invite you to take a look at the new
release and welcome any suggestions.

Cheers,
Jimmy Zhang
