Step 3 of our refactoring is moving from an XML parser to an RDF parser. This can get a bit tricky, so we’re going to make a few assumptions to avoid outlining every possible scenario.
Let’s assume that your load/save logic is reasonably isolated, and looks something like this:
class order:
    def load(self, node):
        self.id = int(node.id)
        self.customer = customer(node.customer)
        self.product = [product(p) for p in node.product]  # a list of products
Here we’re relying on something vaguely DOM-ish to traverse down the XML tree and create objects that correspond to the various elements and attributes we encounter. Type inference could easily happen if some sort of schema description existed.
Now, RDF parsers don’t use the DOM as such, because they may be pulling together information from multiple documents. Instead, most parsers expose a triple store as their interface: a flat collection of (subject, predicate, object) statements that you query rather than traverse.
Conceptually, this is quite simple. Each object we’re going to create has an rdf:ID, and it may be serialized across multiple documents (using rdf:about to add to the original XML). The fields we are serializing are either references to other objects (rdf:resource links) or simple types (element/attribute values).
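To make that a little more concrete, here is a rough sketch of how a single order might look once it has been flattened into triples. The URIs, the `total` field, and the plain-tuple layout are all invented for illustration; a real parser will have its own representation.

```python
# Hypothetical triples for one order: (subject, predicate, object).
# Every URI and value below is made up for illustration.
triples = [
    # references to other objects (what an rdf:resource link gives us)
    ("order#17", "order.schema#customer", "customer#4"),
    ("order#17", "order.schema#product", "product#9"),
    # a statement that could have been added from a second document via rdf:about
    ("order#17", "order.schema#product", "product#12"),
    # a simple value (an element/attribute value in the XML)
    ("order#17", "order.schema#total", "42.50"),
]

# Collect everything known about the order, whichever document said it.
fields = {}
for subject, predicate, obj in triples:
    if subject == "order#17":
        fields.setdefault(predicate, []).append(obj)
```

Note that nothing in the triples records which document a statement came from. That is the property the rest of this step leans on.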
All you need to know for this step is that you can get the field values by issuing a query - the rest will attend to itself. Since we no longer have a context node, the id will need to be passed in to our load function.
class order:
    def load(self, store, id):
        self.id = id  # this will be the rdf id, so it is a URI like "order#17" instead of an int
        self.customer = store.query(id, "order.schema#customer", None)
        self.product = store.query(id, "order.schema#product", None)  # a list of products
See how similar this is? Instead of assuming that the interesting elements/attributes are children of the current node, we ask the RDF parser for the children of the current node. This is why RDF can be distributed - the parser hides the fact that some of the data may once have lived in another document.
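If you want to see the distribution point in action, the following is a toy in-memory store, not any particular library's API. The `TripleStore` class, its `add_document` method, and the convention that `None` is a wildcard and `query` always returns a list of matching objects are all assumptions made for this sketch.

```python
class TripleStore:
    """Toy in-memory store; a stand-in for whatever your RDF parser provides."""

    def __init__(self):
        self.triples = []

    def add_document(self, triples):
        # Each parsed document just contributes more triples to the same pool.
        self.triples.extend(triples)

    def query(self, subject, predicate, obj):
        # None acts as a wildcard; return the objects of the matching triples.
        return [o for (s, p, o) in self.triples
                if (subject is None or s == subject)
                and (predicate is None or p == predicate)
                and (obj is None or o == obj)]


store = TripleStore()
# First "document": the original order.
store.add_document([("order#17", "order.schema#customer", "customer#4"),
                    ("order#17", "order.schema#product", "product#9")])
# Second "document": adds another product to the same order.
store.add_document([("order#17", "order.schema#product", "product#12")])

products = store.query("order#17", "order.schema#product", None)
```

The caller asks one question and gets both products back, without knowing or caring that they arrived from two separate documents.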
Now, I have deliberately left calling the object constructors to the store.query() method. Most RDF stores can use schema or type information and do the right thing. However, some stores cannot create objects and only return the rdf:ID of the children we are interested in.
In this case, the code becomes:
class order:
    def load(self, store, id):
        self.id = id  # this will be the rdf id, so it is a URI like "order#17" instead of an int
        self.customer = customer(store.query(id, "order.schema#customer", None))
        self.product = [product(p) for p in store.query(id, "order.schema#product", None)]  # a list of products
Which is even more like our original sample.
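For anyone who wants to run this end to end, here is a self-contained sketch of the ids-only case. The `TripleStore` below is a hypothetical stand-in, not a real library: it assumes `query` always returns a list of ids, so the single-valued customer field takes the first match. The `customer` and `product` classes are bare placeholders that only remember their id.

```python
class TripleStore:
    """Hypothetical ids-only store: query() returns a list of object ids."""

    def __init__(self, triples):
        self.triples = list(triples)

    def query(self, subject, predicate, obj):
        # None is treated as a wildcard.
        return [o for (s, p, o) in self.triples
                if (subject is None or s == subject)
                and (predicate is None or p == predicate)
                and (obj is None or o == obj)]


class customer:
    def __init__(self, id):
        self.id = id  # placeholder: a real class would load its own fields


class product:
    def __init__(self, id):
        self.id = id  # placeholder, as above


class order:
    def load(self, store, id):
        self.id = id  # the rdf id, a URI like "order#17"
        # This toy store always returns a list, so take the first customer.
        self.customer = customer(store.query(id, "order.schema#customer", None)[0])
        self.product = [product(p)
                        for p in store.query(id, "order.schema#product", None)]


store = TripleStore([
    ("order#17", "order.schema#customer", "customer#4"),
    ("order#17", "order.schema#product", "product#9"),
    ("order#17", "order.schema#product", "product#12"),
])

o = order()
o.load(store, "order#17")
```

In real code each constructor would presumably take the store as well and call its own load, so that the whole object graph is built by following ids.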
One final note - some RDF parsers may require your document’s root element to be rdf:RDF. Live with it or find a more liberal parser.
In part four we’ll get into that type information deferred from this step, show a helper class to create objects of the correct type, and start looking at the more interesting things we can do now that we are RDF-enabled.