Visions of Aestia

17 Jan 2005

Refactoring to RDF, step 3

Filed under: Programming, PlanetRDF — JBowtie @ 2:35 pm

Step 3 of our refactoring is moving from an XML parser to an RDF parser. This can get a bit tricky, so we’re going to make a few assumptions to avoid outlining every possible scenario.

Let’s assume that your load/save logic is reasonably isolated, and looks something like this:

class order:
    def load(self, node):
        self.id = int(node.id)
        self.customer = customer(node.customer)
        self.product = [product(p) for p in node.product] # a list of products

Here we’re relying on something vaguely DOM-ish to traverse down the XML tree and create objects that correspond to the various elements and attributes we encounter. Type inference could easily happen if some sort of schema description existed.

Now, RDF parsers don’t use the DOM as such, because they may be pulling together information from multiple documents. Instead, most parsers use a triple store as their interface.

Conceptually, this is quite simple. Each object we’re going to create has an rdf:ID, and it may be serialized across multiple documents (using rdf:about to add to the original XML). The fields we are serializing are either references to other objects (rdf:resource links) or simple types (element/attribute values).

All you need to know for this step is that you can get the field values by issuing a query; the rest will take care of itself. Since we no longer have a context node, the id will need to be passed in to our load function.

class order:
    def load(self, store, id):
        self.id = id  # this will be the rdf id, so it is a URI like "order#17" instead of an int
        self.customer = store.query(id, "order.schema#customer", None)
        self.product = store.query(id, "order.schema#product", None)  # a list of products

See how similar this is? Instead of assuming that the interesting elements/attributes are children of the current node, we ask the RDF store for the children of the current node. This is why RDF can be distributed - the store hides the fact that some of the data may once have lived in another document.

Now, I have deliberately left calling the object constructors to the store.query() method. Most RDF stores can use schema or type information and do the right thing. However, some stores cannot create objects and return only the rdf:IDs of the children we are interested in.

In this case, the code becomes:

class order:
    def load(self, store, id):
        self.id = id  # this will be the rdf id, so it is a URI like "order#17" instead of an int
        self.customer = customer(store.query(id, "order.schema#customer", None))
        self.product = [product(p) for p in store.query(id, "order.schema#product", None)]  # a list of products

Which is even more like our original sample.
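
To make the query calls above a little more concrete, here is a minimal sketch of a toy in-memory triple store. The TripleStore class and its add() method are hypothetical, invented purely for illustration; a real RDF library will have its own API. The sketch assumes that query() returns the objects of every matching triple and that None acts as a wildcard.

```python
class TripleStore:
    """Hypothetical toy store: holds (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = []

    def add(self, subject, predicate, obj):
        self.triples.append((subject, predicate, obj))

    def query(self, subject, predicate, obj):
        """Return the objects of matching triples; None matches anything."""
        pattern = (subject, predicate, obj)
        return [
            triple[2]
            for triple in self.triples
            if all(want is None or want == got for want, got in zip(pattern, triple))
        ]


class order:
    def load(self, store, id):
        self.id = id
        # This toy store returns plain ids, so we keep them as strings here.
        self.customer = store.query(id, "order.schema#customer", None)
        self.product = store.query(id, "order.schema#product", None)


store = TripleStore()
# Triples that came from the original document...
store.add("order#17", "order.schema#customer", "customer#12")
store.add("order#17", "order.schema#product", "product#45")
# ...and one contributed by a second document via rdf:about.
store.add("order#17", "order.schema#product", "product#46")

o = order()
o.load(store, "order#17")
print(o.customer)  # ['customer#12']
print(o.product)   # ['product#45', 'product#46']
```

Note that the load code never needs to know which document a triple came from; it only sees the merged result.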

One final note - some RDF parsers may require your document’s root element to be rdf:RDF. Live with it or find a more liberal parser.
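
If you want to know up front whether a document will satisfy a strict parser, a quick check with Python's standard xml.etree.ElementTree module might look like the sketch below. The has_rdf_root() helper is a made-up name for illustration; the only thing it assumes is the standard RDF namespace URI.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def has_rdf_root(xml_text):
    """Return True if the document's root element is rdf:RDF."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced tags as '{namespace}localname'.
    return root.tag == "{%s}RDF" % RDF_NS


strict = '<rdf:RDF xmlns:rdf="%s"/>' % RDF_NS
liberal = '<root xmlns:rdf="%s"/>' % RDF_NS

print(has_rdf_root(strict))   # True
print(has_rdf_root(liberal))  # False
```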

In part four we’ll get into that type information deferred from this step, show a helper class to create objects of the correct type, and start looking at the more interesting things we can do now that we are RDF-enabled.


Refactoring to RDF, step 2

Filed under: Programming, PlanetRDF — JBowtie @ 11:26 am

You might want to review Part 1 before proceeding.

Our document fragment last looked like this:

<customer rdf:ID="customer#12">
  <name>John Smith</name>
</customer>
<order rdf:ID="order#17">
  <customer rdf:resource="customer#12" />
  <product rdf:resource="product#45" />
</order>

Remember, this is a fragment. It assumes there is a root element declaring the RDF namespace.

Step 2 of our refactoring is to add RDF-style namespaces. Technically we do not need to do this, and there are plenty of existing RDF namespaces you might want to use, including FOAF, RSS, and Dublin Core.

An RDF-style namespace is just like any other XML namespace, except by convention it ends with a hash (#) sign or slash (/).

<o:customer rdf:ID="customer#12" xmlns:o="http://example.org/order.schema#">

Just in case it hasn’t clicked, the reason for this convention is to allow us to write RDF that describes the elements. When we combine the namespace and local name, we get an RDF ID - now we can write our schema in RDF!

<rdfs:Class rdf:ID="http://example.org/order.schema#customer"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
</rdfs:Class>

You’ll note both styles of namespace here - the RDF Schema namespace ends with a hash sign, while the FOAF namespace (http://xmlns.com/foaf/0.1/) ends with a slash. This example says that the customer element has type foaf:Person.
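
The "namespace plus local name" rule is easy to see in code. Here is a small sketch using Python's standard xml.etree.ElementTree module, which happens to store each tag as '{namespace}localname'. The element_uri() helper is a hypothetical name, and the sample omits the rdf:ID attribute just to keep the snippet short.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<o:customer xmlns:o="http://example.org/order.schema#">'
    "<o:name>John Smith</o:name>"
    "</o:customer>"
)


def element_uri(element):
    """Join an element's namespace and local name into a single URI."""
    tag = element.tag
    if not tag.startswith("{"):
        return tag  # element is not in any namespace
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


root = ET.fromstring(xml_text)
print(element_uri(root))     # http://example.org/order.schema#customer
print(element_uri(root[0]))  # http://example.org/order.schema#name

# The slash style works the same way.
person = ET.fromstring('<foaf:Person xmlns:foaf="http://xmlns.com/foaf/0.1/"/>')
print(element_uri(person))   # http://xmlns.com/foaf/0.1/Person
```

Because the namespace already ends in a hash or slash, plain concatenation is enough to produce a usable URI.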

Actually specifying and using type information is something we’ll cover in step 3; for now the important thing is that we’ve added a namespace. Any existing serialization code will need to be updated to handle the namespace. Here’s our fleshed-out sample showing the root element.

<root xmlns="http://example.org/order.schema#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <customer rdf:ID="customer#12">
      <name>John Smith</name>
    </customer>
    <order rdf:ID="order#17">
      <customer rdf:resource="customer#12" />
      <product rdf:resource="product#45" />
    </order>
</root>
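
As a rough idea of what "updated to handle the namespace" means for loading code, here is a sketch that reads the sample with Python's standard xml.etree.ElementTree module. The "o" prefix in the NS mapping is an arbitrary choice of mine; the document itself uses a default namespace, so any prefix mapped to the same URI would do.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"o": "http://example.org/order.schema#"}

DOCUMENT = """
<root xmlns="http://example.org/order.schema#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <customer rdf:ID="customer#12">
      <name>John Smith</name>
    </customer>
    <order rdf:ID="order#17">
      <customer rdf:resource="customer#12" />
      <product rdf:resource="product#45" />
    </order>
</root>
""".strip()

root = ET.fromstring(DOCUMENT)

# Element lookups must now be namespace-qualified.
order = root.find("o:order", NS)
customer_name = root.find("o:customer/o:name", NS).text

# Attributes in the RDF namespace are addressed as '{namespace}name'.
order_id = order.get("{%s}ID" % RDF)
customer_ref = order.find("o:customer", NS).get("{%s}resource" % RDF)
product_refs = [p.get("{%s}resource" % RDF) for p in order.findall("o:product", NS)]

print(customer_name)  # John Smith
print(order_id)       # order#17
print(customer_ref)   # customer#12
print(product_refs)   # ['product#45']
```

An unqualified lookup such as root.find("order") would return None here, which is the usual symptom of serialization code that has not yet been updated.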

To recap:

  • Step 0: Make sure your XML parser understands namespaces
  • Step 1: Replace any existing ID values in your XML with RDF-specific IDs. That is, the value becomes a URI and the attribute becomes rdf:ID, rdf:resource, or rdf:about.
  • Step 2: Add RDF-style namespaces. The namespace URIs should end in a hash (#) or slash (/). All elements and attributes should be in an RDF-style namespace.
