Thinking about RDF-lite
In a lot of ways, I think RDF finds itself in the same place SGML was a decade ago. It’s extremely powerful, poorly understood by developers in general, and mostly sees limited or extremely vertical application.
And I put that down to the same factor: it’s a little too difficult to write a consumer application. With SGML you had just a little too much wiggle room in the spec; as a result it was really hard to write a good parser. With RDF/OWL, the open world assumption and the lack of a unique name assumption combine to make it very difficult to write normal business apps.
So, I imagine for a moment that I am writing an RDF-lite spec. What needs to happen?
- Lists need to not suck. Keep your LISP out of my serialization format. Either let me specify that order matters in the schema (allowing my data to be implicitly ordered when serialized) or give the equivalent of xhtml:ol to order it with.
- Formally include provenance and temporal labelling in the model without requiring reification. There’s no reason I can’t have optional who and when parameters that default to “source document” and “now”, respectively.
- Following on from the above, add a unique name assumption to the model by default, and allow me to turn it off in my schemas (and/or override it in my reasoner).
- Add a closed world operator to the model that can be turned on in queries.
There’s probably more, but in my mind this stuff fixes a lot of problems that people run into when building real-world applications. Those who need the full expressive power can always use the full model.
Lists we know are a problem because everybody and his brother keeps “fixing” RSS with proprietary extensions. The truth of the matter is that people find XML’s implicit ordering too convenient not to use.
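To make the complaint concrete, here is a rough sketch of what a three-item list costs in the standard RDF collection vocabulary (rdf:first, rdf:rest, rdf:nil). The function name, the blank node labels and the ex: terms are invented for illustration, and no RDF library is involved.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def collection_triples(subject, predicate, items):
    """Expand an ordered Python list into RDF collection triples.

    Blank nodes are labelled _:b0, _:b1, ... purely for illustration.
    """
    if not items:
        return [(subject, predicate, RDF + "nil")]
    triples = [(subject, predicate, "_:b0")]
    for i, item in enumerate(items):
        node = f"_:b{i}"
        is_last = i == len(items) - 1
        rest = RDF + "nil" if is_last else f"_:b{i + 1}"
        triples.append((node, RDF + "first", item))  # the item itself
        triples.append((node, RDF + "rest", rest))   # pointer to the next cell
    return triples

triples = collection_triples("ex:feed", "ex:items", ["ex:a", "ex:b", "ex:c"])
for t in triples:
    print(t)
```

Three items turn into seven triples of linked-list plumbing, where XML would just give you three child elements in document order.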
Per my previous post on context, it’s pretty clear that we need to track sources for all real applications, so we might as well add it to the model. And time is a big problem because nobody explicitly includes it in their ontologies until they’ve been burned by it. So add that to the model; it’ll be that much less painful when we discover that people can change their names over time.
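A minimal sketch of what optional who and when parameters might look like, assuming a statement is just a triple with two extra fields. The Statement class, the SOURCE_DOCUMENT marker and the value_at helper are all hypothetical names of mine, not part of any RDF spec or library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical marker meaning "whoever published the document this came from".
SOURCE_DOCUMENT = "source-document"

@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    who: str = SOURCE_DOCUMENT  # provenance defaults to the source document
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))  # defaults to "now"

def value_at(statements, subject, predicate, moment):
    """Return the most recent value asserted at or before `moment`, or None."""
    candidates = [
        s for s in statements
        if s.subject == subject and s.predicate == predicate and s.when <= moment
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.when).obj

statements = [
    Statement("ex:person1", "ex:name", "Alice Smith",
              when=datetime(2001, 1, 1, tzinfo=timezone.utc)),
    Statement("ex:person1", "ex:name", "Alice Jones", who="ex:registry",
              when=datetime(2005, 6, 1, tzinfo=timezone.utc)),
]

print(value_at(statements, "ex:person1", "ex:name", datetime(2003, 1, 1, tzinfo=timezone.utc)))
print(value_at(statements, "ex:person1", "ex:name", datetime(2006, 1, 1, tzinfo=timezone.utc)))
```

The name change stops being a contradiction and becomes two statements that hold at different times, without reifying anything.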
The lack of a unique name assumption is really powerful, allowing us to infer all kinds of useful relationships. But it blows big holes in attempts to work with real-world data. What we actually want in most domains is “assumes-true”: presume that names are unique unless explicitly told otherwise. This follows the principle of least surprise, because it is what most of us do in reality.
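Here is a small sketch of the difference, assuming explicit "same" and "different" assertions are stored as sets of name pairs. The same_individual function and its parameters are hypothetical, and transitive closure over the "same" pairs is deliberately left out.

```python
def same_individual(a, b, same_as=frozenset(), different_from=frozenset(),
                    unique_names=True):
    """Return True, False, or None (unknown) for "do a and b name one thing?".

    same_as / different_from hold explicitly asserted pairs as frozensets.
    """
    pair = frozenset((a, b))
    if a == b or pair in same_as:
        return True
    if pair in different_from:
        return False
    # No explicit statement either way: the assumption decides.
    return False if unique_names else None

same_as = {frozenset(("ex:bob", "ex:robert"))}

print(same_individual("ex:bob", "ex:robert", same_as))             # explicitly told they are the same
print(same_individual("ex:bob", "ex:alice"))                       # unique names presumed by default
print(same_individual("ex:bob", "ex:alice", unique_names=False))   # full-RDF style: unknown
```

With the default on, two different names are two different things until somebody says otherwise; with it off, the honest answer to most such questions is "unknown".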
The open world assumption is something you can’t realistically turn off for RDF as a whole; doing so effectively removes one of its biggest strengths. However, there are plenty of domains where reasoning under a closed world assumption can produce material benefits; document validation immediately springs to mind.
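For the validation case, a rough sketch of a per-query closed world switch might look like the following. The violations function, its closed_world flag and the ex: terms are illustrative assumptions of mine, not an existing query-language feature.

```python
def violations(triples, subject, required, closed_world=False):
    """List required predicates that count as missing for `subject`."""
    if not closed_world:
        # Open world: an absent triple may simply be unstated, so nothing is violated.
        return []
    # Closed world: what is not stated is taken to be false.
    present = {p for s, p, _ in triples if s == subject}
    return [p for p in required if p not in present]

doc = [("ex:doc1", "ex:title", "Thinking about RDF-lite")]
required = ["ex:title", "ex:author"]

print(violations(doc, "ex:doc1", required))                     # open world: no complaint
print(violations(doc, "ex:doc1", required, closed_world=True))  # closed world: author is missing
```

The data and the model stay the same; only the query decides whether silence counts as a failure.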