RDF as database
Yesterday at a job interview, someone asked me about O/R mapping tools. It’s not really a problem I’ve given a lot of thought to recently, and I realized why this morning. I promise part 4 of the refactoring series in the next post.
Relational databases are going to be dead in five years. OK, not really dead: everyone has way too much invested in them, and they are too useful for small problems. But the real action will be in RDF-as-database.
As others have pointed out over on Planet RDF, RDF can be seen as a database. You have primary keys (the rdf:ID) and foreign keys (rdf:resource) and lots and lots of relationships. But RDF is far more flexible, automatically distributed, and allows your data to become agile. Add to that multiple serialization formats, the ability to evolve ontologies and schemas, and the fact that you can use AI reasoners to do your data mining and deduce new properties and relationships, and we have a winner.
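To make the key analogy concrete, here is a minimal sketch in plain Python, with no RDF library involved. The `ex:`-style URIs, the property names, and the `objects` helper are all made up for illustration. A subject URI plays the role of a primary key, and an object that is itself a URI plays the role of a foreign key.

```python
# Hypothetical data: every URI and property name here is invented for illustration.
EX = "http://example.org/"

# A graph is just a set of (subject, predicate, object) triples.
triples = {
    (EX + "alice", EX + "name", "Alice"),
    (EX + "alice", EX + "worksFor", EX + "acme"),  # object is another resource
    (EX + "acme", EX + "name", "Acme Corp"),
}

def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

# Follow the "foreign key": alice -> her employer -> the employer's name.
employer = objects(triples, EX + "alice", EX + "worksFor")[0]
employer_name = objects(triples, employer, EX + "name")[0]
print(employer_name)
```

Nothing here needs a schema up front: a new kind of relationship is just one more triple in the set.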
Now, this is a bold prediction, so let’s try and back it up with a little more detail.
One - XML allows for the natural expression of hierarchies and parts explosions; this is one of those things that relational models have trouble with. My little refactoring series hopefully shows that moving from pure XML to RDF is trivial.
As more and more of the world’s data moves towards XML, tools for indexing and querying it are becoming more and more powerful. These same tools can be used for RDF serialized as XML.
Two - Agile development methodologies are permeating languages such as C# and Python. But as our code becomes more flexible and sees higher rates of change, the relational models have difficulty keeping up. Refactoring the database is complicated by the general rigidity of the model, the need to transform data and stored procedures using DDL, and the difficulty in distributing changes.
RDF+OWL, on the other hand, makes it very easy to evolve your data. Data can be added and removed incrementally, the usual formats are text-based, meaning diff and patch and version control systems all play nicely with it, and the metadata becomes just more data.
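As a rough sketch of why text-based triples get along with diff and version control, the snippet below writes two versions of a tiny graph one triple per line and diffs them with Python's standard `difflib`. The line format is a simplified stand-in for N-Triples, not the real syntax, and the `ex:` names and the `serialize` helper are invented for the example.

```python
import difflib

def serialize(graph):
    """One triple per line, sorted so the output is stable across runs."""
    return sorted(f"{s} {p} {o} ." for s, p, o in graph)

# Version 1 of the data (hypothetical names).
v1 = {
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}

# Version 2: add a brand-new property and drop an old fact.
# No ALTER TABLE, no migration script, just different triples.
v2 = set(v1)
v2.add(("ex:alice", "ex:email", '"alice@example.org"'))
v2.discard(("ex:alice", "ex:worksFor", "ex:acme"))

diff = list(difflib.unified_diff(serialize(v1), serialize(v2), "v1", "v2", lineterm=""))

# Keep only the added/removed triples, skipping the "---"/"+++" file headers.
changes = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
for line in changes:
    print(line)
```

Because each line is a complete fact, a schema change shows up in the diff as ordinary added and removed lines.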
Three - Currently, data warehousing and data mining require specialized tools and knowledge. RDF allows general-purpose reasoners to work with data anywhere on the web, extracting new relationships and spotting trends. Thanks to OWL, domain knowledge can now be written down in a form that allows a general-purpose program to make use of it - and that definition can also evolve over time.
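A toy illustration of what "deducing new relationships" means in practice: the sketch below applies two RDFS-style rules (subclass transitivity and type inheritance) until nothing new turns up. It is far simpler than a real OWL reasoner, and the class names and the `infer` function are assumptions made for the example.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def infer(graph):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    closed = set(graph)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # Only pair a triple with a subclass statement about its object.
                if p2 != SUBCLASS or s2 != o:
                    continue
                if p == SUBCLASS:
                    # (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # (x type A), (A subClassOf B) => (x type B)
                    new.add((s, TYPE, o2))
        if new <= closed:
            return closed
        closed |= new

# Hypothetical domain knowledge plus one fact about an individual.
facts = {
    ("ex:Manager", SUBCLASS, "ex:Employee"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:Manager"),
}

derived = infer(facts) - facts
for triple in sorted(derived):
    print(triple)
```

The domain knowledge lives in the data (the two subclass triples), so changing the ontology changes what gets inferred without touching the program.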
Four - SPARQL is close enough to SQL that a lot of developers will be able to transfer their hard-won skills to the new medium. Yes, you need to learn to think in triples, but if you already think in terms of JOINs that’s not a big leap.
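To show how small that leap is, here is a toy triple-pattern matcher in plain Python. It is not a SPARQL engine, and the `match` and `query` helpers and the `ex:` data are invented for this sketch. The point is that a variable shared between two patterns behaves just like a join key.

```python
def match(pattern, triple, binding):
    """Extend a variable binding so pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def query(graph, patterns):
    """Evaluate triple patterns in order; shared variables act as join keys."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Hypothetical data.
graph = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:initech"),
    ("ex:acme", "ex:locatedIn", "Berlin"),
    ("ex:initech", "ex:locatedIn", "Austin"),
}

# Roughly: SELECT ?person WHERE { ?person ex:worksFor ?org .
#                                 ?org ex:locatedIn "Berlin" . }
results = query(graph, [
    ("?person", "ex:worksFor", "?org"),
    ("?org", "ex:locatedIn", "Berlin"),
])
print(results)
```

In SQL terms, `?org` is the column you would put in the ON clause.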
Five - The triple translates nicely into relational space. One to three tables, depending on your favorite representation, a handful of stored procedures and you can use today’s engines. Serialize as XML and you can use tomorrow’s engines. And in a few years high-performance triple-based engines with native SPARQL support will be in wide use.
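Here is a sketch of the simplest of those representations, a single three-column table, using Python's built-in `sqlite3` module. The table layout, the index, and the `ex:partOf` data are assumptions chosen for the example. A two-level parts explosion becomes a self-join on the triples table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table holds every triple; the extra index helps lookups by predicate/object.
conn.execute(
    "CREATE TABLE triples ("
    " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
    " PRIMARY KEY (s, p, o))"
)
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")

# Hypothetical parts data.
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:wheel", "ex:partOf", "ex:bike"),
        ("ex:frame", "ex:partOf", "ex:bike"),
        ("ex:spoke", "ex:partOf", "ex:wheel"),
        ("ex:tire", "ex:partOf", "ex:wheel"),
    ],
)

# Parts of parts of the bike: join the table to itself on object = subject.
rows = conn.execute(
    """
    SELECT t1.s
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'ex:partOf'
      AND t2.p = 'ex:partOf'
      AND t2.o = 'ex:bike'
    ORDER BY t1.s
    """
).fetchall()

sub_parts = [row[0] for row in rows]
print(sub_parts)
```

Each extra hop in the graph costs one more self-join, which is exactly the kind of query a native triple engine would be built to optimize.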
RDF has some natural advantages that make it a contender. But it is OWL and SPARQL that give it real traction, because those allow the data people to transfer their current skills into the new space. And once you have your head around the RDF/OWL combination, you finally understand just enough to actually make effective use of an AI - maybe those will be a widespread reality this time around.