Visions of Aestia

20 Mar 2006

Why we need explicit temporal labelling

Filed under: PlanetRDF — JBowtie @ 11:50 am

Some good feedback on my rdf-lite post from Jeen Broekstra and Seth Ladd.

Jeen says:

Anyway, whether the solution works well in all cases or not (and I’m sure there are other modeling solutions possible for the example I just gave), a tacit assumption in general seems to be that in order to incorporate both provenance and time in the RDF model, we need to extend from triples (subject, predicate, object) to not just quads but quints (subject, predicate, object, source, time) or something like that.

Perhaps I should have been more explicit. The RDF model already provides reification, which lets us make assertions about a statement, and most stores provide a context that can be leveraged for this information. I’m not saying we need to extend the model for these reasons.
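To make the reification point concrete, here is a minimal sketch in plain Python, with triples as tuples rather than a real RDF library. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms are the standard reification vocabulary. The `reify` helper, the `_:stmt1` node name and the `dc:date` value are assumptions made up for illustration.

```python
# The statement we want to say something about.
statement = (":article", "dc:title", "I like Cheeses")

def reify(node, triple):
    """Describe `triple` as a resource `node` using the RDF reification vocabulary."""
    s, p, o = triple
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]

# "_:stmt1" is a hypothetical blank-node label for the reified statement.
triples = reify("_:stmt1", statement)

# An assertion about the statement itself (illustrative value only).
triples.append(("_:stmt1", "dc:date", "2006-03-19"))
```

Under these assumptions, one dated title costs five triples, which hints at why few people bother to do this for every property.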

The reason I asked for temporal labelling is that in the real world, no-one explicitly models time intervals for all properties; yet almost all data actually varies over time.

For example, consider dc:title. Roughly 99.9% of the time, we have a triple like so:

:article dc:title "I like Cheeses"

However, on the web, titles change all the time. I may look at that article tomorrow and see:

:article dc:title "I like Cheese"

In the current model, I would end up with two titles for this article. While technically correct, it is intuitively wrong - and that difference is what holds back RDF for most developers. They expect to see a single title with the updated value.
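The two-titles effect can be reproduced with a toy store. This is only a sketch: it assumes a store is nothing more than a Python set of (subject, predicate, object) tuples, and the `objects` helper is a hypothetical name, not any library's API.

```python
# Hypothetical in-memory triple store: a set of (subject, predicate, object) tuples.
store = set()

store.add((":article", "dc:title", "I like Cheeses"))  # what we saw today
store.add((":article", "dc:title", "I like Cheese"))   # what we see tomorrow

def objects(store, subject, predicate):
    """Return every object asserted for the given subject and predicate, sorted."""
    return sorted(o for s, p, o in store if s == subject and p == predicate)

# Nothing in the model retires the old value, so both titles come back.
titles = objects(store, ":article", "dc:title")
```

Here `titles` holds both strings, because adding a triple never replaces an earlier one.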

In the real world, people do not update their models when data starts changing. They update their document instances to have the new, current values. That’s why developers need version control and that is what all RDF consumers really need to handle.

What I’m suggesting is that we build versioning of statements directly into the model. That we make it easy to say:

:article dc:title "I like Cheeses" [-'19-Mar-2006']
:article dc:title "I like Cheese" ['20-Mar-2006'-]
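One way to picture the labelled statements above is as triples carrying an optional start and end date. The sketch below is plain Python, not a proposal for a concrete syntax or API. `TimedStatement` and `values_at` are hypothetical names, and treating both ends of a range as inclusive is an assumption; the notation above does not settle it.

```python
from datetime import date
from typing import NamedTuple, Optional

class TimedStatement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    start: Optional[date] = None  # None: no lower bound, as in [-'19-Mar-2006']
    end: Optional[date] = None    # None: no upper bound, as in ['20-Mar-2006'-]

def values_at(statements, subject, predicate, when):
    """Return the objects whose date range covers `when` (bounds assumed inclusive)."""
    return [
        st.obj
        for st in statements
        if st.subject == subject
        and st.predicate == predicate
        and (st.start is None or st.start <= when)
        and (st.end is None or when <= st.end)
    ]

statements = [
    TimedStatement(":article", "dc:title", "I like Cheeses", end=date(2006, 3, 19)),
    TimedStatement(":article", "dc:title", "I like Cheese", start=date(2006, 3, 20)),
]

before = values_at(statements, ":article", "dc:title", date(2006, 3, 19))
after = values_at(statements, ":article", "dc:title", date(2006, 3, 20))
```

With these two statements, a query for any single date returns exactly one title, which is the behaviour most developers expect.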

I know that from a technical point of view this is not needed. I’m saying that, from the point of view of a real-world developer, this kind of addition makes it far simpler to correctly specify the kind of data that consumers need to derive intuitively correct entailments. It’s an artificial constraint on the logical model to make the whole thing more tractable for non-KR people. If you combine this with the suggestions in the temporal RDF paper (PDF), you get better data, more completely specified. You replace the need to explicitly model changes over time (as Seth seems to suggest) and/or manage contextual data with the ability to temporally constrain the scope of an assertion.

Time affects every ontology in unexpected ways. Even the sample wine ontology makes assertions that can change over time; the assertions about what is a French wine change over time because political borders are not static. And the ontology as written doesn’t account for the fact that corrections may take place or mistakes may be made (oops, this year’s batch of chardonnay is actually a mislabeled Riesling). For an RDF-lite I want a super-simple way to fix this, and the simplest way I can think of is to build date ranges into the basic model.
