URIs are essential
At first, I wasn’t sure what point this post from Phil Dawes was trying to make. Was he saying that local identifiers scale?
So, I had a bit of a think on it, and concluded that he’s not making the argument he thinks he’s making.
His assertion that some properties are time dependent does not logically mean that global identifiers don’t scale. It means that he does need to think more about his model; the same would be true in a SQL database or OO program.
Ways to improve his model abound in RDF; let’s look at a few:
- He could design a more specific URI. He alluded to this; it means a URI like id:2006/03/07/philDawes
- He could add a date range to his model to indicate when the value was known to be valid.
- He could actually use the context information available in most RDF stores.
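As a rough illustration of the second and third options, here is a minimal sketch in plain Python (no RDF library). Everything in it is hypothetical: the `Statement` type, the `ex:employer` property, the `doc:` context names and the dates are made up for the example, and a real RDF store would expose context in its own way.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    """One assertion plus where it came from and when it held (all hypothetical)."""
    subject: str
    predicate: str
    value: str
    context: str                       # the document or graph that made the claim
    valid_from: Optional[date] = None  # None means "no known start"
    valid_to: Optional[date] = None    # None means "still valid as far as we know"


def valid_on(stmt: Statement, day: date) -> bool:
    """True if the statement's date range covers the given day."""
    starts_ok = stmt.valid_from is None or stmt.valid_from <= day
    ends_ok = stmt.valid_to is None or day <= stmt.valid_to
    return starts_ok and ends_ok


# Two time-dependent values for the same property of the same global identifier.
STATEMENTS = [
    Statement("id:PhilDawes", "ex:employer", "ex:CompanyA", "doc:cv-2004",
              date(2002, 1, 1), date(2005, 12, 31)),
    Statement("id:PhilDawes", "ex:employer", "ex:CompanyB", "doc:cv-2006",
              date(2006, 1, 1), None),
]


def values_on(subject: str, predicate: str, day: date) -> List[str]:
    """Values of a property that were valid on a given day."""
    return [s.value for s in STATEMENTS
            if s.subject == subject and s.predicate == predicate and valid_on(s, day)]


employer_in_2004 = values_on("id:PhilDawes", "ex:employer", date(2004, 6, 1))
employer_in_2006 = values_on("id:PhilDawes", "ex:employer", date(2006, 3, 7))
```

The identifier stays global throughout; only the statements about it carry the time and origin information.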
Next he makes the point that ambiguity is inevitable. I agree with this, because ambiguity is a feature of the RDF model, and an extremely desirable one that reflects the messy reality of communication.
The fact is that a URI like id:PhilDawes maps unambiguously to a specific individual, rather than any of the thousands of people that share that name (although a real URI based on his email or domain would be a better example URI). Sure, I can say ambiguous things about him, but that’s a function of the vocabulary I choose for properties.
I think the blind men and the elephant is a good example here. The blind men agree they are describing the same creature, but they have different information available and therefore conflicting accounts. If they each publish an RDF document describing the elephant, we have one creature (the elephant’s URI) with multiple values for some properties and presumably divergent properties. Untangling this mess is left to the hapless user.
HOWEVER, the beauty of RDF is that the sighted man can come along and create another RDF document, and use the same URI to describe the elephant, resolving the conflict. He can also look at the existing properties and create a unified, consistent description using a standard ontology, which uses URIs to unambiguously identify the properties, which the OWL developer can use in his inference rules…
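To make the elephant concrete, here is a small sketch in plain Python, again with no RDF library. The `id:elephant` URI, the `ex:` property names and the `doc:` document names are all invented for the illustration; the point is only that a shared subject URI lets the conflicting accounts be grouped, and that keeping the source of each value shows who said what.

```python
from collections import defaultdict

ELEPHANT = "id:elephant"  # hypothetical shared URI for the one creature

# Each blind man publishes his own document about the same subject.
BLIND_MEN = {
    "doc:blind-man-1": [(ELEPHANT, "ex:resembles", "a snake")],
    "doc:blind-man-2": [(ELEPHANT, "ex:resembles", "a tree trunk")],
    "doc:blind-man-3": [(ELEPHANT, "ex:resembles", "a wall")],
}


def merge(documents):
    """Group statements by subject and property, remembering each value's source."""
    merged = defaultdict(lambda: defaultdict(set))
    for source, triples in documents.items():
        for subject, predicate, value in triples:
            merged[subject][predicate].add((value, source))
    return merged


def conflicts(merged, subject):
    """Properties of a subject that carry more than one distinct value."""
    return {
        predicate: values
        for predicate, values in merged[subject].items()
        if len({value for value, _ in values}) > 1
    }


# The sighted man reuses the same URI, so his account lands next to theirs.
published = dict(BLIND_MEN)
published["doc:sighted-man"] = [
    (ELEPHANT, "ex:summary", "one animal with a trunk, legs and a broad flank"),
]

merged = merge(published)
disputed = conflicts(merged, ELEPHANT)
```

Here `disputed` holds only `ex:resembles`, with each of the three values paired with the document it came from, while the sighted man's `ex:summary` sits on the same subject without any mapping step.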
As for Phil’s belief that you can use database primary keys without worrying about global namespace collision… how do you do that again when a record has different ids in different databases? Oh, you prepend a namespace so I know which database I’m talking about? And you map between the IDs by creating a dictionary (see OWL again for functional properties, equivalence, etc)?
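Spelled out, that workaround looks something like the sketch below. The database names (`crm`, `billing`), the key values and the mapping table are hypothetical; the dictionary plays roughly the role an equivalence statement would play in OWL.

```python
def qualify(database: str, key: int) -> str:
    """Prepend a namespace so a bare primary key says which database it belongs to."""
    return f"{database}:{key}"


# Hand-maintained dictionary mapping each qualified local key to one shared
# identifier (hypothetical data for illustration).
SAME_AS = {
    qualify("crm", 42): "id:PhilDawes",
    qualify("billing", 1007): "id:PhilDawes",
}


def canonical(database: str, key: int) -> str:
    """Resolve a local key to the shared identifier, or fall back to the qualified key."""
    local = qualify(database, key)
    return SAME_AS.get(local, local)


# The same bare key means different things in different databases...
collision_avoided = qualify("crm", 42) != qualify("billing", 42)

# ...and different keys can mean the same person once they are mapped.
same_person = canonical("crm", 42) == canonical("billing", 1007)
```

By the time the keys are namespaced and mapped, the result is a global identifier in all but name.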
All that said, Phil is right when he says you need to track context - see my previous post. But that doesn’t lessen the usefulness of the global identifier; if anything, that global identifier makes it far easier to spot issues with data that has come from multiple places.
So to get back on message - Phil makes a very good argument that context is required when working with RDF. Stores that throw away information on the origin of data make it hard to work with data. But he doesn’t make a good case for his contention that global identifiers don’t scale.