Essential RDF context
I always have too many projects on; today is no different. In addition to my semantic wiki prototype, I’m also working on a little app I’m calling Diogenes (for an XTech 2006 presentation I probably won’t be giving, as they seem to have far too few speaker slots).
Diogenes uses assumption-based logic and the GnuPG web of trust to try and determine the truth of RDF assertions. And that ties into today’s topic, because these assertions come from various sources.
Right now, I suspect most reasoners out there pretty blindly accept any given RDF. And there are plenty of places that don’t even process the RDF; they simply aggregate it for further distribution.
Unfortunately, that’s not really useful at all for real applications that are going to get popular. As soon as you move into the distributed world, you have all kinds of liars and hackers giving you mistaken, inconsistent, misleading, and/or outright wrong data. So at minimum we need to track the source of a given statement, if only so we can blacklist them in the future.
So, what do I consider an essential context for statements in an RDF store?
- The source - who made the assertion? Note that a reasoner is its own source for inferred statements.
- Timestamp - when was the assertion made? Because people edit pages, correct data, and sell domains. Here I mean the moment the data was picked up for entry into the store, not the actual authoring date (though, if available, that might be interesting too).
- Trust - well, this is still being researched, but raw info such as whether the data was signed, sent over a secure link, or encrypted can be helpful when trying to figure out why your wine agent ordered spam-pill-of-the-month instead of a nice merlot.
Now, there’s no reason that these values couldn’t be expressed as triples, and in fact I would also expect to be able to export the information as triples for reporting or aggregation. But since we need them for every statement, it makes more sense to make them properties recorded by the store.
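To make that concrete, here is a rough sketch of a store that records the three context values next to each statement and can still export them as triples. It is an illustration only, not how Diogenes is built: the `Context` and `ContextStore` names, the `ctx:` property names, and the `_:stmtN` labels are all made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Context:
    """Per-statement context (hypothetical field names)."""
    source: str                # who made the assertion
    acquired: datetime         # when the store picked the data up
    signed: bool = False       # raw trust info: was the data signed?
    secure_link: bool = False  # raw trust info: fetched over a secure link?
    encrypted: bool = False    # raw trust info: was it encrypted?


@dataclass
class ContextStore:
    """A toy store that keeps a Context alongside every triple."""
    rows: list = field(default_factory=list)

    def add(self, s, p, o, context):
        self.rows.append(((s, p, o), context))

    def export_context(self):
        """Yield the recorded context as plain triples about each statement."""
        for i, (_, ctx) in enumerate(self.rows):
            stmt = f"_:stmt{i}"  # made-up label standing in for the statement
            yield (stmt, "ctx:source", ctx.source)
            yield (stmt, "ctx:acquired", ctx.acquired.isoformat())
            yield (stmt, "ctx:signed", ctx.signed)


store = ContextStore()
store.add(
    "ex:alice", "foaf:knows", "ex:bob",
    Context(
        source="http://example.org/foaf.rdf",
        acquired=datetime(2006, 3, 1, tzinfo=timezone.utc),
        signed=True,
    ),
)
exported = list(store.export_context())
```

The point of the design is that callers cannot add a statement without supplying its context, while reporting tools still see ordinary triples.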
The reason I call them “essential” is that I think any RDF application needs to record these values. Once data is in the store and you start reasoning with it, these values will be the only reliable way to correct issues without dumping and recreating the whole store.
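As a sketch of what that kind of correction might look like, suppose each row in the store is a `(triple, source, acquired)` tuple. The `drop_source` helper, the row layout, and the example URLs below are assumptions for illustration; a real store would also have to retract whatever the reasoner inferred from the dropped statements.

```python
from datetime import datetime, timezone


def drop_source(rows, source, since=None):
    """Return rows minus those from `source`.

    If `since` is given, only rows picked up at or after that moment are
    dropped (say, the day a domain changed hands).
    """
    kept = []
    for triple, src, acquired in rows:
        bad = src == source and (since is None or acquired >= since)
        if not bad:
            kept.append((triple, src, acquired))
    return kept


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


rows = [
    (("ex:merlot", "ex:price", "12"), "http://wine.example/", utc(2006, 1, 5)),
    (("ex:merlot", "ex:price", "0.01"), "http://wine.example/", utc(2006, 3, 1)),
    (("ex:merlot", "rdf:type", "ex:Wine"), "http://other.example/", utc(2006, 2, 1)),
]

# Keep the old data from the wine site, drop what arrived after it went bad.
cleaned = drop_source(rows, "http://wine.example/", since=utc(2006, 2, 15))
```

Without the source and timestamp on every row, the only safe fix here would be to throw everything away and reload.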
Finally, any advanced reasoners will want to include “truth value” as part of the context. Values Diogenes assigns are ‘unknown’, ‘true’, ‘false’, ‘assumes-true’, and ‘assumes-false’. A fuzzy-logic reasoner would probably assign values between 0 and 1. Keeping false and assumes-false data in the store can be handy for certain kinds of constraints and proofs. More research needed.
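For illustration, the five values could be modelled as an enumeration. Only the value strings come from the list above; the `Truth` class, the `is_assumption` helper, and the sample statements are hypothetical and do not reflect how Diogenes actually stores them.

```python
from enum import Enum


class Truth(Enum):
    UNKNOWN = "unknown"
    TRUE = "true"
    FALSE = "false"
    ASSUMES_TRUE = "assumes-true"
    ASSUMES_FALSE = "assumes-false"


def is_assumption(value):
    """True when the reasoner only assumed the value rather than proved it."""
    return value in (Truth.ASSUMES_TRUE, Truth.ASSUMES_FALSE)


# False and assumed-false statements stay in the store; a query simply
# filters on the truth value instead of deleting them.
statements = [
    (("ex:bob", "ex:sells", "ex:Merlot"), Truth.TRUE),
    (("ex:bob", "ex:sells", "ex:Pills"), Truth.ASSUMES_FALSE),
    (("ex:bob", "ex:age", "42"), Truth.UNKNOWN),
]
believed = [t for t, v in statements if v in (Truth.TRUE, Truth.ASSUMES_TRUE)]
retractable = [t for t, v in statements if is_assumption(v)]
```

Keeping the assumed values distinct from the proven ones means an assumption can be withdrawn later without touching anything that was actually established.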
March 8th, 2006 at 7:36 pm
[…] Essential RDF context is now required reading for anyone building the web of trust. Sometimes I feel the issue of who created the assertion is quietly swept under the rug. That information is extremely important, for I would want to tell my reasoner to trust statements about diseases from the Department of Health over Joe Bob any day. […]
March 17th, 2006 at 1:48 pm
[…] Per my previous post on context, it’s pretty clear that we need to track sources for all real applications, so we might as well add it to the model. And time is a big problem because nobody explicitly includes it in their ontologies until they’ve been burned by it. So add that to the model; it’ll be that much less painful when we discover that people can change their names over time. […]