RDF vs control
Lately I’ve been struggling with a bit of a dilemma. I’m not sure there is an answer or even a consensus that it’s a problem, but I’ve been thinking about it nonetheless.
If we’re serious about bringing about the Semantic Web, there are a couple of problems that we will have to contend with, some obvious, some less so.
The first issue we will face is that liars, scam artists, advertisers, and zealots of various persuasions are going to start contaminating our machine-readable data. In other words, we need to find a reasonable, easy-to-implement solution to the trust problem before we are drowning in a sea of useless data.
Currently, people who filter their RDF (if they do so at all) use blacklists, whitelists, or spam-processing code. But as the amount of machine-readable data reaches epic proportions, all of these mechanisms start to break down. We need to well and truly distribute the work and build the processing in at the parser level, or we will never get a handle on it. I mean, what good are software agents going to be if you ask them to restock the wine cabinet and they order herbal supplements?
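To make the parser-level idea a little more concrete, here is a minimal sketch in Python. It is not any real RDF library's API: the `Statement` type, the `filter_by_source` function, and the example URLs are all hypothetical, and it assumes each triple arrives tagged with the URL of the document it was parsed from.

```python
from typing import Iterable, Iterator, NamedTuple, Set


class Statement(NamedTuple):
    """A triple plus its provenance (hypothetical type, for illustration only)."""
    subject: str
    predicate: str
    obj: str
    source: str  # URL of the document the triple was parsed from


def filter_by_source(
    statements: Iterable[Statement],
    trusted: Set[str],
    blocked: Set[str],
) -> Iterator[Statement]:
    """Yield only the statements whose source passes the lists.

    A blocked source is always dropped. If the trusted set is non-empty,
    it acts as a whitelist and anything not on it is dropped as well.
    """
    for st in statements:
        if st.source in blocked:
            continue
        if trusted and st.source not in trusted:
            continue
        yield st
```

Because it is a generator, a filter like this could sit directly between the parser and the triplestore, so rejected statements never get stored. It also shows the weakness: somebody still has to maintain the lists.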
Even assuming we can eliminate spam, there are other, more subtle problems that creep in. People will lie in their FOAF files (or even serve them up selectively) to attract potential dates or deflect attention. RDF feeds will end up carrying propaganda or advertisements. Wikipedia-type wars will rage (where two sides make contradictory assertions). Triplestores will fill up with inconsistent, misattributed data.
There’s also the issue of sensitive data. Personal information may be serialized into the wrong files. If your bot wrongly sucks up my tax ID number, how do I ask it to forget it or not disclose it? And if I can make that request, what keeps me from asking it to forget or prevent disclosure of public information, like a Senator’s voting record?
Secrecy and privacy are already under serious threat due to data aggregation. What happens when an autonomous software agent discloses information under court seal? What happens when a computer intelligence is able to infer the identity of a protected witness or victim?
As long as “real” AI is still 20 years off, we can defer (and have deferred) thinking about these issues. But once we have powerful and reasonably autonomous reasoners harvesting triples and drawing conclusions, the data becomes a black box. We no longer keep track of where the data comes from or how connections are made, and we no longer get involved in weighting or filtering information. Instead, we start relying on the computer to do it for us - in fact, we need the machine to do the filtering because otherwise we end up completely overwhelmed by the mountains of data.
I can’t manually process the amount of spam or spam comments I get anymore. I get so much e-mail that I don’t even have time to sort it by hand; if I did, I’d never read it - as it is, I only cope by scanning the subject lines in the pre-sorted folders. I need a general purpose AI available to me in the next 10 years, because I am barely keeping up with the things I care about as it is. I need people to start publishing machine-readable metadata, or they will become invisible to me. I need planet aggregators and categorized posts. Like most democratic citizens, I need information about various candidates summarized and their positions analyzed, because I don’t have the leisure time to sift through the raw material and can no longer count on the media to do so reliably.
But that circles back to my original issues. How do I know which sources of information to trust? How do I track trustworthiness over time? How do I verify information? How do I detect and weed out mistakes and falsehoods? How do I know when to throw out my assumptions? How do I find bugs in a reasoning engine, and what do I do when multiple reasoners disagree?
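I don't have answers, but two of these questions at least have an obvious first approximation. The sketch below is Python, and everything in it is an assumption made for illustration: the `TrustLedger` class, the `find_conflicts` function, and the smoothed scoring rule are mine, not an established method. It scores a source by how often its claims were later confirmed or refuted, and it flags subject/predicate pairs where sources give different values (which only indicates a real conflict for predicates that should have a single value).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class SourceRecord:
    confirmed: int = 0
    refuted: int = 0

    @property
    def trust(self) -> float:
        # Smoothed fraction of confirmed claims; a source with no
        # history scores a neutral 0.5 rather than 0 or 1.
        return (self.confirmed + 1) / (self.confirmed + self.refuted + 2)


class TrustLedger:
    """Tracks, per source, how its past claims held up (hypothetical)."""

    def __init__(self) -> None:
        self.records: Dict[str, SourceRecord] = defaultdict(SourceRecord)

    def record(self, source: str, was_correct: bool) -> None:
        rec = self.records[source]
        if was_correct:
            rec.confirmed += 1
        else:
            rec.refuted += 1

    def trust(self, source: str) -> float:
        return self.records[source].trust


Claim = Tuple[str, str, str, str]  # (subject, predicate, value, source)


def find_conflicts(claims: Iterable[Claim]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the (subject, predicate) pairs that have more than one value."""
    values: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for subject, predicate, value, _source in claims:
        values[(subject, predicate)].add(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}
```

Note what this leaves out: something still has to decide whether a claim was "confirmed" or "refuted", and that is exactly the verification question above. The sketch moves the problem rather than solving it.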
Look - these are all the same issues we are struggling with when it comes to people, and it’s silly to think we can solve them definitively anytime soon. Just - before we start relying too heavily on our software, we should build in what safeguards we can. I know one day the computer will know more than me, be able to reason more rigorously than I can, write and maintain programs that I couldn’t touch, and be off having refined conversations with other AIs (probably in some N3-derivative language). When that day comes, I want it to be able to exercise critical thinking and have some thought for humanity’s welfare.