This week in the MetaStore team we've added inferencing using an OWL [1] ontology.
An ontology is a statement of what there is, and how things fit together. E.g.,
* In Ingenta-land, there are Articles, they have Keywords and Authors, they are published in Journals. Every Book must have at least one Author.
* In My-lunch-land, there are sandwiches, chocolate bars, cups of tea. Every sandwich must have exactly two and only two slices of bread.
Inferencing, in this context, means deriving new facts from known facts, using the logical rules in an ontology.
So. If we have a database with the fact that John Steinbeck wrote Sweet Thursday, and an ontology which says that being an author is the inverse of having an author, then a computer can, all on its own, reason that Sweet Thursday was written by John Steinbeck. Super eh! HAL, here we come.
Here's the machine-readable version:
SweetThursday foaf:maker JohnSteinbeck .
JohnSteinbeck foaf:made SweetThursday .
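To make the inverse rule concrete, here is a minimal sketch in plain Python. It is not how Jena does it; the function name `infer_inverses` and the tuple representation of triples are assumptions made purely for illustration.

```python
# Toy reasoner (hypothetical, for illustration only).
# A triple is a (subject, predicate, object) tuple of strings.
INVERSE_OF = {"foaf:made": "foaf:maker", "foaf:maker": "foaf:made"}

def infer_inverses(triples, inverse_of=INVERSE_OF):
    """Return triples implied by the inverse rules that are not already known."""
    known = set(triples)
    inferred = set()
    for s, p, o in known:
        if p in inverse_of:
            candidate = (o, inverse_of[p], s)  # swap subject and object
            if candidate not in known:
                inferred.add(candidate)
    return inferred

facts = [("JohnSteinbeck", "foaf:made", "SweetThursday")]
new_facts = infer_inverses(facts)
print(new_facts)  # {('SweetThursday', 'foaf:maker', 'JohnSteinbeck')}
```

Running it again on the known and inferred facts together yields nothing new, which is the sign that this one rule has been applied as far as it will go.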
At the beginning of this project, we were very excited about OWL. We planned to mine new information out of our scholarly research data set. For example, if Author Bob wrote article A and article B, and Author Bill collaborated with Bob on C and wrote D on his own, perhaps A and D are related. Or was that B and D? My brain hurts... Either way, you get the picture.
The problem was, as usual with our project, scalability. The Jena inferencer choked at 11 million triples... 190 million away from our full load.
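For anyone whose brain also hurts, the Bob-and-Bill example can be worked through mechanically. This is only a sketch of one possible rule ("two articles with no author in common are related if their authors have collaborated elsewhere"); the rule itself and the names `authors_of`, `collaborators` and `related_articles` are assumptions, not something from our ontology.

```python
from itertools import combinations

# Hypothetical data mirroring the example: article -> set of authors.
authors_of = {
    "A": {"Bob"},
    "B": {"Bob"},
    "C": {"Bob", "Bill"},
    "D": {"Bill"},
}

def collaborators(authors_of):
    """Pairs of authors who appear together on at least one article."""
    pairs = set()
    for authors in authors_of.values():
        for a, b in combinations(sorted(authors), 2):
            pairs.add(frozenset((a, b)))
    return pairs

def related_articles(authors_of):
    """Article pairs with no shared author whose authors have collaborated."""
    collab = collaborators(authors_of)
    related = set()
    for x, y in combinations(sorted(authors_of), 2):
        if authors_of[x] & authors_of[y]:
            continue  # already linked directly by a shared author
        if any(frozenset((a, b)) in collab
               for a in authors_of[x] for b in authors_of[y]):
            related.add((x, y))
    return related

print(sorted(related_articles(authors_of)))  # [('A', 'D'), ('B', 'D')]
```

Under this assumed rule the answer is "both": A and D are related, and so are B and D.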
Last week, Priya and I came up with a practical solution to this problem: a two stage approach.
1. Guess what bit of the model you might be interested in, and hold just that bit in memory. (Techie detail: I implemented this using a SPARQL CONSTRUCT query, and stored the resulting triples in a Jena model.)
2. Give that to the Jena Inferencer to chew on, instead of the big fat data set.
Obviously, the success of this approach depends on how good your guess is in 1.
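The shape of the two-stage approach can be sketched without Jena or SPARQL at all. In the toy version below, `select_subset` stands in for the CONSTRUCT query and `infer` stands in for the inferencer; both names, the selection rule (keep any triple that mentions the resource) and the sample data are assumptions for illustration only.

```python
# Hypothetical stand-in for the ontology: one inverse-property rule.
INVERSE_OF = {"foaf:made": "foaf:maker", "foaf:maker": "foaf:made"}

def select_subset(triples, resource):
    """Stage 1: keep only the triples that mention the resource of interest."""
    return [t for t in triples if t[0] == resource or t[2] == resource]

def infer(triples):
    """Stage 2: run the (toy) inferencer over the small subset only."""
    known = set(triples)
    implied = {(o, INVERSE_OF[p], s) for s, p, o in known if p in INVERSE_OF}
    return implied - known

# Illustrative sample data standing in for the big fat data set.
big_store = [
    ("JohnSteinbeck", "foaf:made", "SweetThursday"),
    ("JohnSteinbeck", "foaf:made", "CanneryRow"),
    ("JaneAusten", "foaf:made", "Emma"),
]

subset = select_subset(big_store, "JohnSteinbeck")
print(sorted(infer(subset)))
# [('CanneryRow', 'foaf:maker', 'JohnSteinbeck'),
#  ('SweetThursday', 'foaf:maker', 'JohnSteinbeck')]
```

The inferencer never sees the Jane Austen triple, which is the whole point. It is also the risk: anything the stage 1 guess leaves out can never be inferred from.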
So, last week, I knew that this article was written by J Bhatt; this week, I know that he wrote all these too. Last week, I knew that this article was about bananas. This week, I know that so are all these.
1. Web Ontology Language. (There is some explanation to do with Winnie the Pooh for why it isn't 'WOL'; basically, it just sounded foolish.)