I was at the first SemTechBiz conference in London recently. I'll come on to some of the detail in a moment, but here's a quick snapshot of some of what's going on:
- Neil Wilson from The British Library spoke about releasing the British National Bibliography as Linked Data
- Jeni Tennison from The Stationery Office spoke about publishing government organograms as Linked Data (although I can't say I particularly cared for their visualisations); Amanda Spencer from the UK National Archives spoke about the role of Linked Data in the Government Web Archive; and the National Health Service Innovation Centre spoke about using Linked Data in the analysis of their statistics
- A lot of the examples were actually in relation to business intelligence (BI) or decision support system (DSS) applications, rather than to more obvious web applications. Steve Harris from Garlik spoke about using RDF for their financial risk management application and the increased flexibility it gave them over an RDBMS approach, Olivier de Duve from EarlyTracks spoke about categorising and filtering large quantities of financial services feeds, and Bart van Leeuwen from Netage spoke about using semantic technologies to provide critical information to the Amsterdam Fire Service
- John Domingue of The Open University demonstrated semantic tagging of video content and how this could be used to drive advertising. The Open University team also demonstrated using text snippets to seed a search for related content
- Marie Wallace from IBM talked about her work on sentiment analysis and mining Facebook and Twitter to uncover customer service and marketing data
- Carlo Trugenberger from InfoCodex talked about semantic classification of multilingual web content, with particular reference to his work on mining PubMed data on behalf of pharmaceutical clients.
On to some of the detail, then. Let's start with the presentation given by Madi Solomon from Pearson on how they are using concept extraction tools to auto-categorise their content assets, using mappings from existing taxonomies like DBpedia. As a result of this project they gained a more granular view of their content (down to the level of individual images) and a better understanding of the relationships between content items (for example, understanding what educational content they had that fitted certain subjects and certain school learning objectives for different age groups). This gives them a new platform for creating bundles, enhanced materials and apps, from which new business models could be derived.
The next two presentations that leapt out were from John O'Donovan from the Press Association and Jem Rayfield of the BBC. Jem spoke about how the BBC wanted to create a website for the World Cup with individual pages for each player and team. The problem was that this would place an enormous burden on their content editors, who would have had to manually assign each article to the appropriate subset of the 776 web pages this would create. The solution was to model these concepts in an ontology and store it in an RDF triplestore that automatically published articles to the appropriate pages, an approach that both reduced internal content management overheads and increased site usage, because it reflected the things their audience was really interested in. As you'd expect, this resonated strongly with me, as I routinely see search queries on our clients' sites that correspond precisely to this sort of ontological approach: queries about certain types of industry or economic concept in certain countries, or certain types of structure built in certain types of environment. An approach built around these sorts of real-world concepts is the closest fit to best practice for both user experience and search engine optimisation.
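To make the BBC's idea more concrete, here is a minimal, stdlib-only Python sketch of ontology-driven publishing. It is not the BBC's implementation: an in-memory set of triples stands in for the RDF triplestore, and the `about` and `playsFor` predicates, the player, team and article identifiers, and the `pages_for`/`build_pages` helpers are all hypothetical. What it illustrates is that an editor only tags an article with the concept it is about, and a simple inference rule over the ontology then places it on every page it belongs to.

```python
# Hypothetical mini-ontology plus article tags, as (subject, predicate, object)
# triples. An in-memory set stands in for a real RDF triplestore.
TRIPLES = {
    ("player:Player1", "playsFor", "team:England"),
    ("player:Player2", "playsFor", "team:England"),
    ("player:Player3", "playsFor", "team:Spain"),
    ("article:101", "about", "player:Player1"),
    ("article:102", "about", "team:Spain"),
    ("article:103", "about", "player:Player3"),
}

def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def pages_for(article, triples=TRIPLES):
    """Pages an article belongs on: each concept it is tagged with, plus
    (by a simple inference rule) the team of any player it is about."""
    pages = set()
    for concept in objects(article, "about", triples):
        pages.add(concept)
        pages |= objects(concept, "playsFor", triples)
    return pages

def build_pages(triples=TRIPLES):
    """Invert article -> pages into page -> sorted list of articles."""
    index = {}
    articles = sorted({s for s, p, _ in triples if p == "about"})
    for article in articles:
        for page in pages_for(article, triples):
            index.setdefault(page, []).append(article)
    return index

for page, articles in sorted(build_pages().items()):
    print(page, articles)
```

Here `article:101`, which is only tagged with a player, ends up on that player's page and on the England team page without anyone assigning the team page by hand. In a production system this kind of inference would more likely be expressed as queries against the triplestore than as Python loops.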
My counterpart at Epimorphics, Ian Dickinson, gave a presentation about semantic user experience patterns, most of which chimed very directly with my own thoughts on the matter. Ian rightly noted that semantic approaches can lead to a much more fluid user experience, which needs to be reconciled with the simplicity and coherence of traditional information architecture. His suggestions for doing this included mapping free-form tags onto existing shared vocabularies (Common Tag looks like one way to do this, and it would be nice to see more happening with it).
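As an illustration of mapping free-form tags onto a shared vocabulary, here is a minimal Python sketch. The vocabulary excerpt, the synonym table and the `normalise`/`map_tags` helpers are hypothetical, and the DBpedia-style URIs serve only as examples of shared concept identifiers; a real system would look concepts up in the vocabulary itself rather than hard-code them. Tags that cannot be mapped are returned separately so they can be reviewed rather than silently dropped.

```python
import re

# Hypothetical excerpt of a shared vocabulary: normalised label -> concept URI.
SHARED_VOCAB = {
    "linked data": "http://dbpedia.org/resource/Linked_data",
    "semantic web": "http://dbpedia.org/resource/Semantic_Web",
    "rdf": "http://dbpedia.org/resource/Resource_Description_Framework",
}

# Hypothetical synonyms users type that the vocabulary labels differently.
SYNONYMS = {
    "linkeddata": "linked data",
    "semweb": "semantic web",
    "web 3.0": "semantic web",
}

def normalise(tag: str) -> str:
    """Lower-case, turn hyphens/underscores into spaces, collapse whitespace."""
    tag = re.sub(r"[-_]+", " ", tag.strip().lower())
    return re.sub(r"\s+", " ", tag)

def map_tags(tags):
    """Split free-form tags into (tag -> concept URI) and unmapped leftovers."""
    mapped, unmapped = {}, []
    for tag in tags:
        key = normalise(tag)
        key = SYNONYMS.get(key, key)  # resolve known synonyms first
        uri = SHARED_VOCAB.get(key)
        if uri:
            mapped[tag] = uri
        else:
            unmapped.append(tag)
    return mapped, unmapped

print(map_tags(["Linked-Data", "semweb", "RDF", "cheese"]))
```

Different spellings such as "Linked-Data" and "linkeddata" collapse onto the same concept URI, which is what lets user tags be browsed alongside a conventional information architecture.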
There were a couple of areas I would have liked to hear more about. Foremost amongst these is Facebook's new version of Open Graph, which facilitates what they call 'frictionless sharing' (i.e. the automated recording and mining of a user's total browsing history across any site that supports the protocol). This maps onto a lot of what we've recognised about the semantic web, namely that it's not enough to have detailed knowledge of a site's content without being able to filter it through the lens of a user's preferences and interests. Also, given Facebook's continued strong support for RDFa, it would have been useful to explore in more detail how that is likely to affect Google's and Microsoft's support for the schema.org microdata standard. The only speaker to really engage with this in any detail was Martin Hepp, creator of the GoodRelations ecommerce vocabulary. Martin spoke of how the web can in many respects be regarded as a giant data shredder: companies typically hold their data in structured form, but publishing it on the web essentially means releasing it as unstructured text. He also noted that GoodRelations is in the process of being aligned with the schema.org microdata standard. This probably makes sense in an SEO context, where a lightweight model for surfacing relatively simple concepts is paramount and there is no need to generate either rich ontologies or any form of detailed taxonomy. [Editor's note: Martin got in touch to clarify some of the statements made about GoodRelations, which you can see in the comments box below.]
Ecommerce is certainly an obvious use case here, as are corporate websites. Conversely, sectors like academic publishing, enterprise document management, government data, financial services and news sites will inevitably need a more heavyweight approach, with data being exported to a variety of formats including RDFa and Microdata. For example, Jeni Tennison of The Stationery Office has been looking at options for combining RDFa and Microdata vocabularies so that dual export options can be offered, building on the existing set of mappings between RDFa and Microdata from schema.rdfs.org.
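To show what dual export can mean in practice, here is a minimal Python sketch that renders one record as both RDFa Lite and Microdata markup using the schema.org vocabulary. It is not The Stationery Office's approach or the schema.rdfs.org mappings; the record, the `to_rdfa`/`to_microdata` helpers and the choice of the `Product` type are assumptions for illustration, and it only handles flat records whose keys are already schema.org property names.

```python
from html import escape

SCHEMA_ORG = "http://schema.org/"

def _spans(record, attr):
    # One <span> per property, with the property name in the given attribute.
    return "".join(
        f'<span {attr}="{escape(key)}">{escape(str(value))}</span>'
        for key, value in record.items()
    )

def to_rdfa(record, type_name="Product"):
    """Render a flat record with RDFa Lite attributes (vocab, typeof, property)."""
    return (f'<div vocab="{SCHEMA_ORG}" typeof="{escape(type_name)}">'
            f'{_spans(record, "property")}</div>')

def to_microdata(record, type_name="Product"):
    """Render the same record with Microdata attributes (itemscope, itemtype, itemprop)."""
    return (f'<div itemscope itemtype="{SCHEMA_ORG}{escape(type_name)}">'
            f'{_spans(record, "itemprop")}</div>')

# Hypothetical example record; keys are schema.org Product properties.
product = {"name": "Widget", "description": "Nuts & bolts"}
print(to_rdfa(product))
print(to_microdata(product))
```

Keeping a single structured source and treating RDFa and Microdata purely as output serialisations is what makes offering both formats cheap; records with nested items (offers, reviews and so on) would need a richer mapping than this.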