A few weeks ago, Google, Bing and Yahoo unveiled a new standard for making webpage data more easily interpretable. Their structured data standard certainly offers a significant jump from a context where their engines can match the keywords contained in documents and rank them according to metrics like link and keyword density, to one where the engines are beginning to understand what the documents are actually about.

To illustrate, their standard can be used to encode values like publisher, author, date, type of content (e.g. is it a scholarly article, a podcast, a news article, a map, a review, a video or an image), type of webpage (e.g. is it an image gallery, an about page, a checkout page, a search results page or a contact page), type of webpage element (e.g. header, footer, navigation, advertising), type of organisation, types of place, as well as other things like price offers, reviews, ratings or addresses. You can even mark up things like whether an image is a painting or a photograph. The result of all of this is that the search engines will now be able to tell you who wrote an article, what sort of article it is, when it was published and what it's about before you even leave the search results page.
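To make that concrete, here is a minimal sketch in Python of how a consumer might read such values back out of a page. The sample markup, the names in it (author, publisher, date) and the MicrodataExtractor class are all hypothetical illustrations, and the parser only handles a single, flat item; it is not how any search engine actually processes pages.

```python
from html.parser import HTMLParser

# Hypothetical article markup using microdata attributes
# (itemscope, itemtype, itemprop). All values are invented.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="name">Search engines agree on structured data</h1>
  <span itemprop="author">Jane Example</span>
  <time itemprop="datePublished" datetime="2011-06-20">20 June 2011</time>
  <span itemprop="publisher">Example Press</span>
</div>
"""


class MicrodataExtractor(HTMLParser):
    """Collects the itemtype and itemprop values of one flat item.

    Assumption: no nested items and one text value per property.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None  # property still waiting for its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if tag == "time" and "datetime" in attrs:
            # Prefer the machine-readable date over the display text.
            self.properties[prop] = attrs["datetime"]
        else:
            self._current_prop = prop

    def handle_data(self, data):
        text = data.strip()
        if self._current_prop and text:
            self.properties[self._current_prop] = text
            self._current_prop = None


def extract_microdata(html):
    """Return (item_type, properties) for the sample's single item."""
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    item_type, props = extract_microdata(SAMPLE_HTML)
    print(item_type)
    for name, value in props.items():
        print(f"{name}: {value}")
```

The point of the sketch is that the author, the publication date and the type of content are available as labelled values, rather than having to be guessed from keywords in the text.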

Which is fine as far as it goes. The problem is that it also represents a volte-face from the standards previously used by Google and Yahoo in particular and from the official W3C specification. RDF has long been the W3C's preferred route to marking up semantic elements and RDFa accordingly formed the basis of Google's previous rich snippets (and Facebook's Like button for that matter), but the new standard has instead fast-tracked a draft element of the HTML5 microdata specification into production. The W3C were apparently not consulted about this. This is what the three search engines had to say:

"Focusing on microdata was a pragmatic decision. Supporting multiple syntaxes makes documentation for webmasters more complex and introduces more overhead in terms of defining new formats. Microformats are concise and easy to understand, but they don't offer an open extensibility mechanism and the reuse of the class tag can cause conflicts with website CSS. RDFa is extensible and very expressive, but the substantial complexity of the language has contributed to slower adoption. Microdata is the most recent well-known standard, created along with HTML5. It strikes a balance between extensibility and simplicity, and is most suitable for building the schema.org. Google and Yahoo! have in the past supported both microformats and RDFa for certain schemas and will continue to support these syntaxes for those schemas. We will also be monitoring the web for RDFa and microformats adoption and if they pick up, we will look into supporting these syntaxes."

The counter-argument is that structured data lacks scalability: you can use it to tell search engines relatively simple things, but it is difficult to use it to build rich and detailed taxonomies of the kind the publishers we deal with are increasingly interested in. That is precisely why the W3C held back from incorporating microdata in their released specification for HTML5, and why a debate has now ensued as to whether structured data should be regarded as something that will accelerate the development of the semantic web or retard it. What it does clearly mean is that the use of structured data is going to become critical to search engine optimisation. Beyond that, time will tell.

What do you think of this development? Let us know your thoughts in the comment box below.

Update: Ingenta will be attending two semantic web events in New York and London in September, organised by Mediabistro. They both promise to provide a great overview of the applications of the semantic web for businesses such as publishers, and we’ve included more information below. Hope to see you there.

Semantic Web Media Summit
The Semantic Web is here and revolutionizing the media industry! Join us at Semantic Web Media Summit, September 14 in New York City and learn how the Semantic Web is changing media production, consumption, and monetization. The event gathers semantic technology and media experts including Mike Dunn (Hearst Interactive Media), Rachel Lovinger (Razorfish), Evan Sandhaus (The New York Times Company), and Mike Petit (OpenAmplify) who will share how the Semantic Web works and what it is doing to transform the media business.

Semantic Tech & Business Conference
Semantic Web Technologies are being used today and creating new opportunities to revamp and build your business. Don’t miss your chance to get ahead of the competition! The Semantic Tech and Business Conference will be held in London on 26-27 September 2011 and will take a look into how companies are successfully integrating semantic technologies and linked open data into their business plans. With two tracks over two days, business and technology experts will explain the inner workings of the Semantic Web and how you can take advantage of it in your enterprise and web-based systems.