This week we piece together expert opinion on the latent potential of the semantic web, looking at the barriers to progress and which experts are consistently creating ways to overcome them.

In a recent review, Jack Stilgoe documents Michael Nielsen’s frustrations at what he sees as the inequitable censorship of the semantic web, which still favours internet juggernauts such as Google over information scientists and medical researchers. Meanwhile, Steve Hamby provides a low-down on three technologies (Semantic Technology, Cloud, and Natural Language Processing), and John P. Mello Jr. explains how the semantic web can lay siege to the army of paid posters who, he contends, are ‘poisoning the internet’.

Reinventing Discovery by Michael Nielsen – review (The Guardian)

Nielsen is one of a growing band who believe that there is a mine of untapped knowledge online. He describes the potential of the “semantic web”, built from data rather than words, and explains how information scientists are starting to detect new patterns in this data. This is how Don Swanson, with no medical training, discovered a link between migraines and magnesium. It is how Google is able to track the spread of flu by analysing search terms and to translate our web pages using its vast quantities of linguistic data… “We have an opportunity to change the way knowledge is constructed. But the scientific community, which ought to be in the vanguard, is instead bringing up the rear.” His prescription is pragmatic, more carrot than stick.

Top Three Technologies to Tame the Big Data Beast (The Huffington Post)

The first technology needed to tame Big Data, derived from the “memex” concept, is semantic technology, which loosely implements the concept of associative indexing. Dr. Vannevar Bush is generally considered the godfather of hypertext based on the associative indexing concept. His 1945 article in The Atlantic Monthly, “As We May Think,” was published as one of the first articles addressing Big Data, information overload, or the “growing mountain of research.” … The Semantic Web, paraphrased from a definition by the World Wide Web Consortium (W3C), extends hyperlinked Web pages by adding machine-readable metadata about the Web page, including relationships across Web pages, thus allowing machine agents to process the hyperlinks automatically. The W3C provides a series of standards to implement the Semantic Web, such as Web Ontology Language (OWL), Resource Description Framework (RDF), Rule Interchange Format (RIF), and several others.

Based on two recent IBM articles derived from their CIO Survey, one in three CIOs make decisions based on untrusted data; one in two feel they do not have the data they need to make an informed decision; and 83 percent cite better analytics as a top concern. A recent survey conducted for MarkLogic asserts that 35 percent of respondents believe their unstructured data sources will surpass their structured data sources in size in the next 36 months, while 86 percent of respondents claim that unstructured data is important to their organization. The survey further asserts that only 11 percent of those that consider unstructured data important have an infrastructure that addresses unstructured data.
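
To make the idea of machine-readable metadata from the excerpt above a little more concrete, here is a minimal sketch in plain Python of statements stored as subject-predicate-object triples, the general shape that RDF data takes. It is an illustration only and does not use any W3C tooling: the example.org URLs, the predicate names (`title`, `author`, `derivedFrom`) and the helpers `match` and `related_pages` are all hypothetical.

```python
from collections import namedtuple

# One statement about a page: (subject, predicate, object).
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical pages and predicate names, invented for illustration.
MEMEX = "http://example.org/page/memex"
HYPERTEXT = "http://example.org/page/hypertext"

triples = [
    Triple(MEMEX, "title", "As We May Think"),
    Triple(MEMEX, "author", "Vannevar Bush"),
    Triple(HYPERTEXT, "derivedFrom", MEMEX),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def related_pages(triples, page):
    """Return the sorted subjects of all statements whose object is `page`."""
    return sorted({t.subject for t in match(triples, obj=page)})


if __name__ == "__main__":
    # A "machine agent" can follow typed relationships without reading prose.
    print(match(triples, predicate="author"))
    print(related_pages(triples, MEMEX))
```

Because each link carries a named relationship rather than being a bare hyperlink, a program can ask which pages stand in a given relationship to another page, which is the kind of automatic processing the W3C definition describes.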

Research: Paid Posters Poison the Internet (PC World)

In China, they’re called the Internet Water Army: legions of people paid to flood online hangouts with postings and comments (primarily for marketing purposes). And according to academic researchers, they’re degrading information quality on the Web… Detecting infiltration of a website by paid posters with automated systems that use semantic analysis can be effective, the researchers noted. “The reason why the semantic analysis improves performance is that online paid posters often try to post many comments with some minor edits on each post, leading to similar sentences,” they explained. “This helps the paid posters post many comments and complete their assignments quickly, but also helps our classifier to detect them.”
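
The researchers' observation that paid posters reuse comments with minor edits can be illustrated with a simple near-duplicate check. The sketch below is not the researchers' classifier, and it uses plain word-overlap (Jaccard) similarity as a stand-in for their semantic analysis. The function names, the sample comments and the 0.7 threshold are all assumptions made for illustration.

```python
import re
from itertools import combinations


def tokens(text):
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two comments, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def flag_near_duplicates(comments, threshold=0.7):
    """Return index pairs of comments whose similarity meets the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(comments), 2):
        if jaccard(a, b) >= threshold:
            flagged.append((i, j))
    return flagged


# Invented sample comments: the first two differ by one inserted word.
comments = [
    "This phone is great, I love the battery life",
    "This phone is really great, I love the battery life",
    "Does anyone know when the next update ships?",
]

if __name__ == "__main__":
    print(flag_near_duplicates(comments))
```

On these samples only the first two comments are paired, since they share nine of their ten distinct words. A real detector would need more than this: word overlap misses paraphrases, and genuine users sometimes quote each other.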