By Byron Russell

Long before ‘to google’ became a verb, the indexing and search of web content was somewhat random – you had to know the exact wording of a website’s title to find what you were looking for. One of the first search engines, Archie (1990), was basically a directory: an index built from the downloaded directory listings of public FTP sites.

Jump forward to 2016, and web exploration in the US and Europe is dominated by Google (including Google Scholar), Bing and, to a lesser degree, Yahoo – all basing searches on crawling and indexing, and all performing three basic tasks:

  • Searching the Internet on the basis of keywords
  • Maintaining an index of the words they find, and where they find them
  • Allowing users to look for words or word combinations found in that index
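Together, the second and third tasks amount to building and querying what is usually called an inverted index. The minimal Python sketch below is only an illustration: the crawling step is replaced by a hard-coded dictionary of made-up pages, and the names `PAGES`, `build_index` and `search` are hypothetical. It records which pages each word appears on and answers a multi-word query by intersecting those sets of pages.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for crawled web content (made-up URLs and text).
PAGES = {
    "example.org/a": "Mendelian dominance in pea plants",
    "example.org/b": "Open Access review of Mendelian inheritance",
    "example.org/c": "Dominance hierarchies in primates",
}

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Maintain an index of the words found, and the pages where they were found."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word in the query (an AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "Mendelian dominance")))  # ['example.org/a']
print(sorted(search(index, "Mendelian CC-BY")))      # []
```

The second query finds nothing because none of the sample pages states a licence in its text: an index built purely from words can only match licence terms that happen to appear as words on the page.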

Because searches are centred on words rather than on IP licence type, providers of Open Access content face an immediate problem. Ask any Open Access publisher and they will tell you that one of their biggest challenges, if not the biggest, is discoverability. Here the most popular search engines are only partially helpful.

Let’s take an example. Finding scientific articles on “Mendelian dominance” is easy. Finding out which of those articles are Open Access is a great deal trickier. Run a Google search on “Mendelian dominance open access” and the first two hits are both for one publisher, the OMICS Group. Even then, the results don’t tell you which kind of Open Access licence the content is published under. Change the search to “Mendelian dominance CC-BY” and the first hit takes you to Wikipedia. Google Scholar itself lets you narrow a search by articles or case law, author, publisher, language, date and so on – but not by IP licence type.

If a researcher is specifically looking for Open Access content, as will increasingly be the case, they can of course turn to a directory (Archie again!) such as the Directory of Open Access Journals (DOAJ). But DOAJ is far from exhaustive and not even fully searchable: it lists over 10,700 journal records, of which only 6,800 are searchable at article level.

Even the broader classification of Open Access is unclear. Creative Commons’ definition of “non-commercial” is fuzzy; its website defines it as “[not] primarily intended for or directed toward commercial advantage or private monetary compensation”. Creative Commons’ 2009 report on defining non-commercial use runs to 119 pages on its own. Is an article published under a CC-BY-NC-ND licence as “open” as one published under a CC-BY-NC-SA licence? And how is any search engine to identify, distinguish between and flag these various licences?

So the discoverability challenge for Open Access content lies in the complexity of its own licensing arrangements. There are no fewer than 30 Creative Commons licences (six licence types, each in five versions), plus edge cases, ranging from the most typical, CC-BY, to CC-BY-NC-SA. Add to this the different modes of access (Gold, Green or self-archiving, and delayed or moving-wall Open Access), not to mention hybrid journals (mixing subscription and Open Access content) and flipped journals (converting from subscription to Open Access), and it is little wonder the web’s spiders are in a spin.
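The figure of 30 is simply six licence types multiplied by five versions. As a rough illustration of what a licence-aware indexer would have to tell apart, the Python sketch below enumerates those combinations and decodes each licence code into its conditions. The version list (1.0, 2.0, 2.5, 3.0, 4.0) and the names `all_licences` and `decode` are assumptions made for this sketch, and it deliberately ignores the edge cases mentioned above.

```python
from itertools import product

# The six standard Creative Commons licence types; attribution (BY) is part of all six.
LICENCE_TYPES = ["CC-BY", "CC-BY-SA", "CC-BY-ND", "CC-BY-NC", "CC-BY-NC-SA", "CC-BY-NC-ND"]
# Assumed list of the five licence versions; edge cases are not modelled.
VERSIONS = ["1.0", "2.0", "2.5", "3.0", "4.0"]

# Plain-language meaning of each licence element.
CONDITIONS = {
    "BY": "attribution required",
    "NC": "non-commercial use only",
    "ND": "no derivatives",
    "SA": "share alike",
}

def all_licences() -> list[str]:
    """Enumerate every licence type/version combination, e.g. 'CC-BY-NC-SA 4.0'."""
    return [f"{lic} {ver}" for lic, ver in product(LICENCE_TYPES, VERSIONS)]

def decode(licence: str) -> list[str]:
    """Translate a code such as 'CC-BY-NC-ND 4.0' into its list of conditions."""
    code = licence.split()[0]          # drop the version number
    elements = code.split("-")[1:]     # drop the leading 'CC'
    return [CONDITIONS[e] for e in elements]

licences = all_licences()
print(len(licences))  # 30
print(decode("CC-BY-NC-ND 4.0"))
print(decode("CC-BY-NC-SA 4.0"))
```

The two decoded lists differ only in their last condition (no derivatives versus share alike), a distinction that a plain keyword index would capture only if the licence were spelled out in the page text.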

Open Access journals continue to grow rapidly. According to Ware and Mabe’s STM Report of March 2015, Gold Open Access (including articles published without an article processing charge, or APC) now represents 11–12 per cent of scientific articles, Green at least another 12 per cent, and delayed access perhaps another 5 per cent. Despite this rapid expansion, researchers still have to dig around to find what they are looking for in a fully and freely accessible, reusable form; Open Access has yet to have its own Google moment.

About Byron Russell

As Head of ingentaconnect, Byron provides overall leadership and management of the commercial activities for our publisher-facing CMS product, ingentaconnect, spanning 280+ publisher clients and over 25,000 registered libraries. As a senior manager within the Group Sales and Marketing Division, Byron is responsible for the business development of the ingentaconnect service and for managing its Account Management and Client Support teams in the US and UK. Byron has an extensive background in publishing management, primarily in the education space, and was actively engaged in the business development of Macmillan’s class-leading English Campus and Discover China programmes and CUP’s English360 LMS prior to joining Ingenta.