The latest interviewee in our What is a Publisher Now? series is Gemma Hersh, Policy Director of academic publishing giant Elsevier. Next month Gemma will appear at the ALPSP International Conference in a session called 'Welcoming the Robots', in which she will discuss the currently hot topic of data mining. Gemma was instrumental in Elsevier's recent launch of its new policy on data mining. Ahead of ALPSP we caught up with her to learn more about data mining, its potential importance and what publishers are doing to address it.

1. Let’s imagine we just met. Tell us a bit about you and what you do for Elsevier.

I am Policy Director at Elsevier, which means I am part of the Access & Policy team (formerly known as Universal Access) and look at the company’s corporate policy in areas such as open access, text and data mining, research data, MOOCs and a whole host of other bits and pieces. I look at the policy landscape and assess what policies we need to develop or continue to evolve to ensure we continue to serve our customers effectively.

2. You came to Elsevier after working in government and then at the Publishers’ Association. Why does a company like Elsevier need someone with your skills and expertise?

Elsevier needs people who can build strong relationships with policy makers at a global level, who can understand and engage effectively with this community and with policy makers in universities and libraries and translate what is happening on the external policy front into meaningful business terms. We need to be able to understand different perspectives, and to constantly be thinking about how policies are shaped, how they might need to evolve, and how we can continue to assist the research community. It is really important that we are able to communicate our point of view clearly, and to listen to feedback from others, to come to a position that works for everyone. That’s what I’m here to try and do: listen, communicate, understand and analyze, in support of our wider goal to support researchers and further science.

3. Since you first joined the initiative you’ve been involved in launching Elsevier’s new policy on text mining. Can you give us a précis of what text mining is, and why publishers need a policy on it?

Text mining is a relatively new but niche research tool that helps researchers make sense of vast quantities of information. Text and Data Mining (TDM) can be applied to find new patterns in the data, or to answer specific research questions, or simply to pull out specific words or details very quickly from a massive amount of information. Text mining involves the application of a tool to research content. Before this tool is applied, researchers need access to the material they wish to mine; as a first step, this has to be downloaded in bulk and put into the right format to enable text mining to take place.

Publishers first and foremost want to make sure that they are meeting the needs of researchers. In Elsevier’s case, and thinking specifically about TDM, this means making sure researchers have a great experience whenever they use our platforms, for whatever research purpose they have. So, when a researcher is downloading vast quantities of articles for text mining, we want to make sure that someone else reading that same content doesn’t have their service altered in any way. That’s why we have invested in a special platform specifically for text mining, and have set up all elements of our policy so as to support researchers (more on this later, in response to other questions).

It is also important to bear in mind that the legal landscape for TDM is changing and sometimes unclear. Elsevier has taken the position that text mining should be made available to researchers for non-commercial research purposes and at no extra charge, because we think this is the right thing to do in support of science and research. Because of the vast amounts of copyright material being downloaded and then mined, and the fact that copyright law differs globally, a license-based approach enables us to provide clarity to researchers about how they can make the most out of our research; what they can and cannot do with content; and gives us the confidence that content is being used for bona fide research purposes.

4. Why do researchers want to mine the content that Elsevier publishes?

ScienceDirect offers journal articles and book chapters from nearly 2,500 journals and 26,000 books – 11 million pieces of content. You may also have seen that 61% of our journal impact factors increased between 2012 and 2013 and that many of our journals occupy the top position by impact factor in their subject category. In other words, we have a wealth of high quality content that researchers want to use to further their research. However, we recognize that a lot of researchers want to mine across publishers and that’s why we are proud to be one of the first signatory members of CrossRef Text and Data Mining, which enables researchers to access and mine content from a number of publishers, through a bespoke CrossRef API.

5. Now that they’re able to do it, what might they be able to achieve? Could it lead to some startling new discoveries?

The researchers I’ve spoken to have used TDM to further their research in lots of different ways.  For example, was developed using our ScienceDirect API for TDM.  However, TDM does still seem to be quite a niche activity, with other researchers focusing more on research data and how the access, discovery, sharing and use of raw research data can have a transformative impact on their work.

6. Your solution is API-based. Why did Elsevier choose this solution over the other options?

An API is a common means of delivering vast quantities of content without compromising the stability of a platform for other uses. That means an API is perfect for text and data mining, as TDM requires the downloading of vast quantities of content on which TDM tools will be applied. An API also has other technical features that are useful for developers, which someone far more technically-minded than I could explain better, and which in turn allow us to add some additional features for researchers. For example, we make content delivered through our API available in XML format – the preferred format for TDM – and we provide each user/developer with an API key, so we can offer them one-to-one support if they need it.

7. At the ALPSP conference this year you’re going to talk about a potential mismatch between the rhetoric and reality of text mining. We’re told that demand to be able to mine scholarly content is huge but publishers are actually reporting low uptake. Has that been your experience?

Yes. TDM is a niche activity, but the rhetoric around TDM is both heated and vociferous. Unfortunately TDM is often used as the Trojan horse with which to argue for copyright reform – and by that I mean weakening of the copyright framework.

We want to support the development of this new research tool based on researcher feedback, and our policy was developed following a TDM pilot with the research community. We also continue to solicit feedback and evolve our policy in response to this.  We want to make sure we have tools and services at researchers’ disposal, as and when they want to use them.

8. And do you think the presence of an API will help to raise demand, or is it more about managing the requests you get already?

Our API exists so that we can respond to researcher requests for TDM and deliver the content needed, without compromising the platform stability for other users. The API also helps us provide miners with ‘extras’ such as XML file formats. Whether it will raise demand remains to be seen, but we are less focused on raising demand than on providing whatever tools researchers may need, however many of them choose to use them.

9. Okay, last question. In 15 words or fewer, what is a publisher now?

I can only speak for Elsevier, but we see ourselves as a partner of the research community, developing and providing high quality content, tools and services to support researchers and in support of science.

The ALPSP International Conference takes place from 10-12 September at the Park Inn Hotel and Conference Centre in Heathrow, London. To book tickets and for more information visit the ALPSP website.