This is not shameless plugging, honest. I genuinely am excited about the fact that we can now digitise content, i.e. you give us your hard copy, we give you all-singing, all-dancing, kick-ass active PDFs. (Previously we could only work from PDF or PostScript.) This is such a cool development because it enables us to start putting reeeaaalllly looooonnnnggg backfiles online in cases where the publisher would previously have been stumped by lack of digital content – we have one such customer (not sure if I'm allowed to say who) who is now able to put a 100-year backfile on IngentaConnect. Duuuude.
Now, of course I know that JSTOR have got stuff online that goes back way longer, as have all sorts of others – but let's not throw the oranges in with the apples. To my knowledge, content in archives like JSTOR's is "flat", i.e. it doesn't sing, dance or kick anybody's ass. Which is why I'm so excited that we now Have The Power to add sexy features to ancient content (without resorting to plastic surgery). I can still remember trawling through dusty old indexes when I was at university, desperately wishing that it was easier to trace a relatively little-known artist's correspondence through the volumes of early 20th-century art journals. Instead I had to write out reference lists, return to the OPAC, painstakingly look for each reference, write out its location, order it from the store, retrieve it some time later (do I imagine that it could be days?), and start the process again. It's why I fell in love with reference linking when I started working at CatchWord, and it still informs the passion I have for electronic publishing & associated technologies today.
Anyway. So. It's pretty simple really. We scan the hard copies and run them through Readiris, which uses OCR to extract the text and embed it in the resulting PDF, in a format that can then be processed by our proprietary metadata extraction software (known, oddly, as LemonElf). At that point, Bob's pretty much your uncle – reference activation within PDFs is one of our oldest and most thoroughly honed & refined products; it's one of the key differentiators between us and other e-journal hosts. (I'd like to be able to claim that we were the first to do it, but those pesky physicists beat us to it. Ain't that just the way ...) I'm hoping to get a current publisher customer's permission to open up an article to demonstrate just how cool the digitisation is, both in terms of the quality of the scanned PDF, and the success of the metadata extraction & reference activation. I'll update this post with a link as soon as I can. In the meantime, I feel a celebratory pint or two coming on. Woo.