Contrasting Search Engine Returns and Indexing of the LRA
Posted by gazjjohnson on 6 July, 2012
Repository metrics are uppermost in my mind at the moment, as I’ve co-authored a paper on the subject for Open Repositories 2012. But they’re also on my mind because of some work I’ve been doing with the LRA lately.
A bit of background first. A couple of months ago we upgraded the LRA and moved it to a new server and underlying platform. There have been a few issues (nothing devastating, mind you) that my wonderful techs and I have been working to resolve. One issue that has niggled at me as manager of the service is that the hits recorded via Google Analytics were ~75% down on where they were before the change.
We did discover that some of the pages were missing a bit of code, and fixing that restored some of the recorded traffic, but we’re still more than 40% down on where we have been for the past few years. While I’m still trying to answer the question “Were the readings before abnormally high, or are the readings now abnormally low?”, I’ve been digging around to try to identify where the issue might lie. Certainly, traffic from search engines is the most significantly reduced element.
So today I took the most popular items on the LRA in recent months and ran them through four search engines that regularly point readers to the repository. The publications were as follows:
- Financial Development, Economic Growth and Stock Market Volatility: Evidence from Nigeria and South Africa (Ndako, Umar Bida)
- The propagation of VHF and UHF radio waves over sea paths (Sim, Chow Yen Desmond)
- Social inclusion, the museum and the dynamics of sectoral change (Sandell, Richard)
- Writing up and presenting qualitative research in family planning and reproductive health care (Pitchforth, Emma et al.)
- Facebook, social integration and informal learning at university: ‘It is more for socialising and talking to friends about work than for actually doing work’ (Madge, Clare et al.)
- Pragmatic randomized trial of antenatal intervention to prevent post-natal depression by reducing psychosocial risk factors (Brugha, Traolach S. et al.)
- The challenges of insider research in educational institutions: wielding a double-edged sword and resolving delicate dilemmas (Mercer, Justine)
- An efficient and effective system for interactive student feedback using Google+ to enhance an institutional virtual learning environment (Cann, Alan James)
- The Development of Nurture Groups in Secondary Schools (Colley, David Rodway)
- Mobile technologies and learning (Naismith, Laura et al.)
- An evaluation of forensic DNA profiling techniques currently used in the United Kingdom (Graham, Eleanor Alison May)
- Twitter and Public Reasoning Around Social Contention: The Case of #15ott in Italy (Vicari, Stefania)
There is a good mix of items in the selection above, including some that aren’t available anywhere else. I performed three basic searches for each:
- The full article title
- The first four significant (non-stop) words of the title and first author’s surname
- Author’s name alone
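To make the three query types concrete, here is a minimal Python sketch that builds them for a single item. The function name `build_queries` and the `STOP_WORDS` set are hypothetical: the post doesn’t say which words were treated as stop words, and author names are assumed to be in the “Surname, Forenames” form used in the list above, with any “et al.” already stripped.

```python
import re
from typing import Dict

# Hypothetical stop-word list: the post does not say which words were skipped.
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "from", "in", "is", "it",
              "of", "on", "over", "than", "the", "to", "with"}


def build_queries(title: str, author: str) -> Dict[str, str]:
    """Build the three query strings used for one item.

    Assumes `author` is "Surname, Forenames" (no "et al.").
    """
    surname, _, forenames = author.partition(",")
    surname, forenames = surname.strip(), forenames.strip()

    # Split the title into words, dropping punctuation such as commas and colons.
    words = re.findall(r"[A-Za-z0-9#+']+", title)
    # Keep the first four words that are not stop words.
    significant = [w for w in words if w.lower() not in STOP_WORDS][:4]

    return {
        "full_title": title,
        "title_words_and_surname": " ".join(significant + [surname]),
        "author_name": f"{forenames} {surname}".strip(),
    }


queries = build_queries(
    "Social inclusion, the museum and the dynamics of sectoral change",
    "Sandell, Richard",
)
print(queries["title_words_and_surname"])  # Social inclusion museum dynamics Sandell
print(queries["author_name"])              # Richard Sandell
```

Each of the three strings would then be pasted into each search engine in turn, noting where (if at all) the LRA copy appears.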
The results were as below.
Google Scholar groups hits with the same title into a single result, normally pointing to the published version. This means that, unless you open up the other versions in that group, you won’t spot the LRA copy. So, for example, Eleanor Graham’s paper is listed as 1*2: the first hit was this paper, but the LRA link was second in the sublist.
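The m*n notation can be read mechanically. As a minimal sketch (the `parse_rank` name and the cell format are assumptions generalised from the single example above), a results cell splits into the position of the hit and, where Scholar grouped versions together, the LRA link’s position within that sublist. Cells for items that weren’t found at all are not handled here.

```python
from typing import Optional, Tuple


def parse_rank(cell: str) -> Tuple[int, Optional[int]]:
    """Split a cell like "1*2" into (hit position, LRA position in the sublist).

    A plain number such as "3" means the LRA link was the hit itself,
    so the sublist position is None.
    """
    hit, sep, sub = cell.strip().partition("*")
    return int(hit), (int(sub) if sep else None)


print(parse_rank("1*2"))  # (1, 2)
print(parse_rank("3"))    # (3, None)
```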
What have I inferred from this? Well, it seems that for the most part these search engines are still indexing the LRA. Given that these are popular papers, I’d expect to see them returned as highly relevant results. Some particular observations with respect to searching for Open Access publications on the LRA:
- Google: Appears very good for tracking down OA papers with the full title and with a partial title plus author. Terrible, though, for searching for an author’s papers by name alone.
- Google Scholar: Okay for finding OA papers by title or by title plus author name, but not as good as vanilla Google. Also very good at obscuring the availability of an OA version of a paper beneath a publisher link. Surprisingly, though, better than Google at retrieving an author’s papers from their name alone (but given the more focussed collections that Google Scholar aims to search, this is perhaps to be expected).
- Scirus.com: Brilliant at finding OA papers by title and by title plus author name. The best of the four I used at tracking down items by author name alone, too. Without a doubt the best of the bunch (in this rough-and-ready test).
- Bing: Inconsistent, good at times and poor at others in retrieving papers. Worse than both vanilla Google and Scholar, and much worse than Scirus. However, it had some successes in identifying papers with a high relevance ranking by author name alone, at times when the other three search engines couldn’t resolve them.
In conclusion, if you’re looking for open access publications I would use Scirus.com first and foremost, avoid Bing unless you’re hitting a total dead end (or just have an author name), and use the Google family of search engines with care. As for the LRA, it looks like we are indexed by most of these (although I have some questions about how complete Bing’s coverage is).