The complexities of Chemical Information
Posted by katiefraser on 29 June, 2010
In May I visited the London headquarters of the Royal Society of Chemistry at Burlington House to attend an event entitled ‘Chemical Information for the Chemist and Non-Chemist’. As I’m new to the world of Chemical Information (albeit armed with my knowledge of information resources and an A level in Chemistry), I’d been looking out for a session to expand my knowledge, and this seemed perfect. For those interested, the slides are available on the CICAG (Chemical Information and Computer Applications Group) website: just click on ‘previous meetings’. Here, though, I want to talk a little about what I learnt about chemical information in general at the event.
Since I started getting up to speed with Chemical Information resources, I’ve been fascinated by the unique search mechanism of molecular structure. The majority of chemistry-focused databases cross-reference the literature with molecular structures. This means you can draw a molecule and then search for articles referring to it. As David Walsh (whose presentation has informed much of my thinking in this post) noted at the event, the naming of chemicals changes constantly according to fashion, the property of the chemical that a particular scientist wants to emphasise, and commercial concerns (for example, using trade names or local laboratory numbers). Drawing the chemical allows you to bypass a large number of these problems.
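To make the naming problem concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the article IDs, the "Lab compound 17" label and the tiny in-memory list are invented, and the structure is written as a SMILES string (a common text notation for molecules) rather than drawn. Real databases normalise structures before matching them instead of comparing raw strings.

```python
# Toy illustration: one structure, many names.
# The article records below are hypothetical and exist only for this sketch.

ASPIRIN = "CC(=O)Oc1ccccc1C(=O)O"  # SMILES string for aspirin
ETHANOL = "CCO"                    # SMILES string for ethanol

# Each record holds the name the authors happened to use,
# plus the structure the (imaginary) database indexed it under.
ARTICLES = [
    {"id": "A1", "name_used": "aspirin", "structure": ASPIRIN},
    {"id": "A2", "name_used": "acetylsalicylic acid", "structure": ASPIRIN},
    {"id": "A3", "name_used": "2-acetoxybenzoic acid", "structure": ASPIRIN},
    {"id": "A4", "name_used": "Lab compound 17", "structure": ASPIRIN},
    {"id": "A5", "name_used": "ethanol", "structure": ETHANOL},
]


def search_by_name(name):
    """Return IDs of articles whose authors used exactly this name."""
    wanted = name.lower()
    return [a["id"] for a in ARTICLES if a["name_used"].lower() == wanted]


def search_by_structure(structure):
    """Return IDs of articles indexed under this structure."""
    return [a["id"] for a in ARTICLES if a["structure"] == structure]


if __name__ == "__main__":
    print(search_by_name("aspirin"))
    print(search_by_structure(ASPIRIN))
```

In this toy data the name search finds only the one article that uses the word "aspirin", while the structure search finds all four articles about the compound, whatever their authors chose to call it.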
This seems like the perfect search tool! Surely a mechanism allowing such exact searching makes the core of the information professional’s toolkit (define your keywords, perform the search, alter your keywords, perform the search again, iterate until satisfied or exhausted) almost redundant? Well, unfortunately for simplicity, but luckily for making information professionals feel useful, this isn’t the case. A lot of the time there’s reason to search for something that’s either more or less specific than a single molecular structure.
For example, when patents are registered for chemicals they usually use something known as a Markush structure: a molecular diagram which records certain key aspects of a compound, but allows certain points on that structure to be substituted by a variety of different sub-structures. So a lot of the time one exact molecular structure can be too specific. On the other hand, sometimes a molecular diagram is not specific enough. For example, stereochemistry studies the spatial arrangement of atoms within a compound. When compounds with the same atoms and bonds are arranged differently in space, two apparently identical compounds can have different chemical and physical properties.
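Both ends of this scale can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the core pattern, the substituent lists and the helper names `enumerate_markush` and `strip_stereo` are invented for the example, molecules are written as SMILES strings, and real Markush structures (with variable attachment points, repeating units and so on) are far richer than simple text substitution.

```python
from itertools import product

# "Too specific" case: a Markush-style pattern covers a whole family.
# Hypothetical core: a benzene ring with two variable positions, R1 and R2.
CORE = "{R1}c1ccc({R2})cc1"
OPTIONS = {
    "R1": ["C", "CC"],       # methyl or ethyl
    "R2": ["O", "Cl", "F"],  # hydroxyl, chlorine or fluorine
}


def enumerate_markush(core, options):
    """List every concrete structure the pattern covers."""
    keys = list(options)
    combos = product(*(options[k] for k in keys))
    return [core.format(**dict(zip(keys, combo))) for combo in combos]


# "Not specific enough" case: two stereoisomers of but-2-ene.
# In SMILES the '/' and '\' marks record which side of the double bond
# each methyl group sits on.
CIS = "C/C=C\\C"
TRANS = "C/C=C/C"


def strip_stereo(smiles):
    """Drop the stereo marks, leaving only which atoms are bonded to which."""
    return smiles.translate(str.maketrans("", "", "/\\@"))


if __name__ == "__main__":
    for member in enumerate_markush(CORE, OPTIONS):
        print(member)
    print(strip_stereo(CIS), strip_stereo(TRANS))
```

The one pattern expands to six distinct compounds, so a search for any single one of them would miss the rest of the family. Conversely, once the stereo marks are stripped the cis and trans forms collapse to the same string, `CC=CC`, which is why a flat diagram alone cannot tell them apart.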
These different degrees of specificity have interesting implications for the type of keyword generation that needs to happen when searching for chemical information. In a lot of subject areas I’d advise looking to see what’s available in the literature before deciding how specific to be in your search terms: in a little-studied area you tend to go quite wide and gather in a lot of related literature; in a widely studied area you can afford to be quite specific. In chemical information, however, you can define up front whether you’re interested in a wide group of compounds or just a very specific isomer, and use this to inform your search. The downside is that the beautifully simple molecular structure search isn’t always the one you want.
Over the summer I’ll be thinking more about how the different kinds of information used in Chemistry affect the way it can be taught, and learning more about the different kinds of notation that are used. I’d highly recommend looking at David Walsh’s slides from the event, entitled ‘What Makes Chemical Information Different?’, to get a good overview of many of the different types of notation used. However, I think cramming all of these into a one-hour session might make students cry!