Rebecca Whelchel
Chemical Informatics
Project Report
Objectives:
The purpose of our project was to make data from a crystallographic source available on the web and to store the pertinent data in a database that is easy to search.
The Data:
The pages of Groth’s Krystallographie were scanned by IU Bloomington students and posted on the web. The IUPUI students were responsible for creating the database containing information from and about the pages of the book.
Our first task was to find the standard English IUPAC names for the compounds chosen from one of the Krystallographie volumes. We first used German-English chemical dictionaries to translate parts of the chemical names into English. For example, we found that the German “jodo” corresponds to the English “iodo”; this was one of the cases in which we could not simply guess the translation. Fortunately, locant numbers, position letters (o = ortho, p = para, …), and multiplying prefixes (di-, tri-, …) are written the same way in German as in English, so the structure of each molecule was easy to work out. We then obtained the IUPAC names and verified the structures using the Beilstein and SciFinder databases.
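Because locants, position letters, and multiplying prefixes carry over unchanged, the translation step mostly comes down to swapping word fragments. The following is a minimal sketch of that idea, not the procedure we used (we worked by hand with dictionaries). The fragment table and the example input names are hypothetical; only the “jodo” to “iodo” pair comes from our own work.

```python
# Hypothetical German -> English fragment table. Only "jodo" -> "iodo" is
# taken from the report; the other pairs are illustrative additions.
FRAGMENTS = {
    "jodo": "iodo",
    "benzol": "benzene",
    "toluol": "toluene",
}


def translate_name(german_name: str) -> str:
    """Swap known German fragments for English ones.

    Locants (2,4-), position letters (o-, m-, p-) and multiplying
    prefixes (di-, tri-) pass through untouched, since they are written
    the same way in both languages.
    """
    name = german_name.lower()
    for german, english in FRAGMENTS.items():
        name = name.replace(german, english)
    return name


print(translate_name("p-Jodonitrobenzol"))  # p-iodonitrobenzene
print(translate_name("2,4-Dinitrotoluol"))  # 2,4-dinitrotoluene
```

The result is only a rough draft of the English name. It still has to be checked against Beilstein or SciFinder, because plain substitution cannot handle fragments that change word order (acids, for example).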
We found the CAS number for each compound in the same way. Including CAS numbers in the database lets the user quickly identify a compound whose name may be complicated. The CAS number also lets the user search other databases for related information about the compound, such as its physical properties and the publications that describe it.
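CAS numbers end in a check digit, which makes it possible to catch many typing errors before a number goes into the database. The check digit equals the sum of the other digits, each multiplied by its position counted from the right (1, 2, 3, …), taken modulo 10. A minimal sketch, assuming numbers are stored in the usual hyphenated form; the function name `is_valid_cas` is our own, for illustration.

```python
import re

# Hyphenated CAS number: 2-7 digits, 2 digits, 1 check digit (e.g. 71-43-2).
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit matches."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


print(is_valid_cas("71-43-2"))  # benzene: True
print(is_valid_cas("71-43-3"))  # mistyped check digit: False
```

For benzene, 3×1 + 4×2 + 1×3 + 7×4 = 42, and 42 mod 10 = 2, which matches the final digit.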
We also included a SMILES code for each compound. A SMILES code is a compact text representation of a molecular structure, which software can turn back into a drawing of the compound. We used the JME applet presented to us in class to draw the structures of the molecules and obtain their SMILES codes, and the same applet could be used to view a structure by entering a SMILES code from the database. We had originally wanted to design the database so that one could search for structures or substructures by SMILES code. This presents a problem, however, because a single molecule can be written as many different, equally valid SMILES codes. Someone searching for a molecule would have to draw the structure in the same applet we used, obtain its code, and then search for that code in the database. We decided that searching for structures and substructures using SMILES was probably not worth the effort.
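Phenol illustrates the problem. It has several valid SMILES codes, depending on which atom the string starts from. The sketch below uses a hypothetical one-row table and plain string comparison, which is roughly what a text field in a database offers. An exact-match search succeeds only if the user types the same string that was stored. A substring search for the benzene ring also misses phenol when the ring is written with a branch inside it.

```python
# Three valid SMILES codes for the same molecule, phenol.
PHENOL_SMILES = ["Oc1ccccc1", "c1ccccc1O", "c1ccc(O)cc1"]

# Hypothetical table row: phenol stored with whatever code the applet produced.
ROWS = [("phenol", "c1ccc(O)cc1")]


def exact_lookup(smiles: str) -> list[str]:
    """Names whose stored SMILES string is identical to the query."""
    return [name for name, stored in ROWS if stored == smiles]


def text_substructure_lookup(fragment: str) -> list[str]:
    """Names whose stored SMILES string contains the fragment as plain text."""
    return [name for name, stored in ROWS if fragment in stored]


for query in PHENOL_SMILES:
    print(query, exact_lookup(query))
# Only "c1ccc(O)cc1" finds phenol, although all three codes describe it.

# The benzene ring written as "c1ccccc1" is not a substring of "c1ccc(O)cc1".
print(text_substructure_lookup("c1ccccc1"))  # []
```

Cheminformatics toolkits avoid this problem in two ways. They convert every SMILES code to a single canonical form (RDKit's `Chem.MolToSmiles`, for example), and they match substructures on the molecular graph rather than on the text.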
We also provided, for each molecule, the web address of the scanned page on which it appears. Because we had divided the project into jobs of locating different kinds of data for each compound, matching compounds to pages was a tedious exercise. It would have been much easier if there had been a directory page on the Internet listing which molecules were on which pages. Nevertheless, because these page addresses are stored in the database, the user can go directly to the original pages of the book and the crystallographic data found there.
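Once the page addresses are in the database, the directory page we wished for could be generated from it rather than written by hand. A minimal sketch, assuming each record is a (compound name, page URL) pair. The function name and the example.org addresses are placeholders, not the real locations of the scanned pages.

```python
from html import escape


def build_directory(records: list[tuple[str, str]]) -> str:
    """Return an HTML list linking each compound name to its scanned page.

    `records` holds hypothetical (compound name, page URL) pairs taken
    from the database; entries are sorted by name, ignoring case.
    """
    items = [
        f'  <li><a href="{escape(url, quote=True)}">{escape(name)}</a></li>'
        for name, url in sorted(records, key=lambda r: r[0].lower())
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Placeholder URLs for illustration only.
print(build_directory([
    ("nitrobenzene", "https://example.org/groth/page-2.html"),
    ("benzene", "https://example.org/groth/page-1.html"),
]))
```

Escaping the names and URLs keeps characters such as `&` or quotes in a compound name from breaking the generated HTML.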
The Database:
My personal part of the project was creating the database. The Excel spreadsheet we created was easily imported into Access, but the database is of little use unless the user knows how to work with that program. To extract specific information from the database, one must submit a query. The query specifies which field to search and whether to return only exact matches or also entries that contain part of the search term. For example, one could search for a specific compound name, or for all compounds whose “English name” field contains “nitro”. Again, this is only useful to someone who knows how to write a query. Another useful feature of Access is that a query can display only one or a few of the fields associated with a compound: if you do not want to see all the data for benzene, you can request just the name and the CAS number. Results can also be sorted alphabetically or in increasing or decreasing numerical order.
To view the table, click on the table icon and select sheet 1. This table contains all of the data retrieved from the different sources. To view the queries, click on the Queries tab and then select an individual query; the query names indicate what each search was for. Once a query is open, click the toolbar button with a blue triangle on it. This opens the view containing the criteria I selected for the search.
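The kinds of query described above (exact match, partial match, showing only selected fields, and sorting) map directly onto SQL. The sketch below uses Python's built-in SQLite rather than Access. The table and column names are hypothetical stand-ins for our spreadsheet fields, and the rows are illustrative, not our project data. One difference to keep in mind: Access's query designer normally writes a partial match as `Like "*nitro*"`, whereas standard SQL and SQLite use `%` as the wildcard.

```python
import sqlite3

# Illustrative rows only (name, CAS number, SMILES); not the project's data.
ROWS = [
    ("benzene", "71-43-2", "c1ccccc1"),
    ("nitrobenzene", "98-95-3", "O=[N+]([O-])c1ccccc1"),
    ("phenol", "108-95-2", "Oc1ccccc1"),
    ("4-nitrophenol", "100-02-7", "Oc1ccc(cc1)[N+](=O)[O-]"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compounds (english_name TEXT, cas_number TEXT, smiles TEXT)"
)
conn.executemany("INSERT INTO compounds VALUES (?, ?, ?)", ROWS)

# Exact match on one field, showing only the name and CAS number.
exact = conn.execute(
    "SELECT english_name, cas_number FROM compounds WHERE english_name = ?",
    ("benzene",),
).fetchall()
print(exact)  # [('benzene', '71-43-2')]

# Partial match: every name containing "nitro", sorted alphabetically.
partial = conn.execute(
    "SELECT english_name FROM compounds WHERE english_name LIKE ? "
    "ORDER BY english_name",
    ("%nitro%",),
).fetchall()
print([name for (name,) in partial])  # ['4-nitrophenol', 'nitrobenzene']

conn.close()
```

Passing the search term as a `?` parameter, rather than pasting it into the SQL string, keeps a user's input from being interpreted as part of the query.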
I also contributed greatly to the organization and design of the project. Since I was in charge of the database, I largely decided what we should include and what information would be useful for searches. I took a leadership role in assigning the data-collection jobs to the other group members, because we expected that assembling the database could be difficult and I was the only one who had ever used Access. I also helped match compounds to their web pages.
The Conclusion:
Overall, this project could eventually become a very useful resource if the Access database, or another searchable database, were put on the Internet. The website would also be much improved if the scanned pages were easier to reach, perhaps through an online list of page links. Hopefully this information can be made readily available online so that many more people can access the information in Krystallographie.