Project 2 – Explore the ExPASy and KEGG databases.
Instructions
These instructions are going to walk you through finding an assortment of information regarding your protein and using some of the protein manipulation tools available on certain bioinformatics web sites. I encourage you to look around these sites beyond the explicit instructions. There are many useful tools for manipulating, comparing and otherwise probing protein sequences and structures on these sites. You will be able to use some of these sites to help you with other problems on your problem sets, even if I don’t explicitly send you to the Internet to carry out the assignment.
Have handy the following information about your protein before you start: Name, E.C. number (format: #.#.#.#), Swiss-Prot Accession Number (e.g. P#####).
1. Go to the ExPASy web page (http://www.expasy.ch/). ExPASy stands for Expert Protein Analysis System; it is one of the primary sites that annotate and cross-reference the various databases in which protein information is stored. To make the best use of these sites, you need to understand their limitations. ExPASy, for instance, focuses solely on protein sequences, and you will have to go elsewhere (NCBI) if you have nucleotide data.
2. Your first task is to find your protein in ExPASy. You will find it by several routes. The first is by using the name of the enzyme.
3. From the main ExPASy web page, click on Swiss-Prot and TrEMBL.
4. Notice that at the top of this page you have a quick search box that will allow you to find specific pieces of information. On the lower part of your screen, you have several different search options for more advanced and refined searching. Enter the name of your enzyme into the quick search box.
5. Click on the enzyme from the appropriate organism from the subsequent list.
6. You are now at a page called NiceProt for this particular protein. Does the accession number match the one you were expecting? We will come back to this page in a moment if it was the correct enzyme.
7. Click on the back button twice to get back to the quick search window.
8. Enter your accession number into the quick search. It should take you directly to the NiceProt page. Print out a copy of the NiceProt page for future reference. (Note: there is a button at the top labeled PRINTER FRIENDLY VIEW. You may want to press it before you print.) Notice that at the bottom of this page there is a full protein sequence. To the right is a button labeled fasta format. The FASTA format is used by many programs, so remember how to access this file format. Also notice that there are a variety of tools available at the bottom of this page. These are some of the same tools available to you in step 13 below.
9. Click on the button in the menu bar to return to the ExPASy home page.
10. Now, click on the link called ENZYME.
11. You are taken to a page where you have several additional search choices. Here you can enter either the E.C. number (you must use the tab key to go between the four boxes) or the enzyme name. Enter your E.C. number and press Search.
12. A new page called the NiceZyme page appears. Print out a copy of this page. How does the information on this page differ from that on the NiceProt page you looked at a few minutes ago? Why is there no sequence information on this page? There is a section at the bottom called Swiss-Prot. What do you see in this section? Click on one of the hyperlinks in this section. Where does it take you?
13. Return to the ExPASy home page (Use the back button or go to http://www.expasy.ch/). Spend a few minutes exploring some of the proteomics tools accessible from the main ExPASy web site (right column). Look in particular at the sites listed under identification and characterization, primary sequence analysis, and secondary structure prediction. Submit your protein to a few of these analyses by using the Swiss-Prot number and see what you learn about your protein.
14. Next, we will go to the KEGG database (http://www.genome.ad.jp/kegg/kegg2.html), listed under class links as metabolic pathways. You could get there directly from ExPASy using the links from the NiceZyme page.
15. Press Search object in pathway maps.
16. In this box, you can search for a variety of different things. Enter the E.C. number for your protein (enter just the numbers in the format “X.X.X.X”) and press Exec.
17. Take note of the pathway(s) in which your enzyme participates. Depending on the enzyme there may be only one listing or there may be several. Click on the hyperlink to bring up the actual pathway map. We will look at a number of these pathways later in the semester.
18. Your enzyme will be colored red on the pathway. All of the E.C. numbers and intermediary metabolites are hyperlinks. Press on the link to your enzyme. What information is here? We will explore metabolic diseases in a few moments. Look toward the bottom for a line entitled Disease. MIM:#####. This link will take you directly to the database for genetic diseases. Press the back button to return to the map.
19. Now, press on the circle that represents your starting material. A structure of the compound will appear. If this compound is used in several different metabolic reactions, you will be able to see that on the page that appears. Press the back button and return to the metabolic map.
20. You can superimpose a lot of information on the pathway by using the pull-down menu at the top left. For instance, click and hold on the pull-down menu that reads reference pathway. Select Homo sapiens. Then press Exec. Some of the boxes are now shaded. Is your enzyme shaded? Click on the shaded box to see information on this homolog. Clicking on the back button will take you back to the pathway. To what use could you put the knowledge that metabolic pathways between two organisms sometimes differ?
21. Now, to see if there are any genetic diseases associated with your enzyme, go to the OMIM database, the on-line database of Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/Omim/). You may have already found one or more of them in step 18.
22. Click on Search the OMIM database.
23. Enter the name of your enzyme in the search window and press return.
24. The computer should respond with a list of citations to your enzyme. Use this information to answer question f below. Some of the proteins will have very extensive links while others will have very few. For anyone who wants to see how much information there is on a well-studied genetic defect, explore phenylalanine hydroxylase (EC: 1.14.16.1), the enzyme responsible for phenylketonuria (PKU).
25. Take notice of your enzyme throughout the course. You may be called on to tell us something interesting about your enzyme during our discussions.
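If you save the FASTA file from step 8 and want to work with it outside the web tools, the format is simple enough to read with a few lines of code. The following is only a minimal sketch in Python, not something provided by ExPASy: the function names are my own, and the header and sequence in the example are made up for illustration (P00000 is a placeholder, not your protein). It also checks that an E.C. number has the four-field #.#.#.# form used in steps 11 and 16.

```python
import re

def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    A record starts with a line beginning with '>' (the header); the lines
    that follow, up to the next '>', hold the sequence.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Four numeric fields separated by dots, e.g. 1.14.16.1 (assumed format check).
EC_PATTERN = re.compile(r"^\d+\.\d+\.\d+\.\d+$")

def is_ec_number(s):
    """True if s looks like a complete four-field E.C. number."""
    return bool(EC_PATTERN.match(s))

# Hypothetical record: the header and sequence are invented for this example.
example = """>sp|P00000|EXAMPLE_HUMAN Hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header)
    print(len(sequence), "residues")

print(is_ec_number("1.14.16.1"))
```

To use it on your own protein, replace the example text with the contents of the file you downloaded.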
Thought questions to help you reflect on the exercise and analyze the output files:
a) Compare and contrast the information available on the NiceZyme and NiceProt pages and explain when you might want to consult one rather than the other.
b) In three independent digestions, subject your protein to cleavage with trypsin, chymotrypsin (high specificity) and cyanogen bromide. Use the tool PeptideCutter found on the ExPASy home page (ExPASy --> proteomics tools --> Identification and characterization --> peptidecutter). Turn in the cleavage map with your problem set. For one peptide (10 amino acids or longer) draw out the complete structure, calculate its molecular weight and determine its net charge at pH 2, pH 7 and pH 9.
c) What chemical reaction does your enzyme catalyze?
d) Does your enzyme require a cofactor for catalytic activity?
e) In what metabolic pathway does your protein participate?
f) Is there a disease associated with a mutation of your protein? If so, what is it? If there are multiple, describe one. Include information on what that disease is, what the mutation is that causes it and whether the disease is related to altered activity, altered expression levels or improper regulation of the activity.
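For question b, you may want to check your hand calculations. The sketch below, in Python, is an illustration under stated assumptions and does not replace the PeptideCutter output you are asked to turn in. The cleavage rules are simplified (trypsin after K or R, high-specificity chymotrypsin after F, Y or W, neither before P; cyanogen bromide after M), and PeptideCutter applies additional exceptions. The pKa values are approximate and differ between textbooks, so substitute the ones used in this course. The test sequence is made up.

```python
# Average residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added for the free N- and C-termini

# Simplified rules (assumption): (residues cleaved after, blocked by a following P?)
RULES = {
    "trypsin": ("KR", True),
    "chymotrypsin_high": ("FYW", True),
    "cnbr": ("M", False),
}

# Approximate pKa values (assumption: use the ones from your textbook).
PKA_N_TERM, PKA_C_TERM = 8.0, 3.1
PKA_POS = {"K": 10.5, "R": 12.5, "H": 6.0}            # +1 when protonated
PKA_NEG = {"D": 4.1, "E": 4.1, "C": 8.3, "Y": 10.1}   # -1 when deprotonated

def digest(seq, enzyme):
    """Cut seq on the C-terminal side of each target residue."""
    residues, blocked_by_proline = RULES[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in residues and not (blocked_by_proline and next_aa == "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides

def molecular_weight(peptide):
    """Average molecular weight: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def net_charge(peptide, ph):
    """Net charge from the Henderson-Hasselbalch equation for each group."""
    positive = [PKA_N_TERM] + [PKA_POS[aa] for aa in peptide if aa in PKA_POS]
    negative = [PKA_C_TERM] + [PKA_NEG[aa] for aa in peptide if aa in PKA_NEG]
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in positive)
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in negative)
    return charge

# Made-up sequence for illustration only.
sequence = "MKTAYIAKQRPQISFVK"
for enzyme in RULES:
    print(enzyme, digest(sequence, enzyme))

peptide = digest(sequence, "chymotrypsin_high")[1]
print(peptide, round(molecular_weight(peptide), 2))
for ph in (2, 7, 9):
    print("pH", ph, "net charge", round(net_charge(peptide, ph), 2))
```

The fractional charges this prints come from treating every ionizable group independently; when you report a net charge by hand, you will usually round each group to fully protonated or fully deprotonated at the given pH.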