STN commands: You need to know basic STN commands in order to do the searches in I590. Examples can be found at http://www.indiana.edu/~cheminfo/cciim03.html; use the files listed on that page under "Tips for Searching the CA & Registry Files (CAS8721-0894)." Note especially the "Display Scan" option, which lets you view modified answers at no charge. REMEMBER THAT THE STN SYSTEM WE ARE USING COSTS REAL MONEY TO SEARCH, AND THE COST IS NOT COVERED BY YOUR STUDENT TECHNOLOGY FEE. You can build your structures at any time, but the search must be performed during the hours when STN allows the Academic Program passwords to access the LREG and LCA files: http://www.indiana.edu/~cheminfo/31-35.html
One way to gain access to the STN International system is via telephone lines and a modem. Another way to access the STN system is via the Telnet program on the Internet, using the address STNC.CAS.ORG or STN.FIZ-KARLSRUHE.DE. If using STN Express with Discover! or STN on the Web, the connection to STN is performed by the software.
All commercial systems that charge for online searching of their databases require a loginid and a password. For the STN Academic Program via the Internet, the logon sequence via modem or Telnet would be as below. User input is indicated in bold. "(CR)" means hit the "Enter" key.
Logging onto STN's CAS ONLINE Academic Program via Telnet:
telnet stnc.cas.org (CR)
(CR)
Welcome to STN International! Enter x: i (CR) (1)
LOGINID: dummyid (CR)
PASSWORD: ######### (Enter the password and a CR) (2)
TERMINAL (ENTER 1, 2, 3, OR ?): 3 (CR) (3)
* * * * * * * * * Welcome to STN International * * * * * * * * * *
[News messages appear here.]
=> file lreg (CR)
[Searching occurs here.]
=> log y (CR)
(1) The "i" indicates that we are entering with a restricted access Academic Program account, accessible after 5:00 PM on weekdays and certain weekend hours. Users with full access enter "x" at this point.
(2) The LOGINID will appear on the screen, but the password (we hope!) is masked by the #########.
(3) Terminal choices are:
Once in the STN system, the prompt is: =>
The STN Messenger search software assumes that you are a novice searcher if you spell out command words in full. Some commands have single-letter equivalents which, if used, signal Messenger that you do not want to be prompted for any information the system needs to complete your search. In this case, it will DEFAULT to system-defined parameters, that is, to what the computer assumes you want to do in the absence of explicit information to the contrary.
The five basic STN commands, with single letter equivalents in parentheses where appropriate, are:
=> S PARMENTER C?/AU (CR)
See "Ways to Narrow Your Answer Set in the CA File" for CA File examples of the use of the language field, the document type field, and the publication year field.
For the CA File, the Basic Index includes:
For the Registry File, the Basic Index includes:
STN assumes that multi-word phrases are to be searched using the (W) operator in the absence of explicit positional or other Boolean operators.
TRUNCATION is the search technique that allows more than one form of a word to be searched with a single command. In many subject searches, we are looking for topics that involve words built on a common root word, or that have other variations that are easily signaled to a computer by means of a special symbol. Truncation tells the computer to form an answer set consisting of all records that contain words with the characters input for the search, and also records that contain related words with suffixes (or, in some cases, prefixes or variable characters at a given point in the word).
Truncation can occur at the left end or the right end of a word stem or within the word. STN now allows all three types of truncation in the CA File Basic Index. The limit of terms that can be gathered in a set by truncation is 30,000 stems. For left truncation the search term must have at least four characters.
On the STN system, truncation symbols are:
| Symbol | Function | Example |
|---|---|---|
| exclamation point (!) | Exactly one character | cataly!e |
| hash mark (#) | One or no character | alcohol# |
| question mark (?) | Any number of characters | ?therap? |
As noted in the table, the # sign can be used at the end of a word to pick up both singular and plural forms of a word. Another way of accomplishing the same thing on STN using the command language option is to enter SET PLURALS ON at the system prompt. Both left- and right-hand truncations are allowed with the "?". See:
for other examples of truncation in STN.
There are limits to the number of terms that can be gathered into a set using truncation. Therefore, caution must be exercised in using truncation to prevent too many search terms (or unexpected words) from entering the answer set.
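To see exactly which words each truncation symbol would gather, the table above can be modelled with regular expressions. This is a minimal Python sketch under stated assumptions: the helper names and word list are hypothetical, "?" is assumed to match zero or more characters, and the 30,000-term limit and STN's real index handling are not modelled.

```python
import re

# Symbol meanings taken from the truncation table above.
STN_SYMBOLS = {
    "!": ".",   # exactly one character
    "#": ".?",  # one or no character
    "?": ".*",  # any number of characters (assumed to include none)
}


def stn_to_regex(term: str) -> re.Pattern:
    """Translate an STN-style truncated term into an anchored, case-insensitive regex."""
    literal_chars = sum(ch not in STN_SYMBOLS for ch in term)
    if term.startswith("?") and literal_chars < 4:
        # The text states that left truncation needs at least four characters.
        raise ValueError(f"left-truncated term too short: {term!r}")
    body = "".join(STN_SYMBOLS.get(ch, re.escape(ch)) for ch in term)
    return re.compile(f"^{body}$", re.IGNORECASE)


def truncation_matches(term: str, vocabulary: list[str]) -> list[str]:
    """Return the words in `vocabulary` that the truncated term would gather."""
    pattern = stn_to_regex(term)
    return [word for word in vocabulary if pattern.match(word)]


words = ["catalyse", "catalyze", "catalysed", "alcohol", "alcohols",
         "alcoholic", "chemotherapy", "therapeutic", "radiotherapies"]
print(truncation_matches("cataly!e", words))  # ['catalyse', 'catalyze']
print(truncation_matches("alcohol#", words))  # ['alcohol', 'alcohols']
print(truncation_matches("?therap?", words))  # ['chemotherapy', 'therapeutic', 'radiotherapies']
```

In this model "?therap?" also picks up "therapeutic," because "?" may match nothing on the left; this is one way unexpected words can slip into an answer set.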
We have already seen the expand technique profitably used in author searching. It is also a very useful option in subject searching, especially since it allows us to determine whether the search term we are considering is actually used in the system. In addition, keyboarding errors that have gone undetected may be revealed in an expand list. For example, in the STN CA file, the following list appeared when "organomagnesium" was expanded in the Basic Index at the time of the search:
| Set # | # of Answers | Variant Spelling |
|---|---|---|
| E1 | 1 | ORGANOMAGNESIATE/BI |
| E2 | 1 | ORGANOMAGNESIATES/BI |
| E3 | 823 | ORGANOMAGNESIUM/BI |
| E4 | 1 | ORGANOMAGNESIUMALUMINUM/BI |
| E5 | 2 | ORGANOMAGNESIUMOXANE/BI |
| E6 | 59 | ORGANOMAGNESIUMS/BI |
| E7 | 1 | ORGANOMAGNETIC/BI |
| E8 | 1 | ORGANOMAGNETISM/BI |
| E9 | 1 | ORGANOMAGNSIUM/BI |
| E10 | 1 | ORGANOMANGANATE/BI |
| E11 | 1 | ORGANOMANGANATES/BI |
| E12 | 74 | ORGANOMANGANESE/BI |
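Scanning an expand list by eye works for a dozen terms; for longer lists, the same check for likely misspellings can be roughed out with Python's standard-library difflib module. This is only a sketch: the 0.9 similarity cutoff and the "two postings or fewer" rule are arbitrary assumptions, not STN features, and the dictionary is simply the list above retyped.

```python
import difflib

# Terms and posting counts transcribed from the expand list above (E1-E12).
expand_list = {
    "ORGANOMAGNESIATE": 1, "ORGANOMAGNESIATES": 1, "ORGANOMAGNESIUM": 823,
    "ORGANOMAGNESIUMALUMINUM": 1, "ORGANOMAGNESIUMOXANE": 2,
    "ORGANOMAGNESIUMS": 59, "ORGANOMAGNETIC": 1, "ORGANOMAGNETISM": 1,
    "ORGANOMAGNSIUM": 1, "ORGANOMANGANATE": 1, "ORGANOMANGANATES": 1,
    "ORGANOMANGANESE": 74,
}


def likely_typos(target: str, terms: dict[str, int],
                 cutoff: float = 0.9, max_postings: int = 2) -> list[str]:
    """Flag rarely posted terms that look like misspellings of `target`."""
    # Terms that merely extend the target (plurals, compounds) are not typos;
    # this filter also drops the target itself.
    others = [t for t in terms if not t.startswith(target)]
    similar = difflib.get_close_matches(target, others, n=10, cutoff=cutoff)
    return [t for t in similar if terms[t] <= max_postings]


print(likely_typos("ORGANOMAGNESIUM", expand_list))  # ['ORGANOMAGNSIUM']
```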
Note that E9, the one document in the file with the misspelled term "organomagnsium," would probably be missed in a subject search if it were not spotted in the expand list, so the search statement to pull all of the variants into one set in the CA file would be:
For the online CA File on STN, the preferred terms are searched with the field labels "CT" for phrases or "CW" for words. Thus, a searcher interested in parasympathomimetics would find in the printed CA Index Guide that the preferred phrase to search is Cholinergic agonists. The online CA File search using command language would then be:
=> S CHOLINERGIC AGONISTS/CT
In an online search, it is important to include the CAS standard abbreviations and acronyms, since the abbreviations are used in preference to the full terms in the online records and hence in the Basic Index of the CA File.
One can always issue the DISPLAY IND command to see how a particularly relevant document has been indexed and then input relevant indexing terms to broaden or narrow a search. Look especially for abbreviations such as DETN or DEGRDN. These are used in preference to full terms such as determination, degradation, etc. in indexing CA. See CAS Standard Abbreviations and Acronyms. On the STN system, it is now possible to use the command SET ABBREVIATION ON to check automatically whether CAS abbreviations exist for the search terms you input. If so, the system automatically searches those forms. If SET PLURALS ON is also in use, the plural forms of the abbreviations will also be found. Users of SciFinder and SciFinder Scholar do not need to worry about such subtleties because the search algorithm automatically makes allowances for such variants.
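The combined effect of abbreviation and plural searching is roughly that of ORing each term with its variants. The sketch below writes such a statement out explicitly. It illustrates the idea only and is not STN's algorithm; the two-entry abbreviation table, the naive "add S" plural rule, and the function name are all assumptions.

```python
# Hypothetical two-entry table; the real CAS list of standard abbreviations is far longer.
CAS_ABBREVIATIONS = {"DETERMINATION": "DETN", "DEGRADATION": "DEGRDN"}


def expand_term(term: str, abbreviations: bool = True, plurals: bool = True) -> str:
    """Build an OR statement covering a term, its CAS abbreviation, and plural forms."""
    forms = [term.upper()]
    if abbreviations and forms[0] in CAS_ABBREVIATIONS:
        forms.append(CAS_ABBREVIATIONS[forms[0]])
    if plurals:
        # Naive plural: append "S" (an assumption that fails for many words).
        forms += [form + "S" for form in forms]
    return " OR ".join(forms)


print(expand_term("determination"))
# DETERMINATION OR DETN OR DETERMINATIONS OR DETNS
print(expand_term("degradation", plurals=False))
# DEGRADATION OR DEGRDN
```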
Look at a sample record from the CA Student Edition on OCLC, paying particular attention to the index terms and the use of abbreviations.
ROLES are CAS indexing terms assigned to every indexed substance and to controlled index terms for classes of compounds. Roles were first applied to new online CA File records with v. 121 (July 1994) and were then applied retrospectively to all CA File records by means of a computer algorithm. Originally there were 38 specific roles and 7 broad super roles. They substantially expand the indexing terms that were used prior to their introduction, and they give a more precise link to the substance. For example, it is now possible to specify not only that you want the preparation of the substance, but also that the preparation be a synthetic preparation, as opposed to industrial manufacture. In the past, no such distinction was made in the use of the term "Preparation." Nevertheless, it is still possible to search the CA File for all manner of preparations of a substance or a group of substances found in the Registry File by appending "/P" to the Registry File answer set number (or, for a single substance, by appending "P" directly to the Registry Number in a CA File search), e.g.,
=> SEARCH L2/P (where L2 is an answer set from the Registry File)
or
=> SEARCH 494-12-2P (where 494-12-2 is the CAS Registry Number for Flavan)
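Because a mistyped Registry Number may retrieve the wrong substance or nothing at all, it can be worth checking the number before appending the "P". The sketch below verifies the standard CAS check digit (digits weighted 1, 2, 3, ... from the right, summed, taken mod 10) and then builds the two search forms shown above. The helper names are hypothetical, and the code only produces command text; it does not connect to STN.

```python
import re

# A CAS RN has three hyphenated parts: 2-7 digits, 2 digits, and 1 check digit.
CAS_RN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(rn: str) -> bool:
    """Verify the format and check digit of a CAS Registry Number."""
    match = CAS_RN.match(rn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the right; the sum mod 10
    # must equal the check digit.
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def preparation_search(target: str) -> str:
    """Build a CA File preparation search for an L# answer set or a single RN."""
    if target[:1].upper() == "L" and target[1:].isdigit():
        return f"=> SEARCH {target.upper()}/P"  # Registry File answer set: "/" required
    if not is_valid_cas_rn(target):
        raise ValueError(f"not a valid CAS Registry Number: {target!r}")
    return f"=> SEARCH {target}P"  # single RN: "P" appended directly


print(preparation_search("L2"))        # => SEARCH L2/P
print(preparation_search("494-12-2"))  # => SEARCH 494-12-2P
```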
Roles must be attached to an L# answer set formed in the Registry File if used in conjunction with that L# to search the CA File. An example of the use of the role code "SPN" (Synthetic Preparation) is:
=> FILE REGISTRY
=> S FULLERENE/CNS
L2 3287 FULLERENE/CNS
CNS is the chemical name segment field designator on STN.
=> FILE CA
=> S L2/SPN OR FULLERENES/SPN
5347 L2
35422 SPN/RL
206 L2/SPN
(L2 (L) SPN/RL)
1759 FULLERENES/CT
35422 SPN/RL
108 FULLERENES/SPN
(FULLERENES/CT (L) SPN/RL)
L3 248 L2/SPN OR FULLERENES/SPN
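The output above shows STN expanding L2/SPN into (L2 (L) SPN/RL), linking the substance and the role within the same index entry. The toy model below contrasts that linkage with a plain AND, and shows why the final OR set (248) is smaller than 206 + 108 = 314: records retrieved by both halves are counted once. The records, substance labels, and function names are invented; only the set logic is the point.

```python
# Toy CA records: each is a list of index entries (substance or term, set of roles).
# The record numbers, substance labels, and role assignments are invented.
records = {
    1: [("RN-A", {"SPN"})],
    2: [("RN-A", {"RCT"}), ("RN-X", {"SPN"})],  # SPN belongs to a different substance
    3: [("FULLERENES", {"SPN"})],
    4: [("RN-B", {"SPN"}), ("FULLERENES", {"SPN"})],
}
L2 = {"RN-A", "RN-B"}  # stands in for a Registry File answer set


def linked(terms: set[str], role: str) -> set[int]:
    """Like 'terms (L) role': the term and the role must share one index entry."""
    return {rid for rid, entries in records.items()
            if any(t in terms and role in roles for t, roles in entries)}


def anded(terms: set[str], role: str) -> set[int]:
    """Like a plain AND: the term and the role may sit in different entries."""
    return {rid for rid, entries in records.items()
            if any(t in terms for t, _ in entries)
            and any(role in roles for _, roles in entries)}


l2_spn = linked(L2, "SPN")                      # {1, 4}
fullerenes_spn = linked({"FULLERENES"}, "SPN")  # {3, 4}
L3 = l2_spn | fullerenes_spn                    # {1, 3, 4}
print(sorted(anded(L2, "SPN")))                 # [1, 2, 4]: record 2 is a false drop
print(len(L3), "<", len(l2_spn) + len(fullerenes_spn))  # 3 < 4
```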
The Roles can be viewed in an online thesaurus to see the role hierarchies and definitions. They are currently used in the CA and CAplus files and in the CASREACT and MARPAT files.
To ensure that the CAS Role Indicators are in agreement with the current focus and direction of chemistry, the following key changes to new and modified Role Indicators were made in late 2001. New Roles have been added:
The Registry File is the largest single source of chemical names in existence. It can be searched by a trade or common name for a substance (CN), by its CAS Index Name (CN) or by fragments of the CAS Index Name (CNS field). (See: Tips for Chemical Name Searching.) The Basic Index of the Registry File includes both chemical name fragments and molecular formula fragments. It may be necessary to follow certain protocols for special characters in order to search for a chemical name. Greek characters, for example, are spelled out in their entirety with a period before and after the Greek part of the name. Examples of chemical name searches in the Complete Chemical Name Index (/CN) or the Chemical Name Segment Index (/CNS) of the Registry File are:
=> SEARCH ISATIN/CN
=> SEARCH .ALPHA.-METHYLBENZOIN/CN
=> SEARCH ACETYLSALICYLIC ACID/CN
=> SEARCH IMINO/CNS
Since there is a fee to search terms in the Registry File, it is best to check the name by first expanding it in the relevant index. Often, the combination of a molecular formula search and a Chemical Name Segment search is an effective way to retrieve a substance when the molecular formula alone has many isomers.
An example of such a chemical name search in SciFinder Scholar is below. Note that in the SciFinder Scholar system, the search will work with or without the periods around the "alpha," but in STN command-language searching, the dots are mandatory.
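Since the dots are mandatory in command-language searching, a name copied from a source that prints Greek characters has to be converted first. A minimal sketch, assuming the name uses lowercase Greek characters; the letter table is deliberately incomplete and the helper names are hypothetical.

```python
# Greek letters spelled out as STN expects: ".ALPHA.", ".BETA.", and so on.
# Only a few letters are listed here; extend the table as needed.
GREEK = {"α": "ALPHA", "β": "BETA", "γ": "GAMMA", "δ": "DELTA", "ω": "OMEGA"}


def to_stn_name(name: str) -> str:
    """Spell out Greek letters between periods and upper-case everything else."""
    return "".join(f".{GREEK[ch]}." if ch in GREEK else ch.upper() for ch in name)


def name_search(name: str, field: str = "CN") -> str:
    """Build a Registry File name search in the /CN (or /CNS) field."""
    return f"=> SEARCH {to_stn_name(name)}/{field}"


print(name_search("α-methylbenzoin"))  # => SEARCH .ALPHA.-METHYLBENZOIN/CN
print(name_search("imino", "CNS"))     # => SEARCH IMINO/CNS
```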
Since the information in Chemical Abstracts is classified into 80 major subject sections, the section numbers and codes can be used on STN with the CA Classification "CC" field to help limit a subject search. For example, works dealing primarily with enzymes are found in section 7 of the weekly Chemical Abstracts. Every document is assigned to one of the 80 sections, which are grouped into the following broad categories:
| Section Name | Section Code | Section Numbers |
|---|---|---|
| Biochemistry | BIO/CC | 1-20 |
| Organic Chemistry | ORG/CC | 21-34 |
| Macromolecular Chemistry | MAC/CC | 35-46 |
| Applied Chemistry & Chemical Engineering | APP/CC | 47-64 |
| Physical, Inorganic, & Analytical Chemistry | PIA/CC | 65-80 |
Thus, a strategy that included in an online search on STN:
would have the effect of limiting the retrieved documents in answer set L4 to those dealing with enzymes (found in section 7 of the printed CA) or, more broadly, to those of a biochemical nature found anywhere in sections 1-20 of the printed product.
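When a relevant document's section number is known (for example, from a displayed record), a simple range lookup over the table gives the broad code to use. The sketch below merely restates the table in Python; the function name is hypothetical, and it returns the field-labelled code rather than a full search statement.

```python
# Section ranges and codes from the table above.
CA_SECTION_GROUPS = [
    (1, 20, "BIO"),   # Biochemistry
    (21, 34, "ORG"),  # Organic Chemistry
    (35, 46, "MAC"),  # Macromolecular Chemistry
    (47, 64, "APP"),  # Applied Chemistry & Chemical Engineering
    (65, 80, "PIA"),  # Physical, Inorganic, & Analytical Chemistry
]


def section_code(section: int) -> str:
    """Return the broad CA Classification code, with its /CC label, for a section."""
    for first, last, code in CA_SECTION_GROUPS:
        if first <= section <= last:
            return f"{code}/CC"
    raise ValueError(f"CA sections run from 1 to 80, not {section}")


print(section_code(7))   # BIO/CC  (enzymes)
print(section_code(65))  # PIA/CC
```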
STRUCTURE SEARCHING allows a search to be run using the chemical structure as input. The searches are generally run against online chemical dictionary files, such as STN's Registry File. Depending on the type of structure search allowed by the system, the complete molecule or any compound containing the structure of the molecule will be retrieved as an answer set. The retrieved structures may include salts, isotopically labeled substances, mixtures, and structures in which the drawn structure is contained as a subset of a larger structure.
Unlimited substitution of the input molecule may be allowed at free sites on the molecule (a FULL SUBSTRUCTURE SEARCH) or substitution may be limited to certain sites (a CLOSED SUBSTRUCTURE SEARCH). On the STN system, once an answer set is formed in the Registry File, it can be crossed over to the CA or other files to conduct further subject searches of the compounds thus isolated in a structure search. In these cases, it is actually the CAS Registry Number for the compounds that is being searched in the crossover files. Note that it is now possible to conduct a search that takes into account the stereochemistry of the chiral centers and double bonds. Stereo searching can be performed in the Registry File and the Beilstein File on STN or on the Beilstein CrossFire system. Finally, MARKUSH STRUCTURE SEARCHING, an important technique in patent searches that allows for considerable variability in the structures retrieved, is another option in some files.
There are many reasons to do a substructure search, among them:
In combination with other types of searches, structure searching is a very powerful complement.
Over 30,000,000 registered small-molecule substances appear in the Chemical Abstracts Service Registry File. All of those have been registered since 1965, but, of course, not all of the compounds in the Registry File were discovered since that date. In fact, there are many compounds in the Registry File that have no new information on them in the CA or CAplus Files (that is, in the literature from 1967 onward). However, most of the millions of compounds in the Registry File have their Registry Numbers linked to other databases on the STN system. The LC (File Locator) field of a Registry File record tells in which databases on STN the Registry Number is found. In addition to the Registry File, structure searches can be conducted in such databases on STN as BEILSTEIN, CASREACT, and others. A similar file locator function is included in other chemical dictionary files, such as NLM's ChemID.
There are several types of structure searches possible in the Registry File, as well as different options for views of the molecules and different methods of inputting the structure. SciFinder Scholar masks to a certain extent the relationship between the Registry File and the CAPlus File, CASREACT, and other databases intertwined with its software.
Once the structure is built and the answer set retrieved, the search proceeds as it does with compounds identified by name or molecular formula searches. The structure search can be further refined with additional structural features or by limiting it with other parameters. Once refined, the references can be retrieved that have the Registry Number of the compounds in their indexing.
In traditional, command-driven structure searching, when logging on to STN, the choice of terminal determines what type of view of the molecule you will see. If one selects option 3 at the prompt:
TERMINAL (Enter 1, 2, 3 OR ?)
the structural depictions will be encoded with regular punctuation symbols found on a computer keyboard. Thus a double bond might be indicated by a ":" or a "=". With the proper telecommunications software, selecting option 2 will depict the structures as true graphical representations. That is the default option when using STN Express with Discover! (front-end software that allows the building of the structures offline).
The following types of structure searches are possible on STN:
With SciFinder Scholar, one of two true structure searching options is available, depending on whether the Substructure Search Module is included in the version of the software. The basic SciFinder Scholar search covers exact and family searches. The SSS module allows the fuller search options. (A similarity search has recently been added to the options available in SciFinder and SciFinder Scholar, but it is based on a different principle from the structure searches.)
There are actually several stages in a Registry File structure search. The first stage involves screening the huge file for compounds that have the requisite substituents and other features, without regard to their position on the molecule. The much more computer-intensive iteration stage involves an atom-by-atom, bond-by-bond look at the candidate molecules isolated in the screen search. Since this stage requires so much of STN's computer resources, there are limits on the number of compounds that can be examined during the iterative stage. A sample search must be run on approximately 5% of the file, after which a prediction of whether the full-file search will run to completion is given. If the prediction is favorable, the full-file search can be run; otherwise, the structure must be modified so that the search can run to completion. With SciFinder Scholar, there is some built-in intelligence that offers to "autofix" a molecule that might give the system trouble. It is also wise to preview the SciFinder Scholar search to see what kinds of substances might be retrieved with the structure as drawn.
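The difference between the screen and iteration stages can be illustrated with a toy model. In the sketch below, the screen compares only element and bond-type counts (position is ignored), so allyl alcohol passes a carbonyl screen even though it has no C=O; the iterative stage, a simple backtracking atom-by-atom match, then rejects it. The molecule encoding and function names are assumptions for illustration; STN's actual screens and matching algorithm are far more elaborate and are not described here.

```python
from collections import Counter

# Toy molecules: (element list, bonds), where bonds map frozenset({i, j}) -> bond order.
# Hydrogens are implicit; this is far simpler than a real Registry File record.
Molecule = tuple[list[str], dict[frozenset, int]]


def screen(query: Molecule, candidate: Molecule) -> bool:
    """Stage 1: cheap screen on element and bond-type counts, ignoring position."""
    (q_atoms, q_bonds), (c_atoms, c_bonds) = query, candidate
    missing_atoms = Counter(q_atoms) - Counter(c_atoms)
    missing_bonds = Counter(q_bonds.values()) - Counter(c_bonds.values())
    return not missing_atoms and not missing_bonds


def iterate(query: Molecule, candidate: Molecule) -> bool:
    """Stage 2: atom-by-atom, bond-by-bond backtracking substructure match."""
    (q_atoms, q_bonds), (c_atoms, c_bonds) = query, candidate
    mapping: dict[int, int] = {}  # query atom index -> candidate atom index

    def bonds_agree(qi: int, ci: int) -> bool:
        # Every query bond to an already-mapped atom must exist with the same order.
        for qj, cj in mapping.items():
            order = q_bonds.get(frozenset((qi, qj)))
            if order is not None and c_bonds.get(frozenset((ci, cj))) != order:
                return False
        return True

    def extend(qi: int) -> bool:
        if qi == len(q_atoms):
            return True
        for ci, element in enumerate(c_atoms):
            if element == q_atoms[qi] and ci not in mapping.values() and bonds_agree(qi, ci):
                mapping[qi] = ci
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)


def structure_search(query: Molecule, file: dict[str, Molecule]):
    screened = [name for name, mol in file.items() if screen(query, mol)]
    answers = [name for name in screened if iterate(query, file[name])]
    return screened, answers


carbonyl = (["C", "O"], {frozenset((0, 1)): 2})
toy_file = {
    "acetone": (["C", "C", "C", "O"], {frozenset((0, 1)): 1, frozenset((1, 2)): 1,
                                       frozenset((1, 3)): 2}),
    "ethanol": (["C", "C", "O"], {frozenset((0, 1)): 1, frozenset((1, 2)): 1}),
    "allyl alcohol": (["C", "C", "C", "O"], {frozenset((0, 1)): 2, frozenset((1, 2)): 1,
                                             frozenset((2, 3)): 1}),
}
print(structure_search(carbonyl, toy_file))
# (['acetone', 'allyl alcohol'], ['acetone'])
```

The backtracking stage is what makes iteration expensive: in the worst case its cost grows very quickly with molecule size, which is why a cheap screen is run first to cut down the candidates.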
The "old-fashioned" way of building structures on the STN system is to use alphanumeric commands to create the molecule step by step. Front-end programs such as STN Express or STN on the Web can be used to draw a graphic depiction of the molecule offline and upload it to STN once the connection is made, and SciFinder and SciFinder Scholar have their own structure searching option. Nevertheless, it is instructive to see the original commands used to draw the molecule and the options for assigning parameters to the structure. When building the structure online via commands, it is advisable for cost reasons to build it in the inexpensive LREG file. Once the structure query is complete and an L# has been assigned to it, you can transfer to the more expensive Registry File to run the search.
These are the basic steps that must be followed to create the structure online on STN using command language:
At this point, an L# is assigned to the structure query you have created. Once the Registry File is entered, the structure search is initiated with the SEARCH L# command. An example of the structure building process using commands on STN and a Type 3 (alphanumeric) terminal setting is seen here.
The Graph command builds the basic outline of the molecule. This can be a cumbersome process for larger molecules. Hence, there are alternatives. One way is to start with the Registry Number of a known substance that is similar to the compound of interest. Once the STRUCTURE command is given, you are prompted to:
ENTER NAME OF STRUCTURE TO BE RECALLED (NONE):
At this point, you could enter a Registry Number or, if you have built another structure in this session, the L# for that query structure. Another alternative is to enter a code for one of the pre-drawn systems used in creating structures. Rings of 5 to 12 atoms can be created simply by inputting the appropriate number at the prompt. Other pre-drawn options include STEROD (steroids) and ADAMAN (adamantanes).
If starting from scratch, the two basic options for the GRA command are to draw a chain (c) or ring (r) followed by a number indicating the size of the chain or ring. Thus, GRA c3 builds a chain of 3 atoms, and GRA r6 builds a 6-membered ring. The structures appear on the screen with carbon atoms as the default nodes, and unspecified bonds. All nodes are numbered, so further commands to the system utilize the node numbers for appropriate actions.
One potentially confusing use of the GRA command occurs when two nodes are to be connected. Intuitively, this would seem to involve the BON command, because we want to form a bond between the two atom nodes. However, BON is used only to modify an unspecified bond created with the GRA command. Thus, if we wanted to create a 14-membered ring, one way to do it would be to enter GRA c14, then GRA 1-14. That puts the necessary link between the two end nodes (although the atoms would also need to be moved to make the ring appear reasonable on the screen).
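The bookkeeping behind these commands amounts to a numbered graph. The toy class below mirrors the effect of GRA c14 followed by GRA 1-14: connecting the two end nodes turns the chain into a ring, while bond types stay unspecified. The class and its method names are hypothetical Python stand-ins, not STN commands, and screen layout is ignored.

```python
from collections import Counter


class QueryGraph:
    """Toy model of GRA/NOD: numbered nodes (carbon by default), unspecified bonds."""

    def __init__(self) -> None:
        self.nodes: dict[int, str] = {}     # node number -> atom symbol
        self.bonds: set[frozenset] = set()  # bond type left unspecified, as after GRA

    def chain(self, size: int) -> None:     # like "GRA c<size>"
        start = len(self.nodes) + 1
        for n in range(start, start + size):
            self.nodes[n] = "C"
            if n > start:
                self.bonds.add(frozenset((n - 1, n)))

    def connect(self, a: int, b: int) -> None:  # like "GRA a-b"
        self.bonds.add(frozenset((a, b)))

    def ring(self, size: int) -> None:      # like "GRA r<size>"
        start = len(self.nodes) + 1
        self.chain(size)
        self.connect(start, start + size - 1)

    def node(self, number: int, symbol: str) -> None:  # like "NOD <number> <symbol>"
        self.nodes[number] = symbol

    def is_single_ring(self) -> bool:
        """True if every node has exactly two neighbours and all nodes are connected."""
        degree = Counter(n for bond in self.bonds for n in bond)
        if not self.nodes or any(degree[n] != 2 for n in self.nodes):
            return False
        seen, stack = set(), [next(iter(self.nodes))]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(m for bond in self.bonds if n in bond for m in bond)
        return seen == set(self.nodes)


g = QueryGraph()
g.chain(14)                # GRA c14
print(g.is_single_ring())  # False: the two end nodes have only one neighbour each
g.connect(1, 14)           # GRA 1-14
print(g.is_single_ring())  # True: a 14-membered ring
g.node(1, "N")             # NOD 1 N
```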
The NOD command takes the form: NOD # symbol where the # refers to the number of the node in the molecule and the symbol is defined either by regular symbols for the elements or by special node symbols understood by the STN system. The latter include such things as "X" to represent any halogen, "M" to represent a metal, or "Gk" (where k represents a number from 1 to 20) to indicate a node which can vary according to your definition of the possible symbols (done with the VARiable command). There are also a number of SHORTCUT SYMBOLS for groups such as methyl "ME" or tert-butyl "T-BU".
There are four GENERIC GROUP SYMBOLS:
By issuing the GGC (Generic Group Category) command, these symbols can be further limited by type, for example, linear "LIN" or low carbon (6 or fewer carbons) "LOC".
Finally, it may be necessary to define a node as potentially being in either a ring or a chain. This is done with the command NOD # rc. Since the system assumes by default that the node is only to exist in the environment drawn, it is necessary to override the default with the rc specification when it is ok for an end node to be in either a ring or a chain in a substructure answer set.
For a tautomer, the following environment must exist:
where:
The central atom is: C, N, P, As, S, Se, Te, Cl, Br, or I.
At least one hydrogen, hydrogen isotope, or charge is on 1 or 3.
It is also possible to specify that a bond is only a ring bond or only a chain bond by defining it as BON rs or BON cd, for example. By default the system will assume that the bond is only to be part of a compound that has the environment in which it is drawn.
CASREACT: In addition, common functional groups in the reactants, reagents, and products are searchable by name in the /FG field. For example,
=> S PRIMARY AMINE/FG
or
=> S TRIHALIDE/FG.RCT
ROLES: used to describe the information that deals with the substances indexed in a document. One of the super roles is PREP (Preparation), which has more specific roles:
The PREP super role is equivalent to the STN CA/CAplus File search => S L#/P. The same results would be found with the strategy: => S L#/PREP
Roles must be appended to a Registry File answer set if used with an L#. However, they can be applied both to L#'s which may contain one or more substances and to individual CAS Registry Numbers or General Subject Index terms for classes of substances. For example, => S 91-56-5/SPN would find a laboratory-scale preparation of isatin.
It is also possible to label a substance with the role RCT (Reactant) in order to limit the answer set to references where a particular reactant is used, as in:
=> S L# AND 91-56-5/RCT (Note that the L# in this case need not be from the Registry File.)
In the CA File on STN, a convenient way to find all kinds of ways of preparing a substance is to search the CAS Registry Number, either directly or by crossing over from the Registry File. A "P" appended to the search strategy results in the search being limited to the items of interest. For example:
=> S 91-56-5P
or
=> S L#/P
In the second case, the "L#" (answer set) would have resulted from a search in the Registry File that found one or more compounds. It could represent a group of related substances from a substructure search. The "/" is required before the "P" in the CA/CAplus File search when using such an L#.
MISCELLANEOUS COMMANDS: SET REG OFF This command allows you to suppress the automatic REGISTRY search and crossover initiated by REG1stRY when a CAS Registry Number is entered in the CA/CAplus family of files. REG1stRY is, by default, ON. Simply enter SET REG1stRY OFF at any arrow prompt to suppress REG1stRY. When SET REG OFF is used, the CAS Registry Number is searched in the BASIC INDEX (/BI). When you search terms in other REG1stRY fields, e.g. chemical name (/CN) or molecular formula (/MF), SET REG OFF does not affect the automatic REGISTRY search and crossover. See HELP SET REG for more details.