Relational protein structure database (SCAN3D)

Introduction.

WHAT IF has a relational protein structure database available on-line. The command SCAN3D activates the menu that operates this database. I used the name SCAN3D to honor Steve Gardner, who called his very fancy relational database program 3DSCAN; eons ago he and I discussed this methodology for a long time over a lot of beer. SCAN3D allows you to search the database for characteristics such as sequence, secondary structure, 3D structure and accessibility. The last part of this chapter is an early draft of an article explaining why SCAN3D is so very special...
The commands are roughly divided into five groups:
1) Database inspection commands.
2) Scanning (database search) commands.
3) Logical operations (making relations).
4) Evaluation of results (listings, graphics).
5) Other commands.
Since there are many options, only a limited set is initially active in this menu. Use the command MORE to activate all options.

The main principle of this database is that you search for fixed-length stretches of amino acids that have certain relations between all their stored parameters. The stretches found are stored in groups. These groups can be combined using logical operations like AND, OR, XOR, etc. They can also be visualized on the graphics screen. The length of the stretches searched for can be set from 5 to 35 (with the SETLEN command).

The experienced user will see that there is some overlap between the groups described in this chapter and the DG*** groups described in the structure fragment chapter.

Scanning the database

The relational database in WHAT IF does not use a fancy language like SQL for its queries. Instead, the user has to make the relations. The general principle is that you look for one characteristic, and store the result in something called a group. Thereafter you search for a second characteristic, store this result in another group, etc. Then you make the relations by means of logical operations on these groups. The normal logical operations AND, OR, XOR, etc. are available. This section describes everything you can ask the database. The next section describes the usage of these logical operations.

Introduction

You can retrieve any sequence from the database in a fully flexible way. See the last part of this chapter for how to set the length of the stretches searched for, and for how to define groups of amino acids under a common name. See the previous section for inspection of the sequences in the database.

Search for sequence fragments (SEQUEN)

The command SEQUEN will make WHAT IF ask you N times (N is the length of the stretches searched for) 'Give the amino acid at position *'. For every position in the search string you can just hit return, meaning that every amino acid is allowed there. You can also type one or more amino acids in three-letter code. If you type more than one amino acid, every typed amino acid is acceptable at this position in the stretches to be found. You may also mix in one or more of the so-called self-made amino acids (see the last section of this chapter).

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.
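The SEQUEN search can be pictured with a small Python sketch. This is only an illustration of the idea, not the way WHAT IF does it internally: the function names, the toy chain and the use of None for "just hit return" are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Set

# One entry per position: a set of acceptable three-letter codes,
# or None when every amino acid is allowed (the "just hit return" case).
Pattern = Sequence[Optional[Set[str]]]


def count_mismatches(fragment: Sequence[str], pattern: Pattern) -> int:
    """Count the positions where the fragment violates the pattern."""
    return sum(
        1
        for residue, allowed in zip(fragment, pattern)
        if allowed is not None and residue not in allowed
    )


def scan_sequence(chain: Sequence[str], pattern: Pattern,
                  max_mismatch: int = 0) -> List[int]:
    """Return the start index of every fixed-length stretch that matches."""
    length = len(pattern)
    hits = []
    for start in range(len(chain) - length + 1):
        fragment = chain[start:start + length]
        if count_mismatches(fragment, pattern) <= max_mismatch:
            hits.append(start)
    return hits


# Hypothetical toy chain, not taken from the database.
chain = ["GLY", "ALA", "SER", "CYS", "ALA", "GLY", "CYS", "LYS"]
# Position 1: ALA, position 2: anything, position 3: CYS or SER.
pattern = [{"ALA"}, None, {"CYS", "SER"}]

exact_hits = scan_sequence(chain, pattern)
loose_hits = scan_sequence(chain, pattern, max_mismatch=1)
print(exact_hits, loose_hits)
```

With mismatch 0 only the stretches that obey every position survive; raising the mismatch parameter to 1 also admits stretches that violate one position.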

Search for secondary structure (HELSHT)

The name of this command is a little bit too humble. It does not only allow you to search for helix and/or sheet, but also for turns and coil. If you type HELSHT, WHAT IF will loop over all positions in the search string, and for every position prompt you whether this position should be HSTC*. You can now give any combination of these 5 characters. If you give for example H at position 3, all stretches that will be lifted from the database will be helical at the third position. If you give CT at position 6, all stretches that will be lifted from the database will be either turn or coil at the sixth position. If you give * at any position, then every type of secondary structure will be allowed at that position.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Search for phi-psi combinations (PSIPHI)

The command PSIPHI will make WHAT IF loop over the length of the search string, and for every position prompt you for the limits of phi and psi at this position. Here you have to give 4 values, all in the range -180.0 till 180.0. The first two are the lower and upper limit for phi, the last two are the lower and upper limit for psi.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.
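As a rough illustration of a PSIPHI query, the sketch below checks per-position phi and psi limits and applies the mismatch parameter. The names and the toy angles are hypothetical, and the sketch assumes that the lower limit is never larger than the upper limit (no ranges that wrap around 180 degrees).

```python
from typing import List, Sequence, Tuple

# (phi_low, phi_high, psi_low, psi_high) for one position, in degrees.
Limits = Tuple[float, float, float, float]


def position_ok(phi: float, psi: float, limits: Limits) -> bool:
    """True when both angles fall inside the requested limits."""
    phi_low, phi_high, psi_low, psi_high = limits
    return phi_low <= phi <= phi_high and psi_low <= psi <= psi_high


def scan_phi_psi(angles: Sequence[Tuple[float, float]],
                 limits: Sequence[Limits],
                 max_mismatch: int = 0) -> List[int]:
    """Return start indices of stretches obeying the per-position limits."""
    length = len(limits)
    hits = []
    for start in range(len(angles) - length + 1):
        window = angles[start:start + length]
        wrong = sum(
            1 for (phi, psi), lim in zip(window, limits)
            if not position_ok(phi, psi, lim)
        )
        if wrong <= max_mismatch:
            hits.append(start)
    return hits


# Hypothetical (phi, psi) values for a six-residue toy chain.
angles = [(-60, -45), (-65, -40), (-120, 130),
          (-58, -47), (-62, -41), (-63, -43)]
# Ask for three consecutive residues in a roughly helical region.
helical = [(-90.0, -30.0, -75.0, -15.0)] * 3

strict = scan_phi_psi(angles, helical)
relaxed = scan_phi_psi(angles, helical, max_mismatch=1)
print(strict, relaxed)
```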

Search for omega values (OMEGA)

The command OMEGA will make WHAT IF loop over the length of the search string, and for every position prompt you for the limits on omega at this position. Here you have to give 2 values, both in the range -180.0 till 180.0.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Search for backbone angles (ANGVAL)

The command ANGVAL will prompt you for one of the backbone angles: N-Ca-C, Ca-C-N, Ca-C-O or O-C-N (called 1,2,3 or 4 respectively) for each residue in the group. You can give ranges between 0 and 180 degrees.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Search for conserved residues (SCNCNS)

The command SCNCNS makes WHAT IF prompt you for the degree of conservation at each position in the search profile. Conservation 0 (zero) means that every residue is allowed and possible at this position. Conservation 100 means that only 1 residue is found at this position. Conservation is measured by HSSP.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Search for position in the chain (SCNPOS)

The command SCNPOS makes WHAT IF prompt you for the absolute and fractional position in the chain. The absolute position indicates where in the chain the first residue of the search stretch is allowed to be. Use negative numbers to indicate a distance to the C-terminus. The fractional distance (values from 0.0 till 1.0) indicates in which part of the protein the first residue of the search stretch is allowed to be.

Examples: To find all C-terminal helical stretches one would combine a HELSHT run with SCNPOS with absolute range -1 till -2 (leaves one residue free at the end), and fractional range 0.0 till 1.0.

To find all Cysteines in C-terminal domains, one would combine a SEQUEN search with a SCNPOS search with absolute range 80 till 1000, and fractional range 0.5 till 1.0.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Search for backbone h-bonds (SCNHBO)

The command SCNHBO will make WHAT IF ask you for every backbone nitrogen and oxygen in each residue in the search stretch whether it should be hydrogen bonded. If you answer yes, it will ask for the secondary structure type of the residue it is hydrogen bonded to. (Here, as usual, you can answer H,S,T,C or any combination of these, or *)

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Search for side chain chi-angles (CHIVAL)

The command CHIVAL makes WHAT IF prompt you for the number of the side chain chi-angle. You can give here 1 to 5 for chi-1, chi-2, chi-3, chi-4 or chi-5, respectively. WHAT IF will then loop over the length of the search string, and for every position prompt you for the limits on the requested chi-angle, suggesting -180 180 as the default. If you just hit return to accept this default, every amino acid is acceptable at this position, including those that do not have the requested chi-angle. If you type limits yourself, you have to give 2 values, both in the range -180.0 to 180.0, and a database amino acid that does not have this chi-angle (there is, for example, no chi-4 in alanine) is then not acceptable. This also holds if you actually retype -180 180: in that case every amino acid that possesses this chi-angle is acceptable, and all others are rejected.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.
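The difference between hitting return and retyping -180 180 in CHIVAL is easy to get wrong, so here is a minimal sketch of the acceptance rule as described above. The function name is hypothetical; None stands for "the residue has no such chi-angle" and for "the user just hit return".

```python
from typing import Optional, Tuple


def chi_position_ok(chi: Optional[float],
                    limits: Optional[Tuple[float, float]]) -> bool:
    """Decide whether one residue is acceptable at one position.

    chi    -- the requested chi-angle of the database residue in degrees,
              or None when the residue does not possess this angle.
    limits -- (low, high) as typed by the user, or None when the user
              just hit return at the suggested default.
    """
    if limits is None:
        # Default accepted with return: every amino acid is acceptable.
        return True
    if chi is None:
        # Explicit limits were typed, but the residue lacks this angle.
        return False
    low, high = limits
    return low <= chi <= high


print(chi_position_ok(None, None))             # alanine, return pressed
print(chi_position_ok(None, (-180.0, 180.0)))  # alanine, limits retyped
print(chi_position_ok(62.5, (-180.0, 180.0)))  # residue that has the angle
```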

Searching for accessibility values (ACCVAL)

The command ACCVAL will make WHAT IF loop over the length of the search string, and for every position prompt you for the limits of the surface accessibility of the residue at this position. Here you have to give 2 values, which are the lower and upper limit for total surface accessibility for the residue at that position. Read the chapter on accessibility calculations about some pitfalls with accessibility values.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Searching sequences using Dayhoff matrix (DAYHOF)

The command DAYHOF will activate the sequence search option that accepts only amino acids which, according to the Dayhoff scoring matrix, are worth more than a certain number of points. The following scoring matrix is being used:

  V L I M F W Y G A P S T C H R K Q E N D
V 5 2 2 1 0-1 0-1 0-1-1 0-2-1-1-1-1-1-2-2
L 2 5 2 3 2 0 0-2-1-2-1 0-3-1-1-1 0-2-2-2
I 2 2 5 2 0 0 0-2-1-2-1 0-2-1-2-2-3-2-2-3
M 1 3 2 5 2-2-1-2 0-2-1 0 0 0-2-2 0-2-1-1
F 0 2 0 2 6 3 3-3-2-2-1-2-3 1-2-3-3-3-3-2
W-1 0 0-2 3 6 3-2-2-3 0-1-1 0 0-2-1-2-3-3
Y 0 0 0-1 3 3 6-3-2-3 0-2-2 1-1-2-2-1-1-2
G-1-2-2-2-3-2-3 5 0 0 0-1-2-1 0 0-1 0 0 0
A 0-1-1 0-2-2-2 0 5 1 1 0-2 0-1 0 0 1 0 0
P-1-2-2-2-2-3-3 0 1 5 0 0-3 0 0 0 0 1-2 0
S-1-1-1-1-1 0 0 0 1 0 5 2-1 0 1 0 1 1 2 0
T 0 0 0 0-2-1-2-1 0 0 2 5-1 1 0 0 0 1 0 0
C-2-3-2 0-3-1-2-2-2-3-1-1 6 0-2-3-3-3-2-2
H-1-1-1 0 1 0 1-1 0 0 0 1 0 5 2 1 1-1 1 1
R-1-1-2-2-2 0-1 0-1 0 1 0-2 2 5 2 2 0 0-2
K-1-1-2-2-3-2-2 0 0 0 0 0-3 1 2 5 1 1 1 0
Q-1 0-3 0-3-1-2-1 0 0 1 0-3 1 2 1 5 2 1 1
E-1-2-2-2-3-2-1 0 1 1 1 1-3-1 0 1 2 5 1 2
N-2-2-2-1-3-3-1 0 0-2 2 0-2 1 0 1 1 1 5 2
D-2-2-3-1-2-3-2 0 0 0 0 0-2 1-2 0 1 2 2 5

This means that if you request an aspartic acid at a certain position in the search string, and say that the score should be at least 2 points, then glutamic acid, asparagine and aspartic acid are acceptable at this position.
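The aspartic acid example can be checked with a few lines of Python. The sketch copies only the D row of the matrix above; the function name is made up for this illustration.

```python
# Column order of the scoring matrix printed above.
COLUMNS = "VLIMFWYGAPSTCHRKQEND"

# The D (aspartic acid) row of the matrix, copied from the table above.
ASP_ROW = [-2, -2, -3, -1, -2, -3, -2, 0, 0, 0,
           0, 0, -2, 1, -2, 0, 1, 2, 2, 5]


def acceptable_residues(row, minimal_score):
    """Return the one-letter codes scoring at least minimal_score."""
    return {aa for aa, score in zip(COLUMNS, row) if score >= minimal_score}


print(sorted(acceptable_residues(ASP_ROW, 2)))     # ['D', 'E', 'N']
print(len(acceptable_residues(ASP_ROW, -100)))     # 20, anything goes
```

The second call shows why a very negative minimal score makes a position accept any residue.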

You will be prompted for the average Dayhoff scoring value first. This is simply the average of the scores for all positions in the search string. Thereafter you will one by one be prompted for the residue at each position in the search string, and its minimal Dayhoff score. If a certain residue is allowed to be anything, just give any residue and -100 or something very negative for the requested minimal score.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Searching for turns (TURNTP)

The command TURNTP allows you to search for turns of a certain type. Turns are defined as a reversal of the chain direction over four residues. These residues are called I, I+1, I+2, I+3. Normally a hydrogen bond is found between residues I and I+3. WHAT IF will prompt you for the turn type. You should give one of the following names:
I IP II IIP VIA VIB VIII IV
according to the nomenclature of Wilmot and Thornton in J.Mol.Biol, (1988) 203, 221-232 (where P stands for ' or prime).

The following limitations will now be placed on the phi and psi angles of the residues I+1 and I+2:

                      PHI1 PSI1   PHI2 PSI2
I   : TYPE I    TURN ( -60  -30    -90   0)
IP  : TYPE I`   TURN (  60   30     90   0)
II  : TYPE II   TURN ( -60  120     80   0)
IIP : TYPE II`  TURN (  60 -120    -80   0)
VIA : TYPE VIA  TURN ( -60  120    -90   0)
VIB : TYPE VIB  TURN (-120  120    -60   0)
VIII: TYPE VIII TURN ( -60  -30   -120 120)
IV  : TYPE IV   TURN ( ALL OTHERS)
Type IV is only there for completeness. Using it means that you get nearly the whole database as result.

A turn consists of four residues. If your search fragment is longer than four residues, you will be asked to indicate which residue in the search fragment should be the first of the four turn positions. Obviously, this position in the fragment where the turn starts cannot be one of the last three positions in the fragment.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.
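To make the turn table above more concrete, the sketch below tests whether the phi and psi angles of residues I+1 and I+2 lie close to the ideal values of a given turn type. The ideal angles are copied from the table; the tolerance of 30 degrees is purely an assumption for this illustration, because this chapter does not state which tolerance WHAT IF applies. Type IV (all others) is left out.

```python
# Ideal (phi1, psi1, phi2, psi2) per turn type, copied from the table above.
IDEAL_TURNS = {
    "I":    (-60.0,  -30.0,  -90.0,   0.0),
    "IP":   ( 60.0,   30.0,   90.0,   0.0),
    "II":   (-60.0,  120.0,   80.0,   0.0),
    "IIP":  ( 60.0, -120.0,  -80.0,   0.0),
    "VIA":  (-60.0,  120.0,  -90.0,   0.0),
    "VIB":  (-120.0, 120.0,  -60.0,   0.0),
    "VIII": (-60.0,  -30.0, -120.0, 120.0),
}


def angle_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def is_turn(turn_type: str, phi1: float, psi1: float,
            phi2: float, psi2: float, tolerance: float = 30.0) -> bool:
    """True when all four angles are within tolerance of the ideal values.

    The default tolerance is an assumed value, not taken from WHAT IF.
    """
    observed = (phi1, psi1, phi2, psi2)
    ideal = IDEAL_TURNS[turn_type]
    return all(angle_difference(o, i) <= tolerance
               for o, i in zip(observed, ideal))


print(is_turn("I", -65.0, -25.0, -95.0, 5.0))
print(is_turn("II", -65.0, -25.0, -95.0, 5.0))
```

The angle difference is taken around the circle, so that 170 and -170 degrees count as 20 degrees apart rather than 340.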

Looking for hydrophobic moment values (SCNHYD)

For a full description of hydrophobic moment calculations see the chapter on this item.

WHAT IF has all hydrophobic moments for all proteins in the database available on-line for repeat angle 100 degrees and window width 7.

The command SCNHYD will make WHAT IF loop over the length of the search string, and for every position prompt you for the limits on the hydrophobic moment at this position. Here you have to give 2 values. All values are allowed. Normal values fall in the range 0.0 till 0.5.

You will then be asked to give the `mismatch` parameter. This mismatch parameter tells WHAT IF how many positions in each hit are maximally allowed to be different from what was requested.

After the very fast search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Finding atomic contacts (SCNCON)

This option is absolutely amazing. However, it is relatively slow, and not very user friendly. The idea is to find atomic contacts. It allows together with the other SCAN3D options to look for example at all buried glutamic acids for which the O-epsilons are not in contact with a basic nitrogen.

In order to use this option, you have to type a lot. For every position in the search string, you will be prompted for the amino acid(s). If you give one amino acid, you can use the subsequent question about which atoms to use to specify individual atoms. If you give multiple amino acids, you can only give `SIDE-CHAINS` or `BACK-BONE` when asked for the atoms. After the amino acid is known, the same questions as above will be repeated for the residues with which there should be a contact. The same kind of answers as for the residues searched for should be given. The last question per position in the search string is the contact distance. This is the distance between the atom centers minus the two Van der Waals radii. So for just touching atoms, give zero. Since the database does not contain hits where this distance is larger than 1.5 Angstrom, it is useless (but not fatal) to give very large numbers. You can also give negative distances to detect `bumps`.
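The contact distance used here can be written down in a few lines. The sketch below is only an illustration: the Van der Waals radii are round illustrative numbers and not the values stored in WHAT IF, and the function names are hypothetical.

```python
import math

# Illustrative Van der Waals radii in Angstrom (assumed values,
# NOT the radii that WHAT IF itself uses).
VDW_RADIUS = {"C": 1.8, "N": 1.7, "O": 1.4}


def contact_distance(pos_a, element_a, pos_b, element_b):
    """Distance between atom centers minus the two Van der Waals radii."""
    center_distance = math.dist(pos_a, pos_b)
    return center_distance - VDW_RADIUS[element_a] - VDW_RADIUS[element_b]


def in_contact(pos_a, element_a, pos_b, element_b, cutoff=0.0):
    """True when the contact distance does not exceed the cutoff."""
    return contact_distance(pos_a, element_a, pos_b, element_b) <= cutoff


oxygen = (0.0, 0.0, 0.0)
near_nitrogen = (3.5, 0.0, 0.0)
bumping_nitrogen = (2.8, 0.0, 0.0)

print(contact_distance(oxygen, "O", near_nitrogen, "N"))     # about 0.4
print(in_contact(oxygen, "O", near_nitrogen, "N", 0.5))      # True
print(contact_distance(oxygen, "O", bumping_nitrogen, "N"))  # negative: bump
```

A positive value means there is a gap between the two atomic spheres, zero means they just touch, and a negative value means they overlap.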

The last thing you will be prompted for is the database range. Just hit return to use the whole range. If you take the full database (approx. 100 proteins), then the average search will take roughly 20 seconds CPU on a VAX workstation.

After the search WHAT IF will tell you how many hits it found and ask you in which group you want to store these hits. The suggested default is just the first free group available. Be aware that there are at present only 10 groups allowed to be active at one time. If you do not want to store the hits, just give group number 0.

Finding fragments with internal contacts (NEACON)

This option allows you to search for fragments that have at least one internal contact of a certain type.

The option NEACON makes WHAT IF prompt you for the position in the fragment of the first residue. It will then ask for the sequence distance of the second residue. If, for example, you answer these questions with 2 and 8 respectively, fragments will be searched for that have a contact between the residues 2 and 10.

You will be asked to give a contact type and a cutoff distance. The allowed contact types are BB, BS, SB or SS, in which the first B or S stands for the first residue in the fragment, and the second B or S for the second residue in the fragment (B is backbone, S is side chain). The cutoff distance is, as usual, the largest distance between the Van der Waals spheres of two atoms at which they are still called in contact.

E.g. if you use 2 and 8 respectively for the residues, and SB for the contact type, fragments will be searched for that have a side chain atom in residue 2 in the fragment in contact with a backbone atom in residue 10.
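The bookkeeping of the NEACON answers is simple, but a tiny sketch may help to avoid off-by-one mistakes. The function name and the fragment_length argument are hypothetical; the sketch only translates the answers into a pair of residues and atom classes.

```python
ATOM_CLASS = {"B": "backbone", "S": "side chain"}


def neacon_pair(first_position, sequence_distance, contact_type,
                fragment_length):
    """Translate the NEACON answers into two (position, atom class) pairs."""
    if contact_type not in ("BB", "BS", "SB", "SS"):
        raise ValueError("contact type must be BB, BS, SB or SS")
    second_position = first_position + sequence_distance
    if not (1 <= first_position < second_position <= fragment_length):
        raise ValueError("both residues must lie inside the fragment")
    return ((first_position, ATOM_CLASS[contact_type[0]]),
            (second_position, ATOM_CLASS[contact_type[1]]))


# The example from the text: answers 2 and 8, contact type SB.
print(neacon_pair(2, 8, "SB", fragment_length=10))
```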

Cysteines and cys-cys bridges (SCNCYS)

The command SCNCYS makes WHAT IF prompt you for the cysteine status for each of the positions in the search group. At every position you can give one of the following:
 * (asterisk) if this residue is completely free.
-2 if this residue should not be a cysteine.
-1 if this residue should be an unpaired cysteine.
 N if this residue should be a paired cysteine.
N can be zero if you don't care how far down in the sequence this cysteine should be. If you give for example 4, that means that you are going to search for cysteines that are paired with a cysteine that is four residues further in the same sequence. So, this is one of the exceptions where zero is valid input...
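The four possible answers can be summarized in a small sketch. It assumes a hypothetical representation in which the cys-cys bridges of a chain are kept in a dictionary that maps the index of each paired cysteine to the index of its partner; this is only for illustration and says nothing about how WHAT IF stores bridges.

```python
def cys_position_ok(index, status, sequence, partner):
    """Check one position against a SCNCYS status.

    status   -- "*", -2, -1, or N (N >= 0) as described above.
    sequence -- list of three-letter codes.
    partner  -- dict: index of a paired cysteine -> index of its partner.
    """
    is_cys = sequence[index] == "CYS"
    if status == "*":
        return True                       # completely free
    if status == -2:
        return not is_cys                 # must not be a cysteine
    if status == -1:
        return is_cys and index not in partner   # unpaired cysteine
    if not is_cys or index not in partner:
        return False                      # a paired cysteine was requested
    # N == 0: any partner will do; N > 0: partner must be N residues further.
    return status == 0 or partner[index] == index + status


# Hypothetical toy chain: residues 0 and 4 form a bridge, residue 6 is free.
sequence = ["CYS", "ALA", "GLY", "SER", "CYS", "LYS", "CYS"]
partner = {0: 4, 4: 0}

print(cys_position_ok(0, 4, sequence, partner))    # paired, 4 further: True
print(cys_position_ok(0, 3, sequence, partner))    # wrong distance: False
print(cys_position_ok(6, -1, sequence, partner))   # unpaired cysteine: True
```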

Relating groups with logical operations

Creating groups

How to create groups is explained earlier in this chapter. The things one can do with groups should have been explained first, since what can be done with them determines in a sense how one wants to go about making them. A group, also sometimes called subgroup, is a set of peptides all having the same fixed number of amino acids in them. Such a group is the result of searching through the database. Upon searching, the program finds a certain number of hits. All hits are stored, and the user can look at them. A very simple group would for example be: all stretches of 8 amino acids with an alanine in it. This would generate a group with several thousands of hits in it.

These groups can then be combined by means of logical operations.

These operations are AND OR NOT XOR. The user has to type SANDOR to be able to use one of these options.

The user has several options available to look in or at groups. These options are SHOGRP (shows all groups made) and SHOHIT (shows hits in a group). Also the option INIGRP is available to clean groups. SETLEN can be used to vary the length of the stretches searched for.

Combining groups by logical operators (SANDOR)

After the command SANDOR is given, the program responds by asking for the number of the first group. Thereafter you will be prompted for the second group. Both times you should give a number from 1 to 10, being the number of one of the groups you generated earlier. You will then be prompted for the number of the group that will receive the result. If this result group is already in use, you get the choice to overwrite it, or to make another choice. Then you will be prompted for the logical operation. Here you should give one of the following:

AND OR NOT XOR
These operations do the following:

AND creates a new group consisting of all hits that both groups on which it operates have in common.

OR creates a new group which consists of all hits that are present in at least one of the two groups on which it operates.

NOT creates a new group which consists of all hits that are present in the first of the two groups on which it operates, but not in the second.

XOR creates a new group consisting of all hits that are present in one of the two groups on which it operates, but not in the other. (I don't think that this operator will be used very often).

Logical operator (AND)

AND creates a new group consisting of all hits that both groups on which it operates have in common.

Logical operator (OR)

OR creates a new group which consists of all hits that are present in at least one of the two groups on which it operates.

Logical operator (NOT)

NOT creates a new group which consists of all hits that are present in the first of the two groups on which it operates, but not in the second.

Logical operator (XOR)

XOR creates a new group consisting of all hits that are present in one of the two groups on which it operates, but not in the other. (I don't think that this operator will be used very often).

Inverting a group of hits (SCNINV)

The command SCNINV will prompt you for an input group number and an output group number. These may be the same. It will then invert the input group, and store the result in the output group. If you do a logical OR on the input and the output group of this option, you get the whole database back.
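If you think of a group as a set of hits, the logical operations and the inversion behave like ordinary set operations. The sketch below uses Python sets for this. It rests on several assumptions that are not WHAT IF internals: a hit is represented as a (protein, first residue) pair, the protein name and numbers are made up, NOT is taken to mean "in the first group but not in the second", and XOR is taken to mean "in exactly one of the two groups".

```python
# Hypothetical mini database: every possible hit in one toy protein.
database = {("protA", start) for start in range(1, 6)}

# Two groups as they might come out of two earlier searches.
helical = {("protA", 1), ("protA", 2), ("protA", 3)}
buried = {("protA", 2), ("protA", 3), ("protA", 4)}


def invert(group, all_hits):
    """SCNINV-like inversion: everything in the database not in the group."""
    return all_hits - group


both = helical & buried          # AND: hits the two groups have in common
either = helical | buried        # OR: hits in at least one group
only_first = helical - buried    # NOT (assumed): in first, not in second
exactly_one = helical ^ buried   # XOR (assumed): in one but not in the other
not_helical = invert(helical, database)

print(sorted(both))
print(sorted(only_first))
print(sorted(exactly_one))
# OR of a group and its inverse gives the whole database back.
print((helical | not_helical) == database)
```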

Evaluation of results

There are several ways of inspecting the hits found. They are divided into two categories: listing them at the normal terminal, or showing them graphically.

Showing groups (SHOGRP)

The command SHOGRP shows you which groups have been generated so far. Also shown is how each group was generated, and how many hits there are in each group.

Showing hits in groups (SHOHIT)

The command SHOHIT causes WHAT IF to prompt you for the number of the group. You should then give the number of one of the earlier generated groups. Don't worry if you forgot the group numbers: if you type something wrong, WHAT IF will at worst tell you so, but it will not crash. After the group number has been accepted, you will be prompted for the range of hits. Just give the number of the first and the last hit you want to see. Now be ready to use the no scroll option on your terminal, because for all requested hits the program will show you the protein in which the hit was found, the sequence numbers of the hit in this protein, the actual sequence, and the secondary structure of this stretch as determined by DSSP (see appendix D). In case you are looking at a group of fragments created by one of the DG*** options, you will also get the RMS deviation between the C-alpha coordinates of the hit and the stretch on top of which it was fitted. However, DG*** groups cannot yet fully be mixed with 'normal' groups.

Resetting the groups (INIGRP)

If you want a fresh start, or for any other reason you want to get rid of all the groups you have created so far, you can use the command INIGRP. Be careful with this command, it works immediately, and it is irreversible. Although, you can of course always create all removed groups over again.

Displaying hits.

There are several ways to display hits.

Displaying hits one by one (SCNGRA)

The command SCNGRA makes WHAT IF prompt you for the number of a group. Thereafter you will be asked how many hits you want to see. At present you can give maximally 100 hits. These hits will be coloured red, all superimposed on the first structure (which will sometimes look strange), centered at the present center of the screen as a movie. Click the MOVIE+ and MOVIE- boxes at the right-hand side of the screen to flip through the movie, or use the LOOPER command to loop through the movie automatically.

Display all hits on top of each other (SCNGRL)

The command SCNGRL makes WHAT IF prompt you for the number of a group. Thereafter you will be asked how many hits you want to see. At present you can give as many hits as you wish, but strange things will happen if all these hits together have more than 2500 amino acids in them. These hits will be coloured blue till red as function of their position in the hit list, all superimposed on the first structure (which will sometimes look strange), centered at the present center of the screen, and placed in a MOL-item. You will be prompted for the number of the MOL-object, and the name of the MOL-item.

Display all hit environments (SCNGRN)

This option displays the results of the SCNCON option.

WARNING! This option only works as expected when the middle residues of all hits in a group are the same amino acid type (all alanines, or all cysteines, etc.).

The command SCNGRN makes WHAT IF prompt you for the number of a group. Thereafter you will be asked how many hits you want to see. At present you can give as many hits as you wish, but strange things will happen if all these hits together have more than 2500 amino acids in them. These hits will be coloured blue till red as function of their position in the hit list, all superimposed on the first structure (which will sometimes look strange), centered at the present center of the screen, and placed in a MOL-item. In order to superimpose the structures you will be asked which atoms to use for superpositioning. The parameter setting menu can be used to determine which parts of the hit and its environment will be shown.

The option needs a lot of input. You will be prompted for the atoms in the center residue that should make the contact, for the atoms in the central residue that should be used for superpositioning, for the neighboring residues, and for the atoms in the neighbors that make the contact. This seems somewhat redundant because you typed all this already for the previously run SCNCON option, but I have great plans for these options in the future, and when those are ready, you will understand why.

You will (after roughly 10 seconds CPU on a VAX workstation) be prompted for the number of the MOL-object, and the name of the MOL-item.

Display the environment of a hit (SCNGRE)

The command SCNGRE makes WHAT IF prompt you for a group and a range of hits. It will also prompt you for the atoms in the central residue to be used for superpositioning at the screen, for the atoms in the central residue to be shown, etc. This is the same procedure as for the SCNGRN option.

After all the input, all hits will be shown at the screen, with their entire environments present.

This option is not yet tested.

Other commands

Changing the group length (SETLEN)

The length of the stretches of amino acids that are being searched for is always a fixed number. I know that that is not optimally flexible for the user, but I could not work out a method with flexible group lengths that could work just as fast as it does now. If you do not like the length of these stretches, you can use the SETLEN command. Execution of this command will cost several seconds because WHAT IF has to read many new pointer files.

It is possible to do logical operations on groups that have stretches of different length in them. The program only looks at the first amino acid. If these are the same, meaning that it is the same amino acid at the same location in the same protein, then those stretches are for WHAT IF the same.

Defining an extra self made amino acid (SETEAA)

The command SETEAA allows you to create one extra self made amino acid. This one is called the user defined self made amino acid. You will be prompted for a three letter code under which this self made amino acid should be known. This three letter code should of course not be one of the existing real or self made amino acids. After the name is accepted, you will be prompted for the names of the amino acids which make up this user defined self made amino acid. Here you can only give the twenty real amino acids. If you later would like to change this user defined self made amino acid, you can just use the SETEAA command again. The first thing that SETEAA does is remove the old user defined self made amino acid if it exists.

Listing the self made amino acids (SHOEAA)

The command SHOEAA shows you all presently set self made amino acids. It also shows the user defined self made amino acid if there is one defined already. The following self made amino acids are predefined:

BIG  TRP + TYR + PHE + HIS + ARG + LYS + MET
SML  GLY + ALA + SER
POS  ARG + LYS
NEG  GLU + ASP
POL  ARG + LYS + GLU + ASP + GLN + ASN + HIS

If you want more, you should ask Gert Vriend, but ask very kindly, because it means at least an hour of work.
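The self made amino acids can be thought of as named sets of real amino acids. The sketch below, in Python, encodes the predefined sets from the table above and the rules given for SETEAA (a new three letter code, only the twenty real amino acids as members, and at most one user defined set at a time). The function names set_eaa and matches, and the example code ARO, are assumptions made for illustration only; they are not WHAT IF commands or internals.

```python
REAL = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

PREDEFINED = {
    "BIG": {"TRP", "TYR", "PHE", "HIS", "ARG", "LYS", "MET"},
    "SML": {"GLY", "ALA", "SER"},
    "POS": {"ARG", "LYS"},
    "NEG": {"GLU", "ASP"},
    "POL": {"ARG", "LYS", "GLU", "ASP", "GLN", "ASN", "HIS"},
}

def set_eaa(code, members):
    """Return the single user defined set; any earlier one is simply replaced."""
    code = code.upper()
    members = {m.upper() for m in members}
    if len(code) != 3:
        raise ValueError("the code must have three letters")
    if code in REAL or code in PREDEFINED:
        raise ValueError("the code is already in use")
    if not members <= REAL:
        raise ValueError("only the twenty real amino acids are allowed")
    return {code: members}

def matches(residue, code, user_defined):
    """True if residue is the real amino acid 'code' or belongs to the set 'code'."""
    if code in REAL:
        return residue == code
    return residue in {**PREDEFINED, **user_defined}[code]

user_defined = set_eaa("ARO", ["PHE", "TYR", "TRP"])  # ARO is a made-up example code
print(matches("TYR", "ARO", user_defined), matches("LYS", "POS", user_defined))
```

Calling set_eaa a second time returns a fresh single-entry table, which mirrors the way SETEAA first removes the old user defined self made amino acid.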

Getting hits in the soup (SCNUSE)

The command SCNUSE makes WHAT IF prompt you for a group and a hit number. It will then take this hit from the database, and store it as a separate molecule at the end of the protein range of your soup.

Contact searches (DGCONT)

The command DGCONT allows you to search for pairs of residues that have the same spatial relationship as the pair you give as an example. You will be prompted for a central residue. For this residue you will have to tell which atoms should make the contact with the neighboring residue, which is given later. You will also have to give the atoms to be used for superimposing the database hits on the central residue in the soup. Thereafter you are prompted for the neighboring residue and for the atoms in this neighboring residue that should have a contact with the indicated atoms in the central residue. The last information needed is the contact distance. Two atoms are considered to be in contact if the distance between them is less than the sum of this contact distance and the Van der Waals radii of the two atoms.
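The contact criterion at the end of the previous paragraph can be written down in a few lines of Python. This is a sketch under assumptions: the radii in the table are commonly quoted Bondi values used here only as illustrative numbers, not the radii WHAT IF uses, and the function name in_contact is made up.

```python
import math

# Illustrative Van der Waals radii in Angstrom (an assumption, not WHAT IF's table).
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def in_contact(atom_a, atom_b, contact_distance):
    """Each atom is (element, (x, y, z)); distances are in Angstrom."""
    element_a, xyz_a = atom_a
    element_b, xyz_b = atom_b
    limit = contact_distance + VDW_RADIUS[element_a] + VDW_RADIUS[element_b]
    return math.dist(xyz_a, xyz_b) < limit

oxygen = ("O", (0.0, 0.0, 0.0))
near_nitrogen = ("N", (3.0, 0.0, 0.0))
far_nitrogen = ("N", (4.0, 0.0, 0.0))

print(in_contact(oxygen, near_nitrogen, 0.25))  # 3.0 is below 0.25 + 1.52 + 1.55
print(in_contact(oxygen, far_nitrogen, 0.25))   # 4.0 is not
```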

WHAT IF will now loop over all residues in the database that are of the same type as the given central residue. For each of these database hits it will superimpose this residue on the central one (using only the atoms marked for superimposing), and apply the superposition transformation to the whole molecule in which the database hit resides. If there is now (in the rotated and translated database protein) a residue of the same type as the given neighboring residue at approximately the same place in space as the indicated neighbor, then this pair will be marked as a hit.

Don't worry about the stupidity of this algorithm. In reality it works a little bit differently, but that is way too difficult to explain.

All hits found are stored in a group, sent to the MOVIE area, and upon request sent to a mol-item. This is done because the neighboring information is not stored in the group, so if you later want to look at this contact group again, you will have to redo the whole option.

'Approximately being at the same place' is defined as the average distance between the equivalent atoms being less than a certain cutoff. The default value is 4 Angstrom. Use the PARAMS option to change this cutoff.
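The 'approximately at the same place' test can be sketched in Python as follows. The sketch assumes that the database protein has already been rotated and translated, so that both atom lists are in the same coordinate frame, and that the atoms are given in equivalent order. The function names are made up for this example; only the 4 Angstrom default comes from the text above.

```python
import math

def average_distance(atoms_a, atoms_b):
    """Mean distance between equivalent atoms; both lists hold (x, y, z) tuples."""
    if not atoms_a or len(atoms_a) != len(atoms_b):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(math.dist(a, b) for a, b in zip(atoms_a, atoms_b))
    return total / len(atoms_a)

def approximately_same_place(atoms_a, atoms_b, cutoff=4.0):
    # cutoff in Angstrom; 4.0 is the default value mentioned in the text.
    return average_distance(atoms_a, atoms_b) < cutoff

example_neighbor = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
close_candidate = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)]
far_candidate = [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)]

print(approximately_same_place(example_neighbor, close_candidate))
print(approximately_same_place(example_neighbor, far_candidate))
```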

Screening with the Dayhoff matrix (DOSCAN)

The command DOSCAN can be used to get hits out of the database that give a minimal score against a stretch of residues in the soup when compared using the Dayhoff matrix.

You will be prompted for a range and a minimal score. All stretches in that range will be compared with all stretches in the database. A database stretch is a hit if, when compared with the stretch in your molecule, it contains no mutations that score below the given Dayhoff cutoff. Every time such a hit is found, one is added to the count of the protein in which that hit was found. At the end a list with the number of hits per protein is shown.
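The counting scheme described above can be sketched in Python. Everything specific in this sketch is an assumption: toy_score is a small stand-in for a substitution matrix and does not contain real Dayhoff values, sequences are plain one letter strings, and the protein labels are invented. It only shows the logic of "no position scores below the cutoff, so add one to that protein".

```python
from collections import Counter

# Toy stand-in for a substitution matrix (NOT real Dayhoff scores).
CONSERVATIVE = {frozenset("KR"), frozenset("DE"), frozenset("IL"), frozenset("ST")}

def toy_score(a, b):
    if a == b:
        return 2
    if frozenset((a, b)) in CONSERVATIVE:
        return 1
    return -1

def windows(sequence, length):
    """All stretches of the given fixed length in a sequence."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]

def doscan_counts(query, database, length, cutoff, score=toy_score):
    """Count, per protein, the stretches with no position scoring below cutoff."""
    counts = Counter()
    for query_stretch in windows(query, length):
        for protein, sequence in database.items():
            for db_stretch in windows(sequence, length):
                if all(score(x, y) >= cutoff for x, y in zip(query_stretch, db_stretch)):
                    counts[protein] += 1
    return counts

database = {"protA": "AARELTAA", "protB": "GGGGGG"}  # invented sequences
counts = doscan_counts("KDLS", database, length=4, cutoff=1)
for protein, n in counts.items():
    print(protein, n)
```

With a cutoff of 1 the stretch RELT in protA is accepted against KDLS because every position is identical or a conservative replacement in the toy table; raising the cutoff to 2 would demand identity at every position.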

Saving all groups (SAVGRP)

The command SAVGRP will save all present groups in a file. You will be prompted for the name of the file in which to store the groups.

Restoring groups from file (RESGRP)

The command RESGRP will restore the groups from a file. You will be prompted for the name of the file from which to restore the groups. This file must of course have been created earlier with the SAVGRP option.

Parameters (PARAMS)

The command PARAMS will, as usual, bring you to the menu from which you can change the parameters for the SCAN3D related options. The following parameters are available:

What to show of the hit itself (HITBBF)

The SCNCON and related options in the SCAN3D menu allow for an elaborate scheme of visualization possibilities. Some flags refer to the central residue (the hit itself), others to the contacting residue(s) (the environment).

HITBBF determines what to show of the hit itself.

0 = Show the three middle amino acids of the hit completely
1 = Show the whole hit completely (all amino acids, all atoms)
2 = Show backbone of full hit plus side chain of center
3 = Draw only the tagged atoms for the center of hit
4 = Draw only side chain of middle aa of hit

What to show of the hit's environment (HITNAF)

The SCNCON and related options in the SCAN3D menu allow for an elaborate scheme of visualization possibilities. Some flags refer to the central residue (the hit itself), others to the contacting residue(s) (the environment).

HITNAF determines what to show of the contact partner.

0 = Show the residue completely
1 = Show only the side chain
2 = Show only the main chain
3 = Draw only the tagged atoms

SCNCON on one residue or whole stretch (SCONTP)

The CONTYP parameter is rather futuristic. It allows you to put constraints on contacts in the SCNCON option on multiple residues at the same time. This will very often lead to no hits being found at all. Set this flag to 1 to enable multiple constraints.

Flag to determine where SCNGRN hits should go (HITOUT)

The SCNGRN option puts contact hits on the screen. If you set the HITOUT flag to 1, the hit administration will also be written to a file.

Statistics on groups (SCNSTS)

The command SCNSTS brings you to the menu that allows for the evaluation of groups.

Going to the SCAN3D parameter menu (PARAMS)

As usual PARAMS brings you to the menu to modify SCAN3D parameters. SCNSTS and SCAN3D share this parameter menu.

Activating more commands (MORE)

Not all commands are immediately active in the SCAN3D menu. By typing MORE, more commands will be activated. (Use LESS to deactivate the extra commands again).