GROMACS related commands (GROMACS)

Introduction.

The main menu option GROMACS brings you to the WHAT IF interface ("W(H)AG") to a popular molecular dynamics package called GROMACS. From this menu you can run several GROMACS related commands, ranging from a simple energy minimisation to starting a full scale molecular dynamics simulation including an arbitrary number of ligands, which from now on generically are called drugs. To run MD with GROMACS via the WHAT IF interface, three programs need to work together, and understand each others atom order and nomenclature:

WHAT IF understands proteins and strictly uses the coordinates as provided in the PDB-file as well as IUPAC nomenclature and atom order.

GROMACS uses its own, non-standard atom names and atom order. Also, GROMACS cannot deal with every atom type (e.g. metal ions), and it has some quircks like not understanding that aspartic acid can be (must be) protonated on the O-delta-1, or not knowing that histidines can be deprotonated. Also, GROMACS requires very elaborate descriptions, called topologies, for most ligands.

PRODRG sets up the drug topologies required by GROMACS for both the GROMOS96 and OPLS forcefield (support for CHARMM and AMBER99 is planned). PRODRG uses its own atom naming rules, atom order, and also changes the coordinates of the molecule to be parametrised. WHAT IF cares for these changes and updates the drug pdb files accordingly. Bound ligands are a special topic and much development has gone into that.

W(H)AG, finally, includes all individual topologies into a global system topology, sets all neccessary parameters and starts GROMACS. BR The basic philosophy behind the GROMACS menu is, that WHAT IF checks your structure(s), adds missing atoms and, through PRODRG, calculates topology parameters for any molecule, that GROMACS doesn't know about yet. After that, WHAT IF creates a W(H)AG control file, by default called "GROMACS.SCR", which the user can interactively edit. The file contains the options needed to properly set up, e.g., the protonation states of the titrable groups or which drug topologies to use. Many more parameters can be set and are described in detail below. The control file is then parsed by W(H)AG, the selected task will be set up and a subsequent GROMACS run will be started. After completing the run, W(H)AG performs some standard analyses on the trajectory. WHAT IF can read the results back in. Normally the result is an energy-minimised structure that can overwrite the coordinates in the soup, or a molecular dynamics trajectory that can be viewed or analysed.

Note: If you want to work with GROMACS, you need to use the proton-TOPOLOGY file for WHAT IF prior to calling WHAT IF. You do that by copying the file TOPOLOGY.H from WHAT IF's dbdata directory to the local directory, in which you currently work.

GROMACS fails bitterly on proteins with chain-breaks. So WHAT IF will not even send things with chain breaks to W(H)AG/GROMACS. In future versions there might be an option of loop modelling ro remedy this limitation.

Remember to be self-critical about the outcome of MD simulations and do comparison runs! Also make use of the built-in automatic analyses, which are performed at the end of each run. The files in question start with the prefix "SUBDIR/analysis.whag" (see Analyses/Trouble Shooting).

The combination of WHAT IF, PRODRUG and, through W(H)AG, GROMACS allows you to avoid common pitfalls when setting up MD runs as well as opening up a very fast way (WHAT IF and W(H)AG are completely scriptable) of investigating mutation effects, effects of salt concentrations and drugs (etc.). W(H)AG also creates run input files for as many CPUs as specified, to easily start your simulations - manually - on your favourite cluster (see NCPU).

The main options

The main options in the GROMACS menu are FASTEM, FASTSA, and FASTMD. These perform an energy minimisation, a simulated annealing and a molecular dynamics run, respectively.

The FAST** options start by extensively checking all kinds of things such as missing side chains, wrong bond lengths, positions of hydrogens in the total hydrogen bonding network. In case of ambiguous or unsolvable situations you will get warnings, which should be carefully checked. Make sure to double-check in cases where you get warnings that a given issue is assumed to be "ok" !

The FAST** options assume that your whole SOUP is one normal situation. If you have superposed things, residues/waters/etc from multiple PDB files, multiple docked drugs, or other things that should not be present in a PDB file, the FAST** options might fail.

The FAST** options will only use the first structure of NMR ensembles!

Energy minimisation (FASTEM)

The command FASTEM will cause WHAT IF to prompt you for a range of residues. After extensive checking of this range, GROMACS will be started to perform that minimisation. You will be warned for many thinkable and unthinkable problems in your coordinates that would preclude a successful GROMACS run, and suggestions will be made about how to fix those problems. Many problems will even be fixed automagically.

Where possible, default parameters are being used. See the GROMACS parameter menu for a description of those parameters and for possibilities to change them.

    OUTPUT FILES:
           OUT_EM.PDB
           Prep.log
           EM.log

Energy minimisation by simulated annealing (FASTSA)

Simulated annealing can be switched on by specifying SATYPE. This can only be "single" at present. By specifying SANPNT, SATEMP and SATIME you can determine how many levels of temperature you want to reach and at what absolute runtime. Ex.: SANPNT:3; SATIME 0 100 1000; SATEMP 500 400 298. This will set up an SA run starting at 0ps with temperature SATEMP(1)=500K, arriving at SATEMP(2)=400K at SATIME(2)=100ps and continuing to reach SATEMP(3)=298K at SATIME(3)=1000ps;

Sometimes, a molecule gets stuck in a high-energy local minimum in energy space. In such cases it might help to shake the molecule a bit. Simulated annealing is just the 'right' way to perform that little shaking.

The command FASTSA will cause WHAT IF to prompt you for a range of residues. After extensive checking of this range, GROMACS will be started to perform a simulated annealing run. You will be warned for many thinkable and unthinkable problems in your coordinates that would preclude a successful GROMACS run, and suggestions will be made about how to fix those problems. Many problems will even be fixed automagically.

Where possible, default parameters are being used. Again, see the GROMACS parameter menu for a description of those parameters and for possibilities to change them.

    OUTPUT FILES:
           OUT_SA.PDB
           OUT_SA_TRAJ.PDB
           Prep.log
           SA.log

Molecular dynamics (FASTMD)

If you feed Gert a beer he tells you his clear opinion about MD. Nevertheless, even Gert agrees that molecular dynamics (MD) is often a good technique to answer questions about protein stability, ligand binding, potential mutations, membrane interactions, etc. And MD can aid the modeller in many ways that are explained, e.g., in the homology modelling course that you can find on my WWW pages.

The command FASTMD will cause WHAT IF to prompt you for a range of residues. After extensive checking of this range, GROMACS will be started to perform a molecular dynamics run. You will be warned for many thinkable and unthinkable problems in your coordinates that would preclude a successful GROMACS run, and suggestions will be made about how to fix those problems. Many problems will even be fixed automagically.

Where possible default parameters are being used. As with the other FAST** commands see the GROMACS parameter menu for a description of those parameters and for possibilities to set them to your desired values.

    OUTPUT FILES:
           OUT_MD.PDB
           OUT_MD_TRAJ.PDB
           Prep.log
           MD.log

Other options from within WHAT IF

Synchronisation of software (DOSYNC)

The command DOSYNC simply does the minimal amount of work needed to synchronize all atom name and number schemes between WHAT IF and W(H)AG. DOSYNC also does all molecular validation and correction that normally are done by the FAST** options. Even though DOSYNC was originally written as a debug-option, we will leave it in for people who want to do massively parallel MD experiments (i.e. drug docking etc) because DOSYNC takes very little time, and if DOSYNC runs happily, the FAST** options are also likely to run. Additionally, after a successful DOSYNC the command MAKMOL will create a "clean"GERT PDB File, albeit that the coordinates might not reflect the experimental data. Use DO_CAL, DO_HBO, DO_NOR and DO_WAT only with DOSYNC DO_CAL DOSYNC: Add missing atoms or not 245 DO_HBO DOSYNC: Optimize the hydrogen bonding network or not 246 DO_NOR DOSYNC: Normalize geometries or not 247 DO_WAT DOSYNC: Do water corrections or not GERT: ist eine zusaetzliche Beschreibung noetig?

For DOSYNC only: whether to add missing atoms (DO_CAL)

GERT

(DO_HBO)

(DO_NOR)

(DO_WAT)

Listing the synchronisation results (LSTATM)

The command LSTATM is mainly a debug option. It will list for you all atoms and their corresponding atom number in the GROMACS coordinate file. Atoms that have zero as pointer number simply aren't used by GROMACS. This can either mean that you did not select them, or that GROMACS doesn't want them (e.g. non-polar hydrogens).

Disabling non-polar hydrogens (GRORHH)

WHAT IF works with all protons, polar and non-polar alike. When you use the GMX force field, GROMACS uses only polar hydrogens. If you want to use WHAT IF to look at a movie you need to synchronize the WHAT IF soup with the GROMACS atom names and atom order (See above and under Synchronizing in the trajectory movie chapter). As WHAT IF has the non-polar hydrogens in the SOUP, but GROMACS (under GMX) doesn't use those, you need to get rid of them in WHAT IF. This can be done with the GRORHH command. You can get those non-polar hydrogens back at a later stage with the commands in the HBONDS menu. Obviously, you can also use the SAV*** and RES*** commands in the SOUP menu.

GROMACS project status (GRSTAT)

The command GRSTAT tells you what was your last used set of W(H)AG parameters used to activate GROMACS. It also tells you if there is presently some GROMACS job running. GERT this actually only prints GROMACS.SCr to the screen EOF_GERT

Parameters for GROMACS (PARAMS)

The command PARAMS brings you to the menu from which you can change the parameters for GROMACS. You can change the following parameters:
    DO_CAL  DOSYNC: Add missing atoms or not
    DO_HBO  DOSYNC: Optimize the hydrogen bonding network or not
    DO_NOR  DOSYNC: Normalize geometries or not
    DO_WAT  DOSYNC: Do water corrections or not
    DOSYNC  Solely perform molecular validation and correction done by the FAST**
    DSTBOX  Minimal distance between solute and box in nanometer
    FIXCAS  Restraining C-alphas in MD run
    FIXFRC  Degree of restraining by FIX... options
    FIXHST  Should helix and strand C-alphas be restrained in MD runs
    FIXMAN  Determine `by hand` which residues are restrained in MD runs
    FORCEF  Which force field should be used
    KEPWAT  Keep waters in trajectory file or not
    LEVUSE  Degree of WHAG-interactivity
    MDSTEP  Number of pico-seconds in MD run
    NACL    Concentration of NaCl in the water in milliMol
    NCPU    Maximal number of CPUs to be used in MD simulations
    OUTFRQ  How much trajectory output per pico-second
    PDBXTC  Choice of trajectory format (PDB, XTC)
    SANPNT  Number of points in simulated anealing
    SATEMP  Temperatures in simulated anealing
    SATIME  Time points in simulated anealing
    SATYPE  Simulated anealing 'type'
    SHOPAR  Show the values of all parameters
    STEPS   Maximal number of STEPS in energy minimisation
    TEMP    Temperature of MD simulations
    TRAJEC  Get trajectory file or not
    USEWAT  Adding bulk water to simulation box - recommended 
    VERBOS  Degree of WHAG verbosity interface
The remainder of this section describes these parameters and explains the range of values you can choose from.
At the end of each description the corresponding WHAG parameters are listed. These are to be used, after lauching one of the FAST** commands from the GROMACS menu.

DSTBOX Defines distance between solute (e.g. protein) and box (DSTBOX)

DSTBOX is the shortest distance between any solute molecule and the edge of the waterbox. This parameter is ionly applicable if you use bulk water. The default is 1.0 nm. Accepted values are 0.4-4.0 nm. Value ..... = 1.000 A minimum of 1.0nm is recommended to avoid artefacts (see periodic image convension).
Corresponding WHAG parameter: DIST_SOL_BOX

Experts option: Fix alpha carbon positions (FIXCAS)

This parameter determines whether alpha carbons will be forced to stay near their original position. This option works on ALL alpha carbons. Possible values are NONE, LOW, MEDIUM, HIGH. The default is NONE.

FIXMAN has priority over FIXCAS.

Experts option: Set force to be used in restraint simulations (FIXFRC)

This parameter determines which force is applied to the atoms chosen with FIXCAS, FIXHST or FIXMAN. The selected atoms will be forced to stay near their original position. Possible values are "NONE", "LOW","MEDIUM","HIGH". The default is NONE.
Corresponding WHAG parameter: DFIX_FORCE ("NONE", "LOW","MEDIUM","HIGH")
If FIXCAS, FIXMAN or FIXHST are used DFIX_FORCE will be set "B-FACTOR".
JUERGENCHECKME: This will override the WHAG parameter POSRES.

Experts option: Restraining C-alpha atoms in helix and strands (FIXHST)

The selected atoms will be forced to stay near their original position. Possible values are NONE, LOW, MEDIUM, HIGH. The default is NONE.

Experts option: Selecting the atoms to be restraint manually (FIXMAN)

This parameter will cause WHAT IF to interactively prompt you for ranges on which to apply HIGH, MEDIUM, or LOW restraining forces, respectively.

Choosing the force field (FORCEF)

The FORCEF pararameter allows you to choose between force fields. Your choices are GROMOS96 (0, which is the default, and 1 for OPLS). GROMOS96 does not use explicit hydrogen atoms except for the non-polar hydrogens. OPLS includes all atoms and is the default.
Corresponding WHAG parameter: FORCEFIELD ("GMX", "OPLS")

Experts option: Keeping TIP4P dummy atoms (KEPTP4)

This is mainly intended to be for script use and should not be switched on, unless you know what you are doing. If set to 1, the TIP4P dummy atoms will not be deleted from the final output file (e.g. OUT_MD.PDB)! Note, that in SUBDIR/ all necessary files to extend/rerun the W(H)AG run, regardless of the settings of this switch.
Corresponding WHAG parameter: KEEP_TIP4P ("YES", "NO")

Keep waters in trajectory file (KEPWAT)

Determines, whether bulk water will be included in the final output file. KEPWAT has two values: "YES" and "NO". KEPWAT depends on USEWAT and is ignored when USEWAT is set to "NONE" or "XRAY".

When USEWAT is set to "BULK", KEPWAT determines what is written in the trajectory file. With KEPWAT set to "YES", everything is written in the trajectory file, including the bulk waters. With KEPWAT set to "NO", only what you put in yourself is written into the trajectory file (so only macromolecules if USEWAT was set to "NONE", or macromolecules and crystal waters if USEWAT was set to "XRAY".

Corresponding WHAG parameter: KEEP_WATER ("YES", "NO")

W(H)AG interactivity (LEVUSE)

This switch is intended to be used in script mode with no interactivity. Set to "0" for script mode and to "1" for normal interactive sessions

Steps in MD (MDSTEP)

This parameter determines the number of ps in the continuation MD-run. (1-10000). Default = 1.

WARNING. MDSTEP is the number of pico-seconds and not the number of steps!

Corresponding WHAG parameter: DURATION 

Setting Salt Concentration (NACL)

When using BULK water, this option allows to add NaCl ions to reach the specified concentration (physiological conditions are around 154 mM)
Corresponding WHAG parameter: CONC_NACL 

Running GROMACS in parallel (NCPU)

GROMACS is a very powerful MD package and can be run on many processors using the message passing interface (MPI). Just set NCPU to the number of CPUs in your system.

If you chose a number higher than the actual number of cpus in your system, NCPU will be automatically decreased. In SUBDIR/ a X_topol.tpr file will be created a so-called run input file for GROMACS. This file will be compiled with the originally specified number of CPUs to be run on any cluster later.

Choose NCPU to be a multiple of 2, since every time step two fourier transformations are performed and these cannot be split over an odd number of cpus.

Corresponding WHAG parameter: NCPU 
GERT

Snapshot frequency in trajectory file (OUTFRQ)

Allows you to specify the frequency of writing snapshots to the trajectory file. This is in units `per pico second'. The default is 0.25. So, if you want a finer grained movie choose a higher value, a value often used for trajectories around 100 ns is 1.
Corresponding WHAG parameter: OUTFRQ 

Output format (PDBXTC)

GROMACS uses a special compressed format (.xtc), that is considerably smaller than standard multi-model PDB file. The PDBXTC flag allows you to choose between this format and the multi-model PDB format.
Corresponding WHAG parameter: OUTPUT_FORMAT 

SANPNT

Corresponding WHAG parameter: SANPNT 

SATEMP

Corresponding WHAG parameter: SATEMP 

SATIME

Corresponding WHAG parameter: SATIME

SATYPE

Corresponding WHAG parameter: SATYPE 

Listing all parameters for GROMACS (SHOPAR)

Displays all settings available in GROMACS/PARAMS

Number of energy minimisation steps (STEPS)

This parameter determines the maximal number of steps in an energy minimization. If the minimization is completed before this number of steps is reached, minimization is stopped earlier. STEPS can range from 1 till 500.
Corresponding WHAG parameter: PREMD_EM_DURATION 

Simulating temperature (TEMP)

This parameter determines the temperature during an MD-run (1.0-999.9), default 300 K. Please talk with an expert before playing with this parameter.
Corresponding WHAG parameter: TEMPERATURE 

Write a trajectory file or not (TRAJEC)

This parameter determines whether trajectories are written out during an MD run or not. Trajectories are always written formatted, except for the optional XTC coordinate trajectory.

Set to "0" to only obtain the last structure from the simulation and to "1" for a complete time series (trajectory) of coordinates OUTFRQ determines how many snapshots are written if TRAJEC is set to "1".

Corresponding WHAG parameter: GERT 

How to include waters (USEWAT)

This parameter has three possible values: "NONE" (0), "XRAY" (1), "BULK" (2). NONE means that GROMACS runs in vacuum. "XRAY" means that GROMACS runs in vacuum, but with the crystal waters present, "BULK" means that GROMACS runs in a water box(with water molecules explicitly present, an implicit solvation scheme is not in this version).

See further also under KEPWAT.

Warning. MD runs without BULK water here do not make much sense, since the alternative is running in vacuum. It is therefore recommended to include bulk water in all MD simulations.

W(H)AG verbosity parameter (VERBOS)

The VERBOS flag determines the verbosity of W(H)AG. Allowed values are LOW (0) and HIGH (1).

W(H)AG parameters and fine tuning

The interface between WHAT IF and GROMACS is called W(H)AG, maintained and developed by Juergen Haas, based on work started by Oliver Lange. The idea is that W(H)AG mediates between WHAT IF and GROMACS. WHAT IF hands over coordinate files and control files via W(H)AG to GROMACS, where the W(H)AG control files describe what WHAT IF wants GROMACS to do with those coordinates. W(H)AG also makes sure that the GROMACS output files are named according to what WHAT IF expects them to be called.

This remaining of this chapter describes the commands that WHAT IF can write into these W(H)AG control files.

Overview of the W(H)AG control parameters

The following W(H)AG control parameters are supported:
Parameter        Type         Explanation


CONTINUE         YES/NO       Continuation after a previous PREPARE run
                              (set automatically)

CONC_NACL        Integer      Concentration (in mMol) of additional ions 
                              (not the atoms to neutralize the bulk,
                              these are added automatically)

DISRES           Filename     File defining distance restraints (e.g. from NMR
                              experiments)

DIST_SOL_BOX     Real         Defines distance between solute (e.g. protein) and box 
                              (in nm, default 1.0, also valid for vacuum simulations)

DRUGDIR         Directory     Name of Directory where your manually optimised 
                              topologies for drugs reside in. Beware that the naming 
                              of the files must be matching the order found in the
                              original PDB file. The topology and the structure files
                              for the first drug must be called DRG1.ITP/DRG1.PDB 
                              and for the second DRG2.ITP/DRG2.PDB (etc.)!
                              When selecting this option, no PRODRG topologies will 
                              be used. This is implemented to benefit from manually 
                              parametrised drugs and due to the suboptimal charges 
                              PRODRG can only deliver at present.

DSSP            YES/NO        At the end of the run, this option will activate the 
                              DSSP program to be run over the whole trajectory

DURATION         Integer      Number of picoseconds in MD or SA run and number of 
                              steps for EM.

DRUG_FILES       List of      Names of drug files x for which 
                 filenames    x.ITP and x.PDB exist.

EXTRA_BONDS      Filename     List of bonds which are to be included
                              into the system additionally.

DFIX_FORCE        Selection    Force to keep all atoms in place. Values
                              NONE, LOW, MEDIUM, HIGH, B-FACTOR.
                              Applied to all the atoms given in the file
                              PDB_FILE    

KEEP_WATER       String       Keep waters in GROMACS output trajectory for viewing 
                              in WHAT IF: YES or NO

NCPU            Integer       Indicates a parallel execution of GROMACS if larger than 1.

OUTPUT_STRUCT    Filename     final coordinate file after EM,MD or SA

OUTPUT_TRAJ      Filename     Name of trajectory file

OUTPUT_FORMAT    Filename     Format of trajectory: PDB or XTC

OUTFRQ           Integer      Number of MD steps per trajectory step

PARAM_FILE       Filename     EXPERT: use this .mdp file for GROMACS run
                              parameters (see also DRUGDIR)

PDB_FILE         Filename     Name of input coordinate file for GROMACS

POSRES           YES/NO       Should position restraint run be included, usually 
                              used to relax bulk water.

PROTONATION_STATES Filename   File with a list of LYS, ASP, GLU and HIS
                              residues and their respective 
                              protonation states(0 is unprotonated)

TEMPERATURE      Integer      (Initial) temperature of SA/MD run

SATYPE           string       Can only be "single" at present and is necessary to 
                              switch on simulated annealing.

SANPNT           Integer      Defines how many levels of temperature will be 
                              simulated at during a simulated annealing run.

SATEMP           Array of Int temperature in K for simulated annealing. For 
                              Explanation and Example see SATIME.

SATIME           Array of Int Absolute time at which the next level of temperature
                              will be reached. Depending on how many SANPNT were 
                              defined!(the same holds for SATEMP only that 
                              temperatures are expected)
                              Ex.: SANPNT:3; SATIME 0 100 1000; SATEMP 500 400 298.
                              This will set up an SA run starting at 
                              0ps with temperature SATEMP(1)=500K, 
                              arriving at SATEMP(2)=400K at SATIME(2)=100ps 
                              and finally reaching SATEMP(3)=298K at SATIME(3)=1000ps

SUBDIR           Dirname      Name of scratch dir for GROMACS files
                              (Must be subdirectory of working directory for which you
                              have write permissions)

TASK             EM/SA/MD/    Task to perform
                 PREPARE

USE_HYDROGEN    YES/NO        Use GROMACS routines to determine protonation states of 
                              HIS, LYS, ASP, GLU and termini.

VERBOSE          YES/NO       Whether GROMACS should be verbose: YES or NO

VERBOSE_FILE     Filename     File in which GROMACS writes messages

WATER            XRAY/BULK    Options of how water is included in simulations: 
                              none - deletes all water atoms prior to simulation
                              XRAY - use water present in XRAY structure 
                              BULK - add bulk water to simulation box
                              (see DIST_SOL_BOX) 




Note, that all of the W(H)AG parameters can be set by the WHAT IF user, but not every stage of operation is sensitive to all parameters specified. Parameters specified in the PREPARE stage of W(H)AG might need to be entered again for the MD/EM/SA stage. WHAT IF will supply you with a standard set of options and include the required switches when, e.g., encountering drugs.

Control file details

The default name of this file containing the W(H)AG parameters is "GROMACS.SCR".

Setting the stage - 1

Type of task: EM, SA, MD or PREPARE PREPARE means only the simulation setup is performed and will be the default primary step. The purpose of this two-stage approach for WHAT IF is to find out how GROMACS numbers and names the atoms.

Setting the stage - 2

Applies to EM, SA, MD To continue the run after the preparation step set CONTINUE="YES". (Don't change anything in SUBDIR/ between the runs).

NOE-like distance restraints

this file determins distance restraints, for instance from NOE experiments. It has the following format:
AtomIndex1 AtomIndex2 lb ub LOW/MEDIUM/HIGH
LOW/MEDIUM/HIGH gives the strength of the spring which will be interpreted as 0.1 / 1 / 5 * 1000kj/nm2. "lb" "ub" are parameters determining the shape of the restraining potential. if the distance is between "lb" and "ub" no force is applied. If distance leaves the intervall ("lb","ub") the force will increase linearly with the spring constant given above.

The example: 10 22 0.1 0.2 MEDIUM

will introduce a medium strength restraining force between atom 10 and atom 22 with the distance lying between 0.1 and 0.2 without being affected.

Assigning restraint forces

WARNING. WHAT IF does not use this W(H)AG facility. If you want to use this, you must enter it in the GROMACS.SCR file by hand when this file comes up in the editor during execution of DOSYNC, FASTEM, FASTSA, or FASTMD.

The restraint force is applied to all atoms in the input-pdb file with nonzero B-factor. Depending on the setting of the switch DFIX_FORCE we use the same constant for all restrained atoms or the respective number of the B-factor for the strength of the restraint. The settings where we ignore the numerical value of the B-factor and use the B-factor only as an on/off switch are:

LOW 
MED 
HIGH 
which will be 0.1, 1.0, or 5.0 respectively in units of 1000 kJ/mol/nm^2.

Other possible settings are:

NONE       means there are no restraints applied regardless the values 
           of the b-factors.
B-FACTOR   leads to an application of a spring constant to every atom 
           determined by its B-factor times 1000 kJ/mol/nm^2.
If you want to use the hydrogen positions of WHAT IF or apply position restraints, you have to do a TASK=PREPARE run of W(H)AG and write the correct hydrogen positions into the PDB file you get back from W(H)AG. To apply position restraints change the b-factor in the returned file from 0 to some positiv value. Depending on the setting of DFIX_FORCE this B-factor will determine the applied restraining potential or it's numerical value will be ignored and an equal restraining strength is used for all postion restraints, i. e. the B-Factor is only used as a switch in that case.

Including a position restraint run

Before any MD run it is generally recommended to perform a position restraint run, i where all heavy-atoms are kept in position to relax the system. Set POSRES to "YES", note, that if the WHAT IF commands FIXYYY (YYY=CAS,FRC,HST,MAN) will override this switch and no general position restraint run will then be performed. Instead, the position restraints encoded into the B-factors will be included.

Setting the input structure

This determines the filename of the input structure file. This file holds all protein fragments passed on to GROMACS.

Protonation states file

This file holds a list of protonation states. This file contains a list of titrable groups; first all LYS residues, then all ASP residues, then all GLU residues and last but not least the HIS residues. They have to be in the same order as in the PDB_FILE. The format is
LYS 3 0
where the first is the residue name, the second is the residue number starting with 0 and the third is the protonation state in the case of LYS, ASP and GLU:
    0 deprotonated
    1 protonated
in the case of HIS it is:
        0 H on ND1 
        1 H on NE2 
        2 H on ND1 and NE2 
        3 Coupled to Heme 
Note, that GROMACS cannot deal with unprotonated histidines.

Setting the output structure

The last set of coordinates computed by GROMACS will be returned in this pdb file; e.g. the structure after DURATION MD steps or after an energy minimization. In the case of EM this is the main result, in the case of MD or SA the trajectory is more important.

Setting the output trajectory

During an MD run, GROMACS will write a trajectory file, specified by OUT_TRAJ (for possible formats see OUTPUT_FORMAT). This file can later be viewed by the WHAT IF trajectory movie options or viewers such as pymol and VMD. It contains snapshots taken during the GROMACS computation every OUTFRQ steps.

Setting the format of the output trajectory

OUTPUT_FORMAT determines the format of the trajectory file OUTPUT_TRAJ. Valid values are:
PDB       (as an multi-model PDB file, needs a *lot* of disk space!)
XTC       (a compressed GROMACS specific format).

Setting the output frequency

The trajectory file will contain snapshots taken during the GROMACS computation every OUTFRQ picoseconds.

Handling water molecules

If this parameter is set to NO (default) all the automatically added "BULK" water and the bulk ions will not be written in the output and trajectory files. If there had been some "XRAY" water molecules in the input file, and they were used by setting the USEWAT parameter appropriately in WHAT IF, their position will be returned regardless of the setting of KEEP_WATER.

Setting the verbosity

W(H)AG can be very noisy and tell you about every step it does. Say YES here, if you need diagnostics. All the messages are written to the file VERBOSE_FILE. If the setting of VERBOSE is NO only the very basic information is written to the VERBOSE_FILE.

Setting the file name

All messages are written to the file VERBOSE_FILE. If VERBOSE is set to "YES" this file might become very large.

Parallel simulations

When using WATER is set to "BULK", setting NCPU larger than 1 will greatly enhance the performance and shorten the runtime. This is achieved by GROMACS supporting the MPI (Message Passing Interface). In case you set NCPU to a number larger than the CPUs in your system, a second run input file (xx_topol.tpr, where xx is the number of CPUS) will be created in SUBDIR. This will not be run locally, but is intended to be used on a cluster where the number of CPUs specified is actually present and available.

Setting the simulation run length

Number of picoseconds in MD or SA run and number of steps for EM.

Setting the W(H)AG/GROMACS working directory

W(H)AG creates many files while examining your system to prepare it for GROMACS. These files are all stored in the subdirectory SUBDIR. You don't have to look at them. But if you are curious...

You can delete the SUBDIR after you finished a W(H)AG computation, all the important information will be in your working directory.

PLEASE NOTE: the SUBDIR has to be a subdirectory of your working directory to ensure proper execution of W(H)AG.

WHAT IF uses "SCRATCH" as the default name for SUBDIR.

Adjusting the NaCl concentration

Adjust the salt concentration (in mMol/L) by adding NaCl to the bulk water, this option requires WATER to be set to "BULK" (see WATER below).

Handling of water molecules and solvent

This parameter determines, if solvent (water) is to added to the system. If any water molecules are given in the input file, they are used regardless what this parameter says. If the choice is "BULK" solvent water molecules are added to fill the simulation box additionally to the existing water. The solvent atoms come from an pre-equilibrated water box. For every water molecule of the pre-equilibrated water box GROMACS decides if it has space in the system and if so copies it into the simulation box. If "BULK" is chosen W(H)AG automatically adds counter ions, to balance a possible overall system charge, to the solvent if you want to do a MD or SA simulation. This is a technical necessity, due to the chosen algorithm computing the coulomb interactions.

If "XRAY" is chosen W(H)AG performs a simulation in vacuum which uses different algorithms so that no counterions are necessary. Running in vacuum results in shorter runtimes, however, "Bulk" is the recommended setting. In combination with adjusting NCPU to the number of desired CPUs and due to enhanced algorithms for water interactions GROMACS achieves a high performance per CPU. Currently there is no option for running implicit solvent simulations.

If you choose BULK to simulate at a certain NaCl concentration, you can add more ions to the solvent, by specifying the desired concentration with the parameter CONC_NACL.

Distance between solute and box

Defines distance between solute (e.g. protein) and box (in nm, defaults to 1.0nm). This is also valid for vacuum simulations. It is desirable (due to the periodic image conventions used that the protein of one box and its imaginary neighbour boxes do not interact with one another. A "safe" minimum distance is 1.0nm, if unsure increase slowly while taking into account that in the preferred "BULK" setting for WATER, this amounts to many more solvent molecules slowing down the simulation. W(H)AG currently does not support other box shapes than cubic/rectangular.

Choose the mode of determining hydrogen positions

This determines if W(H)AG asks GROMACS to find positions for hydrogen atoms in the protein structure itself (NO) or to USE the hydrogen positions given in the PROTONATION_STATES file (YES).

The drugs to be included in the run

A list of filenames. For every name X in this list W(H)AG expects to find the two files X.PDB and X.ITP in DRUGDIR. This are the structures and topologies of small molecules which cannot be understood by GROMACS as directly as protein structures. These topologies are currently prepared by PRODRG. See also DRUGDIR!

Defining new covalent bonds

Assuming your system is comprised of a protein input file and two drug files: DRUG_FILES DMSO SUGAR when asked, you can write the following to set additional bonds:
DMSO 3 MAIN 299      ; bind the third DMSO Atom to the 299th atom of the
             ; PDB_FILE

MAIN 299 DMSO 3      ; the sequence of the two atoms is not important

MAIN 1741 MAIN 1749  ; creates a non-canonical bond between two protein atoms

DMSO 39 SUGAR 23     ; bonds between drug fragments (you might be better
             ; off defining a single drug DMSO_SUGAR because then
             ; the bond will be parameterized better)
Be aware, though, that this is still a rather crude way of solving this problem. If you need reliable results on the EM/MD of residues with attached groups, you better get help from a real gromacologist.

Fine tuning of the simulation parameters and setup

For expert experts only... the MD parameters

For experts: here you can specify your own MDP file. The MDP file is the molecular dynamic parameter file. Here you can tell GROMACS which kind of algorithms should be used, for instance steepest descent or conjugate gradient for EM, or if it uses PME or cutoff to deal with coloumbic forces in the MD simulation. If you don't know what these things are, you probably don't want to use this option.

If this option is not used W(H)AG finds the MDP files under ../whatif/gromacs/whag/config/ if you want to make your own MDP file you can copy a suitable default file into your working directory and change the parameters as required.

If you then tell W(H)AG to use your private PARAM_FILE it will do so, but it will still change certain parameters like the number of steps to fit the setting of DURATION. That means all the options in the W(H)AG control file overrule settings in the PARAM_FILE. But there are not many paramaters, which are actually bound to be overruled, W(H)AG currently changes the following mdp parameters:

"annealing", "zero-temp_time", "annealing_temp", "annealing_time", "annealing_npoints", "tc-grps", "tau-t", "gen-temp", "ref-t", "xtc-grps", "xtc-grps", "tc-grps", "tau-t", "ref-t", "xtc-grps", "xtc-grps", "define"

A typical mdp file can look like:

;       This is the template file for
; an md simulation with Water as solvent
;
; VARIOUS PREPROCESSING OPTIONS           
title                                     =
cpp                                       = /lib/cpp
include                                   =
define                                    =
; RUN CONTROL PARAMETERS                  
integrator                                = md
; start time and timestep in ps           
tinit                                     = 0
dt                                        = 0.002
nsteps                                    = 100
; mode for center of mass motion removal  
comm-mode                                 = Linear
; number of steps for center of mass motion removal 
nstcomm                                   = 1
; group(s) for center of mass motion removal 
comm-grps                                 =
etc. 

EM minimisation before a MD/SA run

If you do MD or SA W(H)AG has to minimize the system before it can start a molcular dynamics simulation. If you want to tell W(H)AG how many steps minimization to perform you can specify this with this parameter.

Presently WHAT IF uses the default of 100 steps which is enough in most cases.

Be aware that this parameter is only used for the tasks SA and MD, and has nothing to do with the duration of FASTEM energy minimisation runs.

Setting the Temperature

This specifies the temperature to use in K, defaulting to 300 K. Note, that the forcefields used are not parametrised for high temperatures, hence interpret your results with care.

The output files

W(H)AG gives you output in the following files and you can specify their names in the control file.
OUT_STRUCT      the structure after a PREPARE run or a computation. 
OUTPUT_TRAJ     the trajectory file after a computation
VERBOSE_FILE    all the W(H)AG messages are written into that file
XX.log          XX currently stands for Prep, SA, MD and EM
(in the moment it compiles in such a way, that instead into this file information is written to the screen) Later: it will write some messages to the screen, but they can be routed to /dev/null, (they come from buried GROMACS code and it would be a hassle to supress them all) Note if you do MD you will get first the predicted time for the preparation EM (some minutes) and only later the predicted time for MD (can be very distant in time).

Analyses

After a completed run a set of analyses is performed, to allow the user to check the simulation and also interpret the outcome of a run more straightforwardly.

Note, that there are many powerful tools, already shipped with GROMACS, in the sub-directory gromacs/gromacs/bin of your WHAT IF installation. The GROMACS manual (http://www.gromacs.org) describes them and much more about simulations in general in detail.

Analysing the root mean square deviation

By default the root mean square deviation (rmsd) in respect to the start of the trajectory It is normal to see an increase in rmsd compared to the starting structure, which is still quite close to the xray structure. During this increase crystal packing artefacts are removed until after several nanoseconds (can take much longer, depending whether you started in a local minimum far away from the equilibrium structure), one slowly obtains a more equilibrated structure. Before interpreting your results, make sure that the trajectory you analyse is equilibrated as well as possible.

The output file: SUBDIR/whag.analysis.rmsd.xvg

Analysing energy related properties

Computing potential and kinetic energy, Temperature, Pressure-(bar) and the Temperature during simulation time.

The output file: SUBDIR/whag.analysis.energy.xvg

Analysing radius of gyration

Computing radius of gyration during simulation time, a large change indicates a structural transition.

The output file: SUBDIR/whag.analysis.rgyration.xvg

Analysing the number of hydrogen bonds

Computing the number of hydrogen bonds during simulation time, can be used, e.g., as a stability measure.

The output file: SUBDIR/whag.analysis.hbnum.xvg

Performing a principal component analysis

From a principal component analysis over the combined trajectories, after assembling and diagonalising the covariance matrix of atom pairs, the eigenvectors and eigenvalues are obtained. Projection of each individual trajectory onto the eigenvectors allows to analyse the most important modes found in the simulation. A linear interpolation of the first 10 (most prominent or essential) modes can be found in the extreme_X.pdb files (X runs from 1 to 10), each extreme projection containing 20 linearly interpolated structures between the two extremes found for this mode.

The output files of the PCA: SUBDIR/whag.analysis.eigenval.xvg; SUBDIR/whag.analysis.eigenvec.trr

The output files of the projection: SUBDIR/whag.analysis.proj.xvg; SUBDIR/whag.analysis.extreme_X.pdb, X runs from 1 to 10.

W(H)AG optional analyses - DSSP

Analysing secondary structure elements can take a while depending on the number of frames in your trajectory (set DSSP to yes).

The output files: SUBDIR/whag.analysis.scount.xvg; SUBDIR/whag.analysis.dssp.eps

W(H)AG summary

W(H)AG can work with fragments of proteins and other molecules which, in this manual, are called drugs. The protein fragments are given in the input file PDB_FILE and the drugs in separate pdb files the DRUG_FILES.

While your input before the PREPARE stage is in separate files for protein and drugs, the output will be in one long file. W(H)AG expects you to feed that file with changed coordinates and b-factors back in as new input file PDB_FILE, when you want to perform the actual computation. Now you set TASK=EM,MD or SA and CONTINUE=YES. W(H)AG will save a lot of information from the PREPARE stage in its SUBDIR, including the information about bonds, charges, etc. of the system. This information is expected to be untouched since the corresponding PREPARE run of W(H)AG.

Comings and Short Comings

W(H)AG can deal with everything you have the parameters for. For protein fragments W(H)AG will create its own topology and parameter files, by using the GROMACS tool pdb2gmx. W(H)AG uses whag_pdb2gmx, an extension of that tool. This does a fairly good job on proteins, but again for tricky cases experts in molecular dynamic simulations would carefully check the setup, before commencing a long simulation. If you have a good parameterfile for a protein in GROMACS format you can give this protein as a DRUG_FILE to W(H)AG and W(H)AG will use the model as it is. For drugs, this parameter file is created by a PRODRG, a part of WHAT IF which can deal with most of the small molecules and does a fairly good job, but for tricky cases the resulting model will be very bad, W(H)AG does not check the validity of your model!

To create bonds between protein fragments or drug molecules and protein you use the input file EXTRA_BONDS, the bonds created will be non-specific.

W(H)AG at the moment creates a bond between two aliphatic CH3 groups, even if the situation is very different. This means the parameters for bond strength, angle, dihedral, out of plane movement, etc. will be wrong.

Trouble-shooting W(H)AG

For every step you will find files in SUBDIR, the filename of which contains the command executed and a running number, together with STDERR and STDOUT. These are the files captioning the console output of the respective commands used to manipulate the simulation system.

When searching for possible reasons of W(H)AG/GROMACS problems, watch out for the console output files of programs like whag_pdb2gmx(initial setup), whag_grompp(final run input file preparation) or genion (handling of ions to be added). A log for every command sequence(Prep.log, SA.log, MD.log, etc.) is created after successfully completing one stage in the directory, where you started from. This sequence of shell commands can be used to understand, how W(H)AG prepares the simulations with GROMACS.

In case of LINCS warnings, the linear constraint solver (LINCS) failed to constrain bond lengths and angles to the desired values. These constraints are employed to ensure that, e.g., a C-H does not drastically change its length etc. Have a look at SUBDIR/md.log to find the atom numbers, where the LINCS warning was triggered to guide your trouble-shooting.

Should missing atoms have caused a failure, compare SUBDIR/soup.pdb with SYNC_IN.PDB (no drugs) and SYNC_OUT.PDB (drugs included). Sometimes badly resolved ligands from XRAY structures were modelled into the structure and some bond lengths might not match their requirements.

FAQ-Here you can find frequently answered questions:

Q:  Why does WHAT IF not start an MD with broken chains?
A: A chain break would either mean to end one chain and start a new one or to model in the missing aminoacids/atoms. In the first case you might observe different dynamics compared to what would be observed for the whole protein.

In the second case you have to some up with guesses for the missing coordinates, which can be from intruiging to impossible.

Q:  How can I restart a previous MD run?
A: Choose "TASK=RESTART", but make a backup copy before you restart the run. W(H)AG will not continue runs, where it cannot find previously saved "energy.edr" (system energies) and "traj.trr" (full precision trajectory at a low output frequency) files.
Q:  My run "crashed". How can I find out if the problem was with WHAT IF, PRODRG, 
    W(H)AG, or GROMACS ?
A: Make sure you started a run with the W(H)AG parameter "VERBOSE" set to "YES"! Any error message before the STATEMENT "WHAG is now active!!" is most likely from WHAT IF. If there are any drugs present also PRDRG may report some errors, they can be found in files called like "PRODRUGOUTXXX", where XXX is the number of the drug, assigned by WHAT IF. For WHAT IF see also the file called "GROOUT.LOG", where the individual actions of a FAST**command are logged to. W(H)AG and GROMACS errors are likely to be logged in files like "WHAG_STDERR_editconf.YY" located in the directory SUBDIR/, where YY is an running number assigned, when W(H)AG calls one the GROMACS programs or any other system command (e.g. awk scripts). Every system call is logged to files called "Prep.log", "EM.log", "MD.log" or "SA.log". These files, however are currently written at the end of a run, there might be a file called "WHAG_startgmx.csh", which is written during the run.
Q:  My drugs behave oddly. It seems to be very floppy and fluctuates a lot.
A: That is expected with the OPLS forcefield, e.g., the aromatic rings are parametrised to be flexible (compare to W or Y in your simulation).
Q:  How long do I have to simulate before the MD is finished? 
A: Frankly, it strongly depends on the phenomenon you are investigating. Make sure you allowed enough time for relaxation when starting form xray structures or simulating in various temperature regimes
Q:  How high can I go in temperature in MD runs?
A: You can set very high temperatures, just remember that the force fields are parametrised at a certain temperature. You can use simulated annealing runs to try and overcome apparent barriers, but it is suggested to use snapshots taken from these runs to start individual MD runs. Many small runs with slightly different starting conditions might be advantegeous over a single long run JH genseed flag EOFJH
Q:  WHICH forcefield shall I choose? 
A: In principle it should not matter, as long as the force field can provide the parameters for the atoms included, or you find a tool which does (e.g. PRODRG). In practice, however, there are reports(rules of thumb at most) that one finds rather beta strand-like structures with GROMOS96, more alpha helix with AMBER and OPLS/CHARMM should be in between.JUERGENcite bert's paper
Q:  GERT In some cases, while in script mode WHAT IF pops up editor windows and asks
    for confirmation about overlapping atoms etc. 

A1: To suppress the editor window use "setwif 873 1".

A2: In order to confirm questions with "Y" use "setwif 622 0"