WHAT IF understands proteins and strictly uses the coordinates as provided in the PDB-file as well as IUPAC nomenclature and atom order.
GROMACS uses its own, non-standard atom names and atom order. Also, GROMACS cannot deal with every atom type (e.g. metal ions), and it has some quircks like not understanding that aspartic acid can be (must be) protonated on the O-delta-1, or not knowing that histidines can be deprotonated. Also, GROMACS requires very elaborate descriptions, called topologies, for most ligands.
PRODRG sets up the drug topologies required by GROMACS for both the GROMOS96 and OPLS forcefield (support for CHARMM and AMBER99 is planned). PRODRG uses its own atom naming rules, atom order, and also changes the coordinates of the molecule to be parametrised. WHAT IF cares for these changes and updates the drug pdb files accordingly. Bound ligands are a special topic and much development has gone into that.
W(H)AG, finally, includes all individual topologies into a global system topology, sets all neccessary parameters and starts GROMACS. BR The basic philosophy behind the GROMACS menu is, that WHAT IF checks your structure(s), adds missing atoms and, through PRODRG, calculates topology parameters for any molecule, that GROMACS doesn't know about yet. After that, WHAT IF creates a W(H)AG control file, by default called "GROMACS.SCR", which the user can interactively edit. The file contains the options needed to properly set up, e.g., the protonation states of the titrable groups or which drug topologies to use. Many more parameters can be set and are described in detail below. The control file is then parsed by W(H)AG, the selected task will be set up and a subsequent GROMACS run will be started. After completing the run, W(H)AG performs some standard analyses on the trajectory. WHAT IF can read the results back in. Normally the result is an energy-minimised structure that can overwrite the coordinates in the soup, or a molecular dynamics trajectory that can be viewed or analysed.
Note: If you want to work with GROMACS, you need to use the proton-TOPOLOGY file for WHAT IF prior to calling WHAT IF. You do that by copying the file TOPOLOGY.H from WHAT IF's dbdata directory to the local directory, in which you currently work.
GROMACS fails bitterly on proteins with chain-breaks. So WHAT IF will not even send things with chain breaks to W(H)AG/GROMACS. In future versions there might be an option of loop modelling ro remedy this limitation.
Remember to be self-critical about the outcome of MD simulations and do comparison runs! Also make use of the built-in automatic analyses, which are performed at the end of each run. The files in question start with the prefix "SUBDIR/analysis.whag" (see Analyses/Trouble Shooting).
The combination of WHAT IF, PRODRUG and, through W(H)AG, GROMACS allows you to avoid common pitfalls when setting up MD runs as well as opening up a very fast way (WHAT IF and W(H)AG are completely scriptable) of investigating mutation effects, effects of salt concentrations and drugs (etc.). W(H)AG also creates run input files for as many CPUs as specified, to easily start your simulations - manually - on your favourite cluster (see NCPU).
The FAST** options start by extensively checking all kinds of things such as missing side chains, wrong bond lengths, positions of hydrogens in the total hydrogen bonding network. In case of ambiguous or unsolvable situations you will get warnings, which should be carefully checked. Make sure to double-check in cases where you get warnings that a given issue is assumed to be "ok" !
The FAST** options assume that your whole SOUP is one normal situation. If you have superposed things, residues/waters/etc from multiple PDB files, multiple docked drugs, or other things that should not be present in a PDB file, the FAST** options might fail.
The FAST** options will only use the first structure of NMR ensembles!
Where possible, default parameters are being used. See the GROMACS parameter menu for a description of those parameters and for possibilities to change them.
OUTPUT FILES:
OUT_EM.PDB
Prep.log
EM.log
Sometimes, a molecule gets stuck in a high-energy local minimum in energy space. In such cases it might help to shake the molecule a bit. Simulated annealing is just the 'right' way to perform that little shaking.
The command FASTSA will cause WHAT IF to prompt you for a range of residues. After extensive checking of this range, GROMACS will be started to perform a simulated annealing run. You will be warned for many thinkable and unthinkable problems in your coordinates that would preclude a successful GROMACS run, and suggestions will be made about how to fix those problems. Many problems will even be fixed automagically.
Where possible, default parameters are being used. Again, see the GROMACS parameter menu for a description of those parameters and for possibilities to change them.
OUTPUT FILES:
OUT_SA.PDB
OUT_SA_TRAJ.PDB
Prep.log
SA.log
The command FASTMD will cause WHAT IF to prompt you for a range of residues. After extensive checking of this range, GROMACS will be started to perform a molecular dynamics run. You will be warned for many thinkable and unthinkable problems in your coordinates that would preclude a successful GROMACS run, and suggestions will be made about how to fix those problems. Many problems will even be fixed automagically.
Where possible default parameters are being used. As with the other FAST** commands see the GROMACS parameter menu for a description of those parameters and for possibilities to set them to your desired values.
OUTPUT FILES:
OUT_MD.PDB
OUT_MD_TRAJ.PDB
Prep.log
MD.log
DO_CAL DOSYNC: Add missing atoms or not
DO_HBO DOSYNC: Optimize the hydrogen bonding network or not
DO_NOR DOSYNC: Normalize geometries or not
DO_WAT DOSYNC: Do water corrections or not
DOSYNC Solely perform molecular validation and correction done by the FAST**
DSTBOX Minimal distance between solute and box in nanometer
FIXCAS Restraining C-alphas in MD run
FIXFRC Degree of restraining by FIX... options
FIXHST Should helix and strand C-alphas be restrained in MD runs
FIXMAN Determine `by hand` which residues are restrained in MD runs
FORCEF Which force field should be used
KEPWAT Keep waters in trajectory file or not
LEVUSE Degree of WHAG-interactivity
MDSTEP Number of pico-seconds in MD run
NACL Concentration of NaCl in the water in milliMol
NCPU Maximal number of CPUs to be used in MD simulations
OUTFRQ How much trajectory output per pico-second
PDBXTC Choice of trajectory format (PDB, XTC)
SANPNT Number of points in simulated anealing
SATEMP Temperatures in simulated anealing
SATIME Time points in simulated anealing
SATYPE Simulated anealing 'type'
SHOPAR Show the values of all parameters
STEPS Maximal number of STEPS in energy minimisation
TEMP Temperature of MD simulations
TRAJEC Get trajectory file or not
USEWAT Adding bulk water to simulation box - recommended
VERBOS Degree of WHAG verbosity interface
The remainder of this section describes these parameters and explains the
range of values you can choose from.
At the end of each description the corresponding WHAG parameters are listed. These are to be used, after lauching one of the FAST** commands from the GROMACS menu.
Corresponding WHAG parameter: DIST_SOL_BOX
FIXMAN has priority over FIXCAS.
Corresponding WHAG parameter: DFIX_FORCE ("NONE", "LOW","MEDIUM","HIGH")
If FIXCAS, FIXMAN or FIXHST are used DFIX_FORCE will be set "B-FACTOR".
JUERGENCHECKME: This will override the WHAG parameter POSRES.
Corresponding WHAG parameter: FORCEFIELD ("GMX", "OPLS")
Corresponding WHAG parameter: KEEP_TIP4P ("YES", "NO")
When USEWAT is set to "BULK", KEPWAT determines what is written in the trajectory file. With KEPWAT set to "YES", everything is written in the trajectory file, including the bulk waters. With KEPWAT set to "NO", only what you put in yourself is written into the trajectory file (so only macromolecules if USEWAT was set to "NONE", or macromolecules and crystal waters if USEWAT was set to "XRAY".
Corresponding WHAG parameter: KEEP_WATER ("YES", "NO")
WARNING. MDSTEP is the number of pico-seconds and not the number of steps!
Corresponding WHAG parameter: DURATION
Corresponding WHAG parameter: CONC_NACL
If you chose a number higher than the actual number of cpus in your system, NCPU will be automatically decreased. In SUBDIR/ a X_topol.tpr file will be created a so-called run input file for GROMACS. This file will be compiled with the originally specified number of CPUs to be run on any cluster later.
Choose NCPU to be a multiple of 2, since every time step two fourier transformations are performed and these cannot be split over an odd number of cpus.
Corresponding WHAG parameter: NCPUGERT
Corresponding WHAG parameter: OUTFRQ
Corresponding WHAG parameter: OUTPUT_FORMAT
Corresponding WHAG parameter: SANPNT
Corresponding WHAG parameter: SATEMP
Corresponding WHAG parameter: SATIME
Corresponding WHAG parameter: SATYPE
Corresponding WHAG parameter: PREMD_EM_DURATION
Corresponding WHAG parameter: TEMPERATURE
Set to "0" to only obtain the last structure from the simulation and to "1" for a complete time series (trajectory) of coordinates OUTFRQ determines how many snapshots are written if TRAJEC is set to "1".
Corresponding WHAG parameter: GERT
See further also under KEPWAT.
Warning. MD runs without BULK water here do not make much sense, since the alternative is running in vacuum. It is therefore recommended to include bulk water in all MD simulations.
This remaining of this chapter describes the commands that WHAT IF can write into these W(H)AG control files.
Parameter Type Explanation
CONTINUE YES/NO Continuation after a previous PREPARE run
(set automatically)
CONC_NACL Integer Concentration (in mMol) of additional ions
(not the atoms to neutralize the bulk,
these are added automatically)
DISRES Filename File defining distance restraints (e.g. from NMR
experiments)
DIST_SOL_BOX Real Defines distance between solute (e.g. protein) and box
(in nm, default 1.0, also valid for vacuum simulations)
DRUGDIR Directory Name of Directory where your manually optimised
topologies for drugs reside in. Beware that the naming
of the files must be matching the order found in the
original PDB file. The topology and the structure files
for the first drug must be called DRG1.ITP/DRG1.PDB
and for the second DRG2.ITP/DRG2.PDB (etc.)!
When selecting this option, no PRODRG topologies will
be used. This is implemented to benefit from manually
parametrised drugs and due to the suboptimal charges
PRODRG can only deliver at present.
DSSP YES/NO At the end of the run, this option will activate the
DSSP program to be run over the whole trajectory
DURATION Integer Number of picoseconds in MD or SA run and number of
steps for EM.
DRUG_FILES List of Names of drug files x for which
filenames x.ITP and x.PDB exist.
EXTRA_BONDS Filename List of bonds which are to be included
into the system additionally.
DFIX_FORCE Selection Force to keep all atoms in place. Values
NONE, LOW, MEDIUM, HIGH, B-FACTOR.
Applied to all the atoms given in the file
PDB_FILE
KEEP_WATER String Keep waters in GROMACS output trajectory for viewing
in WHAT IF: YES or NO
NCPU Integer Indicates a parallel execution of GROMACS if larger than 1.
OUTPUT_STRUCT Filename final coordinate file after EM,MD or SA
OUTPUT_TRAJ Filename Name of trajectory file
OUTPUT_FORMAT Filename Format of trajectory: PDB or XTC
OUTFRQ Integer Number of MD steps per trajectory step
PARAM_FILE Filename EXPERT: use this .mdp file for GROMACS run
parameters (see also DRUGDIR)
PDB_FILE Filename Name of input coordinate file for GROMACS
POSRES YES/NO Should position restraint run be included, usually
used to relax bulk water.
PROTONATION_STATES Filename File with a list of LYS, ASP, GLU and HIS
residues and their respective
protonation states(0 is unprotonated)
TEMPERATURE Integer (Initial) temperature of SA/MD run
SATYPE string Can only be "single" at present and is necessary to
switch on simulated annealing.
SANPNT Integer Defines how many levels of temperature will be
simulated at during a simulated annealing run.
SATEMP Array of Int temperature in K for simulated annealing. For
Explanation and Example see SATIME.
SATIME Array of Int Absolute time at which the next level of temperature
will be reached. Depending on how many SANPNT were
defined!(the same holds for SATEMP only that
temperatures are expected)
Ex.: SANPNT:3; SATIME 0 100 1000; SATEMP 500 400 298.
This will set up an SA run starting at
0ps with temperature SATEMP(1)=500K,
arriving at SATEMP(2)=400K at SATIME(2)=100ps
and finally reaching SATEMP(3)=298K at SATIME(3)=1000ps
SUBDIR Dirname Name of scratch dir for GROMACS files
(Must be subdirectory of working directory for which you
have write permissions)
TASK EM/SA/MD/ Task to perform
PREPARE
USE_HYDROGEN YES/NO Use GROMACS routines to determine protonation states of
HIS, LYS, ASP, GLU and termini.
VERBOSE YES/NO Whether GROMACS should be verbose: YES or NO
VERBOSE_FILE Filename File in which GROMACS writes messages
WATER XRAY/BULK Options of how water is included in simulations:
none - deletes all water atoms prior to simulation
XRAY - use water present in XRAY structure
BULK - add bulk water to simulation box
(see DIST_SOL_BOX)
Note, that all of the W(H)AG parameters can be set by the WHAT IF user, but not every
stage of operation is sensitive to all parameters specified. Parameters specified in
the PREPARE stage of W(H)AG might need to be entered again for the MD/EM/SA stage.
WHAT IF will supply you with a standard set of options and include the required
switches when, e.g., encountering drugs.
AtomIndex1 AtomIndex2 lb ub LOW/MEDIUM/HIGHLOW/MEDIUM/HIGH gives the strength of the spring which will be interpreted as 0.1 / 1 / 5 * 1000kj/nm2. "lb" "ub" are parameters determining the shape of the restraining potential. if the distance is between "lb" and "ub" no force is applied. If distance leaves the intervall ("lb","ub") the force will increase linearly with the spring constant given above.
The example: 10 22 0.1 0.2 MEDIUM
will introduce a medium strength restraining force between atom 10 and atom 22 with the distance lying between 0.1 and 0.2 without being affected.
The restraint force is applied to all atoms in the input-pdb file with nonzero B-factor. Depending on the setting of the switch DFIX_FORCE we use the same constant for all restrained atoms or the respective number of the B-factor for the strength of the restraint. The settings where we ignore the numerical value of the B-factor and use the B-factor only as an on/off switch are:
LOW MED HIGHwhich will be 0.1, 1.0, or 5.0 respectively in units of 1000 kJ/mol/nm^2.
Other possible settings are:
NONE means there are no restraints applied regardless the values
of the b-factors.
B-FACTOR leads to an application of a spring constant to every atom
determined by its B-factor times 1000 kJ/mol/nm^2.
If you want to use the hydrogen positions of WHAT IF or apply position
restraints, you have
to do a TASK=PREPARE run of W(H)AG and write the correct hydrogen
positions into the PDB file
you get back from W(H)AG. To apply position restraints change the
b-factor in the returned file from 0 to some positiv value. Depending
on the setting of DFIX_FORCE this B-factor will determine the applied
restraining potential or it's numerical value will be ignored and
an equal restraining strength is used for all postion restraints,
i. e. the B-Factor is only used as a switch in that case.
LYS 3 0where the first is the residue name, the second is the residue number starting with 0 and the third is the protonation state in the case of LYS, ASP and GLU:
0 deprotonated
1 protonated
in the case of HIS it is:
0 H on ND1
1 H on NE2
2 H on ND1 and NE2
3 Coupled to Heme
Note, that GROMACS cannot deal with unprotonated histidines.
PDB (as an multi-model PDB file, needs a *lot* of disk space!) XTC (a compressed GROMACS specific format).
If this parameter is set to NO (default) all the automatically added "BULK" water and the bulk ions will not be written in the output and trajectory files. If there had been some "XRAY" water molecules in the input file, and they were used by setting the USEWAT parameter appropriately in WHAT IF, their position will be returned regardless of the setting of KEEP_WATER.
You can delete the SUBDIR after you finished a W(H)AG computation, all the important information will be in your working directory.
PLEASE NOTE: the SUBDIR has to be a subdirectory of your working directory to ensure proper execution of W(H)AG.
WHAT IF uses "SCRATCH" as the default name for SUBDIR.
If "XRAY" is chosen W(H)AG performs a simulation in vacuum which uses different algorithms so that no counterions are necessary. Running in vacuum results in shorter runtimes, however, "Bulk" is the recommended setting. In combination with adjusting NCPU to the number of desired CPUs and due to enhanced algorithms for water interactions GROMACS achieves a high performance per CPU. Currently there is no option for running implicit solvent simulations.
If you choose BULK to simulate at a certain NaCl concentration, you can add more ions to the solvent, by specifying the desired concentration with the parameter CONC_NACL.
DMSO 3 MAIN 299 ; bind the third DMSO Atom to the 299th atom of the
; PDB_FILE
MAIN 299 DMSO 3 ; the sequence of the two atoms is not important
MAIN 1741 MAIN 1749 ; creates a non-canonical bond between two protein atoms
DMSO 39 SUGAR 23 ; bonds between drug fragments (you might be better
; off defining a single drug DMSO_SUGAR because then
; the bond will be parameterized better)
Be aware, though, that this is still a rather crude way of solving this problem. If
you need reliable results on the EM/MD of residues with attached groups, you
better get help from a real gromacologist.
If this option is not used W(H)AG finds the MDP files under ../whatif/gromacs/whag/config/ if you want to make your own MDP file you can copy a suitable default file into your working directory and change the parameters as required.
If you then tell W(H)AG to use your private PARAM_FILE it will do so, but it will still change certain parameters like the number of steps to fit the setting of DURATION. That means all the options in the W(H)AG control file overrule settings in the PARAM_FILE. But there are not many paramaters, which are actually bound to be overruled, W(H)AG currently changes the following mdp parameters:
"annealing", "zero-temp_time", "annealing_temp", "annealing_time", "annealing_npoints", "tc-grps", "tau-t", "gen-temp", "ref-t", "xtc-grps", "xtc-grps", "tc-grps", "tau-t", "ref-t", "xtc-grps", "xtc-grps", "define"
A typical mdp file can look like:
; This is the template file for ; an md simulation with Water as solvent ; ; VARIOUS PREPROCESSING OPTIONS title = cpp = /lib/cpp include = define = ; RUN CONTROL PARAMETERS integrator = md ; start time and timestep in ps tinit = 0 dt = 0.002 nsteps = 100 ; mode for center of mass motion removal comm-mode = Linear ; number of steps for center of mass motion removal nstcomm = 1 ; group(s) for center of mass motion removal comm-grps = etc.
Presently WHAT IF uses the default of 100 steps which is enough in most cases.
Be aware that this parameter is only used for the tasks SA and MD, and has nothing to do with the duration of FASTEM energy minimisation runs.
OUT_STRUCT the structure after a PREPARE run or a computation. OUTPUT_TRAJ the trajectory file after a computation VERBOSE_FILE all the W(H)AG messages are written into that file XX.log XX currently stands for Prep, SA, MD and EM(in the moment it compiles in such a way, that instead into this file information is written to the screen) Later: it will write some messages to the screen, but they can be routed to /dev/null, (they come from buried GROMACS code and it would be a hassle to supress them all) Note if you do MD you will get first the predicted time for the preparation EM (some minutes) and only later the predicted time for MD (can be very distant in time).
Note, that there are many powerful tools, already shipped with GROMACS, in the sub-directory gromacs/gromacs/bin of your WHAT IF installation. The GROMACS manual (http://www.gromacs.org) describes them and much more about simulations in general in detail.
The output file: SUBDIR/whag.analysis.rmsd.xvg
The output file: SUBDIR/whag.analysis.energy.xvg
The output file: SUBDIR/whag.analysis.rgyration.xvg
The output file: SUBDIR/whag.analysis.hbnum.xvg
The output files of the PCA: SUBDIR/whag.analysis.eigenval.xvg; SUBDIR/whag.analysis.eigenvec.trr
The output files of the projection: SUBDIR/whag.analysis.proj.xvg; SUBDIR/whag.analysis.extreme_X.pdb, X runs from 1 to 10.
The output files: SUBDIR/whag.analysis.scount.xvg; SUBDIR/whag.analysis.dssp.eps
While your input before the PREPARE stage is in separate files for protein and drugs, the output will be in one long file. W(H)AG expects you to feed that file with changed coordinates and b-factors back in as new input file PDB_FILE, when you want to perform the actual computation. Now you set TASK=EM,MD or SA and CONTINUE=YES. W(H)AG will save a lot of information from the PREPARE stage in its SUBDIR, including the information about bonds, charges, etc. of the system. This information is expected to be untouched since the corresponding PREPARE run of W(H)AG.
To create bonds between protein fragments or drug molecules and protein you use the input file EXTRA_BONDS, the bonds created will be non-specific.
W(H)AG at the moment creates a bond between two aliphatic CH3 groups, even if the situation is very different. This means the parameters for bond strength, angle, dihedral, out of plane movement, etc. will be wrong.
When searching for possible reasons of W(H)AG/GROMACS problems, watch out for the console output files of programs like whag_pdb2gmx(initial setup), whag_grompp(final run input file preparation) or genion (handling of ions to be added). A log for every command sequence(Prep.log, SA.log, MD.log, etc.) is created after successfully completing one stage in the directory, where you started from. This sequence of shell commands can be used to understand, how W(H)AG prepares the simulations with GROMACS.
In case of LINCS warnings, the linear constraint solver (LINCS) failed to constrain bond lengths and angles to the desired values. These constraints are employed to ensure that, e.g., a C-H does not drastically change its length etc. Have a look at SUBDIR/md.log to find the atom numbers, where the LINCS warning was triggered to guide your trouble-shooting.
Should missing atoms have caused a failure, compare SUBDIR/soup.pdb with SYNC_IN.PDB (no drugs) and SYNC_OUT.PDB (drugs included). Sometimes badly resolved ligands from XRAY structures were modelled into the structure and some bond lengths might not match their requirements.
FAQ-Here you can find frequently answered questions:
Q: Why does WHAT IF not start an MD with broken chains?A: A chain break would either mean to end one chain and start a new one or to model in the missing aminoacids/atoms. In the first case you might observe different dynamics compared to what would be observed for the whole protein.
In the second case you have to some up with guesses for the missing coordinates, which can be from intruiging to impossible.
Q: How can I restart a previous MD run?A: Choose "TASK=RESTART", but make a backup copy before you restart the run. W(H)AG will not continue runs, where it cannot find previously saved "energy.edr" (system energies) and "traj.trr" (full precision trajectory at a low output frequency) files.
Q: My run "crashed". How can I find out if the problem was with WHAT IF, PRODRG,
W(H)AG, or GROMACS ?
A: Make sure you started a run with the W(H)AG parameter "VERBOSE" set to "YES"!
Any error message before the STATEMENT "WHAG is now active!!" is most likely from
WHAT IF.
If there are any drugs present also PRDRG may report some errors, they can be found
in files called like "PRODRUGOUTXXX", where XXX is the number of the drug, assigned
by WHAT IF. For WHAT IF see also the file called "GROOUT.LOG", where the individual
actions of a FAST**command are logged to.
W(H)AG and GROMACS errors are likely to be logged in files like
"WHAG_STDERR_editconf.YY" located in the directory SUBDIR/, where YY is an
running number assigned, when W(H)AG calls one the GROMACS programs or any other
system command (e.g. awk scripts).
Every system call is logged to files called "Prep.log", "EM.log", "MD.log" or
"SA.log". These files, however are currently written at the end of a run, there
might be a file called "WHAG_startgmx.csh", which is written during the run.
Q: My drugs behave oddly. It seems to be very floppy and fluctuates a lot.A: That is expected with the OPLS forcefield, e.g., the aromatic rings are parametrised to be flexible (compare to W or Y in your simulation).
Q: How long do I have to simulate before the MD is finished?A: Frankly, it strongly depends on the phenomenon you are investigating. Make sure you allowed enough time for relaxation when starting form xray structures or simulating in various temperature regimes
Q: How high can I go in temperature in MD runs?A: You can set very high temperatures, just remember that the force fields are parametrised at a certain temperature. You can use simulated annealing runs to try and overcome apparent barriers, but it is suggested to use snapshots taken from these runs to start individual MD runs. Many small runs with slightly different starting conditions might be advantegeous over a single long run JH genseed flag EOFJH
Q: WHICH forcefield shall I choose?A: In principle it should not matter, as long as the force field can provide the parameters for the atoms included, or you find a tool which does (e.g. PRODRG). In practice, however, there are reports(rules of thumb at most) that one finds rather beta strand-like structures with GROMOS96, more alpha helix with AMBER and OPLS/CHARMM should be in between.JUERGENcite bert's paper
Q: GERT In some cases, while in script mode WHAT IF pops up editor windows and asks
for confirmation about overlapping atoms etc.
A1: To suppress the editor window use "setwif 873 1".
A2: In order to confirm questions with "Y" use "setwif 622 0"