Moloc and mol-Files (.sd)
Basics
- The mol format is well established for structure files. However, its
definition of topology, in particular the bond-type assignment, differs from
Moloc's approach of using the number of hydrogens at each heavy atom (see e.g.
InChI). Although Moloc reads .sd files properly, it does not write back
identical files in all cases. Aromatic bond types, for example, are replaced
by a Kekule-type representation.
- To avoid this difficulty, batch programs that read a .sd file and write
back the same format generally keep the topology information from the input
and write it back unchanged. For this to work, the resulting number of atoms
must be the same as on input.
- The program Mol3d is designed to handle these issues before other programs
are run.
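The atom-count requirement above can be checked independently of Moloc. The following is a minimal sketch, assuming V2000 mol blocks (atom count in columns 1-3 of the fourth line of each record) and records terminated by '$$$$'. The function names and the demo records are hypothetical and not part of Moloc.

```python
def atom_counts(sd_text):
    """Return the atom count of each record, read from the V2000 counts line."""
    counts, record = [], []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":              # record delimiter
            counts.append(int(record[3][0:3]))  # 4th line, columns 1-3
            record = []
        else:
            record.append(line)
    return counts


def mismatched_records(sd_in, sd_out):
    """Indices of records whose atom count differs between input and output."""
    pairs = zip(atom_counts(sd_in), atom_counts(sd_out))
    return [i for i, (n_in, n_out) in enumerate(pairs) if n_in != n_out]


def demo_record(title, n_atoms):
    """Build a minimal record of n carbon atoms (illustration only)."""
    lines = [title, "  sketch", ""]
    lines.append(f"{n_atoms:3d}  0  0  0  0  0  0  0  0  0999 V2000")
    atom = f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} C   0  0  0  0  0  0  0  0  0  0  0  0"
    lines += [atom] * n_atoms
    lines += ["M  END", "$$$$"]
    return "\n".join(lines) + "\n"


sd_in = demo_record("a", 3) + demo_record("b", 5)
sd_out = demo_record("a", 3) + demo_record("b", 4)
print(mismatched_records(sd_in, sd_out))
```

A non-empty result points at records for which the unchanged topology could not be written back consistently.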
Stereochemistry
- Moloc does not carry along flags indicating stereochemistry. However, it
generally preserves the stereochemistry of 3-d structures by applying adequate
constraints. Thus, if stereochemistry is to be preserved, it must already be
present in a 3-d input structure.
- There are two programs that convert topologically defined stereochemistry
into appropriate 3-d structures: Mol3d (.sd-files) and Msmab (SMILES codes). If
e.g. a 2-d .sd-file containing chiral compounds is input to a Moloc program,
it should first be run through Mol3d. In Moloc, the corresponding algorithm
can be activated when reading files (.../g/o, and always for SMILES or InChI's).
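Whether a structure is still flat, and should therefore go through Mol3d first, can be estimated from its coordinates. The sketch below is a heuristic under stated assumptions: V2000 atom lines with z in columns 21-30, and "flat" taken to mean that all z coordinates are zero within a tolerance. The names is_flat and make_mol are hypothetical; this is not how Moloc itself decides.

```python
def is_flat(mol_lines, tol=1e-4):
    """True if every atom of a V2000 mol block has z == 0 within tol."""
    n_atoms = int(mol_lines[3][0:3])
    z_values = [float(line[20:30]) for line in mol_lines[4:4 + n_atoms]]
    return all(abs(z) <= tol for z in z_values)


def make_mol(coords):
    """Build a minimal mol block of carbon atoms (illustration only)."""
    lines = ["demo", "  sketch", ""]
    lines.append(f"{len(coords):3d}  0  0  0  0  0  0  0  0  0999 V2000")
    for x, y, z in coords:
        lines.append(
            f"{x:10.4f}{y:10.4f}{z:10.4f} C   0  0  0  0  0  0  0  0  0  0  0  0"
        )
    lines.append("M  END")
    return lines


flat = make_mol([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)])
bent = make_mol([(0.0, 0.0, 0.0), (1.2, 0.5, -0.8)])
print(is_flat(flat), is_flat(bent))
```

A planar 3-d structure lying in the xy-plane would also be reported as flat, so the result is only a hint.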
Explicit Hydrogens
- A standard .sd file does not have explicit hydrogens. However, some programs
generate .sd files containing some or all hydrogens explicitly.
- When Moloc's batch programs (except Mol3d or Msmab) encounter a structure
with explicit H's, they assume that all H's are explicitly given. This leads
to strange topologies when only part of the H's are given explicitly.
- Moloc and its batch programs mostly perform calculations without explicit
hydrogens and often strip them to start with.
- To avoid the above-mentioned difficulties with atom counts, .sd files
must be free of explicit hydrogens, i.e. H's must be stripped beforehand
with Mol3d.
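Before running a batch job, it can help to check whether a record still contains explicit hydrogens. The following is a minimal sketch, assuming V2000 atom lines with the element symbol in columns 32-34. The function names are hypothetical, and the sketch only counts atoms labeled 'H'; it does not decide whether all or only part of the hydrogens are present.

```python
def count_explicit_hydrogens(mol_lines):
    """Count atoms with symbol 'H' in the atom block of a V2000 mol block."""
    n_atoms = int(mol_lines[3][0:3])
    symbols = [line[31:34].strip() for line in mol_lines[4:4 + n_atoms]]
    return sum(1 for symbol in symbols if symbol == "H")


def make_mol(symbols):
    """Build a minimal mol block without bonds (illustration only)."""
    lines = ["demo", "  sketch", ""]
    lines.append(f"{len(symbols):3d}  0  0  0  0  0  0  0  0  0999 V2000")
    for symbol in symbols:
        lines.append(
            f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {symbol:<3}"
            " 0  0  0  0  0  0  0  0  0  0  0  0"
        )
    lines.append("M  END")
    return lines


stripped = make_mol(["C", "C", "O"])
partial = make_mol(["C", "C", "O", "H"])
print(count_explicit_hydrogens(stripped), count_explicit_hydrogens(partial))
```

Any count above zero suggests running the file through Mol3d before other batch programs.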
Multi-Fragment Structures
- The most common cases of multi-fragment structures are ion pairs, in which
the counter-ion of the relevant structure is small, often mono-atomic.
- Most of Moloc's batch programs do not work with multi-fragment structures.
- The flag -l is provided in some programs to retain only the largest fragment
of a structure. In Moloc the flag can be set in .../g/o. In almost all cases this
retains the structure of interest. Multi-fragment entries can be identified in
lib/e/f.
- As with explicit hydrogens, programs returning a .sd file will fail with
multi-fragment input due to an atom-count mismatch.
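Multi-fragment records can also be spotted outside Moloc by finding the connected components of the bond block. The following is a minimal sketch, assuming V2000 bond lines with the two atom indices in columns 1-3 and 4-6, and using a small union-find. The function names are hypothetical, and measuring fragment size by atom count is an assumption; the source does not say how Moloc ranks fragments.

```python
def fragment_sizes(mol_lines):
    """Return the atom counts of all connected fragments, largest first."""
    n_atoms = int(mol_lines[3][0:3])
    n_bonds = int(mol_lines[3][3:6])
    parent = list(range(n_atoms))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    bond_start = 4 + n_atoms
    for line in mol_lines[bond_start:bond_start + n_bonds]:
        a, b = int(line[0:3]) - 1, int(line[3:6]) - 1  # 1-based in the file
        parent[find(a)] = find(b)

    sizes = {}
    for i in range(n_atoms):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def make_mol(symbols, bonds):
    """Build a minimal mol block (illustration only)."""
    lines = ["demo", "  sketch", ""]
    lines.append(f"{len(symbols):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for symbol in symbols:
        lines.append(
            f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {symbol:<3}"
            " 0  0  0  0  0  0  0  0  0  0  0  0"
        )
    for a, b, order in bonds:
        lines.append(f"{a:3d}{b:3d}{order:3d}  0")
    lines.append("M  END")
    return lines


# Sodium acetate as an ion pair: acetate (4 heavy atoms) plus Na.
ion_pair = make_mol(["C", "C", "O", "O", "Na"], [(1, 2, 1), (2, 3, 2), (2, 4, 1)])
print(fragment_sizes(ion_pair))
```

A result with more than one entry marks a record that needs fragment removal before batch processing.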
Recommendations
- For input files with 3-d coordinates it is advisable to run the command
'Mol3d -lm1' prior to running a batch job. This removes small fragments and
also strips off explicit hydrogens.
- For flat structures use 'Mol3d -lw6' to also generate a 3-d structure of
appropriate stereochemistry.
- If hydrogens are used to indicate stereochemistry, one should separate the
generation of the 3-d structure from H-stripping: 'Mol3d -w6', then 'Mol3d -lm1'.
- To avoid writing intermediate files, the jobs can be piped together with
the flag (parameter) '-r'. The command for docking a flat file would read,
for example:
  Mol3d -rw6 < in.sd | Mol3d -rlm1 | Mdck -rq cav.mab -p pos.php > out.sd
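The piped docking command above can also be driven from a script. The following is a minimal sketch using Python's subprocess module. The program names and flags are copied from the command line given above; the function names, parameter names and default file names are hypothetical. Nothing is executed unless run_pipeline is called, which requires the Moloc programs to be on the PATH.

```python
import subprocess


def docking_stages(q_file="cav.mab", p_file="pos.php"):
    """Argument lists for the three stages of the docking pipeline."""
    return [
        ["Mol3d", "-rw6"],                       # flat input -> 3-d structure
        ["Mol3d", "-rlm1"],                      # drop small fragments, strip H's
        ["Mdck", "-rq", q_file, "-p", p_file],   # docking, as in the example
    ]


def run_pipeline(stages, in_path, out_path):
    """Chain the stages like a shell pipe: in_path | s1 | s2 | ... > out_path."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        procs = []
        stdin = src
        for i, argv in enumerate(stages):
            last = i == len(stages) - 1
            proc = subprocess.Popen(
                argv, stdin=stdin, stdout=dst if last else subprocess.PIPE
            )
            if stdin is not src:
                stdin.close()  # the child keeps its own copy of the pipe end
            stdin = proc.stdout
            procs.append(proc)
        return [proc.wait() for proc in procs]  # exit codes, one per stage


# Example call (assumes Moloc is installed and in.sd exists):
# exit_codes = run_pipeline(docking_stages(), "in.sd", "out.sd")
```

Passing argument lists instead of a shell string avoids quoting problems with file names, at the cost of wiring the pipes by hand.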
Piping
Many of Moloc's batch programs allow .sd files to be piped through by specifying
the redirect flag (parameter '-r'). A typical case is the program 'Mtprmp', which
produces predictions from linear models. When this program is run in redirect
mode, it reproduces the input .sd file, augmented for each structure by a data
field, PRED (for '-r PRED'), in which the prediction of the model is written.
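The added data field can be read back with a few lines of code. The following is a minimal sketch, assuming the usual SD data-item layout: a header line starting with '>' that contains the field name in angle brackets, the value lines, then a blank line. The function names are hypothetical, and the sample records and the value 2.31 are invented for illustration; they are not Mtprmp output.

```python
import re


def field_value(record, field):
    """Return the value of one data field in a record, or None if absent."""
    header = re.compile(r"^>.*<" + re.escape(field) + r">")
    for i, line in enumerate(record):
        if header.match(line):
            data = []
            for value_line in record[i + 1:]:
                if not value_line.strip():   # a blank line ends the data item
                    break
                data.append(value_line)
            return "\n".join(data)
    return None


def read_sd_field(sd_text, field):
    """Return (title, value) for every record of an SD file."""
    results, record = [], []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":           # record delimiter
            title = record[0].strip() if record else ""
            results.append((title, field_value(record, field)))
            record = []
        else:
            record.append(line)
    return results


MOL_BODY = [
    "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
]
SAMPLE = "\n".join(
    ["mol_a"] + MOL_BODY + ["> <PRED>", "2.31", "", "$$$$"]
    + ["mol_b"] + MOL_BODY + ["$$$$"]
) + "\n"

print(read_sd_field(SAMPLE, "PRED"))
```

Values are returned as strings so that the caller decides how to convert them; records without the field yield None.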