Various methods of matching structures on top of each
other are assembled in this utility. Also provided are tools
to handle the transformations that result from these
matches, as well as the possibility to move molecules with
previously generated transformations.
Various manipulations with transformations can be
performed, in particular reading from or writing to a file.
The result of this option is the last generated or chosen
transformation, unless it is left by quitting.
Pairs of topologically non-identical molecules can be
matched on top of each other after the corresponding atom
pairs have been specified. For Calf structures
sequence-wise specification is possible. Corresponding
pairs are linked by a dotted distance line.
Any number of structures of identical topology can be
compared here. Either only tests are performed (options t,
r, q) or matching operations are included (options m, d, p).
Since this is a multi-match facility, no transformation is
returned.
First the (connected) substructure to be superimposed has
to be defined. This can be done by specifying its SMILES
string or by defining it as a set in any of the structures.
Then a search for this fragment in all structures must be
initiated (option s). Finally, the user has to specify the
entries that are to participate in the match. Only then can
the actual match be performed. This is also a multi-match
facility and no transformation is returned.
Two arbitrary structures can be superimposed. The criteria
of optimal superposition can be chosen to be one of the
following three possibilities or an arbitrary mix of them:
Overlap of occupied volume (O), overlap of H-bond donor- and
acceptor properties (including directionality) (H), and
overlap of atomic charges (C), which have to be defined in
advance. The mix is termed pharmacophor (P) and its
ratios can be chosen by the user (see option w).
For a set of structures an analysis of pharmacophoric
diversity can be performed. To that end a couple of
preliminary preparations are necessary. Depending on the
mode of pharmacophoric comparison (positional, fixed point,
or one of the two cases of axial) more or less of the menu
points need to be addressed before the actual calculation
can be done.
Analysis of a binary cluster tree can be done here. When
the tree data are associated with structures, simultaneous
visualization is possible. If no other option is
highlighted, the tree can be moved by pressing the shift key
and operating the mouse (see forge menu). Similarly, picking
a bond of a branch causes the entire branch to be
highlighted. If entries are attributed to the tree data the
ones belonging to the picked branch are displayed in
addition to the tree. The shown entries are matched onto the
center entry of the branch. If the branch has many elements
and the matching requires much computing, this option may
take some time before the entries are displayed. Matching
can be omitted with the M<->m toggle option.
The entries of a branch, or of a selection can be examined.
Picking an entry with the left hand (middle) mouse button
adds it to (removes it from) the selection. Selected entries
show up in half-bond representation, others in their natural
color. While the mouse changes the view as usual, keeping the
'shift' button pressed causes the single entries to rotate
about their fixed atom (if defined) or about their
individual centroid. By this token the whole set can
conveniently be examined, especially so, if the entries have
been spread apart beforehand. Leaving the menu with x
removes the spreading, while q keeps the current spread.
Match Utility [mch]
g: geometry display
f: forge structures
t: handling of transformations
From some of the matching procedures Moloc saves a
final transformation. Here, several possibilities are
offered to manipulate, apply, store, or retrieve these
transformations.
r: relocate entry with current transformation.
All designated structures are transformed with the
current transformation, which may either originate from
a matching process or be defined in the preceding
option.
m: pair-wise matching
The structure picked first will be matched on top of the
one picked second. Upon leaving this option, the final
(cumulative) transformation is kept and becomes the
current transformation. It may be applied to other
entries (e.g. to move inhibitors with a protein) or can
be stored onto file for later use.
i: matching structures of identical topology
Several structures can be superimposed (multi-match).
All specified structures have to be of the same
topology. The set of atoms used for superposition is
identified in a single entry only, which will be the
target (not moved). No resulting transformation is kept
upon leaving the option!
j: matching identical proteins
Same as option i, but the specification is residue-wise only.
w: weighted match of Calf structures
The rmsd of two Calf structures of identical size is
calculated. The matching weight for each residue is
taken from a sequence file. This file must contain
lines with the key #w, which contain one character for
each residue. If that character is numeric (= n) it is
interpreted as a weight according to w = 0.5**n.
Otherwise the residue is omitted from the match. The
transformation for the optimal match is returned as the
current transformation, without actually having been
applied.
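The weighting rule can be sketched as follows. This is only an illustration of the formula w = 0.5**n, not Moloc's own code: the function name residue_weights is hypothetical, and the layout of the #w line itself is not modelled, only the string of per-residue characters.

```python
def residue_weights(chars):
    """Map one character per residue to a matching weight.

    A digit n gives w = 0.5**n. Any other character yields None,
    meaning that the residue is omitted from the match.
    """
    weights = []
    for ch in chars:
        if ch in "0123456789":
            weights.append(0.5 ** int(ch))
        else:
            weights.append(None)
    return weights


# Four weighted residues followed by two omitted ones.
print(residue_weights("0123-x"))
```

A character of 0 therefore gives full weight, and each increment of n halves the weight.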
s: substructure matching
This is a multi-match of several structures containing
an identical substructure element. Only the
substructures are superimposed.
a: calculate partial atomic charges
For the selected entries partial atomic charges are
calculated (see MAB force field).
p: pharmacophor matching
Coarse matches based on pharmacophoric features can be
performed here. If charges are to be considered in the
match, atomic charges need to be given (either from
file or by calculating them in advance for both entries
with the corresponding option).
k: charge matching
(See also option p!) A match of charge distributions on
two sets of atoms from two entries is made by first
putting the two centers of absolute charge on top of
each other. The mutual orientation is governed by the
comparison of dipole and quadrupole moments. The
relative weight of dipole to quadrupole influence is
obtained by multiplying the dipole moment with the
radius of gyration of the charge distribution,
augmented by a typical van der Waals distance (3 Å). The
first entry picked will be moved. The definition of the
atom set to be used in the charge distribution
evaluation is requested upon picking the entries.
d: calculate diversity matrix of a set of entries
The members of a set of entries can be mutually compared
with respect to their pharmacophoric properties.
c: cluster analysis of similarity file
A previously written similarity file can be read. A
clustering tree is generated and can be analyzed. This
is also useful for reanalyzing diversity runs without
having to redo the calculation.
Handling Transformations
f: file type
There are two file formats for storing or reading
transformations that the user can choose from: a Moloc
format (extension .trf) and a format used by the
crystallography program 'O' (extension .tro).
g: get transformation from file
For the transformation file to be recognized as a .trf
format, it must contain three lines defining the
transformation. Each line should contain an index (1 to
3), followed by three components of the rotation matrix
and a component of the translation vector. Lines not
conforming to this format are interpreted as comments.
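A reader for this layout might look like the sketch below. It is not Moloc's parser: the function name parse_trf is hypothetical, and it assumes that a conforming line consists of exactly five whitespace-separated fields (the index followed by four numbers). Any other line is skipped as a comment.

```python
def parse_trf(text):
    """Return (rotation, translation) from .trf-style text.

    Assumption: a conforming line has exactly five fields, an index
    1..3 followed by three rotation components and one translation
    component. Everything else is treated as a comment.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] not in ("1", "2", "3"):
            continue  # comment line
        try:
            values = [float(f) for f in fields[1:]]
        except ValueError:
            continue  # non-numeric fields: treat as comment
        rows[int(fields[0])] = values
    if sorted(rows) != [1, 2, 3]:
        raise ValueError("need three lines with indices 1, 2 and 3")
    rotation = [rows[i][:3] for i in (1, 2, 3)]
    translation = [rows[i][3] for i in (1, 2, 3)]
    return rotation, translation


example = """\
transformation from pair-wise match
1  0.0 -1.0  0.0   1.5
2  1.0  0.0  0.0  -2.0
3  0.0  0.0  1.0   0.0
"""
rotation, translation = parse_trf(example)
print(rotation)
print(translation)
```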
s: store current transformation onto file
For formats see option f.
m: multiply two transformations
The user is asked with a selector to identify the
transformation to be applied first and the one to be
applied second. The resulting product will be the
current transformation.
i: invert current transformation
The inverse of the current transformation is
calculated and becomes the new current transformation.
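The product and the inverse of options m and i can be illustrated with a small sketch. It assumes that a transformation acts on a point as x' = R x + t, with R the rotation matrix and t the translation vector. This convention and the function names are assumptions for illustration and are not taken from Moloc.

```python
def apply(transform, point):
    """Apply x' = R x + t to a single point."""
    rot, trans = transform
    return [sum(rot[i][k] * point[k] for k in range(3)) + trans[i]
            for i in range(3)]


def multiply(first, second):
    """Transformation equal to applying `first`, then `second`."""
    r1, t1 = first
    r2, t2 = second
    rot = [[sum(r2[i][k] * r1[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    trans = [sum(r2[i][k] * t1[k] for k in range(3)) + t2[i]
             for i in range(3)]
    return rot, trans


def invert(transform):
    """Inverse of a rigid transformation: rotation R^T, translation -R^T t."""
    rot, trans = transform
    rot_t = [[rot[j][i] for j in range(3)] for i in range(3)]
    new_trans = [-sum(rot_t[i][k] * trans[k] for k in range(3))
                 for i in range(3)]
    return rot_t, new_trans


# A quarter turn about z plus a shift along x, followed by a shift along z.
quarter_turn = ([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                [1.0, 0.0, 0.0])
shift = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
         [0.0, 0.0, 2.0])
point = [1.0, 2.0, 3.0]
combined = multiply(quarter_turn, shift)
back = apply(invert(combined), apply(combined, point))
```

Because the order of application matters, the product of two transformations generally changes when first and second are exchanged.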
c: choose current transformation
From a selection menu of all previously defined
transformations the one to become the current one can
be selected.
d: delete transformations
q: quit
No transformation is returned, i.e. the calling program
may keep its previous current transformation.
Pair-wise Matching
The following items only show up if the two structures are
proteins or Calf structures.
c: clear all specifications
g: geometry display
f: forge structures
a: atom-wise specification
Pairs of atoms have to be picked by left-hand button
clicks. By clicking an atom with the middle button, all
specifications containing this atom are removed.
r: do rigid-body match
For the set of currently specified pairs of atoms a
rigid-body match is performed. If specifications are
altered this step can be repeated. All transformations
are accumulated.
i: iterative rigid body match
The rigid-body match is performed with individual
weights assigned to each atom pair. These are initially
set to one and after each match reset to
1/(1+(d/a)**4), where d is the distance of the pair
after the match and a is a characteristic distance.
This function leads to de facto elimination of pairs
that are at a distance substantially larger than a. The
procedure is iterated until the change of weights
becomes negligible.
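The behaviour of the weight function is easy to tabulate. The sketch below only evaluates the formula given above. The function name pair_weight and the value chosen for a are illustrative and do not reflect Moloc's defaults.

```python
def pair_weight(d, a):
    """Weight of an atom pair at distance d for characteristic distance a."""
    return 1.0 / (1.0 + (d / a) ** 4)


a = 1.0  # illustrative value only
for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:3.1f}  w = {pair_weight(d, a):.4f}")
```

The weight is exactly 0.5 at d = a and has already dropped to 1/17 at d = 2a, which is why distant pairs contribute almost nothing to later iterations.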
d: set critical distance
The value of the critical distance a for the iterative
match can be altered here (see previous menu item).
m: mono-flexible match
After this choice the force field menu is entered with
the entry to be moved set exclusively in the active
status. The match constraints are taken over and appear
as positional constraints for the minimization. When
this menu has been chosen, no transformation is
returned!
b: biflexible match
Under this option the force field is called with both
structures in the active state. However, no mutual
interaction is calculated, so that the two structures
can freely penetrate each other. The constraints are
applied during minimization. Thus, both structures try
to adapt their conformation to optimally obey the
constraints. No transformation is returned in this
case.
s: sequence-wise specifications
For a pair of Calf structures the specifications can be
done for two segments of the sequences. Specifications
are done in the set menu. Upon leaving, correspondence
of pairs is assumed to be consecutive along the two
specified sequence stretches. Reverse assignment can be
chosen. If the numbers of specified residues in the two
structures are unequal no correspondence is made and
the option can be reentered with the set unchanged.
n: number-wise specification
For a pair of Calf structures residues with the same
sequence number are considered a pair. The user can
specify ranges of labels for which this criterion
should apply. If no range is given all residues are
considered. Ranges must be separated by commas.
h: specification by homology alignment
Pairs of corresponding Calf positions are specified by
an alignment file. The moved entry is assumed to be
specified by '#1', the target by '#2' (see homology
building).
Matching Structures of Identical Topology [mmch]
s: define set of entries to be matched
The following items are for proteins only.
Tests or matches are performed on several structures
(multi-match). The structures that take part in the
test or match must be specified here!
m: define matching sequences
Upon picking a structure, the set-tool is entered for
specification of the matching set. In case of
residue-wise specification the corresponding Calf
structure is presented. Only the atoms specified here
are relevant for the matching calculations!
d: do the matching
All specified structures are matched onto the target
used for specification of the atom set. A simultaneous
(multi-) match is performed which in general differs
from a sequence of pair-wise matches. In favorable
cases the structures are similarly positioned, however.
This is the only option where any repositioning of
structures takes place.
p: table of pair-wise matching rmsd
A table of root mean square deviations after pair-wise
matching is written for all possible pairs. However, no
structures are repositioned.
t: define testing sequences
Identical to option m, except that this set is used in
the testing options r and q.
r: calculate rmsd of testing sequence
The root mean square deviation of the positions of
corresponding atoms in all specified entries is
calculated. The actual positions are taken.
q: table of pair-wise rmsd
A table of root mean square deviations is written for
all possible pairs. In contrast to option p the
structures are taken in their actual position.
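As an illustration of a pair-wise rmsd table computed from the actual positions, i.e. without any superposition, consider the following sketch. The function names and the toy coordinates are hypothetical. Each entry is simply a list of (x, y, z) positions in corresponding atom order.

```python
import math


def rmsd(coords_a, coords_b):
    """Rmsd of corresponding atoms, positions taken as they are."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty sets of equal size")
    total = sum((p - q) ** 2
                for atom_a, atom_b in zip(coords_a, coords_b)
                for p, q in zip(atom_a, atom_b))
    return math.sqrt(total / len(coords_a))


def rmsd_table(entries):
    """Symmetric table of pair-wise rmsd values for a list of entries."""
    n = len(entries)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            table[i][j] = table[j][i] = rmsd(entries[i], entries[j])
    return table


entries = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 3.0, 0.0), (1.0, 4.0, 0.0)],
]
table = rmsd_table(entries)
for row in table:
    print("  ".join(f"{value:6.3f}" for value in row))
```

The table of option p differs in that each pair would first be matched before the deviation is evaluated.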
a: generate an average structure
From the structures defined under menu item s, an
average structure is generated. Current coordinates are
taken. The average structure contains data to
characterize the anisotropic rms deviation. These data
can be visualized in analogy to anisotropic R-factor
ellipsoids (option xnr in main menu).
l: print list of residue-wise rmsd
c,b,a: Calfs only, backbone atoms, all atoms
For full protein structures in the residue-wise
specification mode, one can choose among three
possibilities: all atoms from a selected residue take
part (a), just the backbone atoms are taken (b), or
only the Calf atoms participate (c).
Substructure Matching
l: define matching fragment by SMILES code
f: define matching fragment set on an entry
s: search for fragment
m: define set of entries to be matched
d: do the matching
Pharmacophor Matching [mchp]
t: define atom set on target structure
m: define atom set on moved structure
If only part of the target structure should participate
in the evaluation of the match function, this option
allows the corresponding set to be specified. If no set
is defined, the whole structure is taken.
z: center structure (set) onto target (set)
The moved structure is translated such that the
centroids of the moved and target sets coincide.
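The centering step amounts to a pure translation by the difference of the two centroids. A minimal sketch, with hypothetical function names and toy coordinates, and with unweighted centroids assumed:

```python
def centroid(coords):
    """Unweighted mean position of a set of atoms."""
    n = len(coords)
    return [sum(atom[k] for atom in coords) / n for k in range(3)]


def center_onto(moved, moved_set, target_set):
    """Translate all atoms of `moved` so that the set centroids coincide."""
    c_moved = centroid(moved_set)
    c_target = centroid(target_set)
    shift = [c_target[k] - c_moved[k] for k in range(3)]
    return [[atom[k] + shift[k] for k in range(3)] for atom in moved]


moved = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
moved_set = moved[:2]  # only the first two atoms define the set
target_set = [[10.0, 0.0, 0.0], [10.0, 2.0, 0.0]]
centered = center_onto(moved, moved_set, target_set)
print(centered)
```

Atoms outside the set are shifted along with the rest of the structure but do not influence the translation.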
v: value of match function
Evaluates the chosen match function at the current
position and orientation of the moved structure.
l: repositioning by match optimization to local extremum
Starting from the current position and orientation, the
structure to be moved is repositioned to yield a
(local) extremum of the currently chosen matching
function.
s: match by multiple search
The process of option l is repeated for various initial
orientations but the same initial position of the
centroid. For each case the final (extreme) value of
the matching function is listed together with the
number of the trial. The final position corresponds to
the (partial) global extremum.
p: permutational search
If both structures have H-bond donors or acceptors the
pharmacophor matching is preceded by a rigid body match
of donors upon donors and acceptors upon acceptors. The
procedure is repeated, taking all possible ways of
pairings as starting points.
e: examine extreme orientations
The positioning corresponding to any of the extrema
found under option s or p can be reexamined.
r: set ranges
The match functions depend on characteristic ranges,
which determine over which distance pairs of atoms are
recognized. These ranges can be chosen independently
for overlap (O, multiplicative!), H-bond (H) and charge
(C) matching functions. In addition a range for
misalignment of H-bond directionalities can be set.
Finally, the coverage parameter determines how closely
the initial orientations in the search option are
spaced; a larger value leads to narrower spacing and
correspondingly to a larger number of initial
orientations.
w: set weights
For the pharmacophor match (P) the weights which
determine the mix of the three components (O, H, C) can
be set. In the calculation these weights are scaled to
yield a sum of one.
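The scaling of the weights can be sketched as below. The normalization to a sum of one follows the description above. That the mix itself is a plain weighted sum of the three match-function values is an assumption made for this illustration, as are the function names.

```python
def normalize_weights(w_overlap, w_hbond, w_charge):
    """Scale three non-negative weights so that they sum to one."""
    total = w_overlap + w_hbond + w_charge
    if min(w_overlap, w_hbond, w_charge) < 0 or total <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    return w_overlap / total, w_hbond / total, w_charge / total


def pharmacophor_value(o, h, c, weights):
    """Assumed linear mix of the O, H and C match-function values."""
    w_o, w_h, w_c = normalize_weights(*weights)
    return w_o * o + w_h * h + w_c * c


print(normalize_weights(2.0, 1.0, 1.0))
print(pharmacophor_value(0.8, 0.4, 0.0, (2.0, 1.0, 1.0)))
```

Only the ratios of the weights matter: entering 2, 1, 1 has the same effect as entering 0.5, 0.25, 0.25.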
O,H,C,P: pharmacophor
Toggle for the choice of the matching function, between
overlap (O), H-bond (H), charge (C), and pharmacophor
mix (P).
+,-: maximize
Toggle for the choice of maximizing (+) the match
function or minimizing (-) it, which more appropriately
corresponds to fitting the moved structure into a
target mould.
Diversity Analysis [dvrs]
0,1,2,3: positional, fixed point, axial match
Several modes of superposition can be selected. Case 0
starts by superimposing the centroids of the pair of
entries. Then the matching function is maximized by
allowing the two entries to rotate and shift freely
against each other. For all other cases the two entries
are positioned such that their identity atoms
superimpose. For case 1 the matching function is
maximized by allowing the entries to rotate freely
around the identity atom (i). In the remaining cases
rotation is only freed around a given axis, which
always runs through the identity atom. Consequently the
initial positioning is such as to have collinear axes.
In case 2 the axis is defined by a second marked atom,
the axis atom (a), which has to be specified in
advance. In case 3, a carbon atom is added in standard
geometry at the identity atom. It serves as the axis
atom and is removed after the match. The calculation
runs fastest for cases 2 and 3, roughly a factor of 30
slower for case 1 and again a factor of 3 slower for
case 0. The calculations for this analysis are very
time consuming. For this reason it is advisable to
narrow down the selection of structures as much as
possible before starting this analysis. In addition the
examination of the resulting tree becomes less
transparent the larger the number of structures!
w: set weights of matching terms and coverage parameter
The relative weights of the three possible terms in the
pharmacophoric match, overlap (vdW, o), H-bond (h), and
charge (Clb, c), are set here. Atomic charges must be
assigned in advance for the charge term to have any
effect.
e: specify entries that will participate
i,a: atoms to be identified, to specify axis
This point indicates which atom will be identified by
the assigning operation, the atom to be identified (i)
or the one to define the axis (a). Of course this has no
effect when assigning atoms by SMILES code or by mask.
l: specify atoms by SMILES code
The fragment specifying and containing the identity
and axis atoms can be given by its SMILES code. The
first atom in the SMILES string will be the identity
atom, the second the axis atom. This assumes that
identity and axis atoms are connected by a bond!
Furthermore the user can choose whether the additional
atoms defined by the SMILES string should participate
in the match or not.
t: specify atoms by type
To facilitate assignment of identity- or axis atoms for
many entries, they may be specified by type, number of
ligands and number of hydrogens. All ambiguous cases
are subsequently presented in the set menu with an
automatic proposition, which may then be altered.
s: specify atoms by hand
For smaller sets of entries, individual assignment can
be chosen. All entries are presented simultaneously in
the set tool. After exit, the non- or ambiguously
defined entries are presented again.
m: specify atoms by mask
If identity or axis atoms have been predefined as user
sets (e.g. by reading the entries from a file produced
in a previous diversity calculation), these
specifications can be taken over by indicating the bits
that belong to the corresponding user sets. A value of
-1 signifies that the corresponding atoms are not
specified. For default values of the mask see option f.
n: specify atoms by neighborhood
If the identity or the axis set is already defined the
other one may be specified by neighborhood. This is
useful, when the already specified atoms are singly
correlated (e.g. carbonyl oxygens).
u: undo specification
All specifications already done will be removed.
d: do the calculation
If the necessary specifications are complete, the
program will start with a pair-wise mutual comparison
of the specified entries. A pharmacophor match is done
for each pair to yield a triangular matrix of
similarity values between zero and one (values in % are
displayed on the text port). This similarity matrix is
the basis for the subsequent cluster analysis, which is
run automatically. The calculation time depends
heavily on the mode (axial being the fastest) and on
the square (!) of the number of participating entries.
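The quadratic growth follows directly from the number of pairs, n(n-1)/2. The sketch below counts the pairs and fills a triangular matrix. The similarity function passed in is a stand-in supplied by the caller, since the actual pharmacophor match is not reproduced here, and the function names are hypothetical.

```python
def pair_count(n):
    """Number of distinct pairs among n entries."""
    return n * (n - 1) // 2


def similarity_matrix(entries, similarity):
    """Lower-triangular matrix: row i holds the values for pairs (i, j < i)."""
    return [[similarity(entries[i], entries[j]) for j in range(i)]
            for i in range(len(entries))]


for n in (10, 100, 1000):
    print(n, pair_count(n))

# Placeholder similarity, for demonstration only.
toy = similarity_matrix(list("abca"), lambda x, y: 1.0 if x == y else 0.0)
print(toy)
```

Going from 100 to 1000 entries thus raises the number of pharmacophor matches from 4950 to 499500.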
r: repeat cluster analysis with automatic assignments
This option requires that the corresponding structures
have previously been read into Moloc from a saved .mab
file. Then the program tries to associate items of the
tree (to be read from file) with entries in Moloc, and
makes all necessary assignments.
a: cluster analysis
The result of the similarity calculation can be
analyzed in the form of a binary tree. This option is
automatically activated after a calculation. However,
previous calculations can be looked at again by
choosing this option. This only works properly if all
definitions used in the calculation have been
reproduced. Furthermore, the similarity file must be
read in advance in the parent menu (mch).
c: specify corresponding cluster
If the cluster analysis is repeated without repeating
the whole calculation, the cluster (tree entry)
corresponding to the set of entries and the
specifications has to be identified here by picking it.
It must have been read in the parent menu (mch).
f: store set of entries onto .mab file
All specified entries are written onto a .mab file for
later retrieval. If identity atoms are specified they
will carry bit zero as user set. Axis atoms carry bit
one. If omitted atoms are defined they have bit two set.
Cluster Analysis [clan]
s: print clusters of a given size
l: print clusters at a given level
A list of clusters is given, which have been separated
at the given similarity level or by a maximum size
argument. For each cluster the elements are listed in
order of decreasing average similarity within the
cluster, the first element being the center of the
cluster. As a result the center element of each cluster
becomes visible.
c: combined: print clusters at given level and size
The level criterion (option l) is applied. However,
clusters which exceed the chosen size are further split
until all subclusters are smaller. A list of these
subclusters is printed.
D,d: (no) diversity analysis
In case D (d) a (no) diversity analysis is performed
when picking a branch. This analysis consists of a
diagonalization of the (n-dimensional) diversity matrix
of the branch. While the largest (positive) eigenvalue
is printed as indicating an average diversity (not
exceeding n-1), the remaining (n-1) eigenvalues are
shown in a graphical display, after having been
multiplied by -1. From looking at specific simple
examples, they are thought to represent how strongly a
particular component of the (n-dimensional) diversity
space of the branch is represented by the set. The
number m of sizeable eigenvalues (>~1) gives an
indication of the minimal number (m+1) of members
needed to represent the pharmacological spectrum of the
branch. This option is time consuming for large
branches!
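A toy version of this analysis is sketched below. It assumes that the diversity matrix is symmetric with a zero diagonal and off-diagonal values between zero and one (e.g. one minus the similarity). This is an assumption, as the exact definition is not given here. The eigenvalues are obtained with a plain cyclic Jacobi iteration, chosen only to keep the example self-contained and not meant to represent the method used by Moloc. All names and the threshold of 1.0 for "sizeable" are illustrative.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-24):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def diversity_spectrum(matrix, threshold=1.0):
    """Largest eigenvalue, negated remaining ones, and the count m."""
    values = jacobi_eigenvalues(matrix)
    average = values[-1]
    rest = sorted((-v for v in values[:-1]), reverse=True)
    m = sum(1 for v in rest if v >= threshold)
    return average, rest, m


# Two pairs of near-duplicates (diversity 0.1) far apart (diversity 0.9).
diversity = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.1],
    [0.9, 0.9, 0.1, 0.0],
]
average, rest, m = diversity_spectrum(diversity)
print(average, rest, m)
```

For this toy branch a single negated eigenvalue is sizeable, so m + 1 = 2 members would be indicated, in line with the two groups of near-duplicates that were built into the matrix.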
v: view last branch
All currently visible entries can be separately made
visible and non-visible. The entries can be spread
apart in steps. Quitting this option leaves the spread
while exiting undoes it. The entry names can be
displayed (L,l). A toggle (T,t) switches between the
visible and invisible tree.
e: examine selection
The selected structures are displayed together.
w: write selection to a file
The selection is written to a disc file in SMILES format
(.smi).
r: remove selection
The flags defining the selected entries are removed.
g: get selection from .smi file
The structures indicated on the .smi file obtain the
selection flag. For the remaining structures the flag
is removed.
M,m: perform match
Toggle between performing the matching (M) and not
doing so (m) when entries are shown together. Because
matching may be quite time consuming, it may
favorably be omitted when viewing large branches.
View Branch
q: quit (keep spread)
s,u: spread, unspread
The entries can be spread apart in steps with positive
(spread) or negative steps (unspread).
T,t: tree
The binary tree can be made visible (T) or invisible (t)
by changing this toggle switch.
L,l: label
The entry labels can be made visible (L) or invisible
(l) by changing this toggle switch.