Diversity Analysis of Pharmacophoric Properties
General remarks
Pharmacophoric properties of a molecule depend on its pharmacophores
and on their mutual spatial arrangement, i.e. on its conformation.
In this general setting, Moloc can compare molecules in given conformations
with a 3d algorithm, which can be used interactively (mch/d) or through the
batch program M3dsml (see documentation).
A more condensed (2d) description is provided by Moloc's concept of
topological pharmacophores, which assigns a single description to each
molecule and thus allows two molecules to be compared by a single
similarity value. This tutorial deals with this approach.
We assume that in a combinatorial chemistry reaction a particular ligand
substitution is achieved by reaction of a central scaffold with various
aldehydes. This tutorial describes the assessment of pharmacophoric
diversity of a set of aldehydes. An example set is provided by the file
moloc/dat/aldhd.mab. For theoretical background on topological
pharmacophores see: "Topological Pharmacophore Description of Chemical
Structures using MAB-Force-Field Derived Data and Corresponding Similarity
Measures".
Generate a file of Rooted Topological Pharmacophores
- Copy moloc/dat/aldhd.mab into your working directory.
- Start Moloc and read in this file ('.../g/m'), preferably after first
setting read-in entries to invisible ('.../g/o'). Return to the main menu.
- Go to 'mch/d' where diversity matrix calculations
are set up. (The same menu also allows such matrices to be analyzed.)
- Select the option showing a number (initially 0) and select the
option '5 rooted topological pharmacophore match'.
- Accept the two slider panels to perform the similarity calculation
with default parameters. The menu now displays the number 5.
- Choose option 'e' and select all structures as
participants in the calculation.
- Choose option 'l' to specify the root (point of
attachment in the substitution reaction). In the box that appears, enter
the SMILES codes for the aldehyde group:
  - Enter O=[CH]C for aliphatic aldehydes. When you press Return, the
  program confirms the SMILES by printing it to the text port and asks
  for the next SMILES.
  - Enter O=[CH]c for aromatic aldehydes.
  - Terminate SMILES specification by pressing the Esc key.
- Choose option 'h' to write a file of rooted
topological pharmacophores. Suggested file name: aldhd.tpr.
- At the end of the calculation Moloc prints the size of the largest
pharmacophore: 'Maximum number of agons: 10'. This number will be
needed in the following similarity calculation.
Calculation of the Pharmacophoric Similarity Matrix
- This calculation is performed by the program moloc/bin/Mtprsml.
- To get information on the usage of this program, start it without
any parameters. It will then print a help text with parameters and
switches.
- For our case the appropriate call is:
'Mtprsml -r -t10 aldhd.tpr'. The flag '-t10' tells the program to omit
pharmacophores with more than 10 agons. (The default is 8, but we have
one pharmacophore with 10 agons.)
- To find out the number of agons of the pharmacophores in a file, run
the program: 'Mtprcnt aldhd.tpr'.
- Mtprsml will produce the file 'aldhd.sml' which
provides the basis for the following similarity analyses.
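When the similarity calculation has to be repeated for several files, the call above can be assembled in a small script. The following is only a sketch: it uses nothing but the switches quoted in this tutorial ('-r' and '-t<n>'), the helper names build_mtprsml_command and run_mtprsml are hypothetical, and it assumes that Mtprsml (from moloc/bin) can be found on the search path.

```python
import subprocess


def build_mtprsml_command(tpr_file, max_agons, program="Mtprsml"):
    """Assemble the argument list for the call described in this tutorial.

    Only the switches quoted in the text ('-r' and '-t<n>') are used.
    'program' is an assumption: it presumes moloc/bin is on the search path.
    """
    if max_agons < 1:
        raise ValueError("max_agons must be a positive integer")
    return [program, "-r", f"-t{max_agons}", tpr_file]


def run_mtprsml(tpr_file, max_agons):
    """Run the batch program; raises if it exits with a non-zero status."""
    command = build_mtprsml_command(tpr_file, max_agons)
    subprocess.run(command, check=True)


# Values from this tutorial: the largest pharmacophore has 10 agons.
print(" ".join(build_mtprsml_command("aldhd.tpr", 10)))
```

The value passed as max_agons is the 'Maximum number of agons' that Moloc printed when the .tpr file was written.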
Generation of the Similarity Tree
To analyze the similarity file, return to Moloc, which is still in the
diversity menu 'dvrs' (mch/d).
- Select 'r' (repeat cluster analysis ...) to be
prompted for the similarity file name.
- Click the right-hand mouse button and select the file just calculated,
'aldhd.sml'.
- Now you are asked to select the cluster linkage algorithm. The default
'complete linkage' is recommended.
- Now Moloc wants to know the parameters used to calculate the topological
pharmacophores. These should correspond to the ones used to produce the
file aldhd.tpr. The proposed values are the ones you selected before. They
are also given in the first row of the .tpr file.
- After you have specified the requested entry naming, Moloc presents the
cluster analysis menu 'clan'.
Now the display shows the similarity tree. The stem starts at similarity
zero and then branches with increasing similarity values, until at
similarity one its leaves represent the individual compounds.
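To illustrate how such a tree arises from a similarity matrix, here is a minimal sketch of complete-linkage clustering in plain Python. It is a generic textbook version, not Moloc's implementation; the function names and the 4 x 4 example matrix are made up for illustration. Under complete linkage, the similarity of two clusters is the smallest pairwise similarity between their members.

```python
def complete_linkage(sim):
    """Agglomerate entries by complete linkage on a similarity matrix.

    Returns the merges in the order they happen as tuples
    (cluster_a, cluster_b, level), where level is the smallest pairwise
    similarity between members of the two merged clusters.
    """
    clusters = [(i,) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Complete linkage: judge two clusters by their worst pair.
                level = min(sim[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or level > best[0]:
                    best = (level, x, y)
        level, x, y = best
        merges.append((clusters[x], clusters[y], level))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return merges


def clusters_at_level(sim, level):
    """Cut the tree: keep only merges made at a similarity >= level."""
    clusters = {(i,) for i in range(len(sim))}
    for a, b, merge_level in complete_linkage(sim):
        if merge_level < level:
            break  # merge levels never increase under complete linkage
        clusters -= {a, b}
        clusters.add(a + b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical example: entries 0/1 and 2/3 form two tight pairs.
example = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
for level in (0.0, 0.5, 1.0):
    print(level, clusters_at_level(example, level))
```

Reading the tree at a low similarity level yields one subtree containing everything; raising the level splits it into more and smaller subtrees, until each leaf stands alone.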
Diversity Analysis
The purpose of diversity analysis is to arrive at a set with a desired
number of entries that shows a maximum of pharmacophoric diversity.
For that purpose the options 's', 'l' and 'z' allow the tree to be split
into several subbranches, which we illustrate for the case 'l'.
- Click 'l' (print clusters at a given level of
similarity) and specify the level of subtree separation. Do not yet
select centers.
In the output window you can see how many subtrees exist at the chosen
level. Repeat this step until the number satisfies your needs. On
the display the stems of the subtrees are now shown in yellow, and their
roots carry the label (number) of the subtree.
- In addition to the tree the display now also shows the structures of
all subtree centers. If the tree overlaps with these structures,
it can be repositioned separately by operating the mouse with the Shift
key held down.
- Adjust the output level (last option) to your needs and repeat the
previous step. You can now have subtree centers selected, to see them
within the tree. In the output a dashed line leads from each center name
to a number, indicating how many members the subtree contains. The label
of the subtree is given in brackets, which allows you to find it on the
display. For a non-zero output level a file 'clusters.lst' is also written.
- The view option allows you to examine the entries. Its function is
described in the next section.
- To perform a diversity analysis, press the option '!'.
It then switches to the letter 'd',
which indicates that such an analysis will be performed. Then pick
the stem (bond) of the (sub)tree you are interested in. Moloc then
calculates the spectrum (eigenvalues) of the similarity matrix and
displays it in a separate 2-d bar graph. The number of eigenvalues larger
than one gives an indication of the diversity content of the (sub)tree.
This calculation may take some time for large trees!
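The eigenvalue criterion can be tried out on a small matrix. The sketch below is a generic illustration and not Moloc's own routine: it computes the eigenvalues of a symmetric similarity matrix with the classical cyclic Jacobi method in plain Python and counts those larger than one. The function names and the example matrix are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(v) for v in row] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        # Stop once the off-diagonal part has (numerically) vanished.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle (|theta| <= pi/4) that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Similarity transform A <- J^T A J: columns first ...
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # ... then rows.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def count_large_eigenvalues(matrix, threshold=1.0):
    """Number of eigenvalues strictly larger than the threshold."""
    return sum(1 for value in jacobi_eigenvalues(matrix) if value > threshold)


# Hypothetical example: two tight pairs of entries that are mutually dissimilar.
example = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
print(jacobi_eigenvalues(example))
print(count_large_eigenvalues(example))
```

For this example matrix the eigenvalues are 2.1, 1.7, 0.1 and 0.1, so two of them exceed one, matching the two groups of similar entries. Because the diagonal of a similarity matrix consists of ones, the eigenvalues always sum to the number of entries.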
Similarity Analysis
Similarity analysis applies to cases where an entry has desirable properties
and further entries with similar (or hopefully better) properties are
required.
- Pick the stem of a (sub)tree to have it colored cyan and its entries
displayed. Then select 'v' (view last branch) to enter the
branch-display menu (brdspl).
- Alternative ways to arrive at that menu are options 'b'
or 'n', provided a diversity analysis has previously been
performed (options 's', 'l', or 'z').
- In all these options, superposition of the structures can be performed
by setting the match toggle option (between 'm' and 'M') to 'M'.
The branch display menu is meant to facilitate inspection
of and selection from individual branches and provides the options:
- To spread the structures apart: click 's' repeatedly
- To undo spreading stepwise: 'u'
- To toggle the tree display on and off: 't', 'T'
- To toggle the entry-name display on and off: 'l', 'L'
- To exit from the menu with the structures arranged in the spread
representation: 'q'. Upon leaving with 'x'
all spreading is removed.
- Picking an entry with the left-hand mouse button adds it to the set of
selected structures. This is indicated by half-bond coloring. Picking with
the middle mouse button removes an entry from the selection.
- Rotation (i.e. moving the mouse with the left-hand button pressed)
can be applied to the structures with individual origins of rotation by
additionally holding down the Shift key. The individual centers of rotation depend
on the mode of superposition chosen. In the fixed atom case these atoms are
also the centers of rotation, otherwise the molecular centroids are taken,