Mining a Database of Topological Pharmacophores
General remarks
Moloc can generate databases of topological pharmacophores. Such a database can be
ranked against the pharmacophore description of a lead structure by applying a similarity
measure between pharmacophores. Since this similarity is based on relating pharmacophoric
units (agons), the hits of a database search can be displayed in the appropriate
orientation with respect to the lead structure. The steps of such a search are described
one by one below.
Example files can be found in Moloc's dat directory:
- nma.mab as lead structure (N-methyl-adenine)
- mdcm.tdt as database SMILES file (~9000 structures)
To obtain the corresponding topological pharmacophore file mdcm.tpr and structure file
mdcm.mab, issue the command 'Mtprgn -lg mdcm.tdt'. This calculation takes a while
(roughly an hour).
Generation of the Lead Pharmacophore File (.tpr)
- If your lead is a chemical structure, read in this structure from the appropriate
  file (.../g) or build it within the build utility.
  - Make sure the H-counts of the heavy atoms correspond to the desired protonation
    (dTp/h with the entry in the active state).
  - Enter the pharmacophore modeling menu 'php' and select 'w'.
  - Specify the desired lead structure and provide the requested file name (we call it
    'lead').
  - Moloc now asks you to specify the parameter values with which the topological
    pharmacophore description of the lead should be generated. These values should be
    the same as those used to generate the database to be searched; they can be read
    from the header line of the database .tpr file. Once the values are entered, Moloc
    writes the .tpr file of the lead and issues the message 'File written: lead.tpr'.
- If your lead is given in the form of a 3-d pharmacophore, read in this pharmacophore
  from the corresponding .php file (.../g/h) or build the desired pharmacophore within
  the pharmacophore modeling utility 'php'.
  - Within the pharmacophore modeling utility 'php', make sure all agons are connected
    by bonds (constraints)! The resulting connection graph may contain rings and should
    reflect which agons are neighbors.
  - Now select option 'v', pick the pharmacophore, and provide the file name ('lead').
    Moloc issues the message 'File written: lead.tpr'.
If the lead structure (3-d pharmacophore) was built within Moloc, it is advisable to
also store it as a disk file 'lead.mab' (lead.php) to enable later comparison with the
hit structures found in the database search.
Database Search and 3-d Structure Extraction
The program that calculates similarities between topological pharmacophores is called
Mtprsml and resides in the same directory (moloc/bin) as the Moloc executable. It takes
one or two files of topological pharmacophores as arguments, plus a few flags that modify
the course of the calculation. Calling the program without any argument prints its usage.
In our case a possible call would be:
Mtprsml -t10 -b100 -n1 -o l1.sml base.tpr lead.tpr
with the following explanations:
- -t10 : keep pharmacophores with up to 10 agons
- -b100 : keep the 100 best hits
- -n1 : use the lead as a substructure, i.e. do not penalize agons of database entries
that have no match in the lead
- -o l1.sml : write the output to the file l1.sml
- base.tpr : file containing the pharmacophores of the database
- lead.tpr : file of the pharmacophore description of the lead
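When the search has to be repeated with different limits, the call above can be assembled programmatically. The following Python sketch only builds the argument list and prints it; nothing is executed. The function name mtprsml_command and its parameter names are illustrative and not part of Moloc. Only the flags explained above are used, and -n1 is always included because other values of -n are not described here.

```python
import shlex


def mtprsml_command(database_tpr, lead_tpr, output_sml, max_agons=10, best_hits=100):
    """Return the argument list for the Mtprsml call described above.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "Mtprsml",
        f"-t{max_agons}",   # keep pharmacophores with up to max_agons agons
        f"-b{best_hits}",   # keep the best_hits best hits
        "-n1",              # use the lead as a substructure
        "-o", output_sml,   # name of the hit-list file
        database_tpr,
        lead_tpr,
    ]


cmd = mtprsml_command("base.tpr", "lead.tpr", "l1.sml")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if a file name contains spaces.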
When the list of 100 top hits has been generated, it is possible to produce a structure
file (.mab), provided the structures of the database are also available in a multi-structure
file, called e.g. 'base.mab'. The program Mabxtr will extract the structures of the hits
upon issuing the command:
Mabxtr l1.sml base.mab
This command will produce a file 'l1.mab' which contains the structures
in the order in which they appear in the hit-list file l1.sml.
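The two steps, search and structure extraction, can be chained in a small driver script. The Python sketch below is a minimal example under stated assumptions: the function name search_and_extract is hypothetical, the programs Mtprsml and Mabxtr are assumed to be on the PATH, and the name of the extracted structure file is assumed to follow the pattern of the example (l1.sml gives l1.mab). By default the function performs a dry run and only returns the commands.

```python
import subprocess
from pathlib import Path


def search_and_extract(database_tpr, lead_tpr, database_mab,
                       hit_list="l1.sml", dry_run=True):
    """Return the two commands of the search and, unless dry_run, execute them."""
    steps = [
        # Step 1: rank the database against the lead and write the hit list.
        ["Mtprsml", "-t10", "-b100", "-n1", "-o", hit_list, database_tpr, lead_tpr],
        # Step 2: extract the hit structures from the multi-structure file.
        ["Mabxtr", hit_list, database_mab],
    ]
    if not dry_run:
        for argv in steps:
            subprocess.run(argv, check=True)  # stop at the first failing step
    # Assumption: the structure file is named after the hit list (l1.sml -> l1.mab).
    expected_mab = str(Path(hit_list).with_suffix(".mab"))
    return steps, expected_mab


steps, hits_mab = search_and_extract("base.tpr", "lead.tpr", "base.mab")
for argv in steps:
    print(" ".join(argv))
print("expected structure file:", hits_mab)
```

Calling the function with dry_run=False runs both programs in order and raises an exception if either one exits with an error.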
Superimposition of Hits onto Lead and Hit Examination
For a visual examination of the hits of the database search the following steps may be
followed:
- Start Moloc with the lead structure with the command 'Moloc lead.mab'.
- Read in the structures of the hit list:
  - Enter the options menu of the 'get file' menu '.../g/o'.
  - Set status to 'invisible' and the color-default e.g. to 'yellow'.
  - Leave options setup.
  - Read in the structures of the hit list 'l1.mab' (option 'm'). They will remain
    invisible.
  - Return to the main menu.
- Match the hit structures onto the lead structure:
  - Enter the match menu and set options for the pharmacophore match 'mch/o', in our
    case 'target sub' and e.g. 'rigid'.
  - Choose the pharmacophore match option 'q'.
  - Specify the structures to take part in the match in the selector that appears (in
    our case all).
  - In the requester that now appears, which lists the structures just specified, select
    the target (the lead, shown in cyan).
  - Specify the parameters for the generation of pharmacophores and for the similarity
    calculations as used for the search.
  - Moloc now matches the hits onto the target.
  - Exit the match menu.
- Enter the library menu 'lib' to define the hit library and to browse through the hits:
  - Select 'n' to define a library from a file.
  - Specify the file (l1.sml) of the hit list.
  - Give the library a name (e.g. l1) when requested.
  - You need neither keep the similarity values nor rearrange the entries, which are
    already sorted by similarity.
  - Select the browse option 'b' and specify the library just defined. This lets you
    toggle through the hits with the lead structure also visible (if desired).
  - Read the help 'Ctrl n' to find out about the possible ways to cycle through the
    structures.
  - Select structures of interest via the (un)mark option '(Shift) m'.
  - Store the library of marked entries in a file.
Build a Database of Topological Pharmacophores
The program Mtprgn generates a database of topological pharmacophores, which is simply
a .tpr file containing the entries of the database. As with most batch programs that come
with Moloc, its usage is printed to the screen by issuing the command 'Mtprgn' without
any arguments.
Input
Several input formats are possible:
- SMILES input: a file in .tdt format (DAYLIGHT) or a table file containing on each line
the SMILES code of the structure as first item and (optionally) an entry name as second
item. If the second column is missing, the SMILES code is taken as identifier, as is also
the case for a .tdt file.
- .sd-file input (MDL). To use this mol-file format, the flag -s (-S) must be set.
- .mab-file input (Moloc format). This format is useful when specific protonations are
needed; these are contained in the structures of the file. It is also very convenient for
reruns with different parameter values, because the preparatory calculations
(protonation) need not be repeated. .mab files may be generated in the first round
(option -g) and are useful for later 3-d structure retrieval (see above).
The flag -M specifies this format.
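The identifier rule for SMILES table files can be illustrated with a short Python sketch. It is not Moloc code: the function name read_smiles_table is hypothetical, and the columns are assumed to be separated by whitespace, which the description above does not state explicitly.

```python
def read_smiles_table(lines):
    """Return (identifier, smiles) pairs from the lines of a SMILES table.

    The first item on each line is the SMILES code. The optional second item
    is the entry name; if it is missing, the SMILES code is the identifier.
    """
    entries = []
    for line in lines:
        fields = line.split()      # assumption: whitespace-separated columns
        if not fields:
            continue               # skip empty lines
        smiles = fields[0]
        name = fields[1] if len(fields) > 1 else smiles
        entries.append((name, smiles))
    return entries


table = ["CCO ethanol", "c1ccccc1", ""]
for name, smiles in read_smiles_table(table):
    print(name, smiles)
```

A check of this kind before a long Mtprgn run helps to spot entries that would end up with their SMILES code as the only identifier.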
Multi-Fragment Structures and Protonation
Databases often contain multi-fragment structures (e.g. salts). Since topological
pharmacophores are single-fragment objects, the flag -l (-L) removes all but the largest
fragment before the actual transformation to a topological pharmacophore.
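To illustrate what keeping only the largest fragment means for SMILES input, here is a rough Python sketch. It is not the algorithm used by Mtprgn, whose size criterion is not stated here. The sketch assumes that fragments are separated by '.' in the SMILES code, measures size by a crude count of atom symbols, and ignores the rare case of ring-closure digits that bond two dot-separated parts. The names ATOM and largest_fragment are illustrative.

```python
import re

# Crude atom tokenizer: bracket atoms, two-letter halogens, then the
# one-letter organic-subset atoms (aliphatic and aromatic).
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def largest_fragment(smiles):
    """Return the dot-separated fragment with the most atom symbols."""
    fragments = smiles.split(".")
    # max() keeps the first fragment when several have the same atom count.
    return max(fragments, key=lambda frag: len(ATOM.findall(frag)))


print(largest_fragment("CC(=O)[O-].[Na+]"))
```

For a sodium acetate entry such as the one above, the counter ion is dropped and the acetate fragment is kept.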
The structures (fragments) as extracted from a database (e.g. a corporate one) may not
always be in the proper state of protonation. This may have severe consequences for the
results of the topological pharmacophore calculations, because the donor or acceptor
strengths of H-binder agons depend critically on the state of protonation. The flags
-p (-P) let you influence the state of protonation. The flag -P requires additional
software (pcalc)!
Output and Examples
The output is always a .tpr file. Its name is taken from the input file name unless
otherwise specified by the flag -o.
- Mtprgn base.tdt generates a file base.tpr containing all single-fragment
structures of the DAYLIGHT Thor Data Tree file 'base.tdt'. The SMILES codes serve as
structure identifiers.
- Mtprgn -glP7 base.smi generates files base.tpr and base.mab containing
the largest fragments of each structure of base.smi. This input file must contain the SMILES
code of every structure and optionally an identifier as a second column (e.g. Roche number).
The fragments are protonated according to the prediction of the software pcalc for a pH of
7. A 3-d conformation of the fragment is written under the same identifier into the file
base.mab.
- Mtprgn -M -b2 -d7 -o base_2_7.tpr base.mab generates the file base_2_7.tpr,
taking the file base.mab as input. A parametrization yielding finer-grained
pharmacophores is applied.
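The naming rule for the output file can be written down as a small Python sketch. The function name default_tpr_name is hypothetical. The sketch assumes that "taken from the input file name" means replacing the extension with .tpr, as in the examples above; whether Mtprgn keeps a leading directory part is not stated here.

```python
from pathlib import Path


def default_tpr_name(input_file, output=None):
    """Return the .tpr file name expected from a Mtprgn run.

    output mirrors the -o flag: if given, it overrides the default name.
    """
    if output is not None:
        return output
    # Assumption: the extension of the input file is replaced by .tpr.
    return str(Path(input_file).with_suffix(".tpr"))


print(default_tpr_name("base.tdt"))
print(default_tpr_name("base.smi"))
print(default_tpr_name("base.mab", output="base_2_7.tpr"))
```

The third call corresponds to the last example above, where -o prevents a rerun with different parameters from overwriting an existing base.tpr.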