[Dock-fans] DB2 gen Pipeline - failed molecules

Trent E. Balius tbalius at aol.com
Thu Mar 1 08:52:41 PST 2018

Hi Corey,

Did you apply the AMSOL patch to address co-linear atoms?


For example, this compound: Cc1c([nH+]c[nH]1)CSCCN/C(=N/C#N)/NCC#C
has co-linear atoms. 
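
As a rough illustration only (not part of the DOCK pipeline or the AMSOL patch), compounds like the one above could be flagged before submission by looking for triple bonds in the SMILES, since nitrile and alkyne groups give a linear arrangement of atoms. The function names below are hypothetical, and the check is a plain string heuristic rather than a geometry test:

```python
def triple_bond_count(smiles: str) -> int:
    """Count '#' bond symbols in a SMILES string (each marks one triple bond)."""
    return smiles.count("#")


def may_have_colinear_atoms(smiles: str) -> bool:
    """Heuristic: a triple bond implies a roughly linear X-C#Y arrangement."""
    return triple_bond_count(smiles) > 0


example = "Cc1c([nH+]c[nH]1)CSCCN/C(=N/C#N)/NCC#C"
print(triple_bond_count(example), may_have_colinear_atoms(example))  # 2 True
```

This will not catch other sources of linearity (for example cumulated double bonds), so it is best treated as a first-pass flag.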

The other two molecules are highly polar/charged.

I hope this helps,


Trent E. Balius, PhD
Postdoc, Shoichet Lab,
Dept. Pharm. Chem., UCSF
Phone: (415) 514-4253
URL: http://docking.org/~tbalius

-----Original Message-----
From: Corey Taylor <corey.taylor at uni-marburg.de>
To: dock-fans <dock-fans at docking.org>
Sent: Thu, Mar 1, 2018 7:54 am
Subject: [Dock-fans] DB2 gen Pipeline - failed molecules

Dear fellow fans of DOCK,

I've been using the pipeline for db2 file generation that comes with
DOCK 3.7 and in general it does a great job of generating molecules.
However, when trying to parameterise the KEGG dataset as downloaded from
here (http://zinc.docking.org/catalogs/keggviapc), there seem to be
quite a few molecules which not only end up in the 'failed' folder (not in
and of itself a problem) but literally fail so hard the pipeline stops
running entirely.

The following command:

$DOCKBASE/ligand/generate/build_database_ligand.sh -s KEGG_test.smi

Seems to create protomers okay:

Precomputing protomers for all compounds (pH: 7.4 6.4 8.4)
ph 7.4: 1 protomers created
ph 6.4: 1 protomers created
ph 8.4: 1 protomers created
Coalesing and merging protomers
1 protomers generated for 1 compounds

But then stops upon running AMSOL:

Refusing to build conformations with > 5 rotatable hydrogens
Conformer generation failed
Skipping ZINC08214483 0

Logs in /failed/ZINC08214483 seem to be generated so presumably AMSOL
itself runs okay. The molecules which cause these problems (~1% of the
SMILES from the above link) tend to be very large molecules with lots of
stereocentres (10 or more) and/or with probably quite unusual
protonation states, as you'd expect in a dataset like KEGG. Here are a
couple of examples:

C1[C@@H]([C@H]([C@@H]([C@H]([C@@H]1NC(=O)[C@H](CC[NH3+])O)O[C@@H]1[C@@H]([C@H]([C@@H]([C@H](O1)CO)O)N)O)O)O[C@@H]1[C@@H]([C@H]([C@@H]([C@H](O1)C[NH3+])O)O)O)[NH3+]
Cc1c([nH+]c[nH]1)CSCCN/C(=N/C#N)/NCC#C ZINC11616902

So my questions are:

- Is it generally the case that these molecules will fail? i.e. are there
no tweaks, parameters or options in the pipeline that will lead to
db2 files for weird molecules like these? This isn't a big problem per
se, as we probably wouldn't DOCK a lot of molecules of this nature anyway.
- Is there any obvious reason why these molecules would stop the scripts
in the pipeline dead or, better, any way to avoid this? Or do we just
have to live with some crashes? Although AMSOL runs okay, perhaps if all
attempts at protomer generation fail, a downstream script ends up with
an empty variable/container to handle and crashes? You can imagine it
gets frustrating when parameterising 10K molecules, if every 100th
molecule fails and crashes the pipeline...
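
One possible workaround for the crash scenario described above, sketched under assumptions: run the build script once per molecule so that a hard failure only costs that one molecule. The helper name `run_per_molecule` and the directory layout are hypothetical, and the sketch assumes the script accepts a single-line .smi file and signals a crash with a non-zero exit status, which has not been verified here.

```python
import subprocess
from pathlib import Path


def run_per_molecule(smi_path, command, work_root):
    """Run `command + [single-molecule .smi file]` once per input line.

    Returns (succeeded, failed) lists of molecule labels. A molecule counts
    as failed when the command exits with a non-zero status.
    """
    work_root = Path(work_root).resolve()
    succeeded, failed = [], []
    lines = [ln.strip() for ln in Path(smi_path).read_text().splitlines()]
    for index, line in enumerate(ln for ln in lines if ln):
        fields = line.split()
        # Use the second column as the label if present, else a running number.
        label = fields[1] if len(fields) > 1 else f"mol{index:06d}"
        mol_dir = work_root / f"{index:06d}_{label}"
        mol_dir.mkdir(parents=True, exist_ok=True)
        single = mol_dir / "input.smi"
        single.write_text(line + "\n")
        # Each molecule runs in its own directory, so one crash cannot
        # interrupt the others; output is kept for later inspection.
        result = subprocess.run(command + [str(single)], cwd=mol_dir,
                                capture_output=True, text=True)
        (mol_dir / "pipeline.log").write_text(result.stdout + result.stderr)
        (succeeded if result.returncode == 0 else failed).append(label)
    return succeeded, failed
```

With the command from this message it might be called as `run_per_molecule("KEGG_test.smi", [os.path.expandvars("$DOCKBASE/ligand/generate/build_database_ligand.sh"), "-s"], "per_molecule")`, after `import os`. Running one process per molecule is slower than a single batch, so this trades throughput for robustness.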

Of course, if the software used in the pipeline will simply fail for any
molecules with > 5 stereocentres, phosphorus, etc., then I can
just write a script to omit these. Just curious if I'm missing something.
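
A minimal sketch of such an omit script, assuming the cut-offs mentioned above (> 5 stereocentres, any phosphorus) are the ones wanted; neither threshold is a confirmed limit of the pipeline. All names are hypothetical, and the checks are regular-expression heuristics on the SMILES text, so a cheminformatics toolkit such as RDKit would be more robust:

```python
import re

# Bracket atoms carrying a chirality tag, e.g. [C@H] or [C@@H].
_CHIRAL_ATOM = re.compile(r"\[[^\]]*@[^\]]*\]")
# Uppercase P not followed by a lowercase letter, so Pt, Pd, Pb etc. are ignored.
_PHOSPHORUS = re.compile(r"P(?![a-z])")


def count_stereocentres(smiles):
    """Count atoms with an explicit @ or @@ tag (double-bond stereo is ignored)."""
    return len(_CHIRAL_ATOM.findall(smiles))


def has_phosphorus(smiles):
    """True if the SMILES contains a non-aromatic phosphorus atom symbol."""
    return _PHOSPHORUS.search(smiles) is not None


def split_smi(lines, max_stereocentres=5):
    """Split .smi lines into (kept, skipped) using the two heuristics above."""
    kept, skipped = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        smiles = line.split()[0]
        if count_stereocentres(smiles) > max_stereocentres or has_phosphorus(smiles):
            skipped.append(line)
        else:
            kept.append(line)
    return kept, skipped
```

Only stereocentres that are explicitly tagged in the SMILES are counted, and aromatic phosphorus (lowercase `p`) is not detected.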

Cheers guys,

Corey Taylor
Kolb Lab
Institute of Pharmaceutical Chemistry
Philipps-University Marburg
Marbacher Weg 6
35032 Marburg

Mailto: corey.taylor at uni-marburg.de
Ph: +49 6421 28 21351
Web: http://www.kolblab.org/taylor.html
