[Zinc-fans] zinc smiles search problem

John J. Irwin jji at cgl.ucsf.edu
Tue Mar 31 11:40:13 PDT 2009


Hi Wally

Thanks for your question about searching ZINC.

Walter Novak wrote:
> Hi All,
>
> I am having trouble today searching with a SMILES string and a Tanimoto
> cutoff. I currently get no hits even for benzene with a cutoff of 50, e.g.
>   
> c1ccccc1 50
>
> Is this a known issue right now?
>   
The real-time performance of ZINC can vary, and we regret that your
search turned up no hits.  For what it's worth, I re-ran the query you
described and was presented with 32 hits on the first page.  If you
check the "no time limit" box, you may get hits that do not appear with
the box unchecked.  Try it! Also, you won't be surprised to learn that
millions of compounds qualify by your criteria.  And that, I think, is
part of the problem.  Our search tool works best when only a few
molecules (of the 20M or so) are matched.  It becomes very heavy and
slow when many molecules match, say more than 1000.

Our recommendation for any search that causes problems is to download
all of ZINC as SMILES and perform the search locally.  It should take
only seconds to download to most places, and you can then run as many
queries as you like on your own hardware.  Once you have the ZINC IDs,
you can come back to download the molecules, or you can download all of
ZINC and do the subsetting yourself.
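For readers who have not done a local search before, here is a minimal Python sketch of the idea, under several assumptions that are not from ZINC itself: the download is a plain text file with one `SMILES ZINC_ID` pair per line, and the query uses the same `SMILES CUTOFF` form as above with the cutoff read as a percentage. The "fingerprint" is a deliberately crude, hypothetical stand-in (the set of character trigrams of the SMILES string), so its scores will not match ZINC's Tanimoto values. For real work you would swap in a proper chemical fingerprint from a cheminformatics toolkit such as RDKit or Open Babel. All function names and the example IDs are made up for illustration.

```python
import io
from typing import FrozenSet, Iterator, TextIO, Tuple


def toy_fingerprint(smiles: str, n: int = 3) -> FrozenSet[str]:
    """Hypothetical stand-in fingerprint: the set of character n-grams of a SMILES.

    NOT chemically meaningful (it ignores atoms, bonds and ring structure);
    replace it with a real fingerprint from a cheminformatics toolkit.
    """
    padded = f"^{smiles}$"  # mark start/end so terminal n-grams are distinct
    return frozenset(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))


def tanimoto(a: FrozenSet, b: FrozenSet) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def parse_query(line: str) -> Tuple[str, float]:
    """Split 'SMILES CUTOFF' (cutoff as a percentage, e.g. 50) into (smiles, 0.5)."""
    smiles, cutoff = line.split()
    return smiles, float(cutoff) / 100.0


def screen(query: str, smi_file: TextIO) -> Iterator[Tuple[str, str, float]]:
    """Stream a 'SMILES ZINC_ID' file line by line, yielding hits at or above the cutoff."""
    smiles, cutoff = parse_query(query)
    query_fp = toy_fingerprint(smiles)
    for line in smi_file:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        cand_smiles, zinc_id = fields[0], fields[1]
        score = tanimoto(query_fp, toy_fingerprint(cand_smiles))
        if score >= cutoff:
            yield zinc_id, cand_smiles, score


# Usage with a tiny in-memory sample; the IDs are hypothetical.
sample = io.StringIO(
    "c1ccccc1 ZINC_EXAMPLE_1\n"
    "Cc1ccccc1 ZINC_EXAMPLE_2\n"
    "CCO ZINC_EXAMPLE_3\n"
)
hits = list(screen("c1ccccc1 50", sample))
for zinc_id, smi, score in hits:
    print(zinc_id, smi, f"{score:.3f}")
# prints:
# ZINC_EXAMPLE_1 c1ccccc1 1.000
# ZINC_EXAMPLE_2 Cc1ccccc1 0.625
```

For the full download you would pass an open file handle (for example `open("zinc.smi")`, a hypothetical filename) instead of the `StringIO` sample. Because `screen()` is a generator, it reads one line at a time and never holds all 20M or so molecules in memory, and the ZINC IDs it yields are what you would bring back to ZINC to fetch the molecules or use to subset your local copy.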

ZINC is really all about getting libraries of ready-to-dock small
molecules into people's hands with as little hassle as possible.  Much
as we would love to be the "Google of chemistry", it just isn't our
focus.  For rapid searches, may I recommend a few excellent sites:
emolecules.com, chemspider.com, pubmed.org (the PubChem part), and ChemDB
(http://cdb.ics.uci.edu/cgibin/ChemicalSearchWeb.psp).  I am sure there
are others, and I do not mean to slight those I did not mention.

Hope this is helpful.

John
