Hi Maarten, Hi Andrius,

I've been in the vicinity for a couple of years now, and yet
neither biology nor chemistry is my field, far from it.  Thank
you for helping me connect the dots on this issue!  :)

Maarten L. Hekkelman, on 2024-10-30:
> Of course, it would be easiest to include the components.cif file in
> libcifpp. However, this file changes weekly and it is huge. Two good reasons
> not to do it.
> 
> What I did for density-fitness e.g. is include a subset of components.cif
> large enough for testing. That way, you can avoid having to download the
> entire CCD file for just some simple tests on basic proteins. Have a look at
> the density-fitness debian/tests files, perhaps this may help you solve your
> problem.
> 
> There are similar ways to provide a dummy components.cif file. See
> https://pdb-redo.github.io/libcifpp/resources.html for more information.

Thank you for the hints.  As a start, I checked whether I could
get somewhere with your mini-ccd.cif from density-fitness.  At
first sight, it seems I may have to populate a few more fields
to fully satisfy the python-biopython test suite:

        test_PDB_DSSP ... 
/<<PKGBUILDDIR>>/.pybuild/cpython3_3.12/build/Bio/PDB/DSSP.py:199: UserWarning: 
        Configuration error:
        
        The attempt to retrieve compound information for "SO4" failed.
        
        This information is searched for in a CCD file called components.cif or
        components.cif.gz which should be located in one of the following directories:
        
        "/usr/share/libcifpp"
        "/var/cache/libcifpp"
        "/<<PKGBUILDDIR>>/libcifpp-data"
        "/usr/share/libcifpp"
        "/usr/share/libcifpp"
        
        (Note that you can add a directory to the search paths by setting the 
        LIBCIFPP_DATA_DIR environmental variable)
        
        On Linux an optional cron script might have been installed that automatically updates
        components.cif and mmCIF dictionary files. This script only works when the file
        libcifpp.conf contains an uncommented line with the text:
        
        update=true
        
        If you do not have a working cron script, you can manually update the files
        in /var/cache/libcifpp using the following commands:
        
        curl -o /var/cache/libcifpp/components.cif https://files.wwpdb.org/pub/pdb/data/monomers/components.cif
        curl -o /var/cache/libcifpp/mmcif_pdbx.dic https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_pdbx_v50.dic
        curl -o /var/cache/libcifpp/mmcif_ma.dic https://github.com/ihmwg/ModelCIF/raw/master/dist/mmcif_ma.dic
        
        The current order of compound factory objects is:
        
        CCD components.cif resource
        CCD components.cif resource
        Unknown compound: SO4
        Missing compound information for SO4
        
          warnings.warn(err)
        ok

That being said, even with the incomplete subset the test passes
(the missing SO4 entry only triggers the warning above), which
lets the python-biopython build go through.  I should be able to
upload a fix shortly.  :)
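
For the record, here is a minimal sketch of how such a subset
could be staged for a test run.  It assumes the subset is copied
under the name components.cif into a directory that is then
exposed through LIBCIFPP_DATA_DIR, the variable named in the
warning above.  The helper name stage_ccd_subset and the paths are
hypothetical, and the file content is only a placeholder, so this
is not necessarily what the final packaging will look like.

```python
import os
import shutil
import tempfile
from pathlib import Path


def stage_ccd_subset(subset: Path, data_dir: Path) -> dict:
    """Copy a CCD subset into data_dir as components.cif.

    Returns a copy of the current environment with LIBCIFPP_DATA_DIR
    pointing at data_dir (hypothetical helper, for illustration only).
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    # The warning says the file is looked up as components.cif
    # (or components.cif.gz), so the subset is renamed on copy.
    shutil.copyfile(subset, data_dir / "components.cif")
    env = dict(os.environ)
    env["LIBCIFPP_DATA_DIR"] = str(data_dir)
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder content; a real run would use mini-ccd.cif.
        subset = Path(tmp) / "mini-ccd.cif"
        subset.write_text("data_placeholder\n")
        env = stage_ccd_subset(subset, Path(tmp) / "libcifpp-data")
        print(env["LIBCIFPP_DATA_DIR"])
```

The returned environment could then be handed to whatever runs
the test suite, for instance through the env argument of
subprocess.run.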

> As an alternative, I'm thinking about packaging a subset with libcifpp.
> Dictionaries can be stacked already, so having a subset might be a simple
> way out of this. But the question then is, what to include in the subset. I
> did some counting on components in PDB entries to see if there is a clear
> cut-off, but couldn't find one. And only including the standard amino acids
> and nucleic acids is a bit too limited to be useful.

I understand how difficult it is to determine a useful subset,
and I am afraid I don't have any good idea to offer.  As far as
I can tell, the mini-ccd.cif in its present shape already allows
for basic functional testing.

Have a nice day,  :)
-- 
  .''`.  Étienne Mollier <emoll...@debian.org>
 : :' :  pgp: 8f91 b227 c7d6 f2b1 948c  8236 793c f67e 8f0d 11da
 `. `'   sent from /dev/pts/2, please excuse my verbosity
   `-    on air: Patrick Moraz - Sonata in C (3rd movement Alle…
