Hi Étienne,
Of course, it would be easiest to include the components.cif file in
libcifpp. However, this file changes weekly and it is huge. Two good
reasons not to do it.
What I did for density-fitness, for example, was to include a subset of
components.cif large enough for testing. That way, you can avoid having
to download the entire CCD file for just some simple tests on basic
proteins. Have a look at the density-fitness debian/tests files, perhaps
this may help you solve your problem.
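To illustrate the subset idea, here is a minimal sketch in Python. It is not how density-fitness builds its test file. It assumes every compound in components.cif starts its own block with a line of the form "data_<ID>", and that no other line in the file begins with "data_". The function name extract_components is made up for this example.

```python
from typing import Iterable, Iterator


def extract_components(lines: Iterable[str], wanted: Iterable[str]) -> Iterator[str]:
    """Yield only the lines of the data blocks whose ID is in `wanted`."""
    wanted_ids = {w.upper() for w in wanted}
    keep = False
    for line in lines:
        # Assumption: a line starting with "data_" opens a new compound block.
        if line.startswith("data_"):
            keep = line[len("data_"):].strip().upper() in wanted_ids
        if keep:
            yield line


# Possible usage (file names are placeholders):
#
#   with open("components.cif", encoding="utf-8") as src, \
#        open("components-subset.cif", "w", encoding="utf-8") as dst:
#       dst.writelines(extract_components(src, {"ALA", "GLY", "HOH"}))
```

Because the input is streamed line by line, the full file never has to be held in memory.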
There are similar ways to provide a dummy components.cif file. See
https://pdb-redo.github.io/libcifpp/resources.html for more information.
As an alternative, I'm thinking about packaging a subset with libcifpp.
Dictionaries can be stacked already, so having a subset might be a
simple way out of this. But the question then is, what to include in the
subset. I did some counting on components in PDB entries to see if there
is a clear cut-off, but couldn't find one. And only including the
standard amino acids and nucleic acids is a bit too limited to be useful.
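One way to look for such a cut-off could be sketched as follows. This is not the counting I actually did, only an illustration. It assumes you already have, for each PDB entry, the set of component IDs it uses. The function name coverage_by_rank and the input layout are made up for this example.

```python
from collections import Counter


def coverage_by_rank(entries: dict[str, set[str]]) -> list[tuple[str, int, float]]:
    """Rank components by how many entries use them and report, for each
    rank, the fraction of entries fully covered by the subset so far."""
    if not entries:
        return []
    counts = Counter(comp for comps in entries.values() for comp in comps)
    # Most frequently used components first; ties broken alphabetically.
    ranked = sorted(counts, key=lambda comp: (-counts[comp], comp))
    included: set[str] = set()
    result = []
    for comp in ranked:
        included.add(comp)
        # An entry is covered when all of its components are in the subset.
        covered = sum(1 for comps in entries.values() if comps <= included)
        result.append((comp, counts[comp], covered / len(entries)))
    return result
```

A clear cut-off would show up as a rank where the covered fraction flattens out. The inner loop rescans all entries at every rank, so this is only practical for a sample of entries, not the whole archive.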
-maarten
On 29-10-2024 at 22:29, Étienne Mollier wrote:
Hi Maarten,
I'm dealing with python-biopython release-critical bug #1086156,
and I was wondering whether the 422M dataset components.cif
wouldn't be worth injecting into the libcifpp-data package, to
facilitate the handling of data by end users? I guess that would
make the package rather large, so I can understand if this is
deemed excessive, although I believe some Debian packages are
even larger (I don't have specific examples in mind).
https://bugs.debian.org/1086156
If there are blockers, then another option for me might be to
skip the test, as internet access at build time is not an
option, but it means missing out on other genuine issues
affecting libcifpp support in python-biopython. The thing is,
I'm not sure yet whether the test failure results from issues in
python-biopython or libcifpp-data.
Have a nice day, :)
--
Maarten L. Hekkelman
https://www.hekkelman.com/