On 17/08/2013 20:09, Chris Swain wrote:
Is there a way to simply count the number of molecules in a file?
There is an API command OBConversion::NumInputObjects() which works with most multi-molecule formats (although it looks faulty for mol2). It is currently not exposed in the obabel interface, as far as I know, but the attached draft op uses it. Use --count like:
obabel 10dataset.sdf -onul --count
10dataset.sdf contains 10 molecules
0 molecules converted

It works with single or multiple files, which can be of different formats and may be compressed. It chemically converts only the first molecule in each file and is usable with files up to 2GB: it took 30 seconds to count a 1.6GB sdf file with 517K molecules.
Note that the output format is not used, but one still has to be specified. Writing the functionality as an output format (-ocount), as suggested by Noel, would be clearer, but I didn't find it easy to do.
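For the SD case only, a rough cross-check of the --count result is possible without parsing any chemistry, because each SD record ends with a `$$$$` line. This is not what the op does internally (it relies on the format's SkipObjects support via NumInputObjects()). The sketch below is a hypothetical helper; `countSdfRecords`, `countSdfRecordsInFile` and `selfTestCountSdf` are illustrative names, not Open Babel API. It assumes an uncompressed file in which every record, including the last, is terminated by `$$$$`, and it does not handle the other formats or the compressed input that --count accepts.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Open Babel: counts SD-file records
// by counting "$$$$" terminator lines. Assumes a plain (uncompressed)
// SD file in which every record, including the last, ends with "$$$$".
long countSdfRecords(std::istream& in)
{
  long count = 0;
  std::string line;
  while (std::getline(in, line)) {
    // Tolerate Windows (CRLF) line endings
    if (!line.empty() && line.back() == '\r')
      line.pop_back();
    if (line == "$$$$")
      ++count;
  }
  return count;
}

// Convenience wrapper for a file on disk; returns -1 if it cannot be opened.
long countSdfRecordsInFile(const std::string& path)
{
  std::ifstream file(path);
  if (!file)
    return -1;
  return countSdfRecords(file);
}

// Small self-check on an in-memory SD file holding two minimal records.
void selfTestCountSdf()
{
  std::istringstream sample("mol1\n\n\nM  END\n$$$$\nmol2\n\n\nM  END\n$$$$\n");
  assert(countSdfRecords(sample) == 2);
}
```

Because it only scans lines, this is a sanity check on the number reported for an SD file, not a replacement for the op: a truncated final record without `$$$$` would be missed.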
Chris
/**********************************************************************
opcount.cpp - Counts objects in files

Copyright(C) 2013 by Chris Morley

This file is part of the Open Babel project.
For more information, see <http://openbabel.sourceforge.net/>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
***********************************************************************/

#include <openbabel/babelconfig.h>
#include <openbabel/op.h>
#include <openbabel/obconversion.h>

namespace OpenBabel
{
  using namespace std;

class OpCount : public OBOp
{
public:
  OpCount(const char* ID) : OBOp(ID, false){}
  const char* Description(){ return
    "Count molecules in files\n"
    " obabel infile.xxx -onul --count;\n"
    "An output format must be present, although it is not used.\n"
    "There can be multiple input files, of various formats,\n"
    "including compressed files.\n"
    "The input format must have a SkipObjects function, which most\n"
    "multi-molecule formats do.\n\n"; }

  virtual bool WorksWith(OBBase* pOb)const{ return true; } //all OBBase objects
  virtual bool Do(OBBase* pOb, const char* OptionText, OpMap* pmap, OBConversion* pConv);
};

/////////////////////////////////////////////////////////////////
OpCount theOpCount("count"); //Global instance

/////////////////////////////////////////////////////////////////
bool OpCount::Do(OBBase* pOb, const char* OptionText, OpMap* pmap, OBConversion* pConv)
{
  // Use file name without path
  string name(pConv->GetInFilename());
  string::size_type posn = name.find_last_of("/\\");
  // Start just after the last separator, or at 0 if there is none
  // (otherwise a bare name would lose its first character)
  posn = (posn == string::npos) ? 0 : posn + 1;

  // Count the objects in the current input file
  int nOb = pConv->NumInputObjects();
  clog << name.substr(posn) << " contains " << nOb << " molecules" << endl;

  // Read only this first object from the file; returning false
  // means it is not written to the output
  pConv->SetOneObjectOnly();
  return false;
}

}//namespace
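A subtle point in the path-stripping step of Do(): when no `/` or `\` is found, the name must be taken from position 0, not from `posn+1`, otherwise a bare name such as `10dataset.sdf` would be printed without its first character. Isolated as a stand-alone function, the intended logic looks roughly like this; `baseName` and `checkBaseName` are illustrative names, not Open Babel functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone version of the "file name without path" step
// in OpCount::Do. Returns the part of a path after the last '/' or '\'.
std::string baseName(const std::string& path)
{
  std::string::size_type posn = path.find_last_of("/\\");
  // No separator: the whole string is already the file name.
  if (posn == std::string::npos)
    return path;
  return path.substr(posn + 1);
}

// Documents the expected behaviour for both Unix and Windows separators.
void checkBaseName()
{
  assert(baseName("10dataset.sdf") == "10dataset.sdf");
  assert(baseName("data/10dataset.sdf") == "10dataset.sdf");
  assert(baseName("C:\\data\\10dataset.sdf") == "10dataset.sdf");
}
```

Handling both separators in one `find_last_of` call keeps the op portable between Unix and Windows paths without any platform checks.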
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss