On 17/08/2013 20:09, Chris Swain wrote:
Is there a way to simply count the number of molecules in a file?

There is an API function, OBConversion::NumInputObjects(), which works with most multi-molecule formats (although it looks faulty for mol2). As far as I know it is not currently exposed in the obabel interface, but the attached draft op uses it. Use --count like this:

  obabel 10dataset.sdf -onul --count

  10dataset.sdf contains 10 molecules
  0 molecules converted

It works with one or more files, which can be of different formats and may be compressed. It chemically converts only the first molecule in a file and is usable on files up to 2GB: it took 30 seconds to count a 1.6GB sdf file with 517K molecules.

Note that the output format is not used, but one still has to be specified (hence -onul in the example). Writing the functionality as an output format, -ocount, as suggested by Noel, would be clearer, but I didn't find it easy to do.

Chris
/**********************************************************************
opcount.cpp - Counts objects in files

Copyright(C) 2013 by Chris Morley
 
This file is part of the Open Babel project.
For more information, see <http://openbabel.sourceforge.net/>
 
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
***********************************************************************/
#include <openbabel/babelconfig.h>
#include <openbabel/op.h>
#include <openbabel/obconversion.h>
#include <iostream>
#include <string>

namespace OpenBabel
{

using namespace std;

class OpCount : public OBOp
{
public:
  OpCount(const char* ID) : OBOp(ID, false){}

  const char* Description(){ return
    "Count molecules in files\n"
    "  obabel infile.xxx -onul --count;\n"
    "An output format must be present, although it is not used.\n"
    "There can be multiple input files, of various formats,\n"
    "including compressed files.\n"
    "The input format must have a SkipObjects function, which most\n"
    "multi-molecule formats do.\n\n";
  }

  virtual bool WorksWith(OBBase* pOb)const{ return true; } //all OBBase objects
  virtual bool Do(OBBase* pOb, const char* OptionText, OpMap* pmap, OBConversion* pConv);
};

/////////////////////////////////////////////////////////////////
OpCount theOpCount("count"); //Global instance

/////////////////////////////////////////////////////////////////
bool OpCount::Do(OBBase* pOb, const char* OptionText, OpMap* pmap, OBConversion* pConv)
{
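  // Use file name without path
  string name(pConv->GetInFilename());
  string::size_type posn = name.find_last_of("/\\");
  // Start after the last separator, or at the beginning if there is none
  posn = (posn==string::npos) ? 0 : posn+1;

  int nOb = pConv->NumInputObjects();
  clog << name.substr(posn) << " contains " << nOb << " molecules" << endl;
  pConv->SetOneObjectOnly(); // no need to read the rest of the file
  return false;              // the molecule is not passed on for output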
}
}//namespace
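
The path-stripping step in Do() can be tried out separately from Open Babel. A minimal sketch, where StripPath is a hypothetical name used only for illustration and is not an Open Babel function; it is written so that a name with no path separator is returned whole:

```cpp
#include <string>

// Hypothetical helper (not an Open Babel function): returns the part of
// a path after the last '/' or '\\', or the whole string if there is none.
std::string StripPath(const std::string& path)
{
  std::string::size_type posn = path.find_last_of("/\\");
  return (posn == std::string::npos) ? path : path.substr(posn + 1);
}
```

Both Unix and Windows separators are handled, and a path that ends in a separator gives an empty string.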