On Jan 10, 2019, at 15:07, Noel O'Boyle <baoille...@gmail.com> wrote:
> 
> I do appreciate that you're willing to help out with this, but I would go for 
> proposal 4.

Okay, I'm working on the patch for that, plus a fix for a problem where there is a 
blank "#source" line in the output even when there is no source filename. (It 
should be omitted.)

I want to add a regression test for these. I'm having problems figuring out how 
to generate an FPS file using the API.
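
Independent of how the FPS text ends up being generated, the check for the 
blank "#source" line could work on the output text alone. Here is a rough 
sketch of that check. The function names are hypothetical, nothing here calls 
Open Babel, and the sample text is copied from the obabel run shown further 
down in this message.

```python
def fps_header_lines(fps_text):
    """Return the header lines (those starting with '#') of FPS output."""
    return [line for line in fps_text.splitlines() if line.startswith("#")]


def has_blank_source_line(fps_text):
    """True if the header has a '#source=' line with nothing after the '='."""
    prefix = "#source="
    for line in fps_header_lines(fps_text):
        if line.startswith(prefix) and line[len(prefix):].strip() == "":
            return True
    return False


# Sample output copied from the obabel run in this message.
sample = "\n".join([
    "#FPS1",
    "#num_bits=166",
    "#type=OpenBabel-MACCS/1",
    "#software=OpenBabel/2.4.90",
    "#source=",
    "#date=2019-01-14T11:09:07",
    "000000000000000000000000000000000000008000\tmethane",
])

# The current output has the blank line, so this reports True.
print(has_blank_source_line(sample))
```

A regression test would assert that this returns False once the patch is in.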

I've tracked the problem down to this: under obabel the output converter starts 
with a GetOutputIndex() of 1, while through the API I see that it has a value 
of 0. This causes the output format plugin's initialization code not to run.

I can modify fpsformat.cpp so that it has a correct initializer:
  option 1: use _nbits=0 as the initializer, since the FPS format requires 
_nbits > 0
  option 2: add a new 'initialized' flag in the class
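
To make the two options concrete, here is a toy Python model (not Open Babel 
code; the class and attribute names are made up for illustration). It only 
shows that a flag-based check does its setup exactly once whether the 
converter's index starts at 0 or at 1.

```python
class ToyFormat:
    """Stand-in for an output format plugin; not Open Babel code."""

    def __init__(self):
        self.initialized = False  # option 2: explicit flag
        self.nbits = 0            # option 1: 0 means "not set up yet"
        self.headers_written = 0

    def write_molecule(self, output_index):
        # Set up on first use, whatever the converter's index happens to be.
        # (Option 1 would test "self.nbits == 0" here instead.)
        if not self.initialized:
            self.nbits = 166      # e.g. the MACCS run in this message has 166 bits
            self.headers_written += 1
            self.initialized = True
        return self.nbits


for start in (0, 1):
    fmt = ToyFormat()
    results = [fmt.write_molecule(start + i) for i in range(3)]
    print(start, results, fmt.headers_written)
```

If the same format object is reused for a second conversion, a flag like this 
would also need to be reset at the start of each new output, which the index 
test gets for free.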

However, I noticed that the FPS format is not the only exporter with that 
issue. For example, the "fingerprintformat.cpp" code uses similar 
initialization:

    if(pConv->IsOption("h") || (pConv->GetOutputIndex()==1 && pConv->IsLast()))

I think the more correct solution is to ensure that OBConversion always starts 
with an initial output Index of 1 instead of 0. 

I don't know where that might be done. Could someone provide advice?


Here is how I diagnosed the problem.

First, I know that FPS output can be generated because 'obabel' works:

% echo "C methane" | obabel -ismi -ofps -xfMACCS
#FPS1
#num_bits=166
#type=OpenBabel-MACCS/1
#software=OpenBabel/2.4.90
#source=
#date=2019-01-14T11:09:07
000000000000000000000000000000000000008000      methane
1 molecule converted

However, I get segfaults when I try to do the same thing through pybel and the 
lower-level Python API:

  === attempt #1 ===
% cat lowlevel.py
import openbabel as ob
mol = ob.OBMol()
conv = ob.OBConversion()
conv.SetInAndOutFormats("smi", "fps")
conv.AddOption("f", ob.OBConversion.OUTOPTIONS, "MACCS")
conv.ReadString(mol, "C=O")
print("About to WriteString()")
print(conv.WriteString(mol))

% python lowlevel.py
About to WriteString()
Segmentation fault

  === attempt #2 (using the default FP2 fingerprints) ===
% python
Python 3.7.1 (default, Dec 14 2018, 13:28:58)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pybel
>>> print(pybel.readstring("smi", "C=O").write("fps"))
Segmentation fault

I thought it might be a BABEL_LIBDIR or BABEL_DATADIR issue but that does not 
seem to be the case.

lldb reports that the failure is in:

* thread #1: tid = 0x30ca43, 0x0000000103923851 
fpsformat.so`OpenBabel::FPSFormat::WriteMolecule(OpenBabel::OBBase*, 
OpenBabel::OBConversion*) + 2737, queue = 'com.apple.main-thread', stop reason 
= EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000103923851 
fpsformat.so`OpenBabel::FPSFormat::WriteMolecule(OpenBabel::OBBase*, 
OpenBabel::OBConversion*) + 2737
fpsformat.so`OpenBabel::FPSFormat::WriteMolecule:

I instrumented fpsformat.cpp so it starts:
   ===============
bool FPSFormat::WriteMolecule(OBBase* pOb, OBConversion* pConv)
{
  ostream &ofs = *pConv->GetOutStream();
  vector<unsigned int> fptvec;

  cerr << "pConv->GetOutputIndex(): " << pConv->GetOutputIndex() << endl;  // new instrumentation
  if(pConv->GetOutputIndex()==1)  // <--- This does the initialization
   ===============

The obabel output from this is:


% echo "C methane" | env BABEL_LIBDIR=lib obabel -ismi -ofps -xfMACCS
==============================
*** Open Babel Error  in openLib
  lib/_openbabel.so did not load properly.
 Error: dlopen(lib/_openbabel.so, 9): Symbol not found: _PyBool_Type
  Referenced from: lib/_openbabel.so
  Expected in: flat namespace
 in lib/_openbabel.so
pConv->GetOutputIndex(): 1   <---- note this line here!
 ....

However, when I run the 'lowlevel.py' script I see that GetOutputIndex() 
returns 0.

% env BABEL_LIBDIR=lib python ~/tmp/lowlevel.py
About to WriteString()
pConv->GetOutputIndex(): 0   <----- while here it has 0 instead of 1!

The initialization code in fpsformat.cpp only runs when 
pConv->GetOutputIndex()==1. Since GetOutputIndex() returns 0 for the first 
structure when using the API, this is not run, which means that 

    _pFP = OBFingerprint::FindFingerprint(fpid.c_str());

is never run, which means that _pFP is unassigned, so that 

  if(!_pFP->GetFingerprint(pOb, fptvec, _nbits))

causes a segmentation fault.
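
The same chain of events can be shown with a toy Python model. This is only 
an illustration with made-up class names, not the real fpsformat.cpp logic. 
In Python the unset object shows up as an AttributeError on None rather than 
a segmentation fault.

```python
class ToyFingerprint:
    """Plays the role of the fingerprint object that _pFP points to."""

    def get_fingerprint(self, mol):
        return [0x80]


class ToyFPSWriter:
    def __init__(self):
        self.fp = None  # plays the role of _pFP

    def write_molecule(self, mol, output_index):
        if output_index == 1:        # the only place self.fp gets set
            self.fp = ToyFingerprint()
        return self.fp.get_fingerprint(mol)


def try_write(first_index):
    """Write one molecule with the given starting index and report what happened."""
    writer = ToyFPSWriter()
    try:
        writer.write_molecule("C=O", first_index)
        return "ok"
    except AttributeError:
        return "failed: fp was never set"


print(try_write(1))  # index starts at 1, as under obabel
print(try_write(0))  # index starts at 0, as through the API
```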

If I hack the initialization code so it is run with GetOutputIndex() == 0 or 1 
then I get the right output.

A "grep GetOutputIndex /src/formats/*.cpp" shows that many formats use the

  pConv->GetOutputIndex()==1

test for initialization.
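
For anyone who wants a list of the affected files and lines, the same search 
can be done with a short Python script. This is a sketch: the regular 
expression is my assumption about how the test is usually written, and it is 
narrower than the plain grep because it only matches comparisons against 1. 
The demo runs on two throwaway files rather than the real source tree.

```python
import re
import tempfile
from pathlib import Path

# Matches "GetOutputIndex()==1", allowing spaces around "==".
PATTERN = re.compile(r"GetOutputIndex\(\)\s*==\s*1")


def find_index_checks(root):
    """Return (filename, line number, line) for each match in root/*.cpp."""
    hits = []
    for path in sorted(Path(root).glob("*.cpp")):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits


# Demo on a temporary directory with one matching and one non-matching file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.cpp").write_text("  if(pConv->GetOutputIndex()==1)\n")
    Path(tmp, "b.cpp").write_text("int x = 0;\n")
    for name, lineno, line in find_index_checks(tmp):
        print(f"{name}:{lineno}: {line}")
```

Pointing find_index_checks() at the formats source directory would give the 
list of plugins to review.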

I could instead change all of the plugin formats to use a new "initialized" 
flag, if that would be a more appropriate solution than making Index always be 
initialized to 1.

Cheers,

                                Andrew
                                da...@dalkescientific.com



