On Mon, 6 Apr 2009, Jonas Maurus wrote:

seeing that last month had a thread on wrapping PDFBox, obviously
there's some demand for a fully-featured PDF library for Python :-).
So for one of my projects I started working on wrapping iText with JCC
today and I want to state that I'm really impressed! JCC rocks.

Thanks !

Besides the fact that JCC could use a "--help" parameter, it all went
very smoothly. I just ran into two small problems:

Most of the command line flags are documented here
http://lucene.apache.org/pylucene/jcc/documentation/readme.html#use
A --help flag would be nice indeed. Would you like to contribute a patch ?

 * I had to do some trial-and-error recompiling because JCC doesn't
include subtypes in the dependencies, which means, for example, that
when you really should use a FileOutputStream, iText usually only
imports an OutputStream (probably a calculated dependency from the
library's method signatures), so it needs some fiddling with --package
to get all necessary classes in.

Yes, unless the API you're wrapping directly states FileOutputStream, JCC can't guess that that's what you need. If you'd like to have wrappers generated for FileOutputStream but none of the classes you're already generating wrappers for mention it, you need to add it to the list of classes you want wrappers for.

The earlier link also documents this behaviour. The reason for this is to avoid runaway transitive dependency closures. The code generated by JCC can easily get huge.

Is there a good way to force the import of a whole package?

No, for two reasons:
  - because there is no good way to find all the classes in a whole package
    (it's limited by what can be found by your classpath)
  - again, runaway wrappers will cause runaway dependencies and a huge
    amount of code, most of it not needed, to be generated.

The --package flag tells JCC to generate wrappers for classes in that package found via dependencies. If you don't mention that package and the dependency can be done away with (it's not in the superclasses or interfaces), and a method's signature depends on a class in that package, the method will be skipped. The earlier link also documents this.
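The rule above can be pictured with a small toy model. This is only a sketch of the behaviour as described here, not JCC's actual code: the function names (package_of, is_wrapped, keep_method) are made up for illustration, signature types are assumed to be given as fully qualified class names, and primitives, superclasses and interfaces are not modelled.

```python
# Toy model of the skipping rule described above. NOT JCC's implementation;
# all names here are hypothetical and exist only for this illustration.

def package_of(class_name):
    """Return the package part of a fully qualified Java class name."""
    return class_name.rpartition('.')[0]

def is_wrapped(class_name, listed_classes, packages):
    """A class gets a wrapper if it is listed explicitly or if its
    package was named via --package."""
    return class_name in listed_classes or package_of(class_name) in packages

def keep_method(signature_types, listed_classes, packages):
    """A method is kept only if every class in its signature is wrapped;
    otherwise it is skipped."""
    return all(is_wrapped(t, listed_classes, packages)
               for t in signature_types)

listed = {'com.lowagie.text.pdf.PdfWriter'}
signature = ['com.lowagie.text.pdf.PdfWriter', 'java.io.OutputStream']

# Without java.io, a method mentioning OutputStream is skipped.
skipped = not keep_method(signature, listed, set())
# Once java.io is named, the same method is kept.
kept = keep_method(signature, listed, {'java.io'})
```

In this model, listing a class directly and naming its package have the same effect on a single method; the difference is only in how many other classes come along.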

The way it is now, I really don't use more than 5% of iText's API and can't really tell if the wrapper contains all necessary external classes

Two situations here:
  1. you're trying to generate wrappers for the 5% you're using:
     get to know those 5% and be sure to have all you need either in what
     these classes pull in or via --package or direct class listing

  2. you're trying to generate wrappers for the entire API (like PyLucene):
     same as above, get to know the entire API, and port all unit tests and
     samples to python to test that you've got the coverage you need
     (assuming the tests and samples have good coverage). You may also
     (as is the case with PyLucene) have to provide extension point
     classes when the Java API you're wrapping expects its users to
     provide subclasses or interface implementations of their own
     (for example, PyLucene custom analyzers).

to use iText properly. I'm a bit lost there and would welcome pointers and ideas on how to do this correctly. What happens if a Python program uses multiple JCC-wrapped JVMs? Would the wrapped types "itext.FileOutputStream" and "lib2.FileOutputStream" collide?

A given process may embed only one JavaVM. If you want to use multiple JCC-built extensions in the same process, be sure to use shared mode:
http://lucene.apache.org/pylucene/jcc/documentation/install.html#shared

I also haven't started to identify any iterables or sequences that can
be made "pythonic", using JCC's built-in extensions. Is there a good
way to grep for those?

No, not really. You have to know the API you're generating wrappers for to pick these out. If you don't know them, then you're not missing them and your question is moot unless you're trying to generate wrappers for an entire API. Getting to know the API you're wrapping intimately is going to make it a much more usable Python extension in the long run.

 * when compiling the extension, JCC generated the following code:
namespace com {
   namespace lowagie {
       namespace text {
           namespace pdf {
               [...]
                   static PdfName *DOMAIN;
               [...]

which led to this error:
build/_itext/com/lowagie/text/pdf/PdfName.h:173: error: expected ';'
before numeric constant

This happens because "DOMAIN", unfortunately, collides with the macro
   #define DOMAIN 1
in <math.h>.

Yes, this is a well-known problem. JCC has a hardcoded list of words that can lead to such unfortunate collisions, and any use of a word in this list is mangled. I need to add another command line argument that makes it possible to add more such reserved words.
Currently, that list of words is in cpp.py, line 71:

  RESERVED = set(['delete', 'and', 'or', 'not', 'xor', 'union', 'NULL',
                  'register', 'const', 'bool', 'operator'])

JCC 2.2 already does a much better job at this than earlier versions.
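
To illustrate the idea of mangling, here is a minimal sketch. It is not the code in cpp.py: the function name mangle, the extra_reserved parameter (standing in for the command line argument that doesn't exist yet) and the trailing-underscore suffix are all assumptions made for this example.

```python
# Sketch only: the RESERVED list is the one quoted above, everything else
# (mangle, extra_reserved, the '_' suffix) is hypothetical.
RESERVED = set(['delete', 'and', 'or', 'not', 'xor', 'union', 'NULL',
                'register', 'const', 'bool', 'operator'])

def mangle(name, extra_reserved=()):
    """Return a C++-safe identifier for a Java name.

    Names in RESERVED, or in the caller-supplied extra_reserved
    collection, get a suffix so they can no longer collide with a
    C++ keyword or a preprocessor macro.
    """
    if name in RESERVED or name in extra_reserved:
        return name + '_'  # assumed suffix, for illustration only
    return name

plain = mangle('DOMAIN')                # not in the hardcoded list
fixed = mangle('DOMAIN', {'DOMAIN'})    # added by the user
```

With only the hardcoded list, DOMAIN passes through unchanged and still hits the <math.h> macro; adding it to the extra words is what would avoid the collision.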

Andi..
