Hi Andi,

Thank you for your quick and positive reply :)

On Wed, 1 Feb 2012, Andi Vajda wrote:

 I have been working on integrating Apache Tika (in Java) with our open
 source intranet application (in Python/Django) using JCC...

Using Maven there helped considerably with getting all the pieces on the Java side.

Although I used Maven for an initial compile of Tika, I realised that downloading pre-built jar files would work just as well, so I took them from http://repo1.maven.org/maven2/org/apache/tika/.

Your remark about not needing JCC's shared library mode is probably correct right now, but as soon as anyone brings another JCC-built library into the same process as yours, shared mode is going to be required, since the Java VM can only be initialized once per process.

I understand that, but I'm prepared to live with that limitation for now, as this is likely to be the only Java library that I integrate into this Python/Django application. I tried hard to find pure Python solutions, but Tika is simply miles ahead of the competition.

No objections to these patches in principle but it would be easier for me to integrate them if you could provide patches computed from the svn repository of JCC: http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/ Your patches seem to be small enough so I should be able to do without but it would be nicer if I didn't have to guess...

I think the patch that I attached was already based on trunk. The git repository includes the .svn directories, points to trunk, and I generated the patch using "svn diff".

Also, please write small descriptions for these new command line flags to go into JCC's __main__.py file:
http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/jcc/__main__.py

Done, new patch attached.
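
For reference, here is a rough sketch of how a setup.py-style caller might assemble an argument list that uses the new flags. It only builds the list and does not call JCC. The helper name build_jcc_args, the jar name and the heap size are made up for illustration; the flag names are the ones from the patch and the existing usage text, and the leading placeholder reflects that the parsing loop in cpp.py starts at index 1.

```python
def build_jcc_args(jars, module_name, build=False, egg_info=False,
                   maxheap=None, extra_setup_args=()):
    """Assemble a JCC-style argument list (hypothetical helper)."""
    # The parsing loop starts at args[1], so args[0] is only a placeholder
    # for the program path.
    args = ["jcc"]
    for jar in jars:
        args.extend(["--jar", jar])
    args.extend(["--python", module_name])
    if maxheap is not None:
        # --maxheap takes one value, which the patch forwards to initVM()
        args.extend(["--maxheap", maxheap])
    if build:
        args.append("--build")
    if egg_info:
        # --egg-info is a bare flag with no value
        args.append("--egg-info")
    for extra in extra_setup_args:
        # each extra setup.py argument needs its own --extra-setup-arg
        args.extend(["--extra-setup-arg", extra])
    return args


# Illustrative values only.
example_args = build_jcc_args(["tika-app.jar"], "tika", egg_info=True,
                              maxheap="64m",
                              extra_setup_args=["--egg-base", "pip-egg-info"])
print(example_args)
```

Note that an option and its value, such as --egg-base and its directory, have to be passed as two separate --extra-setup-arg entries, because each one consumes exactly one following argument.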

This mess of setuptools patching was meant to be *temporary* until setuptools' issue 43 was fixed. As you can see, I filed this bug 3 1/2 years ago, http://bugs.python.org/setuptools/issue43, and my patch for issue 43 still hasn't been accepted, rejected, integrated, anything'ed... Dormant. For over three years.

Sorry about that. I've had similar experiences with bugs reported against Ubuntu, Hibernate, Rails... :(

 * Why does JCC use non-standard command line arguments like --build and
 --install? Can it be modified to make it easier to invoke from a
 setup.py-style environment, such as exporting a setup() function as
 setuptools does?

What standard are you referring to ?
The build/install/deploy story for Python extension modules keeps evolving... Add Python 3.x support into the mix, and the mess is complete.

Seriously, though, I think that the right thing to do to better integrate JCC with distutils/setuptools/distribute/pip/etc... is to make it into a distutils 'compiler'. This requires some work, though, and I haven't done it in all these years. Anyone with the itch to hack on distutils is welcome to take that on.

I'm afraid I don't fully understand how distutils works. It seems to be sparsely documented, and I don't have a lot of time and energy to work on refactoring JCC. I am a bit surprised that we can't just generate a source distribution containing the jars, .cpp files and a setup.py which does the rest, like any other Python extension.

I have very little itch to dabble in configure scripts either so I've been dragging my feet. If someone were to step forward with a patch for that, I'd be delighted in ripping out all this patching brittleness.

How would a configure script solve the problem and what would it have to do? Generate the .cpp files? How does it integrate with Python extensions?

That is a whole different project. If I remember correctly, the JPype project is (or was) taking that approach: http://jpype.sourceforge.net

OK, thanks.

 * Could JCC generate a source distribution (sdist) that could be
   uploaded to pypi?

You mean a source distribution that includes the Java sources of all the libraries/classes wrapped ?

I was thinking more of the jars. Something like https://github.com/aptivate/python-tika that doesn't depend on jcc any more.

 * "setup.py develop" is still broken in the current implementation

I'm not familiar with this 'develop' command, nor was I aware that it is broken. What is it supposed to be doing and how is it broken ?

http://packages.python.org/distribute/setuptools.html#development-mode

It seems that when invoked this way, my setup.py (from python-tika) which calls jcc ends up creating build/_tika as a file (not a directory).

For example, this command:

  sudo pip install -e git+https://github.com/aptivate/python-tika#egg=tika

(note the -e for editable mode) results in this:

  Running setup.py develop for tika
  ...
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/src/tika/setup.py", line 108, in <module>
        cpp.jcc(jcc_args)
      File "/usr/local/lib/python2.6/dist-packages/JCC-2.12-py2.6-linux-i686.egg/jcc/cpp.py", line 587, in jcc
        os.makedirs(cppdir)
      File "/usr/lib/python2.6/os.py", line 157, in makedirs
        mkdir(name, mode)
    OSError: [Errno 17] File exists: 'build/_tika'

That file appears to contain the source code for the JCCEnv.cpp wrapper.
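
The failure itself is easy to reproduce outside JCC. The sketch below is not JCC code: it creates a regular file at a path named build/_tika inside a temporary directory (the path name is taken from the traceback above, the helper name is made up) and then calls os.makedirs() on it, which fails with errno 17 (EEXIST) just like the call in cpp.py.

```python
import errno
import os
import tempfile


def makedirs_over_file():
    """Call os.makedirs() on a path that already exists as a regular file.

    Returns the errno of the resulting OSError, or None if no error occurs.
    """
    root = tempfile.mkdtemp()
    build = os.path.join(root, "build")
    os.mkdir(build)
    target = os.path.join(build, "_tika")
    # Stand-in for whatever wrote the wrapper source to build/_tika.
    with open(target, "w") as f:
        f.write("// stand-in for generated C++ source\n")
    try:
        os.makedirs(target)
    except OSError as e:
        return e.errno
    return None


print(makedirs_over_file() == errno.EEXIST)
```

So the open question is which step of "setup.py develop" writes build/_tika as a file before JCC tries to create it as a directory.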

A patch could be written to noisily emit a warning on all methods that are skipped. Silently wrapping everything would simply wrap the entire JDK by transitive closure and produce a huge library, assuming you'd have the patience to watch it compile.

The skipping of methods whose signature contains types that are not on the 'wrap this' list (explicit or implicit) is by design. Not being able to request emitting a warning is a problem.

Perhaps it's useful to (automatically) emit warnings for classes in the JAR files included with --jar or an explicit class name, but not those in --include files or otherwise automatically included (e.g. the JDK classpath)?
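
To make the suggestion concrete, here is a minimal sketch of that policy. It is not based on JCC's internals: the data model (tuples of owner class, method name and signature type names), the function name filter_methods and the example class names are all hypothetical. Methods whose signature mentions an unwrapped type are still skipped, but a warning is emitted only when the owning class was requested explicitly.

```python
import warnings


def filter_methods(methods, wrapped_types, explicit_classes):
    """Keep methods whose signature uses only wrapped types.

    methods: iterable of (owner_class, method_name, [type names])
    wrapped_types: set of type names on the 'wrap this' list
    explicit_classes: classes requested via --jar or by explicit name
    """
    kept = []
    for owner, name, types in methods:
        missing = [t for t in types if t not in wrapped_types]
        if not missing:
            kept.append((owner, name))
        elif owner in explicit_classes:
            # Only explicitly requested classes produce noise.
            warnings.warn("skipping %s.%s: unwrapped type(s) %s"
                          % (owner, name, ", ".join(missing)))
        # Methods on implicitly included classes are skipped silently.
    return kept


# Hypothetical classes and signatures, for illustration only.
sample_methods = [
    ("example.Parser", "parse", ["java.lang.String"]),
    ("example.Parser", "parseStream", ["example.io.Stream"]),
    ("java.util.Helper", "convert", ["example.io.Stream"]),
]
```

With this input, only example.Parser.parseStream would trigger a warning if example.Parser is the only explicit class and example.io.Stream is not wrapped.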

Thank you very much for your interest and contributions !

Thanks again for your help :)

Cheers, Chris.
--
Aptivate | http://www.aptivate.org | Phone: +44 1223 760887
The Humanitarian Centre, Fenner's, Gresham Road, Cambridge CB1 2ES

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.
Index: jcc/__main__.py
===================================================================
--- jcc/__main__.py     (revision 1238664)
+++ jcc/__main__.py     (working copy)
@@ -33,6 +33,12 @@
     --vmarg                 - add extra Java VM initialization parameter
     --resources             - include resource directory in distribution as
                               package data
+    --maxheap               - set the maximum Java heap size, as passing -Xmx*
+                              using --vmarg doesn't do anything
+    --egg-info              - ask distutils setup() to generate egg info, and
+                              don't compile the module (for pip install)
+    --extra-setup-arg       - pass an extra argument on setup.py command line
+                              (pip install uses --egg-base and --record params)
 
   Python wrapper generation options:
     --python NAME           - generate wrappers for use from Python in a module
Index: jcc/python.py
===================================================================
--- jcc/python.py       (revision 1238664)
+++ jcc/python.py       (working copy)
@@ -1563,7 +1563,7 @@
 def compile(env, jccPath, output, moduleName, install, dist, debug, jars,
             version, prefix, root, install_dir, home_dir, use_distutils,
             shared, compiler, modules, wininst, find_jvm_dll, arch, generics,
-            resources, imports):
+            resources, imports, egg_info, extra_setup_args):
 
     try:
         if use_distutils:
@@ -1730,7 +1730,10 @@
             if name.endswith('.cpp'):
                 sources.append(os.path.join(path, name))
 
-    script_args = ['build_ext']
+    if egg_info:
+        script_args = ['egg_info']
+    else:    
+        script_args = ['build_ext']
 
     includes[0:0] = INCLUDES
     compile_args = CFLAGS
@@ -1840,6 +1843,7 @@
         config_vars['CFLAGS'] = ' '.join(cflags)
 
     extensions = [Extension('.'.join([moduleName, extname]), **args)]
+    script_args.extend(extra_setup_args)
 
     args = {
         'name': moduleName,
@@ -1853,4 +1857,6 @@
     if with_setuptools:
         args['zip_safe'] = False
 
+    print "setup args = %s" % args
+
     setup(**args)
Index: jcc/cpp.py
===================================================================
--- jcc/cpp.py  (revision 1238664)
+++ jcc/cpp.py  (working copy)
@@ -349,6 +349,7 @@
     build = False
     install = False
     recompile = False
+    egg_info = False
     output = 'build'
     debug = False
     excludes = []
@@ -372,7 +373,9 @@
     arch = []
     resources = []
     imports = {}
-
+    extra_setup_args = []
+    initvm_args = {'maxstack': '512k'}
+    
     i = 1
     while i < len(args):
         arg = args[i]
@@ -398,6 +401,9 @@
             elif arg == '--vmarg':
                 i += 1
                 vmargs.append(args[i])
+            elif arg == '--maxheap':
+                i += 1
+                initvm_args['maxheap'] = args[i]
             elif arg == '--python':
                 from python import python, module
                 i += 1
@@ -414,6 +420,12 @@
             elif arg == '--compile':
                 from python import compile
                 recompile = True
+            elif arg == '--egg-info':
+                from python import compile
+                egg_info = True
+            elif arg == '--extra-setup-arg':
+                i += 1
+                extra_setup_args.append(args[i])
             elif arg == '--output':
                 i += 1
                 output = args[i]
@@ -492,9 +504,10 @@
     if libpath:
         vmargs.append('-Djava.library.path=' + os.pathsep.join(libpath))
 
-    env = initVM(os.pathsep.join(classpath) or None,
-                 maxstack='512k', vmargs=' '.join(vmargs))
+    initvm_args['vmargs'] = ' '.join(vmargs)
 
+    env = initVM(os.pathsep.join(classpath) or None, **initvm_args)
+
     typeset = set()
     excludes = set(excludes)
 
@@ -504,7 +517,7 @@
         else:
             raise ValueError, "--shared must be used when using --import"
 
-    if recompile or not build and (install or dist):
+    if recompile or not build and (install or dist or egg_info):
         if moduleName is None:
             raise ValueError, 'module name not specified (use --python)'
         else:
@@ -512,7 +525,8 @@
                     install, dist, debug, jars, version,
                     prefix, root, install_dir, home_dir, use_distutils,
                     shared, compiler, modules, wininst, find_jvm_dll,
-                    arch, generics, resources, imports)
+                    arch, generics, resources, imports, egg_info,
+                    extra_setup_args)
     else:
         if imports:
             def walk((include, importset), dirname, names):
@@ -647,12 +661,13 @@
             module(out, allInOne, done, imports, cppdir, moduleName,
                    shared, generics)
             out.close()
-            if build or install or dist:
+            if build or install or dist or egg_info:
                 compile(env, os.path.dirname(args[0]), output, moduleName,
                         install, dist, debug, jars, version,
                         prefix, root, install_dir, home_dir, use_distutils,
                         shared, compiler, modules, wininst, find_jvm_dll,
-                        arch, generics, resources, imports)
+                        arch, generics, resources, imports, egg_info,
+                        extra_setup_args)
 
 
 def header(env, out, cls, typeset, packages, excludes, generics, _dll_export):
