Hi Laurent,

> The libR is used as a shared library.
> Under win32, and AFAIUI, that should translate as using an unbound DLL 
> (otherwise the same version of libR will be required) and hold as long 
> as the names used from the symbol table presented by libR.so do not change.
> In the case it does, then a new(er) version of rpy2 should be available.

Ok. AFAIK that should work. I wondered why rpy did not work this way. The
only thing I could guess is that rpy wanted to support a large range of
versions of R, including very early versions where the R team was still
deciding on the names and definitions of very core functions. Hopefully
those functions are stable now and you will not need to take the same course
of action with rpy2.

What version of R do you use when compiling rpy2? I noticed a comment saying
rpy2 is compatible with R 2.7.0 and later. Are you compiling using 2.7.0
then?

> On a related note, I'd like to offer the option to have R really 
> embedded in rpy2 (with an R install inside the rpy2 installed module)... 
> so if someone has the time...

This sounds interesting. It would allow Python users to call R without
having to install R separately, and you could ensure there were no version
compatibility problems. On the minus side, it would mean you need to release
a new version of rpy2 whenever a new R was released.

Unfortunately I do not have time to work on this, at least right now.

> Thanks. An ultimate patch would likely be a little more complex
> (by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found 
> in the PATH... but I am not sure what PATH is needed for here - can 
> someone with win32 try when just removing the PATH creation ?)

I would be happy to try this for you, but I'm not sure exactly what you want
me to do. Are you unsure whether all three of those directories (bin,
modules, and lib) need to be in the PATH? I can try them in different
combinations and see what happens. Let me know if this is what you want.
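If that is the plan, the test matrix is small: three directories give eight subsets, including the empty one that corresponds to simply removing the PATH creation. A short sketch that enumerates them (the names `R_SUBDIRS` and `path_combinations` are illustrative only, not rpy2 code):

```python
from itertools import combinations

# The three directories rpy2's win32 code currently appends to PATH.
R_SUBDIRS = ("bin", "modules", "lib")


def path_combinations(subdirs=R_SUBDIRS):
    """Yield every subset of `subdirs`, from the empty set up to all of them."""
    for size in range(len(subdirs) + 1):
        for combo in combinations(subdirs, size):
            yield combo


for combo in path_combinations():
    # The empty tuple corresponds to "just removing the PATH creation".
    print(combo if combo else "(no PATH changes)")
```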

I can say that with my R 2.8.1 installation, there is no directory called
lib. There is a directory called library, but that is where all the R
packages go. There are no binaries in there. So I would suggest that you
could remove lib from the PATH, but before you did this, we should check all
the versions of R back to 2.7.0. I can do that if you want.

In the modules directory, there are some shared libraries, such as
lapack.dll. I am not sure what the difference is between lapack.dll there
and the Rlapack.dll in the bin directory, but I do recall there being some
issues with lapack in the past. I suggest you keep modules in the PATH.
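Combining the duplicate check from the patch quoted further down with the observations above (no `lib` directory in R 2.8.1, `modules` worth keeping), one possible shape for the PATH update is a helper that skips directories already on PATH and directories that do not exist. This is a sketch, not rpy2 code: `add_to_path_once` is a hypothetical name, it compares entries with `ntpath.normcase` on the assumption that Windows paths should be matched case-insensitively, and `env` defaults to `os.environ` but can be any dict.

```python
import ntpath
import os


def _key(entry):
    # Windows paths are case-insensitive, so compare normalised, lower-cased forms.
    return ntpath.normcase(ntpath.normpath(entry))


def add_to_path_once(dirs, env=None, sep=";", must_exist=True):
    """Append each directory in `dirs` to env["PATH"] unless it is already there.

    With must_exist=True, directories that do not exist (e.g. a missing
    R_HOME\\lib) are skipped instead of being added.
    """
    if env is None:
        env = os.environ
    current = env.get("PATH", "")
    entries = current.split(sep) if current else []
    seen = {_key(e) for e in entries if e}
    for d in dirs:
        if _key(d) in seen:
            continue  # already on PATH: re-running this is a no-op
        if must_exist and not os.path.isdir(d):
            continue  # e.g. no 'lib' directory in this R install
        entries.append(d)
        seen.add(_key(d))
    env["PATH"] = sep.join(entries)
    return env["PATH"]
```

It would be called as `add_to_path_once([os.path.join(R_HOME, sub) for sub in ("bin", "modules", "lib")])`, and calling it again on every interpreter start leaves PATH unchanged.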

In my wrapper around rpy, I have the following related code which might be
of interest to you:

    # Before importing rpy, capture the PATH environment
    # variable. rpy is going to add some R directories to it.
    # Because R imposes a maximum length on environment
    # variables (perhaps 1019 characters), we need to move
    # these directories to the front of the PATH to ensure
    # they are not truncated by the maximum length limiter.
    # This will work around the issue described by MGET ticket
    # #286.

    oldPath = os.environ['PATH'].split(os.pathsep)

    # Now import rpy.

    from GeoEco.AssimilatedModules.rpy import rpy
    RDependency._rpy = rpy

    # Move the paths that rpy appended to the front of the
    # PATH.

    newPath = os.environ['PATH'].split(os.pathsep)
    newPath = os.pathsep.join(newPath[len(oldPath):] +
                              newPath[:len(oldPath)])

    # To work around MGET ticket #203 (Evaluate R Statements
    # tools fail with "lapack routines cannot be loaded" error
    # when running a glm), set the PATH environment variable
    # seen by the R interpreter to that seen by Python, so R
    # sees the changes that rpy attempted to make to the PATH.

    rpy.r('Sys.setenv(PATH="%s")' % newPath.replace('\\', '\\\\'))
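The reordering step in this wrapper is easy to get backwards, so here it is isolated as a pure function that can be checked without R. This is a sketch under two assumptions: rpy only appends to PATH (as the comments say), and R's length limit (the "perhaps 1019 characters" above) behaves like a simple cut after the first N characters. Both function names are hypothetical.

```python
import os


def rotate_appended_entries(old_path, new_path, sep=os.pathsep):
    """Move the entries appended after old_path to the front of new_path.

    Assumes new_path == old_path + sep + <appended entries>, i.e. the
    importer only appended to PATH and never reordered or removed anything.
    """
    old_entries = old_path.split(sep)
    new_entries = new_path.split(sep)
    appended = new_entries[len(old_entries):]
    original = new_entries[:len(old_entries)]
    return sep.join(appended + original)


def truncate_path(path, limit=1019):
    # Hypothetical model of a length limiter that keeps only the first
    # `limit` characters; anything past that is silently lost.
    return path[:limit]
```

With a long PATH, `truncate_path(new_path)` loses the R directories entirely, while `truncate_path(rotate_appended_entries(old_path, new_path))` keeps them because they now come first.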

Finally, regarding the memory and handle leak tests:

I used ArcGIS 9.3 SP1, Python 2.5.1, R 2.8.1, rpy2 2.0.3, WinXP SP3 with
latest updates. In ArcGIS, I created a geoprocessing model with a single
instance of the tool I mentioned in my previous message. I configured the
model to run 100000 times and started it. Using Windows Task Manager, I
monitored VM Size (equivalent to Private Bytes in perfmon) and Handles.

I first ran the test a few times with these two lines of the script
commented out:

    #from rpy2 import robjects
    #sqrt_x = robjects.r.sqrt(x)[0]

Then I ran it again with the comments removed. This allowed me to see if
there were leaks when rpy2 was not even imported. Interpreting the results is
difficult because ArcGIS exhibited a bug (how typical) in which it said the
model was complete before the progress bar reached 100000.

Without rpy2:

Iterations  Memory    Memory  Handles   Handles
completed   at start  at end  at start  at end
----------  --------  ------  --------  -------
7921        190 MB    395 MB  1409      1403
7891        395 MB    490 MB  1406      1405
7930        489 MB    599 MB  1405      1405

With rpy2:

Iterations  Memory    Memory  Handles   Handles
completed   at start  at end  at start  at end
----------  --------  ------  --------  -------
15408       599 MB    692 MB  1408      1408
59168       692 MB    784 MB  1409      1408

The very first time I ran this, it looks like ArcGIS allocated about 200 MB that
it did not immediately release. I do not consider this to necessarily be a
leak. It may have an internal allocator that is configured to hold on to a
bunch of memory for a while. But in every subsequent run, it allocated about
100 MB more, including the runs with rpy2 enabled.
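The per-run growth can be read straight off the tables; a trivial sketch that does the subtraction makes the pattern easier to see: one large first allocation, then roughly 90 to 110 MB per run whether or not rpy2 is imported. The numbers are copied from the tables; the names are illustrative.

```python
# (memory at start, memory at end) in MB, copied from the tables above.
RUNS = {
    "without rpy2": [(190, 395), (395, 490), (489, 599)],
    "with rpy2": [(599, 692), (692, 784)],
}


def growth_per_run(rows):
    """Return the memory increase (end - start) for each run, in MB."""
    return [end - start for start, end in rows]


for label, rows in RUNS.items():
    print(label, growth_per_run(rows))
# without rpy2 [205, 95, 110]
# with rpy2 [93, 92]
```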

These results are tricky to interpret. First of all, I do not understand why
the progress bar reported many more iterations with rpy2 enabled. It may be
that the progress bar is broken, and that 100000 iterations completed in all
cases, but that the script executed so quickly that progress events were
dropped by ArcGIS, or something like that. This would explain why more
iterations were reported with rpy2, because the script would go slower and
not overwhelm the progress bar as much. It would also explain why about the
same amount of memory is leaked with and without rpy2, regardless of the
number of iterations completed.

In any case, it does not appear that substantially more memory was leaked
with rpy2 enabled. This is a good sign, and because of this, I'm not going
to bother trying to determine whether the progress bar is broken or ArcGIS
is truly halting the iteration before 100000 is reached. In either
situation, there is a bug with ArcGIS, not rpy2. ArcGIS has always been a
buggy program, despite its popularity.

Finally, it is clear that no handles are leaked.

There is probably at least one place in rpy2 that is leaking a module
handle, in rinterface/__init__.py:

    win32api.LoadLibrary( Rlib )

This will not cause a handle leak in the usual sense. Instead it will just
cause the process's internal reference count for R.dll to increment every
time rpy2 is imported. This is sub-optimal, but there is probably little
harm. The reference leak will prevent R.dll from ever being unloaded but
given that rpy2 and Python itself do not shut down very cleanly, it might be
very hard to achieve proper unloading of R.dll anyway. I don't think you
need to address this.

These results look pretty good to me. I am going to investigate integrating
rpy2 into our application!

Jason

-----Original Message-----
From: Laurent Gautier [mailto:lgaut...@gmail.com] 
Sent: Friday, March 20, 2009 3:51 AM
To: Jason Roberts
Cc: 'RPy help, support and design discussion list'
Subject: Re: FW: rpy2 in ArcGIS 9.3

Jason Roberts wrote:
> Laurent,
> 
> Thank you very much for the reply.
> 
>> I am not certain which way the risk probability stands (compile each
>> time, or compile once and hope for the best). Time will tell.
> 
> So rpy2 does not require recompilation every time R is released? How is it
> binding to R then? (I have not looked at the C code yet. If you can just
> point me in the right direction I can figure it out myself.)

The libR is used as a shared library.
Under win32, and AFAIUI, that should translate as using an unbound DLL 
(otherwise the same version of libR will be required) and hold as long 
as the names used from the symbol table presented by libR.so do not change.
In the case it does, then a new(er) version of rpy2 should be available.
Admittedly not a perfect option, but I wanted to avoid
version-specific conditional definitions in the code; rpy had it, but I 
had to start from a simple base. This does not mean this aspect of rpy 
will not be added in the future, but I'd like to explore options first.

On a related note, I'd like to offer the option to have R really 
embedded in rpy2 (with an R install inside the rpy2 installed module)... 
so if someone has the time...

>> You could try with a dummy minimal extension to ArcGIS and tell us.
> 
> I tried this out using ArcGIS 9.3 SP1, Python 2.5.1 (comes with ArcGIS 9.3),
> and rpy2-2.0.3.win32-py2.5.exe. I created a Python-based geoprocessing tool
> with the following code to exercise rpy2 in a minimal way:
> 
> # Initialize the ArcGIS geoprocessor object, so we can communicate
> # with ArcGIS.
> 
> import arcgisscripting
> gp = arcgisscripting.create()
> 
> # Using rpy2, calculate the square root of the input parameter. If we
> # catch an exception, report a traceback to ArcGIS.
> 
> import os, traceback
> try:
>     x = gp.GetParameter(0)
>     from rpy2 import robjects
>     sqrt_x = robjects.r.sqrt(x)[0]
> except:
>     gp.AddError(traceback.format_exc())
>     raise
> 
> It worked (!!!) and the performance appeared to be quite good. I am running
> it in a loop now to check for leaks. I'll send a followup on that later.

If you are having an issue, check the following:
http://www.mail-archive.com/rpy-list@lists.sourceforge.net/msg01696.html


> There was one problem that I noticed immediately. Currently, line 37 of
> rinterface/__init__.py blindly adds R directories to the PATH:
> 
> # Win32-specific code copied from RPy-1.x
> if sys.platform == 'win32':
>     import win32api
>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')

I see.

> The new PATH is persisted in the environment of the calling ArcGIS process.
> When that process initializes the Python interpreter a second time, this
> code is called again, adding duplicate entries to PATH. This can go on until
> the PATH reaches 32767 characters, and then putenv will raise an OSError. In
> my case, my tool ran 335 times before this occurred. I observed the problem
> happen by adding additional logging statements to my minimal example above,
> and watched the len(os.environ['PATH']) grow close to 32767 before putenv
> failed.
> 
> To fix, something like this is appropriate:
> 
> # Win32-specific code copied from RPy-1.x
> if sys.platform == 'win32':
>     import win32api
>     if os.path.join(R_HOME, 'bin') not in os.environ['PATH'].split(';'):
>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>     if os.path.join(R_HOME, 'modules') not in os.environ['PATH'].split(';'):
>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>     if os.path.join(R_HOME, 'lib') not in os.environ['PATH'].split(';'):
>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')
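The failure described in the quoted message can be replayed without ArcGIS by running the unguarded snippet repeatedly against a plain dict instead of `os.environ`. In the sketch below the function name, the starting PATH, and the R_HOME value are made-up examples, and `ntpath.join` is used so the Windows-style paths come out the same on any OS. The point is only that every re-initialization adds the same fixed number of characters, so the 32767-character limit is reached after a few hundred runs.

```python
import ntpath

PATH_LIMIT = 32767  # PATH length at which putenv failed in the report above


def simulate_reinits(initial_path, r_home, limit=PATH_LIMIT):
    """Count how many times the unguarded PATH code can run before PATH exceeds `limit`."""
    env = {"PATH": initial_path}
    runs = 0
    while True:
        candidate = env["PATH"]
        # Same three unconditional appends as the original win32 block.
        for sub in ("bin", "modules", "lib"):
            candidate += ";" + ntpath.join(r_home, sub)
        if len(candidate) > limit:
            return runs  # the next putenv would fail at this point
        env["PATH"] = candidate
        runs += 1


print(simulate_reinits("", r"C:\Program Files\R\R-2.8.1"))  # 337 with these example values
```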

Thanks. An ultimate patch would likely be a little more complex
(by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found 
in the PATH... but I am not sure what PATH is needed for here - can 
someone with win32 try when just removing the PATH creation ?)
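The "first R found in the PATH" check could look roughly like the sketch below. It assumes an R installation can be recognised by an `R.dll` sitting directly in its `bin` directory, which matches the R 2.x Windows builds discussed here (later R releases moved it into per-architecture subdirectories). The function names are hypothetical, not part of rpy2.

```python
import ntpath
import os


def _same_dir(a, b):
    # Windows paths are case-insensitive; normalise both sides before comparing.
    return ntpath.normcase(ntpath.normpath(a)) == ntpath.normcase(ntpath.normpath(b))


def first_dir_containing(filename, path, sep=";"):
    """Return the first entry of `path` that contains `filename`, or None."""
    for entry in path.split(sep):
        if entry and os.path.isfile(os.path.join(entry, filename)):
            return entry
    return None


def r_home_bin_is_first(r_home, path, sep=";", marker="R.dll"):
    """True if the first PATH directory holding `marker` is R_HOME/bin."""
    found = first_dir_containing(marker, path, sep)
    return found is not None and _same_dir(found, os.path.join(r_home, "bin"))
```

Something like `r_home_bin_is_first(R_HOME, os.environ["PATH"])` would then tell whether the PATH needs adjusting at all before anything is appended.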

> I'm currently running it 100000 times, monitoring memory and handles. I'll
> let you know how it turns out.
> 
> I'm pretty hopeful this will work out well. There could be problems with R
> packages that do fancy things (like link to other C libraries) but even if
> that's a problem, just having the ability to do basic R from ArcGIS 9.3 in a
> performant manner will be very, very nice for us and our users.
> 
> Jason
> 
> 
> -----Original Message-----
> From: Laurent Gautier [mailto:lgaut...@gmail.com] 
> Sent: Thursday, March 19, 2009 1:57 AM
> To: RPy help, support and design discussion list
> Cc: Jason Roberts
> Subject: Re: FW: rpy2 in ArcGIS 9.3
> 
> Jason Roberts wrote:
>> Greetings rpy2 developers,
>>
>> I am the primary developer of an open source Python package called 
>> Marine Geospatial Ecology Tools 
>> (http://code.env.duke.edu/projects/mget). These tools perform various 
>> jobs that are useful to marine ecologists. Many of the tools are 
>> designed to be invoked from ArcGIS, a desktop GIS application that runs 
>> on Windows.
>>
> 
> rpy2 works best on UNIX-alikes at the moment.
> (some features are not working on win32).
> 
>> To date, we have had good success accessing R using rpy. Thank you very 
>> much for making this package freely available.
> 
> I can't take credit for that:
> rpy is Walter and Greg's work, with the help of contributors.
> 
>> But we noted last year 
>> that rpy is no longer being maintained, and rpy2 is the new replacement.
> 
> Kind of. I started with rpy2 about a year ago, as what I was trying to
> do did not appear possible with rpy. Rpy is still available, although
> its development is in the slow lane at the moment, I think.
> 
>> It will be a big job for us to switch to rpy2, so we have been delaying 
>> the switch. In the interim, we've been compiling rpy every time a new R 
>> release has come out. This is probably increasingly risky, so we're 
>> becoming more motivated to make the switch.
> 
> I am not certain which way the risk probability stands (compile each
> time, or compile once and hope for the best). Time will tell.
> 
>> In addition, there is an 
>> ArcGIS 9.3 / rpy compatibility problem that is pretty inconvenient. 
>> Basically we are wondering if this problem exists with rpy2.
>>
>> The problem was discussed last year; see
>> <http://sourceforge.net/tracker/?func=detail&atid=453021&aid=2062627&group_id=48422>.
>> In brief: Every time ArcGIS 9.3 runs a Python-based tool, it initializes 
>> a new instance of the Python interpreter in the ArcGIS process 
>> (typically ArcCatalog.exe or ArcMap.exe). The interpreter instance 
>> eventually loads the rpy extension module (e.g. _rpy2070.dll). The 
>> interpreter exits when the tool completes. But this does not cause the 
>> rpy extension module to be unloaded from the process, and when ArcGIS 
>> runs the tool a second time, creating a new Python interpreter, rpy 
>> fails to initialize.
>>
>> In last year's bug report, lgautier mentioned that "the problem was 
>> fixed a few weeks ago" (i.e. last summer). Is it correct then that this 
>> procedure of initializing the interpreter, using rpy2, shutting down the 
>> interpreter, and so on, can be done indefinitely from a single process 
>> without any ill effects?
>>
> 
> Maybe, maybe not.
> I have not looked at whether the C-level part of rpy2 does what it
> should regarding the creation and destruction of Python interpreters.
> 
> You could try with a dummy minimal extension to ArcGIS and tell us.
> 
> Hoping this helps,
> 
> L.
> 
>> Thanks for your help! And thanks again to you guys for developing this 
>> great reusable software.
>>
>> Jason
> 
> 



_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list
