Jason Roberts wrote:
> Hi Laurent,
> 
>> The libR is used as a shared library.
>> Under win32, and AFAIUI, that should translate as using an unbound DLL 
>> (otherwise the same version of libR will be required) and hold as long 
>> as the names used from the symbol table presented by libR.so do not change.
>> In the case it does, then a new(er) version of rpy2 should be available.
> 
> Ok. AFAIK that should work. I wondered why rpy did not work this way. The
> only thing I could guess is that rpy wanted to support a large range of
> versions of R, including very early versions where the R team was still
> deciding on the names and definitions of very core functions.

I think that this is mostly what happened.
Duncan with RSPython, Walter and Greg with rpy, and Simon Urbanek with 
JRI, probably caused changes in the R API.
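
To make the shared-library point concrete: binding by name means the only thing that has to stay stable is the set of exported symbols. Below is a minimal sketch of checking that at load time with ctypes. The symbol list is only an illustrative subset of R's embedding API, not the list rpy2 actually depends on, and the helper names (missing_symbols, check_libr) are hypothetical.

```python
import ctypes

# Illustrative subset of R's C API; NOT the list rpy2 really relies on.
REQUIRED_SYMBOLS = ("Rf_initEmbeddedR", "Rf_endEmbeddedR", "R_tryEval")


def missing_symbols(lib, names=REQUIRED_SYMBOLS):
    """Return the names that cannot be resolved in the loaded library."""
    # ctypes looks an exported symbol up on attribute access and raises
    # AttributeError when it is absent, so hasattr() is sufficient here.
    return [name for name in names if not hasattr(lib, name)]


def check_libr(path):
    """Load the shared library at `path` and fail early if names are gone."""
    lib = ctypes.CDLL(path)
    missing = missing_symbols(lib)
    if missing:
        raise ImportError("libR at %r lacks: %s" % (path, ", ".join(missing)))
    return lib
```

If a future R renamed one of these entry points, a check of this kind would report it at import time instead of failing later inside a call.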


> Hopefully
> those functions are stable now and you will not need to take the same course
> of action with rpy2.
> 
> What version of R do you use when compiling rpy2? I noticed a comment saying
> rpy2 is compatible with R 2.7.0 and later. Are you compiling using 2.7.0
> then?

I did so when compiling the first win32 builds.
Laurent Oget has been contributing the win32 builds since the release 
2.0.0b1.

>> On a related note, I'd like to offer the option to have R really 
>> embedded in rpy2 (with an R install inside the rpy2 installed module)... 
>> so if someone has the time...
> 
> This sounds interesting. It would allow Python users to call R without
> having to install R separately, and you could ensure there were no version
> compatibility problems. On the minus side, it would mean you need to release
> a new version of rpy2 whenever a new R was released.

That would be an option someone making a compiled build could switch on
(that does not mean that I will provide such builds, and this for the 
reason you mention).
This is probably of interest for standalone solutions, and will probably 
have to wait.


> Unfortunately I do not have time to work on this, at least right now.
> 
>> Thanks. An ultimate patch would likely be a little more complex
>> (by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found 
>> in the PATH... but I am not sure of what PATH is needed for here - can 
>> someone with win32 try when just removing the PATH creation ?)
> 
> I would be happy to try this for you, but I'm not sure exactly what you want
> me to do. Are you unsure whether all three of those directories (bin,
> modules, and lib) need to be in the PATH? I can try them in different
> combinations and see what happens. Let me know if this is what you want.


I meant: just try commenting out the three lines
os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')

...but your code below tells why PATH is indeed needed (and my request 
irrelevant).

> I can say that with my R 2.8.1 installation, there is no directory called
> lib. There is a directory called library, but that is where all the R
> packages go. There are no binaries in there. So I would suggest that you
> could remove lib from the PATH, but before you did this, we should check all
> the versions of R back to 2.7.0. I can do that if you want.
> 
> In the modules directory, there are some shared libraries, such as
> lapack.dll. I am not sure what the difference is between lapack.dll there
> and the Rlapack.dll in the bin directory, but I do recall there being some
> issues with lapack in the past. I suggest you keep modules in the PATH.
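
A minimal sketch of how the two observations in this thread could be combined: add an R directory to the PATH only when it exists on disk (which would drop lib on installs like the 2.8.1 one described here) and only when it is not already listed (which avoids the duplicate entries). The function name extend_path and its parameters are hypothetical, not part of rpy2.

```python
import os


def extend_path(path_value, r_home, subdirs=("bin", "modules", "lib"),
                sep=os.pathsep, isdir=os.path.isdir):
    """Return path_value with the existing R directories appended once."""
    entries = path_value.split(sep) if path_value else []
    # normcase makes the comparison case-insensitive on Windows.
    seen = set(os.path.normcase(entry) for entry in entries)
    for sub in subdirs:
        candidate = os.path.join(r_home, sub)
        if not isdir(candidate):
            continue  # e.g. no 'lib' directory in this R install
        key = os.path.normcase(candidate)
        if key in seen:
            continue  # already listed, do not add a duplicate
        entries.append(candidate)
        seen.add(key)
    return sep.join(entries)
```

Usage would be along the lines of os.environ['PATH'] = extend_path(os.environ['PATH'], R_HOME). Calling it a second time leaves the value unchanged.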
> 
> In my wrapper around rpy, I have the following related code which might be
> of interest to you:
> 
>     # Before importing rpy, capture the PATH environment
>     # variable. rpy is going to add some R directories to it.
>     # Because R imposes a maximum length on environment
>     # variables (perhaps 1019 characters), we need to move
>     # these directories to the front of the PATH to ensure
>     # they are not truncated by the maximum length limiter.
>     # This will work around the issue described by MGET ticket
>     # #286.
> 
>     oldPath = os.environ['PATH'].split(os.pathsep)
> 
>     # Now import rpy.
> 
>     from GeoEco.AssimilatedModules.rpy import rpy
>     RDependency._rpy = rpy
> 
>     # Move the paths that rpy appended to the front of the
>     # PATH.
> 
>     newPath = os.environ['PATH'].split(os.pathsep)
>     newPath = os.pathsep.join(newPath[len(oldPath):] +
>                               newPath[:len(oldPath)])
> 
>     # To work around MGET ticket #203 (Evaluate R Statements
>     # tools fail with "lapack routines cannot be loaded" error
>     # when running a glm), set the PATH environment variable
>     # seen by the R interpreter to that seen by Python, so R
>     # sees the changes that rpy attempted to make to the PATH.
> 
>     rpy.r('Sys.setenv(PATH="%s")' % newPath.replace('\\', '\\\\'))
> 
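
The reordering and escaping steps in the wrapper above can be sketched as two small functions. This assumes, as the wrapper does, that the import only appends entries to the PATH and never removes or reorders existing ones. The function names are hypothetical.

```python
import os


def move_appended_to_front(old_path, new_path, sep=os.pathsep):
    """Move the entries appended since old_path to the front of new_path."""
    old = old_path.split(sep)
    new = new_path.split(sep)
    # Everything past the original length was appended during the import.
    return sep.join(new[len(old):] + new[:len(old)])


def r_string_literal(value):
    """Quote value as a double-quoted R string literal."""
    # Backslashes first, so the backslashes added for quotes are not doubled.
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return '"%s"' % escaped
```

With these, the final call would read rpy.r('Sys.setenv(PATH=%s)' % r_string_literal(newPath)).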
> Finally, regarding the memory and handle leak tests:
> 
> I used ArcGIS 9.3 SP1, Python 2.5.1, R 2.8.1, rpy2 2.0.3, WinXP SP3 with
> latest updates. In ArcGIS, I created a geoprocessing model with a single
> instance of the tool I mentioned in my previous message. I configured the
> model to run 100000 times and started it. Using Windows Task Manager, I
> monitored VM Size (equivalent to Private Bytes in perfmon) and Handles.
> 
> I first ran the test a few times with these two lines of the script
> commented out:
> 
>     #from rpy2 import robjects
>     #sqrt_x = robjects.r.sqrt(x)[0]
> 
> Then I ran it again with the comments removed. This allowed me to see if there
> were leaks when rpy2 was not even imported. Interpreting the results is
> difficult because ArcGIS exhibited a bug (how typical) in which it said the
> model was complete before the progress bar reached 100000.
> 
> Without rpy2:
> 
> Iterations  Memory    Memory  Handles   Handles
> completed   at start  at end  at start  at end
> ----------  --------  ------  --------  -------
> 7921        190 MB    395 MB  1409      1403
> 7891        395 MB    490 MB  1406      1405
> 7930        489 MB    599 MB  1405      1405
> 
> With rpy2:
> 
> Iterations  Memory    Memory  Handles   Handles
> completed   at start  at end  at start  at end
> ----------  --------  ------  --------  -------
> 15408       599 MB    692 MB  1408      1408
> 59168       692 MB    784 MB  1409      1408
> 
> The very first time I ran this, it looks like ArcGIS allocated 200 MB that
> it did not immediately release. I do not consider this to necessarily be a
> leak. It may have an internal allocator that is configured to hold on to a
> bunch of memory for a while. But in every subsequent run, it allocated about
> 100 MB more, including the runs with rpy2 enabled.
> 
> These results are tricky to interpret. First of all, I do not understand why
> the progress bar reported many more iterations with rpy2 enabled. It may be
> that the progress bar is broken, and that 100000 iterations completed in all
> cases, but that the script executed so quickly that progress events were
> dropped by ArcGIS, or something like that. This would explain why more
> iterations were reported with rpy2, because the script would go slower and
> not overwhelm the progress bar as much. It would also explain why about the
> same amount of memory is leaked with and without rpy, regardless of the
> number of iterations completed.
> 
> In any case, it does not appear that substantially more memory was leaked
> with rpy2 enabled. This is a good sign, and because of this, I'm not going
> to bother trying to determine whether the progress bar is broken or ArcGIS
> is truly halting the iteration before 100000 is reached. In either
> situation, there is a bug with ArcGIS, not rpy2. ArcGIS has always been a
> buggy program, despite its popularity.
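
For reference, the arithmetic behind this reading of the tables, as a small sketch. The figures are copied from the two tables above and the helper names are made up for illustration. Growth per run is 205, 95 and 110 MB without rpy2, then 93 and 92 MB with it, so after the first run it stays near 100 MB however many iterations were reported.

```python
# (label, iterations reported, MB at start, MB at end), from the tables above.
RUNS = [
    ("without rpy2", 7921, 190, 395),
    ("without rpy2", 7891, 395, 490),
    ("without rpy2", 7930, 489, 599),
    ("with rpy2", 15408, 599, 692),
    ("with rpy2", 59168, 692, 784),
]


def growth_mb(run):
    """Memory growth over one run, in MB."""
    _, _, start, end = run
    return end - start


def kb_per_iteration(run):
    """Growth divided by the number of iterations the progress bar reported."""
    _, iterations, _, _ = run
    return growth_mb(run) * 1024.0 / iterations


def report(runs=RUNS):
    """Return one formatted summary line per run."""
    return ["%-12s %6d iterations  +%3d MB  %.1f KB/iteration"
            % (run[0], run[1], growth_mb(run), kb_per_iteration(run))
            for run in runs]
```

Printing report() shows that growth per reported iteration is lower with rpy2 than without, which fits the guess that the total is tied to the run rather than to the rpy2 calls.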

Isn't GRASS a worthy Open Source alternative to it ? (I am not so much 
into GIS, so you will know better - I am just being curious here)

> Finally, it is clear that no handles are leaked.

Glad to hear that.

> There is probably at least one place in rpy2 that is leaking a module
> handle, in rinterface/__init__.py:
> 
>     win32api.LoadLibrary( Rlib )
> 
> This will not cause a handle leak in the usual sense. Instead it will just
> cause the process's internal reference count for R.dll to increment every
> time rpy2 is imported. This is sub-optimal, but there is probably little
> harm. The reference leak will prevent R.dll from ever being unloaded but
> given that rpy2 and Python itself do not shut down very cleanly, it might be
> very hard to achieve proper unloading of R.dll anyway. I don't think you
> need to address this.

It doesn't harm to do things cleanly either.
Do not hesitate to share what would be better if you have it available.
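
One possible direction, as a rough sketch only: pair every load with a matching free registered to run at interpreter shutdown, so the DLL reference count returns to where it started. The loader and unloader are passed in as arguments. On win32 they would presumably be win32api.LoadLibrary and win32api.FreeLibrary, but that pairing is an assumption and has not been tried inside ArcGIS. The function name is hypothetical.

```python
import atexit


def load_with_cleanup(path, load, free, register=atexit.register):
    """Load a library and schedule the matching release at shutdown."""
    handle = load(path)
    # Each successful load gets exactly one matching free, so repeated
    # interpreter start/stop cycles do not keep raising the count.
    register(free, handle)
    return handle
```

In rinterface/__init__.py this would replace the bare win32api.LoadLibrary( Rlib ) call with load_with_cleanup(Rlib, win32api.LoadLibrary, win32api.FreeLibrary).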

> These results look pretty good to me. I am going to investigate integrating
> rpy2 into our application!

Good.
Let us know how it goes.


L.





> Jason
> 
> -----Original Message-----
> From: Laurent Gautier [mailto:lgaut...@gmail.com] 
> Sent: Friday, March 20, 2009 3:51 AM
> To: Jason Roberts
> Cc: 'RPy help, support and design discussion list'
> Subject: Re: FW: rpy2 in ArcGIS 9.3
> 
> Jason Roberts wrote:
>> Laurent,
>>
>> Thank you very much for the reply.
>>
>>> I am not certain which way the risk lies (compile each
>>> time, or compile once and hope for the best). Time will tell.
>> So rpy2 does not require recompilation every time R is released? How is it
>> binding to R then? (I have not looked at the C code yet. If you can just
>> point me in the right direction I can figure it out myself.)
> 
> The libR is used as a shared library.
> Under win32, and AFAIUI, that should translate as using an unbound DLL 
> (otherwise the same version of libR will be required) and hold as long 
> as the names used from the symbol table presented by libR.so do not change.
> In the case it does, then a new(er) version of rpy2 should be available.
> Admittedly not a perfect option, but I wanted to avoid
> version-specific conditional definitions in the code; rpy had it, but I 
> had to start from a simple base. This does not mean this aspect of rpy 
> will not be added in the future, but I'd like to explore options first.
> 
> On a related note, I'd like to offer the option to have R really 
> embedded in rpy2 (with an R install inside the rpy2 installed module)... 
> so if someone has the time...
> 
>>> You could try with a dummy minimal extension to ArcGIS and tell us.
>> I tried this out using ArcGIS 9.3 SP1, Python 2.5.1 (comes with ArcGIS 9.3),
>> and rpy2-2.0.3.win32-py2.5.exe. I created a Python-based geoprocessing tool
>> with the following code to exercise rpy2 in a minimal way:
>>
>> # Initialize the ArcGIS geoprocessor object, so we can communicate
>> # with ArcGIS.
>>
>> import arcgisscripting
>> gp = arcgisscripting.create()
>>
>> # Using rpy2, calculate the square root of the input parameter. If we
>> # catch an exception, report a traceback to ArcGIS.
>>
>> import os, traceback
>> try:
>>     x = gp.GetParameter(0)
>>     from rpy2 import robjects
>>     sqrt_x = robjects.r.sqrt(x)[0]
>> except:
>>     gp.AddError(traceback.format_exc())
>>     raise
>>
>> It worked (!!!) and the performance appeared to be quite good. I am running
>> it in a loop now to check for leaks. I'll send a followup on that later.
> 
> If you are having an issue, check the following:
> http://www.mail-archive.com/rpy-list@lists.sourceforge.net/msg01696.html
> 
> 
>> There was one problem that I noticed immediately. Currently, line 37 of
>> rinterface/__init__.py blindly adds R directories to the PATH:
>>
>> # Win32-specific code copied from RPy-1.x
>> if sys.platform == 'win32':
>>     import win32api
>>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>>     os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')
> 
> I see.
> 
>> The new PATH is persisted in the environment of the calling ArcGIS process.
>> When that process initializes the Python interpreter a second time, this
>> code is called again, adding duplicate entries to PATH. This can go on until
>> the PATH reaches 32767 characters, and then putenv will raise an OSError. In
>> my case, my tool ran 335 times before this occurred. I observed the problem
>> happen by adding additional logging statements to my minimal example above,
>> and watched the len(os.environ['PATH']) grow close to 32767 before putenv
>> failed.
>>
>> To fix, something like this is appropriate:
>>
>> # Win32-specific code copied from RPy-1.x
>> if sys.platform == 'win32':
>>     import win32api
>>     if os.path.join(R_HOME, 'bin') not in os.environ['PATH'].split(';'):
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>>     if os.path.join(R_HOME, 'modules') not in os.environ['PATH'].split(';'):
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>>     if os.path.join(R_HOME, 'lib') not in os.environ['PATH'].split(';'):
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')
> 
> Thanks. An ultimate patch would likely be a little more complex
> (by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found 
> in the PATH... but I am not sure of what PATH is needed for here - can 
> someone with win32 try when just removing the PATH creation ?)
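
A minimal sketch of the "first R found in the PATH" check mentioned here. It assumes that the directory holding R.dll is what identifies an R installation on the PATH. The thread places R.dll-style binaries in bin, but the file name is an assumption and is left as a parameter. Function names are hypothetical.

```python
import os


def first_dir_with(filename, path_value, sep=os.pathsep,
                   isfile=os.path.isfile):
    """Return the first PATH entry that contains filename, or None."""
    for entry in path_value.split(sep):
        if entry and isfile(os.path.join(entry, filename)):
            return entry
    return None


def r_home_is_first(r_home, path_value, dll_name="R.dll", sep=os.pathsep):
    """True if the first R on the PATH is the one under r_home/bin."""
    first = first_dir_with(dll_name, path_value, sep=sep)
    if first is None:
        return False  # no R on the PATH at all

    def norm(p):
        # Normalize separators and (on Windows) case before comparing.
        return os.path.normcase(os.path.normpath(p))

    return norm(first) == norm(os.path.join(r_home, "bin"))
```

When r_home_is_first(R_HOME, os.environ['PATH']) is false, the R directories would need to go in front of the existing entries rather than being appended.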
> 
>> I'm currently running it 100000 times, monitoring memory and handles. I'll
>> let you know how it turns out.
>>
>> I'm pretty hopeful this will work out well. There could be problems with R
>> packages that do fancy things (like link to other C libraries) but even if
>> that's a problem, just having the ability to do basic R from ArcGIS 9.3 in a
>> performant manner will be very, very nice for us and our users.
>>
>> Jason
>>
>>
>> -----Original Message-----
>> From: Laurent Gautier [mailto:lgaut...@gmail.com] 
>> Sent: Thursday, March 19, 2009 1:57 AM
>> To: RPy help, support and design discussion list
>> Cc: Jason Roberts
>> Subject: Re: FW: rpy2 in ArcGIS 9.3
>>
>> Jason Roberts wrote:
>>> Greetings rpy2 developers,
>>>
>>>  
>>>
>>> I am the primary developer of an open source Python package called 
>>> Marine Geospatial Ecology Tools 
>>> (http://code.env.duke.edu/projects/mget). These tools perform various 
>>> jobs that are useful to marine ecologists. Many of the tools are 
>>> designed to be invoked from ArcGIS, a desktop GIS application that runs 
>>> on Windows.
>>>
>> rpy2 works best on UNIX-alikes at the moment.
>> (features are not working on win32).
>>
>>> To date, we have had good success accessing R using rpy. Thank you very 
>>> much for making this package freely available.
>> I can't take those credits:
>> rpy is Walter and Greg's work, with the help of contributors.
>>
>>> But we noted last year 
>>> that rpy is no longer being maintained, and rpy2 is the new replacement.
>> Kind of. I started with rpy2 about a year ago, as what I was trying to
>> do did not appear possible with rpy. Rpy is still available, although
>> its development is in the slow lane at the moment, I think.
>>
>>> It will be a big job for us to switch to rpy2, so we have been delaying 
>>> the switch. In the interim, we've been compiling rpy every time a new R 
>>> release has come out. This is probably increasingly risky, so we're 
>>> becoming more motivated to make the switch.
>> I am not certain which way the risk lies (compile each
>> time, or compile once and hope for the best). Time will tell.
>>
>>> In addition, there is an 
>>> ArcGIS 9.3 / rpy compatibility problem that is pretty inconvenient. 
>>> Basically we are wondering if this problem exists with rpy2.
>>>
>>>  
>>>
>>> The problem was discussed last year; see 
>>>
>>> <http://sourceforge.net/tracker/?func=detail&atid=453021&aid=2062627&group_id=48422>.
>>> In brief: Every time ArcGIS 9.3 runs a Python-based tool, it initializes 
>>> a new instance of the Python interpreter in the ArcGIS process 
>>> (typically ArcCatalog.exe or ArcMap.exe). The interpreter instance 
>>> eventually loads the rpy extension module (e.g. _rpy2070.dll). The 
>>> interpreter exits when the tool completes. But this does not cause the 
>>> rpy extension module to be unloaded from the process, and when ArcGIS 
>>> runs the tool a second time, creating a new Python interpreter, rpy 
>>> fails to initialize.
>>>
>>>  
>>>
>>> In last year's bug report, lgautier mentioned that "the problem was 
>>> fixed a few weeks ago" (i.e. last summer). Is it correct then that this 
>>> procedure of initializing the interpreter, using rpy2, shutting down the 
>>> interpreter, and so on, can be done indefinitely from a single process 
>>> without any ill effects?
>>>
>> Maybe, maybe not.
>> I have not looked at whether the C-level part of rpy2 does what it
>> should regarding the creation and destruction of Python interpreters.
>>
>> You could try with a dummy minimal extension to ArcGIS and tell us.
>>
>>
>>
>> Hoping this helps,
>>
>>
>>
>> L.
>>
>>> Thanks for your help! And thanks again to you guys for developing this 
>>> great reusable software.
>>>
>>>  
>>>
>>> Jason
>>>
>>
> 
> 
> 
> ------------------------------------------------------------------------------
> Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are
> powering Web 2.0 with engaging, cross-platform capabilities. Quickly and
> easily build your RIAs with Flex Builder, the Eclipse(TM)based development
> software that enables intelligent coding and step-through debugging.
> Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list

