Hi Laurent,

> The libR is used as a shared library.
> Under win32, and AFAIUI, that should translate as using an unbound DLL
> (otherwise the same version of libR will be required) and hold as long
> as the names used from the symbol table presented by libR.so do not change.
> In the case it does, then a new(er) version of rpy2 should be available.
Ok. AFAIK that should work. I wondered why rpy did not work this way. My only guess is that rpy wanted to support a wide range of R versions, including very early ones in which the R team was still settling the names and definitions of core functions. Hopefully those functions are stable now and you will not need to take the same approach with rpy2.

Which version of R do you use when compiling rpy2? I noticed a comment saying rpy2 is compatible with R 2.7.0 and later. Are you compiling against 2.7.0, then?

> On a related note, I'd like to offer the option to have R really
> embedded in rpy2 (with an R install inside the rpy2 installed module)...
> so if someone has the time...

This sounds interesting. It would allow Python users to call R without installing R separately, and you could guarantee there were no version compatibility problems. On the minus side, you would need to release a new version of rpy2 whenever a new version of R was released. Unfortunately I do not have time to work on this, at least not right now.

> Thanks. An ultimate patch would likely be a little more complex
> (by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found
> in the PATH... but I am not sure of what PATH is needed for here - can
> someone with win32 try when just removing the PATH creation ?)

I would be happy to try this for you, but I'm not sure exactly what you want me to do. Are you unsure whether all three of those directories (bin, modules, and lib) need to be in the PATH? I can try them in different combinations and see what happens. Let me know if this is what you want.

I can say that my R 2.8.1 installation has no directory called lib. There is a directory called library, but that is where the R packages are installed; it contains no binaries. So I would suggest removing lib from the PATH, but before doing so we should check every version of R back to 2.7.0. I can do that if you want.
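A minimal sketch of a more defensive version of the win32 PATH block in rinterface/__init__.py discussed in this thread. It adds each R_HOME subdirectory at most once (comparing case-insensitively on Windows), skips subdirectories such as lib that do not exist, and includes a helper for the "first R found in the PATH" check. The function names are hypothetical, not rpy2 API; it assumes R_HOME is already known and that R.dll is the file to look for. It has not been tested against rpy2 itself.

```python
import os

# Subdirectories that rinterface/__init__.py adds to PATH on win32 (per this thread).
R_SUBDIRS = ("bin", "modules", "lib")


def _same_dir(a, b):
    # normcase lowercases (and normalizes slashes) on Windows, so
    # C:\R\bin and c:/r/bin compare equal there; on POSIX it is a no-op.
    return os.path.normcase(os.path.normpath(a)) == os.path.normcase(os.path.normpath(b))


def add_r_dirs_to_path(r_home, environ=os.environ):
    """Hypothetical helper: append existing R_HOME subdirectories to PATH, at most once each."""
    path = environ.get("PATH", "")
    entries = path.split(os.pathsep) if path else []
    for sub in R_SUBDIRS:
        d = os.path.join(r_home, sub)
        if not os.path.isdir(d):
            continue  # e.g. an R 2.8.1 install has no "lib" directory
        if not any(e and _same_dir(d, e) for e in entries):
            entries.append(d)
    environ["PATH"] = os.pathsep.join(entries)
    return environ["PATH"]


def first_dir_containing(filename, environ=os.environ):
    """Hypothetical helper: return the first PATH entry that contains filename, or None."""
    for e in environ.get("PATH", "").split(os.pathsep):
        if e and os.path.isfile(os.path.join(e, filename)):
            return e
    return None
```

Because the PATH persists in the host process, calling add_r_dirs_to_path again on each interpreter start leaves it unchanged instead of growing it. On win32, comparing first_dir_containing("R.dll") with os.path.join(R_HOME, 'bin') would show whether a different R installation is found first.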
In the modules directory there are some shared libraries, such as lapack.dll. I am not sure what the difference is between that lapack.dll and Rlapack.dll in the bin directory, but I do recall some issues with lapack in the past. I suggest you keep modules in the PATH.

My wrapper around rpy has the following related code, which might be of interest to you:

    import os

    # Before importing rpy, capture the PATH environment
    # variable. rpy is going to add some R directories to it.
    # Because R imposes a maximum length on environment
    # variables (perhaps 1019 characters), we need to move
    # these directories to the front of the PATH to ensure
    # they are not truncated by the maximum length limiter.
    # This will work around the issue described by MGET ticket
    # #286.

    oldPath = os.environ['PATH'].split(os.pathsep)

    # Now import rpy.

    from GeoEco.AssimilatedModules.rpy import rpy
    RDependency._rpy = rpy

    # Move the paths that rpy appended to the front of the
    # PATH.

    newPath = os.environ['PATH'].split(os.pathsep)
    newPath = os.pathsep.join(newPath[len(oldPath):] + newPath[:len(oldPath)])

    # To work around MGET ticket #203 (Evaluate R Statements
    # tools fail with "lapack routines cannot be loaded" error
    # when running a glm), set the PATH environment variable
    # seen by the R interpreter to that seen by Python, so R
    # sees the changes that rpy attempted to make to the PATH.

    rpy.r('Sys.setenv(PATH="%s")' % newPath.replace('\\', '\\\\'))

Finally, regarding the memory and handle leak tests: I used ArcGIS 9.3 SP1, Python 2.5.1, R 2.8.1, rpy2 2.0.3, and Windows XP SP3 with the latest updates. In ArcGIS, I created a geoprocessing model containing a single instance of the tool I mentioned in my previous message, configured the model to run 100000 times, and started it. Using Windows Task Manager, I monitored VM Size (equivalent to Private Bytes in perfmon) and Handles.
I first ran the test a few times with these two lines of the script commented out:

    #from rpy2 import robjects
    #sqrt_x = robjects.r.sqrt(x)[0]

Then I ran it again with the comments removed. This allowed me to see whether there were leaks when rpy2 was not even imported. Interpreting the results is difficult because ArcGIS exhibited a bug (how typical): it reported that the model was complete before the progress bar reached 100000.

Without rpy2:

    Iterations   Memory     Memory   Handles    Handles
    completed    at start   at end   at start   at end
    ----------   --------   ------   --------   -------
    7921         190 MB     395 MB   1409       1403
    7891         395 MB     490 MB   1406       1405
    7930         489 MB     599 MB   1405       1405

With rpy2:

    Iterations   Memory     Memory   Handles    Handles
    completed    at start   at end   at start   at end
    ----------   --------   ------   --------   -------
    15408        599 MB     692 MB   1408       1408
    59168        692 MB     784 MB   1409       1408

The very first time I ran this, ArcGIS appears to have allocated about 200 MB that it did not immediately release. I do not necessarily consider this a leak; ArcGIS may have an internal allocator that is configured to hold on to a chunk of memory for a while. But in every subsequent run it allocated about 100 MB more, including the runs with rpy2 enabled.

These results are tricky to interpret. First of all, I do not understand why the progress bar reported many more iterations with rpy2 enabled. It may be that the progress bar is broken and that 100000 iterations completed in all cases, but the script executed so quickly that ArcGIS dropped progress events, or something like that. This would explain why more iterations were reported with rpy2: the script would run more slowly and not overwhelm the progress bar as much. It would also explain why about the same amount of memory is leaked with and without rpy2, regardless of the number of iterations completed. In any case, it does not appear that substantially more memory was leaked with rpy2 enabled.
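The "about 100 MB more" per run can be checked against the tables above. A small sketch using only the reported Memory figures: it computes end minus start for each run and averages the runs after the first, which is excluded because it includes the one-time allocation of about 200 MB.

```python
# Memory figures in MB, copied from the two tables: (start, end) for each run.
without_rpy2 = [(190, 395), (395, 490), (489, 599)]
with_rpy2 = [(599, 692), (692, 784)]


def growth(runs):
    """Memory growth (end - start) for each run, in MB."""
    return [end - start for start, end in runs]


# Skip the first run, which includes the one-time ~200 MB allocation.
later_runs = growth(without_rpy2)[1:] + growth(with_rpy2)
mean_growth = float(sum(later_runs)) / len(later_runs)

print(growth(without_rpy2))  # [205, 95, 110]
print(growth(with_rpy2))     # [93, 92]
print(mean_growth)           # 97.5
```

The per-run growth with rpy2 (93 and 92 MB) falls within the range seen without it (95 and 110 MB), even though the reported iteration counts differ by a factor of up to about 7.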
This is a good sign, and because of it I'm not going to bother determining whether the progress bar is broken or ArcGIS is truly halting the iteration before 100000 is reached. Either way, the bug is in ArcGIS, not rpy2. ArcGIS has always been a buggy program, despite its popularity.

Finally, it is clear that no handles are leaked. There is probably at least one place in rpy2 that leaks a module handle, in rinterface/__init__.py:

    win32api.LoadLibrary( Rlib )

This will not cause a handle leak in the usual sense. Instead it will just cause the process's internal reference count for R.dll to be incremented every time rpy2 is imported. This is sub-optimal, but there is probably little harm. The reference leak will prevent R.dll from ever being unloaded, but given that rpy2 and Python itself do not shut down very cleanly, it might be very hard to achieve proper unloading of R.dll anyway. I don't think you need to address this.

These results look pretty good to me. I am going to investigate integrating rpy2 into our application!

Jason

-----Original Message-----
From: Laurent Gautier [mailto:lgaut...@gmail.com]
Sent: Friday, March 20, 2009 3:51 AM
To: Jason Roberts
Cc: 'RPy help, support and design discussion list'
Subject: Re: FW: rpy2 in ArcGIS 9.3

Jason Roberts wrote:
> Laurent,
>
> Thank you very much for the reply.
>
>> I am not certain of which way the risk probability stand (compile each
>> time, or compile once and hope for the best). Time will tell.
>
> So rpy2 does not require recompilation every time R is released? How is it
> binding to R then? (I have not looked at the C code yet. If you can just
> point me in the right direction I can figure it out myself.)

The libR is used as a shared library.
Under win32, and AFAIUI, that should translate as using an unbound DLL
(otherwise the same version of libR will be required) and hold as long
as the names used from the symbol table presented by libR.so do not change.
In the case it does, then a new(er) version of rpy2 should be available.
Admittedly not a perfect option, but I wanted to avoid version-specific
conditional definitions in the code; rpy had them, but I had to start from
a simple base. This does not mean this aspect of rpy will not be added in
the future, but I'd like to explore options first.

On a related note, I'd like to offer the option to have R really
embedded in rpy2 (with an R install inside the rpy2 installed module)...
so if someone has the time...

>> You could try with a dummy minimal extension to ArcGIS and tell us.
>
> I tried this out using ArcGIS 9.3 SP1, Python 2.5.1 (comes with ArcGIS 9.3),
> and rpy2-2.0.3.win32-py2.5.exe. I created a Python-based geoprocessing tool
> with the following code to exercise rpy2 in a minimal way:
>
>     # Initialize the ArcGIS geoprocessor object, so we can communicate
>     # with ArcGIS.
>
>     import arcgisscripting
>     gp = arcgisscripting.create()
>
>     # Using rpy2, calculate the square root of the input parameter. If we
>     # catch an exception, report a traceback to ArcGIS.
>
>     import os, traceback
>     try:
>         x = gp.GetParameter(0)
>         from rpy2 import robjects
>         sqrt_x = robjects.r.sqrt(x)[0]
>     except:
>         gp.AddError(traceback.format_exc())
>         raise
>
> It worked (!!!) and the performance appeared to be quite good. I am running
> it in a loop now to check for leaks. I'll send a followup on that later.

If you are having an issue, check the following:
http://www.mail-archive.com/rpy-list@lists.sourceforge.net/msg01696.html

> There was one problem that I noticed immediately. Currently, line 37 of
> rinterface/__init__.py blindly adds R directories to the PATH:
>
>     # Win32-specific code copied from RPy-1.x
>     if sys.platform == 'win32':
>         import win32api
>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')

I see.
> The new PATH is persisted in the environment of the calling ArcGIS process.
> When that process initializes the Python interpreter a second time, this
> code is called again, adding duplicate entries to PATH. This can go on until
> the PATH reaches 32767 characters, and then putenv will raise an OSError. In
> my case, my tool ran 335 times before this occurred. I observed the problem
> happen by adding additional logging statements to my minimal example above,
> and watched the len(os.environ['PATH']) grow close to 32767 before putenv
> failed.
>
> To fix, something like this is appropriate:
>
>     # Win32-specific code copied from RPy-1.x
>     if sys.platform == 'win32':
>         import win32api
>         if os.path.join(R_HOME, 'bin') not in os.environ['PATH'].split(';'):
>             os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>         if os.path.join(R_HOME, 'modules') not in os.environ['PATH'].split(';'):
>             os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>         if os.path.join(R_HOME, 'lib') not in os.environ['PATH'].split(';'):
>             os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')

Thanks. An ultimate patch would likely be a little more complex
(by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found
in the PATH... but I am not sure of what PATH is needed for here - can
someone with win32 try when just removing the PATH creation ?)

> I'm currently running it 100000 times, monitoring memory and handles. I'll
> let you know how it turns out.
>
> I'm pretty hopeful this will work out well. There could be problems with R
> packages that do fancy things (like link to other C libraries) but even if
> that's a problem, just having the ability to do basic R from ArcGIS 9.3 in a
> performant manner will be very, very nice for us and our users.
>
> Jason
>
>
> -----Original Message-----
> From: Laurent Gautier [mailto:lgaut...@gmail.com]
> Sent: Thursday, March 19, 2009 1:57 AM
> To: RPy help, support and design discussion list
> Cc: Jason Roberts
> Subject: Re: FW: rpy2 in ArcGIS 9.3
>
> Jason Roberts wrote:
>> Greetings rpy2 developers,
>>
>> I am the primary developer of an open source Python package called
>> Marine Geospatial Ecology Tools
>> (http://code.env.duke.edu/projects/mget). These tools perform various
>> jobs that are useful to marine ecologists. Many of the tools are
>> designed to be invoked from ArcGIS, a desktop GIS application that runs
>> on Windows.
>>
>
> rpy2 works best on UNIX-alikes at the moment.
> (features are not working on win32).
>
>> To date, we have had good success accessing R using rpy. Thank you very
>> much for making this package freely available.
>
> I can't take those credits:
> rpy is Walter and Greg's work, with the help of contributors.
>
>> But we noted last year
>> that rpy is no longer being maintained, and rpy2 is the new replacement.
>
> Kind of. I started with rpy2 about a year ago, as what I was trying to
> do did not appear possible with rpy. Rpy is still available, although
> its development is in the slow lane at the moment, I think.
>
>> It will be a big job for us to switch to rpy2, so we have been delaying
>> the switch. In the interim, we've been compiling rpy every time a new R
>> release has come out. This is probably increasingly risky, so we're
>> becoming more motivated to make the switch.
>
> I am not certain of which way the risk probability stand (compile each
> time, or compile once and hope for the best). Time will tell.
>
>> In addition, there is an
>> ArcGIS 9.3 / rpy compatibility problem that is pretty inconvenient.
>> Basically we are wondering if this problem exists with rpy2.
>>
>> The problem was discussed last year; see
>> http://sourceforge.net/tracker/?func=detail&atid=453021&aid=2062627&group_id=48422.
>> In brief: Every time ArcGIS 9.3 runs a Python-based tool, it initializes
>> a new instance of the Python interpreter in the ArcGIS process
>> (typically ArcCatalog.exe or ArcMap.exe). The interpreter instance
>> eventually loads the rpy extension module (e.g. _rpy2070.dll). The
>> interpreter exits when the tool completes. But this does not cause the
>> rpy extension module to be unloaded from the process, and when ArcGIS
>> runs the tool a second time, creating a new Python interpreter, rpy
>> fails to initialize.
>>
>> In last year's bug report, lgautier mentioned that "the problem was
>> fixed a few weeks ago" (i.e. last summer). Is it correct then that this
>> procedure of initializing the interpreter, using rpy2, shutting down the
>> interpreter, and so on, can be done indefinitely from a single process
>> without any ill effects?
>>
>
> Maybe, maybe not.
> I have not looked at whether the C-level part of rpy2 does what it
> should regarding the creating and destruction of Python interpreters.
>
> You could try with a dummy minimal extension to ArcGIS and tell us.
>
> Hoping this helps,
>
> L.
>
>> Thanks for your help! And thanks again to you guys for developing this
>> great reusable software.
>>
>> Jason
_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list