Jason Roberts wrote:
> Hi Laurent,
>
>> The libR is used as a shared library.
>> Under win32, and AFAIUI, that should translate as using an unbound DLL
>> (otherwise the same version of libR will be required) and hold as long
>> as the names used from the symbol table presented by libR.so do not
>> change.
>> In the case it does, then a new(er) version of rpy2 should be available.
>
> Ok. AFAIK that should work. I wondered why rpy did not work this way. The
> only thing I could guess is that rpy wanted to support a large range of
> versions of R, including very early versions where the R team was still
> deciding on the names and definitions of very core functions.

I think that this is mostly what happened. Duncan with RSPython, Walter
and Greg with rpy, and Simon Urbanek with JRI, probably caused changes in
the R API.

> Hopefully
> those functions are stable now and you will not need to take the same course
> of action with rpy2.
>
> What version of R do you use when compiling rpy2? I noticed a comment saying
> rpy2 is compatible with R 2.7.0 and later. Are you compiling using 2.7.0
> then?

I did so when compiling the first win32 builds. Laurent Oget has been
contributing the win32 builds since the release 2.0.0b1.

>> On a related note, I'd like to offer the option to have R really
>> embedded in rpy2 (with an R install inside the rpy2 installed module)...
>> so if someone has the time...
>
> This sounds interesting. It would allow Python users to call R without
> having to install R separately, and you could ensure there were no version
> compatibility problems. On the minus side, it would mean you need to release
> a new version of rpy2 whenever a new R was released.

That would be an option someone making a compiled build could switch on
(that does not mean that I will provide such builds, and this for the
reason you mention). This is probably of interest for standalone
solutions, and will probably have to wait.

> Unfortunately I do not have time to work on this, at least right now.
>
>> Thanks. An ultimate patch would likely be a little more complex
>> (by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found
>> in the PATH... but I am not sure of what PATH is needed for here - can
>> someone with win32 try when just removing the PATH creation ?)
>
> I would be happy to try this for you, but I'm not sure exactly what you want
> me to do. Are you unsure whether all three of those directories (bin,
> modules, and lib) need to be in the PATH? I can try them in different
> combinations and see what happens. Let me know if this is what you want.

I meant: just try commenting out the three lines

    os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
    os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
    os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')

...but your code below tells why PATH is indeed needed (and makes my
request irrelevant).

> I can say that with my R 2.8.1 installation, there is no directory called
> lib. There is a directory called library, but that is where all the R
> packages go. There are no binaries in there. So I would suggest that you
> could remove lib from the PATH, but before you did this, we should check all
> the versions of R back to 2.7.0. I can do that if you want.
>
> In the modules directory, there are some shared libraries, such as
> lapack.dll. I am not sure what the difference is between lapack.dll there
> and the Rlapack.dll in the bin directory, but I do recall there being some
> issues with lapack in the past. I suggest you keep modules in the PATH.
>
> In my wrapper around rpy, I have the following related code which might be
> of interest to you:
>
>     # Before importing rpy, capture the PATH environment
>     # variable. rpy is going to add some R directories to it.
>     # Because R imposes a maximum length on environment
>     # variables (perhaps 1019 characters), we need to move
>     # these directories to the front of the PATH to ensure
>     # they are not truncated by the maximum length limiter.
>     # This will work around the issue described by MGET ticket
>     # #286.
>
>     oldPath = os.environ['PATH'].split(os.pathsep)
>
>     # Now import rpy.
>
>     from GeoEco.AssimilatedModules.rpy import rpy
>     RDependency._rpy = rpy
>
>     # Move the paths that rpy appended to the front of the
>     # PATH.
>
>     newPath = os.environ['PATH'].split(os.pathsep)
>     newPath = os.pathsep.join(newPath[len(oldPath):] +
>                               newPath[:len(oldPath)])
>
>     # To work around MGET ticket #203 (Evaluate R Statements
>     # tools fail with "lapack routines cannot be loaded" error
>     # when running a glm), set the PATH environment variable
>     # seen by the R interpreter to that seen by Python, so R
>     # sees the changes that rpy attempted to make to the PATH.
>
>     rpy.r('Sys.setenv(PATH="%s")' % newPath.replace('\\', '\\\\'))
>
> Finally, regarding the memory and handle leak tests:
>
> I used ArcGIS 9.3 SP1, Python 2.5.1, R 2.8.1, rpy2 2.0.3, WinXP SP3 with
> latest updates. In ArcGIS, I created a geoprocessing model with a single
> instance of the tool I mentioned in my previous message. I configured the
> model to run 100000 times and started it. Using Windows Task Manager, I
> monitored VM Size (equivalent to Private Bytes in perfmon) and Handles.
>
> I first ran the test a few times with these two lines of the script
> commented out:
>
>     #from rpy2 import robjects
>     #sqrt_x = robjects.r.sqrt(x)[0]
>
> Then I ran it again with the comments removed. This allowed me to see if
> there were leaks when rpy2 was not even imported. Interpreting the results
> is difficult because ArcGIS exhibited a bug (how typical) in which it said
> the model was complete before the progress bar reached 100000.
>
> Without rpy2:
>
> Iterations   Memory     Memory    Handles    Handles
> completed    at start   at end    at start   at end
> ----------   --------   ------    --------   -------
> 7921         190 MB     395 MB    1409       1403
> 7891         395 MB     490 MB    1406       1405
> 7930         489 MB     599 MB    1405       1405
>
> With rpy2:
>
> Iterations   Memory     Memory    Handles    Handles
> completed    at start   at end    at start   at end
> ----------   --------   ------    --------   -------
> 15408        599 MB     692 MB    1408       1408
> 59168        692 MB     784 MB    1409       1408
>
> The very first time I ran this, it looks like ArcGIS allocated 200 MB that
> it did not immediately release. I do not consider this to necessarily be a
> leak.
> It may have an internal allocator that is configured to hold on to a
> bunch of memory for a while. But in every subsequent run, it allocated about
> 100 MB more, including the runs with rpy2 enabled.
>
> These results are tricky to interpret. First of all, I do not understand why
> the progress bar reported many more iterations with rpy2 enabled. It may be
> that the progress bar is broken, and that 100000 iterations completed in all
> cases, but that the script executed so quickly that progress events were
> dropped by ArcGIS, or something like that. This would explain why more
> iterations were reported with rpy2, because the script would go slower and
> not overwhelm the progress bar as much. It would also explain why about the
> same amount of memory is leaked with and without rpy, regardless of the
> number of iterations completed.
>
> In any case, it does not appear that substantially more memory was leaked
> with rpy2 enabled. This is a good sign, and because of this, I'm not going
> to bother trying to determine whether the progress bar is broken or ArcGIS
> is truly halting the iteration before 100000 is reached. In either
> situation, there is a bug with ArcGIS, not rpy2. ArcGIS has always been a
> buggy program, despite its popularity.

Isn't GRASS a worthy Open Source alternative to it ?
(I am not so much into GIS, so you will know better - I am just being
curious here)

> Finally, it is clear that no handles are leaked.

Glad to hear that.

> There is probably at least one place in rpy2 that is leaking a module
> handle, in rinterface/__init__.py:
>
>     win32api.LoadLibrary( Rlib )
>
> This will not cause a handle leak in the usual sense. Instead it will just
> cause the process's internal reference count for R.dll to increment every
> time rpy2 is imported. This is sub-optimal, but there is probably little
> harm.
> The reference leak will prevent R.dll from ever being unloaded but
> given that rpy2 and Python itself do not shut down very cleanly, it might be
> very hard to achieve proper unloading of R.dll anyway. I don't think you
> need to address this.

It doesn't hurt to do things cleanly either. Do not hesitate to share what
would be better if you have it available.

> These results look pretty good to me. I am going to investigate integrating
> rpy2 into our application!

Good. Let us know how it goes.

L.

> Jason
>
> -----Original Message-----
> From: Laurent Gautier [mailto:lgaut...@gmail.com]
> Sent: Friday, March 20, 2009 3:51 AM
> To: Jason Roberts
> Cc: 'RPy help, support and design discussion list'
> Subject: Re: FW: rpy2 in ArcGIS 9.3
>
> Jason Roberts wrote:
>> Laurent,
>>
>> Thank you very much for the reply.
>>
>>> I am not certain of which way the risk probability stands (compile each
>>> time, or compile once and hope for the best). Time will tell.
>> So rpy2 does not require recompilation every time R is released? How is it
>> binding to R then? (I have not looked at the C code yet. If you can just
>> point me in the right direction I can figure it out myself.)
>
> The libR is used as a shared library.
> Under win32, and AFAIUI, that should translate as using an unbound DLL
> (otherwise the same version of libR will be required) and hold as long
> as the names used from the symbol table presented by libR.so do not change.
> In the case it does, then a new(er) version of rpy2 should be available.
> Admittedly not an absolutely perfect option, but I wanted to avoid
> version-specific conditional definitions in the code; rpy had it, but I
> had to start from a simple base. This does not mean this aspect of rpy
> will not be added in the future, but I'd like to explore options first.
>
> On a related note, I'd like to offer the option to have R really
> embedded in rpy2 (with an R install inside the rpy2 installed module)...
> so if someone has the time...
>
>>> You could try with a dummy minimal extension to ArcGIS and tell us.
>> I tried this out using ArcGIS 9.3 SP1, Python 2.5.1 (comes with ArcGIS 9.3),
>> and rpy2-2.0.3.win32-py2.5.exe. I created a Python-based geoprocessing tool
>> with the following code to exercise rpy2 in a minimal way:
>>
>>     # Initialize the ArcGIS geoprocessor object, so we can communicate
>>     # with ArcGIS.
>>
>>     import arcgisscripting
>>     gp = arcgisscripting.create()
>>
>>     # Using rpy2, calculate the square root of the input parameter. If we
>>     # catch an exception, report a traceback to ArcGIS.
>>
>>     import os, traceback
>>     try:
>>         x = gp.GetParameter(0)
>>         from rpy2 import robjects
>>         sqrt_x = robjects.r.sqrt(x)[0]
>>     except:
>>         gp.AddError(traceback.format_exc())
>>         raise
>>
>> It worked (!!!) and the performance appeared to be quite good. I am running
>> it in a loop now to check for leaks. I'll send a followup on that later.
>
> If you are having an issue, check the following:
> http://www.mail-archive.com/rpy-list@lists.sourceforge.net/msg01696.html
>
>
>> There was one problem that I noticed immediately. Currently, line 37 of
>> rinterface/__init__.py blindly adds R directories to the PATH:
>>
>>     # Win32-specific code copied from RPy-1.x
>>     if sys.platform == 'win32':
>>         import win32api
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>>         os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')
>
> I see.
>
>> The new PATH is persisted in the environment of the calling ArcGIS process.
>> When that process initializes the Python interpreter a second time, this
>> code is called again, adding duplicate entries to PATH. This can go on until
>> the PATH reaches 32767 characters, and then putenv will raise an OSError. In
>> my case, my tool ran 335 times before this occurred.
>> I observed the problem happen by adding additional logging statements to
>> my minimal example above, and watched the len(os.environ['PATH']) grow
>> close to 32767 before putenv failed.
>>
>> To fix, something like this is appropriate:
>>
>>     # Win32-specific code copied from RPy-1.x
>>     if sys.platform == 'win32':
>>         import win32api
>>         if os.path.join(R_HOME, 'bin') not in os.environ['PATH'].split(';'):
>>             os.environ['PATH'] += ';' + os.path.join(R_HOME, 'bin')
>>         if os.path.join(R_HOME, 'modules') not in os.environ['PATH'].split(';'):
>>             os.environ['PATH'] += ';' + os.path.join(R_HOME, 'modules')
>>         if os.path.join(R_HOME, 'lib') not in os.environ['PATH'].split(';'):
>>             os.environ['PATH'] += ';' + os.path.join(R_HOME, 'lib')
>
> Thanks. An ultimate patch would likely be a little more complex
> (by checking that os.path.join(R_HOME, 'bin') is not the _first_ R found
> in the PATH... but I am not sure of what PATH is needed for here - can
> someone with win32 try when just removing the PATH creation ?)
>
>> I'm currently running it 100000 times, monitoring memory and handles. I'll
>> let you know how it turns out.
>>
>> I'm pretty hopeful this will work out well. There could be problems with R
>> packages that do fancy things (like link to other C libraries) but even if
>> that's a problem, just having the ability to do basic R from ArcGIS 9.3 in a
>> performant manner will be very, very nice for us and our users.
>>
>> Jason
>>
>>
>> -----Original Message-----
>> From: Laurent Gautier [mailto:lgaut...@gmail.com]
>> Sent: Thursday, March 19, 2009 1:57 AM
>> To: RPy help, support and design discussion list
>> Cc: Jason Roberts
>> Subject: Re: FW: rpy2 in ArcGIS 9.3
>>
>> Jason Roberts wrote:
>>> Greetings rpy2 developers,
>>>
>>> I am the primary developer of an open source Python package called
>>> Marine Geospatial Ecology Tools
>>> (http://code.env.duke.edu/projects/mget). These tools perform various
>>> jobs that are useful to marine ecologists.
>>> Many of the tools are
>>> designed to be invoked from ArcGIS, a desktop GIS application that runs
>>> on Windows.
>>>
>> rpy2 works best on UNIX-alikes at the moment.
>> (features are not working on win32).
>>
>>> To date, we have had good success accessing R using rpy. Thank you very
>>> much for making this package freely available.
>> I can't take those credits:
>> rpy is Walter and Greg's work, with the help of contributors.
>>
>>> But we noted last year
>>> that rpy is no longer being maintained, and rpy2 is the new replacement.
>> Kind of. I started with rpy2 about a year ago, as what I was trying to
>> do did not appear possible with rpy. Rpy is still available, although
>> its development is in the slow lane at the moment, I think.
>>
>>> It will be a big job for us to switch to rpy2, so we have been delaying
>>> the switch. In the interim, we've been compiling rpy every time a new R
>>> release has come out. This is probably increasingly risky, so we're
>>> becoming more motivated to make the switch.
>> I am not certain of which way the risk probability stands (compile each
>> time, or compile once and hope for the best). Time will tell.
>>
>>> In addition, there is an
>>> ArcGIS 9.3 / rpy compatibility problem that is pretty inconvenient.
>>> Basically we are wondering if this problem exists with rpy2.
>>>
>>> The problem was discussed last year; see
>>> http://sourceforge.net/tracker/?func=detail&atid=453021&aid=2062627&group_id=48422.
>>> In brief: Every time ArcGIS 9.3 runs a Python-based tool, it initializes
>>> a new instance of the Python interpreter in the ArcGIS process
>>> (typically ArcCatalog.exe or ArcMap.exe). The interpreter instance
>>> eventually loads the rpy extension module (e.g. _rpy2070.dll). The
>>> interpreter exits when the tool completes.
>>> But this does not cause the
>>> rpy extension module to be unloaded from the process, and when ArcGIS
>>> runs the tool a second time, creating a new Python interpreter, rpy
>>> fails to initialize.
>>>
>>> In last year's bug report, lgautier mentioned that "the problem was
>>> fixed a few weeks ago" (i.e. last summer). Is it correct then that this
>>> procedure of initializing the interpreter, using rpy2, shutting down the
>>> interpreter, and so on, can be done indefinitely from a single process
>>> without any ill effects?
>>>
>> Maybe, maybe not.
>> I have not looked at whether the C-level part of rpy2 does what it
>> should regarding the creation and destruction of Python interpreters.
>>
>> You could try with a dummy minimal extension to ArcGIS and tell us.
>>
>> Hoping this helps,
>>
>> L.
>>
>>> Thanks for your help! And thanks again to you guys for developing this
>>> great reusable software.
>>>
>>> Jason
>
> _______________________________________________
> rpy-list mailing list
> rpy-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rpy-list
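
For reference, the idempotent PATH update discussed in this thread could be sketched as a small helper. This is only a sketch under stated assumptions: the name add_to_path and its parameters are hypothetical and are not part of rpy or rpy2. It adds only the directories that are missing, so repeated interpreter start-ups within one process leave PATH unchanged, and prepend=True puts new entries first, in the spirit of the truncation workaround quoted in the thread.

```python
import os


def add_to_path(directories, path=None, prepend=False, sep=os.pathsep):
    """Return a PATH string in which each given directory appears once.

    Hypothetical helper for illustration; not an rpy/rpy2 function.
    Directories already present in `path` are left where they are.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    entries = [e for e in path.split(sep) if e]

    # Compare normalised forms so that, on Windows, differences in case,
    # slash direction, or a trailing separator do not create duplicates.
    seen = {os.path.normcase(os.path.normpath(e)) for e in entries}

    missing = []
    for d in directories:
        key = os.path.normcase(os.path.normpath(d))
        if key not in seen:
            seen.add(key)
            missing.append(d)

    entries = missing + entries if prepend else entries + missing
    return sep.join(entries)
```

With R_HOME defined as in the snippets above, one possible use would be os.environ['PATH'] = add_to_path([os.path.join(R_HOME, d) for d in ('bin', 'modules')], prepend=True). Whether 'lib' is needed as well is left open, as in the discussion.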