Hi Brian,
Thank you for posting your solution here. I will try it on my test server and run some load tests. Thanks also for pointing out the leaks inside libhdfs. I'm actually writing a Python extension for HDFS and noticed some memory leaks, but I wasn't sure whether the bug was in my extension or somewhere else.

Regards,
Huy Phan

Brian Bockelman wrote:
Hey Huy,

Here's what we do:

1) include hdfsJniHelper.h
2) Do the following when you're done with the filesystem:

    if (NULL != fs) {
      //Get the JNIEnv* corresponding to current thread
      JNIEnv* env = getJNIEnv();

      if (env == NULL) {
        ret = -EIO;
      } else {

        //Parameters
        jobject jFS = (jobject)fs;

        //Release unnecessary references
        (*env)->DeleteGlobalRef(env, jFS);
      }
    }

I also recommend the patch below to remove a few other leaks. It saves about 0.5 KB of leaked memory per file open.

Index: src/c++/libhdfs/hdfs.c
===================================================================
--- src/c++/libhdfs/hdfs.c      (revision 806186)
+++ src/c++/libhdfs/hdfs.c      (working copy)
@@ -248,6 +249,7 @@
       destroyLocalReference(env, jUserString);
       destroyLocalReference(env, jGroups);
       destroyLocalReference(env, jUgi);
+      destroyLocalReference(env, jAttrString);
     }
 #else

Index: src/c++/libhdfs/hdfsJniHelper.c
===================================================================
--- src/c++/libhdfs/hdfsJniHelper.c     (revision 806186)
+++ src/c++/libhdfs/hdfsJniHelper.c     (working copy)
@@ -239,6 +241,7 @@
       fprintf(stderr, "ERROR: jelem == NULL\n");
     }
     (*env)->SetObjectArrayElement(env, result, i, jelem);
+    (*env)->DeleteLocalRef(env, jelem);
   }
   return result;
 }


Of course, this is not an official solution, not supported, may explode, etc.

Brian

On Oct 13, 2009, at 12:40 PM, Huy Phan wrote:

Hi Eli,
You're right that the problem is resolved in 0.20 with the newInstance() function. Unfortunately, my system is running Hadoop 0.18.3, and I'm still looking for a way to patch this version without affecting the current system.

Regards,
Huy Phan

Eli Collins wrote:
Hey Huy,

What version of Hadoop are you using?  I think HADOOP-4655 may have
resolved the issue you're seeing, but I think it is only in 0.20 and later.

Thanks,
Eli

On Mon, Oct 12, 2009 at 8:52 PM, Huy Phan <dac...@gmail.com> wrote:

Hi All,
I'm writing a multi-threaded application in C using libhdfs. A known issue with HDFS is that the FileSystem API caches FileSystem handles and always returns the same handle when called from different threads. This means that even though I call hdfsConnect many times, I should not call hdfsDisconnect in any single thread, since that would close the handle the other threads are still using.
This may lead to a memory leak. Do you know of any workaround for this issue?
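
For illustration only, here is one possible application-side way to cope with the shared handle: connect once, count the users, and disconnect only when the last user is done. This is a minimal sketch, not part of libhdfs and not the fix discussed elsewhere in this thread. The names shared_fs, shared_fs_acquire, shared_fs_release, fake_connect and fake_disconnect are all hypothetical. In a real program the two callbacks would presumably wrap hdfsConnect and hdfsDisconnect; here they are stand-ins so the sketch compiles without Hadoop.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper types; nothing here is part of libhdfs. */
typedef void *(*connect_fn)(void);      /* would wrap hdfsConnect(host, port) */
typedef int (*disconnect_fn)(void *);   /* would wrap hdfsDisconnect(fs) */

typedef struct {
    pthread_mutex_t lock;     /* protects handle and refs */
    void *handle;             /* the single shared filesystem handle */
    int refs;                 /* number of current users */
    connect_fn connect;
    disconnect_fn disconnect;
} shared_fs;

/* Returns the shared handle, connecting only when there are no users yet. */
void *shared_fs_acquire(shared_fs *s) {
    void *h;
    pthread_mutex_lock(&s->lock);
    if (s->refs == 0) {
        s->handle = s->connect();
    }
    if (s->handle != NULL) {
        s->refs++;            /* count only successful acquisitions */
    }
    h = s->handle;
    pthread_mutex_unlock(&s->lock);
    return h;
}

/* Drops one user; the real disconnect happens only for the last one. */
int shared_fs_release(shared_fs *s) {
    int rc = 0;
    pthread_mutex_lock(&s->lock);
    if (s->refs > 0 && --s->refs == 0) {
        rc = s->disconnect(s->handle);
        s->handle = NULL;
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Stand-in callbacks (assumptions) that only count how often they run. */
static int fake_connects = 0;
static int fake_disconnects = 0;
static int fake_handle_storage = 0;

static void *fake_connect(void) {
    fake_connects++;
    return &fake_handle_storage;
}

static int fake_disconnect(void *handle) {
    (void)handle;
    fake_disconnects++;
    return 0;
}
```

Each thread would call shared_fs_acquire before using the filesystem and shared_fs_release when finished, so no single thread closes the handle while others still hold it. Whether this is acceptable depends on the application; it does nothing about leaks inside libhdfs itself.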





