There is a Python interface for accessing HDFS files, if that helps your case: http://wiki.apache.org/hadoop/HDFS-APIs
thanks,
dhruba

On Tue, Oct 13, 2009 at 12:00 PM, Huy Phan <dac...@gmail.com> wrote:

> Hi Brian,
> Thank you for posting your solution here, I will try this on my testing
> server and do some load tests.
> Also thank you for pointing out some leaks inside libhdfs. Actually I'm
> writing a Python extension for HDFS and noticed some Memory Leaks, but I was
> not sure if it's the bug of my extension or somewhere else.
>
> Regards,
> Huy Phan
>
> Brian Bockelman wrote:
>
>> Hey Huy,
>>
>> Heres what we do:
>>
>> 1) include hdfsJniHelper.h
>> 2) Do the following when you're done with the filesystem:
>>
>> if (NULL != fs) {
>>     //Get the JNIEnv* corresponding to current thread
>>     JNIEnv* env = getJNIEnv();
>>
>>     if (env == NULL) {
>>         ret = -EIO;
>>     } else {
>>
>>         //Parameters
>>         jobject jFS = (jobject)fs;
>>
>>         //Release unnecessary references
>>         (*env)->DeleteGlobalRef(env, jFS);
>>     }
>> }
>>
>> I also recommend the below patch to remove a few other leaks. This saves
>> about .5KB / file open in leaked memory.
>>
>> Index: src/c++/libhdfs/hdfs.c
>> ===================================================================
>> --- src/c++/libhdfs/hdfs.c (revision 806186)
>> +++ src/c++/libhdfs/hdfs.c (working copy)
>> @@ -248,6 +249,7 @@
>>      destroyLocalReference(env, jUserString);
>>      destroyLocalReference(env, jGroups);
>>      destroyLocalReference(env, jUgi);
>> +    destroyLocalReference(env, jAttrString);
>>  }
>>  #else
>>
>> Index: src/c++/libhdfs/hdfsJniHelper.c
>> ===================================================================
>> --- src/c++/libhdfs/hdfsJniHelper.c (revision 806186)
>> +++ src/c++/libhdfs/hdfsJniHelper.c (working copy)
>> @@ -239,6 +241,7 @@
>>          fprintf(stderr, "ERROR: jelem == NULL\n");
>>      }
>>      (*env)->SetObjectArrayElement(env, result, i, jelem);
>> +    (*env)->DeleteLocalRef(env, jelem);
>>  }
>>  return result;
>>  }
>>
>>
>> Of course, this is not an official solution, not supported, may explode,
>> etc.
>>
>> Brian
>>
>> On Oct 13, 2009, at 12:40 PM, Huy Phan wrote:
>>
>>> Hi Eli,
>>> You're right that the problem is resolved in 0.20 with function
>>> newInstance(), unfortunately my system's running on Hadoop 0.18.3 and i'm
>>> still looking for a way to patch this version without affecting the current
>>> system.
>>>
>>> Regards,
>>> Huy Phan
>>>
>>> Eli Collins wrote:
>>>
>>>> Hey Huy,
>>>>
>>>> What version of hadoop are you using? I think HADOOP-4655 may have
>>>> resolved the issue you're seeing but I think is only in 20 and later.
>>>>
>>>> Thanks,
>>>> Eli
>>>>
>>>> On Mon, Oct 12, 2009 at 8:52 PM, Huy Phan <dac...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>> I'm writing a multi-thread application using libhdfs in C, a known issue
>>>>> of HDFS is that the FileSystem API caches FileSystem handles and always
>>>>> returned the same FileSystem handle when called from different threads.
>>>>> It means even though I called hdfsConnect for many times, I should not
>>>>> call hdfsDisconnect in any single thread.
>>>>> This may lead to memory leak on system, do you know any workaround for
>>>>> this issue ?

--
Connect to me at http://www.facebook.com/dhruba
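
For anyone reading this thread later: since every thread gets the same cached handle, one possible way to make connect/disconnect safe is to reference-count the shared handle and release it only when the last user is done. The sketch below is not from anyone in this thread and is only an illustration. The names shared_fs, fs_acquire and fs_release are hypothetical, and the struct is a stand-in for the real handle so the example compiles without libhdfs. In real code the two marked spots are where hdfsConnect() and the release step Brian describes would go.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the one cached handle shared by all threads. */
typedef struct {
    int open;   /* 1 while the underlying connection is considered live */
} shared_fs;

static shared_fs g_fs = {0};
static int g_refs = 0;   /* number of callers currently holding the handle */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hand out the shared handle and count the caller as a user. */
shared_fs *fs_acquire(void) {
    pthread_mutex_lock(&g_lock);
    if (g_refs == 0) {
        g_fs.open = 1;   /* assumption: real code would call hdfsConnect() here */
    }
    g_refs++;
    pthread_mutex_unlock(&g_lock);
    return &g_fs;
}

/* Drop one user. Returns 1 if this call released the handle, 0 otherwise. */
int fs_release(shared_fs *fs) {
    int released = 0;
    pthread_mutex_lock(&g_lock);
    assert(fs == &g_fs && g_refs > 0);
    if (--g_refs == 0) {
        fs->open = 0;    /* assumption: real code would free the handle here */
        released = 1;
    }
    pthread_mutex_unlock(&g_lock);
    return released;
}
```

Each thread would call fs_acquire() where it used to call hdfsConnect, and fs_release() where it used to call hdfsDisconnect. Only the last release touches the underlying handle, so no thread can pull it out from under another.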