Sahil Takiar created HDFS-14304:
-----------------------------------

             Summary: High lock contention on hdfsHashMutex in libhdfs
                 Key: HDFS-14304
                 URL: https://issues.apache.org/jira/browse/HDFS-14304
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client, libhdfs, native
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar


While profiling the performance of an application that uses libhdfs, we 
noticed heavy lock contention on the {{hdfsHashMutex}} defined in 
{{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}.

The issue is that every JNI method invocation made by {{hdfs.c}} goes through a 
helper method called {{invokeMethod}}. {{invokeMethod}} calls 
{{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a 
lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}). 
The lock is acquired for both reads and writes. The hash table maps 
{{char *className}} to {{jclass}} objects; the goal of the hash table seems to 
be to avoid repeatedly creating {{jclass}} objects for each JNI call.
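
To make the pattern concrete, here is a minimal sketch of a name-to-class cache guarded by a single mutex. It is not the libhdfs code: {{cached_class}}, {{class_cache_get}}, {{invoke_method}} and the {{lock_acquisitions}} counter are hypothetical names, a small linear array stands in for {{htable}}, and an integer id stands in for a {{jclass}} reference. The only point it illustrates is that a cache hit still has to take the same lock as an insert.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cached jclass global reference. */
typedef struct {
    const char *name; /* caller must keep this string alive (e.g. a literal) */
    int id;
} cached_class;

#define CACHE_CAPACITY 16

/* Plays the role of hdfsHashMutex: one process-wide lock for the cache. */
static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;
static cached_class cache[CACHE_CAPACITY];
static size_t cache_size = 0;
/* Illustration only: counts every trip through the lock. */
static unsigned long lock_acquisitions = 0;

/* Look up a class by name, inserting it on a miss.
 * The mutex is held for hits (reads) as well as misses (writes). */
static const cached_class *class_cache_get(const char *class_name)
{
    const cached_class *result = NULL;

    assert(class_name != NULL);
    pthread_mutex_lock(&cache_mutex);
    lock_acquisitions++;
    for (size_t i = 0; i < cache_size; i++) {
        if (strcmp(cache[i].name, class_name) == 0) {
            result = &cache[i];
            break;
        }
    }
    if (result == NULL && cache_size < CACHE_CAPACITY) {
        cache[cache_size].name = class_name;
        cache[cache_size].id = (int)cache_size;
        result = &cache[cache_size];
        cache_size++;
    }
    pthread_mutex_unlock(&cache_mutex);
    return result;
}

/* Stand-in for invokeMethod: every call pays for the lock before doing
 * any real work, even when the class is already cached. */
static int invoke_method(const char *class_name)
{
    const cached_class *cls = class_cache_get(class_name);
    return cls != NULL ? cls->id : -1;
}
```

With many threads calling {{invoke_method}} concurrently, all of them serialize on {{cache_mutex}} even though almost every call is a read of an entry that never changes after the first insert.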

For multi-threaded applications, this lock severely limits the rate at which 
Java methods can be invoked. pstack output shows a lot of time being spent 
waiting on {{hdfsHashMutex}}:
{code:java}
#0  0x00007fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000027d8386 in mutexLock ()
#4  0x00000000027d0e7b in globalClassReference ()
#5  0x00000000027d1160 in invokeMethod ()
#6  0x00000000027d4176 in readDirect ()
#7  0x00000000027d4325 in hdfsRead ()
{code}
{{perf report}} shows the same:
{code:java}
+   63.36%     0.01%  [k] system_call_fastpath
+   61.60%     0.12%  [k] sys_futex 
+   61.45%     0.13%  [k] do_futex 
+   57.54%     0.49%  [k] _raw_qspin_lock
+   57.07%     0.01%  [k] queued_spin_lock_slowpath
+   55.47%    55.47%  [k] native_queued_spin_lock_slowpath
-   35.68%     0.00%  [k] 0x6f6f6461682f6568
   - 0x6f6f6461682f6568 
      - 30.55% __lll_lock_wait       
         - 29.40% system_call_fastpath      
            - 29.39% sys_futex      
               - 29.35% do_futex   
                  - 29.27% futex_wait     
                     - 28.17% futex_wait_setup
                        - 27.05% _raw_qspin_lock 
                           - 27.05% queued_spin_lock_slowpath
                                26.30% native_queued_spin_lock_slowpath 
                              + 0.67% ret_from_intr 
                     + 0.71% futex_wait_queue_me
      - 2.00% methodIdFromClass
         - 1.94% jni_GetMethodID  
            - 1.71% get_method_id   
                 0.96% SymbolTable::lookup_only 
      - 1.61% invokeMethod
         - 0.62% jni_CallLongMethodV 
              0.52% jni_invoke_nonstatic 
        0.75% pthread_mutex_lock
{code}


