Quanlong Huang created IMPALA-13994:
---------------------------------------

Summary: Thrift client hive_client shouldn't be used in multiple threads
Key: IMPALA-13994
URL: https://issues.apache.org/jira/browse/IMPALA-13994
Project: IMPALA
Issue Type: Bug
Components: Test
Reporter: Quanlong Huang

In ImpalaTestSuite, we create a ThriftHiveMetastore.Client as hive_client:
[https://github.com/apache/impala/blob/648209b17258cf610f4e73a3ed63de665216074f/tests/common/impala_test_suite.py#L255]

Unlike the other clients we create for Impala, this Thrift client is not thread-safe and shouldn't be used in parallel tests. See THRIFT-2283 and this email thread:
[https://lists.apache.org/thread/4rsjdtlpv8zrgknpf43vo5rg9q83b6wp]
{quote}The Thrift transport layer is not thread-safe. It is essentially a wrapper on a socket. You can't interleave writing things to a single socket from multiple threads without locking. You also don't know what order the responses will come back in.
{quote}
Here are some exceptions I hit when using it in two threads in https://gerrit.cloudera.org/c/22816/3:
{noformat}
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/quanlong/workspace/Impala/tests/metadata/test_event_processing.py", line 636, in drop_table_in_hive
    self.hive_client.drop_table(db, tbl_name, deleteData=True)
  File "/home/quanlong/workspace/Impala/shell/gen-py/impala_thrift_gen/hive_metastore/ThriftHiveMetastore.py", line 3913, in drop_table
    self.recv_drop_table()
  File "/home/quanlong/workspace/Impala/shell/gen-py/impala_thrift_gen/hive_metastore/ThriftHiveMetastore.py", line 3937, in recv_drop_table
    raise result.o1
NoSuchObjectException: NoSuchObjectException(message='null: null')

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/quanlong/workspace/Impala/toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/quanlong/workspace/Impala/tests/metadata/test_event_processing.py", line 636, in drop_table_in_hive
    self.hive_client.drop_table(db, tbl_name, deleteData=True)
  File "/home/quanlong/workspace/Impala/shell/gen-py/impala_thrift_gen/hive_metastore/ThriftHiveMetastore.py", line 3913, in drop_table
    self.recv_drop_table()
  File "/home/quanlong/workspace/Impala/shell/gen-py/impala_thrift_gen/hive_metastore/ThriftHiveMetastore.py", line 3927, in recv_drop_table
    (fname, mtype, rseqid) = iprot.readMessageBegin()
  File "/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
    sz = self.readI32()
  File "/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
    buff = self.trans.readAll(4)
  File "/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 62, in readAll
    chunk = self.read(sz - have)
  File "/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 164, in read
    self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
  File "/home/quanlong/workspace/Impala/infra/python/env-gcc10.4.0/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 164, in read
    raise TTransportException(message="unexpected exception", inner=e)
TTransportException: unexpected exception
{noformat}
ERRORs on the HMS side indicate that the request data is abnormal:
{noformat}
2025-04-25T13:49:50,021 INFO [TThreadPoolServer WorkerProcess-188] metastore.HiveMetaStore: 203: source:127.0.0.1 drop_table : tbl=null.null.null
2025-04-25T13:49:50,021 INFO [TThreadPoolServer WorkerProcess-188] HiveMetaStore.audit: ugi=quanlong ip=127.0.0.1 cmd=source:127.0.0.1 drop_table : tbl=null.null.null
2025-04-25T13:49:50,022 WARN [TThreadPoolServer WorkerProcess-188] metastore.ObjectStore: Falling back to ORM path due to direct SQL failure (this is not an error): null at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getDatabase(MetaStoreDirectSql.java:393) at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:896)
2025-04-25T13:49:50,022 ERROR [TThreadPoolServer WorkerProcess-188] metastore.ObjectStore: java.lang.NullPointerException: null
    at org.apache.hadoop.hive.metastore.utils.StringUtils.normalizeIdentifier(StringUtils.java:94) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:853) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at org.apache.hadoop.hive.metastore.ObjectStore.getJDODatabase(ObjectStore.java:911) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at org.apache.hadoop.hive.metastore.ObjectStore$1.getJdoResult(ObjectStore.java:901) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at org.apache.hadoop.hive.metastore.ObjectStore$1.getJdoResult(ObjectStore.java:893) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:4302) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at org.apache.hadoop.hive.metastore.ObjectStore.getDatabaseInternal(ObjectStore.java:903) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:875) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_432]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_432]
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at com.sun.proxy.$Proxy33.getDatabase(Unknown Source) ~[?:?]
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:3253) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
{noformat}
Another kind of ERROR log:
{noformat}
2025-04-25T13:49:50,054 ERROR [TThreadPoolServer WorkerProcess-188] server.TThreadPoolServer: Thrift Error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:254) ~[libthrift-0.16.0.jar:0.16.0]
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:76) ~[hive-standalone-metastore-3.1.3000.7.3.1.0-160.jar:3.1.3000.7.3.1.0-160]
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:250) ~[libthrift-0.16.0.jar:0.16.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_432]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_432]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_432]
{noformat}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
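
One possible way to avoid sharing the connection, sketched here only as an illustration and not as the fix chosen for this issue, is to give each test thread its own client through threading.local. The names PerThreadClient, FakeClient, drop_tables_in_parallel and client_factory are hypothetical and do not exist in the Impala test framework. FakeClient stands in for ThriftHiveMetastore.Client so the sketch runs without an HMS; in a real test, client_factory would have to open a new transport and build a new Thrift client on each call.

```python
import threading


class PerThreadClient(object):
    """Hands each thread its own client instead of sharing one connection.

    `client_factory` is a hypothetical zero-argument callable that opens a
    new transport and returns a fresh client on every call.
    """

    def __init__(self, client_factory):
        self._client_factory = client_factory
        self._local = threading.local()

    def get(self):
        # Create the client lazily the first time the calling thread asks.
        client = getattr(self._local, "client", None)
        if client is None:
            client = self._client_factory()
            self._local.client = client
        return client


class FakeClient(object):
    """Stand-in for a real Thrift client; it only records the calls made."""

    def __init__(self):
        self.dropped = []

    def drop_table(self, db, tbl_name, deleteData=True):
        self.dropped.append((db, tbl_name))


def drop_tables_in_parallel(clients, db, tbl_names):
    """Drops each table from its own thread and returns the clients used."""
    used = []
    used_lock = threading.Lock()

    def worker(tbl_name):
        client = clients.get()  # private to this thread
        client.drop_table(db, tbl_name, deleteData=True)
        with used_lock:
            used.append(client)

    threads = [threading.Thread(target=worker, args=(name,))
               for name in tbl_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return used
```

With this shape, a request and its response always travel on a connection owned by a single thread, so writes from two threads cannot interleave on one socket. Guarding a single shared client with a lock would also serialize access, at the cost of running the HMS calls one at a time.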