[ https://issues.apache.org/jira/browse/HIVE-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sercan Tekin updated HIVE-28450:
--------------------------------
    Affects Version/s: 4.0.0
                       3.1.3

> Follow the array size of JVM in Hive transferable objects
> ---------------------------------------------------------
>
>                 Key: HIVE-28450
>                 URL: https://issues.apache.org/jira/browse/HIVE-28450
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.1.3, 4.0.0
>            Reporter: Sercan Tekin
>            Priority: Major
>
> We are experiencing an issue with a partitioned table in Hive. Querying the table via the Hive CLI retrieves the data as expected, without any errors. However, when the same table is queried through Spark, the following error appears in the HMS logs:
> {code:java}
> 2024-01-30 23:03:59,052 main DEBUG org.apache.logging.log4j.core.util.SystemClock does not support precise timestamps.
> Exception in thread "pool-7-thread-4" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>     at java.util.Arrays.copyOf(Arrays.java:3236)
>     at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
>     at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>     at org.apache.thrift.transport.TSaslTransport.write(TSaslTransport.java:473)
>     at org.apache.thrift.transport.TSaslServerTransport.write(TSaslServerTransport.java:42)
>     at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBinaryProtocol.java:227)
>     at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.write(FieldSchema.java:517)
>     at org.apache.hadoop.hive.metastore.api.FieldSchema$FieldSchemaStandardScheme.write(FieldSchema.java:456)
>     at org.apache.hadoop.hive.metastore.api.FieldSchema.write(FieldSchema.java:394)
>     at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1423)
>     at org.apache.hadoop.hive.metastore.api.StorageDescriptor$StorageDescriptorStandardScheme.write(StorageDescriptor.java:1250)
>     at org.apache.hadoop.hive.metastore.api.StorageDescriptor.write(StorageDescriptor.java:1116)
>     at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:1033)
>     at org.apache.hadoop.hive.metastore.api.Partition$PartitionStandardScheme.write(Partition.java:890)
>     at org.apache.hadoop.hive.metastore.api.Partition.write(Partition.java:786)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result$get_partitions_resultStandardScheme.write(ThriftHiveMetastore.java)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result$get_partitions_resultStandardScheme.write(ThriftHiveMetastore.java)
>     at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_partitions_result.write(ThriftHiveMetastore.java)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:58)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
>     at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:603)
>     at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:600)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>     at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:600)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Exception in thread "pool-7-thread-6" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> Exception in thread "pool-7-thread-9" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> {code}
> This error is caused by the limit the JVM places on array sizes: some VM implementations reserve a few header words in an array, so allocation requests close to Integer.MAX_VALUE can fail with "Requested array size exceeds VM limit" even when enough heap is available. For this reason the JDK's own buffering code caps array growth slightly below Integer.MAX_VALUE, for example in java.io.InputStream:
> https://github.com/openjdk/jdk/blob/0e0dfca21f64ecfcb3e5ed7cdc2a173834faa509/src/java.base/share/classes/java/io/InputStream.java#L307-L313
> Spark already enforces a similar cap on its side in ByteArrayUtils; it would be good to implement the same guard in Hive's transferable objects:
> https://github.com/apache/spark/blob/e5a5921968c84601ce005a7785bdd08c41a2d862/common/utils/src/main/scala/org/apache/spark/unsafe/array/ByteArrayUtils.java
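> For illustration, a minimal sketch of the kind of guard this could add (ByteArrayLimits and growCapacity are placeholder names, not existing Hive APIs), modeled on the JDK's MAX_BUFFER_SIZE constant and Spark's ByteArrayUtils:
> {code:java}
> // Hypothetical sketch, not an actual Hive class: cap buffer growth at a
> // soft maximum that JVMs are known to honor.
> public final class ByteArrayLimits {
>
>   // Some VMs reserve a few header words in an array, so requests above
>   // this soft maximum may fail with "Requested array size exceeds VM
>   // limit" even when enough heap is available. This is the same value
>   // the JDK uses internally, e.g. in java.io.InputStream.
>   public static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
>
>   private ByteArrayLimits() {}
>
>   // Returns a capacity >= minCapacity, doubling the current capacity the
>   // way ByteArrayOutputStream does, but never exceeding MAX_ARRAY_SIZE;
>   // fails fast with a clear message instead of letting the allocation
>   // hit the hard VM limit.
>   public static int growCapacity(int currentCapacity, int minCapacity) {
>     if (minCapacity < 0 || minCapacity > MAX_ARRAY_SIZE) {
>       throw new OutOfMemoryError("Requested array size " + minCapacity
>           + " exceeds the soft limit " + MAX_ARRAY_SIZE);
>     }
>     int newCapacity = currentCapacity << 1; // may overflow to negative
>     if (newCapacity < minCapacity) {
>       newCapacity = minCapacity;
>     }
>     return (newCapacity < 0 || newCapacity > MAX_ARRAY_SIZE)
>         ? MAX_ARRAY_SIZE : newCapacity;
>   }
> }
> {code}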
> Workaround:
> As a temporary workaround, I have been able to mitigate the issue by setting the hive.metastore.batch.retrieve.table.partition.max configuration to a lower value.
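> For example, in hive-site.xml (the value below is only illustrative; the default is 1000, so tune it to the table's partition count and metadata size):
> {code:xml}
> <property>
>   <name>hive.metastore.batch.retrieve.table.partition.max</name>
>   <value>300</value>
> </property>
> {code}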