[ 
https://issues.apache.org/jira/browse/IMPALA-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-11402.
-------------------------------------
    Target Version: Impala 5.0.0
        Resolution: Fixed

Resolving this. Thanks to [~kdeschle], [~daniel.becker] and [~rizaon] for the 
review!

> getPartialCatalogObject fails with OOM with huge number of files
> ----------------------------------------------------------------
>
>                 Key: IMPALA-11402
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11402
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> The response size of getPartialCatalogObject depends on the number of 
> partitions in the request. Even with the optimization of IMPALA-7501, the 
> response size can still exceed the 2GB byte array limit when all 
> partitions of a huge table are requested. E.g.
> {noformat}
> I0224 02:30:32.183627 28707 jni-util.cc:321] java.lang.OutOfMemoryError
>       at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
>       at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>       at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>       at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>       at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:197)
>       at org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:236)
>       at org.apache.impala.thrift.THdfsFileDesc$THdfsFileDescStandardScheme.write(THdfsFileDesc.java:450)
>       at org.apache.impala.thrift.THdfsFileDesc$THdfsFileDescStandardScheme.write(THdfsFileDesc.java:405)
>       at org.apache.impala.thrift.THdfsFileDesc.write(THdfsFileDesc.java:346)
>       at org.apache.impala.thrift.TPartialPartitionInfo$TPartialPartitionInfoStandardScheme.write(TPartialPartitionInfo.java:1647)
>       at org.apache.impala.thrift.TPartialPartitionInfo$TPartialPartitionInfoStandardScheme.write(TPartialPartitionInfo.java:1433)
>       at org.apache.impala.thrift.TPartialPartitionInfo.write(TPartialPartitionInfo.java:1265)
>       at org.apache.impala.thrift.TPartialTableInfo$TPartialTableInfoStandardScheme.write(TPartialTableInfo.java:1402)
>       at org.apache.impala.thrift.TPartialTableInfo$TPartialTableInfoStandardScheme.write(TPartialTableInfo.java:1215)
>       at org.apache.impala.thrift.TPartialTableInfo.write(TPartialTableInfo.java:1061)
>       at org.apache.impala.thrift.TGetPartialCatalogObjectResponse$TGetPartialCatalogObjectResponseStandardScheme.write(TGetPartialCatalogObjectResponse.java:1157)
>       at org.apache.impala.thrift.TGetPartialCatalogObjectResponse$TGetPartialCatalogObjectResponseStandardScheme.write(TGetPartialCatalogObjectResponse.java:1010)
>       at org.apache.impala.thrift.TGetPartialCatalogObjectResponse.write(TGetPartialCatalogObjectResponse.java:876)
>       at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>       at org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:91)
>       at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>       at org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>       at org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>       at org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:259)
>       at org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:436)
> {noformat}
> We should add a flag to limit the number of partitions in a single 
> getPartialCatalogObject request. When more partitions are required, fetch 
> them in separate batches.
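
For illustration only, the batching idea in the description could look roughly like the sketch below. It is not the actual Impala change: the class PartitionBatchFetcher, the method fetchInBatches, and the parameter maxPerRequest are hypothetical names, and the fetchBatch callback merely stands in for one getPartialCatalogObject round trip. The point is that each request covers at most maxPerRequest partitions, so no single serialized response has to hold the file metadata of every partition at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Impala: splits a list of partition ids
// into bounded batches and concatenates the per-batch results.
final class PartitionBatchFetcher {

    private PartitionBatchFetcher() {}

    // ids:           all partition ids the caller needs
    // maxPerRequest: upper bound on partitions per request (assumed to come from a flag)
    // fetchBatch:    stand-in for a single catalog request covering one batch
    static <I, R> List<R> fetchInBatches(
            List<I> ids, int maxPerRequest, Function<List<I>, List<R>> fetchBatch) {
        if (maxPerRequest <= 0) {
            throw new IllegalArgumentException("maxPerRequest must be positive");
        }
        List<R> results = new ArrayList<>(ids.size());
        int start = 0;
        while (start < ids.size()) {
            // Computed this way to avoid int overflow for very large limits.
            int end = start + Math.min(maxPerRequest, ids.size() - start);
            results.addAll(fetchBatch.apply(ids.subList(start, end)));
            start = end;
        }
        return results;
    }

    public static void main(String[] args) {
        List<Long> partitionIds = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L);
        // The lambda echoes the ids back; a real caller would issue a request here.
        List<Long> fetched = fetchInBatches(partitionIds, 3, batch -> {
            System.out.println("requesting " + batch.size() + " partitions");
            return new ArrayList<>(batch);
        });
        System.out.println("fetched " + fetched.size() + " partitions in total");
    }
}
```

A limit on the partition count only bounds the response size indirectly, since the number of files per partition can vary widely.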



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
