[ https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated HIVE-22856:
------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when
> ArrowStreamReader returns a 0 length batch.
> ----------------------------------------------------------------------
>
>                 Key: HIVE-22856
>                 URL: https://issues.apache.org/jira/browse/HIVE-22856
>             Project: Hive
>          Issue Type: Bug
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22856.01.patch, HIVE-22856.02.patch
>
>
> LlapArrowBatchRecordReader returns false when the ArrowStreamReader's
> loadNextBatch returns a column vector of length 0. It should instead keep
> reading data until loadNextBatch returns false. Some batches may return a
> column vector of length 0; these should be ignored, and the reader should
> wait for the next batch.
> A batch size of 0 is possible when a split read by the ORC reader contains
> only deleted or aborted data. VectorizedOrcAcidRowBatchReader reads the data
> from the split info and then filters out the rows that are not visible to
> the read transaction, so it can happen that none of the records satisfy the
> filter. In that case VectorizedOrcAcidRowBatchReader sends a batch of size 0.
> For a batch of size 0, VectorFileSinkArrowOperator creates a batch that
> contains only metadata and sets the value count to 0. The client should
> ignore such a batch and wait for the next one.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
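
The behaviour described in the issue can be illustrated with a small, self-contained sketch. This is not the attached patch and does not use Hive or Arrow classes: BatchSource, ListBatchSource and EmptyBatchSkipSketch are hypothetical stand-ins, with loadNextBatch() and getRowCount() modelled loosely on the Arrow reader. It only contrasts a reader that stops at the first 0-length batch with one that keeps reading until loadNextBatch() returns false.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the Arrow stream reader (assumption, not Hive/Arrow code).
interface BatchSource {
    boolean loadNextBatch(); // false only when the stream is exhausted
    int getRowCount();       // row count of the batch that was just loaded
}

// Test double that replays a fixed list of batch sizes.
class ListBatchSource implements BatchSource {
    private final Iterator<Integer> sizes;
    private int current = 0;

    ListBatchSource(List<Integer> batchSizes) {
        this.sizes = batchSizes.iterator();
    }

    @Override
    public boolean loadNextBatch() {
        if (!sizes.hasNext()) {
            return false;
        }
        current = sizes.next();
        return true;
    }

    @Override
    public int getRowCount() {
        return current;
    }
}

class EmptyBatchSkipSketch {
    // Reported behaviour: a 0-length batch is mistaken for end of stream.
    static boolean nextStoppingOnEmpty(BatchSource in) {
        return in.loadNextBatch() && in.getRowCount() > 0;
    }

    // Intended behaviour: skip 0-length batches, stop only when
    // loadNextBatch() itself returns false.
    static boolean nextSkippingEmpty(BatchSource in) {
        while (in.loadNextBatch()) {
            if (in.getRowCount() > 0) {
                return true;
            }
        }
        return false;
    }

    static long totalRowsBuggy(List<Integer> batchSizes) {
        BatchSource in = new ListBatchSource(batchSizes);
        long total = 0;
        while (nextStoppingOnEmpty(in)) {
            total += in.getRowCount();
        }
        return total;
    }

    static long totalRowsFixed(List<Integer> batchSizes) {
        BatchSource in = new ListBatchSource(batchSizes);
        long total = 0;
        while (nextSkippingEmpty(in)) {
            total += in.getRowCount();
        }
        return total;
    }

    public static void main(String[] args) {
        // The middle batch models a split whose rows were all filtered out.
        List<Integer> sizes = Arrays.asList(3, 0, 2);
        System.out.println("stop on empty: " + totalRowsBuggy(sizes)); // 3
        System.out.println("skip empty:    " + totalRowsFixed(sizes)); // 5
    }
}
```

With batch sizes 3, 0, 2, the first variant reads 3 rows and silently drops the last batch, while the second reads all 5 rows.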