[jira] [Work logged] (HIVE-24504) VectorFileSinkArrowOperator does not serialize complex types correctly

ASF GitHub Bot (Jira) Wed, 09 Dec 2020 03:25:08 -0800


     [ https://issues.apache.org/jira/browse/HIVE-24504?focusedWorklogId=522185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522185 ]


ASF GitHub Bot logged work on HIVE-24504:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Dec/20 11:24
            Start Date: 09/Dec/20 11:24
    Worklog Time Spent: 10m 
      Work Description: pvary opened a new pull request #1758:
URL: https://github.com/apache/hive/pull/1758


   
   ### What changes were proposed in this pull request?
   Use an empty batch to generate the schema when the result is empty.
   
   ### Why are the changes needed?
   Clients expect the full schema even for empty results.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Unit and other tests.
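A minimal sketch of the idea behind the fix, assuming a simplified model: a column's schema is derived from its type alone, so a complex type gets its full tree of child fields even when no rows exist. The `HiveTypeToFields` class, its `Field` type and the `element` child name are hypothetical illustrations, not Hive or Apache Arrow APIs. The real Arrow layout differs in detail, and maps and parameterized types such as `decimal(10,2)` are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive a nested field tree from a Hive type string.
// Not the Hive or Apache Arrow API; maps and parameterized types are rejected.
final class HiveTypeToFields {
    // Simplified stand-in for a schema field: name, type tag, child fields.
    static final class Field {
        final String name;
        final String type;
        final List<Field> children;

        Field(String name, String type, List<Field> children) {
            this.name = name;
            this.type = type;
            this.children = children;
        }

        @Override
        public String toString() {
            return children.isEmpty() ? name + ":" + type : name + ":" + type + children;
        }
    }

    private final String s;
    private int pos = 0;

    private HiveTypeToFields(String hiveType) {
        this.s = hiveType.replace(" ", "");
    }

    // Builds the full field tree for one column from its type alone; no rows are needed.
    static Field toField(String columnName, String hiveType) {
        HiveTypeToFields parser = new HiveTypeToFields(hiveType);
        Field field = parser.parse(columnName);
        if (parser.pos != parser.s.length()) {
            throw new IllegalArgumentException("Unsupported or malformed type: " + hiveType);
        }
        return field;
    }

    // Recursive descent: complex types recurse into their children, primitives stop.
    private Field parse(String name) {
        String base = readIdent();
        List<Field> children = new ArrayList<>();
        if (base.equals("array")) {
            expect('<');
            children.add(parse("element"));
            expect('>');
            return new Field(name, "list", children);
        }
        if (base.equals("struct")) {
            expect('<');
            do {
                String fieldName = readIdent();
                expect(':');
                children.add(parse(fieldName));
            } while (accept(','));
            expect('>');
            return new Field(name, "struct", children);
        }
        return new Field(name, base, children); // primitive: no children
    }

    private String readIdent() {
        int start = pos;
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a name at offset " + start + " in: " + s);
        }
        return s.substring(start, pos);
    }

    private boolean accept(char c) {
        if (pos < s.length() && s.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos + " in: " + s);
        }
    }
}
```

For example, `HiveTypeToFields.toField("c", "array<struct<a:int,b:string>>")` yields `c:list[element:struct[a:int, b:string]]`: the list carries its element child and the struct carries both of its fields, which is what a reader needs before it can interpret any batch, empty or not.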




Issue Time Tracking
-------------------

            Worklog Id:     (was: 522185)
    Remaining Estimate: 0h
            Time Spent: 10m

> VectorFileSinkArrowOperator does not serialize complex types correctly
> ----------------------------------------------------------------------
>
>                 Key: HIVE-24504
>                 URL: https://issues.apache.org/jira/browse/HIVE-24504
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the table has complex types and the result has 0 records, the
> VectorFileSinkArrowOperator serializes only the primitive types correctly.
> For complex types, only the top-level type is set, which causes problems for
> clients trying to read the data.
> The following HWC (Hive Warehouse Connector) exception was observed:
> {code:java}
> Previous exception in task: Unsupported data type: Null
>       org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowType(ArrowUtils.scala:71)
>       org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:106)
>       org.apache.spark.sql.execution.arrow.ArrowUtils$.fromArrowField(ArrowUtils.scala:98)
>       org.apache.spark.sql.execution.arrow.ArrowUtils.fromArrowField(ArrowUtils.scala)
>       org.apache.spark.sql.vectorized.ArrowColumnVector.<init>(ArrowColumnVector.java:135)
>       com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:105)
>       com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReader.get(HiveWarehouseDataReader.java:29)
>       org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.next(DataSourceRDD.scala:59)
>       org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
>       org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.datasourcev2scan_nextBatch_0$(Unknown Source)
>       org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>       org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>       org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>       org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>       org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>       org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>       org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>       org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>       org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>       org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>       org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>       org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>       org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>       org.apache.spark.scheduler.Task.run(Task.scala:109)
>       org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       java.lang.Thread.run(Thread.java:745)
>       at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:139)
>       at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117)
>       at org.apache.spark.scheduler.Task.run(Task.scala:119)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
