[ https://issues.apache.org/jira/browse/HIVE-21861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881595#comment-16881595 ]
Rajkumar Singh commented on HIVE-21861:
---------------------------------------

As far as I understand, getting a LazyString during assignRow is expected. The code flow looks like this:

KafkaSerDe delegates to LazySimpleSerDe:
https://github.com/apache/hive/blob/eba668eed6fecce6bae87fb77ca056b8e34ad5e2/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java#L102

LazySimpleSerDe initializes
https://github.com/apache/hive/blob/8a606abdec0f92d60653d892b2f92ff729f1c020/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java#L119
and sets the object inspector to LazySimpleStructObjectInspector<org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector>.

During deserialization, the lazy SerDe deserializes the Text writable and returns the row as a LazyStruct:
https://github.com/apache/hive/blob/eba668eed6fecce6bae87fb77ca056b8e34ad5e2/kafka-handler/src/java/org/apache/hadoop/hive/kafka/VectorizedKafkaRecordReader.java#L157
https://github.com/apache/hive/blob/eba668eed6fecce6bae87fb77ca056b8e34ad5e2/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java#L206

Since delegateDeserializerOI is a LazyString object inspector, the following piece of code returns the LazyString:
https://github.com/apache/hive/blob/eba668eed6fecce6bae87fb77ca056b8e34ad5e2/kafka-handler/src/java/org/apache/hadoop/hive/kafka/KafkaSerDe.java#L210

> ClassCastException during CTAS over external table using KafkaStorageHandler
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-21861
>                 URL: https://issues.apache.org/jira/browse/HIVE-21861
>             Project: Hive
>          Issue Type: Bug
>          Components: kafka integration
>   Affects Versions: 4.0.0
>            Reporter: Justin Leet
>            Assignee: Rajkumar Singh
>            Priority: Major
>         Attachments: HIVE-21861.patch
>
>
> To reproduce, create a table similar to the following:
> {code}
> CREATE EXTERNAL TABLE <table>
> (raw_value STRING)
> ROW FORMAT DELIMITED
> LINES TERMINATED BY '\n'
> STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
> TBLPROPERTIES(
> "kafka.topic"="<kafka_topic>",
> "kafka.bootstrap.servers"="<bootstrap_servers>",
> "kafka.consumer.security.protocol"="PLAINTEXT",
> "kafka.serde.class"="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe");
> {code}
> Note the SerDe isn't the default SerDe. Additionally, this error occurs when
> vectorization is enabled.
> Basic queries work fine:
> {code}
> SELECT * FROM <table> LIMIT 1;
> {code}
> Doing a CTAS to bring it into a managed table fails:
> {code}
> CREATE TABLE <managed_table> AS
> SELECT * FROM <table>;
> {code}
> The exception is:
> {code}
> Caused by: java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.lazy.LazyString cannot be cast to
> org.apache.hadoop.io.Text
>   at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:471)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorAssignRow.assignRowColumn(VectorAssignRow.java:350)
>   at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.readNextBatch(VectorizedKafkaRecordReader.java:159)
>   at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.next(VectorizedKafkaRecordReader.java:113)
>   at org.apache.hadoop.hive.kafka.VectorizedKafkaRecordReader.next(VectorizedKafkaRecordReader.java:47)
>   at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
>   ... 24 more
> {code}
> A workaround to this is to disable vectorization via:
> {code}
> set hive.vectorized.execution.enabled = false;
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
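
For illustration only, here is a minimal stand-alone sketch of the mechanism described in the comment: a field value that is a LazyString wrapper fails when cast directly to Text, while unwrapping it through its object inspector yields the Text. Everything in it is an assumption made for the sketch. The nested `Text`, `LazyString` and `StringInspector` types are simplified stand-ins, not the Hive or Hadoop classes; the method names `getWritableObject` and `getPrimitiveWritableObject` are modeled on Hive's `LazyPrimitive` and `PrimitiveObjectInspector`; `assignByCast` and `assignViaInspector` are hypothetical helpers. It does not reproduce the attached patch.

```java
public class LazyStringCastSketch {

    // Simplified stand-in for org.apache.hadoop.io.Text (assumption, not the real class).
    static final class Text {
        private final String value;
        Text(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    // Simplified stand-in for Hive's LazyString: it wraps a Text but is not itself a Text.
    static final class LazyString {
        private final Text data;
        LazyString(Text data) { this.data = data; }
        Text getWritableObject() { return data; }
    }

    // Simplified stand-in for a string object inspector that knows how to unwrap its objects.
    interface StringInspector {
        Text getPrimitiveWritableObject(Object o);
    }

    // Inspector for lazy strings: unwraps the LazyString to reach the underlying Text.
    static final StringInspector LAZY_STRING_OI =
        o -> o == null ? null : ((LazyString) o).getWritableObject();

    // Hypothetical helper mirroring the failing pattern: a direct cast to Text.
    static Text assignByCast(Object field) {
        return (Text) field;
    }

    // Hypothetical helper: let the inspector that produced the object unwrap it.
    static Text assignViaInspector(Object field, StringInspector oi) {
        return oi.getPrimitiveWritableObject(field);
    }

    public static void main(String[] args) {
        Object field = new LazyString(new Text("hello"));
        try {
            assignByCast(field);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed: " + e.getClass().getSimpleName());
        }
        System.out.println("via inspector: " + assignViaInspector(field, LAZY_STRING_OI));
    }
}
```

The point of the sketch is only that the object handed to the vectorized assign step carries the type dictated by the delegate SerDe's object inspector, so code that assumes a plain Text needs to go through that inspector or otherwise unwrap the value first.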