[I] [Bug] [spark] The op field conflicts with the internal field of the Seatunnel Spark engine and cannot be read from the data source. [seatunnel]

via GitHub Tue, 13 Aug 2024 06:50:18 -0700


lizc9 opened a new issue, #7392:
URL: https://github.com/apache/seatunnel/issues/7392


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22)
 and found no similar issues.
   
   
   ### What happened
   
   When I read the op field from json, I get the following error:
   `java.lang.ClassCastException: java.lang.Byte cannot be cast to 
org.apache.spark.unsafe.types.UTF8String`
   This is because of the following code：
   
org.apache.seatunnel.translation.spark.serialization.InternalRowConverter#convert(org.apache.seatunnel.api.table.type.SeaTunnelRow,
 org.apache.seatunnel.api.table.type.SeaTunnelRowType)
   
   
   
   ### SeaTunnel Version
   
   2.3.6
   
   ### SeaTunnel Config
   
   ```conf
   # Defining the runtime environment
   env {
     parallelism = 1
     job.mode = "BATCH"
   }
   source {
     Kafka {
       topic = "test"
       bootstrap.servers = "localhost:9092"
       consumer.group = "seatunnel"
       format = "json"
       kafka.config = {
         max.poll.records = 1000
         auto.offset.reset = "earliest"
         enable.auto.commit = "false"
       },
       schema = {
           columns = [
               {
                 name = before
                 type = string
                 nullable = true
               },
               {
                 name = after
                 type = string
                 nullable = true
               },
               {
                 name = source
                 type = string
                 nullable = true
               },
               {
                 name = op
                 type = string
                 nullable = true
               },
               {
                 name = ts_ms
                 type = bigint
                 nullable = true
               }
           ]
       }
     }
   }
   
   
   transform {
   }
   
   sink {
     Console {
     }
   }
   ```
   
   
   ### Running Command
   
   ```shell
   ./bin/start-seatunnel-spark-3-connector-v2.sh --master "local[4]" 
--deploy-mode client --config ./config/batch.conf
   ```
   
   
   ### Error Exception
   
   ```log
   Caused by: org.apache.spark.SparkException: Job aborted due to stage 
failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 
in stage 0.0 (TID 0) (local executor driver): java.lang.ClassCastException: 
java.lang.Byte cannot be cast to org.apache.spark.unsafe.types.UTF8String
        at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
        at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
        at 
org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.getUTF8String(SpecificInternalRow.scala:193)
        at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
        at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at 
scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:32)
        at 
org.apache.seatunnel.core.starter.spark.execution.TransformExecuteProcessor$TransformIterator.hasNext(TransformExecuteProcessor.java:174)
        at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown
 Source)
        at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
        at 
org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$1(WriteToDataSourceV2Exec.scala:435)
        at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
        at 
org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:480)
        at 
org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:381)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   ```
   
   
   ### Zeta or Flink or Spark Version
   
   Spark: 3.3
   
   ### Java or Scala Version
   
   java 1.8
   
   ### Screenshots
   
   
![image](https://github.com/user-attachments/assets/248cc2fa-8c3f-4aed-a5a9-9aafa61cf7c6)
   
![image](https://github.com/user-attachments/assets/cd5e2928-34e0-449c-ac6e-a56fb668fbe4)
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@seatunnel.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[I] [Bug] [spark] The op field conflicts with the internal field of the Seatunnel Spark engine and cannot be read from the data source. [seatunnel]

Reply via email to