ShreyeshArangath commented on issue #2029: URL: https://github.com/apache/datafusion-comet/issues/2029#issuecomment-3075918084
Update: Looking at the executor logs, it seems likely this is because the reader is not treating HDFS as the data source. The file reader is being auto-configured with S3-related configs?

```
... 36 more
25/07/15 22:09:40 INFO task 0.0 in stage 79.0 (TID 7530) ReadOptions: File reader auto configured 'fs.s3a.connection.maximum=256'
25/07/15 22:09:40 INFO task 0.0 in stage 79.0 (TID 7530) ReadOptions: File reader auto configured 'fs.s3a.readahead.range=1048576'
25/07/15 22:09:40 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 7568
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) Executor: Running task 52.0 in stage 79.0 (TID 7568)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) CometExecIterator: Calculated per-task memory limit of 0 (0 * 1.0 / 8.0)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) CometExecIterator: Calculated per-task memory limit of 0 (0 * 1.0 / 8.0)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) FileScanRDD: Reading File path: hdfs://cluster/jobs/x/y/tpcds-unpartitioned/tpcds-1000/store_returns/part-00017-85e008a2-66d5-4ca4-a957-da1b024bc0ae-c000.snappy.parquet, range: 45493820-90987640, partition values: [empty row]
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) ReadOptions: File reader auto configured 'fs.s3a.connection.maximum=256'
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) ReadOptions: File reader auto configured 'fs.s3a.readahead.range=1048576'
25/07/15 22:09:40 ERROR task 345.0 in stage 78.0 (TID 7413) Executor: Exception in task 345.0 in stage 78.0 (TID 7413)
org.apache.spark.SparkException: Parquet column cannot be converted in file hdfs://cluster/jobs/x/y/tpcds-unpartitioned/tpcds-1000/store_returns/part-00035-85e008a2-66d5-4ca4-a957-da1b024bc0ae-c000.snappy.parquet. Column: [sr_return_amt], Expected: decimal(7,2), Found: DOUBLE.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:855)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:302)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:142)
  at org.apache.spark.sql.comet.CometScanExec$$anon$1.hasNext(CometScanExec.scala:268)
```
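For what it's worth, the `Parquet column cannot be converted ... Expected: decimal(7,2), Found: DOUBLE` part can be reproduced with plain Spark, independent of Comet or the S3A config lines. Here is a minimal sketch (not taken from this issue; the `/tmp/store_returns_double` path and the use of `spark` from a spark-shell session are assumptions) that writes the column with a DOUBLE physical type and then reads it back with a `decimal(7,2)` schema:

```scala
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

// Write the column with a DOUBLE physical type, mirroring what the
// failing Parquet file appears to contain for sr_return_amt.
// (Hypothetical local path, only for illustration.)
spark.range(10)
  .selectExpr("cast(id as double) AS sr_return_amt")
  .write.mode("overwrite").parquet("/tmp/store_returns_double")

// Read it back while declaring the column as decimal(7,2), as the
// table schema in the failing query does. The Parquet reader refuses
// the DOUBLE -> decimal conversion and the scan fails with
// "Parquet column cannot be converted in file ...".
val expected = StructType(Seq(StructField("sr_return_amt", DecimalType(7, 2))))
spark.read.schema(expected).parquet("/tmp/store_returns_double").collect()
```

Running the inverse check against one of the files named in the log (reading it without an explicit schema and inspecting the inferred type of `sr_return_amt`) should confirm whether those files were actually written as DOUBLE rather than decimal, i.e. whether this is a data/schema mismatch rather than a filesystem issue.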