pyh8023 opened a new issue, #10587:
URL: https://github.com/apache/gravitino/issues/10587

   ### Version
   
   main branch
   
   ### Describe what's wrong
   
   When using the Table Maintenance Service (Optimizer) and triggering a rewrite 
submission via `submit-strategy-jobs`, the following error occurs: Cannot 
translate Spark expression: (isnotnull(event_time#19) AND 
(day(cast(event_time#19 as date)) = 2005)) to data source filter.
   
   ### Error message and/or stacktrace
   
   The log information in output.log is as follows:
   Applied custom Spark configurations: {spark.master=local[2], 
spark.hadoop.fs.defaultFS=hdfs://196.105.0.49:9000}
   Executing Iceberg rewrite_data_files procedure: CALL 
iceberg.system.rewrite_data_files(table => 'db.user_events', strategy => 
'binpack', where => '(day(event_time) = 2026-03-18)')
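   
   A note on the generated filter (editor's reading, not confirmed): the `where` 
clause above passes the date literal `2026-03-18` without quotes, so Spark 
appears to constant-fold it as integer subtraction before the expression ever 
reaches the Iceberg filter converter, which is consistent with the literal 
`2005` in the error below:
   
   ```python
   # Illustration only: an unquoted 2026-03-18 is valid arithmetic,
   # not a date literal, and folds to the constant seen in the error.
   value = 2026 - 3 - 18
   print(value)  # 2005, matching "= 2005" in the stack trace
   ```
   
   Quoting the literal in the generated predicate (e.g. `date '2026-03-18'`) 
would presumably avoid this folding, though the `day(...)` function call itself 
may still need to be rewritten for SparkExpressionConverter to accept it.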
   
   The log information in error.log is as follows:
   26/03/30 16:30:30 INFO BaseMetastoreCatalog: Table loaded by catalog: 
iceberg.db.user_events
   Error executing rewrite data files job: Cannot translate Spark expression: 
(isnotnull(event_time#19) AND (day(cast(event_time#19 as date)) = 2005)) to 
data source filter
   java.lang.IllegalArgumentException: Cannot translate Spark expression: 
(isnotnull(event_time#19) AND (day(cast(event_time#19 as date)) = 2005)) to 
data source filter
           at 
org.apache.spark.sql.execution.datasources.SparkExpressionConverter$.convertToIcebergExpression(SparkExpressionConverter.scala:48)
           at 
org.apache.spark.sql.execution.datasources.SparkExpressionConverter.convertToIcebergExpression(SparkExpressionConverter.scala)
           at 
org.apache.iceberg.spark.procedures.BaseProcedure.filterExpression(BaseProcedure.java:171)
           at 
org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.checkAndApplyFilter(RewriteDataFilesProcedure.java:131)
           at 
org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.lambda$call$0(RewriteDataFilesProcedure.java:120)
           at 
org.apache.iceberg.spark.procedures.BaseProcedure.execute(BaseProcedure.java:107)
           at 
org.apache.iceberg.spark.procedures.BaseProcedure.modifyIcebergTable(BaseProcedure.java:88)
           at 
org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.call(RewriteDataFilesProcedure.java:111)
           at 
org.apache.spark.sql.execution.datasources.v2.CallExec.run(CallExec.scala:34)
           at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
           at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
           at 
org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
           at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
           at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
           at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
           at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
           at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
           at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
           at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
           at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
           at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
           at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
           at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
           at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
           at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
           at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
           at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
           at 
org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
           at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
           at 
org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
           at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
           at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
           at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
           at 
org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
           at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
           at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
           at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
           at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
           at 
org.apache.gravitino.maintenance.jobs.iceberg.IcebergRewriteDataFilesJob.main(IcebergRewriteDataFilesJob.java:173)
           at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
           at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
           at java.base/java.lang.reflect.Method.invoke(Method.java:580)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   
   
   ### How to reproduce
   
   When executing the following command, an error occurs:
   ./bin/gravitino-optimizer.sh \
     --type submit-strategy-jobs \
     --identifiers iceberg.db.user_events \
     --strategy-name iceberg_compaction_default \
     --limit 10 \
     --conf-path ./conf/gravitino-optimizer-submit.conf
   
   The table structure is as follows:
   CREATE TABLE IF NOT EXISTS user_events (
       event_id BIGINT COMMENT 'Unique event ID',
       user_id INT COMMENT 'User ID',
       event_type STRING COMMENT 'Type of event',
       event_time TIMESTAMP COMMENT 'Event timestamp',
       event_data STRING COMMENT 'Detailed event data'
   )
   USING iceberg
   PARTITIONED BY (days(event_time))
   TBLPROPERTIES (
       'format-version' = '2',
       'write.upsert.enabled' = 'true'
   )
   
   
   
   ### Additional context
   
   _No response_

