pyh8023 opened a new issue, #10587:
URL: https://github.com/apache/gravitino/issues/10587
### Version
main branch
### Describe what's wrong
When using the Table Maintenance Service (Optimizer) to submit a rewrite job
with submit-strategy-jobs, the job fails with the following error: Cannot
translate Spark expression: (isnotnull(event_time#19) AND
(day(cast(event_time#19 as date)) = 2005)) to data source filter.
### Error message and/or stacktrace
The log information in output.log is as follows:
Applied custom Spark configurations: {spark.master=local[2], spark.hadoop.fs.defaultFS=hdfs://196.105.0.49:9000}
Executing Iceberg rewrite_data_files procedure: CALL iceberg.system.rewrite_data_files(table => 'db.user_events', strategy => 'binpack', where => '(day(event_time) = 2026-03-18)')
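A possible explanation, offered as a guess rather than a confirmed diagnosis: in the generated where clause above the date is not quoted, so Spark would read 2026-03-18 as integer subtraction, which matches the "= 2005" in the failing expression. In addition, day(...) in Spark SQL is the day-of-month function rather than Iceberg's days partition transform, which would explain why the predicate cannot be translated to a data source filter. The sketch below shows the arithmetic and one alternative rendering of a single-day filter as a plain timestamp range. The helper name day_range_filter is hypothetical and not part of Gravitino, and whether the Optimizer should emit this form is an assumption.

```python
from datetime import date, timedelta


def day_range_filter(column: str, day: date) -> str:
    """Render a half-open timestamp range covering one calendar day.

    Hypothetical helper for illustration only; it is not Gravitino code.
    """
    next_day = day + timedelta(days=1)
    return (
        f"{column} >= TIMESTAMP '{day.isoformat()} 00:00:00' "
        f"AND {column} < TIMESTAMP '{next_day.isoformat()} 00:00:00'"
    )


# Without quotes, 2026-03-18 is plain integer arithmetic: 2026 - 3 - 18.
unquoted_value = 2026 - 3 - 18

# A range predicate on the source column, with the date kept as a literal.
where_clause = day_range_filter("event_time", date(2026, 3, 18))

print(unquoted_value)
print(where_clause)
```

If a clause like this were passed through the where argument of rewrite_data_files, the inner single quotes would need escaping, since the argument is itself a SQL string literal.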
The log information in error.log is as follows:
26/03/30 16:30:30 INFO BaseMetastoreCatalog: Table loaded by catalog: iceberg.db.user_events
Error executing rewrite data files job: Cannot translate Spark expression: (isnotnull(event_time#19) AND (day(cast(event_time#19 as date)) = 2005)) to data source filter
java.lang.IllegalArgumentException: Cannot translate Spark expression: (isnotnull(event_time#19) AND (day(cast(event_time#19 as date)) = 2005)) to data source filter
at org.apache.spark.sql.execution.datasources.SparkExpressionConverter$.convertToIcebergExpression(SparkExpressionConverter.scala:48)
at org.apache.spark.sql.execution.datasources.SparkExpressionConverter.convertToIcebergExpression(SparkExpressionConverter.scala)
at org.apache.iceberg.spark.procedures.BaseProcedure.filterExpression(BaseProcedure.java:171)
at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.checkAndApplyFilter(RewriteDataFilesProcedure.java:131)
at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.lambda$call$0(RewriteDataFilesProcedure.java:120)
at org.apache.iceberg.spark.procedures.BaseProcedure.execute(BaseProcedure.java:107)
at org.apache.iceberg.spark.procedures.BaseProcedure.modifyIcebergTable(BaseProcedure.java:88)
at org.apache.iceberg.spark.procedures.RewriteDataFilesProcedure.call(RewriteDataFilesProcedure.java:111)
at org.apache.spark.sql.execution.datasources.v2.CallExec.run(CallExec.scala:34)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$4(SparkSession.scala:691)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:682)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:713)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:744)
at org.apache.gravitino.maintenance.jobs.iceberg.IcebergRewriteDataFilesJob.main(IcebergRewriteDataFilesJob.java:173)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
### How to reproduce
When executing the following command, an error occurs:
./bin/gravitino-optimizer.sh \
  --type submit-strategy-jobs \
  --identifiers iceberg.db.user_events \
  --strategy-name iceberg_compaction_default \
  --limit 10 \
  --conf-path ./conf/gravitino-optimizer-submit.conf
The table structure is as follows:
CREATE TABLE IF NOT EXISTS user_events (
  event_id BIGINT COMMENT 'Unique event ID',
  user_id INT COMMENT 'User ID',
  event_type STRING COMMENT 'Type of event',
  event_time TIMESTAMP COMMENT 'Event timestamp',
  event_data STRING COMMENT 'Detailed event data'
)
USING iceberg
PARTITIONED BY (days(event_time))
TBLPROPERTIES (
  'format-version' = '2',
  'write.upsert.enabled' = 'true'
)
### Additional context
_No response_