Hi all,

I am working with Iceberg 0.11.1 and Spark 3.0.1. When I run removeOrphanFiles (via either the Actions or the SparkActions API), it works with the Hadoop catalog when run locally, but I get the exception below when running on EMR with the Glue catalog. Could you please help me with what I am missing here?
Code snippet:

    Actions.forTable(table).removeOrphanFiles().olderThan(removeOrphanFilesOlderThan).execute();

or

    SparkActions.get().deleteOrphanFiles(table).olderThan(removeOrphanFilesOlderThan).execute();

Exception (when run on EMR):

    21/08/19 08:12:56 INFO RemoveOrphanFilesMaintenanceJob: Running RemoveOrphanFilesMaintenanceJob - removeOrphanFilesOlderThanTimestamp, Status:Started, tenant: 1, table:raghu3.cars, removeOrphanFilesOlderThan: {1629360476572}.
    21/08/19 08:12:56 ERROR RemoveOrphanFilesMaintenanceJob: Error in RemoveOrphanFilesMaintenanceJob - removeOrphanFilesOlderThanTimestamp, Illegal Arguments in table properties - Can't parse null value from table properties, tenant: tenantId1, table: raghu3.cars, removeOrphanFilesOlderThan: 1629360476572, Status: Failed, Reason: {}.
    java.lang.IllegalArgumentException: Cannot find the metadata table for glue_catalog.raghu3.cars of type ALL_MANIFESTS
        at org.apache.iceberg.spark.actions.BaseSparkAction.loadMetadataTable(BaseSparkAction.java:191)
        at org.apache.iceberg.spark.actions.BaseSparkAction.buildValidDataFileDF(BaseSparkAction.java:121)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.doExecute(BaseDeleteOrphanFilesSparkAction.java:154)
        at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:101)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:141)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:76)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFilesOlderThanTimestamp(RemoveOrphanFilesMaintenanceJob.java:274)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFiles(RemoveOrphanFilesMaintenanceJob.java:133)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.maintain(RemoveOrphanFilesMaintenanceJob.java:58)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.run(LakeHouseTableMaintenanceJob.java:117)
        at com.salesforce.cdp.spark.core.job.SparkJob.submitAndRun(SparkJob.java:76)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.main(LakeHouseTableMaintenanceJob.java:247)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)

The table does exist in the Glue catalog.

Did anyone face this? What is the fix? Is it a bug, or am I missing something here?

Thanks,
Raghu
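For context, this is roughly how the Glue catalog is registered with Spark in this kind of setup. A minimal sketch: the catalog name glue_catalog matches the one in the error message, but the warehouse bucket, jar name, and region are illustrative placeholders, not my actual values.

```shell
# Illustrative spark-submit configuration for an Iceberg SparkCatalog backed
# by AWS Glue. Bucket and jar names below are placeholders.
spark-submit \
  --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
  --conf spark.sql.catalog.glue_catalog.warehouse=s3://my-bucket/warehouse \
  --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  table-maintenance-job.jar
```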