Hi Guys,

I am working with Iceberg 0.11.1 and Spark 3.0.1. When I run removeOrphanFiles, either through the Actions or the SparkActions class, it works with the Hadoop catalog when run locally, but I hit the exception below when running on EMR with the Glue catalog. Could you please help me figure out what I am missing here?

Code snippet:

Actions.forTable(table).removeOrphanFiles().olderThan(removeOrphanFilesOlderThan).execute();

or

SparkActions.get().deleteOrphanFiles(table).olderThan(removeOrphanFilesOlderThan).execute();
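
For reference, the table is loaded through the Glue catalog roughly like this (the warehouse path and property values below are placeholders, not our exact configuration):

import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.Table;
import org.apache.iceberg.aws.glue.GlueCatalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.spark.actions.SparkActions;

// Initialize the Glue catalog (property values are placeholders).
Map<String, String> properties = new HashMap<>();
properties.put(CatalogProperties.WAREHOUSE_LOCATION, "s3://my-bucket/warehouse");
properties.put(CatalogProperties.FILE_IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO");

GlueCatalog glueCatalog = new GlueCatalog();
glueCatalog.initialize("glue_catalog", properties);

// Load the table and run the orphan-file cleanup against it.
Table table = glueCatalog.loadTable(TableIdentifier.of("raghu3", "cars"));
SparkActions.get()
    .deleteOrphanFiles(table)
    .olderThan(removeOrphanFilesOlderThan)
    .execute();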

Exception (when run on EMR):

21/08/19 08:12:56 INFO RemoveOrphanFilesMaintenanceJob: Running
RemoveOrphanFilesMaintenanceJob - removeOrphanFilesOlderThanTimestamp,
Status:Started, tenant: 1, table:raghu3.cars,
removeOrphanFilesOlderThan: {1629360476572}.

21/08/19 08:12:56 ERROR RemoveOrphanFilesMaintenanceJob: Error in
RemoveOrphanFilesMaintenanceJob - removeOrphanFilesOlderThanTimestamp,
Illegal Arguments in table properties - Can't parse null value from
table properties, tenant: tenantId1, table: raghu3.cars,
removeOrphanFilesOlderThan: 1629360476572, Status: Failed, Reason: {}.

java.lang.IllegalArgumentException: Cannot find the metadata table for glue_catalog.raghu3.cars of type ALL_MANIFESTS
        at org.apache.iceberg.spark.actions.BaseSparkAction.loadMetadataTable(BaseSparkAction.java:191)
        at org.apache.iceberg.spark.actions.BaseSparkAction.buildValidDataFileDF(BaseSparkAction.java:121)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.doExecute(BaseDeleteOrphanFilesSparkAction.java:154)
        at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:101)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:141)
        at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.execute(BaseDeleteOrphanFilesSparkAction.java:76)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFilesOlderThanTimestamp(RemoveOrphanFilesMaintenanceJob.java:274)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.removeOrphanFiles(RemoveOrphanFilesMaintenanceJob.java:133)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.job.RemoveOrphanFilesMaintenanceJob.maintain(RemoveOrphanFilesMaintenanceJob.java:58)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.run(LakeHouseTableMaintenanceJob.java:117)
        at com.salesforce.cdp.spark.core.job.SparkJob.submitAndRun(SparkJob.java:76)
        at com.salesforce.cdp.lakehouse.spark.tablemaintenance.LakeHouseTableMaintenanceJob.main(LakeHouseTableMaintenanceJob.java:247)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:735)


The table does exist:

[image: image.png]

Has anyone faced this? What is the fix? Is it a bug, or am I missing
something here?

Thanks,
Raghu
