Gaurav,

Is your data partitioned by date? If so, you can compact subsets of partitions at a time. To do this using the Spark procedure, you pass a where clause restricted to the partitions you want to rewrite.
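For example, assuming a date partition column named event_date (catalog_name, db.tbl, and event_date are placeholders; substitute your actual catalog, table, and column names), limiting one run to a single month would look roughly like this:

    spark.sql(
        "CALL catalog_name.system.rewrite_data_files("
            + "table => 'db.tbl', "
            + "where => \"event_date >= DATE '2023-04-01' AND event_date < DATE '2023-05-01'\")");

You can then repeat the call for each date range, one rewrite job per range, instead of compacting five months of data in a single job.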
spark.sql("CALL catalog_name.system.rewrite_data_files(table => '...', where => '...')") If you use the RewriteDataFilesSparkAction, you call filter(Expression), but then you have to pass in your where clause as an Iceberg Expression. You can use https://github.com/apache/iceberg/blob/apache-iceberg-1.2.1/spark/v3.3/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkExpressionConverter.scala as shown in https://github.com/apache/iceberg/blob/apache-iceberg-1.2.1/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java#L133-L135 . - Wing Yew On Tue, May 23, 2023 at 10:13 PM Gaurav Agarwal <gaurav130...@gmail.com> wrote: > > On Wed, May 24, 2023, 10:41 AM Gaurav Agarwal <gaurav130...@gmail.com> > wrote: > >> I have one more query we are trying to compact files currently it is >> taking time as have never compacted till now this is the first time we are >> trying to perform compaction after 5 months of continuously loading data >> We change the format of the table from 1 to 2 also in bwtween >> The issue is we are sparkrewriteaction Java API to perform the collate >> but it is taking 24 hours for us to complete the job will there be a way in >> that api that i can pass date range options are there but what parameters >> should i pass there to make it date range >> >> Thanks >> >