Thank you, Wing Yew.

On Wed, May 24, 2023, 11:19 PM Wing Yew Poon <wyp...@cloudera.com.invalid> wrote:
> Gaurav,
>
> Is your data partitioned by date? If so, you can compact subsets of
> partitions at a time. To do this using the Spark procedure, you pass a
> where clause:
>
> spark.sql("CALL catalog_name.system.rewrite_data_files(table => '...',
> where => '...')")
>
> If you use the RewriteDataFilesSparkAction, you call filter(Expression),
> but then you have to pass in your where clause as an Iceberg Expression.
> You can use
> https://github.com/apache/iceberg/blob/apache-iceberg-1.2.1/spark/v3.3/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkExpressionConverter.scala
> as shown in
> https://github.com/apache/iceberg/blob/apache-iceberg-1.2.1/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java#L133-L135
>
> - Wing Yew
>
>
> On Tue, May 23, 2023 at 10:13 PM Gaurav Agarwal <gaurav130...@gmail.com>
> wrote:
>
>> On Wed, May 24, 2023, 10:41 AM Gaurav Agarwal <gaurav130...@gmail.com>
>> wrote:
>>
>>> I have one more query. We are trying to compact files, and it is
>>> taking a long time because we have never compacted until now; this is
>>> the first time we are performing compaction after 5 months of
>>> continuously loading data. We also changed the table's format version
>>> from 1 to 2 in between.
>>> The issue is that we are using the RewriteDataFilesSparkAction Java API
>>> to perform the compaction, but the job is taking 24 hours to complete.
>>> Is there a way to pass a date range in that API? Options are available,
>>> but what parameters should I pass to restrict the rewrite to a date
>>> range?
>>>
>>> Thanks