Thank you, Wing Yew.

On Wed, May 24, 2023, 11:19 PM Wing Yew Poon <wyp...@cloudera.com.invalid> wrote:
> Gaurav,
>
> Is your data partitioned by date? If so, you can compact subsets of
> partitions at a time. To do this using the Spark procedure, you pass a
> where clause:
>
> spark.sql("CALL catalog_name.system.rewrite_data_files(table => '...',
> where => '...')")
>
> If you use the RewriteDataFilesSparkAction, you call filter(Expression),
> but then you have to pass in your where clause as an Iceberg Expression.
> You can use
> https://github.com/apache/iceberg/blob/apache-iceberg-1.2.1/spark/v3.3/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkExpressionConverter.scala
> as shown in
> https://github.com/apache/iceberg/blob/apache-iceberg-1.2.1/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java#L133-L135
>
> - Wing Yew
>
>
> On Tue, May 23, 2023 at 10:13 PM Gaurav Agarwal <gaurav130...@gmail.com>
> wrote:
>
>> On Wed, May 24, 2023, 10:41 AM Gaurav Agarwal <gaurav130...@gmail.com>
>> wrote:
>>
>>> I have one more query. We are trying to compact files, and it is
>>> taking a long time because we have never compacted until now; this is
>>> the first time we are performing compaction after 5 months of
>>> continuously loading data. We also changed the table's format version
>>> from 1 to 2 in between.
>>> The issue is that we are using the RewriteDataFilesSparkAction Java API
>>> to perform the compaction, but the job is taking 24 hours to complete.
>>> Is there a way to pass a date range in that API? Options are available,
>>> but what parameters should I pass to restrict the rewrite to a date
>>> range?
>>>
>>> Thanks