Yeah, it does seem like we may have more use cases for this. The more Péter and
I discuss this, the more I think it makes sense to add it in.
On Mon, May 22, 2023 at 8:24 AM Péter Váry wrote:
> The feature could be useful for Spark as well. See:
> https://github.com/apache/iceberg/pull/7636#pullrequestreview-1434981224
The feature could be useful for Spark as well. See:
https://github.com/apache/iceberg/pull/7636#pullrequestreview-1434981224
Maybe we should add this as a topic for the next Iceberg Community Sync.
Also, when trying out possible solutions, I have found that some of the
statistics are modifiable. I …
The proposal here is essentially column stats projection pushdown. For some
Flink jobs with watermark alignment, the Flink source is only interested in
the column stats (min-max) for one timestamp column. Hence the column stats
projection can really help reduce the memory footprint for wide tables (with
hu…
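To make the projection idea concrete, here is a minimal illustrative sketch (not Iceberg API; `project` and the `Long`-valued bounds map are simplifications of per-file column bounds keyed by field id). It shows how keeping only the one field id a reader cares about drops every other column's bounds, which is where the memory savings for wide tables would come from:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: prune per-file column stats down to the single
// field id a reader needs, e.g. the event-time column used for Flink
// watermark alignment. Not Iceberg API.
public class StatsProjection {

    // Keep only the requested field id; bounds for all other columns are dropped.
    static Map<Integer, Long> project(Map<Integer, Long> boundsByFieldId, int keepFieldId) {
        Map<Integer, Long> projected = new HashMap<>();
        Long bound = boundsByFieldId.get(keepFieldId);
        if (bound != null) {
            projected.put(keepFieldId, bound);
        }
        return projected;
    }

    public static void main(String[] args) {
        Map<Integer, Long> lowerBounds = new HashMap<>();
        lowerBounds.put(1, 100L);  // field id 1: event-time column
        lowerBounds.put(2, 7L);    // field id 2: some other wide-table column
        System.out.println(project(lowerBounds, 1));  // only field id 1 survives
    }
}
```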
Thanks Ryan, Russell,
Let me explain the situation a bit further.
We have time series data written to an Iceberg table, then there is a Flink
job which uses this Iceberg table as a source to read the incoming data
continuously.
*Downstream job -> Iceberg table -> Flink job*
The Flink job …
Thanks Ryan, Russell, for the quick response!
In our Flink job we use TumblingEventTimeWindows to filter out old data.
There was a temporary issue with accessing the Catalog, and our Flink job
was not able to read the data from the Iceberg table for a while.
When the Flink job was able to access th…
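For readers unfamiliar with the windowing involved: a tumbling event-time window assigns each record to a fixed-size, non-overlapping window, and records whose window has already closed relative to the watermark get dropped. A self-contained sketch of that arithmetic (the window-start formula matches Flink's for non-negative timestamps and zero offset; this is illustrative, not Flink API):

```java
// Illustrative sketch of tumbling event-time window assignment and a
// lateness check; stand-in for Flink's TumblingEventTimeWindows behavior.
public class TumblingWindowSketch {

    // Start of the window containing `timestamp`, for windows of `sizeMs` milliseconds.
    static long windowStart(long timestamp, long sizeMs) {
        return timestamp - (timestamp % sizeMs);
    }

    // A record counts as "old" if its window ended at or before the watermark.
    static boolean isLate(long timestamp, long sizeMs, long watermark) {
        return windowStart(timestamp, sizeMs) + sizeMs <= watermark;
    }

    public static void main(String[] args) {
        long size = 60_000L;  // 1-minute windows
        System.out.println(windowStart(125_000L, size));       // 120000
        System.out.println(isLate(30_000L, size, 200_000L));   // true: window [0, 60000) closed
        System.out.println(isLate(190_000L, size, 200_000L));  // false: window [180000, 240000) still open
    }
}
```

This also illustrates why a long Catalog outage is painful: once the watermark has advanced past a window's end, records for that window read later are treated as late.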
Yes, I agree with Russell. You'd want to push the filter into planning
rather than returning stats. That's why we strip out stats when the file
metadata is copied. It also would be expensive to copy some, but not all, of
the file stats. It's better not to store the stats you don't need.
What about …
I think currently the recommendation would be to filter the iterator rather
than pulling the whole object with stats into memory. Is there a
requirement that all of the DataFiles be pulled into memory before
filtering?
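A minimal sketch of what "filter the iterator" means here, assuming a stand-in `FileEntry` type with min/max timestamp bounds (none of these names are Iceberg classes). The wrapper yields matching files lazily, so files that fail the predicate are never retained in memory:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative sketch: lazily filter a stream of file entries by their
// min/max timestamp stats instead of collecting all of them first.
// `FileEntry` is a stand-in, not an Iceberg class.
public class LazyFileFilter {

    record FileEntry(String path, long minTs, long maxTs) {}

    // Yields only files whose [minTs, maxTs] range overlaps [lo, hi].
    static Iterator<FileEntry> overlapping(Iterator<FileEntry> files, long lo, long hi) {
        return new Iterator<>() {
            private FileEntry next;

            private void advance() {
                while (next == null && files.hasNext()) {
                    FileEntry f = files.next();
                    if (f.maxTs() >= lo && f.minTs() <= hi) {
                        next = f;
                    }
                }
            }

            @Override public boolean hasNext() { advance(); return next != null; }

            @Override public FileEntry next() {
                if (!hasNext()) throw new NoSuchElementException();
                FileEntry result = next;
                next = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        List<FileEntry> files = List.of(
            new FileEntry("a.parquet", 0, 50),
            new FileEntry("b.parquet", 60, 120),
            new FileEntry("c.parquet", 130, 200));
        Iterator<FileEntry> it = overlapping(files.iterator(), 100, 150);
        while (it.hasNext()) {
            System.out.println(it.next().path());  // b.parquet, c.parquet
        }
    }
}
```

The same shape applies to planning results: the consumer walks the iterator once and keeps only what matches, rather than holding every DataFile (and its stats) at once.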
On Mon, May 15, 2023 at 9:49 AM Péter Váry wrote:
> Hi Team,
>
> We have a F…