Re: Welcome Huaxin Gao as a committer!

2025-02-06 Thread Xingyuan Lin
Congrats Huaxin! On Thu, Feb 6, 2025 at 11:11 AM Denny Lee wrote: > Congratulations Huaxin!!! > > On Thu, Feb 6, 2025 at 7:47 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > >> Congratulations Huaxin! >> >> On Thu, Feb 6, 2025 at 8:41 AM Kevin Liu wrote: >> >>> Congratulations Huaxin!! Looking

Re: [PROPOSAL] Add manifest-level statistics for CBO estimation

2024-10-18 Thread Xingyuan Lin
want to > use that for CBO. What if we have a selective predicate that drastically > narrows down the scope of the operation? In that case, the file stats will > give us much more precise information. > > - Anton > > On Tue, Oct 15, 2024 at 17:01 Xingyuan Lin > wrote

[PROPOSAL] Add manifest-level statistics for CBO estimation

2024-10-15 Thread Xingyuan Lin
Hi everyone, Here's a doc for [Proposal] Add manifest-level statistics for CBO estimation. It's for more efficient derivation of stats for the CBO process. Original discussion thread

Re: [DISCUSS] Optimize for CBO

2024-09-26 Thread Xingyuan Lin
ne discussions, and no formal > proposal has been drafted yet. That said, it seems like a strong candidate > for inclusion in the v4 spec. > > Yufei > > > On Thu, Sep 26, 2024 at 4:01 PM Xingyuan Lin > wrote: > >> Thanks Yufei for taking a look. >> >> Yes I

Re: [DISCUSS] Optimize for CBO

2024-09-26 Thread Xingyuan Lin
ration may not always reflect > the latest snapshot, but overall, I think this approach should work well. > > [1] https://iceberg.apache.org/spec/#partition-statistics-file > > Yufei > > > On Thu, Sep 26, 2024 at 9:59 AM Xingyuan Lin > wrote: > >> Hi team, >

Re: [DISCUSS] Optimize for CBO

2024-09-26 Thread Xingyuan Lin
get table stats during the CBO process. Thanks, Xingyuan On Mon, Sep 23, 2024 at 8:32 PM Xingyuan Lin wrote: > Hi Iceberg dev team, > > Cost-based optimizers need these stats: > >- record count >- null count >- number of distinct values (NDVs) >- min/max v

[DISCUSS] Optimize for CBO

2024-09-23 Thread Xingyuan Lin
Hi Iceberg dev team, Cost-based optimizers need these stats: - record count - null count - number of distinct values (NDVs) - min/max values - column sizes Today, to get these stats, an engine must process manifest files, which can be an expensive operation when the table is large
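
To illustrate why deriving these stats can be expensive, here is a minimal sketch of rolling per-file stats up into table-level stats. It is not Iceberg's API: FileStats, TableStats, and aggregate are hypothetical names, and integer column bounds are assumed for simplicity. NDVs are left out because distinct counts are not additive across files.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class FileStats:
    """Hypothetical per-data-file stats, as an engine might read them from a manifest entry."""
    record_count: int
    null_counts: Dict[str, int]
    column_sizes: Dict[str, int]
    lower_bounds: Dict[str, int]
    upper_bounds: Dict[str, int]


@dataclass
class TableStats:
    """Hypothetical table-level rollup handed to a cost-based optimizer."""
    record_count: int = 0
    null_counts: Dict[str, int] = field(default_factory=dict)
    column_sizes: Dict[str, int] = field(default_factory=dict)
    min_values: Dict[str, int] = field(default_factory=dict)
    max_values: Dict[str, int] = field(default_factory=dict)


def aggregate(files: Iterable[FileStats]) -> TableStats:
    """Fold per-file stats into table-level stats in one pass over all files."""
    total = TableStats()
    for f in files:
        # Counts and sizes are additive across files.
        total.record_count += f.record_count
        for col, n in f.null_counts.items():
            total.null_counts[col] = total.null_counts.get(col, 0) + n
        for col, size in f.column_sizes.items():
            total.column_sizes[col] = total.column_sizes.get(col, 0) + size
        # Bounds combine by taking the overall minimum and maximum.
        for col, lo in f.lower_bounds.items():
            total.min_values[col] = min(lo, total.min_values.get(col, lo))
        for col, hi in f.upper_bounds.items():
            total.max_values[col] = max(hi, total.max_values.get(col, hi))
    return total
```

The work grows with the number of data files, since every file entry has to be visited before the optimizer gets a single table-level number.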

Re: An exception is thrown when calling snapshot.manifests().forEach()

2019-12-05 Thread Xingyuan Lin
Never mind, it was probably because the wrong runtime library was used. From: Xingyuan Lin Date: Wednesday, December 4, 2019 at 4:56 PM To: "dev@iceberg.apache.org" Subject: An exception is thrown when calling snapshot.manifests().forEach() Hi Iceberg team, I’m encountering a weird e