Re: Ways To Alleviate Load For Tables With Many Snapshots

2021-01-26 Thread Ryan Blue
Great, thanks for the update! I'm glad that cleaning that up fixed the problem. On Tue, Jan 26, 2021 at 11:46 AM Gautam wrote: > Hey Ryan & David, > I believe this change from you [1] indirectly achieves this. > David's issue is that every table.load() is instantiating one FS handl

Re: Ways To Alleviate Load For Tables With Many Snapshots

2021-01-26 Thread David Wilcox
+ dawilcox On Tue, Jan 26, 2021 at 11:46 AM Gautam wrote: Hey Ryan & David, I believe this change from you [1] indirectly achieve

Re: Ways To Alleviate Load For Tables With Many Snapshots

2021-01-26 Thread David Wilcox
From: David Wilcox. Sent: Tuesday, January 26, 2021 12:59 PM. To: Gautam; Iceberg Dev List; Ryan Blue. Cc: Gautam Kowshik; Xabriel Collazo Mojica; Grp-XAD. Subject: Re: Ways To Alleviate Load For Tables With Many Snapshots. Ahh. This is better. I hadn't gotten any email

Re: Ways To Alleviate Load For Tables With Many Snapshots

2021-01-26 Thread Gautam
+ dawilcox On Tue, Jan 26, 2021 at 11:46 AM Gautam wrote: > Hey Ryan & David, > I believe this change from you [1] indirectly achieves this. > David's issue is that every table.load() is instantiating one FS handle for > each snapshot, and in your change, by converting the File ref

Re: Ways To Alleviate Load For Tables With Many Snapshots

2021-01-26 Thread Gautam
Hey Ryan & David, I believe this change from you [1] indirectly achieves this. David's issue is that every table.load() is instantiating one FS handle for each snapshot, and in your change, by converting the File reference into a location string, this is already a lazy read (in a way?).
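To illustrate the point about storing a location string instead of an open file reference, here is a minimal Python sketch of the general lazy-handle pattern. It is not Iceberg's actual code: `FileHandle`, `Snapshot`, `load_table`, and the path layout are hypothetical names chosen for illustration, and the counter only stands in for the cost of creating a real filesystem handle.

```python
class FileHandle:
    """Hypothetical stand-in for an expensive filesystem handle."""

    opened = 0  # number of handles created so far

    def __init__(self, location: str):
        FileHandle.opened += 1
        self.location = location


class Snapshot:
    """Hypothetical snapshot that keeps only a location string until needed."""

    def __init__(self, snapshot_id: int, manifest_list_location: str):
        self.snapshot_id = snapshot_id
        self.manifest_list_location = manifest_list_location
        self._handle = None  # no handle is created at load time

    def manifest_list(self) -> FileHandle:
        # Create the handle on first access and reuse it afterwards.
        if self._handle is None:
            self._handle = FileHandle(self.manifest_list_location)
        return self._handle


def load_table(snapshot_count: int) -> list:
    """Build snapshot objects without opening any files (illustrative paths)."""
    return [
        Snapshot(i, f"/warehouse/tbl/metadata/snap-{i}.avro")
        for i in range(snapshot_count)
    ]
```

With this shape, loading a table that has thousands of snapshots creates no handles at all; a handle is created only for a snapshot whose manifest list is actually read.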

Re: Ways To Alleviate Load For Tables With Many Snapshots

2021-01-26 Thread Ryan Blue
David, We could probably make it so that Snapshot instances are lazily created from the metadata file, but that would be a fairly large change. If you're interested, we can definitely make it happen. I agree with Vivekanand, though. A much easier solution is to reduce the number of snapshots in t

Re: Ways To Alleviate Load For Tables With Many Snapshots

2021-01-21 Thread Vivekanand Vellanki
Just curious, what is the need to retain all those snapshots? I would assume that there is a mechanism to expire snapshots and delete data/manifest files that are no longer required. On Thu, Jan 21, 2021 at 11:01 PM David Wilcox wrote: > Hi Iceberg Devs, > > I have a process that reads Tables s
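The expiry idea raised in this reply can be sketched as a simple retention policy. The Python below is only an illustration under stated assumptions: `SnapshotInfo`, `select_expired`, and the `retain_last` parameter are hypothetical names, not Iceberg's API, and the sketch only selects which snapshots would be dropped; it does not delete data or manifest files.

```python
from typing import List, NamedTuple


class SnapshotInfo(NamedTuple):
    """Hypothetical record: a snapshot id and its commit time in milliseconds."""

    snapshot_id: int
    timestamp_ms: int


def select_expired(
    snapshots: List[SnapshotInfo], older_than_ms: int, retain_last: int = 1
) -> List[SnapshotInfo]:
    """Return snapshots older than the cutoff, always keeping the newest ones."""
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms)
    # Protect the most recent `retain_last` snapshots regardless of age.
    protected = ordered[-retain_last:] if retain_last > 0 else []
    protected_ids = {s.snapshot_id for s in protected}
    return [
        s
        for s in ordered
        if s.timestamp_ms < older_than_ms and s.snapshot_id not in protected_ids
    ]
```

For example, with five snapshots committed one second apart, `select_expired(snaps, older_than_ms=10_000, retain_last=2)` selects the three oldest and keeps the two newest. In a real table the equivalent operation would go through the library's own snapshot-expiration facility rather than hand-written logic like this.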