Re: Welcome new committer and PPMC member Ratandeep Ratti

2020-02-17 Thread Sud
Congratulations Ratandeep!! Keep up the good work! On Mon, Feb 17, 2020 at 6:26 AM Anjali Norwood wrote: > Congratulations Ratandeep!! > > regards. > Anjali. > > On Mon, Feb 17, 2020 at 12:19 AM Manish Malhotra < > manish.malhotra.w...@gmail.com> wrote: > >> Congratulations 🎉!! >> >> On Sun, Feb

pull request build is failing

2020-05-19 Thread Sud
I am trying to prepare a pull request for review, but it looks like the Python build is failing because of an unrelated error. I have seen some other builds fail with this error too. Does anyone know how to fix this issue? https://github.com/apache/incubator-iceberg/pull/1046 https://travis-ci.org/github/apache/in

Re: pull request build is failing

2020-05-19 Thread Sud
y 20, 2020 at 10:25 AM Sud wrote: > >> I am trying to prepare pull request for review but looks like python >> build is failing because of unrelated error. >> I have seen some other build fail with this error too. any one knows how >> to fix this issue? >> >> h

Re: pull request build is failing

2020-05-19 Thread Sud
Thank you! On Tue, May 19, 2020 at 9:49 PM Ryan Blue wrote: > Merged! Sorry I didn't notice the problem today. Hopefully you're > unblocked. > > On Tue, May 19, 2020 at 7:41 PM Sud wrote: > >> Thanks for quick reply. I will wait for this to get merged to

Need help reviewing PR https://github.com/apache/incubator-iceberg/pull/1046

2020-05-20 Thread Sud
Hello Devs, I would like help reviewing the PR https://github.com/apache/incubator-iceberg/pull/1046 There can be more complex scenarios for union schemas; please feel free to make suggestions and I will add tests. There is no rush to merge this into master, but I wanted to validate the approach and get feedback fr

question about reader task planning & BinPacking

2020-07-16 Thread Sud
Hi Iceberg-devs, We are trying to root-cause an issue where the driver gets stuck when trying to read comparatively large tables (> 2000 snapshots). When I looked at the thread dump of the driver's main thread, I saw that the thread is stuck in planning tasks. I also noticed that iceberg-worker-pool is

Re: question about reader task planning & BinPacking

2020-07-17 Thread Sud
Li wrote: > Hi Sud, > > The batch read of the Iceberg table should just read the latest snapshot. > I think this case is that your large tables have a large number of > manifest files. > > 1.The simple way is reducing manifest file numbers: > - For reducing manifest

Re: question about reader task planning & BinPacking

2020-07-17 Thread Sud
table scan: %s", scan); } } return tasks; } On Fri, Jul 17, 2020 at 9:35 AM Sud wrote: > Thanks @Jingsong for reply > > Yes one additional data point about the table. > This table is avro table and generated from stream ingestion. We expect a > couple of thousan

Re: question about reader task planning using SupportsReportStatistics

2020-07-17 Thread Sud
can push operators before getting stats. */ On Fri, Jul 17, 2020 at 12:35 PM Sud wrote: > ok after adding more instrumentation I see that Reader::estimateStatistics > may be a culprit. > > looks like estimated stats may be performing full table estimate and thats > why it is so

Re: question about reader task planning using SupportsReportStatistics

2020-07-19 Thread Sud
issues for TODOs -- Thanks On Fri, Jul 17, 2020 at 9:25 PM Jingsong Li wrote: > Thanks Sud for in-depth debugging. And thanks Ryan for the explanation. > > +1 to have a table property to disable stats estimation. > > IIUC, the difference between stats estimation and scan with filt

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread Sud
We are using incremental reads for Iceberg tables that get quite a few appends (~500-1000 per hour), but instead of using timestamps we use snapshot IDs and track the ID of the last read snapshot as state. We use the timestamp as a fallback when the state is incorrect, but as you mentioned if timestamps are
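
The snapshot-ID tracking described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Snapshot` and `plan_incremental_range` are hypothetical names, the snapshot list is a plain in-memory list in commit order, and nothing here is Iceberg's API or the poster's actual code. It shows how a reader might prefer the stored snapshot ID and fall back to a timestamp only when that ID is not found in the table history.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for a table snapshot (not an Iceberg class)."""
    snapshot_id: int
    timestamp_ms: int


def plan_incremental_range(
    snapshots: List[Snapshot],
    last_read_id: Optional[int],
    fallback_ts_ms: int,
) -> Optional[Tuple[Optional[int], int]]:
    """Return (start_exclusive_id, end_inclusive_id), or None if nothing is new.

    `snapshots` is assumed to be ordered oldest to newest. A start of None
    means "read from the beginning of the history".
    """
    if not snapshots:
        return None

    latest = snapshots[-1]
    known_ids = {s.snapshot_id for s in snapshots}

    if last_read_id in known_ids:
        # Normal path: resume right after the snapshot recorded in state.
        start = last_read_id
    else:
        # Fallback path: state is missing or wrong, so pick the newest
        # snapshot committed at or before the fallback timestamp.
        older = [s for s in snapshots if s.timestamp_ms <= fallback_ts_ms]
        start = older[-1].snapshot_id if older else None

    if start == latest.snapshot_id:
        return None  # already caught up
    return (start, latest.snapshot_id)
```

Tracking the ID rather than a timestamp means the resume point does not depend on commit timestamps being unique or strictly ordered; the timestamp is consulted only when the stored ID cannot be trusted.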

Connection leak (unclosed reader) leading to s3a timeout exceptions

2020-09-17 Thread Sud
Hi Iceberg-devs, I am investigating the connection leak issue we are seeing after upgrading to the latest Iceberg. I have narrowed the investigation down to the following PR and am testing a fix now https://github.com/apache/iceberg/commit/7060c928390c59e24dc207ec86f99132f6c1a828#diff-9726b2a5391d8755f6c5c849

Re: Connection leak (unclosed reader) leading to s3a timeout exceptions

2020-09-17 Thread Sud
Submitted a PR with the fix; please review: https://github.com/apache/iceberg/pull/1474/files On Thu, Sep 17, 2020 at 3:56 PM Sud wrote: > HI Iceberg-dev's > > I am investigating the connection leak issue we are seeing after upgrading > to the latest iceberg. > I have narrowed d

Re: S3 strong read-after-write consistency

2020-12-02 Thread Sud
This feature will definitely help in cases where we saw a file-not-found exception after creating a new file using s3a (Spark used to retry the task in that case). On Wed, Dec 2, 2020 at 2:11 AM Jungtaek Lim wrote: > What about S3FileIO implementation? I see some issue filed that even with > Hive cata