Re: [DISCUSS] Spark version support strategy

2021-09-23 Thread Steven Wu
During the sync meeting, people talked about if and how we can have the same version support model across engines like Flink and Spark. I can provide some input from the Flink side. Flink only supports two minor versions. E.g., right now Flink 1.13 is the latest released version. That means only F

RE: Error when writing large number of rows with S3FileIO

2021-09-23 Thread Mayur Srivastava
I’ll try to upgrade the version and retry. Thanks, Mayur From: Jack Ye Sent: Thursday, September 23, 2021 2:35 PM To: Iceberg Dev List Subject: Re: Error when writing large number of rows with S3FileIO Thanks, while I am looking into this, this seems to be a very old version, is there any rea

Re: Error when writing large number of rows with S3FileIO

2021-09-23 Thread Jack Ye
Thanks, while I am looking into this, this seems to be a very old version, is there any reason to use that version specifically? Have you tried a newer version? I know there have been quite a few updates to the S3 package related to uploading since then, maybe upgrading can solve the problem. -Jac

RE: Error when writing large number of rows with S3FileIO

2021-09-23 Thread Mayur Srivastava
No problem Jack. I’m using https://mvnrepository.com/artifact/software.amazon.awssdk/s3/2.10.53 Thanks, Mayur From: Jack Ye Sent: Thursday, September 23, 2021 1:24 PM To: Iceberg Dev List Subject: Re: Error when writing large number of rows with S3FileIO Hi Mayur, Thanks for reporting this i

Re: Iceberg disaster recovery and relative path sync-up

2021-09-23 Thread Anurag Mantripragada
Hi Russell, I don’t see any major issues with your approach other than that it may break some customizability of locations. If I understand correctly, today write.object-storage.path or write.metadata.path can be outside of the table base location. With your suggestion, are we saying that

Re: Error when writing large number of rows with S3FileIO

2021-09-23 Thread Jack Ye
Hi Mayur, Thanks for reporting this issue, could you report what version of AWS SDK V2 you are using? Best, Jack Ye On Thu, Sep 23, 2021 at 8:39 AM Mayur Srivastava < mayur.srivast...@twosigma.com> wrote: > Hi, > > > > I've an Iceberg table partitioned by a single "time" (monthly partitioned) >

Space Filling Curves / Z-Order Proposal - Roadmap

2021-09-23 Thread Russell Spitzer
Hi everybody! I've been working through some prototypes for getting Z-Order into OSS Iceberg. I have written out a basic plan for implementation and I was hoping everyone who is interested would take a look. I've also divided the work into what I think are independent sections so we can hopeful

Error when writing large number of rows with S3FileIO

2021-09-23 Thread Mayur Srivastava
Hi, I've an Iceberg table partitioned by a single "time" (monthly partitioned) column that has 400+ columns and >100k rows. I'm using parquet files and PartitionedWriter + S3FileIO to write the data. When I write <~50k rows, the writer works. But it fails with the exception below if I write mor