Re: Guidance on implementing Hybrid CDC Pattern from "CDC patterns in Apache Iceberg" talk

2023-09-22 Thread Samarth Jain
> What is the recommendation for storing the latest snapshot ID that is successfully merged into *table*? Ideally this is committed in the same transaction as the MERGE so that reprocessing is minimized. Does Iceberg support storing this as table metadata? I do not see any related information in th

Re: Approaching Vectorized Reading in Iceberg ..

2019-07-30 Thread Samarth Jain
Hey Gautam,
Wanted to check back with you and see if you had any success running the benchmark and if you have any numbers to share.
On Fri, Jul 26, 2019 at 4:34 PM Gautam wrote:
> Got it. Commented out that module and it works. Was just curious why it
> doesn't work on master branch either.

Re: Iceberg using V1 Vectorized Reader over Parquet ..

2019-09-09 Thread Samarth Jain
I wanted to share progress made so far with improving the performance of the Iceberg Arrow vectorized read path.
BIGINT column
Benchmark                                                           Cnt  Score    Error  Units
IcebergSourceFlatParquetDataReadBenchmark.readFileSourceVectorized    5  4.642 ± 1.629   s/op
IcebergSourceFlatParquetDataReadBenchmar

Re: Iceberg using V1 Vectorized Reader over Parquet ..

2019-11-13 Thread Samarth Jain
dictionary encoded numeric data types like BIGINT, we are currently 7% slower.
On Mon, Sep 9, 2019 at 4:55 PM Samarth Jain wrote:
> I wanted to share progress made so far with improving the performance of
> the Iceberg Arrow vectorized read path.
>
> BIGINT column
>
> Benchmark

Re: [VOTE] Release Apache Iceberg 0.8.0-incubating RC2

2020-04-30 Thread Samarth Jain
+1 (non-binding) all checks passed
On Thu, Apr 30, 2020 at 4:06 PM John Zhuge wrote:
> +1 (non-binding)
>
>    1. Checked signature and checksum
>    2. Checked license
>    3. Built and ran unit tests.
>
> On Thu, Apr 30, 2020 at 2:24 PM Owen O'Malley wrote:
>> +1
>>
>>    1. Checked sig