Re: Iceberg articles for you

2020-03-12 Thread John Zhuge
Excellent blogs! Thank you, Saisai, Junjie, and Xiang.
On Thu, Mar 12, 2020 at 8:12 PM 李响 wrote:
> +1 for adding a new tab on website for blog/talk collections
>
> On Fri, Mar 13, 2020 at 10:17 AM OpenInx wrote:
>> Great work, Junjie.
>>
>> Maybe we could add a tab named [blog] under https://

Re: Iceberg articles for you

2020-03-12 Thread 李响
+1 for adding a new tab on the website for blog/talk collections.
On Fri, Mar 13, 2020 at 10:17 AM OpenInx wrote:
> Great work, Junjie.
>
> Maybe we could add a tab named [blog] under https://iceberg.apache.org/
> and put the English versions of those posts there, so that people worldwide could

Re: Iceberg articles for you

2020-03-12 Thread OpenInx
Great work, Junjie. Maybe we could add a tab named [blog] under https://iceberg.apache.org/ and put the English versions of those posts there, so that people worldwide can read them (and search engines can index them). Thanks.
On Fri, Mar 13, 2020 at 9:55 AM Junjie Chen wrote:
> Hi devs

Iceberg articles for you

2020-03-12 Thread Junjie Chen
Hi devs, We recently posted some articles on the WeChat public platform to promote Iceberg in the Chinese developer community. Here are the links: 1. Why I choose Apache Iceberg

Re: AvroFileAppender metrics

2020-03-12 Thread Luis Otero
Hi Ryan, I'll give it a try. Regards, L.
On Thu, 12 Mar 2020 at 18:16, Ryan Blue wrote:
> Hi Luis,
>
> You're right about what's happening. Because the Avro appender doesn't
> track column-level stats, Iceberg can't determine that the file only
> contains matching data rows and can be deleted.

Re: AvroFileAppender metrics

2020-03-12 Thread Ryan Blue
Hi Luis, You're right about what's happening. Because the Avro appender doesn't track column-level stats, Iceberg can't determine that the file only contains matching data rows and can be deleted. Parquet does keep those stats, so even though the partitioning doesn't guarantee the delete is safe,
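
The reasoning about column-level stats can be illustrated with a minimal sketch. This is not Iceberg's API: `FileStats` and `can_drop_whole_file` are hypothetical names, and the sketch assumes a single integer column, exact bounds, and a delete predicate of the form lo <= value <= hi. It only shows why a file without recorded min/max bounds cannot be proven to contain matching rows only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStats:
    """Hypothetical per-file bounds for one column.

    None means the writer recorded no stats for that column,
    as the thread describes for the Avro appender.
    """
    lower: Optional[int]
    upper: Optional[int]


def can_drop_whole_file(stats: FileStats, lo: int, hi: int) -> bool:
    """Return True only if every row provably satisfies lo <= value <= hi.

    Without bounds nothing can be proven, so the answer is False.
    """
    if stats.lower is None or stats.upper is None:
        return False
    return lo <= stats.lower and stats.upper <= hi


# A file with bounds inside the predicate range can be dropped as a whole.
print(can_drop_whole_file(FileStats(5, 9), 0, 10))
# A file with no recorded bounds cannot, even if all of its rows happen to match.
print(can_drop_whole_file(FileStats(None, None), 0, 10))
```

In this sketch the second call stands in for a file written without column stats: the check has to answer "unknown", so the file cannot be removed on metadata alone.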

AvroFileAppender metrics

2020-03-12 Thread Luis Otero
Hi, AvroFileAppender doesn't report min/max values ( https://github.com/apache/incubator-iceberg/blob/80cbc60ee55911ee627a7ad3013804394d7b5e9a/core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java#L60 ). As a side effect (I think) overwrite operations (if there are data files with the

Re: Has the topic of CDC (change data capture) been considered for Iceberg? If not, should it?

2020-03-12 Thread OpenInx
Hi Filip, We (Alibaba & Tencent) are working on a POC for Apache Iceberg row-level updates/deletes. Syncing the change log (such as a row-level binlog) into an Iceberg data lake is the classic case we are trying to implement (another classic case would be one streaming job or batch job with one or more update