Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

2021-03-04 Thread Guy Khazma
Hi Miao, I am looking forward to discussing this in the meeting. I think these are valid concerns, and there is a tradeoff between the convenience of collecting and tracking the indexes per file independently and the performance overhead of keeping them separate when they are used at run time. One possible

Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

2021-03-04 Thread Ryan Blue
Done. Great to hear that Iceberg is now supported in Hyperspace! On Thu, Mar 4, 2021 at 9:22 AM Miao Wang wrote: > @rb...@netflix.com can you add @Andrei Taleanu > to the sync up? > > > > He mainly works on indexing in our team. > > > > His PR to Hyperspace has been merged, > https://github.co

Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

2021-03-04 Thread Miao Wang
@rb...@netflix.com can you add @Andrei Taleanu to the sync up? He mainly works on indexing in our team. His PR to Hyperspace has been merged, https://github.com/microsoft/hyperspace/pull/358 Iceberg is now supported in Hyperspace for covering

Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

2021-03-04 Thread Ryan Blue
Great, I'm glad everyone can make it. I've sent out an invite to the list of people on the regular syncs. If you need to be added, please let me know. On Thu, Mar 4, 2021 at 3:01 AM Paula Ta-Shma wrote: > Hi all, > > This time works for Guy, Gal and myself, looking forward > > thanks! > Paula >

RE: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

2021-03-04 Thread Paula Ta-Shma
Hi all, This time works for Guy, Gal and myself, looking forward. Thanks! Paula Paula Ta-Shma, Ph.D. Cloud Storage and Analytics IBM Research - Haifa Phone: +972.74.7929402 Email: pa...@il.ibm.com From: Miao Wang To: "dev@iceberg.apache.org" , "rb...@netflix.com" , OpenInx Cc: Ic

Re: Hive query with join of Iceberg table and Hive table

2021-03-04 Thread Vivekanand Vellanki
Our concern is not specific to Iceberg. I am concerned about the memory required to cache a large number of splits. With Iceberg, estimating row counts when the query has predicates requires scanning the manifest list and manifest files to identify all the data files and compute the row coun