RE: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

Paula Ta-Shma Thu, 04 Mar 2021 03:01:50 -0800

Hi all,

This time works for Guy, Gal, and me. Looking forward to it.


thanks!
Paula

Paula Ta-Shma, Ph.D.
Cloud Storage and Analytics
IBM Research - Haifa
Phone: +972.74.7929402
Email: pa...@il.ibm.com




From:   Miao Wang <miw...@adobe.com.INVALID>
To:     "dev@iceberg.apache.org" <dev@iceberg.apache.org>, 
"rb...@netflix.com" <rb...@netflix.com>, OpenInx <open...@gmail.com>
Cc:     Iceberg Dev List <dev@iceberg.apache.org>
Date:   04/03/2021 06:31
Subject:        [EXTERNAL] Re: Secondary Indexes - Pluggable File Filter 
interface for Apache Iceberg



It works for me.
 
On a quick read, I have a few concerns about storing indexes in a 
consolidated fashion.
 
1). Maintaining the consolidated storage may be a bit more complex; 
2). It may make collecting the index while writing a data file (i.e., 
online index building) more complex (e.g., we need to consider multiple 
writers writing to the same consolidated index file in parallel);
3). We need some auxiliary structure in the index file to quickly locate 
the relevant index given some key (e.g., a data file name).
 
However, I do think consolidated storage is a meaningful on-disk 
optimization. If we properly design a splittable and mergeable index file 
format, the consolidated approach and the one-index-per-data-file approach 
(1:1 index file) are not mutually exclusive. The 1:1 index file can then 
be the building block for larger consolidated index files and for indexes 
at other levels, such as a partition-level index.
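
To make the splittable and mergeable idea concrete, here is a minimal 
sketch in Python. It is not taken from the proposal or from Iceberg: the 
names FileIndex and ConsolidatedIndex, the min/max payload, and the 
in-memory dictionary standing in for an on-disk directory are all 
assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FileIndex:
    """Hypothetical 1:1 index: min/max of one column for one data file."""
    data_file: str
    min_value: int
    max_value: int


@dataclass
class ConsolidatedIndex:
    """Hypothetical consolidated index built from 1:1 indexes.

    The dict plays the role of the auxiliary structure that locates the
    relevant entry given a key (here, the data file name).
    """
    entries: Dict[str, FileIndex] = field(default_factory=dict)

    @classmethod
    def merge(cls, parts: Iterable[FileIndex]) -> "ConsolidatedIndex":
        # Merging is just collecting the per-file entries under their keys.
        return cls({part.data_file: part for part in parts})

    def lookup(self, data_file: str) -> Optional[FileIndex]:
        # Returns None when no index exists for this data file.
        return self.entries.get(data_file)

    def split(self) -> List[FileIndex]:
        # Splitting recovers the original 1:1 entries, sorted by file name.
        return sorted(self.entries.values(), key=lambda e: e.data_file)

    def summary(self) -> Optional[Tuple[int, int]]:
        # A coarser (e.g., partition-level) min/max derived from the entries.
        if not self.entries:
            return None
        lows = [e.min_value for e in self.entries.values()]
        highs = [e.max_value for e in self.entries.values()]
        return (min(lows), max(highs))
```

Because merge and split are inverses in this sketch, a writer could emit 
1:1 entries and a later maintenance job could consolidate them. Concurrent 
writers and the real file layout are deliberately left out.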
 
One of our team members did a pass over the design and shared some 
thoughts with me. I will complete my own pass.
 
Thanks!
 
Miao
 
 
From: Ryan Blue <rb...@netflix.com.INVALID>
Date: Wednesday, March 3, 2021 at 6:08 PM
To: OpenInx <open...@gmail.com>
Cc: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Secondary Indexes - Pluggable File Filter interface for 
Apache Iceberg
Great, thank you for planning to join! I definitely want to get your input 
on this as well.
 
On Wed, Mar 3, 2021 at 6:06 PM OpenInx <open...@gmail.com> wrote:
It will be 1:00 AM (China Standard Time) on 18 March, and it works for 
those of us in Asia. I'd love to attend this discussion. Thanks.
 
On Thu, Mar 4, 2021 at 9:50 AM Ryan Blue <rb...@netflix.com.invalid> 
wrote:
Thanks for putting this together, Guy! I just did a pass over the doc and 
it looks like a really reasonable proposal for being able to inject custom 
file filter implementations.
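
As a rough illustration of what injecting a custom file filter into 
planning could look like, here is a minimal Python sketch. It does not 
reflect the interface in the doc or any Iceberg API: FileFilter, 
MinMaxFileFilter, plan_files, and the min/max statistics are hypothetical 
names and data chosen for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class FileFilter(ABC):
    """Hypothetical plug-in point: decides whether a file can be skipped."""

    @abstractmethod
    def may_contain(self, data_file: str, column: str, value: int) -> bool:
        """Return False only if the file cannot contain column == value."""


class MinMaxFileFilter(FileFilter):
    """Hypothetical filter backed by per-file min/max statistics."""

    def __init__(self, stats: Dict[str, Dict[str, Tuple[int, int]]]):
        # stats maps data file name -> column name -> (min, max).
        self._stats = stats

    def may_contain(self, data_file: str, column: str, value: int) -> bool:
        bounds = self._stats.get(data_file, {}).get(column)
        if bounds is None:
            # No index data for this file or column: it must be kept.
            return True
        low, high = bounds
        return low <= value <= high


def plan_files(files: List[str], filters: List[FileFilter],
               column: str, value: int) -> List[str]:
    """Keep a file only if every registered filter says it may match."""
    return [
        f for f in files
        if all(flt.may_contain(f, column, value) for flt in filters)
    ]
```

The key property in this sketch is that a filter may only rule files out. 
A file with no index data is always kept, so a missing or stale index 
cannot change query results.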
 
One of the main things we need to think about is how to store and track 
the index data. There's a comment in the doc about storing them in a 
"consolidated fashion" and I'd like to hear more about what you're 
thinking there. The index-per-file approach that Adobe is working on is a 
good way to track index data: each index is tied to an immutable data 
file, so it has a clear lifecycle. On the other hand, the drawback is that 
we have a lot of index files -- one per data file.
 
Let's set up a time to go talk through the options. Would 9AM PST (17:00 
UTC) on 17 March work for everyone? I'm thinking in the morning so 
everyone from IBM can attend. We can do a second discussion at a time that 
works more for people in Asia later on as well.
 
If that day works, then I'll send out an invite.
 
On Fri, Feb 19, 2021 at 8:49 AM Guy Khazma <guyk...@gmail.com> wrote:
Hi All,

Following up on our discussion at Wednesday's sync, attached is a 
proposal to enhance Iceberg with a pluggable interface for data skipping 
indexes, so that existing indexes can be used in job planning.

https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit?usp=sharing


We will be glad to get your feedback.

Thanks,
Guy

 
-- 
Ryan Blue
Software Engineer
Netflix

 
