Re: Secondary Indexes - Pluggable File Filter interface for Apache Iceberg

Ryan Blue Thu, 04 Mar 2021 09:20:40 -0800

Great, I'm glad everyone can make it. I've sent out an invite to the list
of people on the regular syncs. If you need to be added, please let me know.


On Thu, Mar 4, 2021 at 3:01 AM Paula Ta-Shma <[email protected]> wrote:

> Hi all,
>
> This time works for Guy, Gal and myself. Looking forward to it.
>
> thanks!
> Paula
>
> *Paula Ta-Shma, Ph.D.*
> Cloud Storage and Analytics
> IBM Research - Haifa
> Phone:+972.74.7929402
> Email: [email protected]
>
>
>
>
> From:        Miao Wang <[email protected]>
> To:        "[email protected]" <[email protected]>, "
> [email protected]" <[email protected]>, OpenInx <[email protected]>
> Cc:        Iceberg Dev List <[email protected]>
> Date:        04/03/2021 06:31
> Subject:        [EXTERNAL] Re: Secondary Indexes - Pluggable File Filter
> interface for Apache Iceberg
> ------------------------------
>
>
>
> It works for me.
>
>
>
> At first glance, there may be a few concerns about storing the indexes in
> a consolidated fashion.
>
>
>
> 1). Maintaining the consolidated storage may be a bit more complex;
>
> 2). It may make collecting the index while writing a data file (i.e.,
> online index building) more complex (e.g., we need to consider multiple
> writers writing to the same consolidated index file in parallel);
>
> 3). We need some auxiliary structure in the index file to quickly
> locate the relevant index given some key (e.g., a data file name);
>
>
>
> However, I do think consolidated storage is a meaningful optimization
> on disk. If we properly design a splittable and mergeable index file
> format, the consolidated approach and 1-data-file-1-index (1:1 index
> file) are not mutually exclusive. Therefore, 1:1 index files can be the
> building blocks for larger consolidated index files and for indexes at
> different levels, like a partition-level index.
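
A minimal Python sketch of the "1:1 index files as building blocks" idea above, under stated assumptions: the names FileIndex and ConsolidatedIndex, the min/max column summary, and the dictionary keyed by data file name are all hypothetical illustrations, not part of the proposal or of any Iceberg API. It only shows that per-file indexes can be merged into a consolidated index, split back out, and located by data file name (concern 3).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class FileIndex:
    """Hypothetical 1:1 index: min/max of one column for one data file."""
    data_file: str
    min_value: int
    max_value: int

    def may_contain(self, value: int) -> bool:
        # A data file can be skipped when the value is outside [min, max].
        return self.min_value <= value <= self.max_value


class ConsolidatedIndex:
    """Hypothetical consolidated index assembled from 1:1 indexes."""

    def __init__(self, entries: Optional[Dict[str, FileIndex]] = None):
        # Auxiliary structure: data file name -> index entry.
        self.entries: Dict[str, FileIndex] = dict(entries or {})

    @classmethod
    def from_file_indexes(cls, indexes: Iterable[FileIndex]) -> "ConsolidatedIndex":
        return cls({ix.data_file: ix for ix in indexes})

    def merge(self, other: "ConsolidatedIndex") -> "ConsolidatedIndex":
        # Merging never mutates its inputs; it returns a new index.
        merged = dict(self.entries)
        merged.update(other.entries)
        return ConsolidatedIndex(merged)

    def split(self, data_files: Iterable[str]) -> "ConsolidatedIndex":
        # Extract the entries for a subset of data files (e.g., one partition).
        wanted = set(data_files)
        return ConsolidatedIndex(
            {name: ix for name, ix in self.entries.items() if name in wanted}
        )

    def lookup(self, data_file: str) -> Optional[FileIndex]:
        return self.entries.get(data_file)

    def files_to_scan(self, value: int) -> List[str]:
        # Data files whose index cannot rule out the value.
        return sorted(
            name for name, ix in self.entries.items() if ix.may_contain(value)
        )
```

In this sketch, two writers could each build their own small index and the results could be merged afterwards, which sidesteps parallel writes to a single consolidated file; whether that fits the actual design is an open question for the discussion.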
>
>
>
> Our team member went through one pass of the design and shared some
> thoughts with me. I will complete my pass.
>
>
>
> Thanks!
>
>
>
> Miao
>
>
>
>
>
> *From: *Ryan Blue <[email protected]>
> *Date: *Wednesday, March 3, 2021 at 6:08 PM
> *To: *OpenInx <[email protected]>
> *Cc: *Iceberg Dev List <[email protected]>
> *Subject: *Re: Secondary Indexes - Pluggable File Filter interface for
> Apache Iceberg
>
> Great, thank you for planning to join! I definitely want to get your input
> on this as well.
>
>
>
> On Wed, Mar 3, 2021 at 6:06 PM OpenInx <[email protected]> wrote:
>
> It will be 1:00 AM (China Standard Time) on 18 March, and it works for
> those of us in Asia. I'd love to attend this discussion. Thanks.
>
>
>
> On Thu, Mar 4, 2021 at 9:50 AM Ryan Blue <[email protected]>
> wrote:
>
> Thanks for putting this together, Guy! I just did a pass over the doc and
> it looks like a really reasonable proposal for being able to inject custom
> file filter implementations.
>
>
>
> One of the main things we need to think about is how to store and track
> the index data. There's a comment in the doc about storing them in a
> "consolidated fashion" and I'd like to hear more about what you're thinking
> there. The index-per-file approach that Adobe is working on is a good way
> to track index data: we get a clear lifecycle because each index is tied
> to an immutable data file. On the other hand, the drawback is that we
> have a lot of index files -- one per data file.
>
>
>
> Let's set up a time to go talk through the options. Would 9AM PST (17:00
> UTC) on 17 March work for everyone? I'm thinking in the morning so everyone
> from IBM can attend. We can do a second discussion at a time that works
> more for people in Asia later on as well.
>
>
>
> If that day works, then I'll send out an invite.
>
>
>
> On Fri, Feb 19, 2021 at 8:49 AM Guy Khazma <[email protected]> wrote:
>
> Hi All,
>
> Following up on our discussion from Wednesday's sync, attached is a
> proposal to enhance Iceberg with a pluggable interface for data skipping
> indexes, to enable the use of existing indexes in job planning.
>
>
> https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit?usp=sharing
>
> We will be glad to get your feedback.
>
> Thanks,
> Guy
>
>
>
> --
>
> Ryan Blue
>
> Software Engineer
>
> Netflix
>
>
>

-- 
Ryan Blue
Software Engineer
Netflix
