I updated the invites. Sorry for the mixup!

On Fri, Mar 5, 2021 at 2:10 AM webdev.andrei <[email protected]> wrote:
> Hi all,
>
> I would like to attend the discussion. I'm very interested in it, as I'm
> working with Miao's team on indexing. The PR for Iceberg support in
> Hyperspace that Miao referred to is my work.
>
> If needed, I can explain how Hyperspace works and what the plan is for
> Hyperspace in the near future.
>
> You can add me either with this email (personal email) or
> [email protected], or both.
>
> Thanks!
>
> Andrei Ionescu
>
> On Thu, Mar 4, 2021 at 8:55 PM Guy Khazma <[email protected]> wrote:
>
>> Hi Miao,
>>
>> I am looking forward to discussing this in the meeting.
>> I think these are valid concerns, and there is a tradeoff between the
>> convenience of collecting and tracking the indexes per file independently
>> and the performance overhead of keeping them separate when they are used
>> at run time.
>> One possible approach is to use Iceberg to save the metadata, and to use
>> compaction in Iceberg to merge the indexes into a consolidated
>> location.
>>
>> Thanks,
>> Guy
>>
>> On 2021/03/04 04:30:53, Miao Wang <[email protected]> wrote:
>> > It works for me.
>> >
>> > With a quick thought, there may be a few concerns about
>> > consolidated-fashion storage:
>> >
>> > 1). Maintaining the consolidated storage may be a bit more complex;
>> > 2). It may make collecting the index while writing a data file (i.e.,
>> > online index building) more complex (e.g., we need to consider that
>> > multiple writers write to the same consolidated index file in parallel);
>> > 3). We need some auxiliary structure in the index file to quickly
>> > locate the relevant index given some key (e.g., a data file name);
>> >
>> > However, I do think consolidated-fashion storage is a meaningful
>> > optimization on disk. If we properly design a splittable and mergeable
>> > index file format, the consolidated fashion and 1-data-file-1-index (1:1
>> > index file) are not mutually exclusive.
>> > Therefore, the 1:1 index file can be the building block for larger
>> > consolidated index files and for indexes at different levels, like a
>> > partition-level index.
>> >
>> > Our team member went through one pass of the design and shared some
>> > thoughts with me. I will complete my pass.
>> >
>> > Thanks!
>> >
>> > Miao
>> >
>> >
>> > From: Ryan Blue <[email protected]>
>> > Date: Wednesday, March 3, 2021 at 6:08 PM
>> > To: OpenInx <[email protected]>
>> > Cc: Iceberg Dev List <[email protected]>
>> > Subject: Re: Secondary Indexes - Pluggable File Filter interface for
>> > Apache Iceberg
>> > Great, thank you for planning to join! I definitely want to get your
>> > input on this as well.
>> >
>> > On Wed, Mar 3, 2021 at 6:06 PM OpenInx <[email protected]> wrote:
>> > It will be 1:00 AM (China Standard Time) on 18 March, and it works
>> > for our Asia people. I'd love to attend this discussion. Thanks.
>> >
>> > On Thu, Mar 4, 2021 at 9:50 AM Ryan Blue <[email protected]> wrote:
>> > Thanks for putting this together, Guy! I just did a pass over the doc
>> > and it looks like a really reasonable proposal for being able to inject
>> > custom file filter implementations.
>> >
>> > One of the main things we need to think about is how to store and track
>> > the index data. There's a comment in the doc about storing them in a
>> > "consolidated fashion" and I'd like to hear more about what you're
>> > thinking there. The index-per-file approach that Adobe is working on is
>> > a good way to track index data because we get a clear lifecycle for the
>> > index data: it is tied to a data file that is immutable. On the other
>> > hand, the drawback is that we have a lot of index files -- one per data
>> > file.
>> >
>> > Let's set up a time to go talk through the options. Would 9AM PST
>> > (17:00 UTC) on 17 March work for everyone? I'm thinking in the morning so
>> > everyone from IBM can attend.
>> > We can do a second discussion later on, at a time that works better
>> > for people in Asia, as well.
>> >
>> > If that day works, then I'll send out an invite.
>> >
>> > On Fri, Feb 19, 2021 at 8:49 AM Guy Khazma <[email protected]> wrote:
>> > Hi All,
>> >
>> > Following up on our discussion from Wednesday's sync, attached is a
>> > proposal to enhance Iceberg with a pluggable interface for data skipping
>> > indexes to enable use of existing indexes in job planning.
>> >
>> > https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY/edit?usp=sharing
>> >
>> > We will be glad to get your feedback.
>> >
>> > Thanks,
>> > Guy
>> >
>> >
>> > --
>> > Ryan Blue
>> > Software Engineer
>> > Netflix
>> >
>> >
>> > --
>> > Ryan Blue
>> > Software Engineer
>> > Netflix

--
Ryan Blue
Software Engineer
Netflix
