>>>>>>> how to invoke these methods (in a distributed manner or single thread
>>>>>>> manner, in async or sync).
>>>>>>> Take our use case as an example, we plan to have a new DDL syntax
>>>>>>> "create index id_1 on table col_1 using bloom"/"update
ill invoke the index related method provided by
>>>>>> iceberg.
>>>>>>
>>>>>> Storage): Does the index data have to be a file? Wondering if we want
>>>>>> to design the index data storage interface in such way that people c
>>>>>>
>>>>>>
>>>>>> I still remember some conclusions from previous discussions.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 1). Index types support: We planned to support Skipping Index first.
h reduces index reading overhead. Index file can be applied when
>>>>> generating the scan task.
>>>>>
>>>>>
>>>>>
>>>>> 2). As Ryan mentioned, Sequence number will be used to indicate
>>>>> whether an index is valid. Sequence
format which includes
>>>> Column Name/ID, Index Type (String), Index content length, and binary
>>>> content. It is not necessary to use Parquet to store index. Initial thought
>>>> was 1 data file mapping to 1 index file. It can be merged to 1 partition
>>
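The index entry layout mentioned above (Column Name/ID, Index Type as a string, content length, binary content) could be sketched roughly as follows. This is only an illustration, not a format agreed on in this thread: the field widths, the little-endian byte order, and the names IndexEntry, encode_entry and decode_entry are all assumptions made for the example.

```python
import struct
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """One index record: which column, which index type, and opaque bytes."""
    column_id: int
    index_type: str
    content: bytes


# Assumed layout (hypothetical, little-endian):
#   int32 column id | uint16 type length | type bytes (UTF-8)
#   uint32 content length | content bytes
_HEADER = struct.Struct("<iH")
_LENGTH = struct.Struct("<I")


def encode_entry(entry: IndexEntry) -> bytes:
    """Serialize one entry using the assumed layout."""
    type_bytes = entry.index_type.encode("utf-8")
    return b"".join([
        _HEADER.pack(entry.column_id, len(type_bytes)),
        type_bytes,
        _LENGTH.pack(len(entry.content)),
        entry.content,
    ])


def decode_entry(buf: bytes, offset: int = 0):
    """Read one entry starting at offset; return (entry, next_offset)."""
    column_id, type_len = _HEADER.unpack_from(buf, offset)
    offset += _HEADER.size
    index_type = buf[offset:offset + type_len].decode("utf-8")
    offset += type_len
    (content_len,) = _LENGTH.unpack_from(buf, offset)
    offset += _LENGTH.size
    content = bytes(buf[offset:offset + content_len])
    offset += content_len
    return IndexEntry(column_id, index_type, content), offset
```

Because every entry carries its own content length, several entries can be concatenated into one file and read back one after another, which would fit either the one-index-file-per-data-file idea or a merged file.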
>>> Misc:
>>> 4). How to build index: We want to keep the index reading and writing
>>> interface with Iceberg and leave the actual building logic as Engine
>>> specific (i.e., we can use different compute to build Index without
>>> changing anything inside Iceberg).
>>>
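One way to picture the split described in point 4, with the table format owning only index reading and writing while each engine owns the building logic, is the sketch below. None of these names come from Iceberg: IndexStore, InMemoryIndexStore, IndexBuilder, MinMaxBuilder and build_index are hypothetical, and the min/max builder is just a stand-in for some skipping index.

```python
import struct
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional, Tuple


class IndexStore(ABC):
    """Hypothetical table-format side: stores and returns opaque index bytes."""

    @abstractmethod
    def write_index(self, data_file: str, column: str,
                    index_type: str, content: bytes) -> None: ...

    @abstractmethod
    def read_index(self, data_file: str, column: str,
                   index_type: str) -> Optional[bytes]: ...


class InMemoryIndexStore(IndexStore):
    """Toy store used only to make the sketch runnable."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str, str], bytes] = {}

    def write_index(self, data_file, column, index_type, content):
        self._entries[(data_file, column, index_type)] = content

    def read_index(self, data_file, column, index_type):
        return self._entries.get((data_file, column, index_type))


class IndexBuilder(ABC):
    """Hypothetical engine side: turns column values into index bytes."""
    index_type: str

    @abstractmethod
    def build(self, values: Iterable[int]) -> bytes: ...


class MinMaxBuilder(IndexBuilder):
    """Illustrative skipping index: min and max as two little-endian int64s."""
    index_type = "minmax"

    def build(self, values):
        values = list(values)  # raises ValueError below if empty
        return struct.pack("<qq", min(values), max(values))


def build_index(store: IndexStore, builder: IndexBuilder,
                data_file: str, column: str, values: Iterable[int]) -> None:
    """The engine computes the bytes; the store never inspects them."""
    store.write_index(data_file, column, builder.index_type,
                      builder.build(values))
```

With this shape, swapping in a different compute engine or index type means supplying another IndexBuilder; nothing behind the store interface has to change.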
>
Index support API for DSv2 in Spark 3.x code base.
>>
>> Design doc:
>> https://docs.google.com/document/d/1qnq1X08Zb4NjCm4Nl_XYjAofwUgXUB03WDLM61B3a_8/edit
>>
>> PR should have been merged.
>>
>> Guy from IBM did a partial PoC and provided a private doc. I
>
> We can continue the discussion and break down the big tasks into
> tickets.
>
>
>
> Thanks!
>
>
>
> Miao
>
> *From: *Ryan Blue
> *Date: *Tuesday, January 25, 2022 at 5:08 PM
> *To: *Iceberg Dev List
> *Subject: *Re: Continuing the Secondary
did a partial PoC and provided a private doc. I will ask if he can
make it public.
We can continue the discussion and break down the big tasks into tickets.
Thanks!
Miao
From: Ryan Blue
Date: Tuesday, January 25, 2022 at 5:08 PM
To: Iceberg Dev List
Subject: Re: Continuing the Secondary
Thanks for raising this for discussion, Jack! It would be great to start
adding more indexes.
> Scope of native index support
The way I think about it, the biggest challenge here is how to know when
you can use an index. For example, if you have a partition index that is up
to date as of snapshot
Hi everyone,
Based on the conversation in the last community sync and the Iceberg Slack
channel, it seems like multiple parties have interest in continuing the
effort related to the secondary index in Iceberg, so I would like to
restart the thread to continue the discussion.
So far most people re