>>>>>>> how to invoke these methods (in a distributed or single-threaded
>>>>>>> manner, async or sync).
>>>>>>> Take our use case as an example: we plan to have a new DDL syntax
>>>>>>> "create index id_1 on table col_1 using bloom"/"update
will invoke the index-related method provided by
>>>>>> Iceberg.
>>>>>>
>>>>>> Storage): Does the index data have to be a file? Wondering if we want
>>>>>> to design the index data storage interface in such way that people c
>>>>>>
>>>>>> I still remember some conclusions from previous discussions.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 1). Index types support: We planned to support Skipping Index first.
which reduces index reading overhead. The index file can be applied when
>>>>> generating the scan task.
>>>>>
>>>>>
>>>>>
>>>>> 2). As Ryan mentioned, Sequence number will be used to indicate
>>>>> whether an index is valid. Sequence
format which includes
>>>> Column Name/ID, Index Type (String), Index content length, and binary
>>>> content. It is not necessary to use Parquet to store the index. The initial
>>>> thought was one data file mapping to one index file. It can be merged to 1 partition
>>>
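To make the index file format discussed above concrete, here is a minimal sketch of one index entry carrying a column ID, an index type string, a content length, and opaque binary content. The byte layout (little-endian, a 2-byte type length, a 4-byte content length) and the names `IndexEntry`, `encode_entry`, and `decode_entry` are assumptions for illustration only; nothing in this thread fixes the actual encoding.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# Hypothetical header layout: int32 column ID followed by uint16 type length.
_HEADER = "<iH"
_CONTENT_LEN = "<I"


@dataclass(frozen=True)
class IndexEntry:
    column_id: int    # column the index was built on
    index_type: str   # e.g. "bloom"; stored as a UTF-8 string
    content: bytes    # opaque index payload, interpreted per index type


def encode_entry(entry: IndexEntry) -> bytes:
    """Serialize one entry: header, type string, content length, content."""
    type_bytes = entry.index_type.encode("utf-8")
    header = struct.pack(_HEADER, entry.column_id, len(type_bytes))
    length = struct.pack(_CONTENT_LEN, len(entry.content))
    return header + type_bytes + length + entry.content


def decode_entry(buf: bytes, offset: int = 0) -> Tuple[IndexEntry, int]:
    """Read one entry starting at offset; return it and the next offset."""
    column_id, type_len = struct.unpack_from(_HEADER, buf, offset)
    offset += struct.calcsize(_HEADER)
    index_type = buf[offset:offset + type_len].decode("utf-8")
    offset += type_len
    (content_len,) = struct.unpack_from(_CONTENT_LEN, buf, offset)
    offset += struct.calcsize(_CONTENT_LEN)
    content = buf[offset:offset + content_len]
    return IndexEntry(column_id, index_type, content), offset + content_len
```

Because each entry records its own lengths, several entries can be concatenated in one file and read back in sequence, which is one way per-data-file entries could later be merged into a larger index file.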
>>> Misc:
>>> 4). How to build index: We want to keep the index reading and writing
>>> interface with Iceberg and leave the actual building logic engine-specific
>>> (i.e., we can use different compute engines to build the index without
>>> changing anything inside Iceberg).
>>>
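The split described in item 4 could look roughly like the sketch below: the table format owns a small read/write interface, and the engine supplies the build logic. Every name here (`IndexStore`, `InMemoryIndexStore`, `build_minmax_index`, `can_skip_file`) is hypothetical and is not an Iceberg API. A min/max summary stands in for a real skipping index such as bloom, purely to keep the example short.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional, Tuple


class IndexStore(ABC):
    """Hypothetical read/write interface that the table format would own."""

    @abstractmethod
    def write_index(self, data_file: str, index_type: str, content: bytes) -> None:
        ...

    @abstractmethod
    def read_index(self, data_file: str, index_type: str) -> Optional[bytes]:
        ...


class InMemoryIndexStore(IndexStore):
    """Toy implementation; a real one would persist index files."""

    def __init__(self) -> None:
        self._blobs: Dict[Tuple[str, str], bytes] = {}

    def write_index(self, data_file: str, index_type: str, content: bytes) -> None:
        self._blobs[(data_file, index_type)] = content

    def read_index(self, data_file: str, index_type: str) -> Optional[bytes]:
        return self._blobs.get((data_file, index_type))


def build_minmax_index(values: Iterable[int]) -> bytes:
    """Engine-side build logic (assumed): summarize a column as 'min,max'."""
    vals = list(values)
    return f"{min(vals)},{max(vals)}".encode("utf-8")


def can_skip_file(store: IndexStore, data_file: str, wanted: int) -> bool:
    """Scan-planning check: skip a file only if its index rules the value out."""
    blob = store.read_index(data_file, "minmax")
    if blob is None:
        return False  # no index available, so the file must be scanned
    lo, hi = (int(x) for x in blob.decode("utf-8").split(","))
    return wanted < lo or wanted > hi
```

In this arrangement a different engine could replace `build_minmax_index` with its own distributed job, while the store interface and the skip check at scan planning time stay unchanged.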
>
Index support API for DSv2 in Spark 3.x code base.
>>
>> Design doc:
>> https://docs.google.com/document/d/1qnq1X08Zb4NjCm4Nl_XYjAofwUgXUB03WDLM61B3a_8/edit
>>
>> PR should have been merged.
>>
>> A guy from IBM did a partial PoC and provided a private doc. I will ask if he can
>> make it public.
>
> We can continue the discussion and break the big tasks down into
> tickets.
>
>
>
> Thanks!
>
>
>
> Miao
>
> *From: *Ryan Blue
> *Date: *Tuesday, January 25, 2022 at 5:08 PM
> *To: *Iceberg Dev List
> *Subject: *Re: Continuing the Secondary
Thanks for raising this for discussion, Jack! It would be great to start
adding more indexes.
> Scope of native index support
The way I think about it, the biggest challenge here is how to know when
you can use an index. For example, if you have a partition index that is up
to date as of snapshot