Re: Ad-hoc partition bucketing

Joseph Allemandou Thu, 06 Jul 2023 23:26:51 -0700

Thank you Manu and Russell for your answers,

Is there any document, ticket, or commit where I could find information or an
example of how the partition transforms are implemented, and which parts of
the codebase a new one would need to touch?
Many thanks again :)
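For reference, the Iceberg table spec defines the existing `bucket[N]` transform as a 32-bit Murmur3 hash (x86 variant, seed 0) of the value, masked to a non-negative int and taken modulo N. Int and long values are hashed as 8-byte little-endian longs. The minimal Python sketch below reproduces that definition, so a custom bucketing function can be compared against it. It is not Iceberg's own code. The `TRANSFORMS` registry and `resolve` helper are hypothetical and are not an Iceberg API. They only illustrate what referencing a user-supplied function by name at table definition might look like.

```python
import re
from typing import Callable, Dict


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as a signed 32-bit int (like Java)."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    def mix_k(k: int) -> int:
        k = (k * c1) & mask
        k = rotl(k, 15)
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        # Each 4-byte block is read little-endian and mixed into the state.
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        h ^= mix_k(k)
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Remaining 1-3 bytes, little-endian.
    tail = data[4 * n_blocks:]
    if tail:
        h ^= mix_k(int.from_bytes(tail, "little"))

    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def bucket_long(value: int, num_buckets: int) -> int:
    """bucket[N] for int/long values: hash the value as an 8-byte little-endian long."""
    h = murmur3_32(value.to_bytes(8, "little", signed=True))
    # Mask to a non-negative int before the modulo, as the spec describes.
    return (h & 0x7FFFFFFF) % num_buckets


# Hypothetical registry (not an Iceberg API): transform name -> factory(param) -> function.
TRANSFORMS: Dict[str, Callable[[int], Callable[[int], int]]] = {
    "bucket": lambda n: (lambda v: bucket_long(v, n)),
}


def resolve(spec: str) -> Callable[[int], int]:
    """Parse a transform string such as "bucket[16]" and look it up in TRANSFORMS."""
    m = re.fullmatch(r"(\w+)\[(\d+)\]", spec)
    if m is None or m.group(1) not in TRANSFORMS:
        raise ValueError(f"unknown transform: {spec}")
    return TRANSFORMS[m.group(1)](int(m.group(2)))
```

With this sketch, `resolve("bucket[16]")(34)` returns 3. The reason is that the Murmur3 hash of the long 34 is 2017239379, the value the spec lists for int 34, and 2017239379 mod 16 = 3. In this sketch, a custom function would be added the same way, under a hypothetical name such as `my_bucket`.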


Joseph

On Wed, Jul 5, 2023 at 1:43 PM <[email protected]> wrote:

> We have been discussing something like this as well, either an arbitrary
> partitioning scheme or just a more extensive and customizable transform.
>
> An example I’m interested in is a geohash index where we store offsets on
> a large grid to denote partitions. The total offset file for the whole
> planet still ends up in only the low megabytes while accounting for high
> density in cities and low density over oceans.
>
> On Jul 4, 2023, at 8:08 AM, Joseph Allemandou <[email protected]>
> wrote:
>
> Hi Iceberg team,
>
> I'm working at the Wikimedia Foundation, and we have started using Iceberg
> for some of our big-data tables - we love it :)
>
> One need we'll have in the future is to partition data using a specific
> bucketing function.
> How complex would it be to add a new function to the ones already present
> in the Iceberg partitioning mechanism? Are there any docs on doing that?
> Bonus points: Are there any plans to make it possible for users to
> reference their own bucketing functions at table definition?
>
> Many thanks for the awesome project <3
>
> --
> Joseph Allemandou (joal) (he / him)
> Staff Data Engineer
> Wikimedia Foundation
>
>
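The geohash grid described in the July 5 reply is not spelled out in detail, so the Python sketch below is only one possible reading of it. It uses standard geohash encoding (interleaved longitude/latitude bits, base32 alphabet). It adds a hypothetical `prefix_table` that assigns partition ids to geohash prefixes of varying length: short prefixes over sparse areas such as oceans, and longer ones in dense cities. Each point goes to its longest matching prefix. `partition_for` and the table contents are illustrative assumptions, not the scheme from the reply.

```python
from typing import Dict, Optional

# Standard geohash base32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Encode a point as a geohash by interleaving longitude and latitude bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # a geohash starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def partition_for(lat: float, lon: float, prefix_table: Dict[str, int],
                  max_precision: int = 8) -> Optional[int]:
    """Hypothetical: partition id of the longest geohash prefix present in the table."""
    gh = geohash_encode(lat, lon, max_precision)
    for length in range(len(gh), 0, -1):
        pid = prefix_table.get(gh[:length])
        if pid is not None:
            return pid
    return None  # no covering cell in the table
```

With `{"u": 0, "u4": 1, "u4pr": 2, "ez": 3}` as the table, the point (57.64911, 10.40744) encodes to a hash starting with `u4pr` and lands in partition 2. The point (0.0, 0.0) matches no prefix and returns `None`. In this sketch, variable-length prefixes are what keep the table small: one short entry covers a large sparse cell, while dense areas get many longer entries.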

