Re: Creating Spark buckets that Presto / Athena / Hive can leverage

Gourav Sengupta Sun, 16 Jun 2019 22:28:33 -0700

Hi Daniel,

I'm not quite sure about this, but does the Glue Data Catalog support bucketing yet?
You might want to find that out first.



Regards,
Gourav

On Sat, Jun 15, 2019 at 1:30 PM Daniel Mateus Pires <dmate...@gmail.com>
wrote:

> Hi there!
>
> I am trying to optimize joins on data created by Spark, so I'd like to
> bucket the data to avoid shuffling.
>
> I write to immutable partitions every day by writing the data to a local
> HDFS and then copying it to S3. Is there a combination of bucketBy
> options and DDL that I can use so that Presto/Athena JOINs leverage the
> special layout of the data?
>
> e.g.
> CREATE EXTERNAL TABLE ... (on Presto/Athena)
> df.write.bucketBy(...).partitionBy(...) ... (in Spark)
> then copy this data to S3 with s3-dist-cp
> then MSCK REPAIR TABLE (on Presto/Athena)
>
> Daniel
>
>
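
A rough sketch of the workflow Daniel describes, in Python. Everything here is illustrative: the helper names `write_bucketed` and `bucketed_table_ddl`, the table name, the columns, and the paths are hypothetical. Two points are worth keeping in mind. Spark's `bucketBy` only works together with `saveAsTable`, not with a plain `save()`. Spark also hashes bucket keys differently from Hive, so a `CLUSTERED BY` clause alone does not make Spark's files valid Hive buckets. The `bucketing_format` table property in the usage example is an assumption to check against the documentation of the Athena or Presto version in use; it may not be available there.

```python
def write_bucketed(df, table, bucket_col, num_buckets, partition_col, path):
    """Spark side: write partitioned, bucketed Parquet files.

    `df` is expected to be a Spark DataFrame. bucketBy requires
    saveAsTable; the "path" option makes Spark write the files to an
    explicit location (e.g. the local HDFS) instead of the warehouse dir.
    """
    (df.write
       .partitionBy(partition_col)
       .bucketBy(num_buckets, bucket_col)
       .sortBy(bucket_col)
       .format("parquet")
       .option("path", path)
       .saveAsTable(table))


def bucketed_table_ddl(table, columns, partition_cols, bucket_col,
                       num_buckets, location, tblproperties=None):
    """Presto/Athena side: build a Hive-style CREATE EXTERNAL TABLE statement.

    `columns` and `partition_cols` are lists of (name, type) pairs.
    `tblproperties` is an optional dict of table properties.
    """
    cols = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    parts = ", ".join(f"`{name}` {dtype}" for name, dtype in partition_cols)
    lines = [
        f"CREATE EXTERNAL TABLE {table} (",
        cols,
        ")",
        f"PARTITIONED BY ({parts})",
        f"CLUSTERED BY (`{bucket_col}`) INTO {num_buckets} BUCKETS",
        "STORED AS PARQUET",
        f"LOCATION '{location}'",
    ]
    if tblproperties:
        props = ", ".join(f"'{k}' = '{v}'" for k, v in tblproperties.items())
        lines.append(f"TBLPROPERTIES ({props})")
    return "\n".join(lines)


# Hypothetical table; the bucket count must match the one passed to bucketBy.
ddl = bucketed_table_ddl(
    table="analytics.events",
    columns=[("user_id", "bigint"), ("amount", "double")],
    partition_cols=[("dt", "string")],
    bucket_col="user_id",
    num_buckets=16,
    location="s3://example-bucket/events/",
    # Assumption: the engine understands this property; verify before use.
    tblproperties={"bucketing_format": "spark"},
)
print(ddl)
```

After the files are copied with s3-dist-cp, the generated statement would be run on Presto/Athena, followed by MSCK REPAIR TABLE to register the daily partitions. Whether the engine then actually skips the shuffle for JOINs depends on its support for Spark's bucketing, which is the open question in this thread.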

Re: Creating Spark buckets that Presto / Athena / Hive can leverage
