date:20201231

Re: Question on bucketing vs sorting

2020-12-31 Thread Peyman Mohajerian

So there are Hive partitions, which are at-rest partitioning, versus Spark partitioning; make sure you're not confusing the two. If the cardinality of the column you want to bucket by isn't too high and you don't have data skew with respect to the buckets, then you should use it (and each partition h
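
One rough way to check the skew condition mentioned above before committing to bucketing is to hash a sample of the candidate column into the planned number of buckets and compare the largest bucket with the average. The sketch below is only an illustration in plain Python: `bucket_sizes` and `skew_ratio` are hypothetical helper names, and `zlib.crc32` is a stand-in hash chosen for determinism, not the hash Spark itself uses for bucketing.

```python
import zlib
from collections import Counter


def bucket_sizes(keys, num_buckets):
    """Count how many keys land in each bucket under a stand-in hash.

    Assumption: crc32 is used only for illustration; it is not Spark's
    bucketing hash, so exact bucket assignments will differ in practice.
    """
    counts = Counter(
        zlib.crc32(str(k).encode("utf-8")) % num_buckets for k in keys
    )
    return [counts.get(b, 0) for b in range(num_buckets)]


def skew_ratio(sizes):
    """Return largest bucket size divided by the mean bucket size."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Hypothetical sample: one hot key dominates the column.
sample_keys = ["x"] * 90 + [f"k{i}" for i in range(10)]
sizes = bucket_sizes(sample_keys, num_buckets=4)
ratio = skew_ratio(sizes)
```

A ratio close to 1 suggests the buckets would be roughly even; a ratio well above 1, as with the hot key in this sample, suggests one bucket would carry most of the data.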

Re: Question on bucketing vs sorting

2020-12-31 Thread Patrik Iselind

Thank you Peyman for clarifying this for me. Would you say there's a case for using bucketing in this case at all, or should I simply focus completely on the sorting solution? If so, when would you say bucketing is the preferred solution? Patrik Iselind On Thu, Dec 31, 2020 at 4:15 PM Peyman Moh

Apache Spark is left out by Airbus

2020-12-31 Thread LInda hackkanan

Looking at the Big Picture https://backbutton.co.uk/about.html This guy gives his reasons for choosing Flink over Spark. https://youtu.be/sYlbD_OoHhs Airbus makes more of the sky with Flink - Jesse Anderson & Hassene Ben Salem Is he leading people up the wrong garden path by making a

Re: Question on bucketing vs sorting

2020-12-31 Thread Peyman Mohajerian

You can save your data to HDFS or other targets using either a sorted or a bucketed DataFrame. In the case of bucketing, you will have a different data-skipping mechanism when you read the data back compared to the sorted version. On Thu, Dec 31, 2020 at 5:40 AM Patrik Iselind wrote: > Hi everyone,
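
To make the difference between the two data-skipping mechanisms concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not Spark's implementation: "files" are in-memory lists, `zlib.crc32` stands in for the real bucketing hash, and the per-file `min`/`max` values mimic the kind of statistics a columnar format can keep for sorted data. All function names and the bucket count are hypothetical.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_of(key, num_buckets=NUM_BUCKETS):
    # Stand-in hash for illustration; not the hash Spark uses.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    """Place each (key, value) row in the file chosen by hashing its key."""
    files = [[] for _ in range(num_buckets)]
    for key, value in rows:
        files[bucket_of(key, num_buckets)].append((key, value))
    return files


def write_sorted(rows, rows_per_file=2):
    """Sort rows by key, split them into files, and record min/max keys."""
    ordered = sorted(rows)
    chunks = [
        ordered[i:i + rows_per_file]
        for i in range(0, len(ordered), rows_per_file)
    ]
    return [{"min": c[0][0], "max": c[-1][0], "rows": c} for c in chunks]


def read_bucketed(files, key):
    """Equality lookup: only the bucket the key hashes to is scanned."""
    candidates = files[bucket_of(key, len(files))]
    return [v for k, v in candidates if k == key], 1


def read_sorted(files, lo, hi):
    """Range lookup: skip files whose [min, max] cannot overlap [lo, hi]."""
    scanned = [f for f in files if f["max"] >= lo and f["min"] <= hi]
    hits = [v for f in scanned for k, v in f["rows"] if lo <= k <= hi]
    return hits, len(scanned)


rows = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("f", 6)]
bucketed_hits, bucketed_scanned = read_bucketed(write_bucketed(rows), "c")
sorted_hits, sorted_scanned = read_sorted(write_sorted(rows), "c", "d")
```

In this model, bucketing skips data by hash, so it helps equality lookups and joins on the bucketed key, while sorting skips data by value range, so it also helps range filters.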

Question on bucketing vs sorting

2020-12-31 Thread Patrik Iselind

Hi everyone, I am trying to push my understanding of bucketing vs sorting. I hope I can get some clarification from this list. Bucketing, as I've come to understand it, is primarily intended for preparing the DataFrame for join operations, where the goal is to get data that will be joined toget

About The Big Picture

2020-12-31 Thread LInda hackkanan

Holden Karau https://www.amazon.co.uk/High-Performance-Spark-Practices-Optimizing-ebook/dp/B0725YT69J made the same point earlier (can be found in the archives) as seen in the Big Picture https://backbutton.co.uk/about.html when she said Apache Spark and Apache Flink are not "enemies",

6 matches