Re: Support for Hive buckets

2018-12-02 Thread welder404
https://issues.apache.org/jira/browse/SPARK-19256 is an active umbrella feature. But as of 2.2, you can invoke APIs on DataFrames today to bucketize them on serialization using Hive. If you invoke val bucketCount = 100 df1 .repartition(bucketCount, col("a"), col("b")) .bucketBy(bucketCount, "a"

Re: Support for Hive buckets

2014-12-24 Thread tanejagagan
billion rows -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Support-for-Hive-buckets-tp8421p9905.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To

Re: Support for Hive buckets

2014-09-22 Thread Michael Armbrust
Hi Cody, There are currently no concrete plans for adding buckets to Spark SQL, but thats mostly due to lack of resources / demand for this feature. Adding full support is probably a fair amount of work since you'd have to make changes throughout parsing/optimization/execution. That said, there

Support for Hive buckets

2014-09-14 Thread Cody Koeninger
I noticed that the release notes for 1.1.0 said that spark doesn't support Hive buckets "yet". I didn't notice any jira issues related to adding support. Broadly speaking, what would be involved in supporting buckets, especially the bucketmapjoin and sortedmerge optimizations?