Hi All,

First off, apologies if this is not the correct place to ask this!

I've been following SPARK-19256
<https://issues.apache.org/jira/browse/SPARK-19256> (Hive Bucketing
Support) with interest for some time now as we do a relatively large amount
of our data processing in Spark but use Hive for business analytics.  As a
result we end up writing a non-trivial amount of data out twice: once in
Parquet optimized for Spark and once in ORC optimized for Hive!
The hope is that SPARK-19256 will put an end to this.

I've noticed that there's a PR (https://github.com/apache/spark/pull/19001)
that's been open for almost a year now, with the last comment being over a
month ago.  Does anyone know if I should remain hopeful that this support
will be added in the near future, or is it one of those things that's
realistically going to be some distance off?

thanks,

Chris
