Hi All,

First off, apologies if this is not the correct place to ask! I've been following SPARK-19256 <https://issues.apache.org/jira/browse/SPARK-19256> (Hive Bucketing Support) with interest for some time now, as we do a relatively large amount of our data processing in Spark but use Hive for business analytics. As a result we end up writing a non-trivial amount of data out twice: once in Parquet optimized for Spark, and once in ORC optimized for Hive (rough sketch of the duplicate write at the end of this mail). The hope is that SPARK-19256 will put an end to this.

I've noticed there's a PR (https://github.com/apache/spark/pull/19001) that has been open for almost a year now, with the last comment being over a month ago. Does anyone know whether I should remain hopeful that this support will be added in the near future, or is it one of those things that's realistically going to be some distance off?

Thanks,
Chris
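
P.S. For context, our current duplicate write looks roughly like this. It's a minimal sketch only; the app name, table, and path names are illustrative, not our real ones:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder()
      .appName("dual-write-example")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source; in reality this is the output of our processing job.
    val events: DataFrame = spark.table("staging.events")

    // Write 1: Parquet, laid out for downstream Spark jobs.
    events.write
      .mode("overwrite")
      .parquet("/warehouse/spark/events")

    // Write 2: the same data again as ORC, registered for Hive analytics.
    // This is the write we're hoping SPARK-19256 lets us drop.
    events.write
      .mode("overwrite")
      .format("orc")
      .saveAsTable("analytics.events")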
first off apologies if this is not the correct place to ask this! I've been following SPARK-19256 <https://issues.apache.org/jira/browse/SPARK-19256> (Hive Bucketing Support) with interest for some time now as we do a relatively large amount of our data processing in Spark but use Hive for business analytics. As a result we end up writing a non-trivial amount of data out twice; once in parquet optimized for Spark and once in once in orc optimized for Hive! The hope is that SPARK-19256 will put an end to this. I've noticed that there a PR (https://github.com/apache/spark/pull/19001) that's been open for almost a year now, with the last comment being over a month ago. Does anyone know if I should remain hopeful that this support will be added in the near future or is it one of those things that's realistically going to be some distance off. thanks, Chris