Re: Scaling partitioned Hive table support

2016-08-09 Thread Michael Allman
Hi Eric, I've rebased my first patch to master and created a Jira issue for tracking: https://issues.apache.org/jira/browse/SPARK-16980 . As mentioned in the issue, I will open a PR for discussion and design review, and include you in the conv

Re: Scaling partitioned Hive table support

2016-08-08 Thread Michael Allman
Hi Eric, Thanks for your feedback. I'm rebasing my code for the first approach on a more recent Spark master and am resolving some conflicts. I'll have a better understanding of the relationship to your PR once my rebase is complete. Cheers, Michael > On Aug 8, 2016, at 12:51 PM, Eric Liang

Re: Scaling partitioned Hive table support

2016-08-08 Thread Eric Liang
I like the former approach -- it seems more generally applicable to other catalogs and IIUC would let you defer pruning until execution time. Pruning is work that should be done by the catalog anyway, as is the case when querying over an (unconverted) hive table. You might also want to look at ht
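The sketch below contrasts driver-side pruning (fetch every partition, then filter) with catalog-side pruning (pass the predicate to the catalog and get back only matching partitions). It is a toy model and not Spark's or Hive's API. `ToyCatalog`, `list_partitions`, `list_partitions_by_filter` and `make_daily_table` are hypothetical names. The "server-side" filter is simulated in-process, and the only thing measured is how many partition records cross the catalog boundary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical partition record: sorted (column, value) pairs plus a storage location.
Spec = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Partition:
    spec: Spec
    location: str


class ToyCatalog:
    """Hypothetical stand-in for a metastore-backed catalog.

    records_returned counts partition records handed back to the caller,
    a rough proxy for metadata transferred from the metastore.
    """

    def __init__(self, partitions: Dict[str, List[Partition]]):
        self._partitions = partitions
        self.records_returned = 0

    def list_partitions(self, table: str) -> List[Partition]:
        # Eager path: metadata for every partition is returned.
        parts = list(self._partitions[table])
        self.records_returned += len(parts)
        return parts

    def list_partitions_by_filter(
        self, table: str, predicate: Callable[[Dict[str, str]], bool]
    ) -> List[Partition]:
        # Pruning path: the catalog applies the predicate and returns only matches.
        matches = [p for p in self._partitions[table] if predicate(dict(p.spec))]
        self.records_returned += len(matches)
        return matches


def make_daily_table(days: int) -> ToyCatalog:
    # One partition per day, keyed by a hypothetical "ds" column.
    parts = [
        Partition((("ds", f"day{d:03d}"),), f"/warehouse/events/ds=day{d:03d}")
        for d in range(days)
    ]
    return ToyCatalog({"events": parts})


# Driver-side pruning: fetch all partition metadata, then filter locally.
eager = make_daily_table(365)
wanted = [p for p in eager.list_partitions("events") if dict(p.spec)["ds"] == "day007"]

# Catalog-side pruning: push the predicate down to the catalog.
pruned = make_daily_table(365)
pushed = pruned.list_partitions_by_filter("events", lambda spec: spec["ds"] == "day007")

print(eager.records_returned, pruned.records_returned)
```

Both paths select the same partition. The difference is that the eager path pulls all 365 records to do so, while the pushed-down path pulls one. That gap is why pruning belongs in the catalog.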

Scaling partitioned Hive table support

2016-08-08 Thread Michael Allman
Hello, I'd like to propose a modification in the way Hive table partition metadata are loaded and cached. Currently, when a user reads from a partitioned Hive table whose metadata are not cached (and for which Hive table conversion is enabled and supported), all partition metadata are fetched fr
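The following sketch illustrates the cost of the eager behaviour described above, where the first read of an uncached table loads metadata for every partition. It sets that against one alternative: a cache that fetches only the partitions a query asks for. It is a toy model under stated assumptions and not the proposed patch, whose details are cut off here. `ToyMetastore`, `get_all`, `get_some`, `EagerCache` and `LazyCache` are all hypothetical, and the metastore is assumed to support fetching a subset of partitions.

```python
from typing import Dict, Iterable, List


class ToyMetastore:
    """Hypothetical metastore: table -> {partition value: location}; counts fetched records."""

    def __init__(self, tables: Dict[str, Dict[str, str]]):
        self._tables = tables
        self.fetched = 0

    def get_all(self, table: str) -> Dict[str, str]:
        self.fetched += len(self._tables[table])
        return dict(self._tables[table])

    def get_some(self, table: str, values: Iterable[str]) -> Dict[str, str]:
        wanted = set(values)
        found = {v: loc for v, loc in self._tables[table].items() if v in wanted}
        self.fetched += len(found)
        return found


class EagerCache:
    """Eager behaviour: the first read of a table caches every partition."""

    def __init__(self, ms: ToyMetastore):
        self.ms = ms
        self.cache: Dict[str, Dict[str, str]] = {}

    def locations(self, table: str, values: List[str]) -> List[str]:
        if table not in self.cache:
            self.cache[table] = self.ms.get_all(table)
        parts = self.cache[table]
        return sorted(parts[v] for v in values if v in parts)


class LazyCache:
    """Alternative: fetch and cache only partitions not already cached.

    Simplifications: nonexistent partitions are re-queried on every call,
    and cached entries are never invalidated.
    """

    def __init__(self, ms: ToyMetastore):
        self.ms = ms
        self.cache: Dict[str, Dict[str, str]] = {}

    def locations(self, table: str, values: List[str]) -> List[str]:
        parts = self.cache.setdefault(table, {})
        missing = set(values) - set(parts)
        if missing:
            parts.update(self.ms.get_some(table, missing))
        return sorted(parts[v] for v in values if v in parts)


table = {f"p{i:04d}": f"/warehouse/t/part=p{i:04d}" for i in range(10000)}

eager_ms = ToyMetastore({"t": table})
eager = EagerCache(eager_ms)
a1 = eager.locations("t", ["p0001"])
a2 = eager.locations("t", ["p0001", "p0002"])

lazy_ms = ToyMetastore({"t": table})
lazy = LazyCache(lazy_ms)
b1 = lazy.locations("t", ["p0001"])
b2 = lazy.locations("t", ["p0001", "p0002"])

print(eager_ms.fetched, lazy_ms.fetched)
```

Both caches return the same locations. For a 10,000-partition table, however, the eager cache fetches all 10,000 records on first access, while the lazy cache fetches only the two partitions the queries actually touch. A real design would also need an invalidation strategy, which this sketch omits.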