Hi Michael,

It's not publicly available right now, though we can probably chat about it offline. It's not a super novel concept or anything; in fact, I had proposed it a long time ago on the mailing lists.

-Evan

On Mon, Mar 24, 2014 at 1:34 PM, Michael Armbrust <mich...@databricks.com> wrote:
> Hi Evan,
>
> Index support is definitely something we would like to add, and it is
> possible that adding support for your custom indexing solution would not be
> too difficult.
>
> We already push predicates into Hive table scan operators when the
> predicates are over partition keys. You can see an example of how we
> collect filters and decide which can be pushed into the scan using the
> HiveTableScan query planning strategy.
>
> I'd like to know more about your indexing solution. Is this something
> publicly available? One concern here is that the query planning code is not
> considered a public API and so is likely to change quite a bit as we improve
> the optimizer. It's not currently something that we plan to expose for
> external components to modify.
>
> Michael
>
>
> On Sun, Mar 23, 2014 at 11:49 PM, Evan Chan <e...@ooyala.com> wrote:
>> Hi Michael,
>>
>> Congrats, this is really neat!
>>
>> What thoughts do you have regarding adding indexing support and
>> predicate pushdown to this SQL framework? Right now we have custom
>> bitmap indexing to speed up queries, so we're really curious about
>> the architectural direction.
>>
>> -Evan
>>
>>
>> On Fri, Mar 21, 2014 at 11:09 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>>> It would be great if there are any examples or use cases to look at.
>>>
>>> There are examples in the Spark documentation. Patrick posted an updated
>>> copy here so people can see them before 1.0 is released:
>>>
>>> http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
>>>
>>>> Does this feature have different use cases than Shark, or is it cleaner
>>>> now that the Hive dependency is gone?
>>>
>>> Depending on how you use this, there is still a dependency on Hive (by
>>> default this is not the case; see the above documentation for more
>>> details). However, the dependency is on a stock version of Hive instead of
>>> one modified by the AMPLab. Furthermore, Spark SQL has its own optimizer,
>>> instead of relying on the Hive optimizer. Long term, this is going to give
>>> us a lot more flexibility to optimize queries specifically for the Spark
>>> execution engine. We are actively porting over the best parts of Shark
>>> (specifically the in-memory columnar representation).
>>>
>>> Shark still has some features that are missing in Spark SQL, including
>>> SharkServer (and years of testing). Once Spark SQL graduates from Alpha
>>> status, it'll likely become the new backend for Shark.
>>
>> --
>> Evan Chan
>> Staff Engineer
>> e...@ooyala.com

--
Evan Chan
Staff Engineer
e...@ooyala.com
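For readers following along: the two ideas in this thread (collecting filters and pushing the indexable ones into the scan, as Michael describes for partition keys, and Evan's bitmap indexing) can be sketched with a toy example. This is a hypothetical illustration in plain Python, not Spark's or Ooyala's actual code; the column names, data, and function names are all invented:

```python
from collections import defaultdict

# Toy table: in a real system these would be rows in a scanned partition.
rows = [
    {"date": "2014-03-21", "country": "US", "views": 10},
    {"date": "2014-03-21", "country": "CA", "views": 7},
    {"date": "2014-03-22", "country": "US", "views": 3},
]

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to a bitset (an int) of row ids."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index

def scan(rows, index, column, value, residual):
    """Answer `column = value` from the bitmap index, then apply the
    residual (non-pushed) predicates to the surviving candidate rows."""
    bitmap = index.get(value, 0)
    out = []
    for row_id, row in enumerate(rows):
        if (bitmap >> row_id) & 1 and all(row[c] == v for c, v in residual):
            out.append(row)
    return out

country_idx = build_bitmap_index(rows, "country")

# The planner's split: "country = 'US'" is pushed to the index,
# while "views = 10" stays behind as a residual filter.
result = scan(rows, country_idx, "country", "US", [("views", 10)])
```

Here `result` contains only the first row: the bitmap narrows the scan to the two US rows, and the residual filter drops the one with `views = 3`. The real HiveTableScan strategy does the analogous split for partition-key predicates, skipping whole partitions rather than individual rows.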