Hi Michael,

It's not publicly available right now, though we can probably chat about it offline. It's not a super novel concept or anything; in fact, I had proposed it a long time ago on the mailing lists.

-Evan

On Mon, Mar 24, 2014 at 1:34 PM, Michael Armbrust <mich...@databricks.com> wrote:
> Hi Evan,
>
> Index support is definitely something we would like to add, and it is
> possible that adding support for your custom indexing solution would not be
> too difficult.
>
> We already push predicates into Hive table scan operators when the
> predicates are over partition keys. You can see an example of how we
> collect filters and decide which can be pushed into the scan using the
> HiveTableScan query planning strategy.
>
> I'd like to know more about your indexing solution. Is this something
> publicly available? One concern here is that the query planning code is not
> considered a public API and so is likely to change quite a bit as we improve
> the optimizer. It's not currently something that we plan to expose for
> external components to modify.
>
> Michael
>
>
> On Sun, Mar 23, 2014 at 11:49 PM, Evan Chan <e...@ooyala.com> wrote:
>> Hi Michael,
>>
>> Congrats, this is really neat!
>>
>> What thoughts do you have regarding adding indexing support and
>> predicate pushdown to this SQL framework? Right now we have custom
>> bitmap indexing to speed up queries, so we're really curious about
>> the architectural direction.
>>
>> -Evan
>>
>>
>> On Fri, Mar 21, 2014 at 11:09 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>>> It would be great if there are any examples or use cases to look at.
>>>
>>> There are examples in the Spark documentation. Patrick posted an updated
>>> copy here so people can see them before 1.0 is released:
>>>
>>> http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
>>>
>>>> Does this feature have different use cases than Shark, or is it cleaner
>>>> now that the Hive dependency is gone?
>>>
>>> Depending on how you use this, there is still a dependency on Hive (by
>>> default this is not the case; see the above documentation for more
>>> details). However, the dependency is on a stock version of Hive instead of
>>> one modified by the AMPLab. Furthermore, Spark SQL has its own optimizer,
>>> instead of relying on the Hive optimizer. Long term, this is going to give
>>> us a lot more flexibility to optimize queries specifically for the Spark
>>> execution engine. We are actively porting over the best parts of Shark
>>> (specifically the in-memory columnar representation).
>>>
>>> Shark still has some features that are missing in Spark SQL, including
>>> SharkServer (and years of testing). Once Spark SQL graduates from Alpha
>>> status, it'll likely become the new backend for Shark.
>>
>> --
>> Evan Chan
>> Staff Engineer
>> e...@ooyala.com

--
Evan Chan
Staff Engineer
e...@ooyala.com
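For readers following along: the two ideas in this thread (collecting filters and pushing the indexable ones into the scan, as Michael describes for partition keys, and Evan's bitmap indexing) can be sketched with a toy example. This is a hypothetical illustration in plain Python, not Spark's or Ooyala's actual code; the column names, data, and function names are all invented:

```python
from collections import defaultdict

# Toy table: in a real system these would be rows in a scanned partition.
rows = [
    {"date": "2014-03-21", "country": "US", "views": 10},
    {"date": "2014-03-21", "country": "CA", "views": 7},
    {"date": "2014-03-22", "country": "US", "views": 3},
]

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to a bitset (an int) of row ids."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return index

def scan(rows, index, column, value, residual):
    """Answer `column = value` from the bitmap index, then apply the
    residual (non-pushed) predicates to the surviving candidate rows."""
    bitmap = index.get(value, 0)
    out = []
    for row_id, row in enumerate(rows):
        if (bitmap >> row_id) & 1 and all(row[c] == v for c, v in residual):
            out.append(row)
    return out

country_idx = build_bitmap_index(rows, "country")

# The planner's split: "country = 'US'" is pushed to the index,
# while "views = 10" stays behind as a residual filter.
result = scan(rows, country_idx, "country", "US", [("views", 10)])
```

Here `result` contains only the first row: the bitmap narrows the scan to the two US rows, and the residual filter drops the one with `views = 3`. The real HiveTableScan strategy does the analogous split for partition-key predicates, skipping whole partitions rather than individual rows.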