Hi, I have just posted a blog post on this: https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
regards,
Harish Butani.

On Tue, Sep 1, 2015 at 11:46 PM, Paolo Platter <paolo.plat...@agilelab.it> wrote:

> Fantastic!!! I will look into that and I hope to contribute
>
> Paolo
>
> Sent from my Windows Phone
> ------------------------------
> From: Harish Butani <rhbutani.sp...@gmail.com>
> Sent: 02/09/2015 06:04
> To: user <user@spark.apache.org>
> Subject: Spark + Druid
>
> Hi,
>
> I am working on the Spark Druid Package:
> https://github.com/SparklineData/spark-druid-olap.
> For scenarios where a 'raw event' dataset is indexed in Druid, it lets
> you write your Logical Plans (queries/dataflows) against the 'raw event'
> dataset, and it rewrites parts of the plan to execute as a Druid query.
> In Spark, configuring a Druid DataSource is somewhat like configuring an
> OLAP index in a traditional DB. Early results show significant speedups
> from pushing slice-and-dice queries to Druid.
>
> It consists of a Druid DataSource that wraps the 'raw event' dataset and
> has knowledge of the Druid index, and a DruidPlanner, which is a set of
> plan rewrite strategies that convert aggregation queries into a plan
> containing a DruidRDD.
>
> Here
> <https://github.com/SparklineData/spark-druid-olap/blob/master/docs/SparkDruid.pdf>
> is a detailed design document, which also describes a benchmark of
> representative queries on the TPC-H dataset.
>
> Looking for folks who would be willing to try this out and/or contribute.
>
> regards,
> Harish Butani.
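
For anyone who wants to try this out, below is a minimal sketch of what using the package from Spark 1.x SQL might look like, based on the description above. The DataFrameReader and SQL calls are standard Spark API; the format string and the option names (druidHost, druidDatasource) are assumptions for illustration only, not the package's confirmed API — consult the spark-druid-olap README for the actual configuration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DruidOlapSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("druid-olap-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Register a Druid-backed view of the 'raw event' dataset.
    // NOTE: the format string and option names are assumptions for
    // illustration; see the spark-druid-olap README for the real ones.
    val events = sqlContext.read
      .format("org.sparklinedata.druid")       // assumed DataSource name
      .option("druidHost", "localhost")        // assumed: Druid broker host
      .option("druidDatasource", "raw_events") // assumed: Druid datasource
      .load()
    events.registerTempTable("events")

    // A slice-and-dice aggregation: per the description above, the
    // DruidPlanner rewrites qualifying aggregation queries into a plan
    // containing a DruidRDD, so the GROUP BY is served by the Druid index.
    sqlContext.sql(
      """SELECT country, COUNT(*) AS cnt
        |FROM events
        |GROUP BY country
        |ORDER BY cnt DESC""".stripMargin).show()

    sc.stop()
  }
}
```

The point of the pattern is that the query is written against the logical 'raw event' table, not against Druid: if the rewrite does not apply, the plan should still execute as an ordinary Spark aggregation over the wrapped dataset.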