It makes sense only if Elasticsearch has an efficient join operator (actually 
semi-join in this case). I found a document[1] suggesting that you should only 
do joins to nested data sets.

I don’t know what Elasticsearch is capable of, but the place I’d start is to 
hand-write the Elasticsearch query that you’d like to generate. Then work 
backwards and write Calcite RelNodes and transformation rules to generate it.

Julian

[1] 
https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html
 
<https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html>

> On Jan 22, 2017, at 8:15 AM, Lior Perry <[email protected]> wrote:
> 
> Hi
> 
> I'm Newbie to *Calcite*...
> 
> I'd like to enhance the existing elasticsearch adapter to allow some join
> optimization:
> 
> Consider the next query (joins 3 tables):
> 
> Tables:
> customer - 10K docs
> 
> catalog_sales - 1M docks
> 
> inventory - 100K docs
> 
> Query:
> 
> "select "c_customer_sk", "c_first_name", "c_birth_country", "c_birth_year"
> , "cs_sales_price", "inv_item_sk" from "customer_raw"."customer" "customer
> "
> 
>        "join "catalog_sales"."catalog_sales" "catalog_sales" on
> "catalog_sales"."cs_bill_customer_sk" =  "customer"."c_customer_sk"
>        "join "inventory"."inventory" "inventory"  on
> "inventory"."inv_item_sk" = "catalog_sales"."cs_item_sk"
> 
> 
> Now I'd like the execution plan to pushdown the results from the smallest
> table as where clause (filter) parameters to the next joined table:
> 
> Optimzed query:
> 
> "select "c_customer_sk", "c_first_name", "c_birth_country", "c_birth_year"
> , "cs_sales_price", "inv_item_sk" from "customer_raw"."customer" "customer
> "
> 
>        "join "catalog_sales"."catalog_sales" "catalog_sales" on
> "catalog_sales"."cs_bill_customer_sk" =  "customer"."c_customer_sk"
> and "customer"."c_customer_sk" in ("13413","4341343"....)
>        "join "inventory"."inventory" "inventory"  on
> "inventory"."inv_item_sk" = "catalog_sales"."cs_item_sk"  and
> "catalog_sales"."cs_item_sk" in ("1343343","43213413"....)
> 
> 
> Is it possible and is so what is the best way to do this ?
> 
> Thanks so much...
> 
> Lior
> 
> 
> בתאריך 22 בינו׳ 2017 2:50 אחה״צ,‏ "Lior Perry" <[email protected]> כתב:
> 
> Hi
> 
> I'm Newbie to *Calcite*...
> 
> I'd like to enhance the existing elasticsearch adapter to allow some join
> optimization:
> 
> Consider the next query (joins 3 tables):
> 
> Tables:
> customer - 10K docs
> 
> catalog_sales - 1M docks
> 
> inventory - 100K docs
> 
> Query:
> 
> "select "c_customer_sk", "c_first_name", "c_birth_country", "c_birth_year"
> , "cs_sales_price", "inv_item_sk" from "customer_raw"."customer" "customer"
> 
>        "join "catalog_sales"."catalog_sales" "catalog_sales" on
> "catalog_sales"."cs_bill_customer_sk" =  "customer"."c_customer_sk"
>        "join "inventory"."inventory" "inventory"  on
> "inventory"."inv_item_sk" = "catalog_sales"."cs_item_sk"
> 
> 
> Now I'd like the execution plan to pushdown the results from the smallest
> table as where clause (filter) parameters to the next joined table:
> 
> Optimzed query:
> 
> "select "c_customer_sk", "c_first_name", "c_birth_country", "c_birth_year"
> , "cs_sales_price", "inv_item_sk" from "customer_raw"."customer" "customer
> "
> 
>        "join "catalog_sales"."catalog_sales" "catalog_sales" on
> "catalog_sales"."cs_bill_customer_sk" =  "customer"."c_customer_sk"
> and "customer"."c_customer_sk" in ("13413","4341343"....)
>        "join "inventory"."inventory" "inventory"  on
> "inventory"."inv_item_sk" = "catalog_sales"."cs_item_sk"  and
> "catalog_sales"."cs_item_sk" in ("1343343","43213413"....)
> 
> 
> Is it possible and if so what is the best way to do this ?
> 
> Thanks so much...
> 
> Lior

Reply via email to