It makes sense only if Elasticsearch has an efficient join operator (actually semi-join in this case). I found a document[1] suggesting that you should only do joins to nested data sets.
I don’t know what Elasticsearch is capable of, but the place I’d start is to hand-write the Elasticsearch query that you’d like to generate. Then work backwards and write Calcite RelNodes and transformation rules to generate it. Julian [1] https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html <https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html> > On Jan 22, 2017, at 8:15 AM, Lior Perry <[email protected]> wrote: > > Hi > > I'm Newbie to *Calcite*... > > I'd like to enhance the existing elasticsearch adapter to allow some join > optimization: > > Consider the next query (joins 3 tables): > > Tables: > customer - 10K docs > > catalog_sales - 1M docks > > inventory - 100K docs > > Query: > > "select "c_customer_sk", "c_first_name", "c_birth_country", "c_birth_year" > , "cs_sales_price", "inv_item_sk" from "customer_raw"."customer" "customer > " > > "join "catalog_sales"."catalog_sales" "catalog_sales" on > "catalog_sales"."cs_bill_customer_sk" = "customer"."c_customer_sk" > "join "inventory"."inventory" "inventory" on > "inventory"."inv_item_sk" = "catalog_sales"."cs_item_sk" > > > Now I'd like the execution plan to pushdown the results from the smallest > table as where clause (filter) parameters to the next joined table: > > Optimzed query: > > "select "c_customer_sk", "c_first_name", "c_birth_country", "c_birth_year" > , "cs_sales_price", "inv_item_sk" from "customer_raw"."customer" "customer > " > > "join "catalog_sales"."catalog_sales" "catalog_sales" on > "catalog_sales"."cs_bill_customer_sk" = "customer"."c_customer_sk" > and "customer"."c_customer_sk" in ("13413","4341343"....) > "join "inventory"."inventory" "inventory" on > "inventory"."inv_item_sk" = "catalog_sales"."cs_item_sk" and > "catalog_sales"."cs_item_sk" in ("1343343","43213413"....) > > > Is it possible and is so what is the best way to do this ? > > Thanks so much... > > Lior > > > בתאריך 22 בינו׳ 2017 2:50 אחה״צ, "Lior Perry" <[email protected]> כתב: > > Hi > > I'm Newbie to *Calcite*... > > I'd like to enhance the existing elasticsearch adapter to allow some join > optimization: > > Consider the next query (joins 3 tables): > > Tables: > customer - 10K docs > > catalog_sales - 1M docks > > inventory - 100K docs > > Query: > > "select "c_customer_sk", "c_first_name", "c_birth_country", "c_birth_year" > , "cs_sales_price", "inv_item_sk" from "customer_raw"."customer" "customer" > > "join "catalog_sales"."catalog_sales" "catalog_sales" on > "catalog_sales"."cs_bill_customer_sk" = "customer"."c_customer_sk" > "join "inventory"."inventory" "inventory" on > "inventory"."inv_item_sk" = "catalog_sales"."cs_item_sk" > > > Now I'd like the execution plan to pushdown the results from the smallest > table as where clause (filter) parameters to the next joined table: > > Optimzed query: > > "select "c_customer_sk", "c_first_name", "c_birth_country", "c_birth_year" > , "cs_sales_price", "inv_item_sk" from "customer_raw"."customer" "customer > " > > "join "catalog_sales"."catalog_sales" "catalog_sales" on > "catalog_sales"."cs_bill_customer_sk" = "customer"."c_customer_sk" > and "customer"."c_customer_sk" in ("13413","4341343"....) > "join "inventory"."inventory" "inventory" on > "inventory"."inv_item_sk" = "catalog_sales"."cs_item_sk" and > "catalog_sales"."cs_item_sk" in ("1343343","43213413"....) > > > Is it possible and if so what is the best way to do this ? > > Thanks so much... > > Lior
