Dude, please attach the query's execution plan and details about the indexes.
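For reference, the plan can be printed straight from the shell. A minimal sketch, assuming the Spark 1.x sqlContext from the quoted post (the elided SELECT stands for the full query below):

  // Build the query; the "..." stands for the full statement quoted below.
  val result = sqlContext.sql("SELECT dw.DAY_OF_WEEK, dw.HOUR, ...")

  // Spark 1.3+ (DataFrame): prints the parsed, analyzed, optimized and
  // physical plans to stdout.
  result.explain(true)

  // Spark 1.2 (SchemaRDD): the same plans are available via queryExecution.
  println(result.queryExecution)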
2015-03-06 9:07 GMT-03:00 anu <anamika.guo...@gmail.com>:

> I have a query like the one below. Could you give me some pointers on how
> to start optimizing it with respect to Spark SQL?
>
> sqlContext.sql("
>   SELECT dw.DAY_OF_WEEK, dw.HOUR, avg(dw.SDP_USAGE) AS AVG_SDP_USAGE
>   FROM (
>     SELECT sdp.WID, DAY_OF_WEEK, HOUR, SUM(INTERVAL_VALUE) AS SDP_USAGE
>     FROM (
>       SELECT *
>       FROM date_d dd JOIN interval_f intf
>         ON intf.DATE_WID = dd.WID
>       WHERE intf.DATE_WID >= 20141116 AND intf.DATE_WID <= 20141125
>         AND CAST(INTERVAL_END_TIME AS STRING) >= '2014-11-16 00:00:00.000'
>         AND CAST(INTERVAL_END_TIME AS STRING) <= '2014-11-26 00:00:00.000'
>         AND MEAS_WID = 3
>     ) test JOIN sdp_d sdp
>       ON test.SDP_WID = sdp.WID
>     WHERE sdp.UDC_ID = 'SP-1931201848'
>     GROUP BY sdp.WID, DAY_OF_WEEK, HOUR, sdp.UDC_ID
>   ) dw
>   GROUP BY dw.DAY_OF_WEEK, dw.HOUR")
>
> Currently the query takes 15 minutes to execute. The interval_f table
> holds approx. 170 GB of raw data, date_d is about 170 MB, and sdp_d is
> about 490 MB.
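Until the plan is posted, one common first step, given those table sizes, is to make sure the two small dimension tables are broadcast to the executors instead of shuffled with the 170 GB fact table. A hedged sketch, not from the thread; the threshold key and cacheTable are real Spark SQL APIs, but the value is an illustrative assumption sized to cover sdp_d, and in Spark 1.x the threshold only kicks in when table statistics are available:

  // Raise the broadcast-join threshold so date_d (~170 MB) and sdp_d
  // (~490 MB) qualify for a broadcast join. 512 MB is an assumed value.
  sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
    (512L * 1024 * 1024).toString)

  // Cache the small dimension tables so repeated runs reuse them in memory.
  sqlContext.cacheTable("date_d")
  sqlContext.cacheTable("sdp_d")

Whether this actually helps depends on what the physical plan shows, hence the request above.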