Hi Shawn, How long is it taking to run the actual query if you create a temp table or something with the result? the time I mentioned was only for the actual queries run.
How many rows are returned? as I mentioned depends on the dates we give if its 20days then 20 records based on the id. If one year then 365/366 days as per id for each table(we have 7tables and fetch all together). Need to narrow down if it’s the fetch taken a while or the actual query. so to narrow down the actual query, its lil complicated as our data is weather data and based on dates available in actual tables we have to fetch from actual table and to fetch the rest we have forecast table to fetch from. The queries are queried as follows: These are seven tables until the dates available in actual tables: SELECT data FROM db.tbl1 WHERE cell_id = 637448 and time_stamp between '2018-07-01' and '2018-07-20' & SELECT data FROM db.tbl2_deg WHERE cell_id = 2466 and time_stamp between '2018-07-01' and '2018-07-20' & SELECT data FROM db.tbl3 WHERE cell_id = 2466 and time_stamp between '2018-07-01' and '2018-07-20' & SELECT data FROM db.tbl4 WHERE cell_id = 24360 and time_stamp between '2018-07-01' and '2018-07-20' & SELECT data FROM db.tbl5 WHERE cell_id = 24360 and time_stamp between '2018-07-01' and '2018-07-20' & SELECT data FROM db.tbl6 WHERE cell_id = 24360 and time_stamp between '2018-07-01' and '2018-07-20' & SELECT data FROM db.tbl7 WHERE cell_id = 24360 and time_stamp between '2018-07-01' and '2018-07-20' lets say the dates available from actual tables are: {'tbl6': '2018-07-15', 'tbl5': '2018-07-15', 'tbl4': '2018-07-15', 'tbl3': '2018-07-15', 'tbl2': '2018-07-20', 'tbl7': '2018-07-22', 'tbl1': '2018-07-20'} hence the code generates forecast queries based on the dates above: SELECT a[0] FROM db.forecast WHERE cell_id = 9341 AND run_date between '2018-07-23' and '2018-07-20' and init_hour = 0 & SELECT b[0] FROM db.forecast WHERE cell_id = 9341 AND run_date between '2018-07-21' and '2018-07-20' and init_hour = 0 & SELECT c[0] FROM db.forecast WHERE cell_id = 9341 AND run_date between '2018-07-21' and '2018-07-20' and init_hour = 0 & SELECT d[0] FROM db.forecast WHERE cell_id = 9341 AND run_date between '2018-07-16' and '2018-07-20' and init_hour = 0 & SELECT e[0] FROM db.forecast WHERE cell_id = 9341 AND run_date between '2018-07-16' and '2018-07-20' and init_hour = 0 & SELECT f[0] FROM db.forecast WHERE cell_id = 9341 AND run_date between '2018-07-16' and '2018-07-20' and init_hour = 0 & SELECT g [0] FROM db.forecast WHERE cell_id = 9341 AND run_date between '2018-07-16' and '2018-07-20' and init_hour = 0 On Mon, Jul 23, 2018 at 10:07 AM, Shawn Weeks <swe...@weeksconsulting.us> wrote: > How long is it taking to run the actual query if you create a temp table > or something with the result? How many rows are returned? Need to narrow > down if it’s the fetch taken a while or the actual query. > > > > Thanks > > Shawn > > > > *From:* Sowjanya Kakarala <sowja...@agrible.com> > *Sent:* Monday, July 23, 2018 10:01 AM > *To:* user@hive.apache.org > *Subject:* Performance in hive fetch > > > > Hi Guys, > > > > I am trying to fetch the data from hive through python code based on dates > and id. > > > > For fetch of 20days till current day for 7tables together it is taking > 30seconds. > > for fetching for an year worth data for 7tables together it is taking > 3minutes26seconds. > > My tables are stored as orc and transactional set to true. > > > > so our goal is to make the fetch for an year data within a second or > 2seconds. > > > > I have tried it two ways: > > 1. cursor.execute("set hive.support.concurrency=true") > > cursor.execute("set hive.exec.dynamic.partition.mode=nonstrict") > > cursor.execute("SET hive.exec.parallel=true") > > cursor.execute("set tez.grouping.split-count=85") > > > > 2. cursor.execute("set hive.fetch.task.conversion=more") > > > > either way is performing the same, is there any better way for reaching > our goal? > > > > Any help is appreciable. > > > > > > Thanks > > Sowjanya > > >