Can you set the following config:

hive> set hive.fetch.task.conversion=none;

and run your two queries with it? Let's see if this makes a difference. My
expectation is that this will result in an MR job getting launched, and thus
the runtimes might be different.

On Sat, Jan 10, 2015 at 4:54 PM, Abhishek kumar <[email protected]> wrote:

> First I tried running the query: select * from table1 where id = 'value';
> It was very fast, as expected, since HBase returned the results very fast.
> In this case, I observed no map/reduce task getting spawned.
>
> Now, for the query select * from table1 where id > 'zzz', I expected the
> filter pushdown to happen (at least the 0.14 code says so). Since there
> were no results found, HBase should again reply very fast, and thus Hive
> should output the query's result very fast. But this is not happening, and
> from the datanode logs it looks like a lot of reads are happening
> (close to a full table scan of 10 GB of data). I expected the response time
> to be very close to the above query's time.
>
> I will check the number of tasks getting launched.
>
> My questions are:
> * Why was there no filter pushdown (id > 'zzz') happening for this
>   very simple query?
> * Since this query can only be resolved from HBase, will Hive launch map
>   tasks? (Last time, I guess I observed no map task getting launched.)
>
> --
> Abhishek
>
> On Sat, Jan 10, 2015 at 4:14 AM, Ashutosh Chauhan <[email protected]>
> wrote:
>
>> Hi Abhishek,
>>
>> How are you determining that it is resulting in a full table scan? One way
>> to ascertain that the filter got pushed down is to see how many tasks were
>> launched for your query, with and without the filter. One would expect a
>> lower number of splits (and thus tasks) for the query having the filter.
>>
>> Thanks,
>> Ashutosh
>>
>> On Sun, Dec 28, 2014 at 8:38 PM, Abhishek kumar <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I am using Hive 0.14, which runs over HBase (having ~10 GB of data). I am
>>> facing slowness when querying over HBase. My query looks like the
>>> following:
>>>
>>> select * from table1 where id > 'zzzz'; (id is the row-key)
>>>
>>> As per the Hive code, id > 'zzz' is getting pushed to the HBase scanner as
>>> 'startKey'. Now, given that there are no row-keys (id) which satisfy this
>>> criterion, this query should be extremely fast. But Hive is taking a lot of
>>> time; it looks like a full HBase table scan.
>>> Can someone let me know where I am wrong in understanding the whole
>>> thing?
>>>
>>> --
>>> Abhishek
>>>
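
For reference, the comparison suggested at the top of this thread could be run roughly as below. This is only a sketch, assuming the table and column names from the original query (table1, id). The EXPLAIN step and the two ppd settings are additions not mentioned in the thread. Whether the row-key predicate actually appears under the table scan depends on the Hive version and the HBase storage handler, so the comments describe what to look for, not guaranteed output.

```sql
-- Disable fetch-task conversion so each query goes through a MapReduce job.
set hive.fetch.task.conversion=none;

-- Predicate pushdown settings (assumed to be at their usual value of true;
-- set here only to make that assumption explicit).
set hive.optimize.ppd=true;
set hive.optimize.ppd.storage=true;

-- Inspect the plan: check whether the predicate on id is attached to the
-- TableScan operator rather than applied only in a later Filter operator.
explain select * from table1 where id > 'zzz';

-- Run both queries and compare the number of map tasks and the runtimes.
select * from table1 where id = 'value';
select * from table1 where id > 'zzz';

-- Re-enable fetch-task conversion afterwards (for example, with "more").
set hive.fetch.task.conversion=more;
```

If the range query launches about as many map tasks as an unfiltered select * from table1, that would suggest the predicate is not narrowing the HBase scan.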
