This can be a follow-up to HIVE-2925. Navis, if you want, I can work on it.
On 7/29/12 7:58 PM, "Namit Jain" <nj...@fb.com> wrote:

>I like Navis's idea. The timeout can be configurable.
>
>
>On 7/29/12 6:47 AM, "Navis류승우" <navis....@nexr.com> wrote:
>
>>I was thinking of a timeout for fetching, 2000 msec for example. How
>>about that?
>>
>>On Sunday, July 29, 2012, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>> If the where condition is too complex, selecting specific columns
>>> seems simple enough and useful.
>>>
>>> On Saturday, July 28, 2012, Namit Jain <nj...@fb.com> wrote:
>>>> Currently, hive does not launch map-reduce jobs for the following
>>>> queries:
>>>>
>>>> select * from <T> where <condition on partition columns> (limit <n>)?
>>>>
>>>> This behavior is not configurable, and cannot be altered.
>>>>
>>>> HIVE-2925 wants to extend this behavior. The goal is not to spawn
>>>> map-reduce jobs for the following queries:
>>>>
>>>> Select <expr> from <T> where <any condition> (limit <n>)?
>>>>
>>>> It is currently controlled by one parameter,
>>>> hive.aggressive.fetch.task.conversion, which decides whether or not
>>>> to spawn map-reduce jobs for queries of the above type. Note that
>>>> this can be beneficial for certain types of queries, since it avoids
>>>> the expensive step of spawning map-reduce. However, it can be pretty
>>>> expensive for certain types of queries: selecting a very large number
>>>> of rows, the query having a very selective filter (which is satisfied
>>>> by a very small number of rows, and therefore involves scanning a
>>>> very large table), etc. The user does not have any control over this.
>>>> Note that it cannot be done by hooks, since the pre-semantic hooks do
>>>> not have enough information (type of the query, inputs, etc.), and it
>>>> is too late to do anything in the post-semantic hook (the query plan
>>>> has already been altered).
>>>>
>>>> I would like to propose the following configuration parameters to
>>>> control this behavior.
>>>>
>>>> hive.fetch.task.conversion: true, false, auto
>>>>
>>>> If the value is true, then all queries with only selects and filters
>>>> will be converted.
>>>> If the value is false, then no query will be converted.
>>>> If the value is auto (which should be the default behavior), there
>>>> should be additional parameters to control the semantics:
>>>>
>>>> hive.fetch.task.auto.limit.threshold ---> integer value X1
>>>> hive.fetch.task.auto.inputsize.threshold ---> integer value X2
>>>>
>>>> If either the query has a limit lower than X1, or the input size is
>>>> smaller than X2, the queries containing only filters and selects will
>>>> be converted to not use map-reduce jobs.
>>>>
>>>>
>>>> Comments…
>>>>
>>>> -namit
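
To make the proposed semantics concrete, here is a rough sketch in Python (not Hive code) of the decision described in the proposal above. Only the three parameter names come from the thread; the class, the function name, its arguments, and the default threshold values are made up for illustration, and the thread does not say what X1 and X2 should default to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchConversionConf:
    # hive.fetch.task.conversion: "true", "false" or "auto"
    mode: str = "auto"
    # hive.fetch.task.auto.limit.threshold (X1); placeholder default
    limit_threshold: int = 1000
    # hive.fetch.task.auto.inputsize.threshold (X2); placeholder default, in bytes
    input_size_threshold: int = 64 * 1024 * 1024


def should_convert(conf: FetchConversionConf,
                   only_selects_and_filters: bool,
                   limit: Optional[int],
                   input_size: int) -> bool:
    """Return True if the query would skip map-reduce under the proposal."""
    # Only queries made of selects and filters are candidates at all.
    if not only_selects_and_filters:
        return False
    if conf.mode == "true":
        return True
    if conf.mode == "false":
        return False
    if conf.mode == "auto":
        # Convert if EITHER the limit is below X1 OR the input is smaller than X2.
        small_limit = limit is not None and limit < conf.limit_threshold
        small_input = input_size < conf.input_size_threshold
        return small_limit or small_input
    raise ValueError("unknown hive.fetch.task.conversion value: %r" % conf.mode)


conf = FetchConversionConf(mode="auto", limit_threshold=100,
                           input_size_threshold=1000)
# "limit 10" over a huge input is converted because of the limit alone.
print(should_convert(conf, True, 10, 10**9))
# No limit and a huge input is not converted.
print(should_convert(conf, True, None, 10**9))
```

The "or" between the two checks follows the wording of the proposal ("If either ..."); a stricter variant would require both.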