> rogue queries

so this really isn't limited to just hive, is it?  any DBMS has to
contend with this - even malicious rogue queries, for that matter.

timeouts are a cheap way systems handle this - assuming time is a
reasonable proxy for resource usage.  i'm sure beeline or whatever client
you use has a timeout feature.
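
for instance, if your users come in over JDBC, the standard
java.sql Statement.setQueryTimeout hook is one concrete knob.  a minimal
sketch (hs2-host, some_table and the 600-second budget are made-up
placeholders; i believe hive's driver only started honoring this around
2.1, older drivers just throw SQLException):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TimeoutQuery {
        public static void main(String[] args) throws Exception {
            // placeholder HiveServer2 endpoint and credentials
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hs2-host:10000/default", "user", "");
                 Statement stmt = conn.createStatement()) {
                // ask the driver to cancel the query after 10 minutes
                stmt.setQueryTimeout(600);
                try (ResultSet rs = stmt.executeQuery(
                         "SELECT count(*) FROM some_table")) {
                    while (rs.next()) {
                        System.out.println(rs.getLong(1));
                    }
                }
            }
        }
    }

the obvious caveat for your case: time is only a rough stand-in for
space - a query can fill hdfs quickly well inside any timeout.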

maybe one could write a separate service - say a governor - that watches
over YARN (or hdfs, or whatever resource is scarce) and terminates the
offending job once it goes beyond a threshold.  think OOM killer.
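
a rough sketch of what i mean, using the YarnClient java API (the
Governor class name, polling interval and memory-seconds threshold are
all things i just made up; assumes a hadoop recent enough that
ApplicationResourceUsageReport exposes getMemorySeconds()):

    import java.util.EnumSet;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.YarnApplicationState;
    import org.apache.hadoop.yarn.client.api.YarnClient;

    public class Governor {
        // arbitrary threshold: roughly 1 TB of memory held for one hour
        static final long MAX_MEMORY_SECONDS = 1024L * 1024 * 3600;

        public static void main(String[] args) throws Exception {
            YarnClient yarn = YarnClient.createYarnClient();
            yarn.init(new Configuration()); // reads yarn-site.xml from classpath
            yarn.start();
            while (true) {
                for (ApplicationReport app : yarn.getApplications(
                         EnumSet.of(YarnApplicationState.RUNNING))) {
                    long memSecs = app.getApplicationResourceUsageReport()
                                      .getMemorySeconds();
                    if (memSecs > MAX_MEMORY_SECONDS) {
                        System.out.println("killing " + app.getApplicationId()
                            + " (memorySeconds=" + memSecs + ")");
                        yarn.killApplication(app.getApplicationId());
                    }
                }
                Thread.sleep(60_000); // poll once a minute
            }
        }
    }

memory-seconds is a decent proxy since it catches both huge and
long-running jobs, but you'd want to whitelist the mainstream queries
before letting anything like this loose.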

but, yeah, i admittedly don't know of anything out there already that you
can just tap into.  YARN's Resource Manager seems to be the place i'd
start researching.  just look at its name. :)

my unsolicited 2 cents.



On Wed, Aug 31, 2016 at 10:24 PM, ravi teja <raviort...@gmail.com> wrote:

> Thanks Mich,
>
> Unfortunately we have many insert queries.
> Are there any other ways?
>
> Thanks,
> Ravi
>
> On Wed, Aug 31, 2016 at 9:45 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Try this
>>
>> hive.limit.optimize.fetch.max
>>
>>    - Default Value: 50000
>>    - Added In: Hive 0.8.0
>>
>> Maximum number of rows allowed for a smaller subset of data for simple
>> LIMIT, if it is a fetch query. Insert queries are not restricted by this
>> limit.
>>
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 31 August 2016 at 13:42, ravi teja <raviort...@gmail.com> wrote:
>>
>>> Hi Community,
>>>
>>> Many users run adhoc hive queries on our platform.
>>> Some rogue queries managed to fill up the HDFS space, causing
>>> mainstream queries to fail.
>>>
>>> We wanted to limit the data generated by these adhoc queries.
>>> We are aware of the strict param which limits the data being scanned, but
>>> it is of little help as a huge number of user tables aren't partitioned.
>>>
>>> Is there a way we can limit the data generated from hive per query, like
>>> a hive parameter for setting HDFS quotas for the job-level *scratch*
>>> directory, or any other approach?
>>> What's the general approach to guardrail such multi-tenant cases?
>>>
>>> Thanks in advance,
>>> Ravi
>>>
>>
>>
>
