Also, I tried setting the "hive.fetch.task.conversion" property in the JDBC URL, like so: jdbc:hive2://192.168.132.128:10000/default?hive.fetch.task.conversion=none, but Hive still creates mapreduce tasks for the query, so it effectively seems to be ignoring that property.
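
For reference, here's a minimal sketch of how I'm building the connection (host/port, database, and credentials are placeholders, and the per-session SET is just something else I'm trying):

    // Minimal sketch -- host/port, database, and credentials are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class FetchConversionCheck {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // HiveServer2 URL layout:
            //   jdbc:hive2://<host>:<port>/<db>;<session vars>?<hive conf list>#<hive vars>
            // so the property goes in the hive conf list after the '?'.
            String url = "jdbc:hive2://192.168.132.128:10000/default"
                       + "?hive.fetch.task.conversion=none";
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement()) {
                // Fallback I'm also trying: set the property per session after connecting.
                stmt.execute("SET hive.fetch.task.conversion=none");
            }
        }
    }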
On Wed, Aug 19, 2015 at 12:20 PM, Emil Berglind <papasw...@gmail.com> wrote:
> When I run the "SELECT * FROM <tablename>" query it is running as a
> mapreduce job. I can see it in the Yarn Manager and also in the Tez UI.
> This is also when the fetch size is not honored and it tries to basically
> return all results at once. Is there a way to make this work?
>
> On Wed, Aug 19, 2015 at 10:53 AM, Prem Yadav <ipremya...@gmail.com> wrote:
>
>> actually it should be something like
>> getHandleIdentifier()=hfhkjhfjhkjfh-dsdsad-sdsd--dsada: fetchResults()
>>
>> On Wed, Aug 19, 2015 at 3:49 PM, Prem Yadav <ipremya...@gmail.com> wrote:
>>
>>> Hi Emil,
>>> for either of the queries, there will be no mapreduce job. The query
>>> engine understands that in both cases it need not do any computation and
>>> just needs to fetch all the data from the files.
>>>
>>> The fetch size should be honored in both cases. Hope you are using
>>> hiveserver2.
>>> You can try connections using Excel and Cloudera's ODBC driver with the
>>> required parameters for your testing. For each batch that Hive returns,
>>> you should be able to see in the hive log something like: returning
>>> results for id <hash>
>>>
>>> On Wed, Aug 19, 2015 at 2:54 PM, Emil Berglind <papasw...@gmail.com>
>>> wrote:
>>>
>>>> I have a small Java app that I wrote that uses JDBC to run a hive
>>>> query. The Hive table that I'm running it against has 30+ million rows,
>>>> and I want to pull them all back to verify the data. If I run a simple
>>>> "SELECT * FROM <table>" and set a fetch size of 30,000, the fetch size
>>>> is not honored and it seems to want to bring back all 30+ million rows
>>>> at once, which is definitely not going to work. If I set a LIMIT on the
>>>> SQL, like "SELECT * FROM <table> LIMIT 9999999", then it honors the
>>>> fetch size just fine. However, when I set the LIMIT on there, it does
>>>> not run as a mapreduce job but rather seems to stream the data back.
>>>> Is this how it's supposed to work? I'm new to the Hadoop eco-system and
>>>> I'm really just trying to figure out the best way to bring this data
>>>> back in chunks. Maybe I'm going about this all wrong?
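
(For context, the read loop in the Java app from my first message above looks roughly like this; the URL, credentials, table name, and fetch size are placeholders from my testing, so treat it as a sketch rather than the exact code.)

    // Rough sketch of the read loop -- connection details and table name are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveFullScan {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://192.168.132.128:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement()) {
                // Ask the driver to pull rows back in batches of 30,000.
                stmt.setFetchSize(30000);
                long count = 0;
                // The fetch size only seems to be honored when I add "LIMIT 9999999";
                // without it, the driver appears to try to bring everything back at once.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT * FROM my_table LIMIT 9999999")) {
                    while (rs.next()) {
                        count++;   // just counting rows to verify the data comes back
                    }
                }
                System.out.println("rows read: " + count);
            }
        }
    }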