Also, I tried setting the "hive.fetch.task.conversion" property in the JDBC URL, like so: jdbc:hive2://192.168.132.128:10000/default?hive.fetch.task.conversion=none, but Hive still creates mapreduce tasks for the query, so it effectively seems to be ignoring that property.
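
For reference, here's a minimal sketch of how I'm building the connection (host/port, database, and credentials are placeholders, and the per-session SET is just something else I'm trying):

    // Minimal sketch -- host/port, database, and credentials are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class FetchConversionCheck {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // HiveServer2 URL layout:
            //   jdbc:hive2://<host>:<port>/<db>;<session vars>?<hive conf list>#<hive vars>
            // so the property goes in the hive conf list after the '?'.
            String url = "jdbc:hive2://192.168.132.128:10000/default"
                       + "?hive.fetch.task.conversion=none";
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement()) {
                // Fallback I'm also trying: set the property per session after connecting.
                stmt.execute("SET hive.fetch.task.conversion=none");
            }
        }
    }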
On Wed, Aug 19, 2015 at 12:20 PM, Emil Berglind <papasw...@gmail.com> wrote:
> When I run the "SELECT * FROM <tablename>" query it is running as a
> mapreduce job. I can see it in the Yarn Manager and also in the Tez UI.
> This is also when the fetch size is not honored and it tries to basically
> return all results at once. Is there a way to make this work?
>
> On Wed, Aug 19, 2015 at 10:53 AM, Prem Yadav <ipremya...@gmail.com> wrote:
>
>> actually it should be something like
>> getHandleIdentifier()=hfhkjhfjhkjfh-dsdsad-sdsd--dsada: fetchResults()
>>
>> On Wed, Aug 19, 2015 at 3:49 PM, Prem Yadav <ipremya...@gmail.com> wrote:
>>
>>> Hi Emil,
>>> for either of the queries, there will be no mapreduce job. The query
>>> engine understands that in both cases it need not do any computation and
>>> just needs to fetch all the data from the files.
>>>
>>> The fetch size should be honored in both cases. Hope you are using
>>> hiveserver2.
>>> You can try connections using Excel and Cloudera's ODBC driver with the
>>> required parameters for your testing. For each batch that Hive returns,
>>> you should be able to see in the hive log something like: returning
>>> results for id <hash>
>>>
>>> On Wed, Aug 19, 2015 at 2:54 PM, Emil Berglind <papasw...@gmail.com>
>>> wrote:
>>>
>>>> I have a small Java app that I wrote that uses JDBC to run a hive
>>>> query. The Hive table that I'm running it against has 30+ million rows,
>>>> and I want to pull them all back to verify the data. If I run a simple
>>>> "SELECT * FROM <table>" and set a fetch size of 30,000, the fetch size
>>>> is not honored and it seems to want to bring back all 30+ million rows
>>>> at once, which is definitely not going to work. If I set a LIMIT on the
>>>> SQL, like "SELECT * FROM <table> LIMIT 9999999", then it honors the
>>>> fetch size just fine. However, when I set the LIMIT on there, it does
>>>> not run as a mapreduce job but rather seems to stream the data back.
>>>> Is this how it's supposed to work? I'm new to the Hadoop eco-system and
>>>> I'm really just trying to figure out the best way to bring this data
>>>> back in chunks. Maybe I'm going about this all wrong?
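
(For context, the read loop in the Java app from my first message above looks roughly like this; the URL, credentials, table name, and fetch size are placeholders from my testing, so treat it as a sketch rather than the exact code.)

    // Rough sketch of the read loop -- connection details and table name are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveFullScan {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://192.168.132.128:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement()) {
                // Ask the driver to pull rows back in batches of 30,000.
                stmt.setFetchSize(30000);
                long count = 0;
                // The fetch size only seems to be honored when I add "LIMIT 9999999";
                // without it, the driver appears to try to bring everything back at once.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT * FROM my_table LIMIT 9999999")) {
                    while (rs.next()) {
                        count++;   // just counting rows to verify the data comes back
                    }
                }
                System.out.println("rows read: " + count);
            }
        }
    }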