Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread Jörn Franke
Well, see also my comment that it is NOT advisable to use JDBC for these data transfers; consider the alternatives mentioned below instead. The alternatives are more reliable and you will save yourself a lot of trouble. I also doubt that beeline is suitable for these volumes in general. So yes it could

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread Deepak Goel
Could you increase this number (maybe to three times the current value) and see if it has any impact on throughput: --hiveconf mapreduce.input.fileinputformat.split.maxsize=33554432 \
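As a rough illustration of the suggestion above, the following Python sketch computes three times the quoted split size and formats the matching `--hiveconf` argument. The factor of three comes from the message; the helper name `hiveconf_arg` is made up for this example, and whether the larger value actually improves throughput is not tested here.

```python
# Current value quoted in the thread: 33554432 bytes = 32 MiB.
CURRENT_SPLIT_MAXSIZE = 33554432


def hiveconf_arg(key: str, value: int) -> str:
    """Format one --hiveconf key=value argument (hypothetical helper)."""
    return f"--hiveconf {key}={value}"


# Three times the current value, as suggested in the message.
tripled = CURRENT_SPLIT_MAXSIZE * 3
arg = hiveconf_arg("mapreduce.input.fileinputformat.split.maxsize", tripled)
print(arg)
```

The tripled value corresponds to 96 MiB; it would replace the existing `--hiveconf` argument on the command line.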

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread Mich Talebzadeh
This is a classic issue. Are there other users using the same network to connect to Hive? Can your Unix admin use a network sniffer to determine the issue in your case? In normal operations with a modest amount of data, do you see the same issue, or is this purely due to your load (the number of ro

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread David Nies
> On 21.06.2016 at 08:59, Mich Talebzadeh wrote:
>
> is the underlying table partitioned i.e.
>
> 'SELECT FROM `db`.`table` WHERE (year=2016 AND month=6 AND day=1 AND hour=10)'

Yes, it is; year, month, day and hour are partition columns.

> and also what is the RS size it is expected.

I

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread David Nies
> On 20.06.2016 at 20:20, Gopal Vijayaraghavan wrote:
>
>> is hosting the HiveServer2 is merely sending data with around 3 MB/sec.
>> Our network is capable of much more. Playing around with `fetchSize` did
>> not increase throughput.
> ...
>> --hiveconf
>> mapred.output.compression.codec=

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-21 Thread Mich Talebzadeh
Is the underlying table partitioned, i.e. 'SELECT FROM `db`.`table` WHERE (year=2016 AND month=6 AND day=1 AND hour=10)'? Also, what result set (RS) size is expected? JDBC on its own should work. Is this an ORC table? What version of Hive are you using? HTH, Dr Mich Talebzadeh

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-20 Thread David Nies
In my test case below, I'm using `beeline` as the Java application receiving the JDBC stream. As I understand it, this is the reference command line interface to Hive. Are you saying that the reference command line interface is not efficiently implemented? :) -David Nies
> On 20.06.2016 at 17:46

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-20 Thread Gopal Vijayaraghavan
> is hosting the HiveServer2 is merely sending data with around 3 MB/sec.
> Our network is capable of much more. Playing around with `fetchSize` did
> not increase throughput.
...
> --hiveconf
> mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
> \

The current implementation
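To put the quoted rate of around 3 MB/sec in perspective, here is a small Python sketch that estimates transfer time for a given result size. The 10 GB result size and the 125 MB/s comparison rate (roughly a 1 Gbit/s link) are assumptions chosen for illustration; neither figure appears in the thread.

```python
def transfer_seconds(size_bytes: int, rate_mb_per_s: float) -> float:
    """Seconds needed to move size_bytes at rate_mb_per_s (1 MB = 10**6 bytes)."""
    return size_bytes / (rate_mb_per_s * 10**6)


RESULT_SIZE_BYTES = 10 * 10**9  # hypothetical 10 GB result set, not from the thread

observed = transfer_seconds(RESULT_SIZE_BYTES, 3.0)   # rate reported in the thread
gigabit = transfer_seconds(RESULT_SIZE_BYTES, 125.0)  # assumed 1 Gbit/s link

print(f"{observed / 60:.1f} min at 3 MB/s, {gigabit / 60:.1f} min at 125 MB/s")
```

Comparing the two estimates against a measured run is one simple way to tell whether the network or something else along the path is the limiting factor.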

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-20 Thread Jörn Franke
Aside from this, the low network performance could also stem from the Java application receiving the JDBC stream (not threaded, not efficiently implemented, etc.). That being said, do not use JDBC for this.
> On 20 Jun 2016, at 17:28, Jörn Franke wrote:
>
> Hello,
>
> For no databases (

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-20 Thread Mich Talebzadeh
Hi David, what are you actually trying to do with the data? Hive and MapReduce are notoriously slow for this type of operation. Hive is good for storage; that is what I vouch for. There are other alternatives. HTH, Dr Mich Talebzadeh

Re: Network throughput from HiveServer2 to JDBC client too low

2016-06-20 Thread Jörn Franke
Hello, fetching this amount of data through JDBC is not advisable for any database (including traditional ones). JDBC is not designed for this (neither for import nor for export of large data volumes). It is a highly questionable approach from a reliability point of view. Export it as a file to HDFS and
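One way to read the export suggestion is HiveQL's `INSERT OVERWRITE DIRECTORY`, which writes a query result out as files instead of streaming rows to the client. The Python sketch below only builds such a statement as a string. The target path `/tmp/export/hour10`, the `SELECT *` column list, and the helper name `export_statement` are assumptions for illustration; the table name and partition filter are the ones quoted elsewhere in this thread.

```python
def export_statement(table: str, target_dir: str, partition: dict) -> str:
    """Build a HiveQL statement that writes one partition's rows to a directory."""
    # Join the partition columns into a single AND-ed filter.
    where = " AND ".join(f"{col}={val}" for col, val in partition.items())
    return (
        f"INSERT OVERWRITE DIRECTORY '{target_dir}' "
        f"SELECT * FROM {table} WHERE ({where})"
    )


stmt = export_statement(
    "`db`.`table`",
    "/tmp/export/hour10",  # hypothetical HDFS path
    {"year": 2016, "month": 6, "day": 1, "hour": 10},
)
print(stmt)
```

The resulting files could then be copied off the cluster with a tool such as `hdfs dfs -get`; that follow-up step is an assumption, not something the thread confirms.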