Well, see also the comment that it is NOT advisable to use JDBC for these data
transfers but to consider the alternatives mentioned below. The alternatives are
more reliable and you will save yourself a lot of trouble.
I also doubt that beeline is suitable for these volumes in general. So yes, it
could
Could you increase this number (maybe three times the current value) and
see if it has any impact on throughput:
--hiveconf mapreduce.input.fileinputformat.split.maxsize=33554432 \
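A minimal sketch of how that flag might be passed on a full beeline invocation. The host, port, database, and query below are hypothetical placeholders, and the tripled value of 100663296 bytes assumes the 33554432 above as the starting point:

```shell
# Hypothetical beeline invocation; adjust host/port/credentials to your cluster.
# mapreduce.input.fileinputformat.split.maxsize is in bytes:
# 33554432 = 32 MB; three times that is 100663296 = 96 MB.
beeline -u "jdbc:hive2://hiveserver2.example.com:10000/default" \
  --hiveconf mapreduce.input.fileinputformat.split.maxsize=100663296 \
  -e "SELECT ... FROM db.tbl WHERE year=2016 AND month=6 AND day=1 AND hour=10"
```

A smaller max split size generally yields more input splits and hence more parallel map tasks, which is why it can affect read throughput.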
Hey
Namaskara~Nalama~Guten Tag~Bonjour
--
Keigu
Deepak
This is a classic issue. Are there other users using the same network to
connect to Hive?
Can your UNIX admin use a network sniffer to determine the issue in your
case?
In normal operations with a modest amount of data, do you see the same issue,
or is this purely due to your load (the number of ro
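As a sketch of that kind of network check (assuming `iperf3` and `tcpdump` are available; the hostname and interface are hypothetical), a first pass could look like:

```shell
# Measure raw TCP throughput between the client and the HiveServer2 host,
# independently of Hive (run "iperf3 -s" on the server side first):
iperf3 -c hiveserver2.example.com

# Capture HiveServer2 traffic (default Thrift port 10000) for later
# inspection in a tool such as Wireshark; usually requires root:
tcpdump -i eth0 -w hive-transfer.pcap port 10000
```

If iperf3 shows far more than 3 MB/sec between the same two hosts, the bottleneck is above the network layer.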
> On 21.06.2016 at 08:59, Mich Talebzadeh wrote:
>
> is the underlying table partitioned i.e.
>
> 'SELECT FROM `db`.`table` WHERE (year=2016 AND month=6 AND
> day=1 AND hour=10)'
Yes, it is; year, month, day and hour are partition columns.
>
> and also what is the expected result set (RS) size?
> On 20.06.2016 at 20:20, Gopal Vijayaraghavan wrote:
>
>
>> is hosting the HiveServer2 is merely sending data at around 3 MB/sec.
>> Our network is capable of much more. Playing around with `fetchSize` did
>> not increase throughput.
> ...
>> --hiveconf
>> mapred.output.compression.codec=
Is the underlying table partitioned, i.e.
'SELECT FROM `db`.`table` WHERE (year=2016 AND month=6
AND day=1 AND hour=10)'
and also what is the expected result set (RS) size?
JDBC on its own should work. Is this an ORC table?
What version of Hive are you using?
HTH
Dr Mich Talebzadeh
In my test case below, I'm using `beeline` as the Java application receiving
the JDBC stream. As I understand it, this is the reference command line
interface to Hive. Are you saying that the reference command line interface is
not efficiently implemented? :)
-David Nies
> On 20.06.2016 at 17:46
> is hosting the HiveServer2 is merely sending data at around 3 MB/sec.
> Our network is capable of much more. Playing around with `fetchSize` did
> not increase throughput.
...
> --hiveconf
> mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
The current implementation
Aside from this, the low network performance could also stem from the Java
application receiving the JDBC stream (not threaded, not efficiently
implemented, etc.). That being said, do not use JDBC for this.
> On 20 Jun 2016, at 17:28, Jörn Franke wrote:
>
> Hello,
>
> For no databases (
Hi David,
What are you actually trying to do with the data?
Hive and MapReduce are notoriously slow for this type of operation. Hive
is good for storage; that is what I vouch for.
There are other alternatives.
HTH
Dr Mich Talebzadeh
Hello,
For no database (including traditional ones) is it advisable to fetch this
amount of data through JDBC. JDBC is not designed for this (neither for import
nor for export of large data volumes). It is a highly questionable approach
from a reliability point of view.
Export it as a file to HDFS and
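The export-to-file approach suggested above can be sketched as follows. The directory, delimiter, and query are hypothetical; `INSERT OVERWRITE DIRECTORY ... ROW FORMAT DELIMITED` and `hdfs dfs -get` are standard Hive/Hadoop facilities:

```shell
# Write the result set as delimited text files directly into HDFS,
# bypassing the JDBC result-set path entirely (hypothetical paths/query):
beeline -u "jdbc:hive2://hiveserver2.example.com:10000/default" -e "
  INSERT OVERWRITE DIRECTORY '/tmp/export/2016-06-01-10'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  SELECT * FROM db.tbl
  WHERE year=2016 AND month=6 AND day=1 AND hour=10;"

# Then pull the files to the local machine (or distcp them to another cluster):
hdfs dfs -get /tmp/export/2016-06-01-10 ./export
```

This keeps the heavy lifting inside the cluster; only the finished files cross the wire, and a failed copy can simply be retried.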