> By the way, if you want near-real-time tables with Hive, maybe you should 
> have a look at this project from Uber: https://uber.github.io/hudi/
> I don't know how mature it is yet, but I think it aims at solving that kind 
> of challenge.

Depending on your Hive setup, you don't need a different backend to get 
near-real-time tables; Hive's own streaming ingest can do that.

https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

Prasanth has a streaming ingest benchmark for Hive 3.x; with 64 threads it 
is limited by HDFS bandwidth at the moment.

https://github.com/prasanthj/culvert

$ ./culvert -u thrift://localhost:9183 -db testing -table culvert -p 64 -n 100000
Total rows committed: 92100000
Throughput: 1535000 rows/second
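
For a rough sense of scale, the two figures above can be turned into a run time and a per-thread rate. This is only back-of-the-envelope arithmetic on the printed output, and it assumes that "-p 64" is the 64 threads mentioned earlier and that the load is spread evenly across them:

```python
# Figures copied from the culvert output above.
rows_committed = 92_100_000
rows_per_second = 1_535_000
threads = 64  # assumption: "-p 64" is the 64 threads mentioned earlier

# Implied wall-clock duration of the run.
elapsed_seconds = rows_committed / rows_per_second

# Average rate per thread, assuming an even split across threads.
per_thread_rate = rows_per_second / threads

print(f"elapsed: {elapsed_seconds:.0f} s")
print(f"per thread: {per_thread_rate:.0f} rows/second")
```

So the run works out to about a minute, at roughly 24k rows/second per thread.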

Cheers,
Gopal

