> By the way, if you want near-real-time tables with Hive, maybe you should
> have a look at this project from Uber: https://uber.github.io/hudi/
> I don't know how mature it is yet, but I think it aims at solving that kind
> of challenge.

Depending on your hive setup, you don't need a different backend to do near-real-time tables.

https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest

Prasanth has a benchmark for Hive 3.x, which is limited by HDFS bandwidth at the moment with 64 threads.

https://github.com/prasanthj/culvert

$ ./culvert -u thrift://localhost:9183 -db testing -table culvert -p 64 -n 100000

Total rows committed: 92100000
Throughput: 1535000 rows/second

Cheers,
Gopal
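
For anyone reading the culvert output above, here is a quick sanity check on what the two reported figures imply. It is only arithmetic on the numbers quoted in the message. It assumes that `-p 64` corresponds to the 64 threads mentioned and that throughput is simply total rows divided by wall-clock time; neither is confirmed by the benchmark's documentation here.

```python
# Figures copied from the culvert output in the message above.
total_rows = 92_100_000   # "Total rows committed"
throughput = 1_535_000    # "Throughput", in rows/second
threads = 64              # assumption: -p 64 is the writer thread count

# Implied run duration, assuming throughput = total rows / elapsed time.
elapsed_seconds = total_rows / throughput

# Average rate per thread, assuming the load is spread evenly.
rows_per_thread = throughput / threads

print(f"elapsed: {elapsed_seconds:.1f} s")
print(f"per thread: {rows_per_thread:.0f} rows/s")
```

Under those assumptions the run lasted about a minute, with each thread averaging roughly 24k rows per second.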