While not as active in the development community these days, I have been
using Hive in the field, along with Spark and Impala, for some time.

My anecdotal opinion is that the current metastore needs a significant
rewrite to deal with "next generation" workloads. By next generation I
actually mean last generation.

Currently Cloudera's Impala advice is: no more than 1k rows in a table's
metadata, and tables with lots of partitions are problematic.

That really "won't get it done" at the "new" web scale. HiveServer can run
into memory problems with tables that have 2k columns and 5k partitions.
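To make the scale of that concrete, here is a back-of-envelope sketch. The per-object byte size is my own assumption, not a measured number, but in the classic metastore object model each partition carries its own storage descriptor referencing the full column list, so anything that materializes all of it grows with columns times partitions:

```python
# Back-of-envelope: cost of duplicating column metadata per partition.
# BYTES_PER_COLUMN_OBJ is an assumed rough size for one FieldSchema-like
# object in memory, not a measured figure.

COLUMNS = 2_000
PARTITIONS = 5_000
BYTES_PER_COLUMN_OBJ = 200

naive_objects = COLUMNS * PARTITIONS            # column objects if duplicated per partition
naive_bytes = naive_objects * BYTES_PER_COLUMN_OBJ

print(f"{naive_objects:,} column objects")      # 10,000,000 column objects
print(f"~{naive_bytes / 2**30:.1f} GiB")        # ~1.9 GiB just for column metadata
```

Even if the real per-object overhead is half or double that guess, it is easy to see why a single fetch of such a table strains heap space.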

It feels like design ideas like "surely we can fetch all the columns of a
table in one go" don't make sense universally.
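The alternative to "one go" is pagination. As a minimal sketch, here is the call pattern I mean; `client` and `list_partition_names` are hypothetical stand-ins for a metastore client, not the actual HMS Thrift API, which differs in detail:

```python
# Sketch: page through partition names in fixed-size batches so no single
# response has to hold the whole partition list in memory at once.
# `client.list_partition_names(db, table, offset, limit)` is a hypothetical
# interface, not the real metastore API.

def iter_partition_names(client, db, table, batch_size=500):
    """Yield partition names batch by batch."""
    offset = 0
    while True:
        batch = client.list_partition_names(db, table, offset, batch_size)
        if not batch:
            return
        yield from batch
        offset += len(batch)

# Minimal in-memory fake to show the call pattern end to end:
class FakeClient:
    def __init__(self, names):
        self.names = names
        self.calls = 0

    def list_partition_names(self, db, table, offset, limit):
        self.calls += 1
        return self.names[offset:offset + limit]

client = FakeClient([f"ds=2015-{m:02d}-{d:02d}"
                     for m in range(1, 13) for d in range(1, 29)])
names = list(iter_partition_names(client, "default", "events"))
print(len(names), client.calls)  # 336 2
```

The point is not this exact interface but that the protocol lets the client bound its working set, which a "return everything" RPC cannot.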

Amazon has Glue, which can scale to Amazon scale. The Hive metastore can't
even really scale to a single organization. So what are the next steps? I
don't think it's as simple as "move it to NoSQL"; I think it has to be
reworked from the ground up.


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
