I am currently using Apache Hive 0.14, which ships with HDP 2.2. We are trying to perform streaming ingestion with it, using the Storm Hive bolt to insert into 7 tables. The throughput of our bolts ranges from 5,000 to 7,000 RPS (requests per second), and our commit policies are configured accordingly, i.e. a commit every 100k events or every 15 seconds, whichever comes first.
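To make the commit policy concrete, here is a minimal sketch of the "100k events or 15 seconds" rule. This is only an illustration of the behavior, not the Storm Hive bolt's actual code: the class name `CommitPolicy`, its methods, and the injectable clock are all hypothetical names I made up for this example.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration of a "N events or T milliseconds" commit policy. */
public class CommitPolicy {
    private final int maxEvents;        // e.g. 100_000 events
    private final long maxAgeMillis;    // e.g. 15_000 ms
    private final LongSupplier clock;   // injectable time source (assumption, eases testing)
    private int pending = 0;            // events buffered since the last commit
    private long batchStart = 0L;       // arrival time of the first event in the batch

    public CommitPolicy(int maxEvents, long maxAgeMillis, LongSupplier clock) {
        this.maxEvents = maxEvents;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Records one event and reports whether the batch should be committed now. */
    public boolean onEvent() {
        if (pending == 0) {
            batchStart = clock.getAsLong(); // the age is measured from the first event
        }
        pending++;
        return shouldCommit();
    }

    /** Can also be called from a periodic tick to flush batches that are old but small. */
    public boolean shouldCommit() {
        if (pending == 0) {
            return false; // nothing to commit
        }
        boolean full = pending >= maxEvents;
        boolean old = clock.getAsLong() - batchStart >= maxAgeMillis;
        return full || old;
    }

    /** Resets the counter after a successful commit. */
    public void committed() {
        pending = 0;
    }

    public int pending() {
        return pending;
    }
}
```

With `new CommitPolicy(100_000, 15_000L, System::currentTimeMillis)`, a batch is committed as soon as it holds 100k events or its first event is 15 seconds old. If the commit fails, the whole batch is replayed, which is why a failed commit is expensive for us.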
We see many commitTxn exceptions caused by serialization errors in the metastore (we are using PostgreSQL 9.5 as the metastore database). These serialization errors cause the topology to lag in terms of events processed, because it tries to reprocess the batches that failed. I have already backported HIVE-10500 <https://issues.apache.org/jira/browse/HIVE-10500> to 0.14, but there isn't much improvement. I went through most of the JIRAs about transactions and found the following: HIVE-11948 <https://issues.apache.org/jira/browse/HIVE-11948> and HIVE-13013 <https://issues.apache.org/jira/browse/HIVE-13013>. I would like to backport them to 0.14. Going through the patches gives me the impression that I mostly need to update the queries and transaction isolation levels. Do these patches also require me to update the schema in the metastore? Please also let me know if there are any other patches that I missed. Finally, I would like to know whether Apache Hive 1.2.1 or later can handle concurrent inserts from multiple clients, to the same or to different tables, without many serialization errors in the Hive metastore.
-Joel