I am currently using Apache Hive 0.14, which ships with HDP 2.2. We are
trying to perform streaming ingestion with it.
We are using the Storm Hive bolt and we have 7 tables into which we are
trying to insert. The RPS (requests per second) of our bolts ranges from
5000 to 7000 and our commit policies are configured accordingly, i.e. 100k
events or 15 seconds.
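
For illustration, the "100k events or 15 seconds" rule can be sketched as
below. This is a hypothetical standalone class (CommitPolicy and its method
names are made up for this example), not the Storm Hive bolt's actual code
or configuration API. It only shows the intended behaviour: commit when
either threshold is reached, whichever comes first.

```java
// Hypothetical sketch of a count-or-time commit rule.
// Not part of the Storm Hive bolt API; names are illustrative only.
public class CommitPolicy {
    private final int maxEvents;       // e.g. 100_000 events
    private final long maxAgeMillis;   // e.g. 15_000 ms
    private int pendingEvents = 0;
    private long batchStartMillis = -1;

    public CommitPolicy(int maxEvents, long maxAgeMillis) {
        this.maxEvents = maxEvents;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Records one event seen at nowMillis; returns true if the batch should be committed. */
    public boolean onEvent(long nowMillis) {
        if (pendingEvents == 0) {
            batchStartMillis = nowMillis; // first event opens a new batch
        }
        pendingEvents++;
        return shouldCommit(nowMillis);
    }

    /** True when the open batch has hit the event limit or the age limit. */
    public boolean shouldCommit(long nowMillis) {
        if (pendingEvents == 0) {
            return false; // nothing to commit
        }
        return pendingEvents >= maxEvents
                || nowMillis - batchStartMillis >= maxAgeMillis;
    }

    /** Call after a successful commit to start a fresh batch. */
    public void reset() {
        pendingEvents = 0;
        batchStartMillis = -1;
    }
}
```

With new CommitPolicy(100_000, 15_000L), a bolt at 7000 RPS would hit the
event limit after roughly 14 seconds, while a bolt at 5000 RPS would
usually commit on the 15-second timer instead.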

We see many commitTxn exceptions caused by serialization errors in the
metastore (we are using PostgreSQL 9.5 as the metastore database).
These serialization errors cause the topology to lag in terms of events
processed, because it tries to reprocess the batches that have failed.
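
To make the failure mode concrete: PostgreSQL reports a serialization
failure with SQLSTATE 40001. The sketch below shows one way such errors
could be detected and retried with a bounded backoff instead of failing the
whole batch. It is only an illustration under assumptions: the class
SerializationRetry and its methods are hypothetical, and this is not how
the Hive metastore or the Storm Hive bolt currently handles these errors.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper; not part of Hive, Storm, or the PostgreSQL JDBC driver.
public class SerializationRetry {
    // SQLSTATE that PostgreSQL uses for serialization_failure.
    public static final String SERIALIZATION_FAILURE = "40001";

    /** Walks the cause chain looking for an SQLException with SQLSTATE 40001. */
    public static boolean isSerializationFailure(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SQLException
                    && SERIALIZATION_FAILURE.equals(((SQLException) c).getSQLState())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the action, retrying only on serialization failures, up to
     * maxAttempts in total, sleeping a little longer before each retry.
     */
    public static <T> T withRetry(Callable<T> action, int maxAttempts,
                                  long baseBackoffMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isSerializationFailure(e)) {
                    throw e; // out of attempts, or not a retryable error
                }
                Thread.sleep(baseBackoffMillis * attempt); // linear backoff
            }
        }
    }
}
```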

I have already backported HIVE-10500
<https://issues.apache.org/jira/browse/HIVE-10500> to 0.14, but there isn't
much improvement.
I went through most of the JIRAs about transactions and found the
following: HIVE-11948 <https://issues.apache.org/jira/browse/HIVE-11948> and
HIVE-13013 <https://issues.apache.org/jira/browse/HIVE-13013>. I would like
to backport them to 0.14.
Going through the patches gives me the impression that I mostly need to
update the queries and transaction isolation levels.
Do these patches also require me to update the schema in the metastore?
Please also let me know if there are any other patches that I missed.

I would also like to know whether Apache Hive 1.2.1 or later can handle
concurrent inserts into the same or different tables from multiple clients
without many serialization errors in the Hive metastore.

-Joel
