[
https://issues.apache.org/jira/browse/HIVE-26419?focusedWorklogId=794341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794341
]
ASF GitHub Bot logged work on HIVE-26419:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Jul/22 18:33
Start Date: 22/Jul/22 18:33
Worklog Time Spent: 10m
Work Description: hsnusonic commented on code in PR #3466:
URL: https://github.com/apache/hive/pull/3466#discussion_r927901748
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/PersistenceManagerProvider.java:
##########
@@ -253,8 +253,12 @@ private static PersistenceManagerFactory initPMF(Configuration conf, boolean for
     } else {
       try {
         DataSource ds = (maxPoolSize > 0) ? dsp.create(conf, maxPoolSize) : dsp.create(conf);
+        // The secondary connection factory is used for schema generation, and for value generation operations.
+        // We should use a different pool for the secondary connection factory to avoid resource starvation.
+        // Since DataNucleus uses locks for schema generation and value generation, 2 connections should be sufficient.
+        DataSource ds2 = dsp.create(conf, /* maxPoolSize */ 2);
Review Comment:
Hi @deniskuzZ,
The issue is easiest to observe with many add_partitions requests. When
there is a write operation, DataNucleus needs to use a ValueGenerator.
Imagine there are 10 add_partitions requests, each of which has taken 1
connection from the pool. When they move on to the value generation stage,
a monitor lock in DataNucleus lets only one proceed while the others are
blocked. Even though only one thread is trying to get a connection from the
pool, no connection is available (assuming the pool has 10 connections).
This is the problem with using the same pool for the primary and secondary
connection factories. Does that make sense to you?
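
The scenario can be approximated with a small standalone sketch. It is not Hive or DataNucleus code: the class name PoolStarvationSketch and the method valueGenerationCanProceed are hypothetical, and each connection pool is modeled as a java.util.concurrent.Semaphore whose permits stand for connections. The DataNucleus monitor lock is represented only by the fact that a single extra connection is requested at a time.

```java
import java.util.concurrent.Semaphore;

public class PoolStarvationSketch {

    /**
     * Returns true if one value-generation step can obtain a connection while
     * `requests` in-flight requests each hold one connection from primaryPool.
     * Both pools are modeled as semaphores (an assumption of this sketch).
     */
    static boolean valueGenerationCanProceed(Semaphore primaryPool,
                                             Semaphore secondaryPool,
                                             int requests) {
        // Each add_partitions request takes one connection for its own work.
        if (!primaryPool.tryAcquire(requests)) {
            throw new IllegalStateException("primary pool is smaller than the request count");
        }
        try {
            // Only one thread gets past the value-generation lock at a time,
            // so only a single additional connection is requested here.
            boolean acquired = secondaryPool.tryAcquire();
            if (acquired) {
                secondaryPool.release();
            }
            return acquired;
        } finally {
            // Return the connections held by the simulated requests.
            primaryPool.release(requests);
        }
    }

    public static void main(String[] args) {
        // Same pool for both factories: all 10 connections are already held.
        Semaphore shared = new Semaphore(10);
        System.out.println("shared pool: "
                + valueGenerationCanProceed(shared, shared, 10));

        // Separate pool of 2 connections for the secondary factory.
        System.out.println("separate pools: "
                + valueGenerationCanProceed(new Semaphore(10), new Semaphore(2), 10));
    }
}
```

With the shared pool the value-generation step finds no free connection, while a dedicated second pool always leaves one available regardless of how many requests hold primary connections.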
Issue Time Tracking
-------------------
Worklog Id: (was: 794341)
Time Spent: 40m (was: 0.5h)
> Use a different pool for DataNucleus' secondary connection factory
> ------------------------------------------------------------------
>
> Key: HIVE-26419
> URL: https://issues.apache.org/jira/browse/HIVE-26419
> Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
> Reporter: Yu-Wen Lai
> Assignee: Yu-Wen Lai
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Quote from DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and
> for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and
> secondary connection factories. A bad situation arises when each thread
> holds one connection and requests another connection for value generation,
> but no connection is available in the pool. It will keep retrying and
> eventually fail.