[ 
https://issues.apache.org/jira/browse/HIVE-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883338#comment-15883338
 ] 

Vihang Karajgaonkar commented on HIVE-15879:
--------------------------------------------

I agree that the patch does not improve the case of a single partition level. 
There it performs about the same as the existing approach. I ran a simple test 
on a table with a single partition key and ~1800 partitions on S3. Both 
implementations take about the same time, ~60 sec. But the benefits of this 
approach show up quickly as soon as the number of partition keys increases.

I repeated the test above with 2 partition keys and 10*10 = 100 partitions. 
The results below show a significant performance gain with the default configs.

|| Default pool size ||  Before || After ||
|| Time taken (sec) | 19.8 | 3.27 |

Hi [~rajesh.balamohan], I can change the JIRA description and issue type to 
"Improvement" if you think that is more appropriate. Thanks!

Also updating the review board with patch HIVE-15879.03.patch


> Fix HiveMetaStoreChecker.checkPartitionDirs method
> --------------------------------------------------
>
>                 Key: HIVE-15879
>                 URL: https://issues.apache.org/jira/browse/HIVE-15879
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-15879.01.patch, HIVE-15879.02.patch, 
> HIVE-15879.03.patch
>
>
> HIVE-15803 fixes the msck hang issue in the 
> HiveMetaStoreChecker.checkPartitionDirs method by adding a check to see if 
> the thread pool has any spare threads. If not, it falls back to 
> single-threaded listing of the files.
> {noformat}
>     if (pool != null) {
>       synchronized (pool) {
>         // In case of recursive calls, it is possible to deadlock with TP. Check TP usage here.
>         if (pool.getActiveCount() < pool.getMaximumPoolSize()) {
>           useThreadPool = true;
>         }
>         if (!useThreadPool) {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("Not using threadPool as active count:" + pool.getActiveCount()
>                 + ", max:" + pool.getMaximumPoolSize());
>           }
>         }
>       }
>     }
> {noformat}
> According to the Javadoc of getActiveCount(), quoted below,
> bq. Returns the approximate number of threads that are actively executing tasks.
> the method returns only an approximate count and is not guaranteed to 
> return the exact number of active threads. The check therefore still leaves 
> the method exposed to the msck hang bug in rare corner cases.
> We could either:
> 1. Use an atomic counter to track exactly how many threads are actively running
> 2. Revisit the method itself to make it much simpler. For example, look into 
> the possibility of changing the recursive implementation to an iterative 
> implementation where worker threads pick tasks from a queue until the queue 
> is empty.
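
Option 2 in the description above can be sketched roughly as follows. This is 
only an illustration, not the code in HIVE-15879.03.patch: it uses a 
level-by-level variant of the queue idea, walks a local directory tree with 
java.nio.file as a stand-in for the Hadoop FileSystem API, and the names 
PartitionDirWalker, listSubDirs and findPartitionDirs are hypothetical. The 
point it tries to show is that pool tasks only list one directory each and 
never submit to or wait on the pool, so the recursive deadlock cannot occur 
regardless of pool size.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; class and method names are assumptions, not Hive code.
final class PartitionDirWalker {

  // Lists the immediate sub-directories of one directory. Runs on a pool
  // thread and never submits to, or waits on, the pool itself.
  static List<Path> listSubDirs(Path dir) {
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path child : stream) {
        if (Files.isDirectory(child)) {
          result.add(child);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  // Returns the directories found exactly `depth` levels below `root`,
  // where `depth` plays the role of the number of partition keys.
  static List<Path> findPartitionDirs(Path root, int depth, int poolSize)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      List<Path> currentLevel = new ArrayList<>();
      currentLevel.add(root);
      for (int level = 0; level < depth; level++) {
        // One independent listing task per directory at this level.
        List<Callable<List<Path>>> tasks = new ArrayList<>();
        for (Path dir : currentLevel) {
          tasks.add(() -> listSubDirs(dir));
        }
        List<Path> nextLevel = new ArrayList<>();
        // invokeAll blocks only the calling thread until all tasks finish.
        for (Future<List<Path>> future : pool.invokeAll(tasks)) {
          nextLevel.addAll(future.get());
        }
        currentLevel = nextLevel;
      }
      return currentLevel;
    } finally {
      pool.shutdown();
    }
  }
}
```

With a layout such as root/a=1/b=1, calling 
PartitionDirWalker.findPartitionDirs(root, 2, poolSize) would return the 
second-level directories. Because only the caller waits on the pool, no 
getActiveCount() check or exact thread counter is needed in this variant.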



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
