[ 
https://issues.apache.org/jira/browse/HIVE-21071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HIVE-21071:
-------------------------------
    Status: Patch Available  (was: Open)

I think the last test failure is flaky. Submitting patch again.

Checkstyle errors should be ignored.  They are complaining about test code and 
some things that are outside the scope of this patch (like complaining about 
method signature lengths).

> Improve getInputSummary
> -----------------------
>
>                 Key: HIVE-21071
>                 URL: https://issues.apache.org/jira/browse/HIVE-21071
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>         Attachments: HIVE-21071.1.patch, HIVE-21071.10.patch, 
> HIVE-21071.2.patch, HIVE-21071.3.patch, HIVE-21071.4.patch, 
> HIVE-21071.5.patch, HIVE-21071.6.patch, HIVE-21071.7.patch, 
> HIVE-21071.8.patch, HIVE-21071.9.patch
>
>
> There is a global lock in the {{getInputSummary}} code, so it is important 
> that it be fast.  The current implementation has quite a bit of overhead that 
> can be re-engineered.
> For example, the current implementation keeps a map of File Path to 
> ContentSummary object.  This map is populated by several threads 
> concurrently. The method then loops through the map, in a single thread, at 
> the end to add up all of the ContentSummary objects and ignores the paths.  
> The code can be re-engineered to not use a map, or a collection at all, to 
> store the results and instead just keep a running tally.  By keeping a tally, 
> there is no {{O(n)}} operation at the end to perform the addition.
> There are other things that can be improved.  The method returns an object 
> which is never used anywhere, so the method should be changed to a void 
> return type.
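
The running-tally idea from the description can be sketched roughly as follows. 
This is only an illustration, not the code from the attached patches: the class 
and method names (RunningTallySketch, Tally, summarize) are hypothetical, and 
the per-path results are passed in as plain long arrays rather than being 
fetched from a file system.  Each worker adds its numbers to shared counters as 
soon as it has them, so nothing is stored per path and no final loop is needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class RunningTallySketch {

  /** Hypothetical stand-in for the counters a ContentSummary carries. */
  static final class Tally {
    final LongAdder length = new LongAdder();
    final LongAdder fileCount = new LongAdder();
    final LongAdder directoryCount = new LongAdder();

    /** Thread-safe: LongAdder handles concurrent updates without a lock. */
    void add(long len, long files, long dirs) {
      length.add(len);
      fileCount.add(files);
      directoryCount.add(dirs);
    }
  }

  /**
   * Adds up per-path results as worker threads produce them.
   * Each entry is {length, fileCount, directoryCount}; in real code the
   * worker would compute these values for one path instead.
   */
  static Tally summarize(List<long[]> perPathResults, int threads) {
    Tally tally = new Tally();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (long[] r : perPathResults) {
      // No map of path -> summary: the result goes straight into the tally.
      pool.execute(() -> tally.add(r[0], r[1], r[2]));
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("Timed out waiting for workers");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting", e);
    }
    return tally;
  }

  public static void main(String[] args) {
    List<long[]> results = Arrays.asList(
        new long[] {100L, 1L, 0L},
        new long[] {250L, 2L, 1L},
        new long[] {50L, 1L, 0L});
    Tally t = summarize(results, 4);
    // prints "length=400 files=4 dirs=1"
    System.out.println("length=" + t.length.sum()
        + " files=" + t.fileCount.sum()
        + " dirs=" + t.directoryCount.sum());
  }
}
```

LongAdder is used here because many threads write and the total is read only 
once at the end; an AtomicLong would also work for this purpose.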



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
