[ 
https://issues.apache.org/jira/browse/HIVE-24736?focusedWorklogId=549520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-549520
 ]

ASF GitHub Bot logged work on HIVE-24736:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Feb/21 11:32
            Start Date: 08/Feb/21 11:32
    Worklog Time Spent: 10m 
      Work Description: szlta commented on a change in pull request #1951:
URL: https://github.com/apache/hive/pull/1951#discussion_r571975616



##########
File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapIoImpl.java
##########
@@ -229,11 +230,20 @@ public void debugDumpShort(StringBuilder sb) {
         new LinkedBlockingQueue<Runnable>(),
         new ThreadFactoryBuilder().setNameFormat("IO-Elevator-Thread-%d").setDaemon(true).build());
     FixedSizedObjectPool<IoTrace> tracePool = IoTrace.createTracePool(conf);
+    if (isEncodeEnabled) {
+      int encodeThreads = numThreads * 2;

Review comment:
       Yeah, quite arbitrary, I'll give you that.
   Text reading is actually started by the "regular" IO threads, and if 
encoding is needed for cache insertion, then one of these threads can produce 
more async encode tasks. This may end up being bursty, and from what 
I've seen a bigger thread pool could come in handy.
   Anyway, I've made this configurable now.
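
   A minimal sketch of the sizing discussed in this comment, not the code from 
the pull request: it assumes a hypothetical configured value 
(configuredEncodeThreads) that falls back to twice the IO thread count when it 
is unset, and a hypothetical thread name format. The class and method names 
are illustrative only.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class EncodePoolSketch {

  /**
   * Hypothetical helper: use the configured encode thread count when it is
   * positive, otherwise fall back to the "numThreads * 2" heuristic from the diff.
   */
  public static int resolveEncodeThreads(int numIoThreads, int configuredEncodeThreads) {
    return configuredEncodeThreads > 0 ? configuredEncodeThreads : numIoThreads * 2;
  }

  /**
   * Builds a fixed-size pool of daemon threads backed by an unbounded queue,
   * so that bursts of encode tasks are queued rather than rejected.
   * The thread name format is an assumption made for this sketch.
   */
  public static ThreadPoolExecutor createEncodePool(int encodeThreads) {
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = runnable -> {
      Thread t = new Thread(runnable, "Encode-Thread-" + counter.getAndIncrement());
      t.setDaemon(true);
      return t;
    };
    return new ThreadPoolExecutor(encodeThreads, encodeThreads,
        0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>(), factory);
  }

  public static void main(String[] args) throws Exception {
    int encodeThreads = resolveEncodeThreads(10, 0); // unset config, so 2x IO threads
    ThreadPoolExecutor pool = createEncodePool(encodeThreads);
    // Simulate one IO thread producing a burst of async encode tasks.
    AtomicInteger done = new AtomicInteger();
    for (int i = 0; i < 100; i++) {
      pool.execute(done::incrementAndGet);
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    System.out.println("pool size = " + encodeThreads + ", tasks done = " + done.get());
  }
}
```

   The unbounded queue is what absorbs the bursts here; the pool size only 
bounds how many encode tasks run concurrently.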




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 549520)
    Time Spent: 1h 40m  (was: 1.5h)

> Make buffer tracking in LLAP cache with BP wrapper more accurate
> ----------------------------------------------------------------
>
>                 Key: HIVE-24736
>                 URL: https://issues.apache.org/jira/browse/HIVE-24736
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HIVE-22492 has introduced threadlocal buffers in which LlapCachableBuffer 
> instances are stored before entering LRFU's heap - so that lock contention is 
> eased up.
> This is a nice performance improvement, but comes at the cost of losing the 
> exact accounting of LLAP buffer instances - e.g. if a user gives a purge 
> command, not all the cache space is freed up as one would expect, because 
> purge only considers buffers that the policy knows about. In this case we'd 
> see in LLAP's iomem servlet that the LRFU policy is empty, but a table may 
> still have its full content loaded.
> Also, if we use text-based tables, a set of ephemeral -OrcEncode threads is 
> used during cache load. Buffers attached to these threads' thread-local 
> structures are ultimately lost. In an edge case we 
> could load lots of data into the cache by reading in many distinct smaller 
> text tables whose buffers never reach the LRFU policy, and hence the cache 
> hit ratio will suffer as a consequence (the memory manager will give up 
> asking LRFU to evict, and will free up random buffers).
> I propose we try and track the amount of data stored in the BP wrapper 
> threadlocals, and flush them into the heap as a first step of a purge 
> request. This will enhance supportability.
> We should also replace the ephemeral OrcEncode threads with a thread pool, 
> which could actually serve as a small performance improvement on its own by 
> saving the time and memory spent on thread lifecycle management.
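
The proposal above (track what sits in the thread-local structures, and flush 
it into the policy as the first step of a purge) can be sketched roughly as 
follows. This is an illustration under assumptions, not Hive's implementation: 
the class ThreadLocalBufferTracker, its methods, and the use of plain strings 
as stand-ins for LlapCachableBuffer instances are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadLocalBufferTracker {
  // Registry of every per-thread list, so buffers stay reachable after a thread dies.
  private final Queue<List<String>> perThreadLists = new ConcurrentLinkedQueue<>();
  // Running count of buffers that are held thread-locally and unknown to the policy.
  private final AtomicLong pendingCount = new AtomicLong();
  // Stand-in for the eviction policy's heap (hypothetical; LRFU in the issue text).
  private final List<String> policyHeap = new ArrayList<>();

  private final ThreadLocal<List<String>> local = ThreadLocal.withInitial(() -> {
    List<String> list = new ArrayList<>();
    perThreadLists.add(list);
    return list;
  });

  /** Called by a reader or encode thread: stash the buffer without touching the policy. */
  public void buffer(String bufferId) {
    List<String> list = local.get();
    synchronized (list) {
      list.add(bufferId);
    }
    pendingCount.incrementAndGet();
  }

  /** Number of buffers the policy does not know about yet. */
  public long pending() {
    return pendingCount.get();
  }

  /** Purge: first flush all thread-local lists into the policy, then evict everything. */
  public synchronized int purge() {
    for (List<String> list : perThreadLists) {
      synchronized (list) {
        policyHeap.addAll(list);
        pendingCount.addAndGet(-list.size());
        list.clear();
      }
    }
    int purged = policyHeap.size();
    policyHeap.clear();
    return purged;
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadLocalBufferTracker tracker = new ThreadLocalBufferTracker();
    // An ephemeral thread buffers two entries and exits without flushing them.
    Thread ephemeral = new Thread(() -> {
      tracker.buffer("text-table-a");
      tracker.buffer("text-table-b");
    });
    ephemeral.start();
    ephemeral.join();
    tracker.buffer("orc-table-c");
    System.out.println("pending before purge: " + tracker.pending());
    System.out.println("purged: " + tracker.purge());
    System.out.println("pending after purge: " + tracker.pending());
  }
}
```

Because the registry holds a reference to each per-thread list, buffers 
stashed by a thread that has already exited are still found by the purge, 
which is the accounting gap the issue describes.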



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
