[
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147835#comment-15147835
]
Gus Heck commented on SOLR-8349:
--------------------------------
Guava sounded like a great idea. I love Guava caches and have used them
frequently in the past. When you mentioned it I thought to myself, why didn't I
think of that... eventually I remembered that I had, but decided not to go
there because I didn't want to give control of memory issues to a library. On
reflection that may have been a bit too draconian, so I gave Guava caches a
whirl. For the first time in my experience, I think I've found a use case they
don't cover well. My design for this feature calls for the following behavior
(sketched in code after the list):
# get should return null if the resource is requested before any attempt has
been made to load it.
# a resource is loaded only once; no provision for update or replacement is
presently required, so first one in wins is just fine.
# subsequent attempts to load the resource are a non-blocking no-op, allowing
cores 2 through N that require the resource to continue configuring themselves
while the resource is being loaded (possibly starting the loading of resources
for other components).
# loading will be complete before the server finishes startup and begins
servicing requests. If a resource was supposed to be loaded and there was no
error during loading, the component can rely on the existence of the resource
at query (or update) time.
# a query or update to Solr never initiates loading, for obvious latency
reasons.
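To make that contract concrete, here's a rough sketch of the behavior I'm after. The class and method names are hypothetical, not the patch's actual code:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of the contract above, not the code in the patch.
class SharedResourceCache {
  private final ConcurrentHashMap<String, Object> loaded = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, Thread> loaders = new ConcurrentHashMap<>();

  // goals #1 and #5: a plain get never triggers loading, and returns
  // null until a load has completed
  Object get(String key) {
    return loaded.get(key);
  }

  // goals #2 and #3: the first registration wins and kicks off the load
  // in the background; later registrations return immediately
  void load(String key, Supplier<Object> loader) {
    loaders.computeIfAbsent(key, k -> {
      Thread t = new Thread(() -> loaded.put(k, loader.get()), "loader-" + k);
      t.start();
      return t;
    });
  }

  // goal #4: the container calls this once, before startup completes
  void awaitLoading() throws InterruptedException {
    for (Thread t : loaders.values()) {
      t.join();
    }
  }
}
{code}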
I realized that the unit test I supplied doesn't fully test #3 above, so I
modified it like this:
{code}
+    final String[] lastThread = new String[1];
     // now add something to the map while it's under load
     try {
       Thread[] set1 = new Thread[100];
{code}
Then
{code}
           calls.incrementAndGet();
-        });
+          lastThread[0] = Thread.currentThread().getName();
+        }, "thread"+i); // give the threads names like thread0 thread1 thread2 etc
       }
       for (Thread thread : set1) {
         thread.start();
+        Thread.sleep(10);
       }
       while (calls.get() < 100) {
         Thread.sleep(100);
       }
{code}
Then
{code}
       Thread.sleep(100);
       long count1b = counter[0];
       long count2b = counter2[0];
+
+      // make sure other loaders didn't wait and thread0 completed last
+      assertEquals("thread0", lastThread[0]);
{code}
This modified test still passes just fine with my original code. But so far I
haven't made it pass with Guava. The naive first attempt was to use
get(key,loader), ignore the return value, and use getIfPresent(key) to
service get requests (handling case #1), but this was not good:
# get(key,loader) while ignoring the return value will block all cores that
depend on this resource (see the sketch after this list). The test fails with
loader99 updating the array last.
# wrapping the get(key,loader) in a thread fails because now they all complete
before loading completes, and the test gets null when it checks the assert that
the value was set. (It would eventually become set, but this is just like
completing server startup before loading is complete and then failing queries,
probably with an NPE.) Also, failing server startup when loading fails becomes
problematic, since we have to get the exception back out of the thread.
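For reference, the naive attempt from item #1 looks roughly like this; it uses Guava's standard Cache API, but loadBigResource and the key names are illustrative:
{code}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.ExecutionException;

class NaiveGuavaAttempt {
  private static final Cache<String, Object> CACHE = CacheBuilder.newBuilder().build();

  // what each core does at configure time: Guava's get(key, loader) makes
  // every caller for that key wait until the first loader finishes,
  // which is exactly what goal #3 forbids
  static void loadOnce(String key) throws ExecutionException {
    CACHE.get(key, () -> loadBigResource(key)); // return value ignored
  }

  // what a component does at query time (goal #1: null if never loaded)
  static Object fetch(String key) {
    return CACHE.getIfPresent(key);
  }

  private static Object loadBigResource(String key) {
    return new Object(); // stands in for an expensive dictionary load
  }
}
{code}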
After those attempts I found myself tracking which keys have already had
loaders supplied and trying to coordinate blocking/pass-through on my own, and
of course access to the collection recording what I've seen has to be
synchronized... etc. This winds up negating most of the benefit of the Guava
cache. That's too bad, because at first it looked like Guava would replace a
bunch of my code with one-liners. My guess is that the Guava folks didn't
supply put(k,loader) because most of the time you aren't accepting arbitrary
loaders in the first place. If the loader is known in advance, LoadingCache
works nicely. Unfortunately, for us, the loader is not known in advance. I
don't see any reasonable way to write "one loader to rule them all" to let us
use a LoadingCache either; that would just move all my tracking code into
TheOneLoader.java :). Possibly a good reorganization code-wise, but we are
still writing threaded code and not getting much simplification out of using
Guava caches that way. If we are going to write synchronized code, I think we
should just make it do the right thing directly.
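To illustrate, the "one loader to rule them all" would look something like this hypothetical sketch; note that the registry of per-key loaders, and its coordination, is still ours to write:
{code}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: the bookkeeping Guava was supposed to remove is
// still here, just relocated into the dispatching loader.
class TheOneLoader extends CacheLoader<String, Object> {
  static final ConcurrentMap<String, Callable<Object>> REGISTERED = new ConcurrentHashMap<>();
  static final LoadingCache<String, Object> CACHE =
      CacheBuilder.newBuilder().build(new TheOneLoader());

  @Override
  public Object load(String key) throws Exception {
    Callable<Object> loader = REGISTERED.get(key); // must have been registered first
    if (loader == null) {
      throw new IllegalStateException("no loader registered for " + key);
    }
    return loader.call();
  }
}
{code}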
Finally, weakValues() (or some equivalent) sounds good, but the problem is that
the component supplies a loader, the container loads it at time A, and then at
time B the component asks for the result of the load. Even if the component
caches the result of that first request, the time between A and B leaves the
just-loaded object vulnerable to GC, since the only thing referencing it is the
weak-values cache. One could require the loader to set a reference, but that
presumes that particular core won't be unloaded before some other core finally
receives a request that causes it to fetch and cache the resource (remember,
only one loader ever runs, so only one component would get the hard reference
initially). It seems that softValues() could be used, but without a hard
reference somewhere we are still potentially in trouble (the sketch after this
list makes the window concrete):
# All cores with hard references are unloaded; core X remains, having yet to
acquire a hard reference.
# When memory gets low, the soft resource is GC'd.
# A request arrives at core X requiring the resource.
# Either goal #4 or #5 must now be violated.
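Here's the vulnerable window in sketch form, using Guava's weakValues(); softValues() behaves the same way once memory pressure triggers collection. The key and resource are illustrative:
{code}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

class WeakValuesWindow {
  public static void main(String[] args) throws Exception {
    Cache<String, Object> cache = CacheBuilder.newBuilder().weakValues().build();

    // time A: the container runs the loader for the first core; afterwards
    // the cache's weak reference is the only thing pointing at the result
    cache.get("dictionary", () -> new Object() /* stands in for the big resource */);

    System.gc(); // any intervening GC may discard a weakly-held value

    // time B: the component finally asks for the result and may find nothing
    System.out.println(cache.getIfPresent("dictionary")); // possibly null
  }
}
{code}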
It's clearly the case that my design goals make things difficult. If I drop
goal #3 (not blocking other cores), Guava will drop right in nicely via
get(key,loader) with the return value ignored. Which do you feel is more
important: using Guava and keeping the code simple, or preserving startup
efficiency for the server when this feature is in use? I found this comment in
CoreContainer, which clearly indicates that the intent is to load cores in
parallel, but it also seems there's some degree of waiting going on already in
cloud mode:
{code}
// setup executor to load cores in parallel
// do not limit the size of the executor in zk mode since cores may try and wait for each other.
ExecutorService coreLoadExecutor = ExecutorUtil.newMDCAwareFixedThreadPool(
{code}
> Allow sharing of large in memory data structures across cores
> -------------------------------------------------------------
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
> Issue Type: Improvement
> Components: Server
> Affects Versions: 5.3
> Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large
> dictionary or other in-memory structure. When multiple cores are loaded with
> identical configurations utilizing this large in memory structure, each core
> holds its own copy in memory. This has been noted in the past and a specific
> case reported in SOLR-3443. This patch provides a generalized capability, and
> if accepted, this capability will then be used to fix SOLR-3443.