[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

Gus Heck (JIRA) Fri, 26 Feb 2016 07:29:33 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169187#comment-15169187
 ]


Gus Heck commented on SOLR-8349:
--------------------------------

Perhaps you were looking for a second hash map? We don't need one if we have 
more descriptive keys. I should have explained my latest patch. I was hasty.

Implementations of the Decoder interface can now (optionally) give their 
decoders names. 

{code}
  public interface Decoder<T> {

    /**
     * A name by which to distinguish this decoding. This only needs to be
     * implemented if you want to support decoding the same blob content
     * with more than one decoder.
     * 
     * @return The name of the decoding, defaults to empty string.
     */
    default String getName() { return ""; }

    /**
     * A routine that knows how to convert the stream of bytes from the blob
     * into a Java object.
     * 
     * @param inputStream the bytes from a blob
     * @return A Java object of the specified type.
     */
    T decode(InputStream inputStream);
  }

{code}
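As an illustration of the optional naming, here is a minimal sketch of a named decoder. The Decoder interface is repeated from the excerpt above so the sketch compiles on its own; LineListDecoder and the name "lines" are hypothetical and not part of the patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Repeated from the patch excerpt so this sketch is self-contained.
interface Decoder<T> {
  default String getName() { return ""; }
  T decode(InputStream inputStream);
}

// Hypothetical decoder (not in the patch): reads the blob as UTF-8 text
// and returns its lines. It overrides getName() so that its decoded form
// is cached under a key distinct from other decodings of the same blob.
final class LineListDecoder implements Decoder<List<String>> {
  @Override
  public String getName() {
    return "lines";
  }

  @Override
  public List<String> decode(InputStream inputStream) {
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
      return reader.lines().collect(Collectors.toList());
    } catch (IOException e) {
      // decode() declares no checked exceptions, so wrap the failure.
      throw new UncheckedIOException(e);
    }
  }
}
```

A decoder that does not override getName(), for example one written as a lambda, keeps the default name.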

The internal hash map that holds blob content objects (decoded or not) appends 
this name to the key for put and get operations in getBlobIncRef, and stores 
the resulting value in the key field of BlobContent:

{code}
  BlobContentRef<Object> getBlobIncRef(String key, Decoder<Object> decoder) {
    return getBlobIncRef(key.concat(decoder.getName()),
        () -> addBlob(key, decoder));
  }

  // in private <T> BlobContentRef<T> getBlobIncRef(String key,
  //     Callable<BlobContent<T>> blobCreator) ...

        aBlob = blobs.get(key);
        if (aBlob == null) {
          try {
            aBlob = blobCreator.call();

{code}

Note that the unmodified key is supplied to the lambda that invokes addBlob, 
because addBlob still needs it to fetch the blob from the blob store:

{code}
  // for use cases sharing java objects
  private BlobContent<Object> addBlob(String key, Decoder<Object> decoder) {
    ByteBuffer b = fetchBlob(key);
    String keyPlusName = key + decoder.getName();
    BlobContent<Object> aBlob = new BlobContent<>(keyPlusName, b, decoder);
    blobs.put(keyPlusName, aBlob);
    return aBlob;
  }
{code}

Thus the BlobContent object is holding the more descriptive key when we get to 
decrementBlobRefCount and do this:
{code}
      if (ref.blob.references.isEmpty()) {
        blobs.remove(ref.blob.key);
      }
{code}
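To make the put/get/remove symmetry concrete, here is a minimal self-contained sketch of the idea, assuming a plain synchronized HashMap. RefCountedCache, Entry, Ref, acquire and release are hypothetical names for illustration; they are not the classes or methods in the patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for the blob cache described above.
final class RefCountedCache {

  // Plays the role of BlobContent: remembers the key it was stored under.
  static final class Entry {
    final String key;
    final Object value;
    final Set<Ref> references = new HashSet<>();

    Entry(String key, Object value) {
      this.key = key;
      this.value = value;
    }
  }

  // Plays the role of BlobContentRef: one per consumer of the entry.
  static final class Ref {
    final Entry entry;

    Ref(Entry entry) {
      this.entry = entry;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();

  // Looks up (or loads) the entry under key + decoderName and adds a reference.
  synchronized Ref acquire(String key, String decoderName, Supplier<Object> loader) {
    String mapKey = key + decoderName;
    Entry entry = entries.get(mapKey);
    if (entry == null) {
      entry = new Entry(mapKey, loader.get());
      entries.put(mapKey, entry);
    }
    Ref ref = new Ref(entry);
    entry.references.add(ref);
    return ref;
  }

  // Drops a reference; the last release removes the entry using the stored key.
  synchronized void release(Ref ref) {
    ref.entry.references.remove(ref);
    if (ref.entry.references.isEmpty()) {
      entries.remove(ref.entry.key);
    }
  }

  synchronized int size() {
    return entries.size();
  }
}
```

Because the entry stores the same composite key that was used for the put, the removal on last release needs no second map and no knowledge of the decoder.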

Also note that the differing method signatures distinguish the case where we 
don't have a decoder because we are interested in caching the raw bytes, not 
the decoded form. A different path is taken in that case: the key in the map 
matches the key in the blob store, ensuring that we can't cache the raw bytes 
twice.

{code}
  public BlobContentRef<ByteBuffer> getBlobIncRef(String key) {
    return getBlobIncRef(key, () -> addBlob(key));
  }

  // For use cases sharing raw bytes
  private BlobContent<ByteBuffer> addBlob(String key) {
    ByteBuffer b = fetchBlob(key);
    BlobContent<ByteBuffer> aBlob  = new BlobContent<>(key, b);
    blobs.put(key, aBlob);
    return aBlob;
  }
{code}

Looking at it again today, it occurs to me that there is a small chance of a 
problem if someone wants to share the raw bytes and also share a decoded form 
of the same blob, and they fail to name their Decoder: both would use the same 
map key, so we would have a race condition in which whichever request comes 
first determines what gets cached under that key. This should be avoided by 
defaulting to "!D!" (or some other "reserved" string) instead of the empty 
string in the above Decoder interface (and documenting this in the javadoc). 
It is intentionally left up to the implementor to decide whether or not they 
provide names that allow/prevent collisions.
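The collision can be shown with the key arithmetic alone. This is a small sketch; BlobKeys, rawKey and decodedKey are hypothetical helpers that mirror the concatenation in getBlobIncRef, and "!D!" is the reserved default suggested above.

```java
// Hypothetical helpers mirroring how the map keys are built in the patch.
final class BlobKeys {

  // Reserved default decoder name suggested in this comment.
  static final String RESERVED_DEFAULT = "!D!";

  // Raw-bytes path: the map key is the blob store key itself.
  static String rawKey(String blobKey) {
    return blobKey;
  }

  // Decoded path: the decoder name is appended to the blob store key.
  static String decodedKey(String blobKey, String decoderName) {
    return blobKey.concat(decoderName);
  }

  // True when the raw and decoded forms would share one map entry.
  static boolean collides(String blobKey, String decoderName) {
    return rawKey(blobKey).equals(decodedKey(blobKey, decoderName));
  }
}
```

With an empty decoder name, collides("dict/1", "") is true, while collides("dict/1", BlobKeys.RESERVED_DEFAULT) is false. A reserved suffix only helps as long as no blob store key itself ends in that suffix, which this sketch assumes.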


> Allow sharing of large in memory data structures across cores
> -------------------------------------------------------------
>
>                 Key: SOLR-8349
>                 URL: https://issues.apache.org/jira/browse/SOLR-8349
>             Project: Solr
>          Issue Type: Improvement
>          Components: Server
>    Affects Versions: 5.3
>            Reporter: Gus Heck
>            Assignee: Noble Paul
>         Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds its own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

