Anurag Khandelwal created ARROW-4294:
----------------------------------------

             Summary: [Plasma] Add support for evicting objects to external store
                 Key: ARROW-4294
                 URL: https://issues.apache.org/jira/browse/ARROW-4294
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
    Affects Versions: 0.11.1
            Reporter: Anurag Khandelwal
             Fix For: 0.13.0


Currently, when Plasma needs storage space for additional objects, it evicts
objects by deleting them from the Plasma store. This is a problem when an
object cannot be reconstructed, or when reconstructing it is expensive. Adding
support for a pluggable external store, to which Plasma can evict objects,
would address this issue.

My proposal is described below.

*Requirements*
 * Objects in Plasma should be evicted to an external store rather than being
removed altogether
 * Communication with the external storage service should go through a very
thin shim interface. At the same time, the interface should be general enough
to support arbitrary remote services (e.g., S3, DynamoDB, Redis)
 * The external store should be pluggable (i.e., it should be simple to add or
remove the external storage service used for eviction and to switch between
different remote services), and new backends should be easy to implement
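A minimal C++ sketch of how the "thin shim" and "pluggable" requirements could be met. The ExternalStore and ExternalStoreHandle names follow this proposal, but their signatures, the ExternalStoreRegistry, and the InMemoryStore backend are hypothetical illustrations, not an existing Plasma API; object IDs and payloads are simplified to std::string. The registry lets a backend be added, removed, or swapped by name, and an unknown name yields nullptr so the caller can fall back to plain deletion.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical simplification: object IDs and payloads are plain strings.
using ObjectId = std::string;

// Thin per-thread shim: only Put and Get cross the boundary.
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  virtual bool Put(const ObjectId& id, const std::string& data) = 0;
  virtual bool Get(const ObjectId& id, std::string* data) = 0;
};

// A backend only needs to know how to hand out handles.
class ExternalStore {
 public:
  virtual ~ExternalStore() = default;
  virtual std::unique_ptr<ExternalStoreHandle> Connect(
      const std::string& endpoint) = 0;
};

// Hypothetical registry that makes backends pluggable by name.
class ExternalStoreRegistry {
 public:
  using Factory = std::function<std::unique_ptr<ExternalStore>()>;

  static ExternalStoreRegistry& Instance() {
    static ExternalStoreRegistry registry;
    return registry;
  }

  void Register(const std::string& name, Factory factory) {
    assert(factory);  // a registered backend must be constructible
    factories_[name] = std::move(factory);
  }

  void Unregister(const std::string& name) { factories_.erase(name); }

  // Returns nullptr if no backend is registered under `name`; the caller
  // would then fall back to deleting evicted objects.
  std::unique_ptr<ExternalStore> Create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// Hypothetical toy backend: a mutex-guarded map standing in for S3, Redis, etc.
class InMemoryStore : public ExternalStore {
 public:
  std::unique_ptr<ExternalStoreHandle> Connect(
      const std::string& /*endpoint*/) override {
    return std::unique_ptr<ExternalStoreHandle>(new Handle(this));
  }

 private:
  class Handle : public ExternalStoreHandle {
   public:
    explicit Handle(InMemoryStore* store) : store_(store) {}
    bool Put(const ObjectId& id, const std::string& data) override {
      std::lock_guard<std::mutex> lock(store_->mu_);
      store_->objects_[id] = data;
      return true;
    }
    bool Get(const ObjectId& id, std::string* data) override {
      std::lock_guard<std::mutex> lock(store_->mu_);
      auto it = store_->objects_.find(id);
      if (it == store_->objects_.end()) return false;
      *data = it->second;
      return true;
    }

   private:
    InMemoryStore* store_;
  };

  std::mutex mu_;
  std::unordered_map<ObjectId, std::string> objects_;
};
```

With this shape, adding a new backend (for example an S3 client) would amount to implementing the two interfaces and registering a factory under a new name.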

*Assumptions/Non-Requirements*
 * The external store has practically infinite storage
 * The external store's write operation is idempotent and atomic; this is
needed to ensure there are no race conditions due to multiple concurrent
evictions of the same object.
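The following sketch, assuming a hypothetical in-process AtomicKVStore in place of a real remote service, illustrates why idempotent, atomic writes make concurrent evictions of the same object harmless: each eviction thread uses its own (hypothetical) Handle, all handles share one store that serializes writes, and repeated Puts of identical data leave exactly one copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-process stand-in for a remote store whose writes are
// atomic and idempotent (the proposal assumes the real service guarantees this).
class AtomicKVStore {
 public:
  // Atomic: the whole value is published under one lock, so a concurrent Get
  // sees either no entry or the complete value, never a partial write.
  // Idempotent: repeating Put(id, data) with the same data leaves the same state.
  void Put(const std::string& id, const std::string& data) {
    std::lock_guard<std::mutex> lock(mu_);
    objects_[id] = data;
    ++put_calls_;
  }

  bool Get(const std::string& id, std::string* data) const {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = objects_.find(id);
    if (it == objects_.end()) return false;
    *data = it->second;
    return true;
  }

  std::size_t NumObjects() const {
    std::lock_guard<std::mutex> lock(mu_);
    return objects_.size();
  }

  std::size_t NumPutCalls() const {
    std::lock_guard<std::mutex> lock(mu_);
    return put_calls_;
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<std::string, std::string> objects_;
  std::size_t put_calls_ = 0;
};

// Per-thread handle: it has no mutable state of its own, so it does not need
// to be thread-safe; all synchronization lives in the shared store.
class Handle {
 public:
  explicit Handle(std::shared_ptr<AtomicKVStore> store)
      : store_(std::move(store)) {}
  void Put(const std::string& id, const std::string& data) {
    store_->Put(id, data);
  }

 private:
  std::shared_ptr<AtomicKVStore> store_;
};

// Simulates `num_threads` eviction threads racing to evict the same object.
void EvictConcurrently(const std::shared_ptr<AtomicKVStore>& store,
                       const std::string& id, const std::string& data,
                       int num_threads) {
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([store, id, data] {
      Handle handle(store);  // one handle per thread
      handle.Put(id, data);
    });
  }
  for (auto& t : threads) t.join();
}
```

If the backend's writes were not atomic, two racing evictions could interleave and leave a torn object; if they were not idempotent, the second write could fail or duplicate the object.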

*Proposed Implementation*
 * Define an ExternalStore interface with a Connect call. The call returns an
ExternalStoreHandle that exposes Put and Get calls. Any external store that
needs to be supported must implement this interface.
 * To read or write data to the external store in a thread-safe manner, one
ExternalStoreHandle should be created per thread. While an individual
ExternalStoreHandle is not required to be thread-safe, multiple
ExternalStoreHandles across multiple threads should be able to modify the
external store concurrently and safely.
 * Replace the DeleteObjects method in the Plasma Store with an EvictObjects
method. If an external store is specified for the Plasma Store, EvictObjects
would mark the object state as PLASMA_EVICTED, write the object data to the
external store (via the ExternalStoreHandle), and reclaim the memory associated
with the object data/metadata, rather than remove the entry from the Object
Table altogether. If there is no valid external store, the eviction path
remains the same (i.e., the object entry is still deleted from the Object
Table).
 * The Get method in the Plasma Store would try to fetch the object from the
external store if it is not found locally and an external store is associated
with the Plasma Store. The method tries to offload this fetch to an external
worker thread pool in a fire-and-forget manner, but may need to perform it
synchronously if too many requests are already enqueued.
 * *The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, to
which implementations of the ExternalStore and ExternalStoreHandle interfaces
can be appended; these are then compiled into the plasma_store_server
executable.*
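A simplified, single-threaded C++ sketch of the proposed EvictObjects and Get paths. ToyPlasmaStore, MapHandle, ObjectTableEntry, and the two-value ObjectState enum are hypothetical stand-ins (the real Plasma store tracks more object states and keeps separate data and metadata buffers), and the external fetch here runs synchronously instead of on a worker pool. Restoring PLASMA_SEALED when the external write fails is an added assumption; the proposal does not specify failure handling.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using ObjectId = std::string;

// Hypothetical, simplified handle interface (see the proposal's Put/Get).
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  virtual bool Put(const ObjectId& id, const std::string& data) = 0;
  virtual bool Get(const ObjectId& id, std::string* data) = 0;
};

// Hypothetical toy backend backed by a map.
class MapHandle : public ExternalStoreHandle {
 public:
  bool Put(const ObjectId& id, const std::string& data) override {
    objects_[id] = data;
    return true;
  }
  bool Get(const ObjectId& id, std::string* data) override {
    auto it = objects_.find(id);
    if (it == objects_.end()) return false;
    *data = it->second;
    return true;
  }

 private:
  std::unordered_map<ObjectId, std::string> objects_;
};

enum class ObjectState { PLASMA_SEALED, PLASMA_EVICTED };

struct ObjectTableEntry {
  ObjectState state;
  std::string data;  // stands in for the object's data and metadata buffers
};

class ToyPlasmaStore {
 public:
  // `external` may be null, meaning no external store is configured.
  explicit ToyPlasmaStore(ExternalStoreHandle* external) : external_(external) {}

  void Seal(const ObjectId& id, std::string data) {
    table_[id] = ObjectTableEntry{ObjectState::PLASMA_SEALED, std::move(data)};
  }

  void EvictObjects(const std::vector<ObjectId>& ids) {
    for (const auto& id : ids) {
      auto it = table_.find(id);
      if (it == table_.end() || it->second.state == ObjectState::PLASMA_EVICTED) {
        continue;
      }
      if (external_ == nullptr) {
        table_.erase(it);  // no external store: delete the entry, as today
        continue;
      }
      it->second.state = ObjectState::PLASMA_EVICTED;
      if (!external_->Put(id, it->second.data)) {
        it->second.state = ObjectState::PLASMA_SEALED;  // assumption: keep it resident
        continue;
      }
      std::string().swap(it->second.data);  // reclaim memory, keep the table entry
    }
  }

  // Returns false if the object is unknown or cannot be restored.
  bool Get(const ObjectId& id, std::string* data) {
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (it->second.state == ObjectState::PLASMA_EVICTED) {
      // Synchronous fetch back from the external store.
      if (external_ == nullptr || !external_->Get(id, &it->second.data)) {
        return false;
      }
      it->second.state = ObjectState::PLASMA_SEALED;
    }
    *data = it->second.data;
    return true;
  }

  bool Contains(const ObjectId& id) const { return table_.count(id) > 0; }

  bool IsEvicted(const ObjectId& id) const {
    auto it = table_.find(id);
    return it != table_.end() && it->second.state == ObjectState::PLASMA_EVICTED;
  }

 private:
  ExternalStoreHandle* external_;
  std::unordered_map<ObjectId, ObjectTableEntry> table_;
};
```

Keeping the table entry for an evicted object is what lets a later Get distinguish "evicted, fetch it back" from "never existed".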

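The fire-and-forget fetch with a synchronous fallback described for the Get path could look roughly like the bounded pool below. FetchPool, its constructor parameters, and the queue-length limit are hypothetical; a real implementation would also need to notify waiting clients when a fetch completes, which is omitted here.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded worker pool: a fetch is queued and Submit returns
// immediately, unless the queue is full, in which case the fetch runs on the
// calling thread.
class FetchPool {
 public:
  FetchPool(std::size_t num_threads, std::size_t max_queued)
      : max_queued_(max_queued) {
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Wakes all workers; they drain any queued fetches before exiting.
  ~FetchPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

  // Returns true if the task was queued, false if it ran synchronously.
  bool Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (queue_.size() < max_queued_) {
        queue_.push(std::move(task));
        cv_.notify_one();
        return true;
      }
    }
    task();  // too many requests already enqueued: fetch synchronously
    return false;
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (stopping_ && queue_.empty()) return;
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();  // run the fetch outside the lock
    }
  }

  const std::size_t max_queued_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};
```

Bounding the queue keeps memory use predictable under a burst of misses, at the cost of occasionally blocking the store's request-handling thread on a synchronous fetch.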
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
