We too rely heavily on Plasma (we use Ray as well, but also use Plasma
independently of Ray). I’ve started a thread on the Ray dev list to see if
Ray's Plasma can be used standalone outside of Ray as well. That would allow
those of us who use Plasma to move to a standalone “Ray Plasma” when/if it’s
removed from Arrow.

> On 26 Sep 2020, at 00:30, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> I'd suggest as a preliminary that we stop packaging Plasma for 1-2
> releases to see who is affected by the component's removal. Usage may
> be more widespread than we realize, and we don't have much telemetry
> to know for certain.
> 
> On Tue, Aug 18, 2020 at 1:26 PM Antoine Pitrou <anto...@python.org> wrote:
>> 
>> 
>> Also, the fact that Ray has forked Plasma means their implementation
>> becomes potentially incompatible with Arrow's.  So even if we keep
>> Plasma in our codebase, we can't guarantee interoperability with Ray.
>> 
>> Regards
>> 
>> Antoine.
>> 
>> 
>> On 18/08/2020 at 19:51, Wes McKinney wrote:
>>> I do not think there is an urgency to remove Plasma from the Arrow
>>> codebase (as it currently does not cause much maintenance burden), but
>>> the reality is that Ray has already hard-forked and so new maintainers
>>> will need to come out of the woodwork to help support the project if
>>> it is to continue having a life of its own. I started this thread to
>>> create more awareness of the issue so that existing Plasma
>>> stakeholders can make themselves known and possibly volunteer their
>>> time to develop and maintain the codebase.
>>> 
>>> On Tue, Aug 18, 2020 at 12:02 PM Matthias Vallentin
>>> <matth...@vallentin.net> wrote:
>>>> 
>>>> We are very interested in Plasma as a stand-alone project. The fork would
>>>> hit us doubly hard, because it reduces both the appeal of an Arrow-specific
>>>> use case as well as our planned Ray integration.
>>>> 
>>>> We are developing effectively a database for network activity data that
>>>> runs with Arrow as data plane. See https://github.com/tenzir/vast for
>>>> details. One of our upcoming features is supporting a 1:N output channel
>>>> using Plasma, where multiple downstream tools (Python/Pandas, R, Spark) can
>>>> process the same data set that's materialized in memory exactly once. We
>>>> currently don't have the developer bandwidth to prioritize this effort, but
>>>> the concurrent, multi-tool processing capability was one of the main
>>>> strategic reasons to go with Arrow as data plane. If Plasma has no future,
>>>> Arrow has a reduced appeal for us in the medium term.
>>>> 
>>>> We also have Ray as a data consumer on our roadmap, but the dependency
>>>> chain seems now inverted. If we have to do costly custom plumbing for Ray,
>>>> with a custom version of Plasma, the Ray integration will lose quite a bit
>>>> of appeal because it doesn't fit into the existing 1:N model. That is, even
>>>> though the fork may make sense from a Ray-internal point of view, it
>>>> decreases the appeal of Ray from the outside. (Again, only speaking about the
>>>> shared data plane here.)
>>>> 
>>>> In the future, we're happy to contribute cycles when it comes to keeping
>>>> Plasma as a useful standalone project. We recently made sure that static
>>>> builds work as expected <https://github.com/apache/arrow/pull/7842>. As of
>>>> now, we unfortunately cannot commit to anything specific though, but our
>>>> interest extends to Gandiva, Flight, and lots of other parts of the Arrow
>>>> ecosystem.
>>>> 
>>>> On Tue, Aug 18, 2020 at 4:02 AM Robert Nishihara
>>>> <robertnishih...@gmail.com> wrote:
>>>> 
>>>>> To answer Wes's question, the Plasma inside of Ray is not currently usable
>>>>> in a C++ library context, though it wouldn't be impossible to make that
>>>>> happen.
>>>>>
>>>>> I (or someone) could conduct a simple poll via Google Forms on the user
>>>>> mailing list to gauge demand if we are concerned about breaking a lot of
>>>>> people's workflow.
>>>>>
>>>>> On Mon, Aug 17, 2020 at 3:21 AM Antoine Pitrou <anto...@python.org> wrote:
>>>>>
>>>>>>
>>>>>> On 15/08/2020 at 17:56, Wes McKinney wrote:
>>>>>>>
>>>>>>> What isn't clear is whether the Plasma that's in Ray is usable in a
>>>>>>> C++ library context (e.g. what we currently ship as libplasma-dev
>>>>>>> on Ubuntu/Debian). That seems still useful, but if the project isn't
>>>>>>> being actively maintained / developed (which, given the series of
>>>>>>> stale PRs over the last year or two, it doesn't seem to be) it's
>>>>>>> unclear whether we want to keep shipping it.
>>>>>>
>>>>>> At least on GitHub, the C++ API seems to be getting little use.  Most
>>>>>> search results below are forks/copies of the Arrow or Ray codebases.
>>>>>> There are also a couple stale experiments:
>>>>>> https://github.com/search?l=C%2B%2B&p=1&q=PlasmaClient&type=Code
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
>>>>>>
>>>>>
