> To: Xiangrui Meng
> Cc: Reynold Xin; dev
>
> Subject: Re: Integrating ML/DL frameworks with Spark
> Thanks for starting this discussion; I'd also like to see some
> improvements in this area, and I'm glad to hear that the Pandas UDFs /
> Arrow functionality might be useful. …
Very cool. We would be very interested in this.
What is the plan forward to make progress in each of the three areas?
From: Bryan Cutler
Sent: Monday, May 14, 2018 11:37:20 PM
To: Xiangrui Meng
Cc: Reynold Xin; dev
Subject: Re: Integrating ML/DL frameworks with Spark
Hi all,
Paul Ogilvie pointed this thread out to me; we overlapped a little at LinkedIn.
It’s good to see that this kind of discussion is going on!
I have some thoughts on the points raised so far:
- Practically speaking, one of the lowest-hanging fruits is the ability for
Spark to request …
Thanks for starting this discussion; I'd also like to see some improvements
in this area, and I'm glad to hear that the Pandas UDFs / Arrow
functionality might be useful. I'm wondering if, from your initial
investigations, you found anything lacking in the Arrow format, or possible
improvements that would …
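For anyone following along, this is the hook under discussion: a scalar
Pandas UDF hands Python whole Arrow-backed columnar batches instead of
single rows. A minimal sketch, with an illustrative plus_one function (the
pandas_udf API itself shipped in Spark 2.3):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = SparkSession.builder.getOrCreate()
    # Arrow also speeds up toPandas()/createDataFrame conversions.
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")

    @pandas_udf("double", PandasUDFType.SCALAR)
    def plus_one(v):
        # v is a pandas.Series backed by an Arrow batch; the arithmetic
        # runs vectorized over the whole batch rather than row by row.
        return v + 1.0

    df = spark.range(0, 1000).withColumn("x", plus_one("id"))
    df.show(3)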
Shivaram: Yes, we can call it "gang scheduling" or "barrier
synchronization". Spark doesn't support it now. The proposal is to have a
proper support in Spark's job scheduler, so we can integrate well with
MPI-like frameworks.
On Tue, May 8, 2018 at 11:17 AM Nan Zhu wrote:
> …how I skipped the last part
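As a concrete illustration of the gang-scheduling idea: this proposal later
shipped in Spark 2.4 as barrier execution mode, and the sketch below uses
that API, with a trivial body standing in for real MPI-style work:

    from pyspark import BarrierTaskContext
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[4]").getOrCreate()
    sc = spark.sparkContext

    def do_work(iterator):
        ctx = BarrierTaskContext.get()
        # barrier() blocks until every task in the stage reaches it,
        # giving the MPI-style synchronization discussed here.
        ctx.barrier()
        yield sum(1 for _ in iterator)

    # barrier() marks the stage for gang scheduling: all four tasks are
    # launched together or not at all, and a failure retries the whole
    # stage rather than a single task.
    counts = sc.parallelize(range(100), 4).barrier() \
        .mapPartitions(do_work).collect()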
…how I skipped the last part
On Tue, May 8, 2018 at 11:16 AM, Reynold Xin wrote:
> Yes, Nan, totally agree. To be on the same page, that's exactly what I
> wrote, wasn't it?
>
> On Tue, May 8, 2018 at 11:14 AM Nan Zhu wrote:
>
>> besides that, one of the things needed by multiple frameworks is the
>> ability to schedule tasks in a single wave …
Yes, Nan, totally agree. To be on the same page, that's exactly what I
wrote, wasn't it?
On Tue, May 8, 2018 at 11:14 AM Nan Zhu wrote:
> besides that, one of the things needed by multiple frameworks is the
> ability to schedule tasks in a single wave
>
> i.e.
>
> if some frameworks like xgboost/mxnet require 50 parallel workers, …
besides that, one of the things needed by multiple frameworks is the
ability to schedule tasks in a single wave
i.e.
if some frameworks like xgboost/mxnet require 50 parallel workers, Spark
should provide a capability to ensure that either all 50 tasks run at
once, or we quit the compl…
I think that's what Xiangrui was referring to. Instead of retrying a single
task, retry the entire stage, and the entire stage of tasks needs to be
scheduled all at once.
On Tue, May 8, 2018 at 8:53 AM Shivaram Venkataraman
<shiva...@eecs.berkeley.edu> wrote:
>>> - Fault tolerance and execution model: …
I am a committer on the MXNet project and very interested in working on
integrating with Spark.
I am wondering how training would proceed in the case of:
1) training on one host with multiple GPUs -- I don't know if Spark's
capabilities can be leveraged here
2) distributed training with data parallelism …
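On 2), the barrier primitives sketched earlier give each task the full peer
list and a rank, which is the bootstrap most data-parallel frameworks need
before training starts. A hypothetical skeleton: start_worker stands in for
a framework-specific launcher (e.g. wiring up an MXNet kvstore), and sc is
an existing SparkContext:

    from pyspark import BarrierTaskContext

    def train_partition(iterator):
        ctx = BarrierTaskContext.get()
        workers = [info.address for info in ctx.getTaskInfos()]  # all peers
        rank = ctx.partitionId()
        ctx.barrier()  # wait until every worker task is up
        # start_worker(rank, workers, iterator)  # hypothetical hook
        yield (rank, len(workers))

    results = sc.parallelize(range(8), 4).barrier() \
        .mapPartitions(train_partition).collect()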
>> - Fault tolerance and execution model: Spark assumes fine-grained
>> task recovery, i.e. if something fails, only that task is rerun. This
>> doesn't match the execution model of distributed ML/DL frameworks that
>> are typically MPI-based, and rerunning a single task would lead …
Hi,
You misunderstood me: I meant exactly that Spark should be aware of them.
So I agree with you. The point is to also have YARN's GPU/FPGA scheduling
as an option alongside a potential Spark GPU/FPGA scheduler.
For the other proposal: yes, the interfaces are slow, but one has to think …
I don't think it's sufficient to have them in YARN (or any other service)
without Spark being aware of them. If Spark is not aware of them, then
there is no way to really efficiently utilize these accelerators when you
run anything that requires non-accelerators (which is almost 100% of the
cases in real …
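For the record, this is roughly the shape accelerator awareness later took
in Spark 3.0's resource-aware scheduling. A minimal sketch; the amounts and
the /opt/getGpus.sh discovery script are illustrative:

    from pyspark import TaskContext
    from pyspark.sql import SparkSession

    # Executors advertise GPUs found by a discovery script, and the
    # scheduler hands each task its own device addresses.
    spark = (SparkSession.builder
        .config("spark.executor.resource.gpu.amount", "1")
        .config("spark.task.resource.gpu.amount", "1")
        .config("spark.executor.resource.gpu.discoveryScript",
                "/opt/getGpus.sh")
        .getOrCreate())

    def use_gpu(iterator):
        # resources() maps resource name -> ResourceInformation(addresses)
        gpus = TaskContext.get().resources()["gpu"].addresses
        yield gpus[0]  # e.g. pin the DL framework to this device

    spark.sparkContext.parallelize(range(4), 4) \
        .mapPartitions(use_gpu).collect()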
Hadoop / YARN 3.1 added GPU scheduling, and 3.2 is planned to add FPGA
scheduling, so it might be worth making the last point generic, so that
not only the Spark scheduler but all supported schedulers can use GPUs.
For the other 2 points, I just wonder if it makes sense to address this in
the ML framew…
Thanks Reynold for summarizing the offline discussion! I added a few
comments inline. -Xiangrui
On Mon, May 7, 2018 at 5:37 PM Reynold Xin wrote:
> Hi all,
>
> Xiangrui and I were discussing with a heavy Apache Spark user last week on
> their experiences integrating machine learning (and deep learning) …