On 10/26/21 10:02 PM, David Li wrote:
Hi Yibo,

Just curious, has there been more thought on this from your/the HPC side?

Yes. I will investigate possible approaches, probably starting with a quick (and dirty) POC test.


I also realized we never asked, what is motivating Flight in this space in the 
first place? Presumably broader Arrow support in general?

No special reason. It will be great if something useful comes out of it, and an interesting experiment otherwise.


-David

On Fri, Sep 10, 2021, at 12:27, Micah Kornfield wrote:

I would support doing the work necessary to get UCX (or really any other
transport) supported, even if it is a lot of work. (I'm hoping this clears
the path to supporting a Flight-to-browser transport as well; a few
projects seem to have rolled their own approaches but I think Flight itself
should really handle this, too.)


Another possible technical approach is to investigate whether a custom
gRPC "channel" implementation could be written for new transports.
Searching around, it seems there were some defunct PRs trying to
enable UCX as one; I didn't look closely enough to see why they might have
failed.

On Thu, Sep 9, 2021 at 11:07 AM David Li <lidav...@apache.org> wrote:

I would support doing the work necessary to get UCX (or really any other
transport) supported, even if it is a lot of work. (I'm hoping this clears
the path to supporting a Flight-to-browser transport as well; a few
projects seem to have rolled their own approaches but I think Flight itself
should really handle this, too.)

From what I understand, you could tunnel gRPC over UCX as Keith mentions,
or directly use UCX, which is what it sounds like you are thinking about.
One idea we had previously was to stick to gRPC for 'control plane'
methods, and support alternate protocols only for 'data plane' methods like
DoGet - this might be more manageable, depending on what you have in mind.
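
A minimal sketch of that control-plane/data-plane split, assuming nothing about how Flight would actually implement it. Every class and method name below (ControlPlane, DataPlane, InMemoryDataPlane, SplitClient, fetch) is hypothetical, and no real gRPC or UCX calls are made. The point is only that the client asks the control plane for a ticket and redeems it on a separately pluggable transport:

```python
from typing import Dict, Iterator, List, Protocol


class DataPlane(Protocol):
    """Hypothetical interface for bulk-transfer methods such as DoGet."""

    def do_get(self, ticket: bytes) -> Iterator[bytes]: ...


class InMemoryDataPlane:
    """Stand-in for a UCX (or other) data-plane transport; performs no real I/O."""

    def __init__(self, store: Dict[bytes, List[bytes]]) -> None:
        self._store = store

    def do_get(self, ticket: bytes) -> Iterator[bytes]:
        # A real transport would stream Arrow record batches here.
        yield from self._store[ticket]


class ControlPlane:
    """Stand-in for the gRPC side: small metadata requests only."""

    def __init__(self, tickets: Dict[str, bytes]) -> None:
        self._tickets = tickets

    def get_ticket(self, descriptor: str) -> bytes:
        # Returns the ticket the client later redeems on the data plane.
        return self._tickets[descriptor]


class SplitClient:
    """Routes metadata calls and bulk transfers to different transports."""

    def __init__(self, control: ControlPlane, data: DataPlane) -> None:
        self.control = control
        self.data = data

    def fetch(self, descriptor: str) -> List[bytes]:
        ticket = self.control.get_ticket(descriptor)
        return list(self.data.do_get(ticket))
```

Swapping InMemoryDataPlane for another object with the same do_get method changes the data path without touching the control-plane code, which is the property that would make a phased rollout easier to review.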

In general - there's quite a bit of work here, so it would help to
separate the work into phases, and share some more detailed
design/implementation plans, to make review more manageable. (I realize of
course this is just a general interest check right now.) Just splitting
gRPC/Flight is going to take a decent amount of work, and (from what little
I understand) using UCX means choosing from various communication methods
it offers and writing a decent amount of scaffolding code, so it would be
good to establish what exactly a 'UCX' transport means. (For instance,
presumably there's no need to stick to the Protobuf-based wire format, but
what format would we use?)

It would also be good to expand the benchmarks, to validate the
performance we get from UCX and have a way to compare it against gRPC.
Anecdotally I've found gRPC isn't quite able to saturate a connection so it
would be interesting to see what other transports can do.
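
As a rough illustration of the kind of comparison meant here, the sketch below times how fast a stream of byte chunks can be drained and reports throughput. It is not the existing Flight benchmark: measure_throughput and fake_transport are hypothetical names, and the fake transport only yields in-memory buffers, so the figures it produces say nothing about gRPC or UCX. A real comparison would plug each transport's DoGet stream into the same harness.

```python
import time
from typing import Callable, Dict, Iterable


def measure_throughput(stream: Callable[[], Iterable[bytes]]) -> Dict[str, float]:
    """Drain a stream of byte chunks and report bytes, chunks, and MiB/s."""
    start = time.perf_counter()
    total_bytes = 0
    chunks = 0
    for chunk in stream():
        total_bytes += len(chunk)
        chunks += 1
    elapsed = time.perf_counter() - start
    mib = total_bytes / (1024 * 1024)
    return {
        "bytes": total_bytes,
        "chunks": chunks,
        "seconds": elapsed,
        # Guard against a zero reading from a coarse timer.
        "mib_per_s": mib / elapsed if elapsed > 0 else float("inf"),
    }


def fake_transport(n_chunks: int = 64, chunk_size: int = 1 << 20) -> Callable[[], Iterable[bytes]]:
    """Hypothetical stand-in for a transport: yields zero-filled buffers."""
    payload = bytes(chunk_size)

    def stream() -> Iterable[bytes]:
        for _ in range(n_chunks):
            yield payload

    return stream


if __name__ == "__main__":
    result = measure_throughput(fake_transport())
    print(f"{result['bytes']} bytes in {result['seconds']:.4f}s "
          f"({result['mib_per_s']:.1f} MiB/s)")
```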

Jed - how would you see MPI and Flight interacting? As another
transport/alternative to UCX? I admit I'm not familiar with the HPC space.

About transferring commands with data: Flight already has an app_metadata
field in various places to allow things like this. It may be interesting to
combine this with the ComputeIR proposal on this mailing list, and hopefully
you & your colleagues can take a look there as well.

-David

On Thu, Sep 9, 2021, at 11:24, Jed Brown wrote:
Yibo Cai <yibo....@arm.com> writes:

HPC infrastructure normally leverages RDMA for fast data transfer among
storage nodes and compute nodes. Computation tasks are dispatched to
compute nodes with best fit resources.

Concretely, we are investigating porting UCX as Flight transport layer.
UCX is a communication framework for modern networks. [1]
Besides HPC usage, many projects (spark, dask, blazingsql, etc) also
adopt UCX to accelerate network transmission. [2][3]

I'm interested in this topic and think it's important that, even if the
focus is directly on UCX, some thought be given to MPI
interoperability and support for scalable collectives. MPI considers UCX to
be an implementation detail, but the two main implementations (MPICH and
Open MPI) support it and vendor implementations are all derived from these
two.



