Hi,

There are no cons to using Spark with GPUs; you just have to be careful
about GPU memory and a few other details.
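
As a rough illustration of the memory-related details, here is a minimal
sketch of the kind of settings involved. The property names are RAPIDS
Accelerator and Spark configuration keys; the values and the helper names
(rapids_conf, to_submit_args) are assumptions for illustration only and
need tuning for your own cluster and GPUs.

```python
def rapids_conf(concurrent_gpu_tasks=2, pinned_pool="2g", executor_cores=4):
    """Build Spark settings for the RAPIDS Accelerator (illustrative values)."""
    return {
        # Load the RAPIDS Accelerator SQL plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # One GPU per executor; each task claims a fraction of it.
        "spark.executor.cores": str(executor_cores),
        "spark.executor.resource.gpu.amount": "1",
        "spark.task.resource.gpu.amount": str(1 / executor_cores),
        # Cap how many tasks use the GPU at once; lowering this is a
        # common first step when hitting GPU out-of-memory errors.
        "spark.rapids.sql.concurrentGpuTasks": str(concurrent_gpu_tasks),
        # Pinned host memory speeds up transfers between host and GPU.
        "spark.rapids.memory.pinnedPool.size": pinned_pool,
        # Log the operators that could not be placed on the GPU.
        "spark.rapids.sql.explain": "NOT_ON_GPU",
    }


def to_submit_args(conf):
    """Flatten a settings dict into spark-submit style --conf arguments."""
    return [arg for key, value in sorted(conf.items())
            for arg in ("--conf", f"{key}={value}")]


if __name__ == "__main__":
    print(" ".join(to_submit_args(rapids_conf())))
```

The same dictionary could be passed to SparkSession.builder.config(...)
key by key instead of going through spark-submit.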

I have sometimes seen a 10x improvement over stock Spark 3.x performance,
and sometimes around 30x.

Not all queries will perform better on GPUs, and it is up to you to test
the scenarios specific to your workload. I use EMR for this, and what the
NVIDIA folks have done is really impressive.
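
One way to test your own scenarios is to time the same query with and
without GPU acceleration and compare. The helper below is only a sketch:
median_seconds and speedup are hypothetical names, and run_query stands
for any callable that triggers a Spark action on your data.

```python
import time


def median_seconds(run_query, repeats=3):
    """Run a zero-argument callable several times; return the median wall time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds
```

In practice run_query could be something like
lambda: df.groupBy("key").count().collect(), with df and "key" standing in
for your own data, measured once with spark.rapids.sql.enabled set to
false and once with it set to true.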

I think there was an initial promise around the Spark 3.x release that
Spark DataFrames could be transferred directly, through native
integration, to TensorFlow and other frameworks. That would be a brilliant
way forward for Spark, but I think the Spark project leaders have yet to
prioritise it.

Also Ray, another project from Berkeley, is trying to make Spark
DataFrames transferable to TensorFlow. If Spark users adopt Ray to move
Spark DataFrames to TensorFlow and other frameworks, then Ray will clearly
see massive adoption.

Personally, I think the Spark community could have built the integration
with other frameworks natively, given NVIDIA's fantastic contributions to
Spark and such a large, active development community. But surely Ray has
to win as well, and there is nothing better than riding on the success of
Spark. I may be wrong, though, and the Spark community may still be
developing those integrations.


Regards,
Gourav Sengupta


On Fri, Jul 30, 2021 at 2:46 AM Artemis User <arte...@dtechspace.com> wrote:

> Has anyone had any experience with running Spark-Rapids on a GPU-powered
> cluster (https://github.com/NVIDIA/spark-rapids)?  I am very interested
> in knowing:
>
>    1. What is the hardware/software platform and the type of Spark
>    cluster you are using to run Spark-Rapids?
>    2. How easy was the installation process?
>    3. Are you running Scala or PySpark or both with Spark-Rapids?
>    4. How has the performance you've seen compared with running a
>    CPU-only cluster?
>    5. Any pros/cons of using Spark-Rapids?
>
> Thanks a lot in advance!
>
> -- ND
>
