Gourav, with all respect, I really don't want to start a conversation
about your political correctness. I don't think my comments offend
anyone in this group (including you) except big corporations. Again, I
am looking for concrete answers to my questions that can help me get
my project started, not some C-level talk. If you don't know the
answers, I'd appreciate it if you just ignored my posts...
-- ND
On 7/30/21 12:15 PM, Gourav Sengupta wrote:
Hi Artemis,
no one, and I repeat, no one, is monopolising the data science market.
In fact, almost all algorithms, code, and papers are available for
free, with the largest open-source contributions coming from Amazon,
Google, and Azure, the very companies you say are trying to monopolise
the market.
I think we owe a lot to these large corporations, which spend billions
and then open-source their products. On this mailing list, of which I
am one of the oldest members, you will receive responses from Matei
Zaharia, Reynold Xin, Burak, TD, Michael Armbrust, and so on.
I personally feel fortunate to be part of this kind of group. They are
also the founders of Databricks, which is a profit-making company, but
the innovations from Databricks are eventually given away for free
through projects led by Databricks employees. Let us please be
grateful and acknowledge their kindness if possible.
I am sure we will all find the help that we seek, but that help will
most likely also come from people who are paid and supported by the
very companies towards whom you are being so unkind.
Regards,
Gourav Sengupta
On Fri, Jul 30, 2021 at 4:02 PM Artemis User <arte...@dtechspace.com> wrote:
Thanks, Gourav, for the info. Actually, I am looking for concrete
experiences and detailed best practices from people who have built
their own GPU-powered environment instead of relying on big cloud
providers who are dominating and trying to monopolize the data
science market....
-- ND
On 7/30/21 4:37 AM, Gourav Sengupta wrote:
Hi,
there are no real cons to using SPARK with GPUs; you just have to be
careful about GPU memory and a few other details.
I have sometimes seen a 10x improvement over general SPARK 3.x
performance, and sometimes around 30x.
Not all queries will be performant with GPUs, and it is up to you to
test out scenarios specific to your workload. I use EMR for this
option, and it is really impressive what the NVIDIA folks have done.
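As a rough illustration, a minimal PySpark setup for the RAPIDS plugin
could look like the sketch below; the memory fraction, GPU counts, and
app name are illustrative assumptions, not settings anyone in this
thread has reported, so please check the spark-rapids documentation
for your release:

from pyspark.sql import SparkSession

# Sketch of enabling the RAPIDS Accelerator (spark-rapids). The
# rapids-4-spark (and, on older releases, cudf) jars must also be on
# the classpath, e.g. via --jars or spark.jars.
spark = (
    SparkSession.builder
    .appName("rapids-sketch")
    # Load the RAPIDS SQL plugin so supported operators run on the GPU
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    # One GPU per executor; a fractional task amount lets tasks share it
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "0.25")
    # The "careful about GPU memory" part: cap the pool RAPIDS allocates
    # and limit how many tasks hit the GPU concurrently
    .config("spark.rapids.memory.gpu.allocFraction", "0.8")
    .config("spark.rapids.sql.concurrentGpuTasks", "2")
    .getOrCreate()
)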
I think there was an initial promise with the SPARK 3.x release that
SPARK dataframes could be transferred directly, through native
integration, to tensorflow and other frameworks, which would be a
brilliant way forward for SPARK, but I think the SPARK project leaders
are yet to prioritise it.
Also, Ray, another project out of Berkeley, is trying to make SPARK
dataframes transferable to tensorflow. Clearly, if SPARK users adopt
Ray to transfer SPARK dataframes to tensorflow and other frameworks,
then Ray will see massive adoption.
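As a very rough sketch of what that path could look like (this assumes
Ray with the RayDP integration installed, and that ray.data.from_spark()
is the bridging call; treat the exact API and arguments as assumptions
to verify against the Ray documentation for your version):

import ray
import raydp

ray.init()

# Spark runs on Ray via RayDP; the resource numbers are placeholders.
spark = raydp.init_spark(
    app_name="spark-to-tf-sketch",
    num_executors=2,
    executor_cores=2,
    executor_memory="4GB",
)

spark_df = spark.range(0, 1000)       # any Spark DataFrame
ds = ray.data.from_spark(spark_df)    # Spark DataFrame -> Ray Dataset
for batch in ds.iter_batches(batch_size=256):
    pass  # feed each batch into tf.data / a training loop here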
Personally, I think the SPARK community could have built this
integration with other frameworks natively, given the fantastic
contributions by NVIDIA to SPARK and such a large, active development
community; but surely Ray has to win as well, and there is nothing
better than riding on the success of SPARK. I may be wrong, though,
and the SPARK community may still be developing those integrations.
Regards,
Gourav Sengupta
On Fri, Jul 30, 2021 at 2:46 AM Artemis User <arte...@dtechspace.com> wrote:
Has anyone had any experience with running Spark-Rapids on a
GPU-powered cluster (https://github.com/NVIDIA/spark-rapids)? I am
very interested in knowing:
1. What is the hardware/software platform and the type of
Spark cluster you are using to run Spark-Rapids?
2. How easy was the installation process?
3. Are you running Scala or PySpark or both with Spark-Rapids?
4. How does the performance you've seen compare with running a
CPU-only cluster?
5. Any pros/cons of using Spark-Rapids?
Thanks a lot in advance!
-- ND