Re: Running Spark Rapids on GPU-Powered Spark Cluster

Mich Talebzadeh Fri, 30 Jul 2021 13:14:34 -0700

Hi,


If I may say, from time to time we have had some disagreements in the forum,
myself included. However, we have all been here long enough to look for
collaboration as opposed to going off on a tangent (no pun intended).


So I repeat what our friend Artemis User requested originally for anyone
with experience of using "Spark-Rapids on a GPU-powered cluster" to share
their views.


"Has anyone had any experience with running Spark-Rapids on a GPU-powered
cluster (https://github.com/NVIDIA/spark-rapids)?  I am very interested in
knowing:

   1. What is the hardware/software platform and the type of Spark cluster
   you are using to run Spark-Rapids?
   2. How easy was the installation process?
   3. Are you running Scala or PySpark or both with Spark-Rapids?
   4. What performance have you seen versus running a CPU-only cluster?
   5. Any pros/cons of using Spark-Rapids?"


I myself have no experience with GPUs for Spark. I know of data
scientists who have generally used GPUs with Keras and TensorFlow going
back to 2017, but I cannot give any qualified answer. So I leave it to other
forum members who can better help our friend.

Cheers,

Mich


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 30 Jul 2021 at 20:32, Artemis User <arte...@dtechspace.com> wrote:

> Gourav, with all respect, I really don't want to start a conversation
> about your political correctness.  I don't think my comments offend anyone
> in this group (including you) except big corporations.  Again, I am looking
> for concrete answers to my questions that can help me to get my project
> started, not some C-level talks.  If you don't know the answers, I'd
> appreciate you just ignore my posts...
>
> -- ND
>
> On 7/30/21 12:15 PM, Gourav Sengupta wrote:
>
> Hi Artemis,
>
> No one, and I repeat no one, is monopolising the data science market. In
> fact, almost all algorithms, code, and papers are available for free, with
> the largest open source contributions coming from Amazon, Google, and Azure,
> which you say are trying to monopolise the market.
>
> I think that we owe a debt to these large corporations, which spent billions
> and then open-sourced their products. In this chain, of which I am one of the
> oldest members, you will receive responses from Matei Zaharia, Reynold Xin,
> Burak, TD, Michael Armbrust, and so on.
>
> I personally find myself fortunate to be a part of this kind of group.
> They are still founders of Databricks, which is a profit-making company, but
> all innovations from Databricks are eventually given away for free through
> projects which are headed by the employees of Databricks.
>
> Let us please be grateful and acknowledge their kindness if possible. I am
> sure we will all find the help that we seek, but that help will most likely
> also come from those who are paid and supported by the companies towards
> which you are being so unkind.
>
>
> Regards,
> Gourav Sengupta
>
>
>
>
> On Fri, Jul 30, 2021 at 4:02 PM Artemis User <arte...@dtechspace.com>
> wrote:
>
>> Thanks, Gourav, for the info.  Actually, I am looking for concrete
>> experiences and detailed best practices from people who have built their
>> own GPU-powered environment instead of relying on big cloud providers who
>> are dominating and trying to monopolize the data science market...
>>
>> -- ND
>>
>> On 7/30/21 4:37 AM, Gourav Sengupta wrote:
>>
>> Hi,
>>
>> There are no cons to using Spark with GPUs; you just have to be careful
>> about GPU memory and a few other details.
>>
>> I have sometimes seen a 10x improvement over general Spark 3.x performance,
>> and sometimes around 30x.
>>
>> Not all queries will perform well on GPUs, and it is up to you to
>> test out scenarios specific to you. I use EMR for this, and it is
>> really impressive what the NVIDIA folks have done.
>>
>> I think there was an initial promise with the Spark 3.x release that Spark
>> DataFrames could be transferred directly through native integration to
>> TensorFlow and others, which is a brilliant way forward for Spark, but I
>> think that the Spark project leaders are yet to prioritise it.
>>
>> Also Ray, another project from Berkeley, is trying to make Spark DataFrames
>> transferable to TensorFlow. Clearly, if Spark users use Ray to transfer Spark
>> DataFrames to TensorFlow and other frameworks, then Ray will have
>> massive adoption.
>>
>> Personally, I think that the Spark community could have just built the
>> integration with other frameworks natively, given the fantastic
>> contributions by NVIDIA to Spark and such a large, active development
>> community. But surely Ray has to win as well, and there is nothing better
>> than riding on the success of Spark. But I may be wrong, and the Spark
>> community may still be developing those integrations.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>>
>> On Fri, Jul 30, 2021 at 2:46 AM Artemis User <arte...@dtechspace.com>
>> wrote:
>>
>>> Has anyone had any experience with running Spark-Rapids on a GPU-powered
>>> cluster (https://github.com/NVIDIA/spark-rapids)?  I am very interested
>>> in knowing:
>>>
>>>    1. What is the hardware/software platform and the type of Spark
>>>    cluster you are using to run Spark-Rapids?
>>>    2. How easy was the installation process?
>>>    3. Are you running Scala or PySpark or both with Spark-Rapids?
>>>    4. How has the performance you've seen compared with running a CPU-only
>>>    cluster?
>>>    5. Any pros/cons of using Spark-Rapids?
>>>
>>> Thanks a lot in advance!
>>>
>>> -- ND
>>>
>>
>>
>
