Spark with GPU

2022-08-13 Thread rajat kumar
Hello, I have been hearing about GPUs in Spark 3. For batch jobs, will it help to improve performance? Also, is GPU support available only on Databricks, or on cloud-based Spark clusters too? I am new; if anyone can share insight, it will help. Thanks, Rajat

Re: Spark with GPU

2022-08-13 Thread Sean Owen
Spark does not use GPUs itself, but tasks you run on Spark can. The only 'support' there is, is for requesting GPUs as resources for tasks, so it's just a question of resource management. That's in OSS (open-source Spark).
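To illustrate the resource-management point above, here is a minimal sketch of the GPU resource configuration open-source Spark accepts (the discovery script path is an assumption; your cluster must actually expose GPUs and provide a script that reports them):

```python
# Hypothetical sketch: asking Spark's scheduler for GPUs as task resources.
# Spark itself does nothing with the GPU; it only hands the address to the task.
gpu_confs = {
    # one GPU per executor, found via a site-provided discovery script
    "spark.executor.resource.gpu.amount": "1",
    "spark.executor.resource.gpu.discoveryScript": "/opt/spark/getGpusResources.sh",
    # each task claims one whole GPU (fractions like "0.25" allow sharing)
    "spark.task.resource.gpu.amount": "1",
}

# These would be applied when building a session on a GPU-capable cluster:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("gpu-demo")
# for k, v in gpu_confs.items():
#     builder = builder.config(k, v)
# spark = builder.getOrCreate()
```

The task code then reads its assigned GPU from the task context; Spark only does the bookkeeping.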

Re: Spark with GPU

2022-08-13 Thread rajat kumar
Thanks Sean. Also, I observed that lots of things are not supported on GPU by NVIDIA, e.g. nested types, decimal types, UDFs, etc. So, will it automatically use the CPU for the tasks that require nested types, or will it run on the GPU and fail? Thanks, Rajat

Re: Spark with GPU

2022-08-13 Thread Sean Owen
This isn't a Spark question, but rather a question about whatever Spark application you are talking about. RAPIDS?

Re: Spark with GPU

2022-08-13 Thread Alessandro Bellina
This thread may be better suited as a discussion in our Spark plug-in’s repo: https://github.com/NVIDIA/spark-rapids/discussions. Just to answer the questions that were asked so far: I would recommend checking our documentation for what is supported as of our latest release (22.06): https://nvidi
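On the fallback question raised earlier: the spark-rapids plug-in does not fail on unsupported operations; the query planner keeps those operators on the CPU plan and only moves supported ones to the GPU. A minimal sketch of the relevant configuration, assuming the plug-in jar is already on the classpath (the exact option values should be checked against the documentation linked above):

```python
# Hypothetical sketch: enabling the RAPIDS Accelerator for Apache Spark.
# Operators it cannot handle (e.g. certain nested types or UDFs) fall back
# to the normal CPU execution plan instead of failing the job.
rapids_confs = {
    # load the SQL plug-in that rewrites supported operators to run on GPU
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    # log which parts of the plan could NOT be placed on the GPU, and why
    "spark.rapids.sql.explain": "NOT_ON_GPU",
}
```

The `explain` setting is useful when tuning: it tells you which operators in a query stayed on the CPU, so you can see whether a job is actually benefiting from the GPU.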

Re: Spark with GPU

2022-08-13 Thread Gourav Sengupta
One of the best things that could have happened to Spark (now mostly an overhyped ETL tool with small incremental optimisation changes and no large-scale innovation) is NVIDIA's release of GPU processing support. You need some time to get your head around it, but it is supported quite easily in AWS E

PySpark schema sanitization

2022-08-13 Thread Shay Elbaz
Hi, I have a simple ETL application, where the data source schema needs to be sanitized. Column names might include special characters that need to be removed, for example, from "some{column}" to "some_column". Normally I'd just alias the columns, but in this case the schema can have thousands
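One way to avoid aliasing thousands of columns one by one is to sanitize all the names in a single pass and rename the whole DataFrame at once. A sketch (the `sanitize` helper and its replacement rule are illustrative, not a standard API; adjust the regex to your own naming policy):

```python
import re

def sanitize(name: str) -> str:
    """Collapse runs of characters outside [A-Za-z0-9_] into a single
    underscore, then trim leading/trailing underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name)
    return cleaned.strip("_")

# e.g. sanitize("some{column}") gives "some_column"

# Applied to a PySpark DataFrame, this renames every top-level column
# in one call instead of thousands of .alias() expressions:
# df = df.toDF(*[sanitize(c) for c in df.columns])
```

Note this only handles top-level columns; if the special characters appear inside nested struct fields, you would need to walk the schema recursively and cast to a rebuilt StructType instead.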