Re: Running Spark Rapids on GPU-Powered Spark Cluster

2021-07-30 Thread Mich Talebzadeh
Hi, if I may say, from time to time we have had some disagreements in the forum, myself included. However, we have all been here long enough to look for collaboration rather than going off on a tangent (no pun intended). So I repeat what our friend Artemis User requested originally for anyone with e

Re: Running Spark Rapids on GPU-Powered Spark Cluster

2021-07-30 Thread Artemis User
Gourav, with all respect, I really don't want to start a conversation about your political correctness.  I don't think my comments offend anyone in this group (including you) except big corporations.  Again, I am looking for concrete answers to my questions that can help me to get my project st

Spark-SQL plugin into HIVE

2021-07-30 Thread Renganathan Mutthiah
Hi, HIVE has a metastore, and HIVESERVER2 listens for SQL requests; with the help of the metastore, the query is executed and the result is passed back. HIVESERVER2 is essentially a customised Thrift server. In this way, HIVE acts as a service. From a programming language, we can use HIVE as a

Re: Running Spark Rapids on GPU-Powered Spark Cluster

2021-07-30 Thread Gourav Sengupta
Hi Artemis, no one, and I repeat no one, is monopolising the data science market. In fact, almost all algorithms, code and papers are available for free, with the largest open-source contributions coming from Amazon, Google, and Azure, the very companies you say are trying to monopolise the market. I thi

Re: Running Spark Rapids on GPU-Powered Spark Cluster

2021-07-30 Thread Artemis User
Thanks Gourav for the info. Actually, I am looking for concrete experiences and detailed best practices from people who have built their own GPU-powered environments instead of relying on big cloud providers who are dominating and trying to monopolize the data science market -- ND On 7/30/

Re: Cloudera Parcel : spark issues after upgrade 1.6 to 2.4

2021-07-30 Thread Sean Owen
(This is a list for OSS Spark; anything vendor-specific should go to vendor lists for better answers.) On Fri, Jul 30, 2021 at 8:35 AM Harsh Sharma wrote: > hi Team, > > we are upgrading our Cloudera parcels to 6.x from 5.x, hence we have > upgraded Spark from 1.6 to 2.4. While exec

Cloudera Parcel : spark issues after upgrade 1.6 to 2.4

2021-07-30 Thread Harsh Sharma
Hi team, we are upgrading our Cloudera parcels from 5.x to 6.x, so we have upgraded Spark from 1.6 to 2.4. While executing a Spark program we are getting the error below. Please help us resolve this with Cloudera parcels. There are suggestions to install Spark gateway roles

Re: Connection Reset by Peer : failed to remove cached rdd

2021-07-30 Thread Harsh Sharma
[Stage 284:>(199 + 1) / 200][Stage 292:> (1 + 3) / 200] [Stage 284:>(199 + 1) / 200][Stage 292:> (2 + 3) / 200] [Stage 292:> (2 + 4) / 200][14/06/21 10:46:17,006 WARN shuffle-server-4](Transport

Re: How can I write data to hive with jdbc

2021-07-30 Thread Mich Talebzadeh
This is a generic JDBC write to a DB from PySpark: def writeTableWithJDBC(dataFrame, url, tableName, user, password, driver, mode): try: dataFrame. \ write. \ format("jdbc"). \ option("url", url). \ option("dbtable", tableName). \
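Mich's snippet is cut off after the `dbtable` option. A minimal sketch of a complete writer with the same parameters follows, assuming `user`, `password` and `driver` map onto Spark's standard JDBC data source options and `mode` onto `DataFrameWriter.mode()`. The snake_case name `write_table_with_jdbc` is ours, not the original's, and the error handling the original wraps in `try:` is left out.

```python
def write_table_with_jdbc(data_frame, url, table_name, user, password, driver, mode="append"):
    """Write a Spark DataFrame to a table over JDBC (illustrative sketch).

    `mode` is passed to DataFrameWriter.mode(): e.g. "append", "overwrite",
    "ignore" or "error". The default of "append" is an assumption.
    """
    (data_frame.write
        .format("jdbc")                # Spark's built-in JDBC data source
        .option("url", url)            # e.g. jdbc:postgresql://host:5432/db
        .option("dbtable", table_name) # target table in the database
        .option("user", user)
        .option("password", password)
        .option("driver", driver)      # JDBC driver class name
        .mode(mode)                    # what to do if the table already exists
        .save())
```

Whether a particular JDBC driver (HiveServer2's included) accepts the batched INSERTs that Spark's JDBC writer issues is an assumption to check against that driver before relying on this for Hive.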

How can I write data to hive with jdbc

2021-07-30 Thread igyu
val DF = sparkSession.read.format("jdbc") .option("url", "jdbc:hive2://tidb4ser:11000/hivetest;hive.server2.proxy.user=jztwk") .option("dbtable", "(SELECT * FROM tb_user where created=1602864000) as tmp") .option("user", "admin") .option("password", "00") .option("fetchsize", "2000")
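igyu's Scala read passes a parenthesised subquery with an alias as `dbtable`, so the database evaluates the `WHERE created=1602864000` filter instead of Spark reading all of `tb_user`. A rough PySpark equivalent is sketched below, to match the Python used elsewhere in this thread. The helper name `read_jdbc_subquery` is hypothetical, and the trailing `.load()` is assumed because the original snippet is truncated.

```python
def read_jdbc_subquery(spark, url, query, alias, user, password, fetchsize=2000):
    """Load the result of a SQL subquery over JDBC (illustrative sketch).

    Wrapping the query as "(query) AS alias" makes the database run the
    subquery, so only matching rows are sent back to Spark.
    """
    dbtable = "({}) AS {}".format(query, alias)
    return (spark.read
            .format("jdbc")
            .option("url", url)
            .option("dbtable", dbtable)
            .option("user", user)
            .option("password", password)
            .option("fetchsize", str(fetchsize))  # rows fetched per round trip
            .load())
```

For example, `read_jdbc_subquery(spark, "jdbc:hive2://tidb4ser:11000/hivetest", "SELECT * FROM tb_user where created=1602864000", "tmp", "admin", "00")` mirrors the options in the Scala snippet.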

Re: Running Spark Rapids on GPU-Powered Spark Cluster

2021-07-30 Thread Gourav Sengupta
Hi, there are no cons to using SPARK with GPUs; you just have to be careful about GPU memory and a few other details. I have sometimes seen a 10x improvement over general SPARK 3.x performance, and sometimes around 30x. Not all queries will perform well on GPUs, and it is up to you to te
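Gourav's advice to watch GPU memory and to test each query can be expressed as configuration. A minimal sketch, assuming the RAPIDS Accelerator jar is already on the driver and executor classpath: it builds the settings as a plain dict so they can be fed to `SparkSession.builder.config` or `spark-submit --conf`. The keys are RAPIDS Accelerator and Spark 3 resource-scheduling options; the helper name `rapids_conf` and its default values are illustrative, and plugin defaults differ between releases, so check them against the version you deploy.

```python
def rapids_conf(concurrent_gpu_tasks=1, task_gpu_fraction=0.25,
                alloc_fraction=None, explain="NOT_ON_GPU"):
    """Return Spark settings for the RAPIDS Accelerator (illustrative sketch)."""
    # This sketch only handles GPU sharing, i.e. a fraction of one GPU per task.
    if not 0 < task_gpu_fraction <= 1:
        raise ValueError("task_gpu_fraction must be in (0, 1]")
    conf = {
        # Load the RAPIDS SQL plugin and enable GPU execution of SQL/DataFrame plans.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # One GPU per executor; 0.25 lets up to 4 tasks share it (cores permitting).
        "spark.executor.resource.gpu.amount": "1",
        "spark.task.resource.gpu.amount": str(task_gpu_fraction),
        # How many of those tasks may use the GPU at once; lower means less GPU memory pressure.
        "spark.rapids.sql.concurrentGpuTasks": str(concurrent_gpu_tasks),
        # Log the parts of each plan that fall back to the CPU.
        "spark.rapids.sql.explain": explain,
    }
    if alloc_fraction is not None:
        # Fraction of GPU memory the plugin's pool allocates up front.
        conf["spark.rapids.memory.gpu.allocFraction"] = str(alloc_fraction)
    return conf
```

For example, start from `builder = SparkSession.builder`, call `builder = builder.config(k, v)` for each item, then `builder.getOrCreate()`. With `spark.rapids.sql.explain=NOT_ON_GPU`, the plugin logs which operators stay on the CPU, which is one way to do the per-query testing described in the message. Depending on the cluster manager, a GPU discovery script (`spark.executor.resource.gpu.discoveryScript`) may also be required.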