Re: spark session jdbc performance

2017-10-25 Thread Gourav Sengupta
Hi Naveen, Can you please copy and paste the lines in your original email again, and perhaps then Lucas can go through it completely & kindly stop thinking that others are responding by assuming things? On the other hand, please let me know how things are going, there is another post on thi

Re: spark session jdbc performance

2017-10-25 Thread lucas.g...@gmail.com
Are we seeing that the UI shows only one partition running the query? The original poster hasn't replied yet. My assumption is that there's only one executor configured / deployed. But we only know what the OP stated, which wasn't enough to be sure of anything. Why are you suggesting that partiti

Re: spark session jdbc performance

2017-10-25 Thread Gourav Sengupta
Hi Lucas, so if I am assuming things, can you please explain why the UI is showing only one partition to run the query? Regards, Gourav Sengupta On Wed, Oct 25, 2017 at 6:03 PM, lucas.g...@gmail.com wrote: > Gourav, I'm assuming you misread the code. It's 30 partitions, which > isn't a ridic

Re: spark session jdbc performance

2017-10-25 Thread lucas.g...@gmail.com
Gourav, I'm assuming you misread the code. It's 30 partitions, which isn't a ridiculous value. Maybe you misread the upperBound for the partitions? (That would be ridiculous) Why not use the PK as the partition column? Obviously it depends on the downstream queries. If you're going to be perfo
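
For readers following the partition discussion, here is a minimal sketch of how a range-partitioned JDBC read can turn a partition column, a lower bound, an upper bound, and a partition count into one WHERE predicate per partition. It is a simplified approximation written for illustration, not Spark's actual code; the object name, the function name, and the bounds in the usage note are assumptions. Only the figure of 30 partitions comes from the thread.

```scala
// Simplified sketch of range-partitioned JDBC predicates.
// This approximates the idea only; it is NOT Spark's implementation.
object JdbcPartitionSketch {

  /** Returns one WHERE-clause predicate per partition. */
  def predicates(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
    require(numPartitions >= 1, "numPartitions must be at least 1")
    require(lower < upper, "lower must be less than upper")

    // Never create more partitions than there are whole steps in the range.
    val n = math.min(numPartitions.toLong, upper - lower).toInt

    if (n == 1) {
      // A single partition reads everything.
      Seq("1=1")
    } else {
      val stride = (upper - lower) / n
      (0 until n).map { i =>
        val start = lower + i * stride
        val end = start + stride
        if (i == 0) s"$column < $end OR $column IS NULL" // first: open below, takes NULLs
        else if (i == n - 1) s"$column >= $start"        // last: open above
        else s"$column >= $start AND $column < $end"     // middle: half-open range
      }
    }
  }
}
```

Calling `JdbcPartitionSketch.predicates("id", 0L, 3000000L, 30)` with these hypothetical bounds yields 30 predicates. In this sketch the bounds only decide where the ranges are cut; the first and last predicates are open-ended, so rows outside the bounds still land in a partition, and a skewed column or poorly chosen bounds can leave most rows in one of them.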

Re: spark session jdbc performance

2017-10-24 Thread Gourav Sengupta
Hi Naveen, I do not think that it is prudent to use the PK as the partitionColumn. That is too many partitions for any system to handle. numPartitions behaves very differently in the case of JDBC. Please keep me updated on how things go. Regards, Gourav Sengupta On Tue, Oct 24, 2017 at 1

Re: spark session jdbc performance

2017-10-24 Thread Srinivasa Reddy Tatiredidgari
Hi, is the subquery a user-defined SQL statement or a table name in the DB? If it is user-defined SQL, make sure your partition column is in the main SELECT clause. On Wed, Oct 25, 2017 at 3:25, Naveen Madhire wrote: Hi, I am trying to fetch data from Oracle DB using a subq
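
As a rough illustration of the advice above, the sketch below builds a set of JDBC read options in which the partition column also appears in the outer SELECT list of the subquery passed as `dbtable`. The option keys are the ones Spark's JDBC data source documents; the table name, column names, and bounds are hypothetical, since the original query is not shown in full here.

```scala
// Sketch: JDBC options where the partition column is selected by the subquery.
// Table, column names, and bounds are hypothetical, for illustration only.
object JdbcSubqueryOptions {
  val partitionColumn = "ORDER_ID"

  // The partition column must be visible in the subquery's SELECT list,
  // because the per-partition WHERE clauses are applied to its result.
  val subquery: String =
    s"(SELECT $partitionColumn, CUSTOMER_ID, AMOUNT FROM ORDERS WHERE AMOUNT > 0) t"

  val options: Map[String, String] = Map(
    "driver"          -> "oracle.jdbc.OracleDriver",
    "dbtable"         -> subquery,
    "partitionColumn" -> partitionColumn,
    "lowerBound"      -> "1",        // assumed value
    "upperBound"      -> "3000000",  // assumed value
    "numPartitions"   -> "30"
  )

  // With a SparkSession in scope, this could be passed along the lines of:
  //   spark_session.read.format("jdbc").option("url", jdbc_url).options(options).load()
}
```

If the subquery left out `ORDER_ID`, the generated per-partition predicates would reference a column the database cannot resolve.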

Re: spark session jdbc performance

2017-10-24 Thread lucas.g...@gmail.com
Sorry, I meant to say: "That code looks SANE to me" Assuming that you're seeing the query running partitioned as expected then you're likely configured with one executor. Very easy to check in the UI. Gary Lucas On 24 October 2017 at 16:09, lucas.g...@gmail.com wrote: > Did you check the quer

Re: spark session jdbc performance

2017-10-24 Thread lucas.g...@gmail.com
Did you check the query plan / check the UI? That code looks same to me. Maybe you've only configured for one executor? Gary On Oct 24, 2017 2:55 PM, "Naveen Madhire" wrote: > > Hi, > > > > I am trying to fetch data from Oracle DB using a subquery and experiencing > lot of performance issues.

spark session jdbc performance

2017-10-24 Thread Naveen Madhire
Hi, I am trying to fetch data from Oracle DB using a subquery and experiencing a lot of performance issues. Below is the query I am using, with Spark 2.0.2: val df = spark_session.read.format("jdbc") .option("driver","oracle.jdbc.OracleDriver") .option("url", jdbc_url) .o

spark session jdbc performance

2017-10-24 Thread Madhire, Naveen
Hi, I am trying to fetch data from Oracle DB using a subquery and experiencing a lot of performance issues. Below is the query I am using, with Spark 2.0.2: val df = spark_session.read.format("jdbc") .option("driver","oracle.jdbc.OracleDriver") .option("url", jdbc_url) .option("user", user)
