spark session jdbc performance

2017-10-24 Thread Naveen Madhire
Hi, I am trying to fetch data from Oracle DB using a subquery and experiencing lot of performance issues. Below is the query I am using, Using Spark 2.0.2 val df = spark_session.read.format("jdbc") .option("driver","oracle.jdbc.OracleDriver") .option("url", jdbc_url) .o
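
The usual cause of slow JDBC reads like this is that Spark pulls the whole result set through a single connection. A hedged sketch of the two common mitigations, pushing the subquery down as the `dbtable` option and parallelizing the read with `partitionColumn`, is below. PySpark syntax is shown for brevity (the option names are identical in Scala); it assumes a running SparkSession `spark` and a reachable `jdbc_url`, and the table, column, and bound values are illustrative assumptions, not from the thread.

```python
# Sketch only: requires a SparkSession (`spark`), an Oracle JDBC driver on the
# classpath, and a numeric, ideally indexed, column to partition on.
pushdown_query = "(SELECT emp_id, salary FROM employees WHERE dept = 'ENG') subq"

df = (spark.read.format("jdbc")
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("url", jdbc_url)
      .option("dbtable", pushdown_query)    # subquery executes in Oracle, not Spark
      .option("partitionColumn", "emp_id")  # split the read across executors
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")         # 8 parallel JDBC connections
      .option("fetchsize", "10000")         # Oracle's default fetch size is small
      .load())
```

Without `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions`, the read runs on one task regardless of cluster size.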

spark session jdbc performance

2017-10-24 Thread Madhire, Naveen
Hi, I am trying to fetch data from Oracle DB using a subquery and experiencing lot of performance issues. Below is the query I am using, Using Spark 2.0.2 val df = spark_session.read.format("jdbc") .option("driver","oracle.jdbc.OracleDriver") .option("url", jdbc_url) .option("user", user)

PySpark pickling behavior

2017-10-11 Thread Naveen Swamy
) rdd = rdd.map(lambda x: Model.predict(x, args) // fails here with: pickle.PicklingError: Could not serialize object: TypeError: can't pickle thread.lock objects Thanks, Naveen
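
The error itself is reproducible without Spark: any object holding a `thread.lock` (a loaded native model, a DB connection, a logger handle) cannot be pickled, and `rdd.map(lambda x: Model.predict(x, args))` forces Spark to pickle whatever the lambda closes over. A minimal stdlib illustration of the failure and the usual lazy-initialization workaround; the `Model` class here is a made-up stand-in:

```python
import pickle
import threading

class Model:
    """Stand-in for a model object that holds an unpicklable resource."""
    def __init__(self):
        self.lock = threading.Lock()   # locks cannot be pickled
    def predict(self, x):
        return x * 2

try:
    pickle.dumps(Model())
    pickling_failed = False
except TypeError:                      # "cannot pickle '_thread.lock' object"
    pickling_failed = True

# Workaround commonly used in PySpark jobs: build the object lazily inside the
# worker, so only the (picklable) function travels over the wire.
_model = None
def predict(x):
    global _model
    if _model is None:
        _model = Model()               # constructed once per worker process
    return _model.predict(x)

results = [predict(x) for x in range(3)]
```

With this shape, the driver ships only `predict`; the lock never crosses process boundaries.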

Loading objects only once

2017-09-27 Thread Naveen Swamy
k gets mapped to a separate python process? The reason I ask is I want to be to use mapPartition method to load a batch of files and run inference on them separately for which I need to load the object once per task. Any Thanks for your time in answering my question. Cheers, Naveen
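
The mapPartitions pattern asked about here can be simulated without Spark: the expensive load runs once per partition iterator, not once per record. Function and model names below are made up for illustration:

```python
load_count = 0

def load_model():
    """Stand-in for an expensive load (weights from disk, a native handle, ...)."""
    global load_count
    load_count += 1
    return lambda x: x + 1

def predict_partition(rows):
    model = load_model()        # runs once per partition/task
    for row in rows:            # `rows` is an iterator over the partition
        yield model(row)

# Simulate rdd.mapPartitions(predict_partition) over two partitions.
partitions = [[1, 2, 3], [4, 5]]
out = [list(predict_partition(p)) for p in partitions]
# load_model ran twice (once per partition), not five times (once per record).
```

In real PySpark code this is `rdd.mapPartitions(predict_partition)`; each Python worker task gets its own copy of the model.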

Re: Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire
as it has in built HDFS log > rolling capabilities > > On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire > wrote: > >> Hi, >> >> I am using spark streaming with 1 minute duration to read data from kafka >> topic, apply transformations and persist into HDF

Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire
and create a HDFS directory say *every 30 minutes* instead of duration of the spark streaming application? Any help would be appreciated. Thanks, Naveen

Re:

2017-01-23 Thread Naveen
Hi Keith, Can you try including a clean-up step at the end of the job, before the driver is out of SparkContext, to clean the necessary files through some regex patterns or so, on all nodes in your cluster by default. If files are not available on a few nodes, that should not be a problem, isn't it? On Sun,

Re: 答复: submit spark task on yarn asynchronously via java?

2016-12-24 Thread Naveen
Hi, Please use the SparkLauncher API class and invoke the threads using async calls using Futures. Using SparkLauncher, you can mention the class name, application resource, arguments to be passed to the driver, deploy-mode etc. I would suggest using Scala's Future, if Scala code is possible. https://spa
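
The pattern recommended here, firing each job on its own thread and collecting futures, looks roughly like the sketch below. It uses Python's concurrent.futures with a placeholder function standing in for the real SparkLauncher (or spark-submit) invocation, which is an assumption for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def submit_spark_job(app_name):
    """Placeholder: in a real driver this would invoke SparkLauncher
    (or shell out to spark-submit) with the class name, app resource,
    driver arguments and deploy-mode, then wait for the final state."""
    return (app_name, "FINISHED")

jobs = ["etl-job", "report-job", "cleanup-job"]   # hypothetical job names
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    futures = [pool.submit(submit_spark_job, j) for j in jobs]
    states = dict(f.result() for f in as_completed(futures))
```

The key point is that each submission blocks only its own thread, so the caller can launch many YARN applications concurrently and react as each completes.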

Re: Launching multiple spark jobs within a main spark job.

2016-12-24 Thread Naveen
Thanks Liang, Vadim and everyone for your inputs!! With this clarity, I've tried client modes for both main and sub-spark jobs. Every main spark job and its corresponding threaded spark jobs are coming up on the YARN applications list and the jobs are getting executed properly. I need to now test

Re: Launching multiple spark jobs within a main spark job.

2016-12-21 Thread Naveen
r these spawned sparkcontexts will get different nodes / executors from resource manager? On Wed, Dec 21, 2016 at 6:43 PM, Naveen wrote: > Hi Sebastian, > > Yes, for fetching the details from Hive and HBase, I would want to use > Spark's HiveContext etc. > However, based on your point,

Re: Launching multiple spark jobs within a main spark job.

2016-12-21 Thread Naveen
launching the > jobs? > You can use SparkLauncher in a normal app and just listen for state > transitions > > On Wed, 21 Dec 2016, 11:44 Naveen, wrote: > >> Hi Team, >> >> Thanks for your responses. >> Let me give more details in a picture of how I am trying

Re: Launching multiple spark jobs within a main spark job.

2016-12-21 Thread Naveen
> Anyway, If you run a spark application you would have multiple jobs, which > makes sense that it is not a problem. > > > > Thanks David. > > > > From: Naveen [mailto:hadoopst...@gmail.com] > Sent: Wednesday, December 21, 2016 9:18 AM > To: d...@spark.apache.o

Launching multiple spark jobs within a main spark job.

2016-12-20 Thread Naveen
Hi Team, Is it ok to spawn multiple spark jobs within a main spark job? My main spark job's driver which was launched on a yarn cluster will do some preprocessing and based on it, it needs to launch multiple spark jobs on the yarn cluster. Not sure if this is the right pattern. Please share your thoughts. S

java.lang.ClassCastException: optional binary element (UTF8) is not a group

2016-09-20 Thread Rajan, Naveen
org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) ... 3 more Regards, Naveen

Spark DataFrame sum of multiple columns

2016-04-21 Thread Naveen Kumar Pokala
help me on this? Thanks, Naveen

reading EOF exception while reading parquet ile from hadoop

2016-04-20 Thread Naveen Kumar Pokala
read.java:745) Thanks, Naveen Kumar Pokala

Standard deviation on multiple columns

2016-04-18 Thread Naveen Kumar Pokala
pache.spark.sql.DataFrame (exprs: scala.collection.immutable.Map[String,String])org.apache.spark.sql.DataFrame (aggExpr: (String, String),aggExprs: (String, String)*)org.apache.spark.sql.DataFrame cannot be applied to (org.apache.spark.sql.Column) Naveen

Determinant of Matrix

2015-08-24 Thread Naveen
Hi, Is there any function to find the determinant of a mllib.linalg.Matrix (a covariance matrix) using Spark? Regards, Naveen - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user
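
MLlib's `mllib.linalg.Matrix` has no determinant method; as the neighboring thread notes, the usual route is converting to a Breeze matrix and calling `breeze.linalg.det`. For illustration of what that computes, here is a plain-Python determinant via Gaussian elimination with partial pivoting; it is a sketch for small dense matrices, not a replacement for a library routine:

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting.
    `m` is a square list-of-lists; O(n^3), adequate for small matrices."""
    n = len(m)
    a = [row[:] for row in m]          # work on a copy
    sign = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))  # pivot row
        if abs(a[p][i]) < 1e-12:
            return 0.0                  # singular to working precision
        if p != i:
            a[i], a[p] = a[p], a[i]     # a row swap flips the sign
            sign = -sign
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    result = sign
    for i in range(n):
        result *= a[i][i]               # product of the pivots
    return result

d1 = det([[1.0, 2.0], [3.0, 4.0]])       # expect -2
d2 = det([[2.0, 0, 0], [0, 3.0, 0], [0, 0, 4.0]])  # expect 24
```

For a covariance matrix specifically, the log-determinant via a Cholesky factorization is the numerically preferred route in production code.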

Re: Convert mllib.linalg.Matrix to Breeze

2015-08-20 Thread Naveen
Yanbo Liang <yblia...@gmail.com> wrote: You can use Matrix.toBreeze() <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala#L56> . 2015-08-20 18:24 GMT+08:00 Naveen <nav...@formcept.com>:

Convert mllib.linalg.Matrix to Breeze

2015-08-20 Thread Naveen
Hi All, Is there any way to convert a mllib matrix to a Dense Matrix of Breeze? Any leads are appreciated. Thanks, Naveen

Repartition question

2015-08-03 Thread Naveen Madhire
Hi All, I am running the WikiPedia parsing example present in the "Advance Analytics with Spark" book. https://github.com/sryza/aas/blob/d3f62ef3ed43a59140f4ae8afbe2ef81fc643ef2/ch06-lsa/src/main/scala/com/cloudera/datascience/lsa/ParseWikipedia.scala#l112 The partitions of the RDD returned by

pyspark issue

2015-07-27 Thread Naveen Madhire
Hi, I am running pyspark in windows and I am seeing an error while adding pyfiles to the sparkcontext. below is the example, sc = SparkContext("local","Sample",pyFiles="C:/sample/yattag.zip") this fails with no file found error for "C" The below logic is treating the path as individual files l
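
The "no file found error for 'C'" gives the diagnosis away: `pyFiles` is iterated, and iterating a string yields single characters, so Spark tries to resolve a file named "C". A stdlib-only illustration of the difference (the zip path is the poster's own example):

```python
path = "C:/sample/yattag.zip"

# Passing the bare string: iteration yields characters, so the first
# "file" Spark would try to resolve is just "C".
as_string = list(path)[:3]

# Passing a one-element list (or tuple) keeps the path intact.
as_list = list([path])
```

The fix is therefore `SparkContext("local", "Sample", pyFiles=["C:/sample/yattag.zip"])`, i.e. wrap the path in a list.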

Re: Spark - Eclipse IDE - Maven

2015-07-24 Thread Naveen Madhire
You can use IntelliJ for Scala. There are many articles online which you can refer to for setting up IntelliJ and the Scala plugin. Thanks On Friday, July 24, 2015, Siva Reddy wrote: > I want to program in scala for spark. > > > > -- > View this message in context: > http://apache-spark-user-list.10

LinearRegressionWithSGD Outputs NaN

2015-07-21 Thread Naveen
val regParam = 0.01 val regType = "L2" val algorithm = new LinearRegressionWithSGD() algorithm.optimizer.setNumIterations(numIterations).setStepSize(stepSize).setRegParam(regParam) val model = algorithm.run(parsedTrainData)

Re: PySpark Nested Json Parsing

2015-07-20 Thread Naveen Madhire
I had a similar issue with Spark 1.3. After migrating to Spark 1.4 and using sqlcontext.read.json it worked well. I think you can look at the dataframe select and explode options to read the nested json elements, arrays etc. Thanks. On Mon, Jul 20, 2015 at 11:07 AM, Davies Liu wrote: > Could you tr

Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-07-18 Thread Naveen Madhire
I am facing the same issue. I tried this but got a compilation error for the "$" in the explode function, so I had to modify it to the below to make it work. df.select(explode(new Column("entities.user_mentions")).as("mention")) On Wed, Jun 24, 2015 at 2:48 PM, Michael Armbrust wrote: > Star

Re: Spark and HDFS

2015-07-15 Thread Naveen Madhire
Yes. I did this recently. You need to copy the cloudera cluster related conf files into the local machine and set HADOOP_CONF_DIR or YARN_CONF_DIR. And also local machine should be able to ssh to the cloudera cluster. On Wed, Jul 15, 2015 at 8:51 AM, ayan guha wrote: > Assuming you run spark lo

Job aborted due to stage failure: Task not serializable:

2015-07-15 Thread Naveen Dabas
I am using the below code with the kryo serializer. When I run this code I got this error: Task not serializable in the commented line. 2) How are broadcast variables treated in the executor? Are they local variables or can they be used in any function defined as global variables? object StreamingLogIn

Spark executor memory information

2015-07-13 Thread Naveen Dabas
Hi, I am new to spark and need some guidance on the below mentioned points: 1) I am using spark 1.2; is it possible to see how much memory is being allocated to an executor from the web UI? If not, how can we figure that out? 2) I am interested in the source code of mllib; is it possible to get access to

Re: Unit tests of spark application

2015-07-13 Thread Naveen Madhire
also use spark-testing-base from >> spark-packages.org as a basis for your unittests. >> >> On Fri, Jul 10, 2015 at 12:03 PM, Daniel Siegmann < >> daniel.siegm...@teamaol.com> wrote: >> >>> On Fri, Jul 10, 2015 at 1:41 PM, Naveen Madhire &g

Unit tests of spark application

2015-07-10 Thread Naveen Madhire
Hi, I want to write junit test cases in scala for testing spark application. Is there any guide or link which I can refer. Thank you very much. -Naveen

DataFrame question

2015-07-07 Thread Naveen Madhire
Hi All, I am working with dataframes and have been struggling with this thing, any pointers would be helpful. I've a Json file with the schema like this,
links: array (nullable = true)
 |-- element: struct (containsNull = true)
 |    |-- desc: string (nullable = true)
 |    |-- id

Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Naveen Madhire
Hi Marcelo, Quick question. I am using Spark 1.3 with Yarn Client mode. It is working well, provided I manually pip-install all the 3rd party libraries like numpy etc. to the executor nodes. So does the SPARK-5479 fix in 1.5 which you mentioned address this as well? Thanks. On Thu, Jun 25,

Re: How to set HBaseConfiguration in Spark

2015-05-20 Thread Naveen Madhire
Cloudera blog has some details. Please check if this is helpful to you. http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ Thanks. On Wed, May 20, 2015 at 4:21 AM, donhoff_h <165612...@qq.com> wrote: > Hi, all > > I wrote a program to get HBaseConfiguration object in Spar

Failed to locate the winutils binary in the hadoop binary path

2015-01-29 Thread Naveen Kumar Pokala
Executor: Using REPL class URI: http://172.22.5.79:60436 15/01/29 17:21:28 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkdri...@ii01-hdhlg32.ciqhyd.com:60464/user/HeartbeatReceiver 15/01/29 17:21:28 INFO SparkILoop: Created spark context.. Spark context available as sc. -Naveen

Pyspark Interactive shell

2015-01-06 Thread Naveen Kumar Pokala
Hi, Anybody tried to connect to spark cluster( on UNIX machines) from windows interactive shell ? -Naveen.

Re: Fwd: Sample Spark Program Error

2014-12-31 Thread Naveen Madhire
es with a: 24, Lines with b: 15 > > The exception seems to be happening with Spark cleanup after executing > your code. Try adding sc.stop() at the end of your program to see if the > exception goes away. > > > > > On Wednesday, December 31, 2014 6:40 AM, Naveen Madh

Fwd: Sample Spark Program Error

2014-12-31 Thread Naveen Madhire
Hi All, I am trying to run a sample Spark program using Scala SBT, Below is the program, def main(args: Array[String]) { val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be some file on your system val sc = new SparkContext("local", "Simple App", "E:/ApacheSpark/

Re: pyspark.daemon not found

2014-12-31 Thread Naveen Kumar Pokala
kjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Please help me to resolve this issue. -Naveen From: Naveen Kumar Pokala [mailto:npok...@spcapitaliq.com] Sent: Wednesday, December 31, 2014 2:28 PM To: user@spark.apache.org Subject: pyspark.daemon not found Error from python worker:

pyspark.daemon not found

2014-12-31 Thread Naveen Kumar Pokala
Error from python worker: python: module pyspark.daemon not found PYTHONPATH was: /home/npokala/data/spark-install/spark-master/python: Please can somebody help me on this, how to resolve the issue. -Naveen

python: module pyspark.daemon not found

2014-12-29 Thread Naveen Kumar Pokala
n.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Please can anyone suggest me how to resolve the issue. -Naveen

RE: Spark Job submit

2014-11-26 Thread Naveen Kumar Pokala
how to execute spark submit from a windows machine. Please provide me sample code if you have any. -Naveen From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Wednesday, November 26, 2014 10:03 PM To: Naveen Kumar Pokala Cc: user@spark.apache.org Subject: Re: Spark Job submit How about

Spark Job submit

2014-11-26 Thread Naveen Kumar Pokala
Hi. Is there a way to submit spark job on Hadoop-YARN cluster from java code. -Naveen

RE: Control number of parquet generated from JavaSchemaRDD

2014-11-25 Thread Naveen Kumar Pokala
Hi, While submitting your spark job mention --executor-cores 2 --num-executors 24 it will divide the dataset into 24*2 parquet files. Or set spark.default.parallelism value like 50 on sparkconf object. It will divide the dataset into 50 files into your HDFS. -Naveen -Original Message

Re: Submit Spark driver on Yarn Cluster in client mode

2014-11-24 Thread Naveen Kumar Pokala
Hi Akhil, But the driver and yarn are on different networks. How to specify the (export HADOOP_CONF_DIR=XXX) path? The driver is on my windows machine and yarn is on some unix machine on a different network. -Naveen. From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Monday, November 24

Submit Spark driver on Yarn Cluster in client mode

2014-11-24 Thread Naveen Kumar Pokala
Hi, I want to submit my spark program from my machine on a YARN cluster in yarn client mode. How to specify all the required details through the SPARK submitter? Please provide me some details. -Naveen.

Execute Spark programs from local machine on Yarn-hadoop cluster

2014-11-21 Thread Naveen Kumar Pokala
y to give IP address, port all the details to connect a master(YARN) on some other network from my local spark Program. -Naveen

RE: Null pointer exception with larger datasets

2014-11-18 Thread Naveen Kumar Pokala
Thanks Akhil. -Naveen. From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Tuesday, November 18, 2014 1:19 PM To: Naveen Kumar Pokala Cc: user@spark.apache.org Subject: Re: Null pointer exception with larger datasets Make sure your list is not null, if that is null then its more like

Null pointer exception with larger datasets

2014-11-17 Thread Naveen Kumar Pokala
r.java:617) java.lang.Thread.run(Thread.java:745) How to handle this? -Naveen

HDFS read text file

2014-11-17 Thread Naveen Kumar Pokala
(inline image omitted) How to read that file, I mean each line as an object of Student. -Naveen

Spark GCLIB error

2014-11-13 Thread Naveen Kumar Pokala
at java.lang.System.load(System.java:1083) at org.xerial.snappy.SnappyNativeLoader.load(SnappyNativeLoader.java:39) ... 29 more -Naveen.

Snappy error with Spark SQL

2014-11-12 Thread Naveen Kumar Pokala
) java.lang.Thread.run(Thread.java:745) Please help me. Regards, Naveen.

RE: Spark SQL configurations

2014-11-12 Thread Naveen Kumar Pokala
Thanks Akhil. -Naveen From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Wednesday, November 12, 2014 6:38 PM To: Naveen Kumar Pokala Cc: user@spark.apache.org Subject: Re: Spark SQL configurations JavaSQLContext.sqlContext.setConf is available. Thanks Best Regards On Wed, Nov 12, 2014

Spark SQL configurations

2014-11-12 Thread Naveen Kumar Pokala
(screenshot of Spark SQL configuration properties omitted) Hi, How to set the above properties on JavaSQLContext? I am not able to see a setConf method on the JavaSQLContext object. I have added the spark core jar and spark assembly jar to my build path. And I am using spark 1.1.0 and hadoop 2.4.0 --Naveen

RE: scala.MatchError

2014-11-12 Thread Naveen Kumar Pokala
) case class Instrument(issue: Issue = null) -Naveen From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Wednesday, November 12, 2014 12:09 AM To: Xiangrui Meng Cc: Naveen Kumar Pokala; user@spark.apache.org Subject: Re: scala.MatchError Xiangrui is correct that is must be a java bean

scala.MatchError

2014-11-11 Thread Naveen Kumar Pokala
text.applySchema(JavaSQLContext.scala:90) at sample.spark.test.SparkJob.main(SparkJob.java:33) ... 5 more Please help me. Regards, Naveen.

save as file

2014-11-11 Thread Naveen Kumar Pokala
Hi, I am on spark 1.1.0. I need help regarding saving an rdd to a JSON file. How to do that? And how to mention the hdfs path in the program? -Naveen

RE: Parallelize on spark context

2014-11-06 Thread Naveen Kumar Pokala
. How to check how many cores are running to complete the task of 8 datasets? (Is there any command or UI to check that?) Regards, Naveen. From: holden.ka...@gmail.com [mailto:holden.ka...@gmail.com] On Behalf Of Holden Karau Sent: Friday, November 07, 2014 12:46 PM To: Naveen Kumar Pokala Cc: user

Parallelize on spark context

2014-11-06 Thread Naveen Kumar Pokala
batches of 500 size. Regards, Naveen.

Nesting RDD

2014-11-06 Thread Naveen Kumar Pokala
Hi, I am trying to execute a sample program by nesting the RDD inside the transformations. It is throwing null pointer exception. Any solution or alternative would be helpful. Thanks & regards, Naveen.

Number cores split up

2014-11-05 Thread Naveen Kumar Pokala
obs? 4) Do we have any chance to control the batch division on nodes? Please give some clarity on above. Thanks & Regards, Naveen

Spark Debugging

2014-10-29 Thread Naveen Kumar Pokala
on my local machine. I am not able to find a way to debug. Please let me know the ways to debug my driver program as well as executor programs Naveen kumar pokala +91 8801169530

Spark Debugging

2014-10-29 Thread Naveen Kumar Pokala
on my local machine. I am not able to find a way to debug. Please let me know the ways to debug my driver program as well as executor programs Regards, Naveen.