spark session jdbc performance

2017-10-24 Thread Naveen Madhire
Hi, I am trying to fetch data from Oracle DB using a subquery and experiencing a lot of performance issues. Below is the query I am using (using Spark 2.0.2): val df = spark_session.read.format("jdbc") .option("driver", "oracle.jdbc.OracleDriver") .option("url", jdbc_url) .o
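A common cause here is that, without partitioning options, Spark pulls the whole result set through a single JDBC connection. A minimal sketch of a partitioned read (the table, partition column, and bounds below are hypothetical, and the subquery must be given an alias for Oracle):

    val df = spark_session.read.format("jdbc")
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("url", jdbc_url)
      .option("dbtable", "(SELECT * FROM employees WHERE dept_id = 10) t")
      .option("partitionColumn", "EMP_ID")   // hypothetical numeric column to split on
      .option("lowerBound", "1")             // range of EMP_ID values to divide up
      .option("upperBound", "1000000")
      .option("numPartitions", "8")          // 8 concurrent connections
      .option("fetchsize", "10000")          // larger fetch size cuts round trips
      .load()

With numPartitions set, each executor reads a slice of the EMP_ID range in parallel instead of streaming everything through one connection.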

Re: Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire
as it has built-in HDFS log rolling capabilities > On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire wrote: >> Hi, I am using spark streaming with 1 minute duration to read data from kafka topic, apply transformations and persist into HDF

Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire
Hi, I am using spark streaming with 1 minute duration to read data from kafka topic, apply transformations and persist into HDFS. The application is creating a new directory every 1 minute with many partition files (= number of partitions). What parameter do I need to change/configure to persist
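One common way to cut down the file count is to reduce each batch's partitions before writing. A minimal sketch (the stream name and output path are illustrative):

    stream.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        // fewer partitions per batch => fewer part files per output directory
        rdd.coalesce(1).saveAsTextFile(s"hdfs:///data/out-${time.milliseconds}")
      }
    }

coalesce(1) trades away write parallelism for a single file per batch; a small number like 2-4 is often a better balance. Merging the per-minute directories afterwards would be a separate compaction step.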

Repartition question

2015-08-03 Thread Naveen Madhire
Hi All, I am running the WikiPedia parsing example present in the "Advanced Analytics with Spark" book. https://github.com/sryza/aas/blob/d3f62ef3ed43a59140f4ae8afbe2ef81fc643ef2/ch06-lsa/src/main/scala/com/cloudera/datascience/lsa/ParseWikipedia.scala#l112 The partitions of the RDD returned by
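For reference, since the question is cut off above, a minimal sketch of inspecting and changing an RDD's partition count (rdd here is illustrative):

    val before = rdd.partitions.size        // current number of partitions
    val widened = rdd.repartition(16)       // full shuffle into 16 partitions
    val narrowed = widened.coalesce(4)      // shrink without a full shuffle

repartition always shuffles, so it can rebalance skewed data; coalesce only merges existing partitions and is cheaper when you are just reducing the count.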

pyspark issue

2015-07-27 Thread Naveen Madhire
Hi, I am running pyspark on Windows and I am seeing an error while adding pyFiles to the SparkContext. Below is the example: sc = SparkContext("local","Sample",pyFiles="C:/sample/yattag.zip") This fails with a "no file found" error for "C". The below logic is treating the path as individual files l
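The pyFiles argument expects a sequence of paths; a bare string gets iterated character by character, which is why Spark ends up looking for a file named "C". A minimal sketch of the fix:

    from pyspark import SparkContext

    # pass a list, not a string, so the path is treated as one file
    sc = SparkContext("local", "Sample", pyFiles=["C:/sample/yattag.zip"])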

Re: Spark - Eclipse IDE - Maven

2015-07-24 Thread Naveen Madhire
You can use IntelliJ for Scala. There are many articles online which you can refer to for setting up IntelliJ and the Scala plugin. Thanks On Friday, July 24, 2015, Siva Reddy wrote: > I want to program in Scala for Spark. > -- > View this message in context: > http://apache-spark-user-list.10

Re: PySpark Nested Json Parsing

2015-07-20 Thread Naveen Madhire
I had a similar issue with Spark 1.3. After migrating to Spark 1.4 and using sqlContext.read.json it worked well. I think you can look at the DataFrame select and explode options to read the nested JSON elements, arrays etc. Thanks. On Mon, Jul 20, 2015 at 11:07 AM, Davies Liu wrote: > Could you tr
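For concreteness, a minimal PySpark 1.4 sketch of what is described above (the input file and field names are hypothetical):

    from pyspark.sql import SQLContext
    from pyspark.sql.functions import explode

    sqlContext = SQLContext(sc)  # assumes an existing SparkContext sc
    df = sqlContext.read.json("events.json")  # hypothetical nested JSON input

    # explode turns each element of a nested array into its own row
    flat = df.select(explode(df.entities.user_mentions).alias("mention"))
    flat.select("mention.name").show()  # hypothetical struct field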

Re: How to extract complex JSON structures using Apache Spark 1.4.0 Data Frames

2015-07-18 Thread Naveen Madhire
I am facing the same issue. I tried this but got a compilation error for the "$" in the explode function, so I had to modify it as below to make it work: df.select(explode(new Column("entities.user_mentions")).as("mention")) On Wed, Jun 24, 2015 at 2:48 PM, Michael Armbrust wrote: > Star
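The compilation error for "$" usually means the implicit conversion that provides it is not in scope; importing the SQLContext implicits is the usual fix. A sketch (assuming a SQLContext named sqlContext and the df from this thread):

    import org.apache.spark.sql.functions.explode
    import sqlContext.implicits._  // brings the $"..." column syntax into scope

    val mentions = df.select(explode($"entities.user_mentions").as("mention"))

The new Column("entities.user_mentions") workaround above is equivalent; $ is just sugar for constructing the Column.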

Re: Spark and HDFS

2015-07-15 Thread Naveen Madhire
Yes. I did this recently. You need to copy the Cloudera cluster's conf files onto the local machine and set HADOOP_CONF_DIR or YARN_CONF_DIR. Also, the local machine should be able to ssh to the Cloudera cluster. On Wed, Jul 15, 2015 at 8:51 AM, ayan guha wrote: > Assuming you run spark lo
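A sketch of the client-side setup described above (the path is illustrative):

    # directory holding core-site.xml, hdfs-site.xml (and yarn-site.xml for YARN)
    # copied from the cluster
    export HADOOP_CONF_DIR=/etc/cluster-conf
    export YARN_CONF_DIR=$HADOOP_CONF_DIR

With these set, a locally launched spark-shell or spark-submit resolves the remote NameNode and ResourceManager from those files.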

Re: Unit tests of spark application

2015-07-13 Thread Naveen Madhire
also use spark-testing-base from spark-packages.org as a basis for your unit tests. > On Fri, Jul 10, 2015 at 12:03 PM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote: >> On Fri, Jul 10, 2015 at 1:41 PM, Naveen Madhire

Unit tests of spark application

2015-07-10 Thread Naveen Madhire
Hi, I want to write JUnit test cases in Scala for testing a Spark application. Is there any guide or link which I can refer to? Thank you very much. -Naveen
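A minimal sketch of the pattern the replies above describe, using ScalaTest rather than raw JUnit since that is the more common choice for Spark (spark-testing-base, mentioned above, wraps this boilerplate for you; names here are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.scalatest.{BeforeAndAfterAll, FunSuite}

    class WordCountSuite extends FunSuite with BeforeAndAfterAll {
      @transient private var sc: SparkContext = _

      override def beforeAll(): Unit = {
        sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))
      }

      override def afterAll(): Unit = {
        if (sc != null) sc.stop()  // always stop the context between suites
      }

      test("countByValue counts words") {
        val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
        assert(counts("a") == 2)
      }
    }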

DataFrame question

2015-07-07 Thread Naveen Madhire
Hi All, I am working with dataframes and have been struggling with this thing, any pointers would be helpful. I've a JSON file with a schema like this:

 |-- links: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- desc: string (nullable = true)
 |    |    |-- id
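Given that schema, one way to get at the nested fields is to explode the array into one row per element. A sketch (field names follow the schema above; df is the loaded JSON DataFrame):

    import org.apache.spark.sql.functions.explode

    val links = df.select(explode(df("links")).as("link"))
    links.select("link.desc", "link.id").show()  // one row per links element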

Re: Has anyone run Python Spark application on Yarn-cluster mode ? (which has 3rd party Python modules to be shipped with)

2015-06-25 Thread Naveen Madhire
Hi Marcelo, quick question. I am using Spark 1.3 with Yarn client mode. It is working well, provided I manually pip-install all the 3rd party libraries like numpy etc. on the executor nodes. So does the SPARK-5479 fix in 1.5 which you mentioned address this as well? Thanks. On Thu, Jun 25,
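For pure-Python dependencies there is also the option of shipping a zip with the job rather than installing on every node. A sketch (deps.zip and my_job.py are hypothetical; this does not help for C-extension libraries like numpy, which still need to be installed on the executors):

    # zip of the 3rd-party pure-Python modules, distributed to executors
    spark-submit --master yarn-client --py-files deps.zip my_job.py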

Re: How to set HBaseConfiguration in Spark

2015-05-20 Thread Naveen Madhire
Cloudera blog has some details. Please check if this is helpful to you. http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ Thanks. On Wed, May 20, 2015 at 4:21 AM, donhoff_h <165612...@qq.com> wrote: > Hi, all > > I wrote a program to get HBaseConfiguration object in Spar
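For reference, a minimal sketch of building the configuration on the driver (the quorum values are illustrative; normally they come from an hbase-site.xml on the classpath):

    import org.apache.hadoop.hbase.HBaseConfiguration

    val hbaseConf = HBaseConfiguration.create()  // loads hbase-site.xml if present
    hbaseConf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3")
    hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")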

Re: Fwd: Sample Spark Program Error

2014-12-31 Thread Naveen Madhire
Lines with a: 24, Lines with b: 15 > The exception seems to be happening with Spark cleanup after executing > your code. Try adding sc.stop() at the end of your program to see if the > exception goes away. > On Wednesday, December 31, 2014 6:40 AM, Naveen Madh

Fwd: Sample Spark Program Error

2014-12-31 Thread Naveen Madhire
Hi All, I am trying to run a sample Spark program using Scala SBT. Below is the program: def main(args: Array[String]) { val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md" // Should be some file on your system val sc = new SparkContext("local", "Simple App", "E:/ApacheSpark/
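For reference, a sketch of the complete sample with the sc.stop() suggested in the reply above (the sparkHome constructor argument is omitted here since its value is cut off; the README path is the poster's own):

    import org.apache.spark.SparkContext

    object SimpleApp {
      def main(args: Array[String]) {
        val logFile = "E:/ApacheSpark/usb/usb/spark/bin/README.md"
        val sc = new SparkContext("local", "Simple App")
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(_.contains("a")).count()
        val numBs = logData.filter(_.contains("b")).count()
        println(s"Lines with a: $numAs, Lines with b: $numBs")
        sc.stop()  // clean shutdown avoids the exception seen at exit
      }
    }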