Re: [spark-csv] how to build with Hadoop 2.6.0?

2015-08-19 Thread Gil Vernik
It shouldn't? This one, com.databricks.spark.csv.util.TextFile, has Hadoop imports. I figured out that the answer to my question is just to add libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0". But I still wonder where this 2.2.0 default comes from. From: Mohit Jaggi
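As a minimal sketch, the override Gil describes would sit in the project's build.sbt alongside the existing dependencies (the 2.6.0 version is the one stated in the thread):

    // build.sbt -- pin hadoop-client to 2.6.0 instead of the transitive 2.2.0 default
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"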

Re: Creating RDD with key and Subkey

2015-08-19 Thread Ranjana Rajendran
Hi Ratika, I tried the following: val l = List("apple", "orange", "banana") var inner = new scala.collection.mutable.HashMap[String, List[String]] inner.put("fruits",l) var list = new scala.collection.mutable.HashMap[String, scala.collection.mutable.HashMap[String, List[String]]] list.put("fo
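A runnable completion of the snippet above; the original message is cut off at list.put("fo, so the outer key used here is a hypothetical placeholder:

    import scala.collection.mutable

    val l = List("apple", "orange", "banana")
    val inner = new mutable.HashMap[String, List[String]]
    inner.put("fruits", l)
    val list = new mutable.HashMap[String, mutable.HashMap[String, List[String]]]
    list.put("food", inner)             // "food" is a guess; the original is truncated here
    println(list("food")("fruits"))     // List(apple, orange, banana)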

Re: [spark-csv] how to build with Hadoop 2.6.0?

2015-08-19 Thread Mohit Jaggi
spark-csv should not depend on hadoop. On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik wrote: > I would like to build spark-csv with Hadoop 2.6.0. > I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds > with Hadoop 2.2.0 (at least this is what I saw in the .ivy2 repository). > >

Re: Creating RDD with key and Subkey

2015-08-19 Thread Ratika Prasad
We need to create an RDD as below: JavaPairRDD<String, HashMap<String, List<String>>>. The idea is that we need to do lookup() on the key, which will return a list of hash maps, and then do a lookup on the sub-key, which is the key in the HashMap returned. From: Silas Davis
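A hedged sketch of the lookup-then-get pattern Ratika describes, using Spark's Scala API (all names and data below are illustrative, not from the thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.collection.mutable

    val sc = new SparkContext(new SparkConf().setAppName("subkey-lookup").setMaster("local"))
    val inner = mutable.HashMap("subkey1" -> List("a", "b"))
    val rdd = sc.parallelize(Seq(("key1", inner)))

    val maps = rdd.lookup("key1")               // Seq of HashMap[String, List[String]] for that key
    val hits = maps.flatMap(_.get("subkey1"))   // then look up the sub-key inside each returned map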

Re: Creating RDD with key and Subkey

2015-08-19 Thread Silas Davis
This should be sent to the user mailing list, I think. It depends what you want to do with the RDD, so yes, you could throw around (String, HashMap<String, List<String>>) tuples, or perhaps you'd like to be able to groupByKey, reduceByKey on the key and sub-key as a composite, in which case JavaPairRDD<Tuple2<String, String>, List<String>> might be mo
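To make the composite-key alternative concrete, a small sketch with assumed data (reusing the sc from the sketch above):

    // a (key, sub-key) tuple as the pair key lets reduceByKey work per sub-key
    val pairs = sc.parallelize(Seq(
      (("fruits", "citrus"), List("orange")),
      (("fruits", "citrus"), List("lemon")),
      (("fruits", "berry"), List("strawberry"))))
    val merged = pairs.reduceByKey(_ ++ _)   // one combined List per (key, sub-key) pair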

Creating RDD with key and Subkey

2015-08-19 Thread Ratika Prasad
Hi, We have a need for an RDD with the following format: JavaPairRDD<String, HashMap<String, List<String>>>, mostly an RDD with a key and sub-key kind of structure. How is that doable in Spark? Thanks R

RE: Unable to run the spark application in standalone cluster mode

2015-08-19 Thread Madhusudanan Kandasamy
Slave nodes. Thanks, Madhu. Ratika Prasad

RE: Unable to run the spark application in standalone cluster mode

2015-08-19 Thread Ratika Prasad
Should this be done on the master or slave node, or both? From: Madhusudanan Kandasamy [mailto:madhusuda...@in.ibm.com] Sent: Wednesday, August 19, 2015 9:31 PM To: Ratika Prasad Cc: dev@spark.apache.org Subject: Re: Unable to run the spark application in standalone cluster mode Try increasing the

Re: Unable to run the spark application in standalone cluster mode

2015-08-19 Thread Madhusudanan Kandasamy
Try increasing the Spark worker memory in conf/spark-env.sh: export SPARK_WORKER_MEMORY=2g Thanks, Madhu. Ratika Prasad
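For reference, a spark-env.sh excerpt matching the suggestion above (2g is just the value Madhu suggests; per the follow-up, it goes on the slave/worker nodes):

    # conf/spark-env.sh on each worker node
    export SPARK_WORKER_MEMORY=2g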

Unable to run the spark application in standalone cluster mode

2015-08-19 Thread Ratika Prasad
Hi, We have a simple Spark application which runs through when run locally on the master node as below: ./bin/spark-submit --class com.coupons.salestransactionprocessor.SalesTransactionDataPointCreation --master local sales-transaction-processor-0.0.1-SNAPSHOT-jar-with-dependencies.jar But
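Since the thread subject is standalone cluster mode, a hedged variant of the same command pointed at a standalone master (the host is a placeholder; 7077 is the standalone master's default port):

    ./bin/spark-submit --class com.coupons.salestransactionprocessor.SalesTransactionDataPointCreation \
      --master spark://<master-host>:7077 \
      sales-transaction-processor-0.0.1-SNAPSHOT-jar-with-dependencies.jar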

Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread Zoltán Zvara
I personally build with SBT and run Spark on YARN with IntelliJ. You need to connect to remote JVMs with a remote debugger. You need to do something similar if you use Python, because it will launch a JVM on the driver as well. On Wed, Aug 19, 2015 at 2:10 PM canan chen wrote: > Thanks Ted. I notice
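As a sketch of the remote-debugger hookup Zoltán describes (the port and suspend flag are arbitrary choices, not from the thread): start the driver JVM with a JDWP agent and attach IntelliJ's Remote debug configuration to that port:

    ./bin/spark-submit \
      --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005" \
      ...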

Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread canan chen
Thanks Ted. I notice another thread about running Spark programmatically (client mode for standalone and YARN). Would it be much easier to debug Spark if that were possible? Hasn't anyone thought about it? On Wed, Aug 19, 2015 at 5:50 PM, Ted Yu wrote: > See this thread: > > > http://search-hadoo

Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+module&subj=Building+Spark+Building+just+one+module+ > On Aug 19, 2015, at 1:44 AM, canan chen wrote: > > I want to work on one JIRA, but it is not easy to do unit testing, because it > involves different components, especi
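The linked thread covers building just one module; one way to do that with Spark's Maven build is the -pl/-am flags, which scope the build to one module plus its upstream dependencies (the module name below is illustrative for the Spark 1.x layout):

    build/mvn -pl :spark-core_2.10 -am -DskipTests package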

What's the best practice for developing new features for spark ?

2015-08-19 Thread canan chen
I want to work on one JIRA, but it is not easy to do unit testing, because it involves different components, especially the UI. Building Spark is pretty slow, and I don't want to rebuild it each time to test my code change. I am wondering how other people do this. Is there any experience you can share? Thanks