It shouldn't?
This one, com.databricks.spark.csv.util.TextFile, has Hadoop imports.
I figured out that the answer to my question is just to add
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0".
But I still wonder where this 2.2.0 default comes from.
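For reference, a minimal build.sbt sketch of where that override sits (the Scala version and surrounding settings are illustrative, not taken from spark-csv's actual build):

    // build.sbt (illustrative)
    scalaVersion := "2.10.4"

    // pin hadoop-client explicitly instead of letting the transitive 2.2.0 default win
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"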
From: Mohit Jaggi
Hi Ratika,
I tried the following:
val l = List("apple", "orange", "banana")
val inner = new scala.collection.mutable.HashMap[String, List[String]]
inner.put("fruits", l)
val list = new scala.collection.mutable.HashMap[String,
  scala.collection.mutable.HashMap[String, List[String]]]
list.put("food", inner)  // "food" is an assumed outer key; the original snippet is cut off here
spark-csv should not depend on hadoop
On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik wrote:
> I would like to build spark-csv with Hadoop 2.6.0
> I noticed that when I build it with sbt/sbt ++2.10.4 package it builds it
> with Hadoop 2.2.0 (at least this is what I saw in the .ivy2 repository).
>
>
We need to create the RDD as below:
JavaPairRDD<String, List<HashMap<String, List<String>>>>
The idea is that we need to do lookup() on the key, which will return a
list-of-hash-maps kind of structure, and then do a lookup on the subkey, which
is the key in the HashMap returned.
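For illustration, a rough Scala sketch of that shape and the two-step lookup (the names and sample data are invented, not from the original code):

    import org.apache.spark.SparkContext

    def twoLevelLookup(sc: SparkContext): Unit = {
      // (key, list of (subkey -> values) maps)
      val data = Seq(
        ("food",   List(Map("fruits" -> List("apple", "orange"),
                            "veggies" -> List("carrot")))),
        ("drinks", List(Map("hot" -> List("tea", "coffee"))))
      )
      val rdd = sc.parallelize(data)  // RDD[(String, List[Map[String, List[String]]])]

      // lookup() on the key returns every value stored under that key
      val maps = rdd.lookup("food").flatten
      // then look up the subkey inside each returned map
      val fruits: Seq[String] = maps.flatMap(_.get("fruits")).flatten
    }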
_
From: Silas Davis <si...@silasdavis.net>
This should be sent to the user mailing list, I think.
It depends what you want to do with the RDD, so yes, you could throw around
(String, HashMap<String, List<String>>) tuples, or perhaps you'd like to be
able to groupByKey or reduceByKey on the key and sub-key as a composite, in
which case JavaPairRDD<Tuple2<String, String>, List<String>> might be more
appropriate.
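A rough Scala sketch of that composite-key variant (sample data and names invented for illustration):

    import org.apache.spark.SparkContext

    def compositeKeyExample(sc: SparkContext): Unit = {
      // ((key, subkey), values) -- key and sub-key flattened into one composite key
      val pairs = sc.parallelize(Seq(
        (("food", "fruits"),  List("apple", "orange")),
        (("food", "fruits"),  List("banana")),
        (("food", "veggies"), List("carrot"))
      ))
      // reduceByKey / groupByKey now operate per (key, subkey) pair
      val merged = pairs.reduceByKey(_ ++ _)
      // and lookup works on the composite key directly
      val fruits = merged.lookup(("food", "fruits"))
    }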
Hi,
We have a need where we need an RDD with the following format:
JavaPairRDD<String, HashMap<String, List<String>>>, mostly an RDD with a key
and subkey kind of structure. How is that doable in Spark?
Thanks
R
Slave nodes..
Thanks,
Madhu.
Ratika Prasad
To: Madhusudanan
Should this be done on the master or slave nodes, or both?
From: Madhusudanan Kandasamy [mailto:madhusuda...@in.ibm.com]
Sent: Wednesday, August 19, 2015 9:31 PM
To: Ratika Prasad
Cc: dev@spark.apache.org
Subject: Re: Unable to run the spark application in standalone cluster mode
Try Increasing the
Try increasing the Spark worker memory in conf/spark-env.sh:
export SPARK_WORKER_MEMORY=2g
Thanks,
Madhu.
Ratika Prasad
Hi,
We have a simple Spark application which runs fine when run locally on the
master node, as below:
./bin/spark-submit --class
com.coupons.salestransactionprocessor.SalesTransactionDataPointCreation
--master local
sales-transaction-processor-0.0.1-SNAPSHOT-jar-with-dependencies.jar
But
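For comparison, a sketch of the same submit pointed at the standalone cluster rather than local mode (the master host/port and memory value are placeholders, not taken from the original setup):

    ./bin/spark-submit \
      --class com.coupons.salestransactionprocessor.SalesTransactionDataPointCreation \
      --master spark://<master-host>:7077 \
      --executor-memory 2g \
      sales-transaction-processor-0.0.1-SNAPSHOT-jar-with-dependencies.jar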
I personally build with SBT and run Spark on YARN with IntelliJ. You need
to connect to the remote JVMs with a remote debugger. You also need to do
something similar if you use Python, because it will launch a JVM on the
driver as well.
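As a concrete sketch of that remote-debugger setup (the port, suspend flag, and the placeholders below are just one possible choice, not the poster's actual configuration):

    # make the driver JVM wait for a debugger on port 5005, then attach from IntelliJ
    ./bin/spark-submit \
      --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
      --class <your-main-class> \
      <your-app>.jar

Executors can be debugged the same way through spark.executor.extraJavaOptions.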
On Wed, Aug 19, 2015 at 2:10 PM canan chen wrote:
> Thanks Ted. I notice
Thanks Ted. I noticed another thread about running Spark programmatically
(client mode for standalone and YARN). Would it be much easier to debug
Spark if that were possible? Has anyone thought about it?
On Wed, Aug 19, 2015 at 5:50 PM, Ted Yu wrote:
> See this thread:
>
>
> http://search-hadoo
See this thread:
http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+module&subj=Building+Spark+Building+just+one+module+
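That thread covers building just one module; as an illustration of the approach (the module and suite names are placeholders, and the exact commands may differ by Spark version):

    # build only the core module, plus whatever it depends on, with Maven
    build/mvn -pl core -am -DskipTests package

    # or run a single suite through sbt without rebuilding everything
    build/sbt "project core" "testOnly org.apache.spark.SomeSuite"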
> On Aug 19, 2015, at 1:44 AM, canan chen wrote:
>
> I want to work on one jira, but it is not easy to do unit test, because it
> involves different components especi
I want to work on one JIRA, but it is not easy to unit test because it
involves different components, especially the UI. Building Spark is pretty
slow, and I don't want to rebuild it each time I test a code change. I am
wondering how other people do this? Is there any experience you can share?
Thanks