2015-08-16 Thread Rishitesh Mishra

GraphX issue in Spark 1.3

2015-08-16 Thread dizzy5112
Hi, I am using Spark 1.3 and trying some sample code: when I run: all works well but with it falls over and I get a whole heap of errors: Is anyone else experiencing this? I've tried different graphs and always end up with the same results. Thanks

Re: Apache Spark - Parallel Processing of messages from Kafka - Java

2015-08-16 Thread Hemant Bhanawat
In spark, every action (foreach, collect etc.) gets converted into a spark job and jobs are executed sequentially. You may want to refactor your code in calculateUseCase? to just run transformations (map, flatmap) and call a single action in the end. On Sun, Aug 16, 2015 at 3:19 PM, mohanaugust

Understanding the two jobs run with spark sql join

2015-08-16 Thread Todd
Hi, I have a basic Spark SQL join running in local mode. I checked the UI and see that two jobs are run. Their DAG graphs are pasted at the end. I have several questions here: 1. It looks like Job0 and Job1 have the same DAG stages, but stage 3 and stage 4 are skipped. I would ask wha

Re: Spark Master HA on YARN

2015-08-16 Thread Jeff Zhang
To make it clear, Spark Standalone is similar to Yarn as a simple cluster management system. Spark Master <---> Yarn Resource Manager Spark Worker <---> Yarn Node Manager On Mon, Aug 17, 2015 at 4:59 AM, Ruslan Dautkhanov wrote: > There is no Spark master in YARN mode. It's standalone mo

Re: SparkPi is getting java.lang.NoClassDefFoundError: scala/collection/Seq

2015-08-16 Thread Jeff Zhang
Check module example's dependency (right click examples and click Open Modules Settings), by default scala-library is provided, you need to change it to compile to run SparkPi in Intellij. As I remember, you also need to change guava and jetty related library to compile too. On Mon, Aug 17, 2015 a

Re: Can't find directory after resetting REPL state

2015-08-16 Thread Kevin Jung
Thanks Ted, it may be a bug. This is a jira ticket. https://issues.apache.org/jira/browse/SPARK-10039 Kevin --- Original Message --- Sender : Ted Yu Date : 2015-08-16 11:29 (GMT+09:00) Title : Re: Can't find directory after resetting REPL state I tried with master branch and got the fol

Re: Spark Master HA on YARN

2015-08-16 Thread Ruslan Dautkhanov
There is no Spark master in YARN mode. It's standalone mode terminology. In YARN cluster mode, Spark's Application Master (Spark Driver runs in it) will be restarted automatically by RM up to yarn.resourcemanager.am.max-retries times (default is 2). -- Ruslan Dautkhanov On Fri, Jul 17, 2015 at 1:

Re: Spark can't fetch application jar after adding it to HTTP server

2015-08-16 Thread Rishi Yadav
Can you tell us more about your environment? I understand you are running it on a single machine, but is a firewall enabled? On Sun, Aug 16, 2015 at 5:47 AM, t4ng0 wrote: > Hi > > I am new to spark and trying to run standalone application using > spark-submit. Whatever i could understood, from logs is

Example code to spawn multiple threads in driver program

2015-08-16 Thread unk1102
Hi, I have a Spark driver program with one loop that iterates around 2000 times, and on each iteration it executes jobs in YARN. Since the loop runs the jobs serially, I want to introduce parallelism. If I create 2000 tasks/runnables/callables in my Spark driver program, will it get executed i
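
A minimal sketch of the pattern the question above asks about, under stated assumptions. Spark's job scheduling documentation says that jobs submitted from separate threads within one application can run concurrently, so a bounded thread pool in the driver is one common way to try this. Everything named here is hypothetical: `runJob` is a stand-in for the body of one loop iteration (in a real driver it would call a Spark action on the shared context), and the pool size of 8 is an arbitrary illustration, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class DriverThreadPoolSketch {

    // Hypothetical stand-in for one loop iteration. In a real driver this is
    // where a Spark action would be invoked; here it only returns a number.
    static int runJob(int iteration) {
        return iteration * 2;
    }

    // Runs `iterations` jobs on at most `poolSize` threads and returns the
    // results in iteration order.
    static List<Integer> runAll(int iterations, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < iterations; i++) {
                final int iteration = i;
                Callable<Integer> task = () -> runJob(iteration);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                // Blocks until the task finishes; a failed task surfaces
                // here as an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> results = runAll(2000, 8);
        System.out.println("completed " + results.size() + " jobs");
    }
}
```

A fixed-size pool is used instead of 2000 raw threads so that only a bounded number of jobs compete for cluster resources at once. Whether the jobs actually overlap on the cluster still depends on the scheduler mode and on the executors available.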

Spark executor lost because of time out even after setting quite long time out value 1000 seconds

2015-08-16 Thread unk1102
Hi, I have written a Spark job that seems to work fine for almost an hour, after which executors start getting lost because of a timeout. I see the following statement in the log: 15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with no recent heartbeats: 1051638 ms exceeds timeou

SparkPi is getting java.lang.NoClassDefFoundError: scala/collection/Seq

2015-08-16 Thread xiaohe lan
Hi, I am trying to run SparkPi in IntelliJ and getting a NoClassDefFoundError. Has anyone else seen this issue before? Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0

Spark on scala 2.11 build fails due to incorrect jline dependency in REPL

2015-08-16 Thread Stephen Boesch
I am building spark with the following options - most notably the **scala-2.11**: . dev/switch-to-scala-2.11.sh mvn -Phive -Pyarn -Phadoop-2.6 -Dhadoop2.6.2 -Pscala-2.11 -DskipTests -Dmaven.javadoc.skip=true clean package The build goes pretty far but fails in one of the minor modules *repl

Re: Difference between Sort based and Hash based shuffle

2015-08-16 Thread Muhammad Haseeb Javed
I did check it out, and although I got a general understanding of the various classes used to implement the sort and hash shuffles, these slides lack details on how they are implemented and why sort generally performs better than hash. On Sun, Aug 16, 2015 at 4:31 AM, Ravi Kiran w

Spark cant fetch the added jar to http server

2015-08-16 Thread t4ng0
Hi, I have been trying to run a standalone application using spark-submit. Spark started the HTTP server and added the jar file to it, but it is unable to fetch the jar file. I am running the Spark cluster on localhost. If anyone can help me find what I am missing here, thanks in advance.

Re: Error: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration

2015-08-16 Thread Rishi Yadav
Try --jars rather than --class to submit the jar. On Fri, Aug 14, 2015 at 6:19 AM, Stephen Boesch wrote: > The NoClassDefFoundException differs from ClassNotFoundException : it > indicates an error while initializing that class: but the class is found in > the classpath. Please provide the full st

Re: How to submit an application using spark-submit

2015-08-16 Thread t4ng0
Hi, I have been trying to run a standalone application using spark-submit. Spark started the HTTP server and added the jar file to it, but it is unable to fetch the jar file. I am running the Spark cluster on localhost. If anyone can help me find what I am missing here, thanks in advance.

Re: Re: Can't understand the size of raw RDD and its DataFrame

2015-08-16 Thread Rishi Yadav
DataFrames, in simple terms, are RDDs combined with a schema. In reality they are much more than that and provide a very fine level of optimization. Check out Project Tungsten. In your case it was one column, as you chose. By default, it keeps the same columns as in the RDD (same as the fields of a case class if yo

Spark can't fetch application jar after adding it to HTTP server

2015-08-16 Thread t4ng0
Hi, I am new to Spark and trying to run a standalone application using spark-submit. What I could understand from the logs is that Spark can't fetch the jar file after adding it to the HTTP server. Do I need to configure proxy settings for Spark individually, if that is the problem? Otherwise please

Re: Executors on multiple nodes

2015-08-16 Thread Sandy Ryza
Hi Mohit, It depends on whether dynamic allocation is turned on. If not, the number of executors is specified by the user with the --num-executors option. If dynamic allocation is turned on, refer to the doc for details: https://spark.apache.org/docs/1.4.0/job-scheduling.html#dynamic-resource-al

Apache Spark - Parallel Processing of messages from Kafka - Java

2015-08-16 Thread mohanaugust
JavaPairReceiverInputDStream messages = KafkaUtils.createStream(...); JavaPairDStream filteredMessages = filterValidMessages(messages); JavaDStream useCase1 = calculateUseCase1(filteredMessages); JavaDStream useCase2 = calculateUseCase2(filteredMessages); JavaDStream useCase3 = calculateUseCase3(f

Re: TestSQLContext compilation error when run SparkPi in Intellij ?

2015-08-16 Thread canan chen
Thanks Andrew. On Sun, Aug 16, 2015 at 1:53 PM, Andrew Or wrote: > Hi Canan, TestSQLContext is no longer a singleton but now a class. It is > never meant to be a fully public API, but if you wish to use it you can > just instantiate a new one: > > val sqlContext = new TestSQLContext > > or jus

Spark hangs on collect (stuck on scheduler delay)

2015-08-16 Thread Sagi r
Hi, I'm building a spark application in which I load some data from an Elasticsearch cluster (using latest elasticsearch-hadoop connector) and continue to perform some calculations on the spark cluster. In one case, I use collect on the RDD as soon as it is created (loaded from ES). However, it is