How to split one RDD to small ones according to its key's value

2015-10-13 Thread 张志强(旺轩)
Hi everyone, I am facing a requirement to split one RDD into several smaller ones according to the value of its key element, e.g.: records whose key is X go into RDD1; records whose key is Y go into RDD2, and so on. I know it has a routine ca

Getting started

2015-10-13 Thread _abhishek
Hello, I am interested in contributing to Apache Spark. I am new to open source. Can someone please help me with how to get started, beginner-level bugs, etc.? Thanks, Abhishek Kumar -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Getting-started-tp14588.htm

Re: Getting started

2015-10-13 Thread Ted Yu
Please see https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Tue, Oct 13, 2015 at 5:49 AM, _abhishek wrote: > Hello, > I am interested in contributing to Apache Spark. I am new to open source. Can > someone please help me with how to get started, beginner-level bugs, etc.? > T

RE: How to split one RDD to small ones according to its key's value

2015-10-13 Thread PK Gnanam
I think you will need to use the partitionBy method: .partitionBy(number of partitions, lambda that maps a key to a partition index) Thanks, PK From: 张志强(旺轩) [mailto:zzq98...@alibaba-inc.com] Sent: Tuesday, October 13, 2015 4:17 AM To: dev@spark.apache.org Subject: How to split one RDD to small ones a
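A sketch of the alternative approach the original question implies: call `filter` once per distinct key so each key's records end up in their own collection. With a real PySpark RDD this would be `rdd.filter(lambda kv: kv[0] == key)` per key; the same logic over a plain Python list of (key, value) pairs (all names here are illustrative, not Spark API):

```python
# Illustrative sketch: split a collection of (key, value) pairs into one
# sub-collection per distinct key, mirroring a per-key rdd.filter(...) call.
# With PySpark, replace the list comprehension with
# rdd.filter(lambda kv: kv[0] == key) for each key.

def split_by_key(pairs):
    """Return {key: [(key, value), ...]} for each distinct key."""
    keys = {k for k, _ in pairs}
    return {key: [kv for kv in pairs if kv[0] == key] for key in keys}

data = [("X", 1), ("Y", 2), ("X", 3), ("Z", 4)]
parts = split_by_key(data)
print(parts["X"])  # every record whose key is "X": [("X", 1), ("X", 3)]
```

Note that per-key filtering scans the data once per key; `partitionBy` as suggested above does a single shuffle instead, which is usually preferable when there are many keys.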

Spark Event Listener

2015-10-13 Thread Jakob Odersky
Hi, I came across the Spark listener API while checking out possible UI extensions recently. I noticed that all events inherit from a sealed trait `SparkListenerEvent` and that a SparkListener has a corresponding `onEventXXX(event)` method for every possible event. Considering that events inherit

Re: Spark Event Listener

2015-10-13 Thread Jakob Odersky
The path of the source file defining the event API is `core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala`. On 13 October 2015 at 16:29, Jakob Odersky wrote: > Hi, > I came across the Spark listener API while checking out possible UI > extensions recently. I noticed that all event

Re: Spark Event Listener

2015-10-13 Thread Josh Rosen
Check out SparkFirehoseListener, an adapter that forwards all events to a single `onEvent` method in order to let you do pattern matching as you have described: https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkFirehoseListener.java On Tue, Oct 13, 2015 at 4:29
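The "firehose" pattern described here, funneling every event type into one method and branching on the concrete event class there, can be sketched in plain Python (the event classes and method names below are invented for illustration; Spark's real listener is Scala/Java):

```python
# Illustrative sketch of the firehose-listener pattern: every event goes
# through a single on_event entry point, which dispatches by checking the
# event's concrete type. Event classes here are made up for the example.

class StageCompleted:
    def __init__(self, stage_id):
        self.stage_id = stage_id

class TaskEnd:
    def __init__(self, task_id):
        self.task_id = task_id

class FirehoseListener:
    def __init__(self):
        self.log = []

    def on_event(self, event):
        # One entry point for all events; branch on the concrete type,
        # analogous to pattern matching on a sealed trait in Scala.
        if isinstance(event, StageCompleted):
            self.log.append(f"stage {event.stage_id} done")
        elif isinstance(event, TaskEnd):
            self.log.append(f"task {event.task_id} ended")
        else:
            self.log.append("unhandled event")

listener = FirehoseListener()
listener.on_event(StageCompleted(3))
listener.on_event(TaskEnd(7))
print(listener.log)  # ['stage 3 done', 'task 7 ended']
```

The trade-off mirrors the one in the thread: per-event callbacks give compile-time coverage of each event type, while a single dispatching method keeps all handling logic in one place.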

[Streaming] join events in last 10 minutes

2015-10-13 Thread Daniel Li
We have a scenario where events from three Kafka topics sharing the same keys need to be merged. One topic has the master events; most events in the other two topics arrive within 10 minutes of the master event's arrival. I wrote pseudo code below; I'd love to hear your thoughts on whether I am on the right track.
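The poster's pseudo code is truncated in the archive, but the buffering idea can be sketched outside of Spark Streaming: master events are held in state, and records from the other streams that share a key and arrive within the 10-minute window are merged onto them. Every name and the window logic below are assumptions for illustration; a real job would use Spark Streaming stateful operations rather than in-memory dicts:

```python
# Toy sketch of joining events that share a key within a time window.
# Master events are buffered by key; records from other streams merge onto
# them only if they arrive within `window` seconds of the master event.
# All names are invented; this is not the poster's actual pseudo code.

WINDOW = 600  # 10 minutes, in seconds

def join_streams(master_events, other_events, window=WINDOW):
    """Each event is a (timestamp, key, payload) tuple."""
    merged = {}
    for ts, key, payload in master_events:
        merged[key] = {"master_ts": ts, "master": payload, "others": []}
    for ts, key, payload in other_events:
        entry = merged.get(key)
        # Keep the record only if it falls within the master's window.
        if entry is not None and 0 <= ts - entry["master_ts"] <= window:
            entry["others"].append(payload)
    return merged

masters = [(0, "k1", "m1"), (100, "k2", "m2")]
others = [(120, "k1", "a"), (900, "k2", "b"), (1000, "k1", "late")]
result = join_streams(masters, others)
print(result["k1"]["others"])  # only records within 10 min of the master: ['a']
```

In a streaming setting the open question is what to do with master events whose matches never arrive; some policy (emit partial after the window expires, or drop) has to close out the buffered state.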

When is the Python program started in PySpark

2015-10-13 Thread canan chen
I looked at the source code of Spark, but didn't find where the Python program is started. It seems spark-submit will call PythonGatewayServer, but where is the Python program started? Thanks

Re: When is the Python program started in PySpark

2015-10-13 Thread skaarthik oss
See PythonRunner @ https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala On Tue, Oct 13, 2015 at 7:50 PM, canan chen wrote: > I looked at the source code of Spark, but didn't find where the Python program > is started. > > It seems spark-s

Re: When is the Python program started in PySpark

2015-10-13 Thread canan chen
I think PythonRunner is launched when executing a Python script, while PythonGatewayServer is the entry point for the Python Spark shell:

if (args.isPython && deployMode == CLIENT) {
  if (args.primaryResource == PYSPARK_SHELL) {
    args.mainClass = "org.apache.spark.api.python.PythonGatewayServer"
  } else {