Hi everyone,
I have a requirement to split one RDD into several smaller ones, according to
each element's key value: e.g. elements whose key is X would go into RDD1,
those whose key is Y into RDD2, and so on.
I know it has a routine ca
Hello
I am interested in contributing to Apache Spark. I am new to open source. Can
someone please help me with how to get started, beginner-level bugs, etc.?
Thanks
Abhishek Kumar
Please see
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
On Tue, Oct 13, 2015 at 5:49 AM, _abhishek
wrote:
> Hello
> I am interested in contributing to Apache Spark. I am new to open source. Can
> someone please help me with how to get started, beginner-level bugs, etc.?
> T
I think you will need to use the partitionBy method:
.partitionBy(number of partitions, a function that maps each key to a partition index)
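In PySpark the call is `rdd.partitionBy(numPartitions, partitionFunc)`, where
`partitionFunc` maps a key to a partition index. A minimal sketch of that
routing idea in plain Python (no Spark needed; the data and partitioner are
made up for illustration):

```python
from collections import defaultdict

# Toy stand-in for an RDD of (key, value) pairs.
pairs = [("X", 1), ("Y", 2), ("X", 3), ("Y", 4)]

# partitionBy-style routing: the partitioner maps each key to a partition
# index, so all pairs with the same key land in the same partition.
def partitioner(key):
    return 0 if key == "X" else 1

partitions = defaultdict(list)
for key, value in pairs:
    partitions[partitioner(key)].append((key, value))

print(partitions[0])  # [('X', 1), ('X', 3)]
print(partitions[1])  # [('Y', 2), ('Y', 4)]
```

With a real RDD, `rdd.partitionBy(2, partitioner)` co-locates keys the same
way; each logical "small RDD" can then be pulled out with `filter` or
`mapPartitionsWithIndex` without reshuffling.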
Thanks,
PK
From: 张志强(旺轩) [mailto:zzq98...@alibaba-inc.com]
Sent: Tuesday, October 13, 2015 4:17 AM
To: dev@spark.apache.org
Subject: How to split one RDD to small ones a
Hi,
I came across the Spark listener API while checking out possible UI
extensions recently. I noticed that all events inherit from a sealed trait
`SparkListenerEvent` and that a SparkListener has a corresponding
`onEventXXX(event)` method for every possible event.
Considering that events inherit
The path of the source file defining the event API is
`core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala`
On 13 October 2015 at 16:29, Jakob Odersky wrote:
> Hi,
> I came across the spark listener API while checking out possible UI
> extensions recently. I noticed that all event
Check out SparkFirehoseListener, an adapter which forwards all events to a
single `onEvent` method in order to let you do pattern-matching as you have
described:
https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/SparkFirehoseListener.java
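The same adapter idea, sketched in plain Python (class and method names here
are hypothetical, chosen only to mirror the pattern SparkFirehoseListener
uses on the JVM side): every per-event callback forwards to a single
`on_event`, so one match/if chain can handle all event types:

```python
# Hypothetical event classes standing in for SparkListener events.
class JobStart: ...
class JobEnd: ...

class FirehoseListener:
    """Adapter: forwards every specific callback to one on_event method."""
    def on_job_start(self, event):
        self.on_event(event)

    def on_job_end(self, event):
        self.on_event(event)

    def on_event(self, event):
        raise NotImplementedError

class MyListener(FirehoseListener):
    def __init__(self):
        self.seen = []

    def on_event(self, event):
        # Single place to dispatch on all event types.
        self.seen.append(type(event).__name__)

listener = MyListener()
listener.on_job_start(JobStart())
listener.on_job_end(JobEnd())
print(listener.seen)  # ['JobStart', 'JobEnd']
```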
On Tue, Oct 13, 2015 at 4:29
We have a scenario where events from three Kafka topics sharing the same
keys need to be merged. One topic has the master events; most events in the
other two topics arrive within 10 minutes of the master event's arrival. I
wrote pseudo code below. I'd love to hear your thoughts on whether I am on
the right track.
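One generic way such a keyed merge is often buffered, sketched in plain
Python with made-up event shapes (this is not the poster's pseudo code;
real code would keep this state in Spark Streaming, e.g. via
`updateStateByKey`, and evict entries after the 10-minute window):

```python
# Purely illustrative dict-based buffering of keyed events from three topics.
masters = {}   # key -> merged record, created when the master event arrives
pending = {}   # key -> events from the other topics that arrived early

def on_master(key, event):
    # Master opens the record and claims any events that arrived before it.
    masters[key] = {"master": event, "others": pending.pop(key, [])}

def on_other(key, event):
    if key in masters:
        masters[key]["others"].append(event)
    else:
        pending.setdefault(key, []).append(event)

on_other("k1", "early-detail")   # arrives before its master
on_master("k1", "master-event")
on_other("k1", "late-detail")
print(masters["k1"])
# {'master': 'master-event', 'others': ['early-detail', 'late-detail']}
```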
I looked at the source code of Spark, but didn't find where the Python
program is started.
It seems spark-submit calls PythonGatewayServer, but where is the Python
program started?
Thanks
See PythonRunner @
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
On Tue, Oct 13, 2015 at 7:50 PM, canan chen wrote:
> I look at the source code of spark, but didn't find where python program
> is started in python.
>
> It seems spark-s
I think PythonRunner is launched when executing a Python script, while
PythonGatewayServer is the entry point for the Python Spark shell:
if (args.isPython && deployMode == CLIENT) {
  if (args.primaryResource == PYSPARK_SHELL) {
    args.mainClass = "org.apache.spark.api.python.PythonGatewayServer"
  } else {
    args.mainClass = "org.apache.spark.deploy.PythonRunner"
  }
}