I would also add that, from a data-locality standpoint, mapPartitions()
provides for node-local computation that plain old map-reduce does not.
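As a concrete illustration, here is a minimal sketch (the date format and the RDD of strings are my own illustrative choices, not from the original thread) of the kind of per-partition work mapPartitions() enables: a relatively expensive, non-thread-safe object is built once per partition, on the node holding that partition, rather than once per record.

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.rdd.RDD

// Build the costly, non-thread-safe formatter once per partition, node-locally,
// and reuse it for every record in that partition.
def parseDates(lines: RDD[String]): RDD[Date] =
  lines.mapPartitions { iter =>
    val fmt = new SimpleDateFormat("yyyy-MM-dd")
    iter.map(line => fmt.parse(line))
  }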
Original message
From: Ashic Mahtab
Date: 06/28/2015 10:5
Will Spark 2.0 Structured Streaming obviate some of the Druid/Spark use cases?
From: Raymond Honderdors
To: "yuzhih...@gmail.com"
Cc: "user@spark.apache.org"
Sent: Wednesday, March 23, 2016 8:43 AM
Subject: Re: Spark with Druid
I saw these, but I fail to understand how to direct th
In terms of publication date, a paper on Nephele was published in 2009, prior
to the 2010 USENIX paper on Spark. Nephele is the execution engine of
Stratosphere, which became Flink.
From: Mark Hamstra
To: Mich Talebzadeh
Cc: Corey Nolet ; "user @spark"
Sent: Sunday, April 17, 2016 3:
There have been commercial CEP solutions for decades, including from my
employer.
From: Mich Talebzadeh
To: Mark Hamstra
Cc: Corey Nolet ; "user @spark"
Sent: Sunday, April 17, 2016 3:48 PM
Subject: Re: Apache Flink
The problem is that the strength and wider acceptance of a typic
As with all history, "what if"s are not scientifically testable hypotheses, but
my speculation is that the difference comes down to the energy (VCs, startups,
big Internet companies, universities) within Silicon Valley as contrasted with
Germany.
From: Mich Talebzadeh
To: Michael Malak ; "user @spark"
http://go.databricks.com/apache-spark-2.0-presented-by-databricks-co-founder-reynold-xin
From: Sourav Mazumder
To: user
Sent: Wednesday, April 20, 2016 11:07 AM
Subject: Spark 2.0 forthcoming features
Hi All,
Is there somewhere we can get an idea of the upcoming features in Spark 2
At first glance, it looks like the only streaming data sources available out of
the box from the github master branch are
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala
and
https://github.com/apache/spark/blob/
Yes, it is possible to use GraphX from Java but it requires 10x the amount of
code and involves using obscure typing and pre-defined lambda prototype
facilities. I give an example of it in my book, the source code for which can
be downloaded for free from
https://www.manning.com/books/spark-gra
Yes. And a paper that describes using grids (actually varying grids) is
http://research.microsoft.com/en-us/um/people/jingdw/pubs%5CCVPR12-GraphConstruction.pdf
In the Spark GraphX In Action book that Robin East and I are writing, we
implement a drastically simplified version of this in chapter
In chapter 10 of Spark GraphX In Action, we describe how to use Zeppelin with
d3.js to render graphs using d3's force-directed rendering algorithm. The
source code can be downloaded for free from
https://www.manning.com/books/spark-graphx-in-action
From: agc studio
To: user@spark.apache.
Chapter 6 of my book implements Dijkstra's Algorithm. The source code is
available to download for free.
https://www.manning.com/books/spark-graphx-in-action
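For anyone who wants something runnable right away, here is a minimal single-source shortest-paths sketch in the spirit of the standard GraphX Pregel example (the tiny edge list and source vertex are illustrative, a REPL-style sc is assumed, and this is not the book's chapter 6 code):

import org.apache.spark.graphx._

// Tiny weighted digraph: 1->2 (4.0), 1->3 (1.0), 3->2 (2.0), 2->4 (5.0)
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, 4.0), Edge(1L, 3L, 1.0), Edge(3L, 2L, 2.0), Edge(2L, 4L, 5.0)))
val graph = Graph.fromEdges(edges, 0.0)

val sourceId: VertexId = 1L
val initialGraph = graph.mapVertices((id, _) =>
  if (id == sourceId) 0.0 else Double.PositiveInfinity)

val sssp = initialGraph.pregel(Double.PositiveInfinity)(
  (_, dist, newDist) => math.min(dist, newDist),                // vertex program: keep the shorter distance
  triplet =>
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr)) // relax the edge
    else
      Iterator.empty,
  (a, b) => math.min(a, b))                                     // merge competing messages

sssp.vertices.collect.foreach(println)  // (1,0.0), (3,1.0), (2,3.0), (4,8.0)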
From: Brian Wilson
To: user@spark.apache.org
Sent: Monday, October 24, 2016 7:11 AM
Subject: Shortest path with directed and
You might be interested in "Maximum Flow implementation on Spark GraphX" done
by a Colorado School of Mines grad student a couple of years ago.
http://datascienceassn.org/2016-01-27-maximum-flow-implementation-spark-graphx
From: Swapnil Shinde
To: user@spark.ap
But isn't foldLeft() overkill for the originally stated use case of max diff of
adjacent pairs? Isn't foldLeft() for recursive non-commutative non-associative
accumulation as opposed to an embarrassingly parallel operation such as this
one?
This use case reminds me of FIR filtering in DSP. It se
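To make the point concrete, here is a minimal sketch of one embarrassingly parallel way to compute the max difference of adjacent pairs (the helper name and the zipWithIndex/join approach are my own illustration, not the original poster's code): pair each element with its predecessor by index, then take a parallel max.

import org.apache.spark.rdd.RDD

// Assumes at least two elements; "diff" here is current minus previous.
def maxAdjacentDiff(values: RDD[Double]): Double = {
  val indexed = values.zipWithIndex().map { case (v, i) => (i, v) } // (index, value)
  val shifted = indexed.map { case (i, v) => (i + 1, v) }           // value re-keyed to its successor's index
  indexed.join(shifted)                                             // (index, (current, previous))
    .map { case (_, (cur, prev)) => cur - prev }
    .reduce((a, b) => math.max(a, b))
}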
Can my new book, Spark GraphX In Action, which is currently in MEAP
http://manning.com/malak/, be added to
https://spark.apache.org/documentation.html and, if appropriate, to
https://spark.apache.org/graphx/ ?
Michael Malak
You could have your receiver send a "magic value" when it is done. I discuss
this Spark Streaming pattern in my presentation "Spark Gotchas and
Anti-Patterns". In the PDF version, it's slides
34-36.
http://www.datascienceassn.org/content/2014-11-05-spark-gotchas-and-anti-patterns-julia-language
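For reference, here is a minimal sketch of that pattern (the sentinel string, class name, and finite in-memory source are illustrative; a real receiver and its shutdown logic will differ):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// A receiver over a finite source that emits a sentinel record when exhausted.
class FiniteSourceReceiver(records: Seq[String])
    extends Receiver[String](StorageLevel.MEMORY_ONLY) {

  override def onStart(): Unit = {
    new Thread("finite-source-receiver") {
      override def run(): Unit = {
        records.foreach(r => store(r))  // push the real records
        store("__DONE__")               // the "magic value": no more data is coming
      }
    }.start()
  }

  override def onStop(): Unit = ()      // nothing to clean up in this sketch
}

// Downstream, a foreachRDD can check each batch for "__DONE__" and set a flag
// that the driver polls to decide when to stop the StreamingContext.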
http://www.datascienceassn.org/content/making-sense-making-sense-performance-data-analytics-frameworks
From: "bit1...@163.com"
To: user
Sent: Monday, April 27, 2015 8:33 PM
Subject: Why Spark is much faster than Hadoop MapReduce even on disk
How about a treeReduceByKey? :-)
On Friday, June 20, 2014 11:55 AM, DB Tsai wrote:
Currently, the reduce operation combines the results from the mappers
sequentially, so it's O(n).
Xiangrui is working on treeReduce, which is O(log(n)). Based on the
benchmark, it dramatically increases the performan
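As a minimal sketch of the difference, assuming a REPL-style sc and a Spark version in which treeReduce() is available on the core RDD API (the data and depth below are arbitrary illustrations):

// reduce() combines the per-partition results sequentially at the driver,
// while treeReduce() combines them in multiple levels of a tree.
val data = sc.parallelize(1 to 1000000, 100)

val flat = data.reduce(_ + _)
val tree = data.treeReduce(_ + _, depth = 3)  // depth chosen arbitrarily here

assert(flat == tree)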
It's really more of a Scala question than a Spark question, but the standard OO
(not Scala-specific) way is to create your own custom supertype (e.g.
MyCollectionTrait), inherited/implemented by two concrete classes (e.g. MyRDD
and MyArray), each of which manually forwards method calls to the co
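A minimal sketch of that forwarding pattern (the trait name, wrapper names, and the two forwarded methods are illustrative):

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// One common supertype, two concrete wrappers that each delegate to the
// underlying collection.
trait MyCollectionTrait[T] {
  def map[U: ClassTag](f: T => U): MyCollectionTrait[U]
  def collect(): Array[T]
}

class MyRDD[T](rdd: RDD[T]) extends MyCollectionTrait[T] {
  override def map[U: ClassTag](f: T => U): MyCollectionTrait[U] =
    new MyRDD(rdd.map(f))          // forwards to RDD.map
  override def collect(): Array[T] = rdd.collect()
}

class MyArray[T](array: Array[T]) extends MyCollectionTrait[T] {
  override def map[U: ClassTag](f: T => U): MyCollectionTrait[U] =
    new MyArray(array.map(f))      // forwards to Array.map
  override def collect(): Array[T] = array
}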
Depending on the density of your keys, the alternative signature
def updateStateByKey[S](updateFunc: (Iterator[(K, Seq[V], Option[S])]) =>
Iterator[(K, S)], partitioner: Partitioner, rememberPartitioner:
Boolean)(implicit arg0: ClassTag[S]): DStream[(K, S)]
at least iterates by key rather than
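A minimal usage sketch of that overload (the running-count state, the key/value types, and the partitioner are illustrative):

import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

// Running count per key: the Iterator-based update function is applied to an
// iterator over the keys in each partition, rather than being called per key-value.
def runningCounts(pairs: DStream[(String, Int)]): DStream[(String, Long)] = {
  val updateFunc = (entries: Iterator[(String, Seq[Int], Option[Long])]) =>
    entries.map { case (key, newValues, state) =>
      (key, state.getOrElse(0L) + newValues.sum)
    }
  pairs.updateStateByKey(
    updateFunc,
    new HashPartitioner(4),        // illustrative partitioner
    rememberPartitioner = true)
}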
On Wednesday, October 22, 2014 9:06 AM, Sean Owen wrote:
> No, there's no such thing as an RDD of RDDs in Spark.
> Here though, why not just operate on an RDD of Lists? or a List of RDDs?
> Usually one of these two is the right approach whenever you feel
> inclined to operate on an RDD of RDDs.
Asim Jalis writes:
>
> Thanks. Another question. I have event data with timestamps. I want to
> create a sliding window
> using timestamps. Some windows will have a lot of events in them others
> won’t. Is there a way
> to get an RDD made of this kind of a variable length window?
You should c
"looks like Spark outperforms Stratosphere fairly consistently in the
experiments"
There was one exception the paper noted, which was when memory resources were
constrained. In that case, Stratosphere seemed to degrade more gracefully than
Spark, but the author did not explore it further.
Is this a bug?
scala> sc.parallelize(1 to 2,4).zip(sc.parallelize(11 to 12,4)).collect
res0: Array[(Int, Int)] = Array((1,11), (2,12))
scala> sc.parallelize(1L to 2L,4).zip(sc.parallelize(11 to 12,4)).collect
res1: Array[(Long, Int)] = Array((2,11))
s the ASF Jira system will let me
reset my password.
On Sunday, May 11, 2014 4:40 AM, Michael Malak wrote:
Is this a bug?
scala> sc.parallelize(1 to 2,4).zip(sc.parallelize(11 to 12,4)).collect
res0: Array[(Int, Int)] = Array((1,11), (2,12))
scala> sc.parallelize(1L to 2L,4).zip(sc.par
I'm seeing different Serializable behavior in the Spark Shell vs. the Scala
Shell. In the Spark Shell, equals() fails when I use the canonical equals()
pattern of match{}, but works when I substitute isInstanceOf[]. I am using
Spark 0.9.0/Scala 2.10.3.
Is this a bug?
Spark Shell (equals uses match
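For reference, a minimal sketch of the two equals() styles being compared (the classes are illustrative, and this snippet does not by itself reproduce the shell-specific difference):

// Pattern-match style (the "canonical" equals):
class MyId1(val id: Int) extends Serializable {
  override def equals(other: Any): Boolean = other match {
    case that: MyId1 => this.id == that.id
    case _           => false
  }
  override def hashCode: Int = id
}

// isInstanceOf style, equivalent in plain Scala:
class MyId2(val id: Int) extends Serializable {
  override def equals(other: Any): Boolean =
    other.isInstanceOf[MyId2] && other.asInstanceOf[MyId2].id == this.id
  override def hashCode: Int = id
}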
Mohit Jaggi:
A workaround is to use zipWithIndex (to appear in Spark 1.0, but if you're
still on 0.9x you can swipe the code from
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
), map it to (x => (x._2,x._1)) and then sortByKey.
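A minimal sketch of that workaround, assuming a REPL-style sc and the built-in zipWithIndex of Spark 1.0+ (the input data is illustrative):

val rdd = sc.parallelize(Seq("c", "a", "b"))

val byOriginalPosition = rdd
  .zipWithIndex()                              // (element, index)
  .map { case (value, idx) => (idx, value) }   // swap so the index is the key
  .sortByKey()

byOriginalPosition.collect()  // Array((0,c), (1,a), (2,b))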
Sp