You might be interested in "Maximum Flow implementation on Spark GraphX" done
by a Colorado School of Mines grad student a couple of years ago.
http://datascienceassn.org/2016-01-27-maximum-flow-implementation-spark-graphx
From: Swapnil Shinde
To: u...@spark.ap
It's been reduced to a single line of code.
http://technicaltidbit.blogspot.com/2016/03/dataframedataset-swap-places-in-spark-20.html
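For context (my paraphrase of the point being made, not a quote from the post): in Spark 2.x
DataFrame is just a type alias for Dataset[Row], so there is no longer a separate
DataFrame.scala implementation. A minimal sketch of what that means for user code:

import org.apache.spark.sql.{DataFrame, Dataset, Row}

// DataFrame and Dataset[Row] are the same type in Spark 2.x, so this compiles.
def describe(df: DataFrame): Dataset[Row] = df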
From: Gerhard Fiedler
To: "dev@spark.apache.org"
Sent: Friday, June 3, 2016 9:01 AM
Subject: Where is DataFrame.scala in 2.0?
When I look at the
I see you've been burning the midnight oil.
From: Reynold Xin
To: "dev@spark.apache.org"
Sent: Friday, April 1, 2016 1:15 AM
Subject: [discuss] using deep learning to improve Spark
Hi all,
Hope you all enjoyed the Tesla 3 unveiling earlier tonight.
I'd like to bring your attention
Would it make sense (in terms of feasibility, code organization, and
politics) to have a JavaDataFrame, as a way to isolate the 1000+ extra lines
to a Java compatibility layer/class?
From: Reynold Xin
To: "dev@spark.apache.org"
Sent: Thursday, February 25, 2016 4:23 PM
Subject: [d
I believe that in the initialization portion of GraphX SVDPlusPlus, the
initialization of biases is incorrect. Specifically, in line
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96
instead of
(vd._1, vd._2, msg.get._2 / msg.ge
Since RDDs are generally unordered, isn't something like textFile().first() not
guaranteed to return the first row (e.g. when looking for a header row)? If so,
doesn't that make the example in
http://spark.apache.org/docs/1.2.1/quick-start.html#basics misleading?
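(A sketch of the usual workaround, not taken from the quick-start guide: in practice
sc.textFile on a single file does yield lines in file order, but if you don't want to
rely on that for a header row, dropping it by partition index is the common pattern.
"data.txt" below is a placeholder path.)

val lines = sc.textFile("data.txt")
val noHeader = lines.mapPartitionsWithIndex { (i, iter) =>
  if (i == 0) iter.drop(1) else iter   // the header can only be in partition 0
}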
---
1. Is IndexedRDD planned for 1.3?
https://issues.apache.org/jira/browse/SPARK-2365
2. Once IndexedRDD is in, is it planned to convert Word2VecModel to it from its
current Map[String,Array[Float]]?
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/feature/Wo
…those not immersed in data science or AI, and thus may have narrower appeal.
- Original Message -
From: Evan R. Sparks
To: Matei Zaharia
Cc: Koert Kuipers ; Michael Malak ;
Patrick Wendell ; Reynold Xin ;
"dev@spark.apache.org"
Sent: Tuesday, January 27, 2015 9:55 AM
Subject: Re: renaming
And on the off chance that anyone hasn't seen it yet, the Jan. 13 Bay Area
Spark Meetup YouTube video contained a wealth of background information on this idea
(mostly from Patrick and Reynold :-).
https://www.youtube.com/watch?v=YWppYPWznSQ
From: Patrick Wendell
To:
I created https://issues.apache.org/jira/browse/SPARK-5343 for this.
- Original Message -
From: Michael Malak
To: "dev@spark.apache.org"
Cc:
Sent: Monday, January 19, 2015 5:09 PM
Subject: GraphX ShortestPaths backwards?
GraphX ShortestPaths seems to be following edges backwards instead of forwards:
import org.apache.spark.graphx._
val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))),
  sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,""))))
lib.ShortestPaths.run(g,Array(3)).vertices.collect
res1: Array[(org.apac
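(Not part of the original message: if ShortestPaths really is propagating along
reversed edges, one workaround sketch is to reverse the graph first, so that the
reported distances follow the original edge direction.)

val forward = lib.ShortestPaths.run(g.reverse, Seq(3L)).vertices.collect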
But wouldn't the gain be greater under something similar to EdgePartition1D
(but perhaps better load-balanced based on number of edges for each vertex) and
an algorithm that primarily follows edges in the forward direction?
From: Ankur Dave
To: Michael Malak
Cc: "dev@spark.
Does GraphX make an effort to co-locate vertices onto the same workers as the
majority (or even some) of its edges?
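(My illustration, not Ankur's answer: GraphX partitions edges and ships vertex
attributes to whichever edge partitions need them, and the edge placement itself is
the knob you can turn. `graph` here stands for any existing Graph.)

import org.apache.spark.graphx._
// partitionBy controls where edges live; vertices are replicated to their edges.
val partitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D)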
According to:
https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#triangle-counting
"Note that TriangleCount requires the edges to be in canonical orientation
(srcId < dstId)"
But isn't this overstating the requirement? Isn't the requirement really that
IF there are duplicate ed
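(A sketch, not from the thread, of forcing the canonical orientation the guide asks
for; `graph` is assumed to be an existing Graph whose edge attributes we don't need
for counting.)

import org.apache.spark.graphx._
// Rewrite every edge so that srcId < dstId and drop self-loops, then count.
val canonical = Graph.fromEdges(
  graph.edges
    .filter(e => e.srcId != e.dstId)
    .map(e => if (e.srcId < e.dstId) Edge(e.srcId, e.dstId, e.attr)
              else Edge(e.dstId, e.srcId, e.attr)),
  defaultValue = 0)
val triangles = canonical.partitionBy(PartitionStrategy.RandomVertexCut)
  .triangleCount().vertices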
Thank you. I created
https://issues.apache.org/jira/browse/SPARK-5064
- Original Message -
From: xhudik
To: dev@spark.apache.org
Cc:
Sent: Saturday, January 3, 2015 2:04 PM
Subject: Re: GraphX rmatGraph hangs
Hi Michael,
yes, I can confirm the behavior.
It gets stuck (loop?) and eats a
The following single line just hangs when executed in either the Spark Shell or a
standalone program:
org.apache.spark.graphx.util.GraphGenerators.rmatGraph(sc, 4, 8)
It just outputs "0 edges" and then locks up.
The only other information I've found via Google is:
http://mail-archives.apache.org/mod_mbox/sp
At Spark Summit, Patrick Wendell indicated the number of MLlib algorithms would
"roughly double" in 1.1 from the current approx. 15.
http://spark-summit.org/wp-content/uploads/2014/07/Future-of-Spark-Patrick-Wendell.pdf
What are the planned additional algorithms?
In Jira, I only see two when fil
Shouldn't I be seeing N2 and N4 in the output below? (Spark 0.9.0 REPL) Or am I
missing something fundamental?
val nodes = sc.parallelize(Array((1L, "N1"), (2L, "N2"), (3L, "N3"), (4L,
"N4"), (5L, "N5")))
val edges = sc.parallelize(Array(Edge(1L, 2L, "E1"), Edge(1L, 3L, "E2"),
Edge(2L, 4L, "E
While developers may appreciate "1.0 == API stability," I'm not sure that will
be the understanding of the VP who gives the green light to a Spark-based
development effort.
I fear a bug that silently produces erroneous results will be perceived like
the FDIV bug, but in this case without the mo
When using map() and lookup() in conjunction, I get an exception (each
independently works fine). I'm using Spark 0.9.0/Scala 2.10.3
val a = sc.parallelize(Array(11))
val m = sc.parallelize(Array((11,21)))
a.map(m.lookup(_)(0)).collect
14/05/14 15:03:35 ERROR Executor: Exception in task ID 23
sc
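(Not from the thread: RDD operations such as m.lookup can't be invoked inside another
RDD's map closure, since that closure runs on the executors where `m` is not usable.
A sketch of the two usual workarounds, using the same a and m as above:)

import org.apache.spark.SparkContext._

// Workaround 1: express the lookup as a join on the key.
val viaJoin = a.map(x => (x, ())).join(m).map { case (_, (_, v)) => v }
viaJoin.collect   // Array(21)

// Workaround 2: pull the small table to the driver and broadcast it.
val table = sc.broadcast(m.collectAsMap())
val viaBroadcast = a.map(x => table.value(x))
viaBroadcast.collect   // Array(21)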
Is it permissible to use a custom class (as opposed to e.g. the built-in String
or Int) for the key in groupByKey? It doesn't seem to be working for me on
Spark 0.9.0/Scala 2.10.3:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
class C(val s:String) extends Serializ
12)))
r: org.apache.spark.rdd.RDD[(C, Int)] = ParallelCollectionRDD[3] at parallelize at <console>:14
scala> r.lookup(new C("a"))
<console>:17: error: type mismatch;
found : C
required: C
r.lookup(new C("a"))
^
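(My sketch, not the original poster's code: for a custom groupByKey/lookup key, the
class needs value-based hashCode and equals; and defining it in a single compilation
avoids the shell quirk where two compiles of "C" are distinct classes, which is what
the "found: C / required: C" message suggests.)

import org.apache.spark.SparkContext._

class C(val s: String) extends Serializable {
  override def hashCode: Int = s.hashCode
  override def equals(other: Any): Boolean = other match {
    case that: C => that.s == s
    case _       => false
  }
}

val r = sc.parallelize(Array((new C("a"), 11), (new C("b"), 12)))
r.groupByKey().collect()   // keys now compare by value after the shuffle
r.lookup(new C("a"))       // expected: Seq(11)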
On Tuesday, May 13, 2014 3:05 PM, Ana
Reposting here on dev since I didn't see a response on user:
I'm seeing different Serializable behavior in Spark Shell vs. Scala Shell. In
the Spark Shell, equals() fails when I use the canonical equals() pattern of
match{}, but works when I substitute isInstanceOf[]. I am using Spark
0.9.0
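(A sketch of the two equals() styles being contrasted, assuming this is roughly the
shape of the code in question; the class name Key is mine. The pattern-match form is
the canonical one, and the isInstanceOf form is the substitution that reportedly
worked in the Spark Shell.)

class Key(val s: String) extends Serializable {
  override def hashCode: Int = s.hashCode

  // Canonical style: pattern match on the argument's type.
  override def equals(other: Any): Boolean = other match {
    case that: Key => that.s == s
    case _         => false
  }

  // The substitution that reportedly behaved differently in the Spark Shell:
  // override def equals(other: Any): Boolean =
  //   other.isInstanceOf[Key] && other.asInstanceOf[Key].s == s
}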