Hi salexln,

An RDD's immutability depends on its underlying data structure. Consider the following example:
------------------------------------------------------------------------------------------------------------------
scala> val m = Array.fill(2, 2)(0)
m: Array[Array[Int]] = Array(Array(0, 0), Array(0, 0))

scala> val rdd = sc.parallelize(m)
rdd: org.apache.spark.rdd.RDD[Array[Int]] = ParallelCollectionRDD[1] at parallelize at <console>:23

scala> rdd.collect()
res6: Array[Array[Int]] = Array(Array(0, 0), Array(0, 0))

scala> m(0)(1) = 2

scala> rdd.collect()
res8: Array[Array[Int]] = Array(Array(0, 2), Array(0, 0))
------------------------------------------------------------------------------------------------------------------

As you can see, the contents of rdd change when its underlying array changes: the RDD itself is immutable, but it holds references to mutable Array objects, so mutating those arrays through the original variable is visible when the RDD is collected.

Hope this helps.

Best,
Ai

On Mon, Dec 28, 2015 at 12:36 PM, salexln <sale...@gmail.com> wrote:
> Hi guys,
> I know that RDDs are immutable and therefore their values cannot be changed,
> but I see the following behaviour:
> I wrote an implementation of the FuzzyCMeans algorithm and now I'm testing it,
> so I run the following example:
>
> import org.apache.spark.mllib.clustering.FuzzyCMeans
> import org.apache.spark.mllib.linalg.Vectors
>
> val data =
>     sc.textFile("/home/development/myPrjects/R/butterfly/butterfly.txt")
> val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
>> parsedData: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
>> = MapPartitionsRDD[2] at map at <console>:31
>
> val numClusters = 2
> val numIterations = 20
>
> parsedData.foreach{ point => println(point) }
>> [0.0,-8.0]
> [-3.0,-2.0]
> [-3.0,0.0]
> [-3.0,2.0]
> [-2.0,-1.0]
> [-2.0,0.0]
> [-2.0,1.0]
> [-1.0,0.0]
> [0.0,0.0]
> [1.0,0.0]
> [2.0,-1.0]
> [2.0,0.0]
> [2.0,1.0]
> [3.0,-2.0]
> [3.0,0.0]
> [3.0,2.0]
> [0.0,8.0]
>
> val clusters = FuzzyCMeans.train(parsedData, numClusters, numIterations)
> parsedData.foreach{ point => println(point) }
>> [0.0,-0.4803333185624595]
> [-0.1811743096972924,-0.12078287313152826]
> [-0.06638890786148487,0.0]
> [-0.04005925925925929,0.02670617283950619]
> [-0.12193263222069807,-0.060966316110349035]
> [-0.0512,0.0]
> [NaN,NaN]
> [-0.049382716049382706,0.0]
> [NaN,NaN]
> [0.006830134553650707,0.0]
> [0.05120000000000002,-0.02560000000000001]
> [0.04755220304297078,0.0]
> [0.06581619798335057,0.03290809899167529]
> [0.12010867103812725,-0.0800724473587515]
> [0.10946638900458144,0.0]
> [0.14814814814814817,0.09876543209876545]
> [0.0,0.49119985188436205]
>
> But how can it be that my method changes the immutable RDD?
>
> BTW, the signature of the train method is the following:
>
> train(data: RDD[Vector], clusters: Int, maxIterations: Int)
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Vector-Immutability-issue-tp15827.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
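P.S. If the goal is to freeze the values at the moment the RDD is created, one way to avoid this aliasing is to defensively copy the mutable rows before calling parallelize. A minimal sketch, assuming the same spark-shell session (where `sc` is already available):

```scala
// Defensive copy: clone each mutable row before handing it to Spark, so
// later writes to `m` are not visible through the RDD. (The aliasing shown
// in the transcript above is observable in local mode, where the driver's
// object references are reused rather than serialized to remote executors.)
val m = Array.fill(2, 2)(0)
val rdd = sc.parallelize(m.map(_.clone())) // each Array[Int] row is copied
m(0)(1) = 2
rdd.collect() // the cloned rows still contain only zeros
```

The same reasoning applies to the FuzzyCMeans question quoted above: an RDD[Vector] is only as immutable as the Vector objects it references, so if the train method writes into those vectors in place, the changes will show through parsedData.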