You are right, I was looking at the wrong logs. I ran it on my local machine and saw that the println actually wrote the vertexIds. I was then able to find the same output in the executors' logs on the remote machine.
Thanks for the clarification.

On Mon, Feb 23, 2015 at 2:00 PM, Sean Owen <so...@cloudera.com> wrote:
> Here, println isn't happening on the driver. Are you sure you are
> looking at the right machine's logs?
>
> Yes, this may be parallelized over many machines.
>
> On Mon, Feb 23, 2015 at 6:37 PM, kvvt <kvi...@vt.edu> wrote:
> > In the snippet below,
> >
> > graph.edges.groupBy[VertexId](f1).foreach {
> >   edgesBySrc => {
> >     f2(edgesBySrc).foreach {
> >       vertexId => {
> >         println(vertexId)
> >       }
> >     }
> >   }
> > }
> >
> > "f1" is a function that determines how to group the edges (in my case
> > it groups by source vertex).
> > "f2" is another function that does some computation on the edges. It
> > returns an iterable (Iterable[VertexId]).
> >
> > Questions:
> >
> > 1. The problem is that "println(vertexId)" doesn't print anything. I
> > have made sure that "f2" doesn't return an empty iterable. I am not
> > sure what I am missing here.
> >
> > 2. I am assuming that "f2" is called for each group in parallel. Is
> > this correct? If not, what is the correct way to operate on each group
> > in parallel?
> >
> > Thanks!
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/RDD-groupBy-tp21773.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
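For anyone hitting the same issue: since foreach runs on the executors, its println output lands in the executor stdout logs, not the driver console. If the goal is to see the values on the driver, one option is to collect the results first. A minimal sketch, assuming the same graph, f1, and f2 as in the thread (this requires a running Spark context and small enough results to fit on the driver):

    // Sketch only -- foreach/println execute on the executors, so to
    // print on the driver we gather the results there with collect().
    val vertexIds = graph.edges
      .groupBy[VertexId](f1)                   // group edges, e.g. by source vertex
      .flatMap(edgesBySrc => f2(edgesBySrc))   // f2 runs on each group, in parallel across partitions
      .collect()                               // bring the resulting vertex IDs to the driver

    vertexIds.foreach(println)                 // now printed in the driver's stdout

Note that collect() pulls the entire result set into driver memory, so it is only appropriate for small outputs; for large results, saving with saveAsTextFile or inspecting the executor logs (as done above) is the usual approach.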