You are right, I was looking at the wrong logs. I ran it on my local machine and saw that the println actually wrote the vertexIds. I was then able to find the same output in the executors' logs on the remote machine.
Thanks for the clarification.

On Mon, Feb 23, 2015 at 2:00 PM, Sean Owen <so...@cloudera.com> wrote:
> Here, println isn't happening on the driver. Are you sure you are
> looking at the right machine's logs?
>
> Yes, this may be parallelized over many machines.
>
> On Mon, Feb 23, 2015 at 6:37 PM, kvvt <kvi...@vt.edu> wrote:
> > In the snippet below,
> >
> > graph.edges.groupBy[VertexId](f1).foreach {
> >   edgesBySrc => {
> >     f2(edgesBySrc).foreach {
> >       vertexId => {
> >         println(vertexId)
> >       }
> >     }
> >   }
> > }
> >
> > "f1" is a function that determines how to group the edges (in my case
> > it groups by source vertex).
> > "f2" is another function that does some computation on the edges. It
> > returns an iterable (Iterable[VertexId]).
> >
> > Questions:
> >
> > 1. The problem is that "println(vertexId)" doesn't print anything. I
> > have made sure that "f2" doesn't return an empty iterable. I am not
> > sure what I am missing here.
> >
> > 2. I am assuming that "f2" is called for each group in parallel. Is
> > this correct? If not, what is the correct way to operate on each group
> > in parallel?
> >
> > Thanks!
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/RDD-groupBy-tp21773.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
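For anyone hitting the same issue: since foreach runs on the executors, its println output lands in the executor stdout logs, not the driver console. If the goal is to see the values on the driver, one option is to collect the results first. A minimal sketch, assuming the same graph, f1, and f2 as in the thread (this requires a running Spark context and small enough results to fit on the driver):

    // Sketch only -- foreach/println execute on the executors, so to
    // print on the driver we gather the results there with collect().
    val vertexIds = graph.edges
      .groupBy[VertexId](f1)                   // group edges, e.g. by source vertex
      .flatMap(edgesBySrc => f2(edgesBySrc))   // f2 runs on each group, in parallel across partitions
      .collect()                               // bring the resulting vertex IDs to the driver

    vertexIds.foreach(println)                 // now printed in the driver's stdout

Note that collect() pulls the entire result set into driver memory, so it is only appropriate for small outputs; for large results, saving with saveAsTextFile or inspecting the executor logs (as done above) is the usual approach.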