[Thanks a *lot* for your answers!]

That's CoOl, a possible example would be to simply write a
for-comprehension that would do this:
>
> val allEvents = for {
>   deviceId      <- rddFromHdfsOfDeviceId
>   deviceEvent <- rddFromHdfsOfDeviceEvent(deviceId)
> } deviceEvent
> val hist = computeHistOf(allEvents)


It's just an example huh ( :-D )

Also here is the answer I was preparing to your previous answer

yes, it's what does
> And it makes a lot of sense because RDD is """"just"""" an abstraction of
> the whole sequence of data and distribute it in a resilient fashion.
> So RDD#flatMap would mean: distribute in a resilient fashion a sequence of
> RDDs before aggregate (or consolidate or what not verb ^^) them all.
> A example to explain myself maybe ;-) (and maybe also to confirm my own
> understanding)
> * data (logical representation)
>  > [A1, A2, A3, A4, A5, A6, A7, A8, A9, A10]
> * rdd-ized
> * > rdd1*[A1, A2, A3] *rdd2*[A4, A5] *rdd3*[A6, A7, A8, A9, A10]
> * map with f:A=>B
> * > rdd1'*[B1, B2, B3] *rdd2'*[B4, B5] *rdd3'*[B6, B7, B8, B9, B10]
> * current flatMap with f: A => Traversable[B]  (*not redistributed I
> suppose*)
>  > (intermediate step) *rdd1'*[[B11, B12], [B21] , []] *rdd2'*[[],
> [B41,B42,B43]] *rdd3'*[[B61], [B71, B72], [B81, B82, B83, B84], [], []]
>  > (final result)           *rdd1"*[B11, B12, B21] *rdd2**"*[B41,B42,B43]
> *rdd3**"*[B61, B71, B72, B81, B82, B83, B84]
> * for expression compliant version of flatMap with f: A => RDD[B]  (*could
> involve some redistributions I presume*)
>  > (intermediate step) *rdd1'*[*rdd11*[B11] *rdd11*[B12] *rdd12*[B21]]
> *rdd2'*[*rdd21*[B41] *rdd22*[B42, B43]] *rdd3'*[*rdd31*[B61] *rdd32*[B71,
> B72], *rdd33*[B81] *rdd34*[B82, B83] *rdd35*[B84]]
>  > (final result)           *rdd1**"*[B11, B12, B21] *rdd2**"*
> [B41,B42,B43] *rdd3**"*[B61, B71, B72, B81, B82, B83, B84]
>
> As you can see in the example I tried to depict here, the final result
> that would be seen is the same but in between there might have some
> redistribution (have a quick look at B81, B82, B83, B84).
> That's why, imho, it'd interesting to clear this out by either renaming
> the function -- or even awesomely better making flatMap able to
> redistribute in-between (which can have a big impact at the reconciliation
> => my first mail :-D).
> Tell me if I'm completely wrong ;-) -- or if I forget something in my
> actual understanding -- or if there is something similar already existing
> and I'm not aware of
> Cheers,
> andy
>
>
>

Andy Petrella
Belgium (Liège)

*       *********
 Data Engineer in *NextLab <http://nextlab.be/> sprl* (owner)
 Engaged Citizen Coder for *WAJUG <http://wajug.be/>* (co-founder)
 Author of *Learning Play! Framework 2
<http://www.packtpub.com/learning-play-framework-2/book>*
 Bio: on visify <https://www.vizify.com/es/52c3feec2163aa0010001eaa>
*       *********
Mobile: *+32 495 99 11 04*
Mails:

   - andy.petre...@nextlab.be
   - andy.petre...@gmail.com

*       *********
Socials:

   - Twitter: https://twitter.com/#!/noootsab
   - LinkedIn: http://be.linkedin.com/in/andypetrella
   - Blogger: http://ska-la.blogspot.com/
   - GitHub:  https://github.com/andypetrella
   - Masterbranch: https://masterbranch.com/andy.petrella



On Sat, Mar 15, 2014 at 7:06 PM, Koert Kuipers <ko...@tresata.com> wrote:

> just going head first without any thinking, it changed flatMap to
> flatMapData and added a flatMap. for FlatMappedRDD my compute is:
>
> firstParent[T].iterator(split, context).flatMap(f andThen (_.compute(split,
> context)))
>
>
> scala> val x = sc.parallelize(1 to 100)
> scala> x.flatMap _
> res0: (Int => org.apache.spark.rdd.RDD[Nothing]) =>
> org.apache.spark.rdd.RDD[Nothing] = <function1>
>
> my f for flatMap is now f: T => RDD[U], however, i am not sure how to write
> a useful function for this :)
>
>
>
> On Sat, Mar 15, 2014 at 1:17 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
> > MappedRDD does:
> > firstParent[T].iterator(split, context).map(f)
> >
> > and FlatMappedRDD:
> > firstParent[T].iterator(split, context).flatMap(f)
> >
> > do yeah seems like its a map or flatMap over the iterator inside, not the
> > RDD itself, sort of...
> >
> >
> > On Sat, Mar 15, 2014 at 9:08 AM, andy petrella <andy.petre...@gmail.com
> >wrote:
> >
> >> Yep,
> >> Regarding flatMap and an implicit parameter might work like in scala's
> >> future for instance:
> >>
> >>
> https://github.com/scala/scala/blob/master/src/library/scala/concurrent/Future.scala#L246
> >>
> >> Dunno, still waiting for some insights from the team ^^
> >>
> >> andy
> >>
> >> On Wed, Mar 12, 2014 at 3:23 PM, Pascal Voitot Dev <
> >> pascal.voitot....@gmail.com> wrote:
> >>
> >> > On Wed, Mar 12, 2014 at 3:06 PM, andy petrella <
> andy.petre...@gmail.com
> >> > >wrote:
> >> >
> >> > > Folks,
> >> > >
> >> > > I want just to pint something out...
> >> > > I didn't had time yet to sort it out and to think enough to give
> >> valuable
> >> > > strict explanation of -- event though, intuitively I feel they are a
> >> lot
> >> > > ===> need spark people or time to move forward.
> >> > > But here is the thing regarding *flatMap*.
> >> > >
> >> > > Actually, it looks like (and again intuitively makes sense) that RDD
> >> (and
> >> > > of course DStream) aren't monadic and it is reflected in the
> >> > implementation
> >> > > (and signature) of flatMap.
> >> > >
> >> > > >
> >> > > > *  def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] =
> **
> >> > > > new FlatMappedRDD(this, sc.clean(f))*
> >> > >
> >> > >
> >> > > There!? flatMap (or bind, >>=) should take a function that use the
> >> same
> >> > > Higher level abstraction in order to be considered as such right?
> >> > >
> >> > >
> >> > I had remarked exactly the same thing and asked myself the same
> >> question...
> >> >
> >> > In this case, it takes a function that returns a TraversableOnce which
> >> is
> >> > > the type of the content of the RDD, and what represent the output is
> >> more
> >> > > the content of the RDD than the RDD itself (still right?).
> >> > >
> >> > > This actually breaks the understand of map and flatMap
> >> > >
> >> > > > *def map[U: ClassTag](f: T => U): RDD[U] = new MappedRDD(this,
> >> > > > sc.clean(f))*
> >> > >
> >> > >
> >> > > Indeed, RDD is a functor and the underlying reason for flatMap to
> not
> >> > take
> >> > > A => RDD[B] doesn't show up in map.
> >> > >
> >> > > This has a lot of consequence actually, because at first one might
> >> want
> >> > to
> >> > > create for-comprehension over RDDs, of even Traversable[F[_]]
> >> functions
> >> > > like sequence -- and he will get stuck since the signature aren't
> >> > > compliant.
> >> > > More importantly, Scala uses convention on the structure of a type
> to
> >> > allow
> >> > > for-comp... so where Traversable[F[_]] will fail on type, for-comp
> >> will
> >> > > failed weirdly.
> >> > >
> >> >
> >> > +1
> >> >
> >> >
> >> > >
> >> > > Again this signature sounds normal, because my intuitive feeling
> about
> >> > RDDs
> >> > > is that they *only can* be monadic but the composition would depend
> on
> >> > the
> >> > > use case and might have heavy consequences (unioning the RDDs for
> >> > instance
> >> > > => this happening behind the sea can be a big pain, since it
> wouldn't
> >> be
> >> > > efficient at all).
> >> > >
> >> > > So Yes, RDD could be monadic but with care.
> >> > >
> >> >
> >> > At least we can say, it is a Functor...
> >> > Actually, I had imagined studying the monadic aspect of RDDs but as
> you
> >> > said, it's not so easy...
> >> > So for now, I consider them as pseudo-monadic ;)
> >> >
> >> >
> >> >
> >> > > So what exposes this signature is a way to flatMap over the inner
> >> value,
> >> > > like it is almost the case for Map (flatMapValues)
> >> > >
> >> > > So, wouldn't be better to rename flatMap as flatMapData (or whatever
> >> > better
> >> > > name)? Or to have flatMap requiring a Monad instance of RDD?
> >> > >
> >> > >
> >> > renaming is to flatMapData or flatTraversableMap sounds good to me
> >> (even if
> >> > lots of people will hate it...)
> >> > flatMap requiring a Monad would make it impossible to use with
> >> > for-comprehension certainly no?
> >> >
> >> >
> >> > > Sorry for the prose, just dropped my thoughts and feelings at once
> :-/
> >> > >
> >> > >
> >> > I agree with you in case it can help not to feel alone ;)
> >> >
> >> > Pascal
> >> >
> >> > Cheers,
> >> > > andy
> >> > >
> >> > > PS: and my English maybe, although my name's Andy I'm a native
> Belgian
> >> > ^^.
> >> > >
> >> >
> >>
> >
> >
>

Reply via email to