On Wed, Mar 12, 2014 at 3:06 PM, andy petrella <andy.petre...@gmail.com> wrote:

> Folks,
>
> I just want to point something out...
> I haven't had time yet to sort it out, or to think it through enough to give
> a valuable, rigorous explanation, even though intuitively I feel there is a
> lot to it ===> I need Spark people or time to move forward.
> But here is the thing regarding *flatMap*.
>
> Actually, it looks like (and again intuitively makes sense) that RDD (and
> of course DStream) aren't monadic and it is reflected in the implementation
> (and signature) of flatMap.
>
> >
> >   def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] =
> >     new FlatMappedRDD(this, sc.clean(f))
>
>
> There!? flatMap (or bind, >>=) should take a function that returns the same
> higher-level abstraction in order to be considered as such, right?
>
>
I had remarked exactly the same thing and asked myself the same question...

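To make the difference between the two signatures concrete, here is a minimal sketch. It does not use Spark: `ToyRDD` is a hypothetical stand-in backed by a local Vector, `bind` is a made-up name for what a monadic flatMap would look like, and `Iterable` is used in place of `TraversableOnce` only to keep the sketch Scala-version neutral.

```scala
// Toy stand-in for RDD, backed by a local Vector (hypothetical, not Spark code).
final case class ToyRDD[T](items: Vector[T]) {
  // Functor-style map, same shape as RDD.map.
  def map[U](f: T => U): ToyRDD[U] = ToyRDD(items.map(f))

  // Same shape as RDD.flatMap: f returns a plain collection, not a ToyRDD.
  // Assumption: Iterable replaces TraversableOnce for this sketch only.
  def flatMap[U](f: T => Iterable[U]): ToyRDD[U] = ToyRDD(items.flatMap(f))

  // What a monadic bind would look like: f returns the same outer type.
  // Locally this is trivial; on a cluster it would mean combining many RDDs.
  def bind[U](f: T => ToyRDD[U]): ToyRDD[U] =
    ToyRDD(items.flatMap(t => f(t).items))
}

object SignatureDemo {
  val words: ToyRDD[String] = ToyRDD(Vector("ab", "cd"))

  // Type-checks with flatMap: the function returns a collection of elements.
  val viaFlatMap: ToyRDD[Char] = words.flatMap(w => w.toList)

  // Type-checks only with bind: the function returns a ToyRDD.
  val viaBind: ToyRDD[Char] = words.bind(w => ToyRDD(w.toVector))
}
```

In this toy, `bind` is cheap because everything lives in one Vector; the concern raised in the thread is that the distributed equivalent would not be.
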
> In this case, it takes a function that returns a TraversableOnce, which is
> the type of the content of the RDD, and what represents the output is more
> the content of the RDD than the RDD itself (still right?).
>
> This actually breaks the understanding of map and flatMap
>
> >   def map[U: ClassTag](f: T => U): RDD[U] =
> >     new MappedRDD(this, sc.clean(f))
>
>
> Indeed, RDD is a functor and the underlying reason for flatMap to not take
> A => RDD[B] doesn't show up in map.
>
> This actually has a lot of consequences, because at first one might want to
> create for-comprehensions over RDDs, or even Traversable[F[_]] functions
> like sequence -- and will get stuck since the signatures aren't
> compliant.
> More importantly, Scala uses conventions on the structure of a type to allow
> for-comp... so where Traversable[F[_]] will fail on types, for-comp will
> fail weirdly.
>

+1

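As a rough illustration of the for-comprehension point, the sketch below uses a hypothetical `MiniRDD` (a local Vector wrapper, not Spark) whose flatMap has the same shape as RDD.flatMap, with `Iterable` standing in for `TraversableOnce`. It assumes only the standard desugaring of for-comprehensions into flatMap and map.

```scala
// Hypothetical local stand-in for RDD; only the method shapes matter here.
final case class MiniRDD[T](items: Vector[T]) {
  def map[U](f: T => U): MiniRDD[U] = MiniRDD(items.map(f))
  // Same shape as RDD.flatMap: the function returns a collection.
  def flatMap[U](f: T => Iterable[U]): MiniRDD[U] = MiniRDD(items.flatMap(f))
}

object ForCompDemo {
  val xs: MiniRDD[Int] = MiniRDD(Vector(1, 2))
  val ys: MiniRDD[Int] = MiniRDD(Vector(10, 20))

  // for { x <- xs; y <- ys } yield x + y
  // desugars to xs.flatMap(x => ys.map(y => x + y)). The inner expression is
  // a MiniRDD[Int], not an Iterable[Int], so this form does not type-check
  // against the flatMap signature above.

  // With a plain collection as the inner generator, the same shape compiles:
  // it desugars to xs.flatMap(x => List(10, 20).map(y => x + y)).
  val sums: MiniRDD[Int] = for { x <- xs; y <- List(10, 20) } yield x + y
}
```

So nesting works only when the inner generator is an ordinary collection, which matches the "flatMap over the inner value" reading of the signature.
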

>
> Again this signature sounds normal, because my intuitive feeling about RDDs
> is that they *can* be monadic, but the composition would depend on the
> use case and might have heavy consequences (unioning the RDDs for instance
> => this happening behind the scenes can be a big pain, since it wouldn't be
> efficient at all).
>
> So Yes, RDD could be monadic but with care.
>

At least we can say, it is a Functor...
Actually, I had imagined studying the monadic aspect of RDDs but as you
said, it's not so easy...
So for now, I consider them as pseudo-monadic ;)



> So what this signature exposes is a way to flatMap over the inner value,
> as is almost the case for Map (flatMapValues)
>
> So, wouldn't it be better to rename flatMap as flatMapData (or whatever
> better name)? Or to have flatMap require a Monad instance of RDD?
>
>
Renaming it to flatMapData or flatTraversableMap sounds good to me (even if
lots of people will hate it...).
Having flatMap require a Monad would certainly make it impossible to use with
for-comprehensions, no?


> Sorry for the prose, just dropped my thoughts and feelings at once :-/
>
>
I agree with you, in case it helps you feel less alone ;)

Pascal

> Cheers,
> andy
>
> PS: and sorry for my English maybe; although my name's Andy, I'm a native
> Belgian ^^.
>
