Just going in head first without much thinking, I changed flatMap to
flatMapData and added a new flatMap. For FlatMappedRDD my compute is:

firstParent[T].iterator(split, context).flatMap(f andThen (_.compute(split, context)))


scala> val x = sc.parallelize(1 to 100)
scala> x.flatMap _
res0: (Int => org.apache.spark.rdd.RDD[Nothing]) => org.apache.spark.rdd.RDD[Nothing] = <function1>

My f for flatMap is now f: T => RDD[U]; however, I am not sure how to write
a useful function for this :)
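
To make the two shapes concrete without a cluster, here is a minimal sketch on
plain Scala collections. MiniRDD, its collect, and the FlatMapSketch wrapper are
hypothetical stand-ins invented for illustration, not Spark classes, and Seq[U]
stands in for TraversableOnce[U]. It only shows how the signatures differ; it
says nothing about how a real RDD would evaluate a nested RDD inside a task,
which is probably why a useful f is hard to write.

```scala
object FlatMapSketch {
  // Hypothetical stand-in for an RDD: a list of partitions, each holding elements.
  final case class MiniRDD[T](partitions: Seq[Seq[T]]) {
    // Functor map: applied to every element inside every partition.
    def map[U](f: T => U): MiniRDD[U] =
      MiniRDD(partitions.map(_.map(f)))

    // Same shape as the existing flatMap: f returns plain data, not a MiniRDD.
    def flatMapData[U](f: T => Seq[U]): MiniRDD[U] =
      MiniRDD(partitions.map(_.flatMap(f)))

    // Monadic shape: f returns a MiniRDD, which is flattened back into the
    // partition of the element that produced it (an arbitrary choice here).
    def flatMap[U](f: T => MiniRDD[U]): MiniRDD[U] =
      MiniRDD(partitions.map(_.flatMap(t => f(t).collect)))

    // Gather all elements in partition order.
    def collect: Seq[T] = partitions.flatten
  }

  val x = MiniRDD(Seq(Seq(1, 2), Seq(3)))

  // f: Int => Seq[Int]
  val viaData = x.flatMapData(i => Seq(i, i * 10))

  // f: Int => MiniRDD[Int]
  val viaBind = x.flatMap(i => MiniRDD(Seq(Seq(i), Seq(i * 10))))

  // With map and the monadic flatMap in place, a for-comprehension type-checks.
  val viaFor = for {
    i <- x
    j <- MiniRDD(Seq(Seq(i, i * 10)))
  } yield j
}
```

In this toy model all three values collect to the same elements, so the only
difference is the type of f.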



On Sat, Mar 15, 2014 at 1:17 PM, Koert Kuipers <ko...@tresata.com> wrote:

> MappedRDD does:
> firstParent[T].iterator(split, context).map(f)
>
> and FlatMappedRDD:
> firstParent[T].iterator(split, context).flatMap(f)
>
> so yeah, seems like it's a map or flatMap over the iterator inside, not the
> RDD itself, sort of...
>
>
> On Sat, Mar 15, 2014 at 9:08 AM, andy petrella <andy.petre...@gmail.com> wrote:
>
>> Yep,
>> Regarding flatMap, an implicit parameter might work, like in Scala's
>> Future for instance:
>>
>> https://github.com/scala/scala/blob/master/src/library/scala/concurrent/Future.scala#L246
>>
>> Dunno, still waiting for some insights from the team ^^
>>
>> andy
>>
>> On Wed, Mar 12, 2014 at 3:23 PM, Pascal Voitot Dev <pascal.voitot....@gmail.com> wrote:
>>
>> > On Wed, Mar 12, 2014 at 3:06 PM, andy petrella <andy.petre...@gmail.com> wrote:
>> >
>> > > Folks,
>> > >
>> > > I just want to point something out...
>> > > I haven't had time yet to sort it out and think it through enough to
>> > > give a valuable, rigorous explanation of it -- even though, intuitively,
>> > > I feel there is a lot to say ===> I need Spark people or time to move
>> > > forward.
>> > > But here is the thing regarding *flatMap*.
>> > >
>> > > Actually, it looks like (and again, intuitively it makes sense) RDD
>> > > (and of course DStream) aren't monadic, and this is reflected in the
>> > > implementation (and signature) of flatMap.
>> > >
>> > > >
>> > > > def flatMap[U: ClassTag](f: T => TraversableOnce[U]): RDD[U] =
>> > > >   new FlatMappedRDD(this, sc.clean(f))
>> > >
>> > >
>> > > There!? flatMap (or bind, >>=) should take a function that returns the
>> > > same higher-level abstraction in order to be considered as such, right?
>> > >
>> > >
>> > I had noticed exactly the same thing and asked myself the same
>> > question...
>> >
>> > > In this case, it takes a function that returns a TraversableOnce, which
>> > > is the type of the content of the RDD, and what the output represents
>> > > is more the content of the RDD than the RDD itself (still right?).
>> > >
>> > > This actually breaks the understanding of map and flatMap
>> > >
>> > > > def map[U: ClassTag](f: T => U): RDD[U] =
>> > > >   new MappedRDD(this, sc.clean(f))
>> > >
>> > >
>> > > Indeed, RDD is a functor, and the underlying reason for flatMap not to
>> > > take A => RDD[B] doesn't show up in map.
>> > >
>> > > This has a lot of consequences actually, because at first one might
>> > > want to create for-comprehensions over RDDs, or even Traversable[F[_]]
>> > > functions like sequence -- and one will get stuck since the signatures
>> > > aren't compliant.
>> > > More importantly, Scala uses a convention on the structure of a type to
>> > > allow for-comprehensions... so where Traversable[F[_]] will fail on
>> > > types, a for-comprehension will fail weirdly.
>> > >
>> >
>> > +1
>> >
>> >
>> > >
>> > > Again, this signature sounds normal, because my intuitive feeling about
>> > > RDDs is that they *can only* be monadic if the composition depends on
>> > > the use case, and that might have heavy consequences (unioning the RDDs
>> > > for instance => this happening behind the scenes can be a big pain,
>> > > since it wouldn't be efficient at all).
>> > >
>> > > So Yes, RDD could be monadic but with care.
>> > >
>> >
>> > At least we can say, it is a Functor...
>> > Actually, I had imagined studying the monadic aspect of RDDs but as you
>> > said, it's not so easy...
>> > So for now, I consider them as pseudo-monadic ;)
>> >
>> >
>> >
>> > > So what this signature exposes is a way to flatMap over the inner
>> > > value, almost as is the case for Map (flatMapValues).
>> > >
>> > > So, wouldn't it be better to rename flatMap to flatMapData (or whatever
>> > > better name)? Or to have flatMap require a Monad instance of RDD?
>> > >
>> > >
>> > Renaming it to flatMapData or flatTraversableMap sounds good to me
>> > (even if lots of people will hate it...).
>> > flatMap requiring a Monad would certainly make it impossible to use with
>> > for-comprehensions, no?
>> >
>> >
>> > > Sorry for the prose, just dropped my thoughts and feelings at once :-/
>> > >
>> > >
>> > I agree with you, in case it helps you not to feel alone ;)
>> >
>> > Pascal
>> >
>> > > Cheers,
>> > > andy
>> > >
>> > > PS: and my English maybe; although my name's Andy, I'm a native
>> > > Belgian ^^.
>> > >
>> >
>>
>
>
