Hi,
  Thanks for the response.
In the second case, f2, foo will have to be declared globally, right?
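Something like this, just to check I've got it (foo = 42 is a made-up value):

foo = 42  # defined at module level, so f2 can see it

def f2(index, iterator):
  yield (index, foo)  # foo is looked up in the global scope when f2 runs

rdd.mapPartitionsWithIndex(f2)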

My function is something like:
def indexing(splitIndex, iterator):
  # offset_lists is referenced here but isn't defined in this scope
  offset = sum(offset_lists[:splitIndex]) if splitIndex else 0
  indexed = []
  for i, e in enumerate(iterator):
    index = offset + i
    for j, ele in enumerate(e):
      indexed.append((index, j, ele))
  yield indexed

def another_funct(offset_lists):
  # get that offset_lists and use it to index the rdd
  return rdd.mapPartitionsWithSplit(indexing)
But then the issue is that offset_lists isn't in scope inside indexing when
mapPartitionsWithSplit calls it. Any suggestions?
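For instance, would adapting your first suggestion work, i.e. give indexing a
third parameter and wrap it in a lambda? A rough sketch (same names as above):

def indexing(splitIndex, iterator, offset_lists):
  offset = sum(offset_lists[:splitIndex]) if splitIndex else 0
  indexed = []
  for i, e in enumerate(iterator):
    for j, ele in enumerate(e):
      indexed.append((offset + i, j, ele))
  yield indexed

def another_funct(offset_lists):
  # the lambda closes over offset_lists, so nothing needs to be global
  return rdd.mapPartitionsWithSplit(
      lambda splitIndex, it: indexing(splitIndex, it, offset_lists))

Or is it cleaner to define indexing inside another_funct so it closes over
offset_lists directly, per your second example?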


On Sun, Aug 17, 2014 at 11:15 AM, Davies Liu <dav...@databricks.com> wrote:

> The callback function f only accepts 2 arguments; if you want to pass
> other objects to it, you need a closure, such as:
>
> foo=xxx
> def f(index, iterator, foo):
>      yield (index, foo)
> rdd.mapPartitionsWithIndex(lambda index, it: f(index, it, foo))
>
> You can also make f a closure:
>
> def f2(index, iterator):
>     yield (index, foo)
> rdd.mapPartitionsWithIndex(f2)
>
> On Sun, Aug 17, 2014 at 10:25 AM, Chengi Liu <chengi.liu...@gmail.com>
> wrote:
> > Hi,
> >   In this example:
> >
> http://www.cs.berkeley.edu/~pwendell/strataconf/api/pyspark/pyspark.rdd.RDD-class.html#mapPartitionsWithSplit
> > Say f takes three arguments:
> > def f(splitIndex, iterator, foo): yield splitIndex
> > Now, how do I pass this foo parameter to this method?
> > rdd.mapPartitionsWithSplit(f)
> > Thanks
> >
>
