Has anyone tried using functools.partial (
https://docs.python.org/2/library/functools.html#functools.partial) with
PySpark? If it works, it might be a nice way to address this use-case.
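For reference, a rough sketch (not from the thread) of how functools.partial
could be wired into mapPartitionsWithIndex. The indexing function, the sample
data, and the way offset_lists is computed are illustrative assumptions, and
this relies on the partial object serializing cleanly to the executors:

from functools import partial
from pyspark import SparkContext

def indexing(splitIndex, iterator, offset_lists):
    # start of this partition's slice of the global index range
    offset = sum(offset_lists[:splitIndex]) if splitIndex else 0
    for i, e in enumerate(iterator):
        yield (offset + i, e)

sc = SparkContext(appName="partial-example")
rdd = sc.parallelize(["a", "b", "c", "d", "e"], 2)
offset_lists = rdd.glom().map(len).collect()  # element count per partition

# partial binds offset_lists up front, leaving (splitIndex, iterator) free,
# which is the signature mapPartitionsWithIndex expects
indexed = rdd.mapPartitionsWithIndex(partial(indexing, offset_lists=offset_lists))
print(indexed.collect())  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]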
On Sun, Aug 17, 2014 at 7:35 PM, Davies Liu wrote:
> On Sun, Aug 17, 2014 at 11:21 AM, Chengi Liu wrote:
> Hi,
> Thanks for the response.
> In the second case (f2), foo will have to be declared globally, right?
>
> My function is something like:
>
> def indexing(splitIndex, iterator):
>     count = 0
>     offset = sum(offset_lists[:splitIndex]) if splitIndex else 0
Building on what Davies Liu said, how about something like:

def indexing(splitIndex, iterator, offset_lists):
    count = 0
    offset = sum(offset_lists[:splitIndex]) if splitIndex else 0
    indexed = []
    for i, e in enumerate(iterator):
        index = count + offset + i
        for j, ele in enumerate(e
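Continuing that idea, a sketch (not from the original message) of how the extra
argument could be passed at call time, assuming rdd and offset_lists already
exist on the driver and that indexing returns or yields an iterator:

indexed_rdd = rdd.mapPartitionsWithIndex(
    lambda splitIndex, iterator: indexing(splitIndex, iterator, offset_lists))

Because offset_lists is captured by the lambda, it is shipped to the executors
together with the function, so it does not have to be a global.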
Hi,
Thanks for the response.
In the second case (f2), foo will have to be declared globally, right?

My function is something like:

def indexing(splitIndex, iterator):
    count = 0
    offset = sum(offset_lists[:splitIndex]) if splitIndex else 0
    indexed = []
    for i, e in enumerate(iterator):
The callback function f only accepts 2 arguments; if you want to pass
other objects to it, you need a closure, such as:

foo = xxx

def f(index, iterator, foo):
    yield (index, foo)

rdd.mapPartitionsWithIndex(lambda index, it: f(index, it, foo))

You can also make f a `closure`:

def f2(index,
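The closure variant is cut off above; a minimal sketch of what it might look
like, assuming f2 simply captures foo from its enclosing scope (the value of
foo is illustrative):

foo = "some extra state"  # illustrative value

def f2(index, iterator):
    # foo is captured from the enclosing scope, so f2 keeps the
    # two-argument (index, iterator) signature the API expects
    yield (index, foo)

rdd.mapPartitionsWithIndex(f2)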
Hi,
In this example:
http://www.cs.berkeley.edu/~pwendell/strataconf/api/pyspark/pyspark.rdd.RDD-class.html#mapPartitionsWithSplit
Let's say f takes three arguments:

def f(splitIndex, iterator, foo): yield splitIndex

Now, how do I send this foo parameter to this method?
rdd.mapPartitionsWithSplit(