Re: Making BatchPythonEvaluation actually Batch

2016-03-31 Thread Davies Liu
@Justin, it's fixed by https://github.com/apache/spark/pull/12057

On Thu, Feb 11, 2016 at 11:26 AM, Davies Liu wrote:
> Had a quick look in your commit, I think that make sense, could you
> send a PR for that, then we can review it.
>
> In order to support 2), we need to change the serialized Pyt

Re: Making BatchPythonEvaluation actually Batch

2016-02-11 Thread Davies Liu
Had a quick look at your commit, and I think it makes sense. Could you send a PR for it so we can review it? In order to support 2), we need to change the serialized Python function from `f(iter)` to `f(x)`, so that it processes one row at a time (not a partition); then we can easily combine them together:
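The code that followed in the original message is cut off in this archive view. As a rough illustration only (not the code from the message or from Spark itself), here is a minimal sketch of why per-row functions of the form `f(x)` are easy to combine: several of them can be applied to the same row in a single pass over the partition. The names `combine_row_functions`, `evaluate_partition`, `add_one`, and `double` are hypothetical and chosen just for this example.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

Row = Tuple[Any, ...]


def combine_row_functions(*funcs: Callable[[Row], Any]) -> Callable[[Row], Tuple[Any, ...]]:
    """Return one per-row function that applies every given function to the same row."""
    def combined(row: Row) -> Tuple[Any, ...]:
        return tuple(f(row) for f in funcs)
    return combined


def evaluate_partition(rows: Iterable[Row], func: Callable[[Row], Any]) -> Iterator[Any]:
    """Walk the partition once, calling the (possibly combined) function on each row."""
    for row in rows:
        yield func(row)


# Two hypothetical UDFs written in the per-row f(x) style.
def add_one(row: Row) -> int:
    return row[0] + 1


def double(row: Row) -> int:
    return row[0] * 2


combined = combine_row_functions(add_one, double)
results = list(evaluate_partition(iter([(1,), (2,), (3,)]), combined))
print(results)  # [(2, 2), (3, 4), (4, 6)]
```

With the `f(iter)` style, each function would consume the partition iterator itself, so evaluating two of them over the same rows would require duplicating or buffering the iterator; with `f(x)`, the combination is just a tuple of calls per row.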

Making BatchPythonEvaluation actually Batch

2016-01-31 Thread Justin Uang
Hey guys, BLUF: sorry for the length of this email. I'm trying to figure out how to batch Python UDF executions, and since this is my first time messing with Catalyst, I would like any feedback. My team is starting to use PySpark UDFs quite heavily, and performance is a huge blocker. The extra roundtrip