What about exposing transforms that make it easy to coerce data to what the
method needs? Instead of passing a DataFrame, you’d pass df.toSet to isin.
Assuming toSet returns a local list, wouldn’t that have the problem of not
being able to handle extremely large lists? In contrast, I believe SQL’s
Perhaps we can just have a function that turns a DataFrame into a Column?
That'd work for both the correlated and uncorrelated cases, although in the
correlated case we'd need to turn off eager analysis (otherwise there is no
way to construct a valid DataFrame).
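
For reference, here is a minimal sketch of how the uncorrelated case can be written with the existing API, assuming Spark 2.x in local mode. It contrasts collecting the values to the driver and passing them to isin (the approach that runs into trouble with very large lists) with a left semi join, which stays distributed. The table and column names (orders, vip, customer_id) are made up for illustration, and this is not the DataFrame-to-Column function proposed above.

```scala
import org.apache.spark.sql.SparkSession

object UncorrelatedInSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("uncorrelated-in-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical example data.
    val orders = Seq((1, "a"), (2, "b"), (3, "c")).toDF("customer_id", "item")
    val vip = Seq(1, 3).toDF("customer_id")

    // Option 1: pull the values to the driver and pass them to isin.
    // Every value has to fit in driver memory and is embedded in the plan.
    val ids = vip.as[Int].collect()
    val viaIsin = orders.filter($"customer_id".isin(ids: _*))

    // Option 2: a left semi join keeps the lookup side distributed and
    // returns only the columns of the left side, like an IN subquery.
    val viaSemiJoin = orders.join(vip, Seq("customer_id"), "left_semi")

    viaIsin.show()
    viaSemiJoin.show()

    spark.stop()
  }
}
```

Both variants select the same rows here. The semi join does not cover the correlated case, which is where a DataFrame-to-Column function and deferred analysis would come in.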
On Thu, Apr 19, 2018 at 4:08 PM, Ryan B
Nick, thanks for raising this.
It looks useful to have something in the DF API that behaves like
sub-queries, but I’m not sure that passing a DF works. Making every method
accept a DF that may contain matching data seems like it puts a lot of work
on the API — which now has to accept a DF all over