That all sounds reasonable, but in case 4, and maybe also case 3, I would
rather see the method implemented so that it raises an error whose message
explains what's going on and suggests the explicit operation that does the
closest equivalent thing. And perhaps emit a warning (using the warnings
module) for things that might be unintuitively expensive.
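
A minimal sketch of that idea, assuming a hypothetical `ToyFrame` class that
stands in for a distributed DataFrame (it is not the PySpark class, and the
method choices and messages are illustrative only). Iteration, which would
silently collect everything, raises an error naming the explicit alternative;
`len()`, which forces evaluation, works but emits a warning:

```python
import warnings


class ToyFrame:
    """Hypothetical stand-in for a distributed DataFrame (not PySpark)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def collect(self):
        # The explicit operation: the caller knowingly pulls all rows back.
        return list(self._rows)

    def __iter__(self):
        # Case 4 (and maybe 3): refuse, explain why, and point at the
        # explicit operation instead of collecting silently.
        raise TypeError(
            "Iterating over a ToyFrame would collect every row to the "
            "driver. Call collect() explicitly if that is what you want."
        )

    def __len__(self):
        # Unintuitively expensive: allow it, but warn via the warnings module.
        warnings.warn(
            "len() on a ToyFrame forces a full evaluation.",
            UserWarning,
            stacklevel=2,
        )
        return len(self._rows)
```

With this, `list(frame)` fails fast with a message pointing at `collect()`,
while `len(frame)` still returns a result and the warning can be filtered or
escalated to an error through the standard `warnings` filters.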
On Fri, Oct 26, 2018 at 12:15 Holden Karau <hol...@pigscanfly.ca> wrote:

> Coming out of https://github.com/apache/spark/pull/21654 it was agreed
> the helper methods in question made sense but there was some desire for a
> plan as to which helper methods we should use.
>
> I'd like to propose a lightweight solution to start with for helper
> methods that match either Pandas or general Python collection helper
> methods:
> 1) If the helper method doesn't collect the DataFrame back or force
> evaluation to the driver then we should add it without discussion
> 2) If the method forces evaluation and this matches the most obvious way it
> would be implemented then we should add it with a note in the docstring
> 3) If the method does collect the DataFrame back to the driver and that is
> the most obvious way it would be implemented (e.g. calling list to get back a
> list would have to collect the DataFrame) then we should add it with a
> warning in the docstring
> 4) If the method collects the DataFrame but a reasonable Python developer
> wouldn't expect that behaviour, then not implementing the helper method
> would be better
>
> What do folks think?
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
-- 
Cheers,
Leif
