Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/3003#issuecomment-61183847
Thanks for updating this! I'd still like the error message returned to the
user (the one in the abort() call) to include the size of the too-big result
as well as the configured maximum size. Adding this information costs little
and is a great help to a user trying to understand why a job failed. It looks
like you're running from a Spark shell with the logging level set to info, but
users in other environments will only see the SparkException and not the log
message.
Also, it looks like you didn't address the comment about multiple
jobs/stages running at once. Right now, the maximum limit only applies to a
single stage. This seems like an issue because multiple concurrent stages or
jobs that all collect results can together exceed the limit. @mateiz do you
think this is a non-issue?
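To make the two points above concrete, here is a minimal sketch (plain Python, not Spark's actual code; the `ResultTracker` class and its names are hypothetical) showing an error message that includes both the actual and the configured size, and why a per-stage check alone does not bound the driver's total when several stages collect results concurrently:

```python
# Hypothetical sketch of result-size accounting. Assumptions: sizes are in
# bytes, and the configured maximum is meant to cap driver memory use.

class ResultTracker:
    """Tracks fetched task-result sizes against a configured maximum."""

    def __init__(self, max_result_size):
        self.max_result_size = max_result_size
        self.per_stage = {}   # stage_id -> bytes fetched so far for that stage
        self.total = 0        # bytes fetched across all stages

    def add_result(self, stage_id, size):
        self.per_stage[stage_id] = self.per_stage.get(stage_id, 0) + size
        self.total += size
        if self.per_stage[stage_id] > self.max_result_size:
            # Including both numbers makes the failure self-explanatory,
            # even when the user never sees the driver logs.
            raise RuntimeError(
                f"Total result size for stage {stage_id} "
                f"({self.per_stage[stage_id]} bytes) exceeds the configured "
                f"maximum ({self.max_result_size} bytes)")

    def over_global_limit(self):
        # A per-stage check alone never trips this condition.
        return self.total > self.max_result_size


tracker = ResultTracker(max_result_size=100)
tracker.add_result(stage_id=0, size=60)  # stage 0: under the per-stage limit
tracker.add_result(stage_id=1, size=60)  # stage 1: also under the limit
print(tracker.over_global_limit())       # yet 120 bytes are held vs. a 100-byte cap
```

Each stage individually stays under the cap, so no per-stage abort fires, but the driver is still holding more than the configured maximum.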