Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/3003#issuecomment-61183847
Thanks for updating this! I'd still like the error message returned to the
user (the one in the abort() call) to include the size of the too-big result
as well as the configured maximum size. Adding this information costs little
and is a great help to a user trying to understand why a job failed. It looks
like you're running from a Spark shell with the logging level set to info, but
users in other environments will only see the SparkException and not the log
message.
Also, it looks like you didn't address the comment about multiple
jobs/stages running at once. Right now, the maximum limit only applies to a
single stage. This seems like an issue because multiple concurrent stages or
jobs that all collect results can together exceed the limit. @mateiz do you
think this is a non-issue?
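To make the two points above concrete, here is a minimal sketch (plain Python, not Spark's actual code; the `ResultTracker` class and its names are hypothetical) showing an error message that includes both the actual and the configured size, and why a per-stage check alone does not bound the driver's total when several stages collect results concurrently:

```python
# Hypothetical sketch of result-size accounting. Assumptions: sizes are in
# bytes, and the configured maximum is meant to cap driver memory use.

class ResultTracker:
    """Tracks fetched task-result sizes against a configured maximum."""

    def __init__(self, max_result_size):
        self.max_result_size = max_result_size
        self.per_stage = {}   # stage_id -> bytes fetched so far for that stage
        self.total = 0        # bytes fetched across all stages

    def add_result(self, stage_id, size):
        self.per_stage[stage_id] = self.per_stage.get(stage_id, 0) + size
        self.total += size
        if self.per_stage[stage_id] > self.max_result_size:
            # Including both numbers makes the failure self-explanatory,
            # even when the user never sees the driver logs.
            raise RuntimeError(
                f"Total result size for stage {stage_id} "
                f"({self.per_stage[stage_id]} bytes) exceeds the configured "
                f"maximum ({self.max_result_size} bytes)")

    def over_global_limit(self):
        # A per-stage check alone never trips this condition.
        return self.total > self.max_result_size


tracker = ResultTracker(max_result_size=100)
tracker.add_result(stage_id=0, size=60)  # stage 0: under the per-stage limit
tracker.add_result(stage_id=1, size=60)  # stage 1: also under the limit
print(tracker.over_global_limit())       # yet 120 bytes are held vs. a 100-byte cap
```

Each stage individually stays under the cap, so no per-stage abort fires, but the driver is still holding more than the configured maximum.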