16/05/19 15:51:39 WARN CoarseGrainedExecutorBackend: An unknown (ip-10-171-80-97.ec2.internal:44765) driver disconnected.
16/05/19 15:51:42 ERROR TransportClient: Failed to send RPC 5466711974642652953 to ip-10-171-80-97.ec2.internal/10.171.80.97:44765: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException

Can you check the log for ip-10-171-80-97.ec2.internal to see if there was some clue?

Cheers

On Thu, May 19, 2016 at 9:24 AM, Geet Kumar <[email protected]> wrote:

> Ah, it seems the code did not show up in the email. Here is a link to the
> original post:
> http://apache-spark-user-list.1001560.n3.nabble.com/Latency-experiment-without-losing-executors-td26981.html
>
> Also, attached is the executor logs.
> spark-logging.log
> <https://drive.google.com/a/hawk.iit.edu/file/d/0B6naIKwXOhAUVjNLSTlkTGhnM3c/view?usp=drive_web>
>
> Geet Kumar
> DataSys Laboratory, CS/IIT
> Linguistic Cognition Laboratory, CS/IIT
> Department of Computer Science, Illinois Institute of Technology (IIT)
> Email: [email protected]
>
> On Thu, May 19, 2016 at 3:23 AM, Ted Yu <[email protected]> wrote:
>
>> I didn't see the code snippet. Were you using picture(s)?
>>
>> Please pastebin the code.
>>
>> It would be better if you pastebin executor log for the killed executor.
>>
>> Thanks
>>
>> On Wed, May 18, 2016 at 9:41 PM, gkumar7 <[email protected]> wrote:
>>
>>> I would like to test the latency (tasks/s) perceived in a simple
>>> application on Apache Spark.
>>>
>>> The idea: The workers will generate random data to be placed in a list.
>>> The final action (count) will count the total number of data points
>>> generated.
>>>
>>> Below, the numberOfPartitions is equal to the number of datapoints which
>>> need to be generated (datapoints are integers).
>>>
>>> Although the code works as expected, a total of 119 spark executors were
>>> killed while running with 64 slaves. I feel this is because since spark
>>> assigns executors to each node, the amount of total partitions each node
>>> is assigned to compute may be larger than the available memory on that
>>> node. This causes these executors to be killed and therefore, the latency
>>> measurement I would like to analyze is inaccurate.
>>>
>>> Any assistance with code cleanup below or how to fix the above issue to
>>> decrease the number of killed executors, would be much appreciated.
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Latency-experiment-without-losing-executors-tp26981.html
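
For anyone going through the attached executor log by hand, here is a rough sketch of how the WARN/ERROR entries could be pulled out and grouped by host. It is plain Python and does not use Spark. It assumes every entry follows the layout of the two lines quoted at the top of this message (`yy/MM/dd HH:mm:ss LEVEL Logger: message`) and that hosts use EC2-style `ip-a-b-c-d.ec2.internal` names. The names `scan_log` and `hosts_mentioned` are made up for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout, taken from the two log lines quoted in this thread:
#   yy/MM/dd HH:mm:ss LEVEL LoggerName: message
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<msg>.*)$"
)

# Assumption: hosts are named like ip-10-171-80-97.ec2.internal
HOST_RE = re.compile(r"ip-\d+-\d+-\d+-\d+\.ec2\.internal")


def scan_log(lines, levels=("WARN", "ERROR")):
    """Return (timestamp, level, logger, message) tuples for the given levels."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            # Stack-trace continuation lines or a different layout: skip.
            continue
        if m["level"] in levels:
            ts = datetime.strptime(m["ts"], "%y/%m/%d %H:%M:%S")
            hits.append((ts, m["level"], m["logger"], m["msg"]))
    return hits


def hosts_mentioned(hits):
    """Count how often each host name appears in the matched messages."""
    counts = Counter()
    for _ts, _level, _logger, msg in hits:
        counts.update(HOST_RE.findall(msg))
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py spark-logging.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        found = scan_log(fh)
    for ts, level, logger, msg in found:
        print(ts.isoformat(), level, logger, msg)
    for host, n in hosts_mentioned(found).most_common():
        print(host, n)
```

Sorting the output by timestamp across several executor logs may help show which node dropped its connection first, but that depends on the clocks on the nodes being reasonably in sync.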
