How to measure IO time in Spark over S3

2017-02-12 Thread Gili Nachum
Hi! How can I tell the IO duration for a Spark application doing R/W from S3 (using S3 as a filesystem, e.g. sc.textFile("s3a://..."))? I would like to know what percentage of the overall app execution time is spent doing IO. Gili.
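
One coarse way to get at this kind of percentage is to time phases on the driver with a wall clock. The sketch below is only an illustration under assumptions: `PhaseTimer` is a hypothetical helper (not a Spark API), and the `time.sleep` calls stand in for real Spark actions. Because Spark transformations are lazy, the timed block would have to wrap an action (something like a `count()` on the RDD returned by `sc.textFile`), not the `textFile` call itself. Reading and computing are also pipelined inside a stage, so a number obtained this way is an approximation at best.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock seconds per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def phase(self, name):
        # Measure elapsed wall-clock time for the enclosed block and
        # add it to the running total for this phase name.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def percent(self, name):
        # Share of the given phase in the sum of all recorded phases.
        total = sum(self.totals.values())
        if total <= 0:
            return 0.0
        return 100.0 * self.totals.get(name, 0.0) / total


timer = PhaseTimer()

with timer.phase("io"):
    # Stand-in for an action that forces the S3 read (assumption).
    time.sleep(0.05)

with timer.phase("compute"):
    # Stand-in for the rest of the job (assumption).
    time.sleep(0.05)

print("IO share of measured time: %.1f%%" % timer.percent("io"))
```

If the read is forced separately like this, the data may be read again later unless the RDD is cached, which would itself change the timings being measured.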

Failures on JavaSparkContext. - "Futures timed out after [10000 milliseconds]"

2017-01-29 Thread Gili Nachum
Hi, I sometimes get these random init failures in test and prod. Is there a scenario that could lead to these errors? For example: not enough cores, or the driver and worker not being on the same LAN? Running Spark 1.5.1. Retrying solves it. Caused by: java.util.concurrent.TimeoutException: Futures ti