I am running Spark 1.0 on a 4-node standalone Spark cluster (1 master + 3 
workers). Our app fetches data from Cassandra and does a basic filter, map, 
and countByKey on that data. I have run into a strange problem. Even when the 
number of rows in Cassandra is just 1M, the Spark job seems to go into an 
infinite loop and runs for hours. With a small amount of data (fewer than 100 
rows), the job does finish, but it takes 30-40 seconds and we frequently 
see the warnings shown below. If we run the same application on single-node 
Spark (--master local[4]), we don't see these warnings and the job 
finishes in under 7 seconds. Any idea what could be causing these 
problems when we run our application on the standalone 4-node Spark cluster?

14/06/30 19:30:16 WARN TaskSetManager: Lost TID 25036 (task 6.0:90)
14/06/30 19:30:16 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:18 WARN TaskSetManager: Lost TID 25310 (task 6.1:0)
14/06/30 19:30:18 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:19 WARN TaskSetManager: Lost TID 25582 (task 6.2:0)
14/06/30 19:30:19 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:21 WARN TaskSetManager: Lost TID 25882 (task 6.3:34)
14/06/30 19:30:21 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(0, 192.168.222.142, 39342, 0)
14/06/30 19:30:22 WARN TaskSetManager: Lost TID 26152 (task 6.4:0)
14/06/30 19:30:22 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(0, 192.168.222.142, 39342, 0)
14/06/30 19:30:23 WARN TaskSetManager: Lost TID 26427 (task 6.5:4)
14/06/30 19:30:23 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:25 WARN TaskSetManager: Lost TID 26690 (task 6.6:0)
14/06/30 19:30:25 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:26 WARN TaskSetManager: Lost TID 26959 (task 6.7:0)
14/06/30 19:30:26 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:28 WARN TaskSetManager: Lost TID 27449 (task 6.8:218)
14/06/30 19:30:28 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:30 WARN TaskSetManager: Lost TID 27718 (task 6.9:0)
14/06/30 19:30:30 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:30 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:31 WARN TaskSetManager: Lost TID 27991 (task 6.10:1)
14/06/30 19:30:31 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:33 WARN TaskSetManager: Lost TID 28265 (task 6.11:0)
14/06/30 19:30:33 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:34 WARN TaskSetManager: Lost TID 28550 (task 6.12:0)
14/06/30 19:30:34 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:36 WARN TaskSetManager: Lost TID 28822 (task 6.13:0)
14/06/30 19:30:36 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:37 WARN TaskSetManager: Lost TID 29093 (task 6.14:0)
14/06/30 19:30:37 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:39 WARN TaskSetManager: Lost TID 29366 (task 6.15:0)
14/06/30 19:30:39 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:40 WARN TaskSetManager: Lost TID 29648 (task 6.16:9)
14/06/30 19:30:40 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:42 WARN TaskSetManager: Lost TID 29924 (task 6.17:0)
14/06/30 19:30:42 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:43 WARN TaskSetManager: Lost TID 30193 (task 6.18:0)
14/06/30 19:30:43 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(2, 192.168.222.164, 57185, 0)
14/06/30 19:30:45 WARN TaskSetManager: Lost TID 30559 (task 6.19:98)
14/06/30 19:30:45 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(1, 192.168.222.152, 45896, 0)
14/06/30 19:30:46 WARN TaskSetManager: Lost TID 30826 (task 6.20:0)
14/06/30 19:30:46 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(1, 192.168.222.152, 45896, 0)
14/06/30 19:30:48 WARN TaskSetManager: Lost TID 31098 (task 6.21:0)
14/06/30 19:30:48 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(1, 192.168.222.152, 45896, 0)
14/06/30 19:30:50 WARN TaskSetManager: Lost TID 31370 (task 6.22:0)
14/06/30 19:30:50 WARN TaskSetManager: Loss was due to fetch failure from 
BlockManagerId(1, 192.168.222.152, 45896, 0)
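
In the excerpt above, most of the fetch failures point at the block manager on 
192.168.222.164. In case it helps anyone look at a longer log, here is a 
minimal sketch (Python, standard library only) that tallies these warnings per 
block manager. The function name count_fetch_failures is just illustrative, and 
the regex assumes the exact message format shown above, including the line 
wrap after "from".

```python
import re
from collections import Counter

# Matches the two-line warning shown above; \s+ tolerates the line wrap
# between "from" and "BlockManagerId(...)".
FETCH_FAILURE = re.compile(
    r"Loss was due to fetch failure from\s+"
    r"BlockManagerId\((\d+),\s*([\d.]+),\s*(\d+),\s*\d+\)"
)


def count_fetch_failures(log_text):
    """Count fetch-failure warnings per (executor id, host) pair."""
    counts = Counter()
    for match in FETCH_FAILURE.finditer(log_text):
        executor_id, host, _port = match.groups()
        counts[(executor_id, host)] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the log above, used only as sample input.
    sample = (
        "14/06/30 19:30:28 WARN TaskSetManager: Lost TID 27449 (task 6.8:218)\n"
        "14/06/30 19:30:28 WARN TaskSetManager: Loss was due to fetch failure from \n"
        "BlockManagerId(2, 192.168.222.164, 57185, 0)\n"
        "14/06/30 19:30:45 WARN TaskSetManager: Lost TID 30559 (task 6.19:98)\n"
        "14/06/30 19:30:45 WARN TaskSetManager: Loss was due to fetch failure from \n"
        "BlockManagerId(1, 192.168.222.152, 45896, 0)\n"
    )
    for (executor_id, host), n in count_fetch_failures(sample).most_common():
        print(f"executor {executor_id} on {host}: {n} fetch failure(s)")
```

To run it against a full driver log, read the file into a string and pass it 
to count_fetch_failures.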

Thanks.

Mohammed
