What did you set for driver memory? The default value is 256m or 512m, which is too small. Try setting "--driver-memory 10g" with spark-submit or spark-shell and see whether it works.

-Xiangrui
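You can confirm that the flag took effect by asking the driver JVM for its heap limit. spark-shell runs the driver inside its own JVM, so the standard `Runtime.getRuntime.maxMemory` call evaluated there reflects `--driver-memory`. The sketch below assumes this client-mode setup, and the object name `DriverHeapCheck` is hypothetical, not part of Spark. The flag has to be passed at launch. In client mode the driver JVM is already running before application code executes, so setting `spark.driver.memory` on a `SparkConf` inside the program does not change its heap.

```scala
// Minimal heap-limit check; paste into spark-shell or run in any JVM.
// DriverHeapCheck is a hypothetical helper name, not a Spark API.
object DriverHeapCheck {
  private val MiB: Long = 1024L * 1024L

  /** Maximum heap this JVM will attempt to use, in MiB.
    * Runtime.maxMemory is often slightly below the -Xmx value
    * (it depends on the garbage collector in use).
    */
  def maxHeapMiB(): Long = Runtime.getRuntime.maxMemory / MiB
}
```

For example, after starting `spark-shell --driver-memory 10g`, evaluating `DriverHeapCheck.maxHeapMiB()` should give a number close to 10240. With some collectors it is slightly below that.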
On Mon, Aug 11, 2014 at 6:26 PM, durin <m...@simon-schaefer.net> wrote:
> I'm trying to apply KMeans training to some text data, which consists of
> lines that each contain between 3 and 20 words. For that purpose, all
> unique words are saved in a dictionary. This dictionary can become very
> large because no hashing etc. is done, but it should spill to disk if it
> no longer fits into memory:
>
> var dict = scala.collection.mutable.Map[String,Int]()
> dict.persist(org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER)
>
> With the help of this dictionary, I build sparse feature vectors for each
> line, which are then saved in an RDD that is used as input for KMeans.train.
>
> Spark is running in standalone mode, in this case with 5 worker nodes.
> It appears that everything up to the actual training completes successfully
> with 126G of training data (logs below).
>
> The training data is provided to all worker nodes in the form of a cached,
> broadcast variable:
>
> var vectors2 =
>   vectors.repartition(1000).persist(org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER)
> var broadcastVector = sc.broadcast(vectors2)
> println("---------------------Start model training---------------------");
> var model = KMeans.train(broadcastVector.value, 20, 10)
>
> The first error I get is a null pointer exception, but there is still work
> done after that. I think the real reason this terminates is
> java.lang.OutOfMemoryError: Java heap space.
>
> Is it possible that this happens because the cluster centers in the model
> are represented in dense instead of sparse form, thereby getting large with
> a large vector size? If yes, how can I make sure it doesn't crash because of
> that? It should spill to disk if necessary.
> My goal would be to have the input size limited only by disk space. Sure, it
> would get very slow if it spilled to disk all the time, but it shouldn't
> terminate.
>
> Here's the console output from the model.train part:
>
> ---------------------Start model training---------------------
> 14/08/11 17:05:17 INFO spark.SparkContext: Starting job: takeSample at KMeans.scala:263
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Registering RDD 64 (repartition at <console>:48)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Got job 6 (takeSample at KMeans.scala:263) with 1000 output partitions (allowLocal=false)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Final stage: Stage 8(takeSample at KMeans.scala:263)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 9)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Missing parents: List(Stage 9)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Submitting Stage 9 (MapPartitionsRDD[64] at repartition at <console>:48), which has no missing parents
> 4116.323: [GC (Allocation Failure) [PSYoungGen: 1867168K->240876K(2461696K)] 4385155K->3164592K(9452544K), 1.4455064 secs] [Times: user=11.33 sys=0.03, real=1.44 secs]
> 4174.512: [GC (Allocation Failure) [PSYoungGen: 1679497K->763168K(2338816K)] 4603212K->3691609K(9329664K), 0.8050508 secs] [Times: user=6.04 sys=0.01, real=0.80 secs]
> 4188.250: [GC (Allocation Failure) [PSYoungGen: 2071822K->986136K(2383360K)] 5000263K->4487601K(9374208K), 1.6795174 secs] [Times: user=13.23 sys=0.01, real=1.68 secs]
> 14/08/11 17:06:57 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 9 (MapPartitionsRDD[64] at repartition at <console>:48)
> 14/08/11 17:06:57 INFO scheduler.TaskSchedulerImpl: Adding task set 9.0 with 1 tasks
> 4190.947: [GC (Allocation Failure) [PSYoungGen: 2336718K->918720K(2276864K)] 5838183K->5406145K(9267712K), 1.5793066 secs] [Times: user=12.40 sys=0.02, real=1.58 secs]
> 14/08/11 17:07:00 WARN scheduler.TaskSetManager: Stage 9 contains a task of very large size (272484 KB). The maximum recommended task size is 100 KB.
> 14/08/11 17:07:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 9.0 (TID 3053, idp11.foo.bar, PROCESS_LOCAL, 279023993 bytes)
> 4193.607: [GC (Allocation Failure) [PSYoungGen: 2070046K->599908K(2330112K)] 6557472K->5393557K(9320960K), 0.3267949 secs] [Times: user=2.53 sys=0.01, real=0.33 secs]
> 4194.645: [GC (Allocation Failure) [PSYoungGen: 1516770K->589655K(2330112K)] 6310419K->5383352K(9320960K), 0.2566507 secs] [Times: user=1.96 sys=0.00, real=0.26 secs]
> 4195.815: [GC (Allocation Failure) [PSYoungGen: 1730909K->275312K(2330112K)] 6524606K->5342865K(9320960K), 0.2053884 secs] [Times: user=1.57 sys=0.00, real=0.21 secs]
> 14/08/11 17:08:56 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on idp11.foo.bar:46418 (size: 136.0 B, free: 10.4 GB)
> 14/08/11 17:08:56 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 1 to sp...@idp11.foo.bar:57072
> 14/08/11 17:10:09 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 9.0 (TID 3053, idp11.foo.bar): java.lang.NullPointerException:
>         $line86.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
>         $line86.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:57)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:147)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         java.lang.Thread.run(Thread.java:745)
> 4382.710: [GC (Allocation Failure) [PSYoungGen: 1435334K->306688K(2333184K)] 6502887K->5374264K(9324032K), 0.1423619 secs] [Times: user=0.94 sys=0.01, real=0.14 secs]
> 14/08/11 17:10:10 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 9.0 (TID 3054, idp09.foo.bar, PROCESS_LOCAL, 279023993 bytes)
> 4383.842: [GC (Allocation Failure) [PSYoungGen: 1473219K->313540K(2330112K)] 6540795K->5381274K(9320960K), 0.1694822 secs] [Times: user=1.30 sys=0.01, real=0.17 secs]
> 4384.836: [GC (Allocation Failure) [PSYoungGen: 1360342K->431799K(2448384K)] 6428075K->5499572K(9439232K), 0.2106620 secs] [Times: user=1.59 sys=0.00, real=0.21 secs]
> 4386.083: [GC (Allocation Failure) [PSYoungGen: 1732982K->275312K(2381312K)] 6800755K->5616957K(9372160K), 0.2064240 secs] [Times: user=1.58 sys=0.00, real=0.21 secs]
> 14/08/11 17:13:14 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, idp09.foo.bar, 46815, 0) with no recent heart beats: 81307ms exceeds 45000ms
> 14/08/11 17:13:35 INFO storage.BlockManagerMasterActor: Registering block manager idp09.foo.bar:46815 with 10.4 GB RAM
> 14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
> 14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
> 14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
> 14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
> 14/08/11 17:13:43 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on idp09.foo.bar:46815 (size: 136.0 B, free: 10.4 GB)
> 14/08/11 17:13:43 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 1 to sp...@idp09.foo.bar:45452
> 14/08/11 17:16:03 INFO scheduler.TaskSetManager: Finished task 0.1 in stage 9.0 (TID 3054) in 354311 ms on idp09.foo.bar (1/1)
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: Stage 9 (repartition at <console>:48) finished in 546.308 s
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: looking for newly runnable stages
> 14/08/11 17:16:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: running: Set()
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: waiting: Set(Stage 8)
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: failed: Set()
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: Missing parents for Stage 8: List()
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: Submitting Stage 8 (MappedRDD[71] at map at KMeans.scala:123), which is now runnable
> 4751.664: [GC (Allocation Failure) [PSYoungGen: 1603872K->118240K(2490368K)] 6945517K->5459924K(9481216K), 0.1854085 secs] [Times: user=1.33 sys=0.00, real=0.19 secs]
> 4807.985: [GC (Allocation Failure) [PSYoungGen: 1595872K->492896K(2482176K)] 6937556K->5834920K(9473024K), 0.6883449 secs] [Times: user=5.36 sys=0.01, real=0.69 secs]
> 4832.448: [GC (Allocation Failure) [PSYoungGen: 1716332K->895136K(2263552K)] 7058357K->6816776K(9254400K), 1.2636489 secs] [Times: user=9.90 sys=0.01, real=1.27 secs]
> 14/08/11 17:17:41 INFO scheduler.DAGScheduler: Submitting 1000 missing tasks from Stage 8 (MappedRDD[71] at map at KMeans.scala:123)
> 14/08/11 17:17:41 INFO scheduler.TaskSchedulerImpl: Adding task set 8.0 with 1000 tasks
> 4834.762: [GC (Allocation Failure) [PSYoungGen: 2128155K->885978K(2168320K)] 8049796K->7702659K(9159168K), 8.5102780 secs] [Times: user=38.78 sys=1.61, real=8.51 secs]
> 4843.283: [Full GC (Ergonomics) [PSYoungGen: 885978K->0K(2168320K)] [ParOldGen: 6816680K->2286524K(6990848K)] 7702659K->2286524K(9159168K), [Metaspace: 81087K->81087K(1118208K)], 8.6153707 secs] [Times: user=63.32 sys=0.33, real=8.62 secs]
> 4852.799: [GC (Allocation Failure) [PSYoungGen: 1085341K->850420K(2330112K)] 3371865K->3136952K(9320960K), 0.3394825 secs] [Times: user=2.55 sys=0.02, real=0.34 secs]
> 14/08/11 17:18:00 WARN scheduler.TaskSetManager: Stage 8 contains a task of very large size (272490 KB). The maximum recommended task size is 100 KB.
> 14/08/11 17:18:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 8.0 (TID 3055, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4854.097: [GC (Allocation Failure) [PSYoungGen: 2006494K->545140K(2330112K)] 4293027K->3409458K(9320960K), 0.3943651 secs] [Times: user=3.04 sys=0.01, real=0.40 secs]
> 14/08/11 17:18:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 8.0 (TID 3056, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4855.523: [GC (Allocation Failure) [PSYoungGen: 1703271K->882986K(2330112K)] 4567590K->4019826K(9320960K), 0.4778008 secs] [Times: user=3.69 sys=0.02, real=0.48 secs]
> 14/08/11 17:18:03 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 8.0 (TID 3057, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4856.982: [GC (Allocation Failure) [PSYoungGen: 2005951K->577866K(2330112K)] 5142792K->3987245K(9320960K), 0.3770014 secs] [Times: user=2.89 sys=0.02, real=0.38 secs]
> 14/08/11 17:18:05 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 8.0 (TID 3058, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4858.343: [GC (Allocation Failure) [PSYoungGen: 1738896K->310890K(2330112K)] 5148275K->3992807K(9320960K), 0.2853468 secs] [Times: user=2.17 sys=0.01, real=0.28 secs]
> 4859.519: [GC (Allocation Failure) [PSYoungGen: 1429616K->272650K(2330112K)] 5111533K->4227121K(9320960K), 0.2705028 secs] [Times: user=2.09 sys=0.00, real=0.27 secs]
> 14/08/11 17:18:06 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 8.0 (TID 3059, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4860.734: [GC (Allocation Failure) [PSYoungGen: 1429338K->545108K(2389504K)] 5383809K->4772109K(9380352K), 0.3282623 secs] [Times: user=2.53 sys=0.02, real=0.33 secs]
> 14/08/11 17:18:08 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 8.0 (TID 3060, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4862.090: [GC (Allocation Failure) [PSYoungGen: 1701120K->883050K(2114560K)] 5928121K->5382589K(9105408K), 0.4179785 secs] [Times: user=3.15 sys=0.00, real=0.41 secs]
> 14/08/11 17:18:09 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 8.0 (TID 3061, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4863.484: [GC (Allocation Failure) [PSYoungGen: 2006771K->577866K(2366976K)] 6506311K->5349943K(9357824K), 0.3806139 secs] [Times: user=2.92 sys=0.02, real=0.38 secs]
> 14/08/11 17:18:11 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 8.0 (TID 3062, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4864.936: [GC (Allocation Failure) [PSYoungGen: 1777373K->349002K(2330112K)] 6549451K->5393633K(9320960K), 0.3118865 secs] [Times: user=2.36 sys=0.01, real=0.31 secs]
> 4866.109: [GC (Allocation Failure) [PSYoungGen: 1428049K->272682K(2401280K)] 6472680K->5589859K(9392128K), 0.2053937 secs] [Times: user=1.58 sys=0.00, real=0.20 secs]
> 14/08/11 17:18:13 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 8.0 (TID 3063, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4867.255: [GC (Allocation Failure) [PSYoungGen: 1428363K->545204K(2388992K)] 6745540K->6134903K(9379840K), 0.3292614 secs] [Times: user=2.52 sys=0.00, real=0.33 secs]
> 14/08/11 17:18:14 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 8.0 (TID 3064, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4868.619: [GC (Allocation Failure) [PSYoungGen: 1700778K->883018K(2279424K)] 7290478K->6745255K(9270272K), 0.4138342 secs] [Times: user=3.20 sys=0.00, real=0.41 secs]
> 14/08/11 17:18:16 INFO scheduler.TaskSetManager: Starting task 10.0 in stage 8.0 (TID 3065, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4870.016: [GC (Allocation Failure) [PSYoungGen: 2005858K->577834K(2362880K)] 7868096K->6712625K(9353728K), 0.3216270 secs] [Times: user=2.48 sys=0.02, real=0.33 secs]
> 14/08/11 17:18:18 INFO scheduler.TaskSetManager: Starting task 11.0 in stage 8.0 (TID 3066, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4871.361: [GC (Allocation Failure) [PSYoungGen: 1777098K->349034K(2429440K)] 7911890K->6756372K(9420288K), 0.2425195 secs] [Times: user=1.86 sys=0.01, real=0.24 secs]
> 4872.470: [GC (Allocation Failure) [PSYoungGen: 1428179K->272586K(2411008K)] 7835517K->6952462K(9401856K), 0.2090806 secs] [Times: user=1.60 sys=0.01, real=0.21 secs]
> 4872.680: [Full GC (Ergonomics) [PSYoungGen: 272586K->0K(2411008K)] [ParOldGen: 6679875K->5790843K(6990848K)] 6952462K->5790843K(9401856K), [Metaspace: 81088K->81088K(1118208K)], 9.4086701 secs] [Times: user=70.70 sys=0.29, real=9.40 secs]
> 14/08/11 17:18:29 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 8.0 (TID 3067, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4883.028: [GC (Allocation Failure) [PSYoungGen: 1156929K->545236K(2479104K)] 6947773K->6336079K(9469952K), 0.2738816 secs] [Times: user=2.10 sys=0.00, real=0.28 secs]
> 14/08/11 17:18:30 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 8.0 (TID 3068, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4884.347: [GC (Allocation Failure) [PSYoungGen: 1700618K->883018K(2306048K)] 7491461K->6946435K(9296896K), 0.4920853 secs] [Times: user=3.82 sys=0.01, real=0.50 secs]
> 14/08/11 17:18:32 INFO scheduler.TaskSetManager: Starting task 14.0 in stage 8.0 (TID 3069, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4885.818: [GC (Allocation Failure) [PSYoungGen: 2005731K->577898K(2436096K)] 8069149K->6913845K(9426944K), 0.3060761 secs] [Times: user=2.17 sys=0.02, real=0.30 secs]
> 14/08/11 17:18:33 INFO scheduler.TaskSetManager: Starting task 15.0 in stage 8.0 (TID 3070, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4887.211: [GC (Allocation Failure) [PSYoungGen: 1853207K->425322K(2391552K)] 8189155K->7033799K(9382400K), 0.3021801 secs] [Times: user=2.34 sys=0.01, real=0.30 secs]
> 4887.513: [Full GC (Ergonomics) [PSYoungGen: 425322K->0K(2391552K)] [ParOldGen: 6608477K->6684656K(6990848K)] 7033799K->6684656K(9382400K), [Metaspace: 81096K->81032K(1118208K)], 9.4890515 secs] [Times: user=70.52 sys=0.34, real=9.49 secs]
> 14/08/11 17:18:44 INFO scheduler.TaskSetManager: Starting task 16.0 in stage 8.0 (TID 3071, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4898.115: [Full GC (Ergonomics) [PSYoungGen: 1314547K->0K(2391552K)] [ParOldGen: 6684656K->6899949K(6990848K)] 7999203K->6899949K(9382400K), [Metaspace: 81032K->81025K(1118208K)], 11.0145761 secs] [Times: user=67.67 sys=0.88, real=11.02 secs]
> 4910.045: [Full GC (Ergonomics) [PSYoungGen: 1117462K->272491K(2391552K)] [ParOldGen: 6899949K->6878697K(6990848K)] 8017411K->7151189K(9382400K), [Metaspace: 81025K->81003K(1118208K)], 13.0508933 secs] [Times: user=96.11 sys=0.45, real=13.05 secs]
> 14/08/11 17:19:10 INFO scheduler.TaskSetManager: Starting task 17.0 in stage 8.0 (TID 3072, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4923.867: [Full GC (Ergonomics) [PSYoungGen: 1157002K->577697K(2391552K)] [ParOldGen: 6878697K->6878671K(6990848K)] 8035699K->7456368K(9382400K), [Metaspace: 81003K->81003K(1118208K)], 11.8407076 secs] [Times: user=73.16 sys=0.35, real=11.84 secs]
> 4936.151: [Full GC (Ergonomics) [PSYoungGen: 1123485K->545009K(2391552K)] [ParOldGen: 6878671K->6878671K(6990848K)] 8002156K->7423681K(9382400K), [Metaspace: 81003K->81003K(1118208K)], 10.0288176 secs] [Times: user=75.19 sys=0.35, real=10.03 secs]
> 14/08/11 17:19:33 INFO scheduler.TaskSetManager: Starting task 18.0 in stage 8.0 (TID 3073, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4946.717: [Full GC (Ergonomics) [PSYoungGen: 1122927K->697593K(2391552K)] [ParOldGen: 6878671K->6878671K(6990848K)] 8001599K->7576264K(9382400K), [Metaspace: 81003K->81003K(1118208K)], 8.4595299 secs] [Times: user=63.18 sys=0.26, real=8.45 secs]
> 4955.584: [Full GC (Ergonomics) [PSYoungGen: 1276308K->817527K(2391552K)] [ParOldGen: 6878671K->6878670K(6990848K)] 8154980K->7696198K(9382400K), [Metaspace: 81003K->81003K(1118208K)], 10.1614967 secs] [Times: user=76.43 sys=0.29, real=10.16 secs]
> 4966.013: [Full GC (Ergonomics) [PSYoungGen: 1090782K->817502K(2391552K)] [ParOldGen: 6878670K->6878670K(6990848K)] 7969453K->7696173K(9382400K), [Metaspace: 81003K->81003K(1118208K)], 10.6428199 secs] [Times: user=79.71 sys=0.35, real=10.64 secs]
> 14/08/11 17:20:03 INFO scheduler.TaskSetManager: Starting task 19.0 in stage 8.0 (TID 3074, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4977.071: [Full GC (Ergonomics) [PSYoungGen: 1242847K->893797K(2391552K)] [ParOldGen: 6878670K->6878670K(6990848K)] 8121517K->7772468K(9382400K), [Metaspace: 81003K->81003K(1118208K)], 9.9548540 secs] [Times: user=74.76 sys=0.31, real=9.95 secs]
> 4987.156: [Full GC (Ergonomics) [PSYoungGen: 1047786K->970141K(2391552K)] [ParOldGen: 6878670K->6878670K(6990848K)] 7926457K->7848811K(9382400K), [Metaspace: 81003K->81003K(1118208K)], 8.4711455 secs] [Times: user=63.27 sys=0.33, real=8.47 secs]
> 4995.861: [Full GC (Ergonomics) [PSYoungGen: 1275597K->1122715K(2391552K)] [ParOldGen: 6878670K->6878670K(6990848K)] 8154267K->8001385K(9382400K), [Metaspace: 81003K->81003K(1118208K)], 10.3113909 secs] [Times: user=76.20 sys=0.31, real=10.31 secs]
> 5006.173: [Full GC (Allocation Failure) [PSYoungGen: 1122715K->1122715K(2391552K)] [ParOldGen: 6878670K->6876589K(6990848K)] 8001385K->7999305K(9382400K), [Metaspace: 81003K->79986K(1118208K)], 12.8222611 secs] [Times: user=94.71 sys=0.43, real=12.82 secs]
> 5019.191: [Full GC (Ergonomics) [PSYoungGen: 1278710K->0K(2391552K)] [ParOldGen: 6876589K->2320712K(6990848K)] 8155299K->2320712K(9382400K), [Metaspace: 80014K->80014K(1118208K)], 8.3395179 secs] [Times: user=62.12 sys=0.28, real=8.34 secs]
> 14/08/11 17:20:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [spark-akka.actor.default-dispatcher-18] shutting down ActorSystem [spark]
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:3230)
>         at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
>         at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
>         at org.apache.spark.scheduler.Task$.serializeWithDependencies(Task.scala:132)
>         at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:419)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7$$anonfun$apply$2.apply$mcVI$sp(TaskSchedulerImpl.scala:257)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7.apply(TaskSchedulerImpl.scala:253)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7.apply(TaskSchedulerImpl.scala:250)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3.apply(TaskSchedulerImpl.scala:250)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3.apply(TaskSchedulerImpl.scala:250)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:250)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:153)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:120)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/08/11 17:20:54 INFO scheduler.DAGScheduler: Failed to run takeSample at KMeans.scala:263
> 14/08/11 17:20:55 INFO scheduler.TaskSetManager: Starting task 21.0 in stage 8.0 (TID 3076, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 5028.749: [GC (Allocation Failure) [PSYoungGen: 1327230K->889140K(2304512K)] 3647943K->3209852K(9295360K), 0.3623403 secs] [Times: user=2.65 sys=0.02, real=0.37 secs]
> org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:608)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:607)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:607)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1203)
>         at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
>         at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
>         at akka.actor.ActorCell.terminate(ActorCell.scala:338)
>         at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
>         at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
>         at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:218)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Using-very-large-files-for-KMeans-training-cluster-centers-size-tp11937.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
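The quoted message asks whether dense cluster centers could be what exhausts memory. The plain-Scala sketch below puts rough numbers on that question and needs no Spark dependency. It builds the `(size, indices, values)` triple for one line from a word-to-index dictionary. These are the same three arguments MLlib's `Vectors.sparse` takes. It then estimates raw payload sizes. The assumptions are:

- values are term counts;
- indices are 4-byte `Int`s and values 8-byte `Double`s;
- JVM object overhead is ignored;
- centers are stored as dense arrays spanning the whole dictionary, as the question suspects.

The object and method names are hypothetical.

```scala
// Plain-Scala sketch; SparseSketch and its methods are hypothetical names.
object SparseSketch {

  /** Turns one line into (size, indices, values), using term counts as values.
    * Assumes dict maps each word to a distinct index in [0, dict.size).
    * Words missing from dict are dropped. Indices come out strictly
    * increasing, which is the order MLlib's Vectors.sparse expects.
    */
  def toSparse(line: String, dict: Map[String, Int]): (Int, Array[Int], Array[Double]) = {
    val counts: Array[(Int, Double)] =
      line.split("\\s+")
        .filter(_.nonEmpty)
        .flatMap(word => dict.get(word))                  // word -> column index
        .groupBy(identity)                                // index -> its occurrences
        .map { case (idx, hits) => (idx, hits.length.toDouble) }
        .toArray
        .sortBy(_._1)                                     // ascending indices
    (dict.size, counts.map(_._1), counts.map(_._2))
  }

  /** Raw payload of a sparse vector: one 4-byte Int index plus one 8-byte Double per non-zero. */
  def sparseBytes(nonZeros: Long): Long = nonZeros * (4L + 8L)

  /** Raw payload of k dense centers with dim entries each, at 8 bytes per Double. */
  def denseCentersBytes(k: Long, dim: Long): Long = k * dim * 8L
}
```

Example: take k = 20 from the question's `KMeans.train(..., 20, 10)` call and a hypothetical 10-million-word dictionary. Then `denseCentersBytes(20L, 10000000L)` gives 1,600,000,000 bytes (about 1.5 GiB) for the centers alone. A 20-word line stored sparsely needs only `sparseBytes(20L)` = 240 bytes of payload. Dense centers therefore grow linearly with the dictionary size, however sparse the input lines are.

Separately, `persist` is a method on RDDs, not on Scala collections. A `scala.collection.mutable.Map` like the quoted `dict` lives entirely in the driver heap and cannot spill to disk.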