Re: Using very large files for KMeans training -- cluster centers size?

Xiangrui Meng Tue, 12 Aug 2014 00:56:37 -0700

What did you set for driver memory? The default value is 256m or 512m,
which is too small for this job. Try setting "--driver-memory 10g" with
spark-submit or spark-shell and see whether it works. -Xiangrui
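
To confirm that the setting took effect, a minimal sketch like the following
can be pasted into spark-shell or run as a plain Scala program. It prints the
driver JVM's maximum heap and compares it with the serialized task size
reported in the quoted log (279029826 bytes per task). The object name
DriverHeapCheck and its helpers are illustrative and not part of Spark, and
the "tasks that fit" figure is only an optimistic upper bound that ignores
every other allocation on the driver.

```scala
// Sketch: compare the driver JVM's maximum heap with the serialized task size
// taken from the TaskSetManager lines in the quoted log.
// DriverHeapCheck is a hypothetical helper, not a Spark API.
object DriverHeapCheck {
  // Size of one serialized task in bytes, copied from the quoted log.
  val taskBytes: Long = 279029826L

  // Convert a byte count to mebibytes.
  def toMiB(bytes: Long): Double = bytes.toDouble / (1024 * 1024)

  // Number of serialized tasks of the given size that fit in the given heap,
  // ignoring all other allocations (an optimistic upper bound).
  def tasksThatFit(heapBytes: Long, perTaskBytes: Long): Long =
    heapBytes / perTaskBytes

  def main(args: Array[String]): Unit = {
    // Approximately the limit configured via -Xmx or --driver-memory.
    val maxHeap = Runtime.getRuntime.maxMemory
    println(f"driver max heap:     ${toMiB(maxHeap)}%.0f MiB")
    println(f"one serialized task: ${toMiB(taskBytes)}%.1f MiB")
    println(s"upper bound on tasks held at once: ${tasksThatFit(maxHeap, taskBytes)}")
  }
}
```

In spark-shell, call DriverHeapCheck.main(Array()) after pasting the object.
With a 512 MiB heap the bound works out to a single task, which is consistent
with the OutOfMemoryError being raised in Task.serializeWithDependencies on
the driver.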

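On the question quoted below about dense cluster centers: assuming each center
is stored as a dense array of doubles, the centers take roughly k * d * 8 bytes
for k clusters and a vocabulary of d words. The sketch below only does that
arithmetic. k = 20 is taken from the quoted KMeans.train call; the vocabulary
size of one million words is purely hypothetical, since the thread does not
state it, and DenseCentersEstimate is an illustrative name, not a Spark API.

```scala
// Sketch: payload size of k dense cluster centers of dimension d,
// at 8 bytes per Double. Object headers and any copies are ignored.
object DenseCentersEstimate {
  // Bytes needed for k dense centers over a vocabulary of `dims` words.
  def denseCenterBytes(k: Int, dims: Long): Long = k.toLong * dims * 8L

  def main(args: Array[String]): Unit = {
    val k = 20                      // from KMeans.train(..., 20, 10) in the quoted code
    val hypotheticalDims = 1000000L // assumed vocabulary size, not given in the thread
    val bytes = denseCenterBytes(k, hypotheticalDims)
    val mib = bytes / (1024.0 * 1024.0)
    println(f"$k dense centers x $hypotheticalDims dims: $mib%.1f MiB")
  }
}
```

Substitute the real dictionary size for hypotheticalDims to see whether the
centers would be a significant share of the heap.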

On Mon, Aug 11, 2014 at 6:26 PM, durin <m...@simon-schaefer.net> wrote:
> I'm trying to apply KMeans training to some text data, which consists of
> lines that each contain between 3 and 20 words. For that purpose,
> all unique words are saved in a dictionary. This dictionary can become very
> large as no hashing etc. is done, but it should spill to disk in case it
> doesn't fit into memory anymore:
> var dict = scala.collection.mutable.Map[String,Int]()
> dict.persist(org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER)
>
> With the help of this dictionary, I build sparse feature vectors for each
> line which are then saved in an RDD that is used as input for KMeans.train.
>
> Spark is running in standalone mode, in this case with 5 worker nodes.
> It appears that anything up to the actual training completes successfully
> with 126G of training data (logs below).
>
> The training data is provided in the form of a cached, broadcast variable to
> all worker nodes:
>
> var vectors2 =
> vectors.repartition(1000).persist(org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER)
> var broadcastVector = sc.broadcast(vectors2)
> println("---------------------Start model training---------------------");
> var model = KMeans.train(broadcastVector.value, 20, 10)
>
> The first error I get is a null pointer exception, but there is still work
> done after that. I think the real reason this terminates is
> java.lang.OutOfMemoryError: Java heap space.
>
> Is it possible that this happens because the cluster centers in the model
> are represented in dense instead of sparse form, thereby getting large with
> a large vector size? If yes, how can I make sure it doesn't crash because of
> that? It should spill to disk if necessary.
> My goal would be to have the input size only limited by disk space. Sure it
> would get very slow if it spills to disk all the time, but it shouldn't
> terminate.
>
>
>
> Here's the console output from the model.train part:
>
> ---------------------Start model training---------------------
> 14/08/11 17:05:17 INFO spark.SparkContext: Starting job: takeSample at
> KMeans.scala:263
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Registering RDD 64
> (repartition at <console>:48)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Got job 6 (takeSample at
> KMeans.scala:263) with 1000 output partitions (allowLocal=false)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Final stage: Stage
> 8(takeSample at KMeans.scala:263)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Parents of final stage:
> List(Stage 9)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Missing parents: List(Stage
> 9)
> 14/08/11 17:05:17 INFO scheduler.DAGScheduler: Submitting Stage 9
> (MapPartitionsRDD[64] at repartition at <console>:48), which has no missing
> parents
> 4116.323: [GC (Allocation Failure) [PSYoungGen: 1867168K->240876K(2461696K)]
> 4385155K->3164592K(9452544K), 1.4455064 secs] [Times: user=11.33 sys=0.03,
> real=1.44 secs]
> 4174.512: [GC (Allocation Failure) [PSYoungGen: 1679497K->763168K(2338816K)]
> 4603212K->3691609K(9329664K), 0.8050508 secs] [Times: user=6.04 sys=0.01,
> real=0.80 secs]
> 4188.250: [GC (Allocation Failure) [PSYoungGen: 2071822K->986136K(2383360K)]
> 5000263K->4487601K(9374208K), 1.6795174 secs] [Times: user=13.23 sys=0.01,
> real=1.68 secs]
> 14/08/11 17:06:57 INFO scheduler.DAGScheduler: Submitting 1 missing tasks
> from Stage 9 (MapPartitionsRDD[64] at repartition at <console>:48)
> 14/08/11 17:06:57 INFO scheduler.TaskSchedulerImpl: Adding task set 9.0 with
> 1 tasks
> 4190.947: [GC (Allocation Failure) [PSYoungGen: 2336718K->918720K(2276864K)]
> 5838183K->5406145K(9267712K), 1.5793066 secs] [Times: user=12.40 sys=0.02,
> real=1.58 secs]
> 14/08/11 17:07:00 WARN scheduler.TaskSetManager: Stage 9 contains a task of
> very large size (272484 KB). The maximum recommended task size is 100 KB.
> 14/08/11 17:07:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
> 9.0 (TID 3053, idp11.foo.bar, PROCESS_LOCAL, 279023993 bytes)
> 4193.607: [GC (Allocation Failure) [PSYoungGen: 2070046K->599908K(2330112K)]
> 6557472K->5393557K(9320960K), 0.3267949 secs] [Times: user=2.53 sys=0.01,
> real=0.33 secs]
> 4194.645: [GC (Allocation Failure) [PSYoungGen: 1516770K->589655K(2330112K)]
> 6310419K->5383352K(9320960K), 0.2566507 secs] [Times: user=1.96 sys=0.00,
> real=0.26 secs]
> 4195.815: [GC (Allocation Failure) [PSYoungGen: 1730909K->275312K(2330112K)]
> 6524606K->5342865K(9320960K), 0.2053884 secs] [Times: user=1.57 sys=0.00,
> real=0.21 secs]
> 14/08/11 17:08:56 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in
> memory on idp11.foo.bar:46418 (size: 136.0 B, free: 10.4 GB)
> 14/08/11 17:08:56 INFO spark.MapOutputTrackerMasterActor: Asked to send map
> output locations for shuffle 1 to sp...@idp11.foo.bar:57072
> 14/08/11 17:10:09 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 9.0
> (TID 3053, idp11.foo.bar): java.lang.NullPointerException:
>         $line86.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
>         $line86.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:57)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:147)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         java.lang.Thread.run(Thread.java:745)
> 4382.710: [GC (Allocation Failure) [PSYoungGen: 1435334K->306688K(2333184K)]
> 6502887K->5374264K(9324032K), 0.1423619 secs] [Times: user=0.94 sys=0.01,
> real=0.14 secs]
> 14/08/11 17:10:10 INFO scheduler.TaskSetManager: Starting task 0.1 in stage
> 9.0 (TID 3054, idp09.foo.bar, PROCESS_LOCAL, 279023993 bytes)
> 4383.842: [GC (Allocation Failure) [PSYoungGen: 1473219K->313540K(2330112K)]
> 6540795K->5381274K(9320960K), 0.1694822 secs] [Times: user=1.30 sys=0.01,
> real=0.17 secs]
> 4384.836: [GC (Allocation Failure) [PSYoungGen: 1360342K->431799K(2448384K)]
> 6428075K->5499572K(9439232K), 0.2106620 secs] [Times: user=1.59 sys=0.00,
> real=0.21 secs]
> 4386.083: [GC (Allocation Failure) [PSYoungGen: 1732982K->275312K(2381312K)]
> 6800755K->5616957K(9372160K), 0.2064240 secs] [Times: user=1.58 sys=0.00,
> real=0.21 secs]
> 14/08/11 17:13:14 WARN storage.BlockManagerMasterActor: Removing
> BlockManager BlockManagerId(1, idp09.foo.bar, 46815, 0) with no recent heart
> beats: 81307ms exceeds 45000ms
> 14/08/11 17:13:35 INFO storage.BlockManagerMasterActor: Registering block
> manager idp09.foo.bar:46815 with 10.4 GB RAM
> 14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in
> memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
> 14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in
> memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
> 14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in
> memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
> 14/08/11 17:13:35 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in
> memory on idp09.foo.bar:46815 (size: 39.5 KB, free: 10.4 GB)
> 14/08/11 17:13:43 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in
> memory on idp09.foo.bar:46815 (size: 136.0 B, free: 10.4 GB)
> 14/08/11 17:13:43 INFO spark.MapOutputTrackerMasterActor: Asked to send map
> output locations for shuffle 1 to sp...@idp09.foo.bar:45452
> 14/08/11 17:16:03 INFO scheduler.TaskSetManager: Finished task 0.1 in stage
> 9.0 (TID 3054) in 354311 ms on idp09.foo.bar (1/1)
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: Stage 9 (repartition at
> <console>:48) finished in 546.308 s
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: looking for newly runnable
> stages
> 14/08/11 17:16:03 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 9.0,
> whose tasks have all completed, from pool
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: running: Set()
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: waiting: Set(Stage 8)
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: failed: Set()
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: Missing parents for Stage 8:
> List()
> 14/08/11 17:16:03 INFO scheduler.DAGScheduler: Submitting Stage 8
> (MappedRDD[71] at map at KMeans.scala:123), which is now runnable
> 4751.664: [GC (Allocation Failure) [PSYoungGen: 1603872K->118240K(2490368K)]
> 6945517K->5459924K(9481216K), 0.1854085 secs] [Times: user=1.33 sys=0.00,
> real=0.19 secs]
> 4807.985: [GC (Allocation Failure) [PSYoungGen: 1595872K->492896K(2482176K)]
> 6937556K->5834920K(9473024K), 0.6883449 secs] [Times: user=5.36 sys=0.01,
> real=0.69 secs]
> 4832.448: [GC (Allocation Failure) [PSYoungGen: 1716332K->895136K(2263552K)]
> 7058357K->6816776K(9254400K), 1.2636489 secs] [Times: user=9.90 sys=0.01,
> real=1.27 secs]
> 14/08/11 17:17:41 INFO scheduler.DAGScheduler: Submitting 1000 missing tasks
> from Stage 8 (MappedRDD[71] at map at KMeans.scala:123)
> 14/08/11 17:17:41 INFO scheduler.TaskSchedulerImpl: Adding task set 8.0 with
> 1000 tasks
> 4834.762: [GC (Allocation Failure) [PSYoungGen: 2128155K->885978K(2168320K)]
> 8049796K->7702659K(9159168K), 8.5102780 secs] [Times: user=38.78 sys=1.61,
> real=8.51 secs]
> 4843.283: [Full GC (Ergonomics) [PSYoungGen: 885978K->0K(2168320K)]
> [ParOldGen: 6816680K->2286524K(6990848K)] 7702659K->2286524K(9159168K),
> [Metaspace: 81087K->81087K(1118208K)], 8.6153707 secs]
> [Times: user=63.32 sys=0.33, real=8.62 secs]
> 4852.799: [GC (Allocation Failure) [PSYoungGen: 1085341K->850420K(2330112K)]
> 3371865K->3136952K(9320960K), 0.3394825 secs] [Times: user=2.55 sys=0.02,
> real=0.34 secs]
> 14/08/11 17:18:00 WARN scheduler.TaskSetManager: Stage 8 contains a task of
> very large size (272490 KB). The maximum recommended task size is 100 KB.
> 14/08/11 17:18:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
> 8.0 (TID 3055, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4854.097: [GC (Allocation Failure) [PSYoungGen: 2006494K->545140K(2330112K)]
> 4293027K->3409458K(9320960K), 0.3943651 secs] [Times: user=3.04 sys=0.01,
> real=0.40 secs]
> 14/08/11 17:18:01 INFO scheduler.TaskSetManager: Starting task 1.0 in stage
> 8.0 (TID 3056, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4855.523: [GC (Allocation Failure) [PSYoungGen: 1703271K->882986K(2330112K)]
> 4567590K->4019826K(9320960K), 0.4778008 secs] [Times: user=3.69 sys=0.02,
> real=0.48 secs]
> 14/08/11 17:18:03 INFO scheduler.TaskSetManager: Starting task 2.0 in stage
> 8.0 (TID 3057, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4856.982: [GC (Allocation Failure) [PSYoungGen: 2005951K->577866K(2330112K)]
> 5142792K->3987245K(9320960K), 0.3770014 secs] [Times: user=2.89 sys=0.02,
> real=0.38 secs]
> 14/08/11 17:18:05 INFO scheduler.TaskSetManager: Starting task 3.0 in stage
> 8.0 (TID 3058, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4858.343: [GC (Allocation Failure) [PSYoungGen: 1738896K->310890K(2330112K)]
> 5148275K->3992807K(9320960K), 0.2853468 secs] [Times: user=2.17 sys=0.01,
> real=0.28 secs]
> 4859.519: [GC (Allocation Failure) [PSYoungGen: 1429616K->272650K(2330112K)]
> 5111533K->4227121K(9320960K), 0.2705028 secs] [Times: user=2.09 sys=0.00,
> real=0.27 secs]
> 14/08/11 17:18:06 INFO scheduler.TaskSetManager: Starting task 4.0 in stage
> 8.0 (TID 3059, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4860.734: [GC (Allocation Failure) [PSYoungGen: 1429338K->545108K(2389504K)]
> 5383809K->4772109K(9380352K), 0.3282623 secs] [Times: user=2.53 sys=0.02,
> real=0.33 secs]
> 14/08/11 17:18:08 INFO scheduler.TaskSetManager: Starting task 5.0 in stage
> 8.0 (TID 3060, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4862.090: [GC (Allocation Failure) [PSYoungGen: 1701120K->883050K(2114560K)]
> 5928121K->5382589K(9105408K), 0.4179785 secs] [Times: user=3.15 sys=0.00,
> real=0.41 secs]
> 14/08/11 17:18:09 INFO scheduler.TaskSetManager: Starting task 6.0 in stage
> 8.0 (TID 3061, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4863.484: [GC (Allocation Failure) [PSYoungGen: 2006771K->577866K(2366976K)]
> 6506311K->5349943K(9357824K), 0.3806139 secs] [Times: user=2.92 sys=0.02,
> real=0.38 secs]
> 14/08/11 17:18:11 INFO scheduler.TaskSetManager: Starting task 7.0 in stage
> 8.0 (TID 3062, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4864.936: [GC (Allocation Failure) [PSYoungGen: 1777373K->349002K(2330112K)]
> 6549451K->5393633K(9320960K), 0.3118865 secs] [Times: user=2.36 sys=0.01,
> real=0.31 secs]
> 4866.109: [GC (Allocation Failure) [PSYoungGen: 1428049K->272682K(2401280K)]
> 6472680K->5589859K(9392128K), 0.2053937 secs] [Times: user=1.58 sys=0.00,
> real=0.20 secs]
> 14/08/11 17:18:13 INFO scheduler.TaskSetManager: Starting task 8.0 in stage
> 8.0 (TID 3063, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4867.255: [GC (Allocation Failure) [PSYoungGen: 1428363K->545204K(2388992K)]
> 6745540K->6134903K(9379840K), 0.3292614 secs] [Times: user=2.52 sys=0.00,
> real=0.33 secs]
> 14/08/11 17:18:14 INFO scheduler.TaskSetManager: Starting task 9.0 in stage
> 8.0 (TID 3064, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4868.619: [GC (Allocation Failure) [PSYoungGen: 1700778K->883018K(2279424K)]
> 7290478K->6745255K(9270272K), 0.4138342 secs] [Times: user=3.20 sys=0.00,
> real=0.41 secs]
> 14/08/11 17:18:16 INFO scheduler.TaskSetManager: Starting task 10.0 in stage
> 8.0 (TID 3065, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4870.016: [GC (Allocation Failure) [PSYoungGen: 2005858K->577834K(2362880K)]
> 7868096K->6712625K(9353728K), 0.3216270 secs] [Times: user=2.48 sys=0.02,
> real=0.33 secs]
> 14/08/11 17:18:18 INFO scheduler.TaskSetManager: Starting task 11.0 in stage
> 8.0 (TID 3066, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4871.361: [GC (Allocation Failure) [PSYoungGen: 1777098K->349034K(2429440K)]
> 7911890K->6756372K(9420288K), 0.2425195 secs] [Times: user=1.86 sys=0.01,
> real=0.24 secs]
> 4872.470: [GC (Allocation Failure) [PSYoungGen: 1428179K->272586K(2411008K)]
> 7835517K->6952462K(9401856K), 0.2090806 secs] [Times: user=1.60 sys=0.01,
> real=0.21 secs]
> 4872.680: [Full GC (Ergonomics) [PSYoungGen: 272586K->0K(2411008K)]
> [ParOldGen: 6679875K->5790843K(6990848K)] 6952462K->5790843K(9401856K),
> [Metaspace: 81088K->81088K(1118208K)], 9.4086701 secs]
> [Times: user=70.70 sys=0.29, real=9.40 secs]
> 14/08/11 17:18:29 INFO scheduler.TaskSetManager: Starting task 12.0 in stage
> 8.0 (TID 3067, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4883.028: [GC (Allocation Failure) [PSYoungGen: 1156929K->545236K(2479104K)]
> 6947773K->6336079K(9469952K), 0.2738816 secs] [Times: user=2.10 sys=0.00,
> real=0.28 secs]
> 14/08/11 17:18:30 INFO scheduler.TaskSetManager: Starting task 13.0 in stage
> 8.0 (TID 3068, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4884.347: [GC (Allocation Failure) [PSYoungGen: 1700618K->883018K(2306048K)]
> 7491461K->6946435K(9296896K), 0.4920853 secs] [Times: user=3.82 sys=0.01,
> real=0.50 secs]
> 14/08/11 17:18:32 INFO scheduler.TaskSetManager: Starting task 14.0 in stage
> 8.0 (TID 3069, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4885.818: [GC (Allocation Failure) [PSYoungGen: 2005731K->577898K(2436096K)]
> 8069149K->6913845K(9426944K), 0.3060761 secs] [Times: user=2.17 sys=0.02,
> real=0.30 secs]
> 14/08/11 17:18:33 INFO scheduler.TaskSetManager: Starting task 15.0 in stage
> 8.0 (TID 3070, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4887.211: [GC (Allocation Failure) [PSYoungGen: 1853207K->425322K(2391552K)]
> 8189155K->7033799K(9382400K), 0.3021801 secs] [Times: user=2.34 sys=0.01,
> real=0.30 secs]
> 4887.513: [Full GC (Ergonomics) [PSYoungGen: 425322K->0K(2391552K)]
> [ParOldGen: 6608477K->6684656K(6990848K)] 7033799K->6684656K(9382400K),
> [Metaspace: 81096K->81032K(1118208K)], 9.4890515 secs]
> [Times: user=70.52 sys=0.34, real=9.49 secs]
> 14/08/11 17:18:44 INFO scheduler.TaskSetManager: Starting task 16.0 in stage
> 8.0 (TID 3071, idp19.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4898.115: [Full GC (Ergonomics) [PSYoungGen: 1314547K->0K(2391552K)]
> [ParOldGen: 6684656K->6899949K(6990848K)] 7999203K->6899949K(9382400K),
> [Metaspace: 81032K->81025K(1118208K)], 11.0145761 secs]
> [Times: user=67.67 sys=0.88, real=11.02 secs]
> 4910.045: [Full GC (Ergonomics) [PSYoungGen: 1117462K->272491K(2391552K)]
> [ParOldGen: 6899949K->6878697K(6990848K)] 8017411K->7151189K(9382400K),
> [Metaspace: 81025K->81003K(1118208K)], 13.0508933 secs]
> [Times: user=96.11 sys=0.45, real=13.05 secs]
> 14/08/11 17:19:10 INFO scheduler.TaskSetManager: Starting task 17.0 in stage
> 8.0 (TID 3072, idp11.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4923.867: [Full GC (Ergonomics) [PSYoungGen: 1157002K->577697K(2391552K)]
> [ParOldGen: 6878697K->6878671K(6990848K)] 8035699K->7456368K(9382400K),
> [Metaspace: 81003K->81003K(1118208K)], 11.8407076 secs]
> [Times: user=73.16 sys=0.35, real=11.84 secs]
> 4936.151: [Full GC (Ergonomics) [PSYoungGen: 1123485K->545009K(2391552K)]
> [ParOldGen: 6878671K->6878671K(6990848K)] 8002156K->7423681K(9382400K),
> [Metaspace: 81003K->81003K(1118208K)], 10.0288176 secs]
> [Times: user=75.19 sys=0.35, real=10.03 secs]
> 14/08/11 17:19:33 INFO scheduler.TaskSetManager: Starting task 18.0 in stage
> 8.0 (TID 3073, idp41.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4946.717: [Full GC (Ergonomics) [PSYoungGen: 1122927K->697593K(2391552K)]
> [ParOldGen: 6878671K->6878671K(6990848K)] 8001599K->7576264K(9382400K),
> [Metaspace: 81003K->81003K(1118208K)], 8.4595299 secs]
> [Times: user=63.18 sys=0.26, real=8.45 secs]
> 4955.584: [Full GC (Ergonomics) [PSYoungGen: 1276308K->817527K(2391552K)]
> [ParOldGen: 6878671K->6878670K(6990848K)] 8154980K->7696198K(9382400K),
> [Metaspace: 81003K->81003K(1118208K)], 10.1614967 secs]
> [Times: user=76.43 sys=0.29, real=10.16 secs]
> 4966.013: [Full GC (Ergonomics) [PSYoungGen: 1090782K->817502K(2391552K)]
> [ParOldGen: 6878670K->6878670K(6990848K)] 7969453K->7696173K(9382400K),
> [Metaspace: 81003K->81003K(1118208K)], 10.6428199 secs]
> [Times: user=79.71 sys=0.35, real=10.64 secs]
> 14/08/11 17:20:03 INFO scheduler.TaskSetManager: Starting task 19.0 in stage
> 8.0 (TID 3074, idp42.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 4977.071: [Full GC (Ergonomics) [PSYoungGen: 1242847K->893797K(2391552K)]
> [ParOldGen: 6878670K->6878670K(6990848K)] 8121517K->7772468K(9382400K),
> [Metaspace: 81003K->81003K(1118208K)], 9.9548540 secs]
> [Times: user=74.76 sys=0.31, real=9.95 secs]
> 4987.156: [Full GC (Ergonomics) [PSYoungGen: 1047786K->970141K(2391552K)]
> [ParOldGen: 6878670K->6878670K(6990848K)] 7926457K->7848811K(9382400K),
> [Metaspace: 81003K->81003K(1118208K)], 8.4711455 secs]
> [Times: user=63.27 sys=0.33, real=8.47 secs]
> 4995.861: [Full GC (Ergonomics) [PSYoungGen: 1275597K->1122715K(2391552K)]
> [ParOldGen: 6878670K->6878670K(6990848K)] 8154267K->8001385K(9382400K),
> [Metaspace: 81003K->81003K(1118208K)], 10.3113909 secs]
> [Times: user=76.20 sys=0.31, real=10.31 secs]
> 5006.173: [Full GC (Allocation Failure) [PSYoungGen:
> 1122715K->1122715K(2391552K)] [ParOldGen: 6878670K->6876589K(6990848K)]
> 8001385K->7999305K(9382400K), [Metaspace: 81003K->79986K(1118208K)],
> 12.8222611 secs] [Times: user=94.71 sys=0.43, real=12.82 secs]
> 5019.191: [Full GC (Ergonomics) [PSYoungGen: 1278710K->0K(2391552K)]
> [ParOldGen: 6876589K->2320712K(6990848K)] 8155299K->2320712K(9382400K),
> [Metaspace: 80014K->80014K(1118208K)], 8.3395179 secs]
> [Times: user=62.12 sys=0.28, real=8.34 secs]
> 14/08/11 17:20:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from
> thread [spark-akka.actor.default-dispatcher-18] shutting down ActorSystem
> [spark]
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:3230)
>         at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
>         at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
>         at org.apache.spark.scheduler.Task$.serializeWithDependencies(Task.scala:132)
>         at org.apache.spark.scheduler.TaskSetManager.resourceOffer(TaskSetManager.scala:419)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7$$anonfun$apply$2.apply$mcVI$sp(TaskSchedulerImpl.scala:257)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7.apply(TaskSchedulerImpl.scala:253)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3$$anonfun$apply$7.apply(TaskSchedulerImpl.scala:250)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3.apply(TaskSchedulerImpl.scala:250)
>         at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$3.apply(TaskSchedulerImpl.scala:250)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:250)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:153)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:120)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/08/11 17:20:54 INFO scheduler.DAGScheduler: Failed to run takeSample at
> KMeans.scala:263
> 14/08/11 17:20:55 INFO scheduler.TaskSetManager: Starting task 21.0 in stage
> 8.0 (TID 3076, idp09.foo.bar, PROCESS_LOCAL, 279029826 bytes)
> 5028.749: [GC (Allocation Failure) [PSYoungGen: 1327230K->889140K(2304512K)]
> 3647943K->3209852K(9295360K), 0.3623403 secs] [Times: user=2.65 sys=0.02,
> real=0.37 secs]
> org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:608)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:607)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:607)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1203)
>         at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
>         at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
>         at akka.actor.ActorCell.terminate(ActorCell.scala:338)
>         at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
>         at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
>         at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:218)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Using-very-large-files-for-KMeans-training-cluster-centers-size-tp11937.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
