Hi Nicolas,

> Do you know how many sstables this new node is supposed to receive?

If I can find that out via nodetool netstats, then it would be 619, as follows:

# nodetool netstats
Bootstrap b95371e0-0c0a-11e8-932b-f775227bf21c
    /192.168.1.215 - Receiving 71 files, 7744612158 bytes total. Already received 0 files, 893897583 bytes total
    /192.168.1.214 - Receiving 58 files, 5693392001 bytes total. Already received 0 files, 1078372756 bytes total
    /192.168.1.206 - Receiving 52 files, 3389096409 bytes total. Already received 3 files, 508592758 bytes total
    /192.168.1.213 - Receiving 59 files, 6041633329 bytes total. Already received 0 files, 1038760653 bytes total
    /192.168.1.231 - Receiving 79 files, 7579181689 bytes total. Already received 4 files, 38387859 bytes total
    /192.168.1.208 - Receiving 51 files, 3272885123 bytes total. Already received 3 files, 362450903 bytes total
    /192.168.1.207 - Receiving 56 files, 3028344200 bytes total. Already received 3 files, 57790197 bytes total
    /192.168.1.232 - Receiving 79 files, 7268716317 bytes total. Already received 1 files, 1127174421 bytes total
    /192.168.1.209 - Receiving 114 files, 21381846105 bytes total. Already received 1 files, 961497222 bytes total
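The 619 is just the per-peer "Receiving ... files" counts above added up. A quick way to total them, assuming the netstats output format shown here (file count in the fourth whitespace-separated field):

# nodetool netstats | awk '/Receiving/ {sum += $4} END {print sum}'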
> Does disabling compaction_throughput_mb_per_sec or increasing
> concurrent_compactors have any effect?

I will give it a try:

# nodetool getcompactionthroughput
Current compaction throughput: 128 MB/s
# nodetool setcompactionthroughput 0
# nodetool getcompactionthroughput
Current compaction throughput: 0 MB/s
# nodetool getconcurrentcompactors
Current concurrent compactors in the system is: 16
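If compaction really cannot keep up while every other node streams to this new one, I could also raise the number of compactors and watch the pending tasks during the bootstrap. A rough sketch, with 32 as an arbitrary example value rather than a recommendation:

# nodetool setconcurrentcompactors 32
# nodetool compactionstats -H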
> Which memtable_allocation_type are you using?

# grep memtable_allocation_type /etc/cassandra/conf/cassandra.yaml
memtable_allocation_type: heap_buffers
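Since heap_buffers keeps all memtable data on the JVM heap, one experiment I could try is switching to offheap_buffers, which (as I understand it) moves the memtable cell buffers off heap and might ease the old-gen pressure during bootstrap. An untested sketch against the same cassandra.yaml path as above (the node needs a restart for the change to take effect):

# sed -i 's/^memtable_allocation_type: heap_buffers/memtable_allocation_type: offheap_buffers/' /etc/cassandra/conf/cassandra.yaml
# grep memtable_allocation_type /etc/cassandra/conf/cassandra.yaml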
thanks so far, regards
Juergen

2018-02-07 14:29 GMT+01:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:

> Hi Jürgen,
>
> It does feel like some OOM during bootstrap from a previous C* v2, but
> that should be fixed in your version.
>
> Do you know how many sstables this new node is supposed to receive?
>
> Just a wild guess: it may have something to do with compaction not keeping
> up because every other node is streaming data to this new one (resulting
> in long-lived objects in the heap). Does disabling
> compaction_throughput_mb_per_sec or increasing concurrent_compactors have
> any effect?
>
> Which memtable_allocation_type are you using?
>
>
> On 7 February 2018 at 12:38, Jürgen Albersdorfer <jalbersdor...@gmail.com> wrote:
>
>> Hi, I always face an issue when bootstrapping a node that has less than
>> 184 GB RAM (156 GB JVM heap) on our 10-node C* 3.11.1 cluster.
>> During bootstrap, when I watch the cassandra.log, I observe growth in the
>> JVM old gen that never gets significantly freed again.
>> I know that the JVM collects the old gen only when it really needs to.
>> I can see collections, but there is always a remainder which seems to
>> grow forever without ever getting freed.
>> After the node has successfully joined the cluster, I can remove the
>> extra 128 GB of RAM I gave it for bootstrapping without any further effect.
>>
>> It feels as if Cassandra never lets go of a single byte streamed over the
>> network during bootstrapping, which would be a memory leak and a major
>> problem, too.
>>
>> Or is there something I'm doing wrong? Any ideas?
>>
>> Here are my observations of a failing bootstrap. The following node has
>> 72 GB RAM installed, 64 GB of which are configured as JVM heap space.
>>
>> cassandra.log (truncated):
>> INFO [Service Thread] 2018-02-07 11:12:49,604 GCInspector.java:284 - G1 Young Generation GC in 984ms. G1 Eden Space: 14763950080 -> 0; G1 Old Gen: 36960206856 -> 39661338640; G1 Survivor Space: 2785017856 -> 1476395008;
>> INFO [Service Thread] 2018-02-07 11:13:00,108 GCInspector.java:284 - G1 Young Generation GC in 784ms. G1 Eden Space: 18387828736 -> 0; G1 Old Gen: 39661338640 -> 41053847560; G1 Survivor Space: 1476395008 -> 1845493760;
>> INFO [Service Thread] 2018-02-07 11:13:08,639 GCInspector.java:284 - G1 Young Generation GC in 718ms. G1 Eden Space: 16743661568 -> 0; G1 Old Gen: 41053847560 -> 42832232472; G1 Survivor Space: 1845493760 -> 1375731712;
>> INFO [Service Thread] 2018-02-07 11:13:18,271 GCInspector.java:284 - G1 Young Generation GC in 546ms. G1 Eden Space: 15535702016 -> 0; G1 Old Gen: 42831004832 -> 44206736544; G1 Survivor Space: 1375731712 -> 1006632960;
>> INFO [Service Thread] 2018-02-07 11:13:35,364 GCInspector.java:284 - G1 Young Generation GC in 638ms. G1 Eden Space: 14025752576 -> 0; G1 Old Gen: 44206737048 -> 45213369488; G1 Survivor Space: 1778384896 -> 1610612736;
>> INFO [Service Thread] 2018-02-07 11:13:42,898 GCInspector.java:284 - G1 Young Generation GC in 614ms. G1 Eden Space: 13388218368 -> 0; G1 Old Gen: 45213369488 -> 46152893584; G1 Survivor Space: 1610612736 -> 1006632960;
>> INFO [Service Thread] 2018-02-07 11:13:58,291 GCInspector.java:284 - G1 Young Generation GC in 400ms. G1 Eden Space: 13119782912 -> 0; G1 Old Gen: 46136116376 -> 47171400848; G1 Survivor Space: 1275068416 -> 771751936;
>> INFO [Service Thread] 2018-02-07 11:14:23,071 GCInspector.java:284 - G1 Young Generation GC in 303ms. G1 Eden Space: 11676942336 -> 0; G1 Old Gen: 47710958232 -> 48239699096; G1 Survivor Space: 1207959552 -> 973078528;
>> INFO [Service Thread] 2018-02-07 11:14:46,157 GCInspector.java:284 - G1 Young Generation GC in 305ms. G1 Eden Space: 11005853696 -> 0; G1 Old Gen: 48903342232 -> 49289001104; G1 Survivor Space: 939524096 -> 973078528;
>> INFO [Service Thread] 2018-02-07 11:14:53,045 GCInspector.java:284 - G1 Young Generation GC in 380ms. G1 Eden Space: 10569646080 -> 0; G1 Old Gen: 49289001104 -> 49586732696; G1 Survivor Space: 973078528 -> 1308622848;
>> INFO [Service Thread] 2018-02-07 11:15:04,692 GCInspector.java:284 - G1 Young Generation GC in 360ms. G1 Eden Space: 9294577664 -> 0; G1 Old Gen: 50671712912 -> 51269944472; G1 Survivor Space: 905969664 -> 805306368;
>> WARN [Service Thread] 2018-02-07 11:15:07,317 GCInspector.java:282 - G1 Young Generation GC in 1102ms. G1 Eden Space: 2617245696 -> 0; G1 Old Gen: 51269944472 -> 47310521496; G1 Survivor Space: 805306368 -> 301989888;
>> ....
>> INFO [Service Thread] 2018-02-07 11:16:36,535 GCInspector.java:284 - G1 Young Generation GC in 377ms. G1 Eden Space: 7683964928 -> 0; G1 Old Gen: 51958433432 -> 52658554008; G1 Survivor Space: 1073741824 -> 1040187392;
>> INFO [Service Thread] 2018-02-07 11:16:41,756 GCInspector.java:284 - G1 Young Generation GC in 340ms. G1 Eden Space: 7046430720 -> 0; G1 Old Gen: 52624999576 -> 53299987616; G1 Survivor Space: 1040187392 -> 805306368;
>> WARN [Service Thread] 2018-02-07 11:16:44,087 GCInspector.java:282 - G1 Young Generation GC in 1005ms. G1 Eden Space: 2617245696 -> 0; G1 Old Gen: 53299987616 -> 49659331752; G1 Survivor Space: 805306368 -> 436207616;
>> ...
>> INFO [Service Thread] 2018-02-07 11:25:40,902 GCInspector.java:284 - G1 Young Generation GC in 254ms. G1 Eden Space: 11475615744 -> 0; G1 Old Gen: 48904357040 -> 48904357544; G1 Survivor Space: 704643072 -> 805306368;
>> INFO [Service Thread] 2018-02-07 11:26:11,424 GCInspector.java:284 - G1 Young Generation GC in 202ms. G1 Eden Space: 11005853696 -> 0; G1 Old Gen: 48904357544 -> 49321014960; G1 Survivor Space: 939524096 -> 536870912;
>> WARN [Service Thread] 2018-02-07 11:26:44,484 GCInspector.java:282 - G1 Young Generation GC in 1295ms. G1 Eden Space: 2617245696 -> 0; G1 Old Gen: 49321014960 -> 46255753384; G1 Survivor Space: 805306368 -> 402653184;
>> ...
>> INFO [Service Thread] 2018-02-07 11:30:37,828 GCInspector.java:284 - G1 Young Generation GC in 958ms. G1 Eden Space: 2785017856 -> 0; G1 Old Gen: 51196393000 -> 50629766184; G1 Survivor Space: 637534208 -> 436207616;
>> INFO [Service Thread] 2018-02-07 11:30:45,036 GCInspector.java:284 - G1 Young Generation GC in 270ms. G1 Eden Space: 10267656192 -> 0; G1 Old Gen: 50629766184 -> 50626254144; G1 Survivor Space: 436207616 -> 738197504;
>> INFO [Service Thread] 2018-02-07 11:31:48,128 GCInspector.java:284 - G1 Young Generation GC in 984ms. G1 Eden Space: 2617245696 -> 0; G1 Old Gen: 51086410272 -> 50443965480; G1 Survivor Space: 805306368 -> 369098752;
>>
>> jvm.options as follows (comments removed):
>> ## Use the Hotspot garbage-first collector.
>> -XX:+UseG1GC
>> -XX:MaxGCPauseMillis=1000
>> -XX:InitiatingHeapOccupancyPercent=70
>> -XX:ParallelGCThreads=16
>> -XX:ConcGCThreads=16
>>
>> ### GC logging options -- uncomment to enable
>> -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps
>> -XX:+PrintHeapAtGC
>> -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime
>> -XX:+PrintPromotionFailure
>> #-XX:PrintFLSStatistics=1
>> #-Xloggc:/var/log/cassandra/gc.log
>> -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10
>> -XX:GCLogFileSize=10M
>>
>> I tried this with ParNewGC and ConcMarkSweepGC as well, and I see the
>> same behavior there, too.
>>
>> From nodetool netstats I see that it wants to stream about 55.9 GB. After
>> 1.5 h of streaming at more than 10 MB/s (about 54 GB seen with dstat),
>> nodetool netstats shows that only 3.5 GB of the 55.9 GB have been received.
>>
>> uptime
>> 11:30:52 up 1:36, 3 users, load average: 106.01, 87.54, 66.01
>>
>> nodetool netstats (truncated for better reading)
>> Wed Feb 7 11:19:07 CET 2018
>> Mode: JOINING
>> Bootstrap 56d204d0-0be9-11e8-ae30-617216855b4a
>>     /192.168.1.213 - Receiving 68 files, 6.774.831.556 bytes total. Already received 3 files, 279.238.740 bytes total
>>     /192.168.1.215 - Receiving 68 files, 5.721.460.494 bytes total. Already received 4 files, 109.051.913 bytes total
>>     /192.168.1.214 - Receiving 68 files, 7.497.726.056 bytes total. Already received 4 files, 870.592.708 bytes total
>>     /192.168.1.207 - Receiving 63 files, 4.945.809.501 bytes total. Already received 4 files, 700.599.427 bytes total
>>     /192.168.1.232 - Receiving 91 files, 7.344.537.682 bytes total. Already received 3 files, 237.482.005 bytes total
>>     /192.168.1.209 - Receiving 102 files, 15.931.849.729 bytes total. Already received 3 files, 1.108.754.920 bytes total
>>     /192.168.1.231 - Receiving 92 files, 7.927.882.516 bytes total. Already received 4 files, 269.514.936 bytes total
>>
>> nodetool status:
>> Datacenter: main
>> ================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address        Load       Tokens  Owns  Host ID                               Rack
>> UN  192.168.1.232  83,31 GiB  256     ?     510a0068-ee2b-4d1f-9965-9e29602d2f8f  rack04
>> UN  192.168.1.206  51,41 GiB  256     ?     a168b632-52e7-408a-ae7f-6ba6b9c55cea  rack01
>> UN  192.168.1.207  57,66 GiB  256     ?     7401ab8f-114d-41b4-801d-53a4b042de52  rack01
>> UN  192.168.1.208  56,47 GiB  256     ?     767980ef-52f2-4c21-8567-324fc1db274c  rack01
>> ...
>> UJ  192.168.1.160  68,95 GiB  256     ?     a3a5a169-512f-4e1f-8c0b-419c828f23e1  rack02
>> UN  192.168.1.209  94,27 GiB  256     ?     8757cb4a-183e-4828-8212-7715b5563935  rack02
>> UN  192.168.1.213  78,26 GiB  256     ?     b1e9481c-4ba2-4396-837a-84be35737fe7  rack05
>> UN  192.168.1.214  80,66 GiB  256     ?     457fc606-7002-49ad-8da5-309b92093acf  rack06
>> UN  192.168.1.231  87,5 GiB   256     ?     2017a9e8-3638-465e-bc4a-5e59e693fb49  rack03
>> UN  192.168.1.215  86,97 GiB  256     ?     5dfe4c35-8f8a-4305-824a-4610cec9411b  rack07
>>
>> thanks, and kind regards
>> Juergen