Keith,

If I'm understanding your schema correctly, it sounds like you have rows/partitions with TTL'd data that
continuously grow over time. This sounds a lot like
https://issues.apache.org/jira/browse/CASSANDRA-6654, which was marked as working as designed but is
totally unintuitive. (Jonathan Ellis went off and spent the time to reproduce it, which suggests it was
equally unintuitive to someone who is pretty knowledgeable about Cassandra.)
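In miniature, the pattern looks something like this; a sketch with illustrative table and column names
only (Keith's real schema appears further down the thread). Every write carries a TTL, the partition key
never changes, and the read slices the whole partition, so each read has to step over an ever-growing set
of expired cells and tombstones until compaction purges them:

    -- Illustrative only; not the actual schema from this thread.
    CREATE TABLE recently_viewed_skus (
        user_id timeuuid,
        sku text,
        PRIMARY KEY (user_id, sku)
    );

    -- Every view/purchase is written into the same partition with a 30-day TTL:
    INSERT INTO recently_viewed_skus (user_id, sku) VALUES (?, ?) USING TTL 2592000;

    -- The read pulls the entire partition and must skip whatever expired cells
    -- and tombstones have not yet been compacted away:
    SELECT sku FROM recently_viewed_skus WHERE user_id = ?;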
Ken

On Thu, Jul 24, 2014 at 10:10 PM, Keith Wright <kwri...@nanigans.com> wrote:

> One last item to add to this thread: we have consistently experienced this behavior where performance
> degrades over time (previously we were unable to bootstrap nodes due to long GC pauses on the existing
> nodes). I believe it's due to tombstone build-up (as I mentioned previously, one of the tables is
> showing a droppable tombstone ratio of > 30%). The sku table is used to hold the SKUs that users have
> recently viewed/purchased. When we write a new SKU we set the TTL to 30 days, where the row key is the
> user id; our read case is to fetch ALL SKUs the user has seen within the TTL. Since the user sees SKUs
> consistently over time, this can result in a row with many columns, many of which are likely tombstoned
> (see CASSANDRA-6654, which I filed for this and which shows that C* does not handle this case well).
>
> I guess I'm just surprised that others aren't using C* for similar use cases and thus hitting the same
> issue?
>
> I am hoping to upgrade to 2.0.9, which has improvements to remove tombstones.
>
> From: Keith Wright <kwri...@nanigans.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:50 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Cc: Don Jackson <djack...@nanigans.com>
> Subject: Re: Hot, large row
>
> When a node is showing the high CMS issue, io is actually low, likely because none is going on during
> CMS GC. On a node not showing the issue, iostat shows disk usage around 50% (these are SSDs) and load
> hovers around 10, which is fine for a dual octo-core machine.
>
> In addition, nodetool compactionstats does not show that we are falling behind in compactions.
>
> So I am not sure what is going on here. We are running CentOS 6.5 with java 1.7.0_51. It does seem like
> things are getting worse, and I'm considering dropping and rebuilding all the tables (as I have the data
> in Hadoop). This seems to be a repeated problem for us with Cassandra, and now that Aerospike has an
> open source version, we are very much considering switching.
>
> Thanks again for the help and any insight you might have!
>
> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>           23.43  12.40    11.40     6.20    0.00  46.57
>
> Device:  rrqm/s  wrqm/s      r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> sda        0.00    0.00   720.50     0.00   5.51   0.00     15.66      1.05   1.54   0.57  41.35
> sdc        0.00    0.00  5930.50  2100.50  55.42   8.21     16.23      2.30   0.29   0.06  51.25
> sdb        0.00    0.00  6951.50  2052.50  65.82   8.02     16.80      4.31   0.48   0.07  59.60
>
> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>            9.48  14.72     5.60     3.67    0.00  66.52
>
> Device:  rrqm/s  wrqm/s      r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> sda        0.00    0.00   327.00    19.50   2.55   0.08     15.53      0.04   0.13   0.12   4.25
> sdc        0.00    0.00  3974.50  1403.50  36.37   5.48     15.94      0.99   0.18   0.08  45.10
> sdb        0.00    0.00  4357.50  1535.00  40.83   6.00     16.28      1.10   0.19   0.08  47.45
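For anyone wanting to reproduce the droppable-tombstone figure Keith quotes on their own cluster: one way
(a rough sketch, assuming the sstablemetadata tool shipped under tools/bin in the 2.0 line and a default
data directory layout; paths are illustrative) is to ask each SSTable of the suspect table directly:

    # Illustrative paths; adjust keyspace/table and data directory as needed.
    for f in /var/lib/cassandra/data/users/global_user_event_skus_v2/*-Data.db; do
        echo "$f"
        /path/to/cassandra/tools/bin/sstablemetadata "$f" | grep -i droppable
    done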
> From: DuyHai Doan <doanduy...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:44 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Hot, large row
>
> For global_user_event_skus_v2:
>
> 1. The number of SSTables per read is quite huge. Considering you're using LCS, it means that LCS cannot
> keep up with the write rate and is left behind. AFAIK LCS falls back to size-tiered compaction at L0 to
> cope with extreme write bursts. Your high number of SSTables per read is quite fishy here.
>
> 2. Write latency is widely spread, up to 73.457 ms, meaning that your node is falling behind on writes
> in some cases. Most writes are still below 1 ms, but that is not the concern; what matters here is the
> long tail of write latencies climbing up to 73 ms.
>
> 3. Same remark for read latency, which is worse because the distribution is even "flatter", with worst
> cases going up to 100 ms.
>
> If I were you, I'd check disk I/O first, and maybe CPU usage.
>
> On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright <kwri...@nanigans.com> wrote:
>
>> Cfhistograms for the tables I believe are most likely the issue are below, taken on the node that most
>> recently presented the issue. Any ideas? Note that these tables use LCS and have droppable tombstone
>> ratios of 27% for global_user_event_skus_v2 and 2.7% for the other. Table definitions are also below;
>> you can see that they do use compound keys.
>>
>> Thanks
>>
>> CREATE TABLE global_user_event_skus_v2 (
>>     user_id timeuuid,
>>     app_id int,
>>     type text,
>>     name text,
>>     sku text,
>>     creationtime timestamp,
>>     PRIMARY KEY (user_id, app_id, type, name, sku)
>> ) WITH
>>     bloom_filter_fp_chance=0.100000 AND
>>     caching='KEYS_ONLY' AND
>>     comment='' AND
>>     dclocal_read_repair_chance=0.000000 AND
>>     gc_grace_seconds=86400 AND
>>     index_interval=128 AND
>>     read_repair_chance=0.100000 AND
>>     replicate_on_write='true' AND
>>     populate_io_cache_on_flush='false' AND
>>     default_time_to_live=2592000 AND
>>     speculative_retry='99.0PERCENTILE' AND
>>     memtable_flush_period_in_ms=0 AND
>>     compaction={'class': 'LeveledCompactionStrategy'} AND
>>     compression={'sstable_compression': 'LZ4Compressor'};
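Given that primary key, the "fetch ALL SKUs the user has seen" read described earlier in the thread is a
whole-partition slice, roughly along these lines (a sketch, not necessarily the application's exact
prepared statement):

    SELECT app_id, type, name, sku, creationtime
    FROM users.global_user_event_skus_v2
    WHERE user_id = ?;

Each such read has to merge the partition from however many SSTables it is currently spread across, which
is what the SSTables-per-read histogram below is counting.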
>> users/global_user_event_skus_v2 histograms
>>
>> SSTables per Read (sstables: reads)
>> 1: 1398122   2: 231100   3: 97334   4: 56496   5: 43046   6: 35471
>> 7: 26888   8: 21102   10: 29522   12: 18314   14: 11368   17: 10283
>> 20: 5766   24: 3554   29: 1681   35: 521   42: 102   50: 1
>>
>> Write Latency (us: writes)
>> 2: 532   3: 10728   4: 49608   5: 173718   6: 404131   7: 697468
>> 8: 981048   10: 2643680   12: 3183727   14: 3150881   17: 3746333   20: 2298481
>> 24: 1647604   29: 1070733   35: 693300   42: 442120   50: 288618   60: 205061
>> 72: 141194   86: 98781   103: 73629   124: 54900   149: 40518   179: 30330
>> 215: 23953   258: 18875   310: 14147   372: 10509   446: 7984   535: 6257
>> 642: 4903   770: 4086   924: 3252   1109: 2676   1331: 2129   1597: 1753
>> 1916: 1488   2299: 1237   2759: 1078   3311: 866   3973: 704   4768: 551
>> 5722: 553   6866: 774   8239: 443   9887: 303   11864: 222   14237: 181
>> 17084: 146   20501: 147   24601: 499   29521: 842   35425: 548   42510: 303
>> 51012: 560   61214: 842   73457: 600   88148: 283   105778: 173   126934: 81
>> 152321: 18   182785: 2
>>
>> Read Latency (us: reads)
>> 35: 2   42: 72   50: 737   60: 3555   72: 6926   86: 8373
>> 103: 15287   124: 30054   149: 46164   179: 67828   215: 86741   258: 92178
>> 310: 82205   372: 69641   446: 64710   535: 58295   642: 55247   770: 57918
>> 924: 56928   1109: 75762   1331: 122306   1597: 141564   1916: 117301   2299: 102293
>> 2759: 98789   3311: 86923   3973: 75304   4768: 64012   5722: 54973   6866: 47221
>> 8239: 41483   9887: 35015   11864: 30249   14237: 23505   17084: 18364   20501: 12992
>> 24601: 8837   29521: 6497   35425: 4840   42510: 3536   51012: 2471   61214: 2784
>> 73457: 3664   88148: 3219   105778: 1980   126934: 1050   152321: 519   182785: 184
>> 219342: 82   263210: 39   315852: 22   379022: 16   454826: 0   545791: 0
>> 654949: 1   785939: 1   943127: 0   1131752: 0   1358102: 0   1629722: 3
>> 1955666: 3   2346799: 1   2816159: 0   3379391: 0   4055269: 0   4866323: 0
>> 5839588: 0   7007506: 4
>>
>> Partition Size (bytes: partitions)
>> 35: 31   42: 0   50: 0   60: 0   72: 0   86: 0
>> 103: 761108   124: 23596164   149: 22963275   179: 3517971   215: 6395335   258: 6197982
>> 310: 4662589   372: 5261934   446: 3032032   535: 3775310   642: 2879935   770: 3002743
>> 924: 2279900   1109: 1976357   1331: 1643875   1597: 1422199   1916: 1142746   2299: 901584
>> 2759: 760173   3311: 596874   3973: 480502   4768: 380185   5722: 303315   6866: 239013
>> 8239: 187337   9887: 147218   11864: 115684   14237: 89718   17084: 69160   20501: 53070
>> 24601: 40496   29521: 30344   35425: 22656   42510: 16419   51012: 11621   61214: 8020
>> 73457: 5561   88148: 3708   105778: 2434   126934: 1561   152321: 1017   182785: 559
>> 219342: 349   263210: 181   315852: 121   379022: 66   454826: 22   545791: 14
>> 654949: 10   785939: 2   943127: 0   1131752: 4   1358102: 2   1629722: 1
>> 1955666: 2   2346799: 0   2816159: 0   3379391: 0   4055269: 0   4866323: 1
>>
>> Cell Count per Partition (cells: partitions)
>> 1: 50599458   2: 13975363   3: 7927898   4: 4981142   5: 3081166   6: 3034895
>> 7: 1896720   8: 1508849   10: 2418845   12: 1631289   14: 1179269   17: 1301647
>> 20: 919744   24: 864313   29: 735203   35: 597749   42: 469611   50: 367422
>> 60: 312697   72: 251782   86: 197717   103: 161244   124: 132074   149: 103711
>> 179: 81760   215: 63360   258: 48982   310: 37932   372: 28328   446: 21347
>> 535: 15373   642: 10959   770: 7643   924: 5346   1109: 3613   1331: 2291
>> 1597: 1558   1916: 885   2299: 577   2759: 327   3311: 169   3973: 116
>> 4768: 57   5722: 23   6866: 15   8239: 9   9887: 3   11864: 0
>> 14237: 5   17084: 1   20501: 0   24601: 2   29521: 0   35425: 0
>> 42510: 0   51012: 0   61214: 1
>> CREATE TABLE global_user_events (
>>     user_id timeuuid,
>>     app_id int,
>>     type text,
>>     name text,
>>     extra_param map<text, text>,
>>     paid boolean,
>>     value float,
>>     PRIMARY KEY (user_id, app_id, type, name)
>> ) WITH
>>     bloom_filter_fp_chance=0.100000 AND
>>     caching='KEYS_ONLY' AND
>>     comment='' AND
>>     dclocal_read_repair_chance=0.000000 AND
>>     gc_grace_seconds=86400 AND
>>     index_interval=128 AND
>>     read_repair_chance=0.100000 AND
>>     replicate_on_write='true' AND
>>     populate_io_cache_on_flush='false' AND
>>     default_time_to_live=0 AND
>>     speculative_retry='99.0PERCENTILE' AND
>>     memtable_flush_period_in_ms=0 AND
>>     compaction={'class': 'LeveledCompactionStrategy'} AND
>>     compression={'sstable_compression': 'LZ4Compressor'};
>>
>> users/global_user_events histograms
>>
>> SSTables per Read (sstables: reads)
>> 1: 1024446   2: 632948   3: 230768   4: 112428   5: 82651   6: 61439
>> 7: 47518   8: 38490   10: 58054   12: 41128   14: 29642   17: 31056
>> 20: 20474   24: 17799   29: 13744   35: 9485   42: 5933   50: 3607
>> 60: 1618   72: 263   86: 10
>>
>> Write Latency (us: writes)
>> 10: 10   12: 75   14: 319   17: 3550   20: 20617   24: 105254
>> 29: 311290   35: 494404   42: 516301   50: 534061   60: 421438   72: 314377
>> 86: 375240   103: 333402   124: 816891   149: 667832   179: 430659   215: 336063
>> 258: 380606   310: 235592   372: 68679   446: 14988   535: 6169   642: 3763
>> 770: 2591   924: 2270   1109: 1985   1331: 1744   1597: 1564   1916: 1489
>> 2299: 1326   2759: 1114   3311: 1024   3973: 984   4768: 833   5722: 962
>> 6866: 1076   8239: 891   9887: 611   11864: 489   14237: 324   17084: 306
>> 20501: 222   24601: 426   29521: 791   35425: 820   42510: 534   51012: 530
>> 61214: 946   73457: 1122   88148: 714   105778: 313   126934: 127   152321: 38
>> 182785: 3   219342: 2   263210: 0   315852: 0   379022: 0   454826: 2
>> 545791: 0   654949: 3   785939: 0   943127: 0   1131752: 0   1358102: 0
>> 1629722: 0   1955666: 0   2346799: 1   2816159: 0   3379391: 0   4055269: 0
>> 4866323: 0   5839588: 0   7007506: 1
>> Read Latency (us: reads)
>> 42: 1   50: 2   60: 2   72: 4   86: 15   103: 23
>> 124: 29   149: 48   179: 67   215: 80   258: 316   310: 1812
>> 372: 4999   446: 8876   535: 14586   642: 24417   770: 31943   924: 37453
>> 1109: 77449   1331: 139135   1597: 177538   1916: 169568   2299: 177617   2759: 202552
>> 3311: 198101   3973: 179807   4768: 159149   5722: 142282   6866: 126124   8239: 111529
>> 9887: 97658   11864: 84913   14237: 70905   17084: 54917   20501: 40623   24601: 29545
>> 29521: 21448   35425: 15255   42510: 11222   51012: 8150   61214: 7473   73457: 10060
>> 88148: 10530   105778: 7573   126934: 4331   152321: 2097   182785: 836   219342: 255
>> 263210: 84   315852: 28   379022: 8   454826: 2   545791: 0   654949: 2
>> 785939: 1   943127: 0   1131752: 0   1358102: 0   1629722: 2   1955666: 6
>> 2346799: 18   2816159: 0   3379391: 0   4055269: 0   4866323: 0   5839588: 0
>> 7007506: 34
>>
>> Partition Size (bytes: partitions)
>> 35: 25   42: 0   50: 0   60: 0   72: 21761   86: 694466658
>> 103: 219007195   124: 7034596   149: 110269079   179: 101276811   215: 38518027   258: 103242356
>> 310: 51434914   372: 38048245   446: 42558831   535: 32869176   642: 17188026   770: 14750559
>> 924: 15963256   1109: 16125324   1331: 11462272   1597: 6810452   1916: 5685739   2299: 10232324
>> 2759: 5138522   3311: 5847998   3973: 4065752   4768: 3468263   5722: 2545431   6866: 1701913
>> 8239: 993808   9887: 495282   11864: 208644   14237: 74307   17084: 22286   20501: 5474
>> 24601: 1225   29521: 210   35425: 97   42510: 48   51012: 21   61214: 15
>> 73457: 10   88148: 3   105778: 4   126934: 1   152321: 2   182785: 1
>>
>> Cell Count per Partition (cells: partitions)
>> 1: 914323128   2: 196020828   3: 165502164   4: 71735648   5: 54450466   6: 28995756
>> 7: 9625976   8: 10257788   10: 24004757   12: 13087147   14: 13588347   17: 11976353
>> 20: 6245374   24: 5627821   29: 9480533   35: 6008276   42: 5221357   50: 4355454
>> 60: 3955488   72: 2872578   86: 1914447   103: 1216276   124: 652035   149: 279565
>> 179: 99836   215: 29132   258: 6682   310: 1276   372: 258   446: 96
>> 535: 48   642: 25   770: 9   924: 9   1109: 4   1331: 3
>> 1597: 1   1916: 2
>> From: Jack Krupansky <j...@basetechnology.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Thursday, July 24, 2014 at 4:07 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Hot, large row
>>
>> Could it be some "fat columns" (cells with large blob or text values) rather than the cell count per
>> se? IOW, a "big row" rather than a "wide row"?
>>
>> And could it be a large partition rather than a large row (many rows in a single partition)? Are
>> clustering columns being used in the primary key?
>>
>> -- Jack Krupansky
>>
>> From: DuyHai Doan <doanduy...@gmail.com>
>> Sent: Thursday, July 24, 2014 3:53 PM
>> To: user@cassandra.apache.org
>> Subject: Re: Hot, large row
>>
>> Your extract of cfhistograms shows that there are no particularly "wide rows". The widest has 61214
>> cells, which is big but not so huge as to be a real concern.
>>
>> Turning on trace probability only gives you some "hints" about what kind of queries are being run; it
>> does not give the exact partition key nor other statement values, especially when you are using
>> prepared statements ...
>>
>> "I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the
>> impact when this occurs." --> Don't do that, you'll only sweep the dust under the carpet. Find the real
>> issue and fix it instead of changing a parameter to hide it.
>>
>> One solution would be, on the client side, to activate some logging to show the CQL3 statements the
>> application is issuing that may be overloading the server. I know that's easier said than done, but I
>> don't have any other idea for the moment.
>>
>> -------- Shameless self-promotion --------
>>
>> To support this kind of live prod debugging & investigation, I added a new dynamic query logging
>> feature in Achilles:
>> https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
>>
>> Once you hit the issue, this kind of feature may save your day...
>>
>> On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kwri...@nanigans.com> wrote:
>>> I can see from cfhistograms that I do have some wide rows (see below). I set the trace probability as
>>> you suggested, but the output doesn't appear to tell me what row was actually read unless I missed
>>> something; I just see "executing prepared statement". Any ideas how I can find the row in question?
>>>
>>> I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the
>>> impact when this occurs.
>>>
>>> Any help in identifying my issue would be GREATLY appreciated.
>>>
>>> Cell Count per Partition (cells: partitions)
>>> 1: 50449950   2: 14281828   3: 8093366   4: 5029200   5: 3103023   6: 3059903
>>> 7: 1903018   8: 1509297   10: 2420359   12: 1624895   14: 1171678   17: 1289391
>>> 20: 909777   24: 852081   29: 722925   35: 587067   42: 459473   50: 358744
>>> 60: 304146   72: 244682   86: 191045   103: 155337   124: 127061   149: 98913
>>> 179: 77454   215: 59849   258: 46117   310: 35321   372: 26319   446: 19379
>>> 535: 13783   642: 9993   770: 6973   924: 4713   1109: 3229   1331: 2062
>>> 1597: 1338   1916: 773   2299: 495   2759: 268   3311: 150   3973: 100
>>> 4768: 42   5722: 24   6866: 12   8239: 9   9887: 3   11864: 0
>>> 14237: 5   17084: 1   20501: 0   24601: 2   29521: 0   35425: 0
>>> 42510: 0   51012: 0   61214: 2
>>>
>>> From: DuyHai Doan <doanduy...@gmail.com>
>>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Date: Thursday, July 24, 2014 at 3:01 PM
>>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>>> Subject: Re: Hot, large row
>>>
>>> "How can I detect wide rows?" -->
>>>
>>> nodetool cfhistograms <keyspace> <suspected column family>
>>>
>>> Look at the "Column count" column (the last one) and identify a line in this column with a very high
>>> "Offset" value. In a well designed application you should have a gaussian distribution where 80% of
>>> your rows have a similar number of columns.
>>>
>>> "Anyone know what debug level I can set so that I can see what reads the hot node is handling?" -->
>>>
>>> "nodetool settraceprobability <value>", where value is a small number (0.001), on the node where you
>>> encounter the issue. Activate the tracing for a while (5 mins) and then deactivate it (value = 0).
>>> Then look into the system_traces tables "events" & "sessions". It may help or not, since the tracing
>>> is only done once every 1000 requests.
>>>
>>> "Any way to get the server to blacklist these wide rows automatically?" --> No
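To make the system_traces suggestion concrete, here is roughly what that inspection can look like in
cqlsh once tracing has been sampled for a few minutes (a sketch based on the stock 2.0 system_traces
schema; the session id below is a placeholder):

    -- List recently sampled queries (duration is in microseconds); scan for large durations:
    SELECT session_id, started_at, duration, request
    FROM system_traces.sessions
    LIMIT 50;

    -- Then drill into one suspicious session; the per-step activity lines include
    -- which partition was read and how many live vs. tombstoned cells were touched:
    SELECT activity, source, source_elapsed
    FROM system_traces.events
    WHERE session_id = 550e8400-e29b-41d4-a716-446655440000;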
>>>> >>>> We’re using C* 2.0.6 with Vnodes. >>>> >>>> Thanks >>>> >>> >>> >> >> > > -- *Ken Hancock *| System Architect, Advanced Advertising SeaChange International 50 Nagog Park Acton, Massachusetts 01720 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC <http://www.schange.com/en-US/Company/InvestorRelations.aspx> Office: +1 (978) 889-3329 | [image: Google Talk:] ken.hanc...@schange.com | [image: Skype:]hancockks | [image: Yahoo IM:]hancockks [image: LinkedIn] <http://www.linkedin.com/in/kenhancock> [image: SeaChange International] <http://www.schange.com/>This e-mail and any attachments may contain information which is SeaChange International confidential. The information enclosed is intended only for the addressees herein and may not be copied or forwarded without permission from SeaChange International.