Attached are the sstablemetadata outputs from two SSTables of size 28 MB and 52 MB (out2). The records are inserted with different TTLs based on their nature: test records with 1 day, typeA records with 6 months, typeB records with 1 year, etc. There are also explicit DELETEs from this table, though their rate is much lower than the rate of inserts.
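For context, a minimal sketch of how such per-type TTLs could be applied at insert time (illustrative only: the column values are placeholders and the TTL values are approximate conversions of 1 day / 6 months / 1 year to seconds, not taken from the actual application code; the table and columns are from the anonymised schema quoted further down):

    cqlsh -e "INSERT INTO db.tbl (id1, id2, id3, id4, f1) VALUES ('test-key', 'a', 'b', 'c', 'v') USING TTL 86400;"      # test record: 1 day
    cqlsh -e "INSERT INTO db.tbl (id1, id2, id3, id4, f1) VALUES ('typeA-key', 'a', 'b', 'c', 'v') USING TTL 15552000;"  # typeA: ~6 months (180 days)
    cqlsh -e "INSERT INTO db.tbl (id1, id2, id3, id4, f1) VALUES ('typeB-key', 'a', 'b', 'c', 'v') USING TTL 31536000;"  # typeB: 1 year (365 days)

Once a TTL expires, the cell is treated as a tombstone and only becomes droppable after gc_grace_seconds (864000 s, i.e. 10 days, per the table definition quoted further down), which is what feeds the "Estimated droppable tombstones" figures in the attached outputs.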
I am not sure how to interpret this output, or whether the right SSTables were
picked. Please advise. Also, is there a way to identify the SSTables
corresponding to the keys that timed out, given that those keys are accessible
later? (A possible approach with nodetool getsstables is sketched just before
the attached outputs below.)

On Mon, Sep 5, 2016 at 10:58 PM, Anshu Vajpayee <anshu.vajpa...@gmail.com>
wrote:

> We have seen read timeout issues in Cassandra due to a high droppable
> tombstone ratio for a repository.
>
> Please check for a high droppable tombstone ratio for your repo.
>
> On Mon, Sep 5, 2016 at 8:11 PM, Romain Hardouin <romainh...@yahoo.fr>
> wrote:
>
>> Yes, dclocal_read_repair_chance will reduce the cross-DC traffic and
>> latency, so you can swap the values
>> (https://issues.apache.org/jira/browse/CASSANDRA-7320). I guess
>> sstable_size_in_mb was set to 50 because back in the day (C* 1.0) the
>> default size was way too small: 5 MB. So maybe someone in your company
>> tried "10 * the default", i.e. 50 MB. Now the default is 160 MB. I'm not
>> saying you should change the value, just keep in mind that you're using
>> a small value here; it could help you someday.
>>
>> Regarding the cells, the histograms show an *estimation* of the min,
>> p50, ..., p99, max of cells based on SSTable metadata. On your
>> screenshot, the max is 4768, so you have a partition key with ~4768
>> cells. The p99 is 1109, so 99% of your partition keys have less than
>> (or equal to) 1109 cells. You can see these data for a given SSTable
>> with the sstablemetadata tool.
>>
>> Best,
>>
>> Romain
>>
>> On Monday, 5 September 2016 at 15:17, Joseph Tech
>> <jaalex.t...@gmail.com> wrote:
>>
>> Thanks, Romain. We will try to enable the DEBUG logging (assuming it
>> won't clog the logs much). Regarding the table configs,
>> read_repair_chance must have been carried over from older versions -
>> mostly defaults. I think sstable_size_in_mb was set to limit the max
>> SSTable size, though I am not sure of the reason for the 50 MB value.
>>
>> Does setting dclocal_read_repair_chance help in reducing cross-DC
>> traffic? (I haven't looked into this parameter, just going by the name.)
>>
>> On the cell count definition: is it incremented based on the number of
>> writes for a given name (key?) and value? This table is heavy on reads
>> and writes; if so, shouldn't the value be much higher?
>>
>> On Mon, Sep 5, 2016 at 7:35 AM, Romain Hardouin <romainh...@yahoo.fr>
>> wrote:
>>
>> Hi,
>>
>> Try to put org.apache.cassandra.db.ConsistencyLevel at DEBUG level; it
>> could help to find a regular pattern. By the way, I see that you have
>> set a global read repair chance:
>>     read_repair_chance = 0.1
>> and not the local read repair:
>>     dclocal_read_repair_chance = 0.0
>> Is there any reason to do that, or is it just the old (pre-2.0.9)
>> default configuration?
>>
>> The cell count is the number of triplets: (name, value, timestamp).
>>
>> Also, I see that you have set sstable_size_in_mb to 50 MB. What is the
>> rationale behind this? (Yes, I'm curious :-) ). Anyway, your "SSTables
>> per read" look good.
>>
>> Best,
>>
>> Romain
>>
>> On Monday, 5 September 2016 at 13:32, Joseph Tech
>> <jaalex.t...@gmail.com> wrote:
>>
>> Hi Ryan,
>>
>> Attached are the cfhistograms run within a few minutes of each other.
>> On the surface, I don't see anything that indicates too much skewing
>> (assuming skewing == keys spread across many SSTables). Please confirm.
>> Related to this, what does the "cell count" metric indicate? I didn't
>> find a clear explanation in the documentation.
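As a concrete way of checking the skew and cell-count figures discussed above, the per-table histograms can be pulled with nodetool (a sketch; the keyspace/table names follow the anonymised schema quoted further down, and the column layout described is the Cassandra 2.1-era output):

    nodetool cfhistograms db tbl
    # Prints one row per percentile (50%, 75%, 95%, 98%, 99%, Min, Max) with
    # columns for SSTables touched per read, read/write latency, partition
    # size and cell count per partition. A Max cell count far above the 99%
    # value points to a few unusually wide partitions.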
>>
>> Thanks,
>> Joseph
>>
>> On Thu, Sep 1, 2016 at 6:30 PM, Ryan Svihla <r...@foundev.pro> wrote:
>>
>> Have you looked at cfhistograms/tablehistograms? Your data may just be
>> skewed (the most likely explanation is probably the correct one here).
>>
>> Regards,
>>
>> Ryan Svihla
>>
>> _____________________________
>> From: Joseph Tech <jaalex.t...@gmail.com>
>> Sent: Wednesday, August 31, 2016 11:16 PM
>> Subject: Re: Read timeouts on primary key queries
>> To: <user@cassandra.apache.org>
>>
>> Patrick,
>>
>> The desc table is below (only column names changed):
>>
>> CREATE TABLE db.tbl (
>>     id1 text,
>>     id2 text,
>>     id3 text,
>>     id4 text,
>>     f1 text,
>>     f2 map<text, text>,
>>     f3 map<text, text>,
>>     created timestamp,
>>     updated timestamp,
>>     PRIMARY KEY (id1, id2, id3, id4)
>> ) WITH CLUSTERING ORDER BY (id2 ASC, id3 ASC, id4 ASC)
>>     AND bloom_filter_fp_chance = 0.01
>>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>>     AND comment = ''
>>     AND compaction = {'sstable_size_in_mb': '50', 'class':
>>         'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>>     AND compression = {'sstable_compression':
>>         'org.apache.cassandra.io.compress.LZ4Compressor'}
>>     AND dclocal_read_repair_chance = 0.0
>>     AND default_time_to_live = 0
>>     AND gc_grace_seconds = 864000
>>     AND max_index_interval = 2048
>>     AND memtable_flush_period_in_ms = 0
>>     AND min_index_interval = 128
>>     AND read_repair_chance = 0.1
>>     AND speculative_retry = '99.0PERCENTILE';
>>
>> and the query is:
>>     select * from tbl where id1=? and id2=? and id3=? and id4=?
>>
>> The timeouts happen within ~2 s to ~5 s, while the successful calls
>> have an average of 8 ms and a p99 of 15 ms. These times are seen from
>> the app side; the actual query times would be slightly lower.
>>
>> Is there a way to capture traces only when queries take longer than a
>> specified duration? We can't enable tracing in production given the
>> volume of traffic. We see that the same query which timed out works
>> fine later, so I'm not sure whether the trace of a successful run would
>> help.
>>
>> Thanks,
>> Joseph
>>
>> On Wed, Aug 31, 2016 at 8:05 PM, Patrick McFadin <pmcfa...@gmail.com>
>> wrote:
>>
>> If you are getting a timeout on one table, then a mismatch of RF and
>> node count doesn't seem as likely.
>>
>> Time to look at your query. You said it was a 'select * from table
>> where key=?' type query. I would next use the trace facility in cqlsh
>> to investigate further. That's a good way to find hard-to-find issues.
>> You should be looking for a clear ledge where you go from single-digit
>> ms to 4- or 5-digit ms times.
>>
>> The other place to look is your data model for that table, if you want
>> to post the output from a desc table.
>>
>> Patrick
>>
>> On Tue, Aug 30, 2016 at 11:07 AM, Joseph Tech <jaalex.t...@gmail.com>
>> wrote:
>>
>> On further analysis, this issue happens only on 1 table in the
>> keyspace, which has the most reads.
>>
>> @Atul, I will look at system health, but I didn't see anything standing
>> out from the GC logs (using JDK 1.8_92 with G1GC).
>>
>> @Patrick, could you please elaborate on the "mismatch on node count +
>> RF" part?
>>
>> On Tue, Aug 30, 2016 at 5:35 PM, Atul Saroha <atul.sar...@snapdeal.com>
>> wrote:
>>
>> There could be many reasons for this if it is intermittent: CPU usage
>> and I/O wait status - as reads are I/O intensive, your IOPS requirement
>> should be met under that load; a heap issue if the CPU is busy with GC
>> only; or network health. So it's better to look at system health during
>> the time when it happens.
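A rough sketch of the kind of system-health checks Atul suggests, to be run on a suspect node during the timeout window (the GC log path is an assumption and depends on how GC logging was configured in cassandra-env.sh; iostat/vmstat come from the sysstat/procps packages):

    iostat -x 5 3                                          # per-device utilisation and await, to spot I/O saturation
    vmstat 5 3                                             # run queue, swap activity and CPU wait at a glance
    grep -i "pause" /var/log/cassandra/gc.log | tail -20   # recent G1 pause entries, if GC logging is enabled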
>>
>> -------------------------------------------------------------------------
>> Atul Saroha
>> Lead Software Engineer
>> M: +91 8447784271  T: +91 124-415-6069  EXT: 12369
>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>> Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>
>> On Tue, Aug 30, 2016 at 5:10 PM, Joseph Tech <jaalex.t...@gmail.com>
>> wrote:
>>
>> Hi Patrick,
>>
>> nodetool status shows all nodes up and normal now. From the OpsCenter
>> "Event Log", there are some nodes reported as being down/up etc. during
>> the timeframe of the timeout, but these are Search workload nodes from
>> the remote (non-local) DC. The RF is 3 and there are 9 nodes per DC.
>>
>> Thanks,
>> Joseph
>>
>> On Mon, Aug 29, 2016 at 11:07 PM, Patrick McFadin <pmcfa...@gmail.com>
>> wrote:
>>
>> You aren't achieving quorum on your reads, as the error explains. That
>> means you either have some nodes down or your topology is not matching
>> up. The fact that you are using LOCAL_QUORUM might point to a
>> datacenter mismatch on node count + RF.
>>
>> What does your nodetool status look like?
>>
>> Patrick
>>
>> On Mon, Aug 29, 2016 at 10:14 AM, Joseph Tech <jaalex.t...@gmail.com>
>> wrote:
>>
>> Hi,
>>
>> We recently started getting intermittent timeouts on primary key
>> queries (select * from table where key=<key>).
>>
>> The error is:
>>     com.datastax.driver.core.exceptions.ReadTimeoutException:
>>     Cassandra timeout during read query at consistency LOCAL_QUORUM
>>     (2 responses were required but only 1 replica responded)
>>
>> The same query would work fine when tried directly from cqlsh. There
>> are no indications in system.log for the table in question, though
>> there were compactions in progress for tables in another keyspace which
>> is more frequently accessed.
>>
>> My understanding is that the chances of primary key queries timing out
>> are very minimal. Please share the possible reasons / ways to debug
>> this issue.
>>
>> We are using Cassandra 2.1 (DSE 4.8.7).
>>
>> Thanks,
>> Joseph
>
> --
> Regards,
> Anshu
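On the two questions that are still open - mapping a timed-out key to its SSTables, and tracing without turning tracing on for all traffic - the commands below should all be available in Cassandra 2.1 / DSE 4.8 and are a reasonable starting point (the keyspace name, key value and file path are placeholders):

    nodetool getsstables <keyspace> tbl '<id1-value-that-timed-out>'
    # Lists the SSTable files that contain the given partition key, so each
    # can be fed to sstablemetadata as below.

    sstablemetadata /path/to/tbl-<generation>-Data.db | grep -iE "droppable|Repaired at|SSTable Level"
    # Pulls out just the droppable-tombstone estimate, repair state and LCS
    # level from outputs like the ones attached below.

    nodetool settraceprobability 0.001
    # Server-side sampling: traces roughly 0.1% of requests into
    # system_traces. It cannot be restricted to only slow queries, but it is
    # far cheaper than cqlsh's TRACING ON for a production workload; set it
    # back to 0 when done.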
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Bloom Filter FP chance: 0.010000 Minimum timestamp: 1470438103239000 Maximum timestamp: 1473094828984000 SSTable max local deletion time: 1507654828 Compression ratio: 0.29882690692498587 Estimated droppable tombstones: 0.3068117974841481 SSTable Level: 0 Repaired at: 0 ReplayPosition(segmentId=1472205080751, position=57737) Estimated tombstone drop times:%n 1473088180: 1280 1473088440: 1290 1473088689: 1310 1473088946: 1464 1473089219: 1444 1473089497: 1489 1473089795: 1713 1473090088: 1500 1473090397: 1805 1473090687: 1388 1473090958: 1466 1473091241: 1628 1473091530: 1746 1473091813: 1450 1473092061: 1259 1473092296: 1273 1473092519: 1231 1473092753: 1277 1473092992: 1388 1473093223: 1324 1473093454: 1364 1473093678: 1351 1473093916: 1445 1473094163: 1529 1473094428: 1743 1473094710: 2358 1473174524: 46 1473174861: 68 1473175253: 78 1473175539: 89 1473175810: 47 1473176066: 28 1473176364: 57 1473176650: 11 1473176980: 66 1473177465: 35 1473177746: 86 1473178119: 39 1473178382: 52 1473178727: 66 1473178944: 10 1473179222: 140 1473179598: 69 1473179912: 98 1473180217: 6 1473180544: 81 1473180865: 49 1473181154: 75 1488640175: 1848 1488640437: 1888 1488640696: 1783 1488640965: 2358 1488641280: 2438 1488641576: 2283 1488641855: 2088 1488642123: 2018 1488642427: 2629 1488642750: 2332 1488643058: 2327 1488643350: 2281 1488643611: 2056 1488643865: 1979 1488644094: 1713 1488644324: 1911 1488644573: 2131 1488644822: 1707 1488645043: 1704 1488645260: 1661 1488645476: 1776 1488645702: 2022 1488645939: 2035 1488646187: 2023 1488646438: 2212 1488646703: 2509 1504998103: 1 1507648181: 1624 1507648438: 1669 1507648698: 1746 1507648964: 1948 1507649245: 1883 1507649525: 1760 1507649785: 1744 1507650053: 1822 1507650324: 1790 1507650567: 1630 1507650820: 1669 1507651084: 1928 1507651353: 1859 1507651618: 2000 1507651919: 2342 1507652222: 1894 1507652478: 1850 1507652724: 1606 1507652954: 1882 1507653204: 2020 1507653488: 2371 1507653794: 2425 1507654090: 2355 1507654380: 2435 1507654701: 4267 Count Row Size Cell Count 1 0 60 2 0 3419 3 0 824 4 0 1789 5 0 11810 6 0 4801 7 0 432 8 0 709 10 0 339 12 0 113 14 0 84 17 0 13 20 0 4 24 0 0 29 0 0 35 0 0 42 0 0 50 0 0 60 0 0 72 0 0 86 0 0 103 215 0 124 778 0 149 5 0 179 4 0 215 129 0 258 108 0 310 3199 0 372 830 0 446 1348 0 535 11690 0 642 4307 0 770 865 0 924 586 0 1109 211 0 1331 77 0 1597 31 0 1916 14 0 2299 0 0 2759 0 0 3311 0 0 3973 0 0 4768 0 0 5722 0 0 6866 0 0 8239 0 0 9887 0 0 11864 0 0 14237 0 0 17084 0 0 20501 0 0 24601 0 0 29521 0 0 35425 0 0 42510 0 0 51012 0 0 61214 0 0 73457 0 0 88148 0 0 105778 0 0 126934 0 0 152321 0 0 182785 0 0 219342 0 0 263210 0 0 315852 0 0 379022 0 0 454826 0 0 545791 0 0 654949 0 0 785939 0 0 943127 0 0 1131752 0 0 1358102 0 0 1629722 0 0 1955666 0 0 2346799 0 0 2816159 0 0 3379391 0 0 4055269 0 0 4866323 0 0 5839588 0 0 7007506 0 0 8409007 0 0 10090808 0 0 12108970 0 0 14530764 0 0 17436917 0 0 20924300 0 0 25109160 0 0 30130992 0 0 36157190 0 0 43388628 0 0 52066354 0 0 62479625 0 0 74975550 0 0 89970660 0 0 107964792 0 0 129557750 0 0 155469300 0 0 186563160 0 0 223875792 0 0 268650950 0 0 322381140 0 0 386857368 0 0 464228842 0 0 557074610 0 0 668489532 0 0 802187438 0 0 962624926 0 0 1155149911 0 0 1386179893 0 0 1663415872 0 0 1996099046 0 0 2395318855 0 0 2874382626 0 3449259151 0 4139110981 0 4966933177 0 5960319812 0 7152383774 0 8582860529 0 10299432635 0 12359319162 0 14831182994 0 17797419593 0 21356903512 0 25628284214 0 30753941057 0 36904729268 0 
44285675122 0 53142810146 0 63771372175 0 76525646610 0 91830775932 0 110196931118 0 132236317342 0 158683580810 0 190420296972 0 228504356366 0 274205227639 0 329046273167 0 394855527800 0 473826633360 0 568591960032 0 682310352038 0 818772422446 0 982526906935 0 1179032288322 0 1414838745986 0 Ancestors: [] Estimated cardinality: 24082
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner Bloom Filter FP chance: 0.010000 Minimum timestamp: 1469014194110000 Maximum timestamp: 1473088050958000 SSTable max local deletion time: 1507648050 Compression ratio: 0.33329256554645664 Estimated droppable tombstones: 0.1604462041694866 SSTable Level: 2 Repaired at: 0 ReplayPosition(segmentId=1472205080495, position=242000) Estimated tombstone drop times:%n 1472293974: 33975 1472460567: 40569 1472621785: 47005 1472763430: 130629 1472901148: 39655 1473035274: 34167 1473150249: 36 1479196942: 286632 1479362551: 1347 1479488000: 1382 1479628178: 1550 1479778999: 1178 1479896919: 1624 1480021045: 1618 1480154765: 2354 1480318893: 1899 1480494073: 1793 1480666994: 2020 1480839763: 1736 1481011721: 2556 1481184075: 1873 1481354276: 2026 1481532538: 1882 1481703235: 2350 1481866320: 1922 1482009978: 1351 1482151699: 1578 1482301309: 2176 1482471997: 1832 1482652333: 1794 1482826941: 1998 1482970119: 1462 1483098773: 1694 1483228374: 1257 1483349652: 1171 1483527044: 1644 1483691738: 2064 1484575499: 7 1484678550: 185 1484850845: 111 1485034385: 992 1485307567: 16850 1485436838: 15731 1485592343: 19528 1485727310: 15062 1485857832: 19797 1485977955: 14364 1486102961: 18238 1486235653: 18939 1486376806: 25344 1486537031: 42391 1486658511: 21062 1486779858: 24470 1486900557: 29042 1487025183: 22828 1487163474: 24335 1487293380: 22496 1487423399: 26684 1487565634: 25580 1487735458: 54255 1487879197: 39179 1488017935: 49447 1488154279: 48401 1488285774: 57254 1488432734: 56824 1488574819: 46616 1502468942: 15 1502873867: 1790 1503042015: 1932 1503210218: 1910 1503374086: 2077 1503526671: 1613 1503695878: 1692 1503878112: 2124 1504031993: 4911 1504173132: 1881 1504334968: 3104 1504489996: 2664 1504622831: 2439 1504775175: 2685 1504907116: 22205 1505095707: 4663 1505245449: 23604 1505371434: 12417 1505551693: 25219 1505671138: 9300 1505799510: 11556 1505923504: 10509 1506044660: 10586 1506174124: 9877 1506327658: 13084 1506476113: 10634 1506589098: 2365 1506713516: 25542 1506848766: 35031 1507020430: 39925 1507188070: 44071 1507328084: 76179 1507465340: 39837 1507592640: 36630 Count Row Size Cell Count 1 0 6638 2 0 10491 3 0 3453 4 0 40507 5 0 164082 6 0 85172 7 0 5283 8 0 7223 10 0 6485 12 0 3391 14 0 1067 17 0 839 20 0 245 24 1 84 29 0 35 35 0 5 42 0 0 50 0 0 60 0 0 72 0 0 86 0 0 103 713 0 124 4586 0 149 58 0 179 1248 0 215 253 0 258 12856 0 310 24 0 372 36865 0 446 76005 0 535 148029 0 642 30539 0 770 11504 0 924 8190 0 1109 1657 0 1331 1372 0 1597 750 0 1916 240 0 2299 84 0 2759 20 0 3311 5 0 3973 1 0 4768 0 0 5722 0 0 6866 0 0 8239 0 0 9887 0 0 11864 0 0 14237 0 0 17084 0 0 20501 0 0 24601 0 0 29521 0 0 35425 0 0 42510 0 0 51012 0 0 61214 0 0 73457 0 0 88148 0 0 105778 0 0 126934 0 0 152321 0 0 182785 0 0 219342 0 0 263210 0 0 315852 0 0 379022 0 0 454826 0 0 545791 0 0 654949 0 0 785939 0 0 943127 0 0 1131752 0 0 1358102 0 0 1629722 0 0 1955666 0 0 2346799 0 0 2816159 0 0 3379391 0 0 4055269 0 0 4866323 0 0 5839588 0 0 7007506 0 0 8409007 0 0 10090808 0 0 12108970 0 0 14530764 0 0 17436917 0 0 20924300 0 0 25109160 0 0 30130992 0 0 36157190 0 0 43388628 0 0 52066354 0 0 62479625 0 0 74975550 0 0 89970660 0 0 107964792 0 0 129557750 0 0 155469300 0 0 186563160 0 0 223875792 0 0 268650950 0 0 322381140 0 0 386857368 0 0 464228842 0 0 557074610 0 0 668489532 0 0 802187438 0 0 962624926 0 0 1155149911 0 0 1386179893 0 0 1663415872 0 0 1996099046 0 0 2395318855 0 0 2874382626 0 3449259151 0 4139110981 0 4966933177 0 5960319812 0 7152383774 0 
8582860529 0 10299432635 0 12359319162 0 14831182994 0 17797419593 0 21356903512 0 25628284214 0 30753941057 0 36904729268 0 44285675122 0 53142810146 0 63771372175 0 76525646610 0 91830775932 0 110196931118 0 132236317342 0 158683580810 0 190420296972 0 228504356366 0 274205227639 0 329046273167 0 394855527800 0 473826633360 0 568591960032 0 682310352038 0 818772422446 0 982526906935 0 1179032288322 0 1414838745986 0 Ancestors: [537] Estimated cardinality: 333024