Are you running repairs within gc_grace_seconds (the default is 10 days)?
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
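As a rough illustration of the question above: the usual guidance is that every node should be repaired at least once per gc_grace_seconds window. A minimal Python sketch of that check, assuming you record the time of the last successful repair yourself (the function name and the timestamps are hypothetical; 864000 seconds is the 10-day default):

```python
from datetime import datetime, timedelta

# Default gc_grace_seconds: 10 days expressed in seconds.
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 864000


def repair_is_overdue(last_repair, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True if more than gc_grace_seconds passed since last_repair."""
    return (now - last_repair) > timedelta(seconds=gc_grace_seconds)


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    now = datetime(2015, 2, 18, 12, 0, 0)
    for last in (datetime(2015, 2, 1), datetime(2015, 2, 10)):
        print(last.date(), "overdue:", repair_is_overdue(last, now))
```

If a table overrides gc_grace_seconds, pass that table's value instead of the default.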
Double-check that you set cold_reads_to_omit to 0.0 on tables with STCS
that you do not read often.

Are you using the default values for the properties
min_compaction_threshold (4) and max_compaction_threshold (32)?

Which Consistency Level are you using for read operations? Check that
you are not reading from DC_B because of your Replication Factor and CL.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam <ptrstp...@gmail.com> wrote:
> I don't have problems with DC_B (the replica); I have read timeouts only
> in DC_A (my system writes only to it).
>
> I checked the SSTable count in OpsCenter and I have:
> 1) in DC_A roughly the same (+-10%) over the last week, with a small
> increase over the last 24h (more than 15000-20000 SSTables, depending
> on the node)
> 2) in DC_B the last 24h show up to a 50% decrease, which looks promising.
> Now I have less than 1000 SSTables
>
> What did you measure during system optimizations? Or do you have an idea
> what more I should check?
> 1) I look at CPU idle (one node is 50% idle, the rest 70% idle)
> 2) Disk queue -> mostly it is near zero: avg 0.09. Sometimes there are
> spikes
> 3) system RAM usage is almost full
> 4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total
> it is less than 10MB/s; in DC_B it looks much better (avg is about 17MB/s)
>
> Anything else?
>
>
>
> On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar <ronibaltha...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> You can check if the number of SSTables is decreasing. Look for the
>> "SSTable count" information of your tables using "nodetool cfstats".
>> The compaction history can be viewed using "nodetool
>> compactionhistory".
>>
>> About the timeouts, check this out:
>> http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
>> Also try running "nodetool tpstats" to see the thread pool statistics. It
>> can tell you whether you are having performance problems.
>> If you are having too many pending tasks or dropped messages, you may
>> need to tune your system (e.g. driver timeouts, concurrent reads and
>> so on).
>>
>> Regards,
>>
>> Roni Balthazar
>>
>> On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam <ptrstp...@gmail.com> wrote:
>> > Hi,
>> > Thanks for your tip. It looks like something changed, but I still don't
>> > know if it is OK.
>> >
>> > My nodes started to do more compactions, but it looks like some
>> > compactions are really slow.
>> > IO is idle and CPU is quite OK (30%-40%). We set compactionthroughput
>> > to 999, but I do not see a difference.
>> >
>> > Can we check something more? Or do you have any method to monitor
>> > progress with small files?
>> >
>> > Regards
>> >
>> > On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar
>> > <ronibaltha...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> Yes... I had the same issue and setting cold_reads_to_omit to 0.0 was
>> >> the solution...
>> >> The number of SSTables decreased from many thousands to a number below
>> >> a hundred and the SSTables are now much bigger, several gigabytes
>> >> each (most of them).
>> >>
>> >> Cheers,
>> >>
>> >> Roni Balthazar
>> >>
>> >>
>> >>
>> >> On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam <ptrstp...@gmail.com> wrote:
>> >> > After some diagnostics (we haven't set cold_reads_to_omit yet):
>> >> > compactions are running but VERY slowly, with "idle" IO.
>> >> >
>> >> > We have a lot of "Data files" in Cassandra. In DC_A there are about
>> >> > ~120000 (only xxx-Data.db); DC_B has only ~4000.
>> >> >
>> >> > I don't know if this changes anything, but:
>> >> > 1) in DC_A the avg size of a Data.db file is ~13 mb. I have a few
>> >> > really big ones, but most are really small (almost 10000 files are
>> >> > less than 100mb).
>> >> > 2) in DC_B the avg size of Data.db is much bigger, ~260mb.
>> >> >
>> >> > Do you think that the above flag will help us?
>> >> >
>> >> >
>> >> > On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam <ptrstp...@gmail.com> wrote:
>> >> >>
>> >> >> I set setcompactionthroughput 999 permanently and it doesn't change
>> >> >> anything. IO is still the same. CPU is idle.
>> >> >>
>> >> >> On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar
>> >> >> <ronibaltha...@gmail.com> wrote:
>> >> >>>
>> >> >>> Hi,
>> >> >>>
>> >> >>> You can run "nodetool compactionstats" to view statistics on
>> >> >>> compactions.
>> >> >>> Setting cold_reads_to_omit to 0.0 can help to reduce the number of
>> >> >>> SSTables when you use Size-Tiered compaction.
>> >> >>> You can also create a cron job to increase the value of
>> >> >>> setcompactionthroughput during the night or when your IO is not
>> >> >>> busy.
>> >> >>>
>> >> >>> From http://wiki.apache.org/cassandra/NodeTool:
>> >> >>> 0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
>> >> >>> 0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
>> >> >>>
>> >> >>> Cheers,
>> >> >>>
>> >> >>> Roni Balthazar
>> >> >>>
>> >> >>> On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam <ptrstp...@gmail.com>
>> >> >>> wrote:
>> >> >>> > One thing I do not understand. In my case compaction is running
>> >> >>> > permanently.
>> >> >>> > Is there a way to check which compaction is pending? The only
>> >> >>> > information is about the total count.
>> >> >>> >
>> >> >>> >
>> >> >>> > On Monday, February 16, 2015, Ja Sam <ptrstp...@gmail.com> wrote:
>> >> >>> >>
>> >> >>> >> Of course I made a mistake. I am using 2.1.2. Anyway, a nightly
>> >> >>> >> build is available from
>> >> >>> >> http://cassci.datastax.com/job/cassandra-2.1/
>> >> >>> >>
>> >> >>> >> I read about cold_reads_to_omit. It looks promising. Should I also
>> >> >>> >> set compaction throughput?
>> >> >>> >>
>> >> >>> >> p.s.
>> >> >>> >> I am really sad that I didn't read this before:
>> >> >>> >> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> On Monday, February 16, 2015, Carlos Rolo <r...@pythian.com>
>> >> >>> >> wrote:
>> >> >>> >>>
>> >> >>> >>> Hi, 100% in agreement with Roland.
>> >> >>> >>>
>> >> >>> >>> The 2.1.x series is a pain! I would never recommend the current
>> >> >>> >>> 2.1.x series for production.
>> >> >>> >>>
>> >> >>> >>> Clocks are a pain, and check your connectivity! Also check
>> >> >>> >>> tpstats to see if your thread pools are being overrun.
>> >> >>> >>>
>> >> >>> >>> Regards,
>> >> >>> >>>
>> >> >>> >>> Carlos Juzarte Rolo
>> >> >>> >>> Cassandra Consultant
>> >> >>> >>>
>> >> >>> >>> Pythian - Love your data
>> >> >>> >>>
>> >> >>> >>> rolo@pythian | Twitter: cjrolo | Linkedin:
>> >> >>> >>> linkedin.com/in/carlosjuzarterolo
>> >> >>> >>> Tel: 1649
>> >> >>> >>> www.pythian.com
>> >> >>> >>>
>> >> >>> >>> On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer
>> >> >>> >>> <r.etzenham...@t-online.de> wrote:
>> >> >>> >>>>
>> >> >>> >>>> Hi,
>> >> >>> >>>>
>> >> >>> >>>> 1) Actual Cassandra 2.1.3, it was upgraded from 2.1.0
>> >> >>> >>>> (suggested by Al Tobey from DataStax)
>> >> >>> >>>> 7) minimal reads (usually none, sometimes few)
>> >> >>> >>>>
>> >> >>> >>>> Those two points keep me repeating an answer I got. First, where
>> >> >>> >>>> did you get 2.1.3 from? Maybe I missed it, I will have a look.
>> >> >>> >>>> But if it is 2.1.2, which is the latest released version, that
>> >> >>> >>>> version has many bugs - most of them I got bitten by while
>> >> >>> >>>> testing 2.1.2.
>> >> >>> >>>> I had many problems with compactions not being triggered on
>> >> >>> >>>> column families not being read, and with compactions and
>> >> >>> >>>> repairs not being completed. See
>> >> >>> >>>>
>> >> >>> >>>> https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
>> >> >>> >>>>
>> >> >>> >>>> https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html
>> >> >>> >>>>
>> >> >>> >>>> Apart from that, how are the two datacenters connected? Maybe
>> >> >>> >>>> there is a bottleneck.
>> >> >>> >>>>
>> >> >>> >>>> Also, do you have ntp up and running on all nodes to keep all
>> >> >>> >>>> clocks in tight sync?
>> >> >>> >>>>
>> >> >>> >>>> Note: I'm no expert (yet) - just sharing my 2 cents.
>> >> >>> >>>>
>> >> >>> >>>> Cheers,
>> >> >>> >>>> Roland
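
The thread suggests watching the "SSTable count" reported by "nodetool cfstats" to see whether compaction is catching up. A rough Python sketch of how that check might be scripted is below. It only parses text, so you would feed it the captured command output yourself. The sample text is a hand-written approximation of the cfstats layout, not real output, and the helper names and the 1000-SSTable limit are assumptions chosen for illustration.

```python
import re


def sstable_counts(cfstats_text):
    """Map 'keyspace.table' to its SSTable count from cfstats-style text."""
    counts = {}
    keyspace = table = None
    for raw in cfstats_text.splitlines():
        line = raw.strip()
        m = re.match(r"Keyspace\s*:\s*(\S+)", line)
        if m:
            keyspace, table = m.group(1), None
            continue
        # Assumption: newer output labels tables "Table:", older
        # output "Column Family:"; accept either.
        m = re.match(r"(?:Table|Column Family)\s*:\s*(\S+)", line)
        if m:
            table = m.group(1)
            continue
        m = re.match(r"SSTable count\s*:\s*(\d+)", line)
        if m and keyspace and table:
            counts[f"{keyspace}.{table}"] = int(m.group(1))
    return counts


def over_threshold(counts, limit):
    """Return the sorted names of tables whose SSTable count exceeds limit."""
    return sorted(name for name, n in counts.items() if n > limit)


# Hand-written sample in the approximate shape of cfstats output.
SAMPLE = """\
Keyspace: demo
    Read Count: 0
        Table: events
        SSTable count: 15230
        Table: users
        SSTable count: 12
"""

if __name__ == "__main__":
    counts = sstable_counts(SAMPLE)
    print(counts)
    print(over_threshold(counts, 1000))
```

Running this periodically against saved cfstats output and comparing the counts over time would show whether the numbers are actually falling after a change such as setting cold_reads_to_omit to 0.0.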