OK, thanks. I'm using STCS. Anyway, IMHO this is one of the main bottlenecks for using big/dense nodes in Cassandra (which would reduce cluster size and data center costs), and it could be almost solved (at least for me) if we could reduce the number of sstables on the receiver side, either by sending bigger sstables from the sending side or by merging streamed-in data through the memtable on the receiving side.
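In the meantime, the closest workaround seems to be helping the joining node chew through the flood of small streamed-in sstables faster. A rough sketch only, run on the joining node; the values are examples, not recommendations, setconcurrentcompactors only exists in newer Cassandra versions, and the keyspace/table names are placeholders:

    # see how far behind compaction is on the joining node
    nodetool compactionstats

    # let compaction use more resources so the many small streamed-in
    # sstables get merged sooner (example values only)
    nodetool setcompactionthroughput 0       # 0 = unthrottled
    nodetool setconcurrentcompactors 8       # only in newer Cassandra versions

    # re-enable automatic compaction if it was turned off earlier
    # (placeholder keyspace/table names)
    nodetool enableautocompaction my_keyspace my_table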
---- On Mon, 03 Aug 2020 19:17:33 +0430 Jeff Jirsa <jji...@gmail.com> wrote ----

The memtable really isn't involved here: each data file is copied over as-is and turned into a new data file. It doesn't go through the memtable (it is deserialized and re-serialized, so it is temporarily in memory, but not in the memtable itself). You can cut down on the number of data files copied in by using fewer vnodes, or by changing your compaction parameters (e.g. if you're using LCS, raise the sstable size from 160M to something higher), but there's no magic to join/compact those data files on the sending side before sending.

On Mon, Aug 3, 2020 at 4:15 AM onmstester onmstester <onmstes...@zoho.com.invalid> wrote:

IMHO (from reading system.log), each streamed-in file from any node is written to disk as a separate sstable and does not wait in the memtable until enough data has accumulated in memory, so there are more compactions because of the many small sstables. Is there any configuration in Cassandra to force streamed-in data through the memtable-to-sstable cycle, so that bigger sstables are produced in the first place?

============ Forwarded message ============
From: onmstester onmstester <onmstes...@zoho.com.INVALID>
To: "user" <user@cassandra.apache.org>
Date: Sun, 02 Aug 2020 08:35:30 +0430
Subject: Re: streaming stuck on joining a node with TBs of data
============ Forwarded message ============

Thanks Jeff. I already used netstats, and it only shows that streaming from a single node remains and is stuck, plus a bunch of dropped messages; next time I will check tpstats too. For now I stopped the joining/stuck node, set auto_bootstrap to false, and started it again, and it is UN now -- is that OK too? And what about streaming tables one by one, any idea?

---- On Sat, 01 Aug 2020 21:44:09 +0430 Jeff Jirsa <jji...@gmail.com> wrote ----

Nodetool tpstats and netstats should give you a hint why it's not joining.

If you don't care about consistency and you just want it joined in its current form (which is likely strictly incorrect, but I get it), "nodetool disablegossip && nodetool enablegossip" in rapid succession (must be less than 30 seconds between the commands) will PROBABLY change it from joining to normal (unclean, unsafe, do this at your own risk).

On Jul 31, 2020, at 11:46 PM, onmstester onmstester <onmstes...@zoho.com.invalid> wrote:

No secondary index, no SASI, no materialized view.

---- On Sat, 01 Aug 2020 11:02:54 +0430 Jeff Jirsa <jji...@gmail.com> wrote ----

Are there secondary indices involved?

On Jul 31, 2020, at 10:51 PM, onmstester onmstester <onmstes...@zoho.com.invalid> wrote:

Hi,

I'm going to join multiple new nodes to an already existing, running cluster. Each node has to stream in >2 TB of data, and it took a few days (with 500 Mb streaming) to almost finish, but it is stuck streaming in from one final node. I cannot see any bottleneck on either side (source or destination node); the only problem is 400 pending compactions on the joining node, for which I disabled auto_compaction, but there was no improvement.

1. How can I safely stop streaming/joining the new node, make it UN, and then run a repair on the node?
2. When bootstrapping a new node, multiple tables are streamed in simultaneously, and I think this increases the number of compactions compared with a scenario where "the joining node first streams in one table, then switches to another, and so on". Am I right that the latter would reduce compactions? If so, is there a config or hack in Cassandra to force it?
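I'm not aware of a setting that controls per-table streaming order, but for reference, the two knobs Jeff mentioned would look roughly like this. A sketch only: the keyspace/table names and numbers are placeholders, num_tokens can only be set on nodes that have not bootstrapped yet, and the LCS part doesn't apply to my STCS tables:

    # 1) fewer vnodes on nodes that have NOT joined yet (cassandra.yaml,
    #    before first start):
    #      num_tokens: 16        # instead of the 3.x default of 256

    # 2) for LCS tables, a larger target sstable size
    #    (placeholder keyspace/table name and size)
    cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {
                'class': 'LeveledCompactionStrategy',
                'sstable_size_in_mb': 512 };"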