Thanks! I'll be watching this issue closely.

On Apr 9, 2011, at 5:41 AM, Chris Goffinet wrote:
> We also have a ticket open at
>
> https://issues.apache.org/jira/browse/CASSANDRA-2399
>
> We have observed in production the impact of streaming data to new nodes
> being added. We actually have our entire dataset in page cache in one of our
> clusters, and our 99th percentiles go from 20ms to >1 second on streaming
> nodes when bootstrapping new nodes, because the process blows out the page
> cache. We are hoping to have this addressed soon. I think throttling of
> streaming would be good too, to help avoid saturating the network card on
> the streaming node. Dynamic snitch should help with this; we'll try to report
> back very soon on what it looks like for that case.
>
> -Chris
>
> On Apr 8, 2011, at 7:35 PM, aaron morton wrote:
>
>> My brain just started working. The streaming for the move may need to be
>> throttled, but once the file has been received the bloom filters, row
>> indexes and secondary indexes are built. That will also take some effort. Do
>> you have any secondary indexes?
>>
>> If you are doing a move again, could you try turning up logging to DEBUG on
>> one of the neighbour nodes? Once the file has been received you will see a
>> message saying "Finished {file_name}. Sending ack to {remote_ip}". After
>> this log message the rebuilds will start. It would be interesting to see
>> which is more heavyweight; I'm guessing the rebuilds.
>>
>> This is similar to https://issues.apache.org/jira/browse/CASSANDRA-2156 but
>> that ticket will not cover this case. I've added this use case to the
>> comments, so please check there if you want to follow along.
>>
>> Cheers
>> Aaron
>>
>>
>> On 6 Apr 2011, at 16:26, Jonathan Colby wrote:
>>
>>> Thanks for the response, Aaron. Our cluster has 6 nodes with 10 GB load on
>>> each. RF=3. AMD 64-bit blades, quad core, 8 GB RAM, running Debian
>>> Linux. Swap off. Cassandra 0.7.4
>>>
>>>
>>> On Apr 6, 2011, at 2:40 AM, aaron morton wrote:
>>>
>>>> Not that I know of, but it may be useful to be able to throttle things.
>>>> If the receiving node has little headroom, though, it may still be
>>>> overwhelmed.
>>>>
>>>> Currently there is a single thread for streaming. If we were to throttle
>>>> it, it may be best to make it multi-threaded with a single concurrent
>>>> stream per endpoint.
>>>>
>>>> Out of interest, how many nodes do you have and what's the RF?
>>>>
>>>> Aaron
>>>>
>>>>
>>>> On 6 Apr 2011, at 01:16, Jonathan Colby wrote:
>>>>
>>>>> When doing a move, decommission, loadbalance, etc., data is streamed to
>>>>> the next node in such a way that it really strains the receiving node,
>>>>> to the point where it has a problem serving requests.
>>>>>
>>>>> Is there any way to throttle the streaming of data?
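A minimal sketch of the throttling idea discussed in this thread: a byte-rate cap on outbound streaming (to avoid saturating the network card) combined with a single concurrent stream per endpoint. This is not Cassandra code. `ThrottledStreaming`, `TokenBucket`, `stream()` and `shutdown()` are hypothetical names, the budget is assumed to be one node-wide bytes-per-second limit shared by all outbound streams, and the actual socket write is stubbed out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Cassandra code: one node-wide byte-rate budget
// shared by all outbound streams, plus one single-threaded executor per
// destination endpoint so each endpoint receives at most one stream at a time.
public class ThrottledStreaming {

    // Token bucket that refills at bytesPerSecond and holds at most burstBytes.
    static final class TokenBucket {
        private final double bytesPerNano;
        private final double capacity;
        private double tokens;
        private long lastRefillNanos;

        TokenBucket(long bytesPerSecond, long burstBytes) {
            this.bytesPerNano = bytesPerSecond / 1e9;
            this.capacity = burstBytes;
            this.tokens = burstBytes;
            this.lastRefillNanos = System.nanoTime();
        }

        // Takes `bytes` tokens, letting the balance go negative, then sleeps
        // until that debt would be repaid. Holding the lock while sleeping
        // serializes callers, so all endpoints together stay under the rate.
        synchronized void acquire(long bytes) throws InterruptedException {
            long now = System.nanoTime();
            tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * bytesPerNano);
            lastRefillNanos = now;
            tokens -= bytes;
            if (tokens < 0) {
                TimeUnit.NANOSECONDS.sleep((long) Math.ceil(-tokens / bytesPerNano));
            }
        }
    }

    private final TokenBucket bucket;
    private final Map<String, ExecutorService> perEndpoint = new ConcurrentHashMap<>();

    public ThrottledStreaming(long bytesPerSecond, long burstBytes) {
        this.bucket = new TokenBucket(bytesPerSecond, burstBytes);
    }

    // Queues a stream of chunks to `endpoint`; streams to the same endpoint
    // run one after another. Returns the number of bytes "sent".
    public Future<Long> stream(String endpoint, byte[][] chunks) {
        ExecutorService exec = perEndpoint.computeIfAbsent(
                endpoint, e -> Executors.newSingleThreadExecutor());
        return exec.submit(() -> {
            long sent = 0;
            for (byte[] chunk : chunks) {
                bucket.acquire(chunk.length);
                // A real implementation would write `chunk` to the socket here.
                sent += chunk.length;
            }
            return sent;
        });
    }

    // Stops the per-endpoint threads once queued streams finish.
    public void shutdown() {
        perEndpoint.values().forEach(ExecutorService::shutdown);
    }
}
```

With a 1 MB/s budget and a 100 KB burst, streaming 500 KB to one endpoint should take roughly 0.4 seconds. Note that, as Aaron points out, throttling the sender would not by itself limit the bloom filter and index rebuilds that run on the receiving node after each file arrives.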