Just ran this on the new node:

    nodetool netstats | grep "Streaming from" | wc -l
    10

Seems like the new node is receiving data from 10 other nodes. Is that expected in a vnodes-enabled environment?

Ruchir.
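P.S. In case it is useful, here is a quick way to see which peers those streams are coming from rather than just the count. This is only a minimal sketch: it assumes the same "Streaming from: /<ip>" line format that the grep above matches, and the awk field may need adjusting for your Cassandra version.

    # list the distinct peer addresses currently streaming to this node
    nodetool netstats | grep "Streaming from" | awk '{print $NF}' | sort -u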
On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ruchir....@gmail.com> wrote:

> Also, not sure if this is relevant, but I just noticed the nodetool tpstats
> output:
>
> Pool Name          Active   Pending   Completed   Blocked   All time blocked
> FlushWriter             0         0        1136         0                512
>
> Looks like about 50% of flushes are blocked.
>
>
> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ruchir....@gmail.com> wrote:
>
>> Yes, num_tokens is set to 256. initial_token is blank on all nodes,
>> including the new one.
>>
>>
>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <mark.re...@boxever.com>
>> wrote:
>>
>>> My understanding was that if initial_token is left empty on the new
>>>> node, it just contacts the heaviest node and bisects its token range.
>>>
>>>
>>> If you are using vnodes and you have num_tokens set to 256, the new node
>>> will take token ranges dynamically. What is the configuration of your
>>> other nodes: are you setting num_tokens or initial_token on those?
>>>
>>>
>>> Mark
>>>
>>>
>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ruchir....@gmail.com> wrote:
>>>
>>>> Thanks Patricia for your response!
>>>>
>>>> On the new node, I just see a lot of the following:
>>>>
>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
>>>> Writing Memtable
>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
>>>> (line 262) Compacted 12 sstables to
>>>>
>>>> So basically it is just busy flushing and compacting. Would you have
>>>> any idea why the disk usage has blown up 2x? My understanding was that
>>>> if initial_token is left empty on the new node, it just contacts the
>>>> heaviest node and bisects its token range. The heaviest node is around
>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>> compaction is falling behind?
>>>>
>>>> Ruchir
>>>>
>>>>
>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla
>>>> <patri...@thelastpickle.com> wrote:
>>>>
>>>>> Ruchir,
>>>>>
>>>>> What exactly are you seeing in the logs? Are you running major
>>>>> compactions on the new bootstrapping node?
>>>>>
>>>>> With respect to the seed list, it is generally advisable to use 3 seed
>>>>> nodes per AZ / DC.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ruchir....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am trying to bootstrap the thirteenth node into a 12-node cluster
>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>> streaming has been going on for 2 days now, and the disk usage on the
>>>>>> new node is already above 4 TB and still growing. Is this because the
>>>>>> new node is running major compactions while the streaming is going on?
>>>>>>
>>>>>> One thing that I noticed that seemed off: the seeds property in the
>>>>>> yaml of the 13th node comprises nodes 1..12, whereas the seeds
>>>>>> property on the existing 12 nodes consists of all the other nodes
>>>>>> except the thirteenth node. Is this an issue?
>>>>>>
>>>>>> Any other insight is appreciated.
>>>>>>
>>>>>> Ruchir.
>>>>>
>>>>>
>>>>> --
>>>>> Patricia Gorla
>>>>> @patriciagorla
>>>>>
>>>>> Consultant
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
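For completeness, here is a quick way to confirm the vnode configuration across the cluster and to watch whether compaction is keeping up on the bootstrapping node while streaming continues. This is only a sketch: the hostnames node1..node3 and the cassandra.yaml path are placeholders for your environment.

    # confirm num_tokens is set and initial_token is blank on each node
    # (node1..node3 and the yaml path are placeholders, adjust as needed)
    for h in node1 node2 node3; do
      echo "== $h =="
      ssh "$h" 'grep -E "^(num_tokens|initial_token):" /etc/cassandra/cassandra.yaml'
    done

    # on the new node: pending compactions and thread pool backlog
    nodetool compactionstats
    nodetool tpstats | grep -E "FlushWriter|CompactionExecutor"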