Yes, Gossip goes through MD (the MessageDeserializer) too.

On Thu, May 27, 2010 at 11:03 AM, Anthony Molinaro
<antho...@alumni.caltech.edu> wrote:
>
> On Thu, May 27, 2010 at 08:04:18AM -0600, Jonathan Ellis wrote:
>> This is a relic of when Gossip was over UDP and had to worry about
>> packet size. I created
>> https://issues.apache.org/jira/browse/CASSANDRA-1138 to remove those
>> notifications.
>
> Ahh, okay, well it's odd that a limit was set even with UDP. I send large
> UDP packets all the time with LWES and don't have many issues, but I'm glad
> to hear it will be fixed (I may patch in a larger packet size locally as
> a short-term workaround). Looking at the code, it seems that if you hit
> either of these notifications the message is not serialized (i.e., the
> serialize calls return false). Would this explain why, if I restart a
> machine in the cluster in this state, it only sees some of the ring?
>
> In other words, maybe with a fresh restart of everything, some part of
> the serialized message is small enough that all 27 machines fit in it.
> However, once they've been running for a little while, they start to creep
> over the limit; then gossiping suddenly starts to fail because responses
> from some nodes are never sent, and I start seeing inconsistencies in the
> rings?
>
> I think this hypothesis could be tested by just increasing the MAX size,
> so I think I will try that.
>
>> I think the correlation with MessageDeserializer is a red herring.
>> Gossip only happens once per second, so I don't see how that could back
>> MD up.
>
> Yeah, I couldn't see how either; it was just the 'Stopping deserialization'
> message that made me think it might (as only the nodes with a backed-up
> MessageDeserializer had that message). Do gossip messages flow through
> the MessageDeserializer?
>
> Thanks for the response,
>
> -Anthony
>
>> On Tue, May 25, 2010 at 5:33 PM, Anthony Molinaro
>> <antho...@alumni.caltech.edu> wrote:
>> > Hi,
>> >
>> > I just noticed I have lots of these messages:
>> >
>> > INFO [GMFD:1] 2010-05-25 23:21:04,070 GossipDigestSynMessage.java (line 152) Remaining bytes zero. Stopping deserialization in EndPointState.
>> > INFO [GMFD:1] 2010-05-25 23:21:05,224 GossipDigestSynMessage.java (line 129) @@@@ Breaking out to respect the MTU size in EPS. Estimate is 56 @@@@
>> >
>> > The first message only occurs on some machines in my cluster. The second
>> > on all of them.
>> >
>> > The ones with the first message seem to be building up quite a backlog
>> > in their MessageDeserializer PendingTasks.
>> >
>> > I assume there is a correlation; what could be causing this sort of thing?
>> >
>> > This cluster is now at 27 m1.xlarge boxes on EC2 running 0.6.2 of some
>> > flavor.
>> >
>> > Thanks,
>> >
>> > -Anthony
>> >
>> > --
>> > ------------------------------------------------------------------------
>> > Anthony Molinaro <antho...@alumni.caltech.edu>
>> >
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <antho...@alumni.caltech.edu>
>
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
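Anthony's hypothesis (a fixed size limit that everything fits under at first, until per-node state grows and part of the ring stops being sent) can be illustrated with a small Java sketch. This is a hypothetical model, not Cassandra's GossipDigestSynMessage code: the class `BudgetedGossipSketch`, the `MAX_BYTES` budget of 1000, the `10.0.0.N` addresses and the payload sizes are all assumptions chosen for illustration, and it uses a Java record (Java 16+). It only shows that a serializer which stops once the next entry would exceed a byte budget silently omits the remaining endpoint states, and that the number omitted grows as each node's state grows.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BudgetedGossipSketch {
    // Hypothetical per-message byte budget, standing in for an MTU-style limit.
    static final int MAX_BYTES = 1000;

    /** Hypothetical endpoint state: an address plus an opaque application-state payload. */
    record EndpointState(String endpoint, byte[] appState) {}

    /**
     * Writes endpoint states in order until the next one would push the total past
     * MAX_BYTES, then stops. Returns how many states were actually written.
     */
    static int serializeWithinBudget(List<EndpointState> states, DataOutputStream out)
            throws IOException {
        int total = 0;
        int written = 0;
        for (EndpointState s : states) {
            byte[] addr = s.endpoint().getBytes(StandardCharsets.UTF_8);
            // Exact size of what is written below: 2-byte length, address, 4-byte length, payload.
            int size = 2 + addr.length + 4 + s.appState().length;
            if (total + size > MAX_BYTES) {
                break; // every remaining endpoint is silently left out of this message
            }
            out.writeShort(addr.length);
            out.write(addr);
            out.writeInt(s.appState().length);
            out.write(s.appState());
            total += size;
            written++;
        }
        return written;
    }

    /** Builds a hypothetical ring of `nodes` endpoints, each with a payload of payloadBytes. */
    static List<EndpointState> ring(int nodes, int payloadBytes) {
        List<EndpointState> states = new ArrayList<>();
        for (int i = 1; i <= nodes; i++) {
            states.add(new EndpointState("10.0.0." + i, new byte[payloadBytes]));
        }
        return states;
    }

    /** Returns how many of 27 endpoint states fit in one message for the given payload size. */
    static int countWrittenFor(int payloadBytes) {
        try {
            DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
            return serializeWithinBudget(ring(27, payloadBytes), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Fresh restart": small per-node state, so the whole ring fits.
        System.out.println("20-byte payloads: " + countWrittenFor(20) + " of 27 endpoints sent");
        // "Running for a while": larger per-node state, so the tail of the ring is dropped.
        System.out.println("40-byte payloads: " + countWrittenFor(40) + " of 27 endpoints sent");
    }
}
```

With these made-up numbers, 20-byte payloads let all 27 endpoint states fit, while 40-byte payloads leave only 18 in the message. Raising `MAX_BYTES` in the sketch restores the full ring, which mirrors the "just increase the MAX size" test proposed in the thread.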