[ 
https://issues.apache.org/jira/browse/CASSANDRA-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117613#comment-14117613
 ] 

graham sanderson commented on CASSANDRA-7734:
---------------------------------------------

Additional data point: while investigating 2 errors in production, I noticed 
that in each case the exception is logged right next to a transition of the 
version to null. The thread and timestamp are always the same for both log 
entries, so it seems like they must come from the same request.

e.g.

{code}
INFO [Thread-46312] 2014-09-01 07:05:26,055 MessagingService.java (line 782) Reseting version 7 -> null for /172.16.26.16
ERROR [Thread-46312] 2014-09-01 07:05:26,055 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-46312,5,main]
java.lang.NullPointerException
        at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:150)
        at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:175)
        at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:134)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
        at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
{code}

> Schema pushes (seemingly) randomly not happening
> ------------------------------------------------
>
>                 Key: CASSANDRA-7734
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7734
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: graham sanderson
>            Assignee: Aleksey Yeschenko
>
> We have been seeing problems since upgrading from 2.0.5 to 2.0.9.
> After a while, new schema changes (we periodically add tables) start 
> propagating very slowly to some nodes and fast to others. From the logs and 
> trace, it looks like in this case the "push" of the schema from the 
> originating node to some of the other nodes never happens (note that once a 
> node has decided not to push to another node, it doesn't seem to start 
> again). We do, however, see the other node end up pulling the schema some 
> time later, when it notices its schema is out of date.
> Here is the code from MigrationManager.announce in 2.0.9:
> {code}
>         for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
>         {
>             // only push schema to nodes with known and equal versions
>             if (!endpoint.equals(FBUtilities.getBroadcastAddress()) &&
>                     MessagingService.instance().knowsVersion(endpoint) &&
>                     MessagingService.instance().getRawVersion(endpoint) == MessagingService.current_version)
>                 pushSchemaMutation(endpoint, schema);
>         }
> {code}
> And here is the corresponding code from 2.0.5:
> {code}
>         for (InetAddress endpoint : Gossiper.instance.getLiveMembers())
>         {
>             if (endpoint.equals(FBUtilities.getBroadcastAddress()))
>                 continue; // we've dealt with localhost already
>             // don't send schema to the nodes with the versions older than current major
>             if (MessagingService.instance().getVersion(endpoint) < MessagingService.current_version)
>                 continue;
>             pushSchemaMutation(endpoint, schema);
>         }
> {code}
> The old getVersion() call would return MessagingService.current_version if 
> the version was unknown, so the push would still occur in that case. I don't 
> have logging to prove this, but I strongly suspect that the version may end 
> up null in some cases. That would have allowed schema propagation in 2.0.5, 
> but not since the check changed (somewhere after 2.0.5 and no later than 
> 2.0.9).
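
To make the suspected difference concrete, here is a minimal, self-contained sketch of the two checks. It is not Cassandra code: the class SchemaPushSketch, its map of versions, and the method names pushes205/pushes209 are hypothetical stand-ins, and the current version value of 7 is only assumed from the "7 -> null" log line above. It models just the behaviour described in the report, namely that the old getVersion() falls back to the current version when the peer's version is unknown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the version bookkeeping discussed in the report.
// It is NOT the real MessagingService; it only mirrors the two quoted checks.
final class SchemaPushSketch {
    // Assumed value, taken from the "7 -> null" log line.
    static final int CURRENT_VERSION = 7;

    // endpoint -> last known messaging version; an absent key means "unknown"
    private final Map<String, Integer> versions = new HashMap<>();

    void setVersion(String endpoint, int version) { versions.put(endpoint, version); }

    // Models the "Reseting version 7 -> null" transition.
    void resetVersion(String endpoint) { versions.remove(endpoint); }

    boolean knowsVersion(String endpoint) { return versions.containsKey(endpoint); }

    // As described in the report: an unknown version falls back to the current one.
    int getVersion(String endpoint) {
        Integer v = versions.get(endpoint);
        return v == null ? CURRENT_VERSION : v;
    }

    // 2.0.5-style check: skip only peers whose version is older than current.
    boolean pushes205(String endpoint) {
        return getVersion(endpoint) >= CURRENT_VERSION;
    }

    // 2.0.9-style check: push only to peers with a known and equal version.
    boolean pushes209(String endpoint) {
        return knowsVersion(endpoint) && versions.get(endpoint) == CURRENT_VERSION;
    }

    public static void main(String[] args) {
        SchemaPushSketch ms = new SchemaPushSketch();
        String peer = "172.16.26.16";

        ms.setVersion(peer, CURRENT_VERSION);
        System.out.println("known version:   2.0.5 pushes=" + ms.pushes205(peer)
                + ", 2.0.9 pushes=" + ms.pushes209(peer));

        ms.resetVersion(peer);
        System.out.println("unknown version: 2.0.5 pushes=" + ms.pushes205(peer)
                + ", 2.0.9 pushes=" + ms.pushes209(peer));
    }
}
```

Under these assumptions, both checks push while the peer's version is known and current, but after the reset only the 2.0.5-style check still pushes. That matches the symptom that pushes to some nodes stop, if the version for those nodes is indeed being reset to null.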



