This problem still stays unresolved despite numerous restarts to the cluster. I cant seem to find a way out of this one, and I am not really looking for a workaround, kinda need this to work if i need to go to production.
Turned on the ALL logging in log4j, and now I see the following exception (EOFException) on the destination. After receiving each file, it seems to be throwing this exception. The transfer is successful except for this exception. The source successful declares the transfer complete. But the destination does not move out of the bootstrapping mode, and just sits there. DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line 65) Receiving stream: finished reading chunk, awaiting more DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line 87) Removing stream context /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369 DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java (line 73) Sending a streaming finished message with org.apache.cassandra.streaming.completedfilesta...@54828e7 to IP1 TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line 82) eof reading from socket; closing java.io.EOFException at java.io.DataInputStream.readInt(Unknown Source) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59) DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line 51) Receiving stream DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line 54) Creating file for /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line 65) Receiving stream: finished reading chunk, awaiting more DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line 87) Removing stream context /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045 DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java (line 73) Sending a streaming finished message with org.apache.cassandra.streaming.completedfilesta...@7b41a32f to IP1 TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line 82) eof reading from socket; closing java.io.EOFException at java.io.DataInputStream.readInt(Unknown Source) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59) /G On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh <gurpreet.si...@gmail.com>wrote: > Hi Vineet, > I have tracked the nodetool streams to completion each time. Below are the > logs on the source and destination node. There are 3 sstables being > transferred, and the transfer seems to be successful. However, after the > streams finish, the source prints out messages about the dropped messages, > which may point to the problem. ideas? I checked port 7000 is open for > communication. 9160 is not up on the node being bootstrapped, but that comes > up after the node is bootstrapped, is that right? > > Thanks a ton, > /G > > *Logs on the source node (IP2):* > * > * > INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79) > Flushing memtables for userdata... > INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95) > Performing anticompaction ... > INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java > (line 339) AntiCompacting > [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')] > INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129) > GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is > 4415946752 > INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java > (line 396) AntiCompacted to > /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db. > 49074138589/36770836242 bytes for 5990912 keys. Time: 1438607ms. > INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java > (line 339) AntiCompacting > [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')] > INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java > (line 396) AntiCompacted to > /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db. > 28185244/21126422 bytes for 47722 keys. Time: 2310ms. > INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java > (line 339) AntiCompacting > [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')] > INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.java > (line 396) AntiCompacted to > /data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db. > 2927724285/2195768325 bytes for 3976118 keys. Time: 179766ms. > INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,607 StreamOut.java (line 127) > Stream context metadata > /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Index.db:522051369, > 3 > sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Filter.db:7489045, > 3 > sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db:36770836242, > 3 > sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Index.db:3373143, > 3 > sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Filter.db:59965, > 3 > sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Data.db:21126422, > 3 > sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Index.db:282956452, > 3 > sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Filter.db:4970125, > 3 > sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db:2195768325 > INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 132) > Sending a stream initiate message to IP1... > INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 137) > Waiting for transfer to IP1 to complete > *WARN [DroppedMessagesLogger] 2010-09-14 10:28:00,592 > MessagingService.java (line 501) Dropped 9 messages in the last 1000ms* > * *INFO [STREAM-STAGE:1] 2010-09-14 10:28:00,605 StreamOut.java (line 141) > Done with transfer to IP1 > INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:00,670 > SSTableDeletingReference.java (line 104) Deleted > /data/cassandra/datadir/cassandradb/system/LocationInfo-17-Data.db > *WARN [DroppedMessagesLogger] 2010-09-14 10:28:01,602 > MessagingService.java (line 501) Dropped 1 messages in the last 1000ms* > * *INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,133 > SSTableDeletingReference.java (line 104) Deleted > /data/mysql/cassandrastorage/system/LocationInfo-19-Data.db > INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134 > SSTableDeletingReference.java (line 104) Deleted > /data/cassandra/datadir/cassandradb/system/LocationInfo-18-Data.db > INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134 > SSTableDeletingReference.java (line 104) Deleted > /data/cassandra/datadir/cassandradb/system/LocationInfo-20-Data.db > > > > *Logs on new node being bootstrapped (IP1):* > > INFO [main] 2010-09-14 09:53:37,788 BootStrapper.java (line 104) New token > will be 149298847080838649048722691811739653739 to assume load from IP2 > INFO [main] 2010-09-14 09:53:37,789 StorageService.java (line 388) > Joining: sleeping 30000 ms for pending range setup > INFO [main] 2010-09-14 09:54:07,792 StorageService.java (line 388) > Bootstrapping > INFO [Thread-17] 2010-09-14 10:26:40,699 SSTableReader.java (line 120) > Sampling index for > /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db > INFO [Thread-17] 2010-09-14 10:26:51,453 StreamCompletionHandler.java > (line 64) Streaming added > /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db > INFO [Thread-23] 2010-09-14 10:26:52,219 SSTableReader.java (line 120) > Sampling index for > /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db > INFO [Thread-23] 2010-09-14 10:26:52,311 StreamCompletionHandler.java > (line 64) Streaming added > /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db > INFO [Timer-0] 2010-09-14 10:27:13,676 Gossiper.java (line 180) > InetAddress IP2 is now dead. > INFO [Thread-26] 2010-09-14 10:27:18,858 SSTableReader.java (line 120) > Sampling index for > /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db > INFO [Thread-26] 2010-09-14 10:27:24,645 StreamCompletionHandler.java > (line 64) Streaming added > /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db > INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,605 > HintedHandOffManager.java (line 165) Started hinted handoff for endPoint IP2 > INFO [GMFD:1] 2010-09-14 10:28:00,605 Gossiper.java (line 578) InetAddress > IP2 is now UP > INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,606 > HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to > endpoint IP2 > INFO [FLUSH-TIMER] 2010-09-14 10:57:52,381 ColumnFamilyStore.java (line > 357) LocationInfo has reached its threshold; switching in a fresh Memtable > at > CommitLogContext(file='/data/cassandra/datadir/commitlog/CommitLog-1284483127473.log', > position=1353) > INFO [FLUSH-TIMER] 2010-09-14 10:57:52,382 ColumnFamilyStore.java (line > 609) Enqueuing flush of memtable-locationi...@2057079871(210 bytes, 6 > operations) > INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,382 Memtable.java (line > 148) Writing memtable-locationi...@2057079871(210 bytes, 6 operations) > INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,464 Memtable.java (line > 162) Completed flushing > /data/cassandra/datadir/cassandradb/system/LocationInfo-10-Data.db > > > On Tue, Sep 14, 2010 at 10:52 AM, vineet daniel <vineetdan...@gmail.com>wrote: > >> Hi Gurpreet >> >> What is the output of nodetool -h <hostname/IP> streams -->( to see >> what is going on between the nodes) . If you dont see anything happening >> try switching off firewall or iptables. >> >> >> Regards >> Vineet Daniel >> Cell : +918106217121 >> Websites : >> Blog <http://vinetedaniel.blogspot.com> | >> Linkedin<http://in.linkedin.com/in/vineetdaniel> >> | Twitter <https://twitter.com/vineetdaniel> >> >> >> >> >> >> On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh < >> gurpreet.si...@gmail.com> wrote: >> >>> I tried this again, it happenned yet again. >>> This time while the transfer messages seemed tobe in order, i also >>> noticed that the source logs talk about having 9 dropped messages in the >>> last 1000 ms. The only activity on the whole cluster is this bootstrapping, >>> there is no read/write traffic going on. >>> >>> /G >>> >>> On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh < >>> gurpreet.si...@gmail.com> wrote: >>> >>>> I am using cassandra 0.6.5. >>>> >>>> >>>> On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh < >>>> gurpreet.si...@gmail.com> wrote: >>>> >>>>> Hi, >>>>> I have a cassandra cluster of 4 machines, and I am trying to bootstrap >>>>> 2 more machines, one at a time. >>>>> For both these machines, the bootstrapping stays stuck after the >>>>> streaming is done. >>>>> >>>>> When the nodes come up for bootstrapping, I see all the relevant >>>>> messages about getting a new token, assuming load from a particular host. >>>>> I >>>>> see a couple of nodes anticompacting data to send, and at a later point >>>>> the >>>>> node that is bootstrapping prints the right streaming mesgs. However, once >>>>> the streaming is over, the node just doesnt do anything. Previously while >>>>> bootstrapping, I have seen that after the streaming is done, the node >>>>> restarts and becomes part of the ring by itself. I dont see this behaviour >>>>> with both the nodes I tried today. >>>>> I even restarted all the nodes in the cluster, and tried bootstrapping >>>>> one of the nodes again, but it again was stuck after streaming. It seems >>>>> to >>>>> have copied the relevant load as well. >>>>> Any ideas as to what could be going on here? >>>>> >>>>> /G >>>>> >>>> >>>> >>> >> >