Re: Nodes Timing Out
unlimited

On Sat, Mar 27, 2010 at 12:09 PM, Chris Goffinet wrote:
> what's the ulimit set to?
>
> -Chris
>
> On Mar 27, 2010, at 10:29 AM, James Golick wrote:
>
> Hey,
>
> I put our first cluster in to production (writing but not reading) a couple
> of days ago. Right now, it's got two pretty sizeable nodes taking about 200
> writes per second each and virtually no reads.
>
> Eventually, though, (and this has happened twice), both nodes seem to start
> timing out. If I run nodetool cfstats, I get:
>
> [ja...@cassandra1 ~]# /opt/cassandra/bin/nodetool -h cassandra1.fetlife.com cfstats
> Keyspace: system
> Read Count: 39
> Read Latency: 0.35925641025641025 ms.
> Write Count: 3
> Write Latency: 0.166 ms.
> Pending Tasks: 66
> Column Family: HintsColumnFamily
> SSTable count: 0
> Space used (live): 0
> Space used (total): 0
>
> and then it just hangs there.
>
> Any ideas?
>
> - James
Hector mailing lists
I've created two mailing lists for hector, one for users and one for developers (there are 3 of us now). I hope you find them useful.

http://wiki.github.com/rantav/hector/mailing-lists

Users: hector-us...@googlegroups.com
Page: http://groups.google.com/group/hector-users
For hector users to ask questions and share their experience. Anyone can post messages and anyone may join.

Developers: hector-...@googlegroups.com
Page: http://groups.google.com/group/hector-dev
For cutting-edge development of hector itself. Anyone can post messages, but joining is by invitation only.
Re: Hackathon?!?
Awesome! 2 tickets left. -Chris On Mar 27, 2010, at 11:42 PM, Evan Weaver wrote: > Me too. > > On Tue, Mar 23, 2010 at 12:48 PM, Jeff Hodges wrote: >> I'll be there. >> -- >> Jeff >> >> On Mon, Mar 22, 2010 at 8:40 PM, Eric Florenzano wrote: >>> Nice, I'll go! >>> >>> -Eric Florenzano >> > > > > -- > Evan Weaver
Re: Nodes Timing Out
Does ulimit -n return unlimited?

2010/3/28 James Golick:
> unlimited
>
> On Sat, Mar 27, 2010 at 12:09 PM, Chris Goffinet wrote:
>>
>> what's the ulimit set to?
>> -Chris
>> On Mar 27, 2010, at 10:29 AM, James Golick wrote:
>>
>> Hey,
>> I put our first cluster in to production (writing but not reading) a
>> couple of days ago. Right now, it's got two pretty sizeable nodes taking
>> about 200 writes per second each and virtually no reads.
>> Eventually, though, (and this has happened twice), both nodes seem to
>> start timing out. If I run nodetool cfstats, I get:
>> [ja...@cassandra1 ~]# /opt/cassandra/bin/nodetool -h cassandra1.fetlife.com cfstats
>> Keyspace: system
>> Read Count: 39
>> Read Latency: 0.35925641025641025 ms.
>> Write Count: 3
>> Write Latency: 0.166 ms.
>> Pending Tasks: 66
>> Column Family: HintsColumnFamily
>> SSTable count: 0
>> Space used (live): 0
>> Space used (total): 0
>> and then it just hangs there.
>> Any ideas?
>> - James
Re: 0.5.1 exception: java.io.IOException: Reached an EOL or something bizzare occured
I got the same error when the nodes were doing a lot of I/O, i.e. during compaction. 2010/3/28 Eric Yu : > I have not restart my nodes. > OK, may be I should give 0.6 a try. > > On Sun, Mar 28, 2010 at 9:53 AM, Jonathan Ellis wrote: >> >> It means that a MessagingService socket closed unexpectedly. If >> you're starting and restarting nodes that could cause it. >> >> This code is obsolete in 0.6 anyway. >> >> On Sat, Mar 27, 2010 at 8:51 PM, Eric Yu wrote: >> > And one more clue here, when ReplicateFactor is 1, it's OK, after >> > changed to >> > 2, the exception occurred. >> > >> > On Sun, Mar 28, 2010 at 9:46 AM, Eric Yu wrote: >> >> >> >> Hi Jonathan, >> >> >> >> I upgraded my jdk to latest version, and I am sure I start Cassandra >> >> with >> >> it (set JAVA_HOME in cassansra.in.sh). >> >> But the exception still there, any idea? >> >> >> >> On Sun, Mar 28, 2010 at 12:02 AM, Jonathan Ellis >> >> wrote: >> >>> >> >>> This means you need to upgrade your jdk to build 18 or later >> >>> >> >>> On Sat, Mar 27, 2010 at 10:55 AM, Eric Yu wrote: >> >>> > Hi, list >> >>> > I got this exception when insert into a cluster with 5 node, is this >> >>> > a >> >>> > bug >> >>> > or something else is wrong. >> >>> > >> >>> > here is the system log: >> >>> > >> >>> > INFO [GMFD:1] 2010-03-27 23:15:16,145 Gossiper.java (line 543) >> >>> > InetAddress >> >>> > /172.19.15.210 is now UP >> >>> > ERROR [Timer-1] 2010-03-27 23:23:27,739 TcpConnection.java (line >> >>> > 308) >> >>> > Closing down connection java.nio.channels.SocketChannel[connected >> >>> > local=/172.19.15.209:58261 remote=/172.19.15.210:7000] with 342218 >> >>> > writes >> >>> > remaining. >> >>> > INFO [Timer-1] 2010-03-27 23:23:27,792 Gossiper.java (line 194) >> >>> > InetAddress >> >>> > /172.19.15.210 is now dead. 
>> >>> > INFO [GMFD:1] 2010-03-27 23:23:32,214 Gossiper.java (line 543) >> >>> > InetAddress >> >>> > /172.19.15.210 is now UP >> >>> > ERROR [Timer-1] 2010-03-27 23:24:47,846 TcpConnection.java (line >> >>> > 308) >> >>> > Closing down connection java.nio.channels.SocketChannel[connected >> >>> > local=/172.19.15.209:59801 remote=/172.19.15.210:7000] with 256285 >> >>> > writes >> >>> > remaining. >> >>> > INFO [Timer-1] 2010-03-27 23:24:47,846 Gossiper.java (line 194) >> >>> > InetAddress >> >>> > /172.19.15.210 is now dead. >> >>> > WARN [MESSAGING-SERVICE-POOL:1] 2010-03-27 23:25:05,580 >> >>> > TcpConnection.java >> >>> > (line 484) Problem reading from socket connected to : >> >>> > java.nio.channels.SocketChannel[connected local=/172.19.15.209:7000 >> >>> > remote=/172.19.15.210:55473] >> >>> > INFO [GMFD:1] 2010-03-27 23:25:05,580 Gossiper.java (line 543) >> >>> > InetAddress >> >>> > /172.19.15.210 is now UP >> >>> > WARN [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580 >> >>> > TcpConnection.java >> >>> > (line 484) Problem reading from socket connected to : >> >>> > java.nio.channels.SocketChannel[connected local=/172.19.15.209:7000 >> >>> > remote=/172.19.15.210:45504] >> >>> > WARN [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580 >> >>> > TcpConnection.java >> >>> > (line 485) Exception was generated at : 03/27/2010 23:25:05 on >> >>> > thread >> >>> > MESSAGING-SERVICE-POOL:2 >> >>> > Reached an EOL or something bizzare occured. Reading from: >> >>> > /172.19.15.210 >> >>> > BufferSizeRemaining: 16 >> >>> > java.io.IOException: Reached an EOL or something bizzare occured. 
>> >>> > Reading >> >>> > from: /172.19.15.210 BufferSizeRemaining: 16 >> >>> > at >> >>> > org.apache.cassandra.net.io.StartState.doRead(StartState.java:44) >> >>> > at >> >>> > >> >>> > org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39) >> >>> > at >> >>> > org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95) >> >>> > at >> >>> > >> >>> > >> >>> > org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445) >> >>> > at >> >>> > >> >>> > >> >>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >> >>> > at >> >>> > >> >>> > >> >>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >> >>> > at java.lang.Thread.run(Thread.java:636) >> >>> > >> >>> > INFO [MESSAGING-SERVICE-POOL:2] 2010-03-27 23:25:05,580 >> >>> > TcpConnection.java >> >>> > (line 315) Closing errored connection >> >>> > java.nio.channels.SocketChannel[connected local=/172.19.15.209:7000 >> >>> > remote=/172.19.15.210:45504] >> >>> > WARN [MESSAGING-SERVICE-POOL:1] 2010-03-27 23:25:05,632 >> >>> > TcpConnection.java >> >>> > (line 485) Exception was generated at : 03/27/2010 23:25:05 on >> >>> > thread >> >>> > MESSAGING-SERVICE-POOL:1 >> >>> > >> >> >> > >> > > >
Re: Nodes Timing Out
Oops, I was running plain ulimit with no flags. ulimit -n returns 1024.

On Sun, Mar 28, 2010 at 3:25 AM, Benoit Perroud wrote:
> ulimit -n returns you unlimited ?
>
>
> 2010/3/28 James Golick :
> > unlimited
> >
> > On Sat, Mar 27, 2010 at 12:09 PM, Chris Goffinet wrote:
> >>
> >> what's the ulimit set to?
> >> -Chris
> >> On Mar 27, 2010, at 10:29 AM, James Golick wrote:
> >>
> >> Hey,
> >> I put our first cluster in to production (writing but not reading) a
> >> couple of days ago. Right now, it's got two pretty sizeable nodes taking
> >> about 200 writes per second each and virtually no reads.
> >> Eventually, though, (and this has happened twice), both nodes seem to
> >> start timing out. If I run nodetool cfstats, I get:
> >> [ja...@cassandra1 ~]# /opt/cassandra/bin/nodetool -h cassandra1.fetlife.com cfstats
> >> Keyspace: system
> >> Read Count: 39
> >> Read Latency: 0.35925641025641025 ms.
> >> Write Count: 3
> >> Write Latency: 0.166 ms.
> >> Pending Tasks: 66
> >> Column Family: HintsColumnFamily
> >> SSTable count: 0
> >> Space used (live): 0
> >> Space used (total): 0
> >> and then it just hangs there.
> >> Any ideas?
> >> - James
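
For anyone hitting the same confusion: the number that matters here is the open-file-descriptor limit (what `ulimit -n` prints), not the default value printed by plain `ulimit`. Below is a minimal Python sketch that reads that limit from inside a process using the standard `resource` module (Unix only). The helper names `nofile_limits` and `describe` are made up for illustration, and nothing in this thread says what value Cassandra actually needs.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit the way the shell does: 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # The soft limit is the one a running process actually hits.
    print("open files: soft=%s hard=%s" % (describe(soft), describe(hard)))
```

Run it as the same user that starts the daemon, since limits are per-user and per-session.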
Re: Newbie Performance Question
Ok - so I guess that between 1400 and 3500 inserts per second is a reasonably good result. We are going to continue working on our custom code, but it seems like we need a design that uses lots of row keys and fewer column family keys and is heavily threaded.

Thanks for your help in pointing out this utility/test harness.

On Fri, Mar 26, 2010 at 4:14 PM, Scott White wrote:
> Right, that's what I meant, thanks for the correction.
>
> On Fri, Mar 26, 2010 at 1:11 PM, Brandon Williams wrote:
>
>> On Fri, Mar 26, 2010 at 3:08 PM, Scott White wrote:
>>
>>> Yep I believe those are inserts per second. Take the last line:
>>>
>>> "811653,1666,250"
>>>
>>> I believe that's telling you that during that 10 second interval you did
>>> 1666 inserts but your overall insert rate is 811653/250 = 3246.612
>>> inserts/sec.
>>>
>>
>> Actually it averaged 1666 inserts per second in that 10 second interval,
>> but you're correct on the average.
>>
>> -Brandon
>>
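
The arithmetic from the quoted exchange can be sketched as below. This assumes, as the thread describes, that each progress line is "total inserts, average inserts/sec over the last interval, elapsed seconds"; the function names are hypothetical and are not part of the test harness itself.

```python
def parse_progress_line(line):
    """Split a 'total,interval_rate,elapsed_seconds' line into three integers."""
    total, interval_rate, elapsed = (int(field) for field in line.strip().split(","))
    return total, interval_rate, elapsed


def overall_rate(line):
    """Overall inserts/sec since the start of the run: total / elapsed seconds."""
    total, _interval_rate, elapsed = parse_progress_line(line)
    return total / elapsed


if __name__ == "__main__":
    line = "811653,1666,250"
    total, interval_rate, elapsed = parse_progress_line(line)
    print("last interval: %d inserts/sec" % interval_rate)
    print("overall: %.3f inserts/sec" % overall_rate(line))
```

The two numbers answer different questions: the interval rate shows how the cluster is doing right now, while the overall rate smooths over warm-up and pauses.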
Re: Unreliable transport layer
Or, use OpenPGM (http://code.google.com/p/openpgm/) as an alternative? But they don't have any Java bindings yet. ZeroMQ (http://www.zeromq.org/) uses this.
Re: Unreliable transport layer
I think OpenPGM uses IP multicast, and this is not available in all data centers. For instance, inside FB multicast across racks is disabled for a multitude of reasons. However, using TCP should not be a major concern as long as some discipline is in place w.r.t. how packet sizes are restricted as you scale to several hundred nodes. For instance, you could cap the packet size at 4K post-compression, and that will let you scale to perhaps a few thousand nodes.

Cheers
Avinash

On Sun, Mar 28, 2010 at 11:40 AM, Ashwin Jayaprakash <ashwin.jayaprak...@gmail.com> wrote:
>
> Or, use OpenPGM (http://code.google.com/p/openpgm/) as an alternative? But
> they don't have any Java bindings yet.
>
> ZeroMQ (http://www.zeromq.org/) uses this.
> --
> View this message in context:
> http://n2.nabble.com/Unreliable-transport-layer-tp4684470p4814206.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>
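
As a rough illustration of the "cap at 4K post compression" idea, here is a small Python sketch that checks whether a payload fits under such a cap once compressed. The choice of zlib, the constant name, and the function name are all assumptions for the example; the thread does not say which compression scheme is meant or where the check would live.

```python
import zlib

# The 4K post-compression cap suggested in the message above.
MAX_COMPRESSED_BYTES = 4 * 1024


def fits_cap(payload: bytes, cap: int = MAX_COMPRESSED_BYTES) -> bool:
    """Return True if the zlib-compressed payload is no larger than the cap."""
    return len(zlib.compress(payload)) <= cap


if __name__ == "__main__":
    repetitive = b"node-state " * 5000  # highly compressible
    print("repetitive payload fits:", fits_cap(repetitive))
```

A sender that fails this check would have to split the state into several messages or send less per round, which is the kind of discipline the message refers to.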
Re: Cassandra cluster can not been installed in different subnet ?
This is the log, it seems everything is fine: INFO [main] 2010-02-23 16:18:24,740 SystemTable.java (line 137) Saved Token not found. Using 32282591296642763138586882639145887056 INFO [main] 2010-02-23 16:18:24,855 StorageService.java (line 281) Starting up server gossip INFO [main] 2010-02-23 16:18:24,992 StorageService.java (line 303) Starting in bootstrap mode (first, sleeping to get load information) INFO [main] 2010-02-23 16:21:27,852 RecoveryManager.java (line 64) Replaying /var/lib/cassandra/commitlog/CommitLog-1266959904759.log INFO [main] 2010-02-23 16:21:27,962 ColumnFamilyStore.java (line 393) LocationInfo has reached its threshold; switching in a fresh Memtable INFO [main] 2010-02-23 16:21:27,962 ColumnFamilyStore.java (line 1035) Enqueuing flush of Memtable(LocationInfo)@1858155990 INFO [FLUSH-SORTER-POOL:1] 2010-02-23 16:21:27,965 Memtable.java (line 183) Sorting Memtable(LocationInfo)@1858155990 INFO [FLUSH-WRITER-POOL:1] 2010-02-23 16:21:27,970 Memtable.java (line 192) Writing Memtable(LocationInfo)@1858155990 INFO [FLUSH-WRITER-POOL:1] 2010-02-23 16:21:28,128 Memtable.java (line 209) Completed flushing /var/lib/cassandra/data/system/LocationInfo-1-Data.db INFO [main] 2010-02-23 16:21:28,159 RecoveryManager.java (line 67) Log replay complete INFO [main] 2010-02-23 16:21:28,233 SystemTable.java (line 159) Saved Token found: 32282591296642763138586882639145887056 INFO [main] 2010-02-23 16:21:28,236 StorageService.java (line 281) Starting up server gossip INFO [main] 2010-02-23 16:21:28,275 StorageService.java (line 303) Starting in bootstrap mode (first, sleeping to get load information) INFO [main] 2010-02-23 16:23:01,102 SSTableReader.java (line 169) Sampling index for /var/lib/cassandra/data/system/LocationInfo-1-Data.db INFO [main] 2010-02-23 16:23:01,120 RecoveryManager.java (line 64) Replaying /var/lib/cassandra/commitlog/CommitLog-1266960087857.log INFO [main] 2010-02-23 16:23:01,215 ColumnFamilyStore.java (line 393) LocationInfo has reached its 
threshold; switching in a fresh Memtable INFO [main] 2010-02-23 16:23:01,218 ColumnFamilyStore.java (line 1035) Enqueuing flush of Memtable(LocationInfo)@809222561 INFO [FLUSH-SORTER-POOL:1] 2010-02-23 16:23:01,222 Memtable.java (line 183) Sorting Memtable(LocationInfo)@809222561 INFO [FLUSH-WRITER-POOL:1] 2010-02-23 16:23:01,226 Memtable.java (line 192) Writing Memtable(LocationInfo)@809222561 INFO [FLUSH-WRITER-POOL:1] 2010-02-23 16:23:01,338 Memtable.java (line 209) Completed flushing /var/lib/cassandra/data/system/LocationInfo-2-Data.db INFO [main] 2010-02-23 16:23:01,348 RecoveryManager.java (line 67) Log replay complete INFO [main] 2010-02-23 16:23:01,427 SystemTable.java (line 159) Saved Token found: 32282591296642763138586882639145887056 INFO [main] 2010-02-23 16:23:01,430 StorageService.java (line 281) Starting up server gossip INFO [main] 2010-02-23 16:23:01,462 StorageService.java (line 303) Starting in bootstrap mode (first, sleeping to get load information) INFO [FLUSH-TIMER] 2010-02-23 17:23:01,640 ColumnFamilyStore.java (line 393) LocationInfo has reached its threshold; switching in a fresh Memtable INFO [FLUSH-TIMER] 2010-02-23 17:23:01,646 ColumnFamilyStore.java (line 1035) Enqueuing flush of Memtable(LocationInfo)@995102461 INFO [FLUSH-SORTER-POOL:1] 2010-02-23 17:23:01,646 Memtable.java (line 183) Sorting Memtable(LocationInfo)@995102461 INFO [FLUSH-WRITER-POOL:1] 2010-02-23 17:23:01,647 Memtable.java (line 192) Writing Memtable(LocationInfo)@995102461 INFO [HINTED-HANDOFF-POOL:1] 2010-02-23 17:23:01,840 ColumnFamilyStore.java (line 875) Compacting [] INFO [FLUSH-WRITER-POOL:1] 2010-02-23 17:23:01,862 Memtable.java (line 209) Completed flushing /var/lib/cassandra/data/system/LocationInfo-3-Data.db INFO [HINTED-HANDOFF-POOL:1] 2010-02-23 18:23:02,217 ColumnFamilyStore.java (line 875) Compacting [] INFO [HINTED-HANDOFF-POOL:1] 2010-02-23 19:23:02,767 ColumnFamilyStore.java (line 875) Compacting [] INFO [HINTED-HANDOFF-POOL:1] 2010-02-23 
20:23:03,578 ColumnFamilyStore.java (line 875) Compacting [] INFO [main] 2010-03-26 05:05:50,581 SSTableReader.java (line 169) Sampling index for /var/lib/cassandra/data/system/LocationInfo-1-Data.db INFO [main] 2010-03-26 05:05:50,654 SSTableReader.java (line 169) Sampling index for /var/lib/cassandra/data/system/LocationInfo-2-Data.db INFO [main] 2010-03-26 05:05:50,665 SSTableReader.java (line 169) Sampling index for /var/lib/cassandra/data/system/LocationInfo-3-Data.db INFO [main] 2010-03-26 05:05:50,698 RecoveryManager.java (line 64) Replaying /var/lib/cassandra/commitlog/CommitLog-1266960181124.log INFO [main] 2010-03-26 05:05:50,837 RecoveryManager.java (line 67) Log replay complete INFO [main] 2010-03-26 05:05:50,992 SystemTable.java (line 159) Saved Token found: 32282591296642763138586882639145887056 INFO [main] 2010-03-26 05:05:51,040 StorageService.java (line 281) Starting up server gossip INFO [main] 2010-03-26 05:05:51,217 CassandraDaemon.java
Re: FW: Re: Is ReplicationFactor (eventually) guaranteed?
Attached log and conf file to https://issues.apache.org/jira/browse/CASSANDRA-924. Thanks. On Sat, Mar 27, 2010 at 2:43 PM, Stu Hood wrote: > Could you try running your experiment again with DEBUG logging enabled, and > then attaching the logs to a JIRA? > > -Original Message- > From: "Jianing Hu" > Sent: Saturday, March 27, 2010 12:07pm > To: user@cassandra.apache.org > Subject: Re: FW: Re: Is ReplicationFactor (eventually) guaranteed? > > Here's the conf file, with comments removed. Thanks a lot for your help. > > > dev > false > > > 0.01 > > > > > CompareWith="UTF8Type" > CompareSubcolumnsWith="UTF8Type" > Name="Super1" > Comment="A column family with supercolumns, whose > column and subcolumn names are UTF8 strings"/> > > > org.apache.cassandra.dht.OrderPreservingPartitioner > foo3 > org.apache.cassandra.locator.EndPointSnitch > org.apache.cassandra.locator.RackUnawareStrategy > 2 > /var/lib/cassandra/commitlog > > /var/lib/cassandra/data > > /var/lib/cassandra/callouts > /var/lib/cassandra/staging > > cs1 > cs2 > cs3 > > 5000 > 128 > 10.0.1.1 > > 7000 > > 7001 > 10.0.1.1 > 9160 > false > 64 > 32 > 8 > 64 > 64 > 0.1 > 60 > 16 > 64 > periodic > 1 > 864000 > 256 > > > > > On Fri, Mar 26, 2010 at 10:00 PM, Stu Hood wrote: >> Ack... very sorry. I read the original message too quickly. >> >> The fact that neither read-repair nor anti-entropy are working is suspicious >> though. Do you think you could paste your config somewhere? >> >> -Original Message- >> From: "Stu Hood" >> Sent: Friday, March 26, 2010 11:57pm >> To: user@cassandra.apache.org >> Subject: Re: Is ReplicationFactor (eventually) guaranteed? >> >> replication factor == 1 means that there is only one copy of the data. And >> you deleted it. Repair depends on the replication factor being greater than >> 1. >> >> -Original Message- >> From: "Jianing Hu" >> Sent: Friday, March 26, 2010 9:33pm >> To: user@cassandra.apache.org >> Subject: Re: Is ReplicationFactor (eventually) guaranteed? 
>>
>> That's not what I saw in my test. I'm probably making some noob
>> mistakes. Can someone enlighten me? Here's what I did:
>> 1) Bring up a cluster with three servers cs1,2,3, with their initial
>> token set to 'foo3', 'foo6', and 'foo9', respectively.
>> ReplicationFactor is set to 2 on all 3.
>> 2) Insert 9 columns with keys from 'foo1' to 'foo9', and flush. Now I
>> have foo1,2,3,7,8,9 on cs1, foo1,2,3,4,5,6 on cs2, and foo4,5,6,7,8,9
>> on cs3. So far so good.
>> 3) Bring down cs3 and wipe out its data directory
>> 4) Bring up cs3
>> 5) Run repair Keyspace1 on cs3, then flush
>> At this point I expect to see cs3 getting its data back. But there's
>> nothing in its data directory. I also tried getting all columns with
>> ConsistencyLevel::ALL to see if that'll do a read repair. But still
>> cs3's data directory is empty. What am I doing wrong?
>>
>> This is 0.5.1 BTW.
>>
>> Thanks,
>> - Jianing
>>
>> On Fri, Mar 26, 2010 at 6:12 PM, Rob Coli wrote:
>>> On 3/26/10 5:57 PM, Jianing Hu wrote:
>>>> In a cluster with ReplicationFactor > 1, if one server goes down,
>>>> will new replicas be created on other servers to satisfy the set
>>>> ReplicationFactor?
>>>
>>> Yes, via Anti-Entropy.
>>>
>>> http://wiki.apache.org/cassandra/AntiEntropy
>>> http://wiki.apache.org/cassandra/ArchitectureAntiEntropy
>>>
>>> It's worth noting that "hot" keys are likely to be re-replicated by Read
>>> Repair before Anti Entropy is triggered.
>>>
>>> http://wiki.apache.org/cassandra/ReadRepair
>>>
>>> =Rob
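
The key placement described in step 2 of the quoted test can be reproduced with a toy model of the ring. This is only a simplified sketch, assuming an order-preserving partitioner (keys compared as plain strings) and rack-unaware placement (the owner plus the next nodes clockwise). It is not Cassandra code, and `replicas_for` is a made-up helper.

```python
from bisect import bisect_left


def replicas_for(key, ring, replication_factor):
    """Return the nodes holding `key`.

    `ring` is a list of (token, node) pairs sorted by token. The owner is the
    first node whose token is >= key, wrapping around past the last token; the
    remaining replicas are the next nodes clockwise on the ring.
    """
    tokens = [token for token, _node in ring]
    start = bisect_left(tokens, key) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


# Tokens and node names taken from the message above.
ring = [("foo3", "cs1"), ("foo6", "cs2"), ("foo9", "cs3")]

placement = {node: [] for _token, node in ring}
for n in range(1, 10):
    key = "foo%d" % n
    for node in replicas_for(key, ring, 2):
        placement[node].append(key)

if __name__ == "__main__":
    for node in sorted(placement):
        print(node, placement[node])
```

Under these assumptions the model agrees with the layout reported in step 2, so after cs3 is wiped, a complete copy of everything it should hold still exists, split across cs1 and cs2.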
Multi-indexing data
Hi,

I have a question about Cassandra's data model that I was hoping you guys could help me with. Most of our queries are performed against a series of tables containing crypto keys and their associated metadata. A key could have any number of identifiable attributes that need to be searchable: iasn, 64-bit key id, 32-bit key id, expiration, revoker, etc.

From what I understand, tagging the same information with multiple keys is not supported. The best I could think of was either to add key/value pairs of the form "keyid_32_0x73A5DC55:some junk data" to the row, or to maintain a separate set of columns that provide "keyid_32_0x73A5DC55:key row name" mappings and perform two queries. I'm not a fan of either option. Is there some other solution that I may have overlooked?

Thanks,
-Matt
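
For concreteness, the second option (a separate mapping plus two queries) can be sketched as follows. Plain Python dicts stand in for the two column families, so no real client API is involved; `keys_cf`, `index_cf`, `store_key`, `find_by`, and the sample row are all hypothetical names and data chosen for illustration.

```python
# Stand-ins for two column families:
#   keys_cf:  row name -> attributes of the crypto key
#   index_cf: "attribute:value" -> set of row names that carry that attribute
keys_cf = {}
index_cf = {}


def store_key(row_name, attributes):
    """Write the key row, then one index entry per searchable attribute."""
    keys_cf[row_name] = dict(attributes)
    for attr, value in attributes.items():
        index_cf.setdefault("%s:%s" % (attr, value), set()).add(row_name)


def find_by(attr, value):
    """Two lookups: index entry -> row names, then row names -> key rows."""
    row_names = index_cf.get("%s:%s" % (attr, value), set())  # query 1
    return [keys_cf[name] for name in sorted(row_names)]      # query 2


if __name__ == "__main__":
    store_key("key-0001", {"keyid_32": "0x73A5DC55", "revoker": "alice"})
    print(find_by("keyid_32", "0x73A5DC55"))
```

The cost of this design is that every write touches several rows and the application has to keep the index entries in step with the key rows, including on delete.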