Re: Couple of minor logging / error message things

2010-09-29 Thread aaron morton
Added CASSANDRA-1556 for the error message. 

Aaron

On 29 Sep 2010, at 16:45, Jonathan Ellis wrote:

> On Tue, Sep 28, 2010 at 10:28 PM, Aaron Morton  
> wrote:
>> Noticed these when working against the current 0.7.0 beta2 (#3) build...
>> When sending a system_add_keyspace request with an invalid keyspace the
>> response to the client is fine...
>> 
>> (python)
>> InvalidRequestException: InvalidRequestException(why='Invalid keyspace name:
>> Test Keyspace 1285729085.78')
>> However it's not logged in the system.log.
> 
> We don't log any InvalidRequest.  Not sure if it's worth
> special-casing system_ methods there.
> 
>> In thrift/CassandraServer system_add_keyspace() the strategy_class string
>> from the KsDef is used to load a class. The ClassNotFoundError is then
>> caught and used to build an InvalidRequestException. If the strategy_class
>> is missing or empty, the error returned to the client is
>> (python)
>> InvalidRequestException: InvalidRequestException(why='')
> 
> Can you create a ticket for this?
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
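
For anyone curious why the why field comes back empty: ClassNotFoundException carries only the class name as its message, so wrapping it directly turns an empty strategy_class into InvalidRequestException(why=''). A rough, self-contained illustration (the names below are placeholders, not the actual CassandraServer code):

    public class EmptyWhyExample
    {
        public static void main(String[] args)
        {
            String strategyClass = "";   // what an empty KsDef strategy_class amounts to
            try
            {
                Class.forName(strategyClass);
            }
            catch (ClassNotFoundException e)
            {
                // getMessage() is just the class name that could not be found,
                // so an empty name produces why=''
                System.out.println("why='" + e.getMessage() + "'");
            }
        }
    }

Validating strategy_class before attempting the class load would let the server return a readable message instead.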



Re: Truncate + Snapshot + Cannot Allocate Memory == Timeout

2010-09-29 Thread aaron morton
Created CASSANDRA-1557 for the error masking. 

Aaron

On 29 Sep 2010, at 18:19, Jonathan Ellis wrote:

> On Tue, Sep 28, 2010 at 11:25 PM, Aaron Morton  
> wrote:
>> 1) Is the memory error just a result of me letting my machine run stupidly
>> low on memory?
> 
> No, it's the JVM forking to run ln.  Enable overcommit, or get JNA so
> it does the link w/ native code instead.
> 
>> 2) Should it have returned an ApplicationError or some such in this case?
>> The code in ColumnFamilyStore:1368 is catching the IOException from the call
>> to FileUtils.createHardLink and wrapping it in an IOError. However the code
>> in TruncateVerbHandler:56 is looking for the IOException.
> 
> That does sound like a bug.
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
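
The mismatch is easy to see in isolation: an unchecked IOError wrapping the IOException sails straight past a handler that only catches IOException. A minimal standalone sketch (the method names are stand-ins for the snapshot/truncate code, not the real classes):

    import java.io.IOError;
    import java.io.IOException;

    public class ErrorMaskingExample
    {
        // Stands in for the snapshot path: the IOException from the hard-link
        // call is caught and re-thrown wrapped in an unchecked IOError.
        static void createSnapshotLink() throws IOException
        {
            try
            {
                throw new IOException("cannot run program \"ln\": Cannot allocate memory");
            }
            catch (IOException e)
            {
                throw new IOError(e);
            }
        }

        public static void main(String[] args)
        {
            try
            {
                createSnapshotLink();
            }
            catch (IOException e)
            {
                // Stands in for a handler that only looks for IOException;
                // it never runs, and the IOError propagates up instead.
                System.out.println("reported back to the requester: " + e);
            }
        }
    }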



Re: avro + cassandra + ruby

2010-09-29 Thread Gary Dusbabek
We have a system test that tests this (in avro python).  see
test/system/test_avro_standard.py:TestStandardOperations.test_multiget_slice_simple.

On Wed, Sep 29, 2010 at 01:06, Gabor Torok  wrote:
> Hi,
> I'm attempting to use avro to talk to cassandra because the ruby thrift 
> client's read performance is pretty bad (I measured 4x slower than java).
>
> However, I run into a problem when calling multiget_slice.
> The server gives a KeyspaceNotDefinedException because 
> clientState.getKeyspace() returns null.
> It seems this is because ClientState stores the keyspace in a ThreadLocal.
>
> I call set_keyspace and clientState stores the keyspace value. I guess the 
> next avro call to multiget_slice runs in a different thread so it can't 
> retrieve the value.
>
> In ruby, I use Avro::IPC::HTTPTransceiver as the transport which I believe is 
> a stateless transport. I also tried SocketTransport, but that died with a 
> malloc exception.
>
> Is this a problem with the ruby avro library (I use avro 1.4.0), or how the 
> server handles avro threads?
> Any help would be appreciated!
>
> Thanks,
> --Gabor
>
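
The per-thread behaviour Gabor describes is easy to demonstrate outside Cassandra: a value stored in a ThreadLocal by the thread that handled set_keyspace is simply invisible to the thread that handles the next call. A standalone sketch (not the actual ClientState code):

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class ThreadLocalKeyspaceExample
    {
        // Stands in for ClientState: one keyspace value per server thread.
        private static final ThreadLocal<String> keyspace = new ThreadLocal<String>();

        public static void main(String[] args) throws Exception
        {
            ExecutorService server = Executors.newFixedThreadPool(2);

            // "set_keyspace" handled on the first server thread...
            Runnable setKeyspace = new Runnable()
            {
                public void run() { keyspace.set("Keyspace1"); }
            };
            server.submit(setKeyspace).get();

            // ..."multiget_slice" handled on a second thread sees nothing,
            // which is what surfaces as KeyspaceNotDefinedException.
            Callable<String> readKeyspace = new Callable<String>()
            {
                public String call() { return keyspace.get(); }
            };
            System.out.println("keyspace on the next call: " + server.submit(readKeyspace).get());  // prints null

            server.shutdown();
        }
    }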


Cassandra documentation available

2010-09-29 Thread Jonathan Ellis
Riptano has posted Cassandra documentation at
http://www.riptano.com/docs/, starting with 0.6.5.  We're working to
add sections on Hadoop integration and on troubleshooting, as well as
a version for 0.7.  Feedback is welcome!

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra documentation available

2010-09-29 Thread Guilherme Defreitas
Hi,

There is an issue in http://www.riptano.com/docs/0.6.5/intro/strengths in
the Reliable chapter.
"Because all nodes are symmetric and there are no "master" nodes, there is
single point of failure." when the correct should be " Because all nodes are
symmetric and there are no "master" nodes, there is *NO* single point of
failure."


On Wed, Sep 29, 2010 at 11:51 AM, Jonathan Ellis  wrote:

> Riptano has posted Cassandra documentation at
> http://www.riptano.com/docs/, starting with 0.6.5.  We're working to
> add sections on Hadoop integration and on troubleshooting, as well as
> a version for 0.7.  Feedback is welcome!
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Cassandra documentation available

2010-09-29 Thread Scott Mann
On the authentication page
(http://www.riptano.com/docs/0.6.5/install/auth-config), the link
associated with "passwd.mode=MD5" is bad. It should take you to the
table on http://www.riptano.com/docs/0.6.5/install/env-settings.

Do you want these sent to you or someone else directly? Save the list?
As I come across things, I'm happy to note them.

Good job, by the way!

-Scott

On Wed, Sep 29, 2010 at 9:14 AM, Guilherme Defreitas
 wrote:
> Hi,
> There is an issue in http://www.riptano.com/docs/0.6.5/intro/strengths in
> the Reliable chapter.
> "Because all nodes are symmetric and there are no "master" nodes, there is
> single point of failure." when the correct should be " Because all nodes are
> symmetric and there are no "master" nodes, there is NO single point of
> failure."
>
> On Wed, Sep 29, 2010 at 11:51 AM, Jonathan Ellis  wrote:
>>
>> Riptano has posted Cassandra documentation at
>> http://www.riptano.com/docs/, starting with 0.6.5.  We're working to
>> add sections on Hadoop integration and on troubleshooting, as well as
>> a version for 0.7.  Feedback is welcome!
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
-Scott


Re: Cassandra documentation available

2010-09-29 Thread Jonathan Ellis
We'll get those fixed.

Here or tho...@riptano.com directly is fine.

Thanks!

On Wed, Sep 29, 2010 at 11:34 AM, Scott Mann  wrote:
> On the authentication page
> (http://www.riptano.com/docs/0.6.5/install/auth-config), the link
> associated with "passwd.mode=MD5" is bad. It should take you to the
> table on http://www.riptano.com/docs/0.6.5/install/env-settings.
>
> Do you want these sent to you or someone else directly? Save the list?
> As I come across things, I'm happy to note them.
>
> Good job, by the way!
>
> -Scott
>
> On Wed, Sep 29, 2010 at 9:14 AM, Guilherme Defreitas
>  wrote:
>> Hi,
>> There is an issue in http://www.riptano.com/docs/0.6.5/intro/strengths in
>> the Reliable chapter.
>> "Because all nodes are symmetric and there are no "master" nodes, there is
>> single point of failure." when the correct should be " Because all nodes are
>> symmetric and there are no "master" nodes, there is NO single point of
>> failure."
>>
>> On Wed, Sep 29, 2010 at 11:51 AM, Jonathan Ellis  wrote:
>>>
>>> Riptano has posted Cassandra documentation at
>>> http://www.riptano.com/docs/, starting with 0.6.5.  We're working to
>>> add sections on Hadoop integration and on troubleshooting, as well as
>>> a version for 0.7.  Feedback is welcome!
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>
>>
>
>
>
> --
> -Scott
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: avro + cassandra + ruby

2010-09-29 Thread Ryan King
On Tue, Sep 28, 2010 at 4:06 PM, Gabor Torok
 wrote:
> Hi,
> I'm attempting to use avro to talk to cassandra because the ruby thrift 
> client's read performance is pretty bad (I measured 4x slower than java).
>
> However, I run into a problem when calling multiget_slice.
> The server gives a KeyspaceNotDefinedException because 
> clientState.getKeyspace() returns null.
> It seems this is because ClientState stores the keyspace in a ThreadLocal.
>
> I call set_keyspace and clientState stores the keyspace value. I guess the 
> next avro call to multiget_slice runs in a different thread so it can't 
> retrieve the value.
>
> In ruby, I use Avro::IPC::HTTPTransceiver as the transport which I believe is 
> a stateless transport. I also tried SocketTransport, but that died with a 
> malloc exception.

Was this exception on the server or in the client? The ruby avro code
is pretty new, so the probability of bugs is pretty high.

-ryan

> Is this a problem with the ruby avro library (I use avro 1.4.0), or how the 
> server handles avro threads?
> Any help would be appreciated!
>
> Thanks,
> --Gabor
>


Marking each node down before rolling restart

2010-09-29 Thread Justin Sanders
I looked through the documentation but couldn't find anything.  I was
wondering if there is a way to manually mark a node "down" in the cluster
instead of killing the cassandra process and letting the other nodes figure
out the node is no longer up.

The reason I ask is because we are having an issue when we perform rolling
restarts on the cluster.  Basically read requests that come in on other
nodes will block while they are waiting on the node that was just killed to
be marked down.  Before they realize the node is offline they will throw a
TimedOutException.

If I could mark the node being down ahead of time this timeout period could
be avoided.  Any help is appreciated.

Justin


MessagingServiceMBean does not support o.a.c.concurrent task counts

2010-09-29 Thread Aaron Morton
Running the current 0.7.0-beta2 #3 build, I get the error below when checking tpstats via node_tool. The MessagingService is registering itself as a mbean in org.apache.cassandra.concurrent, and the MessagingServiceMBean interface does not support the getActiveCount(), getCompletedTasks() and getPendingTasks() functions of the IExecutorMBean.

Out of interest should it be registering the JMXEnabledThreadPoolExecutor it created on line 105, or should it be registered elsewhere?

It's still possible to view the task pool info through jconsole. I'll create a bug when Jira finishes re-indexing :)

Aaron

Pool Name                    Active   Pending      Completed
MIGRATION_STAGE                   0         0             17
GOSSIP_STAGE                      0         0              0
MESSAGING-SERVICE-POOL   Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at $Proxy4.getActiveCount(Unknown Source)
        at org.apache.cassandra.tools.NodeCmd.printThreadPoolStats(NodeCmd.java:157)
        at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:497)
Caused by: javax.management.AttributeNotFoundException: No such attribute: ActiveCount
        at com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:63)
        at com.sun.jmx.mbeanserver.MBeanSupport.getAttribute(MBeanSupport.java:216)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:666)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1404)
        at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
        at javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:600)
        at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
        at sun.rmi.transport.Transport$1.run(Transport.java:159)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
        at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)
        at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)
        at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:142)
        at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(Unknown Source)
        at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:878)
        at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:263)
        ... 3 more
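
The underlying behaviour is reproducible in isolation: ask a JMX server for an attribute that the registered MBean interface does not declare and you get the same AttributeNotFoundException. A standalone sketch (the object name and classes below are invented for illustration, not the Cassandra code):

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class WrongMBeanExample
    {
        // An MBean interface that, like MessagingServiceMBean, does not declare
        // ActiveCount / PendingTasks / CompletedTasks.
        public interface MessagingServiceMBean { }

        public static class MessagingService implements MessagingServiceMBean { }

        public static void main(String[] args) throws Exception
        {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("example:type=MESSAGING-SERVICE-POOL");
            server.registerMBean(new MessagingService(), name);

            // NodeCmd effectively does this via an IExecutorMBean proxy; with the
            // wrong interface registered under the name, the server throws
            // javax.management.AttributeNotFoundException: No such attribute: ActiveCount
            server.getAttribute(name, "ActiveCount");
        }
    }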

Re: Marking each node down before rolling restart

2010-09-29 Thread Aaron Morton
Try nodetool drain:

Flushes all memtables for a node and causes the node to stop accepting write operations. Read operations will continue to work. This is typically used before upgrading a node to a new version of Cassandra.
http://www.riptano.com/docs/0.6.5/utils/nodetool

Aaron

On 30 Sep, 2010, at 10:15 AM, Justin Sanders  wrote:

I looked through the documentation but couldn't find anything.  I was wondering if there is a way to manually mark a node "down" in the cluster instead of killing the cassandra process and letting the other nodes figure out the node is no longer up.
The reason I ask is because we are having an issue when we perform rolling restarts on the cluster.  Basically read requests that come in on other nodes will block while they are waiting on the node that was just killed to be marked down.  Before they realize the node is offline they will throw a TimedOutException.
If I could mark the node being down ahead of time this timeout period could be avoided.  Any help is appreciated.
Justin
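
For what it's worth, the drain suggestion above implies roughly the following sequence for each node in turn (the exact nodetool flags vary by version, so treat this as a sketch rather than the definitive procedure):

    nodetool drain          (run against the node, adding whatever host/port options your nodetool expects)
    wait until the other nodes report it as down (for example, check nodetool ring from another node)
    stop and restart the Cassandra process, then move on to the next node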



Re: Marking each node down before rolling restart

2010-09-29 Thread Aaron Morton
Ah, that was not exactly what you were after. I do not know how long it takes gossip / failure detector to detect a down node.

In your case what is the CF you're using for reads and what is your RF? The hope would be that taking one node down at a time would leave enough server running to serve the request. AFAIK the coordinator will make a read request to the first node responsible for the row, and only ask for a digest from the others. So there may be a case where it has to timeout reading from the first node before asking for the full data from the others.

A hack solution may be to reduce the rpc_timeout_in_ms.

May need some adult supervision to answer this one.

Aaron

On 30 Sep, 2010, at 10:45 AM, Aaron Morton  wrote:

Try nodetool drain:

Flushes all memtables for a node and causes the node to stop accepting write operations. Read operations will continue to work. This is typically used before upgrading a node to a new version of Cassandra.
http://www.riptano.com/docs/0.6.5/utils/nodetool

Aaron

On 30 Sep, 2010, at 10:15 AM, Justin Sanders  wrote:

I looked through the documentation but couldn't find anything.  I was wondering if there is a way to manually mark a node "down" in the cluster instead of killing the cassandra process and letting the other nodes figure out the node is no longer up.
The reason I ask is because we are having an issue when we perform rolling restarts on the cluster.  Basically read requests that come in on other nodes will block while they are waiting on the node that was just killed to be marked down.  Before they realize the node is offline they will throw a TimedOutException.
If I could mark the node being down ahead of time this timeout period could be avoided.  Any help is appreciated.
Justin



Re: MessagingServiceMBean does not support o.a.c.concurrent task counts

2010-09-29 Thread Brandon Williams
On Wed, Sep 29, 2010 at 4:37 PM, Aaron Morton wrote:

> Running the current 0.7.0-beta2 #3 build, I get the error below when
> checking tpstats via node_tool.
>
> The MessagingService is registering itself as a mbean in
> org.apache.cassandra.concurrent, and the MessagingServiceMBean interface
> does not support the getActiveCount(), getCompletedTasks() and
> getPendingTasks() functions of the IExecutorMBean.
>
> Out of interest should it be registering the JMXEnabledThreadPoolExecutor
> it created on line 105, or should it be registered elsewhere?
>

Unfortunately, this was fixed in trunk in the commit made 4s after the one
that the build is from.  It's annoying, but not critical enough to re-roll
another beta (as you said, everything still works via JMX.)  The GCInspector
will also report a similar error in the logs when it is activated.

-Brandon


Re: MessagingServiceMBean does not support o.a.c.concurrent task counts

2010-09-29 Thread Aaron Morton
All good. And I learnt the JMXEnabledThreadPoolExecutor registers itself as an mbean :)

Thanks
Aaron

On 30 Sep, 2010, at 11:15 AM, Brandon Williams  wrote:

On Wed, Sep 29, 2010 at 4:37 PM, Aaron Morton  wrote:

Running the current 0.7.0-beta2 #3 build, I get the error below when checking tpstats via node_tool. The MessagingService is registering itself as a mbean in org.apache.cassandra.concurrent, and the MessagingServiceMBean interface does not support the getActiveCount(), getCompletedTasks() and getPendingTasks() functions of the IExecutorMBean.

Out of interest should it be registering the JMXEnabledThreadPoolExecutor it created on line 105, or should it be registered elsewhere?

Unfortunately, this was fixed in trunk in the commit made 4s after the one that the build is from.  It's annoying, but not critical enough to re-roll another beta (as you said, everything still works via JMX.)  The GCInspector will also report a similar error in the logs when it is activated.

-Brandon


Preventing Swapping.

2010-09-29 Thread Jeremy Davis
Did anyone else see this article on preventing swapping? Seems like it would
also apply to Cassandra.

http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/

-JD


Re: inter node protocol for 0.7 nightly

2010-09-29 Thread Aaron Morton
In case anyone saw this, it's a bad idea that I did not try. It will only work if you can put your system into a read only mode. Otherwise, when a node is taken down to do the upgrade and a client moves over to a second node, the data it wrote to the first node will not be there. If you have a web site with multiple server processes it's even more of a bonehead idea.

Aaron

On 27 Sep, 2010, at 05:31 PM, Aaron Morton  wrote:

Yeah, I can only get away with it because the nodes have only 20GB each and there's a load of free space.

Thanks
Aaron

On 27 Sep, 2010, at 05:25 PM, Jonathan Ellis  wrote:

that should work, but requiring RF=N makes it of limited usefulness in
practice :)

On Sun, Sep 26, 2010 at 9:10 PM, Aaron Morton  wrote:
> It is indeed CASSANDRA-1465 where the NEWS.TXT was changed to say the wire
> protocol had changed
> http://github.com/apache/cassandra/commit/4023c3b6f9d4cd66d56024b07962968f2424815f#diff-1
> As far as I can see the wire protocol change is to remove the string name of
> the TP stage from the message. I can do a shutdown upgrade, but was thinking
> of trying the following just to see if it works
> 1) I have a 4 node with RF3, first increase the RF to 4 and repair so each
> node has all the
> data http://wiki.apache.org/cassandra/FAQ#change_replication
> 2) change my clients to use CL.ANY (i guess ONE would work as well)
> 3) change the listen_address on the nodes to localhost
> What I think I've done now is make a cluster of individual nodes, that
> cannot talk to each other. When a client connects (it has a list of nodes)
> to one node it should be able to read data and perform writes HH's will be
> stored to send the data to the other nodes in the cluster.
> 4) Upgrade each node in turn to the new 0.7 nightly, I would drain the node
> and delete the schema CF's as
> per http://www.mail-archive.com/u...@cassandra.apache.org/msg05726.html
> 5) return the listen_address back to an empty setting, so the nodes see each
> other again. Then run repair on each node in turn to deliver the HH they
> collected while they could not communicate.
> 6) Change the RF back to 3 and nodetool cleanup on each
> Aside from "huh why?" any thoughts on if this would work? I have a slight
> opportunity to play around and am interested to see if I can roll this out
> without a full shutdown.
> Thanks
> Aaron
>
> On 22 Sep, 2010, at 02:57 PM, Jonathan Ellis  wrote:
>
> Yes, I think that's the one.
>
> I imagine svn blame on NEWS would tell you for sure.
>
> On Tue, Sep 21, 2010 at 8:05 AM, Gary Dusbabek  wrote:
>> 1465 maybe?
>>
>> On Mon, Sep 20, 2010 at 16:00, Aaron Morton 
>> wrote:
>>> Just took a look at upgrading from the 31/08 nightly to the 20/09 and
>>> noticed
>>> the news.txt says...
>>> "The Cassandra inter-node protocol is incompatible with 0.6.x releases
>>> (and
>>> with 0.7 beta1)"
>>> Could someone point me to the ticket(s) for this change so I can see if I
>>> can do a rolling upgrade.
>>> Thanks
>>> Aaron
>>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Marking each node down before rolling restart

2010-09-29 Thread Justin Sanders
It seems to be about 15 seconds after killing a node before the other nodes
report it being down.

We are running a 9 node cluster with RF=3, all reads and writes at quorum.
 I was making the same assumption you are, that an operation would complete
fine at quorum with only one node down since the other two nodes would be
able to respond.

Justin


On Wed, Sep 29, 2010 at 5:58 PM, Aaron Morton wrote:

> Ah, that was not exactly what you were after. I do not know how long it
> takes gossip / failure detector to detect a down node.
>
> In your case what is the CF you're using for reads and what is your RF? The
> hope would be that taking one node down at a time would leave enough server
> running to serve the request. AFAIK the coordinator will make a read request
> to the first node responsible for the row, and only ask for a digest  from
> the others. So there may be a case where it has to timeout reading from the
> first node before asking for the full data from the others.
>
> A hack solution may be to reduce the rpc_timeout_in_ms
>
> May need some adult supervision to answer this one.
>
> Aaron
>
> On 30 Sep, 2010, at 10:45 AM, Aaron Morton  wrote:
>
> Try nodetool drain
>
> Flushes all memtables for a node and causes the node to stop accepting
> write operations. Read operations will continue to work. This is typically
> used before upgrading a node to a new version of Cassandra.
> http://www.riptano.com/docs/0.6.5/utils/nodetool
>
> Aaron
>
>
> On 30 Sep, 2010, at 10:15 AM, Justin Sanders  wrote:
>
> I looked through the documentation but couldn't find anything.  I was
> wondering if there is a way to manually mark a node "down" in the cluster
> instead of killing the cassandra process and letting the other nodes figure
> out the node is no longer up.
>
> The reason I ask is because we are having an issue when we perform rolling
> restarts on the cluster.  Basically read requests that come in on other
> nodes will block while they are waiting on the node that was just killed to
> be marked down.  Before they realize the node is offline they will throw a
> TimedOutException.
>
> If I could mark the node being down ahead of time this timeout period could
> be avoided.  Any help is appreciated.
>
> Justin
>
>


Re: Marking each node down before rolling restart

2010-09-29 Thread Aaron Morton
I just ran nodetool drain in a 3 node cluster that was not serving any requests, and the other nodes picked up the change in about 10 seconds.

On the node I drained:

 INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,281 StorageService.java (line 474) Starting drain process
 INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,282 MessagingService.java (line 348) Shutting down MessageService...
 INFO [ACCEPT-sorb/192.168.34.31] 2010-09-30 15:18:03,289 MessagingService.java (line 529) MessagingService shutting down server thread.
 INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,290 MessagingService.java (line 365) Shutdown complete (no further commands will be processed)
 INFO [RMI TCP Connection(39)-192.168.34.31] 2010-09-30 15:18:03,339 StorageService.java (line 474) Node is drained

One of the others:

 INFO [Timer-0] 2010-09-30 15:18:12,753 Gossiper.java (line 196) InetAddress /192.168.34.31 is now dead.
DEBUG [Timer-0] 2010-09-30 15:18:12,753 MessagingService.java (line 134) Resetting pool for /192.168.34.31

Either way, I would say it's safer to drain the node first. As it writes out the SSTables and drains the log, after the reboot the server will not need to play forward the log. This may be a good thing in the event of an issue with the upgrade.

My guess is:
- drain the node
- other nodes can still read from it, and it will actively reject writes (because the Messaging Service is down), so no timeouts
- wait until the down state of the node is propagated around the cluster, then shut it down

I may be able to test out the theory under a light load later today or tomorrow. Anyone else have any thoughts?

Aaron

On 30 Sep, 2010, at 02:54 PM, Justin Sanders  wrote:

It seems to be about 15 seconds after killing a node before the other nodes report it being down.  We are running a 9 node cluster with RF=3, all reads and writes at quorum.  I was making the same assumption you are, that an operation would complete fine at quorum with only one node down since the other two nodes would be able to respond.

Justin

On Wed, Sep 29, 2010 at 5:58 PM, Aaron Morton  wrote:

Ah, that was not exactly what you were after. I do not know how long it takes gossip / failure detector to detect a down node. 

In your case what is the CF you're using for reads and what is your RF? The hope would be that taking one node down at a time would leave enough server running to serve the request. AFAIK the coordinator will make a read request to the first node responsible for the row, and only ask for a digest  from the others. So there may be a case where it has to timeout reading from the first node before asking for the full data from the others.

A hack solution may be to reduce the rpc_timeout_in_ms.

May need some adult supervision to answer this one.

Aaron

On 30 Sep, 2010, at 10:45 AM, Aaron Morton  wrote:
Try nodetool drain:

Flushes all memtables for a node and causes the node to stop accepting write operations. Read operations will continue to work. This is typically used before upgrading a node to a new version of Cassandra.

http://www.riptano.com/docs/0.6.5/utils/nodetool

Aaron

On 30 Sep, 2010, at 10:15 AM, Justin Sanders  wrote:

I looked through the documentation but couldn't find anything.  I was wondering if there is a way to manually mark a node "down" in the cluster instead of killing the cassandra process and letting the other nodes figure out the node is no longer up.


The reason I ask is because we are having an issue when we perform rolling restarts on the cluster.  Basically read requests that come in on other nodes will block while they are waiting on the node that was just killed to be marked down.  Before they realize the node is offline they will throw a TimedOutException.


If I could mark the node being down ahead of time this timeout period could be avoided.  Any help is appreciated.


Justin





Re: Preventing Swapping.

2010-09-29 Thread Narendra Sharma
Read "Use mlockall via JNA, if present, to prevent Linux from swapping out
parts of the JVM " on
following link:
http://www.riptano.com/blog/whats-new-cassandra-065
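
In code terms, the JNA hook amounts to binding mlockall(2) from libc and calling it at startup. A minimal sketch, assuming JNA is on the classpath (the constants are the Linux values, and this is not the actual Cassandra CLibrary code):

    import com.sun.jna.Native;

    public class MlockallExample
    {
        private static final int MCL_CURRENT = 1;  // lock pages currently mapped (Linux value)
        private static final int MCL_FUTURE  = 2;  // lock pages mapped in the future (Linux value)

        static
        {
            Native.register("c");  // bind the native method below to libc
        }

        private static native int mlockall(int flags);

        public static void main(String[] args)
        {
            int rc = mlockall(MCL_CURRENT);
            // Needs root or CAP_IPC_LOCK (and a sufficient memlock ulimit) to succeed.
            System.out.println(rc == 0 ? "JVM memory locked" : "mlockall failed");
        }
    }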

-Naren

On Wed, Sep 29, 2010 at 5:21 PM, Jeremy Davis
wrote:

>
> Did anyone else see this article on preventing swapping? Seems like it
> would also apply to Cassandra.
>
>
> http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
>
> -JD
>
>