Re: High read latency

2010-06-04 Thread Sylvain Lebresne
As written in the third point of
http://wiki.apache.org/cassandra/CassandraLimitations,
right now super columns are not indexed, and are deserialized fully when you
access them. Another way to put it is: you'll want to use super columns with
only a relatively small number of columns in them.
In your example, even if you read 1 column in 1 row, the full super column
containing that column is read from disk. Given that, 50ms to read 5 records
isn't necessarily too bad.

ColumnIndexSizeInKB will not help here (as super columns are not indexed
anyway); your only way out is to change your model so that you don't have
super columns with so many columns.
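
For what it's worth, one common way out is to give each former super column its
own narrow row in a standard column family, keyed by something like
"rowKey:superColumnName". Below is a minimal sketch against the 0.6 Thrift API;
the keyspace name "Keyspace1", the CF name "ItemData", and the key format are
made-up placeholders, not anything from this thread:

    import java.util.List;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;

    public class NarrowRowRead {
        public static void main(String[] args) throws Exception {
            TSocket socket = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();

            // Instead of row "user1" holding 1000 super columns of 5 columns
            // each, store one narrow row per former super column (e.g. key
            // "user1:item42") in a standard CF. A read then deserializes only
            // the handful of columns it actually asks for.
            ColumnParent parent = new ColumnParent("ItemData");
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));

            List<ColumnOrSuperColumn> cols = client.get_slice(
                    "Keyspace1", "user1:item42", parent, predicate, ConsistencyLevel.ONE);
            System.out.println(cols.size() + " columns read");
            socket.close();
        }
    }

The trade-off is that reading many logical items becomes a multiget over many
small rows instead of one big slice, but each read only deserializes what it
needs.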

On Fri, Jun 4, 2010 at 8:26 AM, Ma Xiao  wrote:
> We have a super CF which may have up to 1000 super columns and 5
> columns for each super column; the read latency may go up to 50ms
> (or even higher). I think that's a long time to respond. How can we tune
> the storage config to optimize the performance? I read the wiki, and
> ColumnIndexSizeInKB may help to do this: suppose that by assigning a big
> value to this( 2 ex. ), no row can reach the limit, so it never
> generates an index for a row. In our production scenario, we only access
> 1 row at a time, with up to 1000 columns returned per slice.  Any
> suggestions?
>


Re: Cassandra training Jun 18 in SF

2010-06-04 Thread Oleg Anastasjev
Jonathan Ellis  gmail.com> writes:

> 
> This will be Riptano's 6th training session (including the four we've
> done that were on-site with a specific customer), and in my humble
> opinion the material's really solid at this point.
> 
> We are actively working on lining up other locations.
> 
Do you have plans for training sessions in Europe?





Re: High CPU Usage since 0.6.2

2010-06-04 Thread Lu Ming
I notice that there are more than 100 "CLOSE_WAIT" incoming connections
on storage port 7000.

On my two cassandra nodes:
126 of 146 storage connections are "CLOSE_WAIT"
196 of 217 storage connections are "CLOSE_WAIT"

Is that normal?

--
From: "Chris Goffinet" 
Sent: Friday, June 04, 2010 1:50 PM
To: 
Cc: 
Subject: Re: High CPU Usage since 0.6.2

We're seeing this as well. We were testing with a 40+ node cluster on the
latest 0.6 branch from a few days ago.


-Chris

On Jun 3, 2010, at 9:55 PM, Lu Ming wrote:



I have ten 0.5.1 Cassandra nodes in my cluster, and I updated them to
0.6.2 yesterday.
But today I find six cassandra nodes with high CPU usage, more than 400%
on my 8-core CPU server.

The worst one is more than 760%. It is very serious.

I used jvisualvm to watch the worst node, and I found that there are many
running threads named "thread-xxx";
the status of the other threads is waiting or sleeping.

"Thread-130" - Thread t...@240
 java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.setMemory(Native Method)
at sun.nio.ch.Util.erase(Util.java:202)
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

 Locked ownable synchronizers:
- None

"Thread-126" - Thread t...@236
 java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:200)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked java.lang.obj...@10808561
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


 Locked ownable synchronizers:
- None

"Thread-119" - Thread t...@229
 java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:182)
- locked java.lang.obj...@65b4abbd
- locked java.lang.obj...@38773975
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


 Locked ownable synchronizers:
- None







Re: High read latency

2010-06-04 Thread Ma Xiao
It's really a pain to modify the data model. The problem is how to
handle a "one-to-many" relation in cassandra: the limitation on row
size makes it impossible to store them as plain columns.

On Fri, Jun 4, 2010 at 4:13 PM, Sylvain Lebresne  wrote:
> As written in the third point of
> http://wiki.apache.org/cassandra/CassandraLimitations,
> right now super columns are not indexed, and are deserialized fully when
> you access them. Another way to put it is: you'll want to use super columns
> with only a relatively small number of columns in them.
> In your example, even if you read 1 column in 1 row, the full super column
> containing that column is read from disk. Given that, 50ms to read 5 records
> isn't necessarily too bad.
>
> ColumnIndexSizeInKB will not help here (as super columns are not indexed
> anyway); your only way out is to change your model so that you don't have
> super columns with so many columns.
>
> On Fri, Jun 4, 2010 at 8:26 AM, Ma Xiao  wrote:
>> We have a super CF which may have up to 1000 super columns and 5
>> columns for each super column; the read latency may go up to 50ms
>> (or even higher). I think that's a long time to respond. How can we tune
>> the storage config to optimize the performance? I read the wiki, and
>> ColumnIndexSizeInKB may help to do this: suppose that by assigning a big
>> value to this( 2 ex. ), no row can reach the limit, so it never
>> generates an index for a row. In our production scenario, we only access
>> 1 row at a time, with up to 1000 columns returned per slice.  Any
>> suggestions?
>>
>


Fatal exception in with compaction

2010-06-04 Thread casablinca126.com
Hi,
I get a fatal exception with my cassandra cluster:

java.lang.NoClassDefFoundError: org/apache/cassandra/db/CompactionManager$4
    at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:156)
    at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:151)
    at org.apache.cassandra.db.HintedHandOffManager.deliverAllHints(HintedHandOffManager.java:205)
    at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:80)
    at org.apache.cassandra.db.HintedHandOffManager$1.runMayThrow(HintedHandOffManager.java:100)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.db.CompactionManager$4
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 7 more
I made a modification so that sstables whose size is >10GB are not compacted:

    static Set<List<SSTableReader>> getBuckets(Iterable<SSTableReader> files, long min)
    {
        Map<List<SSTableReader>, Long> buckets = new HashMap<List<SSTableReader>, Long>();
        for (SSTableReader sstable : files)
        {
            long size = sstable.length();
            if (size > 10L * 1024L * 1024L * 1024L)
                continue;
Could someone help explain why this exception happened? Thanks a lot!

regards,

--
casablinca126.com
2010-06-04




Re: High CPU Usage since 0.6.2

2010-06-04 Thread Lu Ming
I did a thread dump on each cassandra node and counted the "thread-xxx"
threads whose call stack contains:
"at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)"


Then I found an interesting thing.

Server Name/Thread Count/Average CPU Usage
A   0  <20%
B   8  760%
C   0  <20%
D   3  300%
E   4  220%
F   0  <20%
G   2  200%
H   7  700%
I   2  200%
J   3  300%

It seems that each thread calling
"org.apache.cassandra.net.IncomingTcpConnection.run" occupies 100% of a CPU.

So I guess the code at IncomingTcpConnection.java:66 causes the high CPU usage:

    if (isStream)
    {
        new IncomingStreamReader(socket.getChannel()).read();
    }



--
From: "Lu Ming" 
Sent: Friday, June 04, 2010 12:55 PM
To: 
Subject: High CPU Usage since 0.6.2



I have ten 0.5.1 Cassandra nodes in my cluster, and I updated them to
0.6.2 yesterday.
But today I find six cassandra nodes with high CPU usage, more than 400%
on my 8-core CPU server.

The worst one is more than 760%. It is very serious.

I used jvisualvm to watch the worst node, and I found that there are many
running threads named "thread-xxx";
the status of the other threads is waiting or sleeping.

"Thread-130" - Thread t...@240
  java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.setMemory(Native Method)
at sun.nio.ch.Util.erase(Util.java:202)
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

  Locked ownable synchronizers:
- None

"Thread-126" - Thread t...@236
  java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:200)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked java.lang.obj...@10808561
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


  Locked ownable synchronizers:
- None

"Thread-119" - Thread t...@229
  java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:182)
- locked java.lang.obj...@65b4abbd
- locked java.lang.obj...@38773975
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


  Locked ownable synchronizers:
- None






Re: Fatal exception in with compaction

2010-06-04 Thread casablinca126.com
hi,
I have not used nodetool repair or nodetool compact. So how is a
major compaction triggered?

--   
casablinca126.com
2010-06-04

-
From: casablinca126.com
Sent: 2010-06-04 18:05:11
To: user
Cc:
Subject: Fatal exception in with compaction

Hi,
I get a fatal exception with my cassandra cluster:

java.lang.NoClassDefFoundError: org/apache/cassandra/db/CompactionManager$4
    at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:156)
    at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:151)
    at org.apache.cassandra.db.HintedHandOffManager.deliverAllHints(HintedHandOffManager.java:205)
    at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:80)
    at org.apache.cassandra.db.HintedHandOffManager$1.runMayThrow(HintedHandOffManager.java:100)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.db.CompactionManager$4
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 7 more
I made a modification so that sstables whose size is >10GB are not compacted:

    static Set<List<SSTableReader>> getBuckets(Iterable<SSTableReader> files, long min)
    {
        Map<List<SSTableReader>, Long> buckets = new HashMap<List<SSTableReader>, Long>();
        for (SSTableReader sstable : files)
        {
            long size = sstable.length();
            if (size > 10L * 1024L * 1024L * 1024L)
                continue;
Could someone help explain why this exception happened? Thanks a lot!

regards,

--
casablinca126.com
2010-06-04




Re: High CPU Usage since 0.6.2

2010-06-04 Thread Lu Ming

Does the following code in IncomingStreamReader.read cause the 100% CPU?

    while (bytesRead < pendingFile.getExpectedBytes()) {
        bytesRead += fc.transferFrom(socketChannel, bytesRead, FileStreamTask.CHUNK_SIZE);
        pendingFile.update(bytesRead);
    }

BTW: our cassandra cluster is deployed in two datacenters.

--
From: "Lu Ming" 
Sent: Friday, June 04, 2010 7:01 PM
To: 
Subject: Re: High CPU Usage since 0.6.2

I did a thread dump on each cassandra node and counted the "thread-xxx"
threads whose call stack contains:
"at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)"


Then I found an interesting thing.

Server Name/Thread Count/Average CPU Usage
A   0  <20%
B   8  760%
C   0  <20%
D   3  300%
E   4  220%
F   0  <20%
G   2  200%
H   7  700%
I   2  200%
J   3  300%

It seems that each thread calling
"org.apache.cassandra.net.IncomingTcpConnection.run" occupies 100% of a CPU.

So I guess the code at IncomingTcpConnection.java:66 causes the high CPU usage:

    if (isStream)
    {
        new IncomingStreamReader(socket.getChannel()).read();
    }



--
From: "Lu Ming" 
Sent: Friday, June 04, 2010 12:55 PM
To: 
Subject: High CPU Usage since 0.6.2



I have ten 0.5.1 Cassandra nodes in my cluster, and I updated them to
0.6.2 yesterday.
But today I find six cassandra nodes with high CPU usage, more than 400%
on my 8-core CPU server.

The worst one is more than 760%. It is very serious.

I used jvisualvm to watch the worst node, and I found that there are many
running threads named "thread-xxx";
the status of the other threads is waiting or sleeping.

"Thread-130" - Thread t...@240
  java.lang.Thread.State: RUNNABLE
at sun.misc.Unsafe.setMemory(Native Method)
at sun.nio.ch.Util.erase(Util.java:202)
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)

  Locked ownable synchronizers:
- None

"Thread-126" - Thread t...@236
  java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:200)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked java.lang.obj...@10808561
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


  Locked ownable synchronizers:
- None

"Thread-119" - Thread t...@229
  java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:182)
- locked java.lang.obj...@65b4abbd
- locked java.lang.obj...@38773975
at 
sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)

at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)


  Locked ownable synchronizers:
- None








Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Per Olesen
Are 6..8 seconds to read 23.000 small rows - as it should be?

I have a quick question on what I think is bad read performance for this simple 
setup:

<ColumnFamily Name="Dashboard"
    ColumnType="Super"
    CompareWith="UTF8Type"
    CompareSubcolumnsWith="UTF8Type" />

SCF:Dashboard
  key:username1 -> { 
SC:uniqStr1 -> { col1:val1, col2: val2, ... col8:val8 },
SC:uniqStr2 -> { col1:val1, col2: val2, ... col8:val8 },
SC:uniqStr3 -> { col1:val1, col2: val2, ... col8:val8 },
SC:uniqStr4 -> { col1:val1, col2: val2, ... col8:val8 },
... up to 23.000 "rows"
  key:username2 -> { 
SC:uniqStr5 -> { col1:val1, col2: val2, ... col8:val8 },
SC:uniqStr6 -> { col1:val1, col2: val2, ... col8:val8 },
SC:uniqStr7 -> { col1:val1, col2: val2, ... col8:val8 },
SC:uniqStr8 -> { col1:val1, col2: val2, ... col8:val8 },
...

A given key "username1" has e.g. 23.000 super column unique (rows). When I try 
and simply raw-read all these rows, it takes what I think isn't pretty fast - 
approximately 6-8 seconds. I know, there are a millions things that affect 
this, but I would just like to have a yes or no if this really can be as it 
should be?

My cassandra is a pretty unchanged v0.6.1.

I read using this code:

 ColumnParent parent = new ColumnParent("Dashboard");

 SlicePredicate predicate = new SlicePredicate();
 SliceRange sliceRange = new SliceRange();
 sliceRange.setCount(Integer.MAX_VALUE);
 sliceRange.setStart(toRawValue(""));
 sliceRange.setFinish(toRawValue(""));
 predicate.setSlice_range(sliceRange);

 // timing this takes 6-8 secs.
 return client.get_slice(
   "keyspace", 
   "theusername", 
   parent,
   predicate,
   ConsistencyLevel.QUORUM
 );
 
My replication factor is 1 and I had two nodes set up in a cluster when doing
the reads.

Shouldn't this be something cassandra can do dead-fast?



Re: Cassandra training Jun 18 in SF

2010-06-04 Thread S Ahmed
Nice!

Would it be possible to give more than 2 weeks' notice for the following
events? Preferably a month; it's not that easy to get off work etc.

On Fri, Jun 4, 2010 at 4:22 AM, Oleg Anastasjev  wrote:

> Jonathan Ellis  gmail.com> writes:
>
> >
> > This will be Riptano's 6th training session (including the four we've
> > done that were on-site with a specific customer), and in my humble
> > opinion the material's really solid at this point.
> >
> > We are actively working on lining up other locations.
> >
> Do you have plans for training sessions in Europe?
>
>
>
>


RE: Cassandra Cluster Setup

2010-06-04 Thread Stephan Pfammatter
Tx Nahor/Gary/Ben. I blew everything away, adjusted the seed to Cassandra-ca
and changed the replication factor.
Now I can see the sample keys being spread across the 3 nodes:
Cassandra-ca holds rows (keys: 1,5,8)
Cassandra-az holds rows (keys: 11,55)
Cassandra-or holds rows (keys: 88)

I don't see the data being duplicated yet. Am I missing a storage-conf setting?
I need to prove that if Cassandra-az fails I can still provide keys 11,55.

-Original Message-
From: Nahor [mailto:nahor.j+gm...@gmail.com] 
Sent: Thursday, June 03, 2010 8:25 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra Cluster Setup

  On 2010-06-03 13:07, Stephan Pfammatter wrote:
> Cassandra-or
> [...]
> cassandra-or


Aside from the replication factor noted by Gary, this should point to
your existing node (cassandra-ca); otherwise, how will this node know
where the existing node is and where to get the data from?


> Cassandra-az
> [...]
> cassandra-az

Same here




Re: High CPU Usage since 0.6.2

2010-06-04 Thread Gary Dusbabek
Chris,

Can you get me a stack dump of one of the busy nodes (kill -3)?

Gary

On Thu, Jun 3, 2010 at 22:50, Chris Goffinet  wrote:
> We're seeing this as well. We were testing with a 40+ node cluster on the
> latest 0.6 branch from a few days ago.
>
> -Chris
>
> On Jun 3, 2010, at 9:55 PM, Lu Ming wrote:
>
>>
>> I have ten 0.5.1 Cassandra nodes in my cluster, and I updated them to
>> 0.6.2 yesterday.
>> But today I find six cassandra nodes with high CPU usage, more than 400% on
>> my 8-core CPU server.
>> The worst one is more than 760%. It is very serious.
>>
>> I used jvisualvm to watch the worst node, and I found that there are many
>> running threads named "thread-xxx";
>> the status of the other threads is waiting or sleeping.
>>
>> "Thread-130" - Thread t...@240
>>  java.lang.Thread.State: RUNNABLE
>>       at sun.misc.Unsafe.setMemory(Native Method)
>>       at sun.nio.ch.Util.erase(Util.java:202)
>>       at 
>> sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:560)
>>       at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>>       at 
>> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>>       at 
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>  Locked ownable synchronizers:
>>       - None
>>
>> "Thread-126" - Thread t...@236
>>  java.lang.Thread.State: RUNNABLE
>>       at sun.nio.ch.FileDispatcher.read0(Native Method)
>>       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>>       at sun.nio.ch.IOUtil.read(IOUtil.java:200)
>>       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>>       - locked java.lang.obj...@10808561
>>       at 
>> sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)
>>       at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>>       at 
>> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>>       at 
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>
>>  Locked ownable synchronizers:
>>       - None
>>
>> "Thread-119" - Thread t...@229
>>  java.lang.Thread.State: RUNNABLE
>>       at sun.nio.ch.NativeThread.current(Native Method)
>>       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:182)
>>       - locked java.lang.obj...@65b4abbd
>>       - locked java.lang.obj...@38773975
>>       at 
>> sun.nio.ch.FileChannelImpl.transferFromArbitraryChannel(FileChannelImpl.java:565)
>>       at sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:603)
>>       at 
>> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
>>       at 
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)
>>
>>  Locked ownable synchronizers:
>>       - None
>>
>>
>
>


RE: Cassandra training Jun 18 in SF

2010-06-04 Thread Parsacala Jr, Nazario R. [Tech]
Do you have details on this?

From: S Ahmed [mailto:sahmed1...@gmail.com]
Sent: Friday, June 04, 2010 9:50 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra training Jun 18 in SF

Nice!

Would it be possible to give more than 2 weeks' notice for the following events?
Preferably a month; it's not that easy to get off work etc.
On Fri, Jun 4, 2010 at 4:22 AM, Oleg Anastasjev <olega...@gmail.com> wrote:
Jonathan Ellis  gmail.com> writes:

>
> This will be Riptano's 6th training session (including the four we've
> done that were on-site with a specific customer), and in my humble
> opinion the material's really solid at this point.
>
> We are actively working on lining up other locations.
>
Do you have plans for training sessions in Europe?





Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Jonathan Ellis
get_slice reads a single row.  Do you mean there are 23,000 columns,
or are you running get_slice in a loop 23000 times?

On Fri, Jun 4, 2010 at 4:59 AM, Per Olesen  wrote:
> Are 6..8 seconds to read 23.000 small rows - as it should be?
>
> I have a quick question on what I think is bad read performance for this 
> simple setup:
>
>  <ColumnFamily Name="Dashboard"
>    ColumnType="Super"
>    CompareWith="UTF8Type"
>    CompareSubcolumnsWith="UTF8Type" />
>
> SCF:Dashboard
>  key:username1 -> {
>        SC:uniqStr1 -> { col1:val1, col2: val2, ... col8:val8 },
>        SC:uniqStr2 -> { col1:val1, col2: val2, ... col8:val8 },
>        SC:uniqStr3 -> { col1:val1, col2: val2, ... col8:val8 },
>        SC:uniqStr4 -> { col1:val1, col2: val2, ... col8:val8 },
>        ... up to 23.000 "rows"
>  key:username2 -> {
>        SC:uniqStr5 -> { col1:val1, col2: val2, ... col8:val8 },
>        SC:uniqStr6 -> { col1:val1, col2: val2, ... col8:val8 },
>        SC:uniqStr7 -> { col1:val1, col2: val2, ... col8:val8 },
>        SC:uniqStr8 -> { col1:val1, col2: val2, ... col8:val8 },
>        ...
>
> A given key "username1" has e.g. 23.000 unique super columns ("rows"). When I
> try to simply raw-read all these rows, it takes what I think isn't very fast -
> approximately 6-8 seconds. I know there are a million things that affect
> this, but I would just like a yes or no on whether this really can be as it
> should be.
>
> My cassandra is a pretty unchanged v0.6.1.
>
> I read using this code:
>
>  ColumnParent parent = new ColumnParent("Dashboard");
>
>  SlicePredicate predicate = new SlicePredicate();
>  SliceRange sliceRange = new SliceRange();
>  sliceRange.setCount(Integer.MAX_VALUE);
>  sliceRange.setStart(toRawValue(""));
>  sliceRange.setFinish(toRawValue(""));
>  predicate.setSlice_range(sliceRange);
>
>  // timing this takes 6-8 secs.
>  return client.get_slice(
>   "keyspace",
>   "theusername",
>   parent,
>   predicate,
>   ConsistencyLevel.QUORUM
>  );
>
> My replication factor is 1 and I had two nodes setup in cluster when doing 
> the reads.
>
> Shouldn't this be what cassandra can do dead-fast?
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Per Olesen
On Jun 4, 2010, at 4:46 PM, Jonathan Ellis wrote:

> get_slice reads a single row.  Do you mean there are 23,000 columns,
> or are you running get_slice in a loop 23000 times?

Hi Jonathan, thanks for answering!

No, I do only one get_slice call.

There are 23.000 SUPER columns, which I read using get_slice with the
ColumnParent parameter set to only the CF name (Dashboard) and a SlicePredicate
that has "" for the begin super column name and "" for the end super column
name.

So, I do one single get_slice to get all the super-columns. This is the thrift
call that takes approx. 6-8 secs.

I then iterate over the result after the call to extract the columns for each
super-column, but that is not in my timings, and it performs no thrift calls.

Like this:

>>  ColumnParent parent = new ColumnParent("Dashboard");
>> 
>>  SlicePredicate predicate = new SlicePredicate();
>>  SliceRange sliceRange = new SliceRange();
>>  sliceRange.setCount(Integer.MAX_VALUE);
>>  sliceRange.setStart(toRawValue(""));
>>  sliceRange.setFinish(toRawValue(""));
>>  predicate.setSlice_range(sliceRange);
>> 
>>  // timing this takes 6-8 secs.
>>  return client.get_slice(
>>   "keyspace",
>>   "theusername",
>>   parent,
>>   predicate,
>>   ConsistencyLevel.QUORUM
>>  );



http://voltdb.com/ ?

2010-06-04 Thread Denis Haskin
Anybody looked at VoltDB?  I haven't dug into it, but curious about it.

dwh


Re: [***SPAM*** ] Re: question about class SlicePredicate

2010-06-04 Thread David Boxenhorn
With the Random Partitioner, it works only if you want to get all keys.

2010/6/4 Shuai Yuan 

> It's documented that get_range_slice() supports all partitioners in 0.6.
>
> Kevin
>
>  Original message 
> From: Olivier Mallassi
> To: user@cassandra.apache.org
> Subject: [***SPAM*** ] Re: question about class SlicePredicate
> Date: Tue, 1 Jun 2010 13:38:03 +0200
>
> Does it work whatever the chosen partitioner?
> Or only for the OrderPreservingPartitioner?
>
> On Tuesday, June 1, 2010, Eric Yu  wrote:
> > It needs a SliceRange. For example:
> > SliceRange range = new SliceRange();
> > range.setStart("".getBytes());
> > range.setFinish("".getBytes());
> > range.setReversed(true);
> > range.setCount(20);
> >
> > SlicePredicate sp = new SlicePredicate();
> > sp.setSlice_range(range);
> >
> > client.get_slice(KEYSPACE, KEY, ColumnParent, sp, ConsistencyLevel.ONE);
> > 2010/6/1 Shuai Yuan 
> > Hi all,
> >
> > I don't quite understand the usage of 'class SlicePredicate' when trying
> > to retrieve a ranged slice.
> >
> > How should it be initialized?
> >
> > Thanks!
> > --
> > Kevin Yuan
> > www.yuan-shuai.info
> >
> >
> >
> >
> >
>
>
>
>
>


RE: http://voltdb.com/ ?

2010-06-04 Thread Jones, Nick
I saw a tweet claiming far better performance than Cassandra.  After
following up, I found out it requires the entire DB to reside in memory across
the nodes.

Nick Jones

From: Denis Haskin [mailto:de...@haskinferguson.net]
Sent: Friday, June 04, 2010 10:17 AM
To: user
Subject: http://voltdb.com/ ?

Anybody looked at VoltDB?  I haven't dug into it, but curious about it.

dwh


Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Ben Browning
How many subcolumns are in each supercolumn, and how large are the
values? Your example shows 8 subcolumns, but I didn't know if that was
the actual number. I've been able to read columns out of Cassandra at
a rate an order of magnitude higher than what you're seeing here, but there
are too many variables to compare directly.

Keep in mind that the results from each thrift call have to fit into
memory - you might be better off paging through the 23000 columns,
reading a few thousand at a time.
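
A minimal sketch of that paging pattern against the 0.6 Thrift API (the
keyspace name, CF name, and page size below are placeholders): fetch one page
of super columns, then restart the next slice at the last name returned,
skipping the duplicate first item.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public final class PagedReader {
        // Page through all super columns of one row, pageSize at a time.
        static List<SuperColumn> readAll(Cassandra.Client client, String key)
                throws Exception {
            final int pageSize = 1000;
            List<SuperColumn> all = new ArrayList<SuperColumn>();
            ColumnParent parent = new ColumnParent("Dashboard");
            byte[] start = new byte[0]; // "" means "from the first super column"
            while (true) {
                SlicePredicate predicate = new SlicePredicate();
                predicate.setSlice_range(new SliceRange(start, new byte[0], false, pageSize));
                List<ColumnOrSuperColumn> page = client.get_slice(
                        "keyspace", key, parent, predicate, ConsistencyLevel.QUORUM);
                for (ColumnOrSuperColumn cosc : page) {
                    // every page after the first begins with the super column
                    // used as the new start; skip that duplicate
                    if (!Arrays.equals(cosc.super_column.name, start))
                        all.add(cosc.super_column);
                }
                if (page.size() < pageSize)
                    break; // short page means we have read everything
                start = page.get(page.size() - 1).super_column.name;
            }
            return all;
        }
    }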

Ben

On Fri, Jun 4, 2010 at 11:01 AM, Per Olesen  wrote:
> On Jun 4, 2010, at 4:46 PM, Jonathan Ellis wrote:
>
>> get_slice reads a single row.  Do you mean there are 23,000 columns,
>> or are you running get_slice in a loop 23000 times?
>
> Hi Jonathan, thanks for answering!
>
> No, I do only one get_slice call.
>
> There are 23.000 SUPER columns, which I read using get_slice with the
> ColumnParent parameter set to only the CF name (Dashboard) and a
> SlicePredicate that has "" for the begin super column name and "" for the
> end super column name.
>
> So, I do one single get_slice to get all the super-columns. This is the
> thrift call that takes approx. 6-8 secs.
>
> I then iterate over the result after the call to extract the columns for
> each super-column, but that is not in my timings, and it performs no
> thrift calls.
>
> Like this:
>
>>>  ColumnParent parent = new ColumnParent("Dashboard");
>>>
>>>  SlicePredicate predicate = new SlicePredicate();
>>>  SliceRange sliceRange = new SliceRange();
>>>  sliceRange.setCount(Integer.MAX_VALUE);
>>>  sliceRange.setStart(toRawValue(""));
>>>  sliceRange.setFinish(toRawValue(""));
>>>  predicate.setSlice_range(sliceRange);
>>>
>>>  // timing this takes 6-8 secs.
>>>  return client.get_slice(
>>>   "keyspace",
>>>   "theusername",
>>>   parent,
>>>   predicate,
>>>   ConsistencyLevel.QUORUM
>>>  );
>
>


Re: Cassandra Cluster Setup

2010-06-04 Thread Gary Dusbabek
Great.  It looks as if the replication factor is still 1, not 3.  This
means that each key lives only on one node.  By increasing it to 3,
data will be replicated across all 3 nodes.
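
For reference, in 0.6 the replication factor lives per keyspace in
storage-conf.xml; a minimal sketch (the keyspace name is a placeholder and
other elements are elided):

    <Keyspace Name="Keyspace1">
      <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
      <ReplicationFactor>3</ReplicationFactor>
      <!-- EndPointSnitch and ColumnFamily definitions go here -->
    </Keyspace>

Note that data written before the change is not re-replicated automatically; a
repair is typically needed to populate the new replicas on an existing cluster.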

Gary.


On Fri, Jun 4, 2010 at 06:51, Stephan Pfammatter
 wrote:
> Tx Nahor/Gary/Ben. I blew everything away, adjusted the seed to Cassandra-ca 
> and changed the replication factor.
> Now I can see the sample keys being spread across the 3 nodes:
> Cassandra-ca holds rows (keys: 1,5,8)
> Cassandra-az holds rows (keys: 11,55)
> Cassandra-or holds rows (keys: 88)
>
> I don't see the data being duplicated yet. Am I missing a storage-conf
> setting? I need to prove that if Cassandra-az fails I can still provide
> keys 11,55.
>
> -Original Message-
> From: Nahor [mailto:nahor.j+gm...@gmail.com]
> Sent: Thursday, June 03, 2010 8:25 PM
> To: user@cassandra.apache.org
> Subject: Re: Cassandra Cluster Setup
>
>  On 2010-06-03 13:07, Stephan Pfammatter wrote:
>> Cassandra-or
>> [...]
>> cassandra-or
>
>
> Aside from the replication factor noted by Gary, this should point to
> your existing node (cassandra-ca); otherwise, how will this node know
> where the existing node is and where to get the data from?
>
>
>> Cassandra-az
>> [...]
>> cassandra-az
>
> Same here
>
>
>


RE: Cassandra Cluster Setup

2010-06-04 Thread Stephan Pfammatter
Nice catch, Gary. I just realized that the replication factor is granular to
the keyspace.

-Original Message-
From: Gary Dusbabek [mailto:gdusba...@gmail.com] 
Sent: Friday, June 04, 2010 11:49 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra Cluster Setup

Great.  It looks as if the replication factor is still 1, not 3.  This
means that each key lives only on one node.  By increasing it to 3,
data will be replicated across all 3 nodes.

Gary.


On Fri, Jun 4, 2010 at 06:51, Stephan Pfammatter
 wrote:
> Tx Nahor/Gary/Ben. I blew everything away, adjusted the seed to Cassandra-ca 
> and changed the replication factor.
> Now I can see the sample keys being spread across the 3 nodes:
> Cassandra-ca holds rows (keys: 1,5,8)
> Cassandra-az holds rows (keys: 11,55)
> Cassandra-or holds rows (keys: 88)
>
> I don't see the data being duplicated yet. Am I missing a storage-conf
> setting? I need to prove that if Cassandra-az fails I can still provide
> keys 11,55.
>
> -Original Message-
> From: Nahor [mailto:nahor.j+gm...@gmail.com]
> Sent: Thursday, June 03, 2010 8:25 PM
> To: user@cassandra.apache.org
> Subject: Re: Cassandra Cluster Setup
>
>  On 2010-06-03 13:07, Stephan Pfammatter wrote:
>> Cassandra-or
>> [...]
>> cassandra-or
>
>
> Aside from the replication factor noted by Gary, this should point to
> your existing node (cassandra-ca); otherwise, how will this node know
> where the existing node is and where to get the data from?
>
>
>> Cassandra-az
>> [...]
>> cassandra-az
>
> Same here
>
>
>


Re: Fatal exception in with compaction

2010-06-04 Thread Stu Hood
A "major" compaction is any compaction that sees all of the sstables for a 
column family. In the context of the method you edited, that means that all of 
the SSTables fall into a single bucket, and can be compacted together.
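
For readers without the source handy, here is a much-simplified sketch of the
bucketing idea behind getBuckets (illustrative only, not the actual 0.6 code):
sstables of similar size are grouped together, and when one group ends up
holding every sstable of the column family, compacting that group is a major
compaction. A size cut-off like the one added above removes large sstables
from every bucket, which changes when that condition can hold.

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;

    public final class BucketSketch {
        static final class Bucket {
            final List<Long> sizes = new ArrayList<Long>();
            long avg; // running average size of the bucket
        }

        // Group sstable sizes so each bucket holds values within roughly
        // 50-200% of the bucket's running average.
        static List<Bucket> buckets(Collection<Long> sstableSizes) {
            List<Bucket> result = new ArrayList<Bucket>();
            for (long size : sstableSizes) {
                Bucket target = null;
                for (Bucket b : result) {
                    if (size > b.avg / 2 && size < b.avg * 2) {
                        target = b;
                        break;
                    }
                }
                if (target == null) {
                    target = new Bucket();
                    target.avg = size;
                    result.add(target);
                } else {
                    target.avg = (target.avg + size) / 2;
                }
                target.sizes.add(size);
            }
            return result;
        }
    }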

-Original Message-
From: "casablinca126.com" 
Sent: Friday, June 4, 2010 6:30am
To: user@cassandra.apache.org
Subject: Re: Fatal exception in with compaction

hi,
I have not used nodetool repair or nodetool compact. So how is a
major compaction triggered?

--   
casablinca126.com
2010-06-04

-
From: casablinca126.com
Sent: 2010-06-04 18:05:11
To: user
Cc:
Subject: Fatal exception in with compaction

Hi,
I get a fatal exception with my cassandra cluster:

java.lang.NoClassDefFoundError: org/apache/cassandra/db/CompactionManager$4
    at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:156)
    at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:151)
    at org.apache.cassandra.db.HintedHandOffManager.deliverAllHints(HintedHandOffManager.java:205)
    at org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:80)
    at org.apache.cassandra.db.HintedHandOffManager$1.runMayThrow(HintedHandOffManager.java:100)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.db.CompactionManager$4
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 7 more
I made a modification so that sstables whose size is >10GB are not compacted:

    static Set<List<SSTableReader>> getBuckets(Iterable<SSTableReader> files, long min)
    {
        Map<List<SSTableReader>, Long> buckets = new HashMap<List<SSTableReader>, Long>();
        for (SSTableReader sstable : files)
        {
            long size = sstable.length();
            if (size > 10L * 1024L * 1024L * 1024L)
                continue;
Could someone help explain why this exception happened? Thanks a lot!

regards,

--
casablinca126.com
2010-06-04






Embedded usage

2010-06-04 Thread Sten Roger Sandvik
Hi.

I have looked at cassandra before and now I'm revisiting the project :-) On
the project I am working on, we need fast storage for blobs and lucene
indexes that is available on each node in the cluster. Cassandra seems to
fit very well for the blob storage, and cassandra/lucandra for the indexing
part. But to satisfy our customers we need smooth integration - that means
embeddable, with our own config format.

So, is it possible to embed cassandra using a programmatic configuration? If
not, any plans? Is anyone else thinking about embedding cassandra in an
application that is running in production?

BR,
Sten Roger Sandvik


Re: Handling disk-full scenarios

2010-06-04 Thread Ian Soboroff
Story continued, in hopes this experience is useful to someone...

I shut down the node, removed the huge file, restarted the node, and told
everybody to repair.  Two days later, AE stages are still running.

Ian

On Thu, Jun 3, 2010 at 2:21 AM, Jonathan Ellis  wrote:

> This is why a JBOD configuration is contraindicated for cassandra.
> http://wiki.apache.org/cassandra/CassandraHardware
>
> On Tue, Jun 1, 2010 at 1:08 PM, Ian Soboroff  wrote:
> > My nodes have 5 disks and are using them separately as data disks.  The
> > usage on the disks is not uniform, and one is nearly full.  Is there some
> > way to manually balance the files across the disks?  Pretty much anything
> > done via nodetool incurs an anticompaction, which obviously fails.
> > system/ is not the problem, it's in my data's keyspace.
> >
> > Ian
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Embedded usage

2010-06-04 Thread Jonathan Ellis
look at o.a.c.service.EmbeddedCassandraService
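
The usual pattern with that class in 0.6 looks roughly like the sketch below
(the config directory path is a placeholder; note that configuration is still
read from storage-conf.xml, which is the limitation being discussed here):

    import org.apache.cassandra.service.EmbeddedCassandraService;

    public class EmbeddedExample {
        public static void main(String[] args) throws Exception {
            // Point Cassandra at the directory holding storage-conf.xml
            // before initializing the embedded service.
            System.setProperty("storage-config", "/path/to/conf");

            EmbeddedCassandraService cassandra = new EmbeddedCassandraService();
            cassandra.init();

            Thread t = new Thread(cassandra); // the service is a Runnable
            t.setDaemon(true);
            t.start();
            // ... then connect with a Thrift client on the configured port
        }
    }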

On Fri, Jun 4, 2010 at 9:29 AM, Sten Roger Sandvik  wrote:
> Hi.
>
> I have looked at cassandra before and now I'm revisiting the project :-) On
> the project I am working on, we need fast storage for blobs and lucene
> indexes that is available on each node in the cluster. Cassandra seems to
> fit very well for the blob storage, and cassandra/lucandra for the indexing
> part. But to satisfy our customers we need smooth integration - that means
> embeddable, with our own config format.
>
> So, is it possible to embed cassandra using a programmatic configuration? If
> not, any plans? Is anyone else thinking about embedding cassandra in an
> application that is running in production?
>
> BR,
> Sten Roger Sandvik
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Embedded usage

2010-06-04 Thread Sten Roger Sandvik
2010/6/4 Jonathan Ellis 

> look at o.a.c.service.EmbeddedCassandraService
>
>
Yes. I looked at this and the CassandraDaemon. But it seems that it's not
possible to create the configuration programmatically (or create the actual
config file and pass it in).

/srs


Re: Giant sets of ordered data

2010-06-04 Thread Benjamin Black
Use index rows named for time intervals that contain columns named for
the row keys of the base data rows from each interval.
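
A minimal sketch of that pattern against the 0.6 Thrift API (the keyspace, the
two column family names, and the hourly bucket format are all placeholders):
each write also records its row key as a column in an index row named for the
interval, so a time-range scan becomes one get_slice per interval row followed
by a multiget of the base rows.

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import org.apache.cassandra.thrift.*;

    public final class TimeIndexSketch {
        // Write one base data row, then index its key under the current hour.
        static void write(Cassandra.Client client, String rowKey, byte[] value)
                throws Exception {
            long ts = System.currentTimeMillis() * 1000; // microseconds

            // base data row: DataCF[rowKey]["payload"] = value
            ColumnPath data = new ColumnPath("DataCF");
            data.setColumn("payload".getBytes("UTF-8"));
            client.insert("Keyspace1", rowKey, data, value, ts, ConsistencyLevel.QUORUM);

            // index row named for the interval, e.g. "2010-06-04-17":
            // TimeIndexCF["2010-06-04-17"][rowKey] = empty value
            String bucket = new SimpleDateFormat("yyyy-MM-dd-HH").format(new Date());
            ColumnPath idx = new ColumnPath("TimeIndexCF");
            idx.setColumn(rowKey.getBytes("UTF-8"));
            client.insert("Keyspace1", bucket, idx, new byte[0], ts, ConsistencyLevel.QUORUM);
        }
    }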


b

On Wed, Jun 2, 2010 at 8:32 AM, David Boxenhorn  wrote:
> How do I handle giant sets of ordered data, e.g. by timestamps, which I want
> to access by range?
>
> I can't put all the data into a supercolumn, because it's loaded into memory
> at once, and it's too much data.
>
> Am I forced to use an order-preserving partitioner? I don't want the
> headache. Is there any other way?
>


Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope

Here's the scenario: I would like R = N where N is the number of nodes. Let's
say 8.

1. Create the first node, modify storage-conf.xml and change the <Seed> to be
the IP of the node. Change the replication factor to 8 for the CF of interest.
Start the puppy up.

2. Create the 2nd node, modify storage-conf.xml and change <AutoBootstrap> to
true and let it know the first seed. Ensure the replication factor is 8 for
the CF of interest. Start the puppy up.

3. Create 3rd node. 

Q1: Should node1 and node2 be listed as seeds? Or only node1?

4. Create 4th node. Same question as before.  

Q2: Same question as before ... should the list of seeds grow as the cluster 
grows? Alternative phrasing ... what is the relationship between Seed and 
AutoBootstrap, i.e. can a Seed node in fact be a node that was 
autobootstrapped? Is this considered best practice?

At this point we've got 4 nodes in the cluster ... I've gotten this far with no 
problems, loaded with tons of data and compared performance with various 
replication factors. Seeing faster reads from any particular node (as expected) 
when the ReplicationFactor is equal to the number of nodes in the cluster. Have 
compared lots of single update/creates as well as batch_mutate (which is very 
fast for bootstrapping the CFs -- highly recommended).

And also seeing varying performance on reads (fast, and as expected) when 
ReplicationFactor < N. 

Q3: What, if any issue, is there when R > N?

This is the situation as you're bringing up nodes in the cluster. And when you 
take down a node (intentionally or as a failure).

I know one consideration is that if R >= N ... and CF data grows ever bigger 
... there will be a hit as the node is created.

Q4: If you know that you're never going to have more than 40 
(MaxExpectedClusterNodes) in your cluster ... is it safe to set R >= 
MaxExpectedClusterNodes? 

Q5: If you set R = MaxExpectedClusterNodes ... and you end up servicing a node 
 and bringing up an alternate node in its place ... thus having R = N at 
all times ... and then you bring up the N+1 node ... will it start to receive 
the data that it missed while it was down?

I tried asking similar questions a few weeks ago on IRC ... but I have much
more experience now and am trying to figure out best practices and document
them for my ops team.

-phil

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 10:36 AM, Philip Stanhope  wrote:
>
> Here's the scenario: I would like R = N where N is the number of nodes.
> Let's say 8.
>
> 1. Create the first node, modify storage-conf.xml and change the <Seed> to be
> the IP of the node. Change the replication factor to 8 for the CF of interest.
> Start the puppy up.
>

RF is per Keyspace, not per CF.

> 2. Create the 2nd node, modify storage-conf.xml and change <AutoBootstrap> to
> true and let it know the first seed. Ensure the replication factor is 8 for
> the CF of interest. Start the puppy up.
>

If you do it this way, be aware that automatic token assignment may not do
what you want.  It _probably_ will, since 8 is a power of 2, but be
aware.

> 3. Create 3rd node.
>
> Q1: Should node1 and node2 be listed as seeds? Or only node1?
>

Doesn't matter.  Seeds are only used as a discovery mechanism.  One is
sufficient.

> 4. Create 4th node. Same question as before.
>
> Q2: Same question as before ... should the list of seeds grow as the cluster 
> grows? Alternative phrasing ... what is the relationship between Seed and 
> AutoBootstrap, i.e. can a Seed node in fact be a node that was 
> autobootstrapped? Is this considered best practice?
>

Once a node is bootstrapped, auto or otherwise, that's it.  It is now
just another node in the cluster.  How it got that way is not
relevant.

> At this point we've got 4 nodes in the cluster ... I've gotten this far with 
> no problems, loaded with tons of data and compared performance with various 
> replication factors. Seeing faster reads from any particular node (as 
> expected) when the ReplicationFactor is equal to the number of nodes in the 
> cluster. Have compared lots of single update/creates as well as batch_mutate 
> (which is very fast for bootstrapping the CFs -- highly recommended).
>
> And also seeing varying performance on reads (fast, and as expected) when 
> ReplicationFactor < N.
>
> Q3: What, if any issue, is there when R > N?
>

Not recommended.

> This is the situation as you're bringing up nodes in the cluster. And when 
> you take down a node (intentionally or as a failure).
>
> I know one consideration is that if R >= N ... and CF data grows ever bigger 
> ... there will be a hit as the node is created.
>
> Q4: If you know that you're never going to have more than 40 
> (MaxExpectedClusterNodes) in your cluster ... is it safe to set R >= 
> MaxExpectedClusterNodes?
>

Setting it higher is not going to help you.  It is also unclear to me
how having a cluster that large with an RF that high is going to
behave.  Read repair (which happens on every call) is going to be
_brutal_.

> Q5: If you set R = MaxExpectedClusterNodes ... and you end up servicing a 
> node  and bringing up an alternate node in its place ... thus having R = 
> N at all times ... and then you bring up the N+1 node ... will it start to 
> receive the data that it missed while it was down?
>

This is the Hinted Handoff mechanism.


b


Re: Embedded usage

2010-06-04 Thread Ran Tavory
Cassandra expects a config file and does not expose an alternative API for
this file; that's correct.
I think it's not hard to add such an API, but so far the demand for it
hasn't existed.

On Jun 4, 2010 8:01 PM, "Sten Roger Sandvik"  wrote:


2010/6/4 Jonathan Ellis 


>
> look at o.a.c.service.EmbeddedCassandraService
>

Yes. I looked at this and the CassandraDaemon. But it seems that it's not
possible to create the configuration programmatically (or create the actual
config file and pass it in).

/srs


Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope
Thanks for the correction about Keyspace versus ColumnFamily ... I knew that,
just mis-typed.

I guess it should be stated (to be obvious) ... that when you are auto 
bootstrapping a node ... the seed better be alive. The scenario I'm dealing 
with is that it might not be (reasons for that are tangential). 

I am contemplating a situation where there may be 2N servers ... but only N 
online at any one time. But, for operational purposes, N+n (where n is 1 or 2), 
N may be occasionally greater than R. 

This gets to the hinted hand-off question ... if R=8 and N=8 ... and all was
fine for a while ... and then N8(S8) (node 8, server 8) goes down ... N8(S9)
replaces it ... N8(S9) will take the hit to obtain all that it never had
before. Then ... at some subsequent time, N8(S8) comes back to life ... will it
take over its former role, so that R is now 9 even though the storage-conf had
set it to 8 for a particular keyspace?

I'm asking these questions ... because they've been asked of me. I've been 
working with Cassandra for 3+ months now and this level of the key management 
is something that I struggle to get my head around.

What do you mean by "automatic token assignment may not do what you want"? If I
specify R=N ... then what I want is all data to be replicated to all nodes. What
does a power of 2 have to do with this? Are there undocumented recommendations
about cluster size to ensure that you can survive any one (or two) nodes
failing?

Thanks in advance.

-phil

On Jun 4, 2010, at 1:46 PM, Benjamin Black wrote:

> On Fri, Jun 4, 2010 at 10:36 AM, Philip Stanhope  wrote:
>> 
>> Here's the scenario: I would like R = N where N is the number of nodes.
>> Let's say 8.
>> 
>> 1. Create the first node, modify storage-conf.xml and change the <Seed> to be
>> the IP of the node. Change the replication factor to 8 for the CF of interest.
>> Start the puppy up.
>> 
> 
> RF is per Keyspace, not per CF.
> 
>> 2. Create the 2nd node, modify storage-conf.xml and change <AutoBootstrap> to
>> true and let it know the first seed. Ensure the replication factor is 8 for
>> the CF of interest. Start the puppy up.
>> 
> 
> If you do it this way, be aware that automatic token assignment may not do
> what you want.  It _probably_ will, since 8 is a power of 2, but be
> aware.
> 
>> 3. Create 3rd node.
>> 
>> Q1: Should node1 and node2 be listed as seeds? Or only node1?
>> 
> 
> Doesn't matter.  Seeds are only used as a discovery mechanism.  One is
> sufficient.
> 
>> 4. Create 4th node. Same question as before.
>> 
>> Q2: Same question as before ... should the list of seeds grow as the cluster 
>> grows? Alternative phrasing ... what is the relationship between Seed and 
>> AutoBootstrap, i.e. can a Seed node in fact be a node that was 
>> autobootstrapped? Is this considered best practice?
>> 
> 
> Once a node is bootstrapped, auto or otherwise, that's it.  It is now
> just another node in the cluster.  How it got that way is not
> relevant.
> 
>> At this point we've got 4 nodes in the cluster ... I've gotten this far with 
>> no problems, loaded with tons of data and compared performance with various 
>> replication factors. Seeing faster reads from any particular node (as 
>> expected) when the ReplicationFactor is equal to the number of nodes in the 
>> cluster. Have compared lots of single update/creates as well as batch_mutate 
>> (which is very fast for bootstrapping the CFs -- highly recommended).
>> 
>> And also seeing varying performance on reads (fast, and as expected) when 
>> ReplicationFactor < N.
>> 
>> Q3: What, if any issue, is there when R > N?
>> 
> 
> Not recommended.
> 
>> This is the situation as you're bringing up nodes in the cluster. And when 
>> you take down a node (intentionally or as a failure).
>> 
>> I know one consideration is that if R >= N ... and CF data grows ever bigger 
>> ... there will be a hit as the node is created.
>> 
>> Q4: If you know that you're never going to have more than 40 
>> (MaxExpectedClusterNodes) in your cluster ... is it safe to set R >= 
>> MaxExpectedClusterNodes?
>> 
> 
> Setting it higher is not going to help you.  It is also unclear to me
> how having a cluster that large with an RF that high is going to
> behave.  Read repair (which happens on every call) is going to be
> _brutal_.
> 
>> Q5: If you set R = MaxExpectedClusterNodes ... and you end up servicing a 
>> node  and bringing up an alternate node in its place ... thus having R = 
>> N at all times ... and then you bring up the N+1 node ... will it start to 
>> receive the data that it missed while it was down?
>> 
> 
> This is the Hinted Handoff mechanism.
> 
> 
> b



Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 11:04 AM, Philip Stanhope  wrote:
>
> I am contemplating a situation where there may be 2N servers ... but only N 
> online at any one time. But, for operational purposes, N+n (where n is 1 or 
> 2), N may be occasionally greater than R.
>

Then Cassandra is probably not the right choice.  I believe this is
what we explained on IRC when you asked, as well.


b


Keyspace with single CF or Keyspace per CF

2010-06-04 Thread Philip Stanhope
This is a data modeling question, not operational like my previous ones today.

I have a data model where I'm going to have a 1:1 relationship between a
CF1/Key1/Col value and another CF2/Key, where the value in CF1/Key1/Col is
the CF2/Key.

CF1 will grow to have 1B+ keys. CF1 will have 10...N columns (N <= 1000). Those 
columns themselves will be keys in CF3.

CF2 will grow to have 400M+ keys.

CF3 will grow to have 100B+ keys ... based on key combination of CF1 and CF2 
keys.

What considerations are there regarding keeping CF1, CF2, and CF3 in the
same keyspace versus different keyspaces?

-phil

Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Philip Stanhope
I guess I'm thick ... 

What would be the right choice? Our data demands have already been proven to
scale beyond what an RDB can handle for our purposes. We are quite pleased with
Cassandra read/write/scale-out. Just trying to understand the operational
considerations.

On Jun 4, 2010, at 2:11 PM, Benjamin Black wrote:

> On Fri, Jun 4, 2010 at 11:04 AM, Philip Stanhope  wrote:
>> 
>> I am contemplating a situation where there may be 2N servers ... but only N 
>> online at any one time. But, for operational purposes, N+n (where n is 1 or 
>> 2), N may be occasionally greater than R.
>> 
> 
> Then Cassandra is probably not the right choice.  I believe this is
> what we explained on IRC when you asked, as well.
> 
> 
> b



Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Per Olesen

On Jun 4, 2010, at 5:19 PM, Ben Browning wrote:

> How many subcolumns are in each supercolumn, and how large are the
> values? Your example shows 8 subcolumns, but I didn't know if that was
> the actual number. I've been able to read columns out of Cassandra at
> a rate an order of magnitude higher than what you're seeing here, but there
> are too many variables to compare directly.

There are very few columns for each SC. About 8, but it varies a bit. The
column names and values are pretty small - around 20-30 bytes for each column,
I guess. So, we are talking small amounts of data here.

Yes, I know there are too many variables, but I have the feeling - as you also 
write - that the performance of this simple thing should be orders of magnitude 
better. 

So, how might I go about trying to find out why this takes so long in my
specific setup? Can I get timings of stuff inside cassandra itself?

> Keep in mind that the results from each thrift call have to fit into
> memory - you might be better off paging through the 23000 columns,
> reading a few thousand at a time.

Yes, I know. And I might end up doing this in the end. I do, though, have
pretty hard upper limits on how many rows I will end up with for each key, but
it might be a good idea nonetheless. Thanks for the advice on that one.

Per



Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope  wrote:
> I guess I'm thick ...
>
> What would be the right choice? Our data demands have already been proven to 
> scale beyond what RDB can handle for our purposes. We are quite pleased with 
> Cassandra read/write/scale out. Just trying to understand the operational 
> considerations.
>

Cassandra supports online topology changes, but those operations are
not cheap.  If you are expecting frequent addition and removal of
nodes from a ring, things will be very unstable or slow (or both).  As
I already mentioned, having a large cluster (and 40 nodes qualifies
right now) with RF=number of nodes is going to make read and write
operations get more and more expensive as the cluster grows.  While
you might see reasonable performance at current, small scale, it will
not be the case when the cluster gets large.

I am not aware of anything like Cassandra (or any other Dynamo system)
that support such extensive replication and topology churn.  You might
have to write it.


b


Re: Embedded usage

2010-06-04 Thread Sten Roger Sandvik
2010/6/4 Ran Tavory 

> Cassandra expects a config file and does not expose an alternative API, for
> this file, that's correct.
> I think it's not hard to add such API but so far the demand for it didn't
> exist.
>
I see that making a config API is not that hard. Will probably take a stab
at it :-)

/srs


Expected wait while bootstrapping?

2010-06-04 Thread Aaron Lav
I'm trying to bring up a new node.  The log on the new node says:

INFO 19:00:47,794 Joining: getting bootstrap token
INFO 19:00:49,003 New token will be 98546264352590127933376074587615077134 to 
assume load from /xx.xxx.xx.xxx

I had expected to see messages about the beginning of anticompaction
on the host the load is being assumed from, but I don't see them in that
host's log.

I've tried restarting the node with the load and then the new node,
but still nothing happens.

I've been waiting for ~15 minutes: how long should I expect to wait, and
is there anything further I can look at?

   Thanks,
   Aaron Lav (a...@pobox.com)


Re: Expected wait while bootstrapping?

2010-06-04 Thread Gary Dusbabek
Most of the streaming messages are DEBUG, so you'll have to amp up logging.

Gary.


On Fri, Jun 4, 2010 at 12:26, Aaron Lav  wrote:
> I'm trying to bring up a new node.  The log on the new node says:
>
> INFO 19:00:47,794 Joining: getting bootstrap token
> INFO 19:00:49,003 New token will be 98546264352590127933376074587615077134 to 
> assume load from /xx.xxx.xx.xxx
>
> I had expected to see messages about the beginning of anticompaction
> on the host load is being assumed from, but I don't see them in that
> host's log.
>
> I've tried restarting the node with the load and then the new node,
> but still nothing happens.
>
> I've been waiting for ~15 minutes: how long should I expect to wait, and
> is there anything further I can look at?
>
>   Thanks,
>   Aaron Lav (a...@pobox.com)
>


Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Torsten Curdt
> Yes, I know. And I might end up doing this in the end. I do though have 
> pretty hard upper limits of how many rows I will end up with for each key, 
> but anyways it might be a good idea none the less. Thanks for the advice on 
> that one.

You set count to Integer.MAX. Did you try with, say, 3? IIRC that
makes a difference (while it shouldn't) even when you still have less
than 3.

Paging is the way to go.
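
For illustration, a minimal paging loop against the 0.6 Thrift API might look
like this sketch (the keyspace, column family, and row key are placeholders;
since each page restarts at the last column seen, that boundary column is
skipped on every page after the first):

import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class PagedRowReader
{
    public static void main(String[] args) throws Exception
    {
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Standard1");

        int pageSize = 1000;
        byte[] start = new byte[0];   // empty start == beginning of row
        boolean firstPage = true;

        while (true)
        {
            SliceRange range = new SliceRange();
            range.setStart(start);
            range.setFinish(new byte[0]); // empty finish == end of row
            range.setReversed(false);
            range.setCount(pageSize);

            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(range);

            List<ColumnOrSuperColumn> page =
                client.get_slice("Keyspace1", "somekey", parent, predicate,
                                 ConsistencyLevel.ONE);

            // The first column of every page after the first is the one the
            // previous page ended on, so skip it.
            for (int i = firstPage ? 0 : 1; i < page.size(); i++)
            {
                Column c = page.get(i).getColumn();
                // ... application logic on c here ...
            }

            if (page.size() < pageSize)
                break;                 // a short page means we hit the end
            start = page.get(page.size() - 1).getColumn().getName();
            firstPage = false;
        }
        socket.close();
    }
}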

cheers
--
Torsten


Re: Expected wait while bootstrapping?

2010-06-04 Thread Aaron Lav
On Fri, Jun 04, 2010 at 12:35:51PM -0700, Gary Dusbabek wrote:
> Most of the streaming messages are DEBUG, so you'll have to amp up logging.

I've upped logging on the bootstrapping node, and I realize
that it's trying to assume load from two nodes.  The other node 
(i.e. the one not mentioned in the "to assume load from" message) does
have the Anticompacting messages I'd been expecting.

The log reads:
DEBUG 19:41:12,464 Beginning bootstrap process
DEBUG 19:41:12,479 Added /10.240.74.208/PostingData as a bootstrap source
DEBUG 19:41:12,480 Requesting from /10.240.74.208 ranges 
(26387927685172847128195341375882448808,38274023488647696852189267330197674758]
DEBUG 19:41:12,481 Requesting from /10.240.74.208 ranges 
(26387927685172847128195341375882448808,38274023488647696852189267330197674758]
DEBUG 19:41:12,483 Added /10.214.118.143/PostingData as a bootstrap source
DEBUG 19:41:12,483 Requesting from /10.214.118.143 ranges 
(38274023488647696852189267330197674758,98546264352590127933376074587615077134]
DEBUG 19:41:12,483 Requesting from /10.214.118.143 ranges 
(38274023488647696852189267330197674758,98546264352590127933376074587615077134]
DEBUG 19:41:12,484 attempting to connect to /10.240.74.208
DEBUG 19:41:12,505 attempting to connect to /10.214.118.143

   Thanks,
   Aaron Lav (a...@pobox.com)





Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Mike Malone
> > Yes, I know. And I might end up doing this in the end. I do though have
> pretty hard upper limits of how many rows I will end up with for each key,
> but anyways it might be a good idea none the less. Thanks for the advice on
> that one.
>
> You set count to Integer.MAX. Did you try with say 3? IIRC that
> makes a difference (while it shouldn't) even when you have still less
> than 3.
>

Er, really? Just off hand, I feel like I've looked through most of the code
that would be relevant and I can't think of any reason that would be the
case. If it is, that definitely seems like a bug, particularly since the
general strategy for fetching "all the things in this row" is to set count
to Integer.MAX_VALUE!

Mike


Re: Column or SuperColumn

2010-06-04 Thread Jonathan Ellis
If you have a relatively small, static set of subcolumns that you
read as a group, then using supercolumns is reasonable.
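
As a rough illustration of "read as a group", a single get() on a ColumnPath
that names the supercolumn but no subcolumn returns all of its subcolumns at
once. A sketch against the 0.6 Thrift API, with the keyspace and row key
invented for the example:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class SuperColumnGroupRead
{
    public static void main(String[] args) throws Exception
    {
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        // Name the supercolumn but leave the subcolumn unset: the whole
        // group of subcolumns is deserialized and returned in one call.
        ColumnPath path = new ColumnPath();
        path.setColumn_family("Store");
        path.setSuper_column("locationId1".getBytes("UTF-8"));

        ColumnOrSuperColumn result =
            client.get("Keyspace1", "storeId", path, ConsistencyLevel.ONE);

        for (Column c : result.getSuper_column().getColumns())
            System.out.println(new String(c.getName(), "UTF-8") + " = "
                               + new String(c.getValue(), "UTF-8"));

        socket.close();
    }
}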

On Tue, Jun 1, 2010 at 7:33 PM, Peter Hsu  wrote:
> I have a pretty simple data modeling question.  I don't know whether to use a 
> CF or SCF in one instance.
>
> Here's my example.  I have a Store entry and locations for each store.  So I 
> have something like:
>
> Using CF:
> Store { //CF
>   storeId { //row key
>      storeName:str,
>      storeLogo:image
>   }
>   storeId:locationId1 {
>      locationName:str,
>      latLong:coordinate
>   }
>   storeId:locationId2 {
>      locationName:str,
>      latLong:coordinate
>   }
> }
>
> Using SCF:
> Store { //SCF
>   storeId { //row key
>      store {
>          storeName:str,
>          storeLogo:image
>      }
>      locationId1 {
>          locationName:str,
>          latLong:coordinate
>      }
>      locationId2 {
>          locationName:str,
>          latLong:coordinate
>      }
>   }
> }
>
> Queries:
>
> Reads:
>  1. Read store and all locations (could be done by range query efficiently 
> when using CF, since I'm using OPP)
>  2. Read only a particular location of a store (don't need the store meta 
> data here)
>  3. Read only store name info (don't need any location info here)
>
> Writes:
>  1. Update store meta data (without touching location info)
>  2. Update location data for a store (without touching rest of store data)
>  3. Add a new location to an existing store (would have a unique identifier 
> for location, no worries about having to do a read..)
>
> I read that SuperColumns are not as fast as Columns, and obviously you can't 
> have indexed subcolumns of supercolumns, but in this case I don't need the 
> sub-subcolumn indices.  It seems cleaner to model it as a SuperColumn, but why 
> would I want to pay a performance penalty instead of just concatenating my keys.
>
> This seems like a fairly common pattern?  What's the rule to decide between 
> CF and SCF?
>
> Thanks,
> Peter



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Row Time range

2010-06-04 Thread Nicholas Sun
Is there a mechanism to select a time range within a row range query?  Is
this planned?  For example, return to me the last 10 posts starting at 7:00pm
yesterday?

 

Nick



strange load balancing across three nodes

2010-06-04 Thread Mike Subelsky
Hello everyone,

One of my nodes has a much higher load (10x) than the other two nodes.
 I don't think it's because a few keys have a lot more columns than
others -- the keys are well distributed and I'm using the random
partitioner.

Could someone point me in the direction of what I should be checking
for in a situation like this?  I don't totally understand what the
range value represents below but I feel like that must be a clue.  I
know I can run loadbalance here but I want to address the underlying
problem that caused the imbalance in the first place.

Address         Status  Load       Range                                        Ring
                                   152603206199102353627433717890579536149
10.198.7.47     Up      95.8 MB    2178153229901630557148545713417876599       |<--|
10.210.239.191  Up      1.09 GB    129781355096377133235068186455049349192     |   |
10.193.210.192  Up      125.44 MB  152603206199102353627433717890579536149     |-->|

thanks!

-Mike

-- 
Mike Subelsky
oib.com // ignitebaltimore.com // subelsky.com
@subelsky


Conditional get

2010-06-04 Thread Lev Stesin
Hi,

I am not sure how to implement multiget or slice_range based on a
conditional predicate. For example, what if I want to get only keys
containing certain columns? Thanks.

-- 
Lev


Strange Read Performance: 1xN column slice or N column slice

2010-06-04 Thread Arya Goudarzi

Hi Fellows, 

I have the following design for a system which basically holds key->value pairs 
(aka Columns) for each user (SuperColumn Key) in different namespaces 
(SuperColumnFamily row key). 

Like this: 

Namespace->user->column_name = column_value; 

keyspaces: 
  - name: NKVP 
    replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy 
    replication_factor: 3 
    column_families: 
      - name: Namespaces 
        column_type: Super 
        compare_with: BytesType 
        compare_subcolumns_with: BytesType 
        rows_cached: 2 
        keys_cached: 100 

Cluster using random partitioner. 

I use multiget_slice() for fetching 1 or many columns inside a given 
supercolumn at the same time. These are the awkward performance results I get: 

100 sequential reads completed in 0.383; this uses multiget_slice() with 1 
key and 1 column name inside the predicate->column_names. 
100 batch loads completed in 0.786; this uses multiget_slice() with 1 key 
and multiple column names inside the predicate->column_names. 

Read/write consistency for these tests is ONE. 
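
For reference, the two call shapes being timed presumably reduce to something
like the sketch below (0.6 Thrift API; the keyspace and column family come from
the config above, while the user, namespace, and helper names are placeholders):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;

public class SliceShapes
{
    // "Sequential": one multiget_slice() call per column name.
    static void readOneAtATime(Cassandra.Client client, List<byte[]> names)
        throws Exception
    {
        for (byte[] name : names)
        {
            SlicePredicate predicate = new SlicePredicate();
            List<byte[]> one = new ArrayList<byte[]>();
            one.add(name);
            predicate.setColumn_names(one);
            client.multiget_slice("NKVP", Arrays.asList("namespace1"),
                                  parent("user1"), predicate,
                                  ConsistencyLevel.ONE);
        }
    }

    // "Batch": one multiget_slice() call carrying every column name at once.
    static Map<String, List<ColumnOrSuperColumn>> readBatch(
        Cassandra.Client client, List<byte[]> names) throws Exception
    {
        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(names);
        return client.multiget_slice("NKVP", Arrays.asList("namespace1"),
                                     parent("user1"), predicate,
                                     ConsistencyLevel.ONE);
    }

    // ColumnParent naming the supercolumn (user) inside the Namespaces SCF.
    static ColumnParent parent(String user) throws Exception
    {
        ColumnParent p = new ColumnParent();
        p.setColumn_family("Namespaces");
        p.setSuper_column(user.getBytes("UTF-8"));
        return p;
    }
}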

Questions: 

Why is doing 100 sequential reads faster than doing 100 in a batch? 
Is this a good design for my problem? 
Does my issue relate to https://issues.apache.org/jira/browse/CASSANDRA-598? 

Now on a single node with replication factor 1 I get this: 

100 sequential reads completed in 0.438 
100 batch loads completed in 0.800 

Please advise as to why this is happening. 

These nodes are VMs with 1 CPU and 1 GB of RAM each. 

Best Regards, 
=Arya 


Re: Expected wait while bootstrapping?

2010-06-04 Thread Aaron Lav
On Fri, Jun 04, 2010 at 12:35:51PM -0700, Gary Dusbabek wrote:
> Most of the streaming messages are DEBUG, so you'll have to amp up logging.

I upped the logging to DEBUG on the bootstrapping node and the
nodes being bootstrapped from, and the bootstrap completed fine, so I'm
not sure what was going on earlier, but I'm happy now.

Thanks,
Aaron Lav (a...@pobox.com)


Re: Fatal exception with compaction

2010-06-04 Thread Jonathan Ellis
As the stacktrace suggests, HintedHandOffManager does major
compactions of just the hints columnfamily after hint delivery.

2010/6/4 casablinca126.com :
> hi,
>    I have not used nodetool repair or nodetool compact. So how is 
> MajorCompaction triggered?
>
> --
> casablinca126.com
> 2010-06-04
>
> -
> From: casablinca126.com
> Sent: 2010-06-04 18:05:11
> To: user
> Cc:
> Subject: Fatal exception with compaction
>
>  hi ,
>I get a fatal exception with my cassandra cluster:
>
> java.lang.NoClassDefFoundError: org/apache/cassandra/db/CompactionManager$4
>at 
> org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:156)
>at 
> org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:151)
>at 
> org.apache.cassandra.db.HintedHandOffManager.deliverAllHints(HintedHandOffManager.java:205)
>at 
> org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:80)
>at 
> org.apache.cassandra.db.HintedHandOffManager$1.runMayThrow(HintedHandOffManager.java:100)
>at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.cassandra.db.CompactionManager$4
>at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>at java.security.AccessController.doPrivileged(Native Method)
>at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>... 7 more
>    I made a modification so that sstables whose size is over 10GB are not 
> compacted:
>
> static Set<List<SSTableReader>> getBuckets(Iterable<SSTableReader> files, long min)
>    {
>        Map<List<SSTableReader>, Long> buckets = new HashMap<List<SSTableReader>, Long>();
>        for (SSTableReader sstable : files)
>        {
>            long size = sstable.length();
>            if (size > 10L * 1024L * 1024L * 1024L)
>                continue;
>
> Could someone help explain why this exception happened? Thanks a lot!
>
> regards,
>
> --
> casablinca126.com
> 2010-06-04
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: strange load balancing across three nodes

2010-06-04 Thread Jonathan Ellis
The sections on ring management and token selection on
http://wiki.apache.org/cassandra/Operations will help.

On Fri, Jun 4, 2010 at 2:27 PM, Mike Subelsky  wrote:
> Hello everyone,
>
> One of my nodes has a much higher load (10x) than the other two nodes.
>  I don't think it's because a few keys have a lot more columns than
> others -- the keys are well distributed and I'm using the random
> partitioner.
>
> Could someone point me in the direction of what should I be checking
> for in a situation like this?  I don't totally understand what the
> range value represents below but I feel like that be must a clue.  I
> know I can run loadbalance here but I want to address the underlying
> problem that caused the imbalance in the first place.
>
> Address         Status  Load       Range                                        Ring
>                                    152603206199102353627433717890579536149
> 10.198.7.47     Up      95.8 MB    2178153229901630557148545713417876599       |<--|
> 10.210.239.191  Up      1.09 GB    129781355096377133235068186455049349192     |   |
> 10.193.210.192  Up      125.44 MB  152603206199102353627433717890579536149     |-->|
>
> thanks!
>
> -Mike
>
> --
> Mike Subelsky
> oib.com // ignitebaltimore.com // subelsky.com
> @subelsky
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Row Time range

2010-06-04 Thread Benjamin Black
That's entirely up to you.  If you make row keys that are time-ordered
and include the time as a prefix in the key, you just use get_range()
as usual: start now, end 7pm yesterday, count of 10.
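
A minimal sketch of that approach, assuming the 0.6 Thrift API (where the call
is spelled get_range_slices()), OrderPreservingPartitioner, and row keys of the
form "<zero-padded reverse timestamp>:<postId>" so that newer rows sort first;
all names and the cutoff computation are illustrative:

import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class LastTenPosts
{
    // Bigger timestamps map to lexically smaller keys, so "now" sorts first.
    static String reverseKey(long millis)
    {
        return String.format("%019d", Long.MAX_VALUE - millis);
    }

    public static void main(String[] args) throws Exception
    {
        TSocket socket = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        long now = System.currentTimeMillis();
        long cutoff = now - 24L * 60 * 60 * 1000; // stand-in for "7pm yesterday"

        KeyRange range = new KeyRange();
        range.setStart_key(reverseKey(now));    // newest end of the range
        range.setEnd_key(reverseKey(cutoff));   // stop at the cutoff
        range.setCount(10);                     // "the last 10 posts"

        // Pull every column of each matching row.
        SliceRange all = new SliceRange();
        all.setStart(new byte[0]);
        all.setFinish(new byte[0]);
        all.setReversed(false);
        all.setCount(100);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(all);

        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Posts");

        List<KeySlice> rows = client.get_range_slices(
            "Keyspace1", parent, predicate, range, ConsistencyLevel.ONE);

        for (KeySlice row : rows)
            System.out.println(row.getKey());

        socket.close();
    }
}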

On Fri, Jun 4, 2010 at 2:23 PM, Nicholas Sun  wrote:
> Is there a mechanism to select a time range within a row range query?  Is
> this planned?  For example, return to me the last 10 post starting at 7:00pm
> yesterday?
>
>
>
> Nick


Performance Characteristics of CASSANDRA-16 (Memory Efficient Compactions)

2010-06-04 Thread Jeremy Davis
https://issues.apache.org/jira/browse/CASSANDRA-16

Can someone (Jonathan?) help me understand the performance characteristics
of this patch?
Specifically: if I have an open-ended CF, and I keep inserting with
ever-increasing column names (for example, the current time), will things
generally work out OK performance-wise? Or will I pay some ever-increasing
penalty with the number of entries?

My assumption is that you have bucketed things up for me by column name
order, and as long as I don't delete/modify/create a column in one of the
old buckets, then things will work out OK. Or is this not at all what is
going on?

Thanks,
-JD


Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Jonathan Shook
If I may ask, why the need for frequent topology changes?


On Fri, Jun 4, 2010 at 1:21 PM, Benjamin Black  wrote:
> On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope  wrote:
>> I guess I'm thick ...
>>
>> What would be the right choice? Our data demands have already been proven to 
>> scale beyond what RDB can handle for our purposes. We are quite pleased with 
>> Cassandra read/write/scale out. Just trying to understand the operational 
>> considerations.
>>
>
> Cassandra supports online topology changes, but those operations are
> not cheap.  If you are expecting frequent addition and removal of
> nodes from a ring, things will be very unstable or slow (or both).  As
> I already mentioned, having a large cluster (and 40 nodes qualifies
> right now) with RF=number of nodes is going to make read and write
> operations get more and more expensive as the cluster grows.  While
> you might see reasonable performance at current, small scale, it will
> not be the case when the cluster gets large.
>
> I am not aware of anything like Cassandra (or any other Dynamo system)
> that support such extensive replication and topology churn.  You might
> have to write it.
>
>
> b
>


Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread pstanhope
I never said it would be frequent. That was an assumption made by Ben. 

I am trying to understand how to "set the dials" to ensure availability and 
durability ... And understand the cost when the inevitable hardware failure 
occurs. 


Sent via BlackBerry from T-Mobile

-Original Message-
From: Jonathan Shook 
Date: Fri, 4 Jun 2010 20:28:33 
To: 
Subject: Re: Seeds, autobootstrap nodes, and replication factor

If I may ask, why the need for frequent topology changes?


On Fri, Jun 4, 2010 at 1:21 PM, Benjamin Black  wrote:
> On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope  wrote:
>> I guess I'm thick ...
>>
>> What would be the right choice? Our data demands have already been proven to 
>> scale beyond what RDB can handle for our purposes. We are quite pleased with 
>> Cassandra read/write/scale out. Just trying to understand the operational 
>> considerations.
>>
>
> Cassandra supports online topology changes, but those operations are
> not cheap.  If you are expecting frequent addition and removal of
> nodes from a ring, things will be very unstable or slow (or both).  As
> I already mentioned, having a large cluster (and 40 nodes qualifies
> right now) with RF=number of nodes is going to make read and write
> operations get more and more expensive as the cluster grows.  While
> you might see reasonable performance at current, small scale, it will
> not be the case when the cluster gets large.
>
> I am not aware of anything like Cassandra (or any other Dynamo system)
> that support such extensive replication and topology churn.  You might
> have to write it.
>
>
> b
>


Re: Seeds, autobootstrap nodes, and replication factor

2010-06-04 Thread Benjamin Black
On Fri, Jun 4, 2010 at 6:51 PM,   wrote:
> I never said it would be frequent. That was an assumption made by Ben.
>

You indicated in an earlier email that you expected half the nodes to
be offline at any time.  It is unclear how you expected that to work
for either the consistency processes or the topology/placement
processes.

> I am trying to understand how to "set the dials" to ensure availability and 
> durability ... And understand the cost when the inevitable hardware failure 
> occurs.
>

You can expect that with RF=N, your cluster will get slower the larger
it gets.  Add in having half the nodes offline, and you can expect it
simply not to function most of the time.  Perhaps if you started from
what your availability and durability requirements are, the
conversation would be simpler.


b


Re: Is there any way to detect when a node is down so I can failover more effectively?

2010-06-04 Thread Patricio Echagüe
Thanks Jonathan

On Wed, Jun 2, 2010 at 11:17 PM, Jonathan Ellis  wrote:

> you're overcomplicating things.
>
> just connect to *a* node, and if it happens to be down, try a different
> one.
>
> nodes being down should be a rare event, not a normal condition.  no
> need to optimize for it so much.
>
> also see http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to
>
> 2010/6/1 Patricio Echagüe :
> > Hi all, I'm using the Hector framework to interact with Cassandra and in
> > trying to handle failover more effectively I found it a bit complicated to
> > fetch all Cassandra nodes that are up and running.
> >
> > My goal is to keep an up-to-date list of active/up Cassandra servers to
> > provide Hector every time I need to execute against the db.
> >
> > I've seen this Thrift method: get_string_property("token map") but it
> > returns the nodes in the ring no matter if the node is down.
> >
> >
> >
> > Any advice?
> >
> > --
> > Patricio.-
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
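
A minimal sketch of that "try a different one" approach at the raw Thrift level
(the host list, port, and timeout are assumptions, and this is not Hector's
actual failover mechanism):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransportException;

public class FailoverConnector
{
    private static final String[] HOSTS = { "cass1", "cass2", "cass3" };

    // Walk the host list and return a client for the first node that
    // accepts a connection; nodes that are down are simply skipped.
    public static Cassandra.Client connect() throws TTransportException
    {
        TTransportException last = null;
        for (String host : HOSTS)
        {
            try
            {
                TSocket socket = new TSocket(host, 9160, 2000); // 2s timeout
                socket.open();
                return new Cassandra.Client(new TBinaryProtocol(socket));
            }
            catch (TTransportException e)
            {
                last = e; // node is down or unreachable; try the next one
            }
        }
        throw last; // every node refused; surface the final failure
    }
}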



-- 
Patricio.-


Can't start up Cassandra service.

2010-06-04 Thread Ma Xiao
Cassandra can't start its service; it fails with the following error. What's wrong with it?

ERROR 14:28:22,631 Exception encountered during startup.
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1937)
at 
org.apache.cassandra.dht.RandomPartitioner.convertFromDiskFormat(RandomPartitioner.java:50)
at 
org.apache.cassandra.io.SSTableReader.loadIndexFile(SSTableReader.java:261)
at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:125)
at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:114)
at 
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:178)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:248)
at org.apache.cassandra.db.Table.<init>(Table.java:338)
at org.apache.cassandra.db.Table.open(Table.java:199)
at 
org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:91)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)