As written in the third point of
http://wiki.apache.org/cassandra/CassandraLimitations,
right now super columns are not indexed, and they are deserialized fully when
you access them. Put another way, you'll want to use super columns with
only a relatively small number of columns in them.
Because i
Jonathan Ellis writes:
>
> This will be Riptano's 6th training session (including the four we've
> done that were on-site with a specific customer), and in my humble
> opinion the material's really solid at this point.
>
> We are actively working on lining up other locations.
>
Do y
I notice that there are more than 100 incoming connections in "CLOSE_WAIT"
on storage port 7000.
On my two Cassandra nodes:
126 of 146 storage connections are in "CLOSE_WAIT"
196 of 217 storage connections are in "CLOSE_WAIT"
Is this normal?
--
From: "Chris
It's really a pain to modify the data model. The problem is how to
handle a "one-to-many" relation in Cassandra. The row size limit
makes it impossible to store them as columns.
On Fri, Jun 4, 2010 at 4:13 PM, Sylvain Lebresne wrote:
> As written in the third point of
> http://w
hi,
I get a fatal exception with my Cassandra cluster:
java.lang.NoClassDefFoundError: org/apache/cassandra/db/CompactionManager$4
at org.apache.cassandra.db.CompactionManager.submitMajor(CompactionManager.java:156)
at org.apache.cassandra.db.Compa
I took a thread dump on each Cassandra node and counted the threads whose call
stack contains "at
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:62)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)"
in "thread-xxx"
then I find a
hi,
I have not used nodetool repair or nodetool compact . So how is
MajorCompaction triggered?
--
casablinca126.com
2010-06-04
-
From: casablinca126.com
Sent: 2010-06-04 18:05:11
To: u
Does the following code in IncomingStreamReader.read cause 100% CPU?
while (bytesRead < pendingFile.getExpectedBytes()) {
    bytesRead += fc.transferFrom(socketChannel, bytesRead,
                                 FileStreamTask.CHUNK_SIZE);
    pendingFile.update(bytesRead);
}
BTW: our cas
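A hedged illustration of the question above: the Javadoc for FileChannel.transferFrom allows a return value of zero, so a loop of this shape could spin if the source stops delivering bytes before the expected count is reached. The sketch below is not Cassandra's code. The class name, CHUNK_SIZE value, MAX_IDLE_ROUNDS and the demo helper are assumptions made for illustration. It shows one way to bound the loop by failing after a few rounds without progress.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GuardedTransfer {
    static final long CHUNK_SIZE = 64 * 1024; // hypothetical chunk size
    static final int MAX_IDLE_ROUNDS = 3;     // hypothetical limit on zero-byte rounds

    /** Copies expectedBytes from source into file, failing instead of spinning when no progress is made. */
    static long receive(ReadableByteChannel source, FileChannel file, long expectedBytes) throws IOException {
        long bytesRead = 0;
        int idleRounds = 0;
        while (bytesRead < expectedBytes) {
            // Never ask for more than is still expected (a change from the quoted loop).
            long want = Math.min(CHUNK_SIZE, expectedBytes - bytesRead);
            long n = file.transferFrom(source, bytesRead, want);
            if (n > 0) {
                bytesRead += n;
                idleRounds = 0;
            } else if (++idleRounds >= MAX_IDLE_ROUNDS) {
                throw new EOFException("no progress after " + bytesRead + " of " + expectedBytes + " bytes");
            }
        }
        return bytesRead;
    }

    /** Demo helper: offers `available` bytes while `expected` are announced; returns "ok:N" or "stalled". */
    static String demo(int available, long expected) {
        try {
            Path tmp = Files.createTempFile("stream", ".tmp");
            try (ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(new byte[available]));
                 FileChannel out = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                return "ok:" + receive(src, out, expected);
            } catch (EOFException stalled) {
                return "stalled";
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(100, 100)); // source delivers everything
        System.out.println(demo(10, 100));  // source ends early
    }
}
```

A real receiver on a non-blocking socket would more likely wait on a selector than count idle rounds. The counter here only keeps the sketch short.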
Is 6 to 8 seconds to read 23,000 small rows as it should be?
I have a quick question on what I think is bad read performance for this simple
setup:
SCF:Dashboard
key:username1 -> {
SC:uniqStr1 -> { col1:val1, col2: val2, ... col8:val8 },
SC:uniqStr2 -> { col1:val1, col2: v
Nice!
Would it be possible to give more than 2 weeks' notice for the following
events? Preferably a month; it's not that easy to get off work, etc.
On Fri, Jun 4, 2010 at 4:22 AM, Oleg Anastasjev wrote:
> Jonathan Ellis writes:
>
> >
> > This will be Riptano's 6th training session (in
Tx Nahor/Gary/Ben. I blew everything away, adjusted the seed to Cassandra-ca
and changed the replication factor.
Now I can see the sample keys being spread across the 3 nodes:
Cassandra-ca holds rows (keys: 1,5,8)
Cassandra-az holds rows (keys: 11,55)
Cassandra-or holds rows (keys: 88)
I don't se
Chris,
Can you get me a stack dump of one of the busy nodes (kill -3)?
Gary
On Thu, Jun 3, 2010 at 22:50, Chris Goffinet wrote:
> We're seeing this as well. We were testing with a 40+ node cluster on the
> latest 0.6 branch from a few days ago.
>
> -Chris
>
> On Jun 3, 2010, at 9:55 PM, Lu Ming
Do you have details on this ..?
From: S Ahmed [mailto:sahmed1...@gmail.com]
Sent: Friday, June 04, 2010 9:50 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra training Jun 18 in SF
Nice!
Would it be possible to give more than 2 weeks' notice for the following events?
Preferably a month, i
get_slice reads a single row. do you mean there are 23,000 columns,
or are you running get_slice in a loop 23000 times?
On Fri, Jun 4, 2010 at 4:59 AM, Per Olesen wrote:
> Is 6 to 8 seconds to read 23,000 small rows as it should be?
>
> I have a quick question on what I think is bad read perfor
On Jun 4, 2010, at 4:46 PM, Jonathan Ellis wrote:
> get_slice reads a single row. do you mean there are 23,000 columns,
> or are you running get_slice in a loop 23000 times?
Hi Jonathan, thanks for answering!
No, I do only one get_slice call.
There are 23.000 SUPER columns, which I read using
Anybody looked at VoltDB? I haven't dug into it, but curious about it.
dwh
It works for Random Partitioner only if you want to get all keys.
2010/6/4 Shuai Yuan
> It's documented that get_range_slice() supports all partitioners in 0.6
>
> Kevin
>
> Original message
> From: Olivier Mallassi
> To: user@cassandra.apache.org
> Subject: [***SPAM*** ] Re: question about cl
I saw a tweet claiming far better performance than Cassandra. After
following up, I found out it requires the entire DB to reside in memory across
the nodes.
Nick Jones
From: Denis Haskin [mailto:de...@haskinferguson.net]
Sent: Friday, June 04, 2010 10:17 AM
To: user
Subject: http://voltdb
How many subcolumns are in each supercolumn and how large are the
values? Your example shows 8 subcolumns, but I didn't know if that was
the actual number. I've been able to read columns out of Cassandra at
an order of magnitude higher than what you're seeing here but there
are too many variables t
Great. It looks as if the replication factor is still 1, not 3. This
means that each key lives only on one node. By increasing it to 3,
data will be replicated across all 3 nodes.
Gary.
On Fri, Jun 4, 2010 at 06:51, Stephan Pfammatter
wrote:
> Tx Nahor/Gary/Ben. I blew everything away, adjus
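To make the replication factor point above concrete, here is a simplified model of ring placement, assuming replicas are the next nodes clockwise from the key's token. The class name and the integer tokens are hypothetical, and real placement depends on the configured replica placement strategy, so treat this as a sketch rather than Cassandra's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class RingReplicas {
    /** Walks the ring clockwise from the key's token and returns the first `rf` nodes. */
    static List<String> replicasFor(TreeMap<Integer, String> ring, int keyToken, int rf) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // First node whose token is >= the key's token, wrapping around to the start.
        Integer token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey();
        }
        int wanted = Math.min(rf, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(0, "Cassandra-ca");   // hypothetical tokens
        ring.put(100, "Cassandra-az");
        ring.put(200, "Cassandra-or");
        System.out.println(replicasFor(ring, 150, 1)); // one owner only
        System.out.println(replicasFor(ring, 150, 3)); // every node holds the key
    }
}
```

With a replication factor of 1 each key maps to a single node, which matches the key spread reported in the quoted message. With 3 on a 3-node ring every node appears in the replica list.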
Nice catch, Gary. Just realized that the replication factor is set per
keyspace.
-Original Message-
From: Gary Dusbabek [mailto:gdusba...@gmail.com]
Sent: Friday, June 04, 2010 11:49 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra Cluster Setup
Great. It looks as if the replic
A "major" compaction is any compaction that sees all of the sstables for a
column family. In the context of the method you edited, that means that all of
the SSTables fall into a single bucket, and can be compacted together.
-Original Message-
From: "casablinca126.com"
Sent: Friday, Jun
Hi.
I have looked at cassandra before and now I'm revisiting the project :-) At
the project I am working on we need a fast storage for blobs and lucene
indexes that is available on each node in the cluster. Cassandra seems to be
a very good fit for the blob storage and cassandra/lucandra for the indexi
Story continued, in hopes this experience is useful to someone...
I shut down the node, removed the huge file, restarted the node, and told
everybody to repair. Two days later, AE stages are still running.
Ian
On Thu, Jun 3, 2010 at 2:21 AM, Jonathan Ellis wrote:
> this is why JBOD configurat
look at o.a.c.service.EmbeddedCassandraService
On Fri, Jun 4, 2010 at 9:29 AM, Sten Roger Sandvik wrote:
> Hi.
>
> I have looked at cassandra before and now I'm revisiting the project :-) At
> the project I am working on we need a fast storage for blobs and lucene
> indexes that is available on e
2010/6/4 Jonathan Ellis
> look at o.a.c.service.EmbeddedCassandraService
>
>
Yes. Looked at this and the CassandraDaemon. But it seems that it's not
possible to create the configuration programmatically (or create the actual
config file and pass it in).
/srs
Use index rows named for time intervals that contain columns named for
the row keys of the base data rows from each interval.
b
On Wed, Jun 2, 2010 at 8:32 AM, David Boxenhorn wrote:
> How do I handle giant sets of ordered data, e.g. by timestamps, which I want
> to access by range?
>
> I can't
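The index-row suggestion above can be sketched with an in-memory model. This is only an illustration, assuming hourly buckets. The class name, the yyyyMMddHH bucket format and the maps standing in for column families are hypothetical. Each index row is named for an interval and its columns are named for the base row keys written in that interval.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class TimeBucketIndex {
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);
    private static final long HOUR_MS = 3_600_000L;

    // Index row key (one row per hour) -> sorted column names (base row keys).
    private final Map<String, TreeSet<String>> indexRows = new TreeMap<>();

    /** Name of the index row that covers the given instant. */
    static String bucket(long epochMillis) {
        return HOUR.format(Instant.ofEpochMilli(epochMillis));
    }

    /** Records a base row key as a column in the index row for its hour. */
    void record(String baseRowKey, long epochMillis) {
        indexRows.computeIfAbsent(bucket(epochMillis), k -> new TreeSet<>()).add(baseRowKey);
    }

    /** Base row keys from every hourly bucket that overlaps [fromMillis, toMillis]. */
    List<String> keysBetween(long fromMillis, long toMillis) {
        List<String> keys = new ArrayList<>();
        long firstBucketStart = fromMillis - Math.floorMod(fromMillis, HOUR_MS);
        for (long t = firstBucketStart; t <= toMillis; t += HOUR_MS) {
            TreeSet<String> columns = indexRows.get(bucket(t));
            if (columns != null) {
                keys.addAll(columns);
            }
        }
        return keys;
    }
}
```

The lookup works at bucket granularity, so a caller that needs exact bounds would still filter the returned rows by their own timestamps. The interval size is a trade-off between the width of each index row and the number of rows a range touches.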
Here's the scenario: would like R = N where N is the number of nodes. Let's say
8.
1. Create first node, modify storage-conf.xml and change the to be the
ip of the node. Change replication factor to 8 for CF of interest. Start the
puppy up.
2. Create 2nd node, modify storage-conf.xml and ch
On Fri, Jun 4, 2010 at 10:36 AM, Philip Stanhope wrote:
>
> Here's the scenario: would like R = N where N is the number of nodes. Let's
> say 8.
>
> 1. Create first node, modify storage-conf.xml and change the to be
> the ip of the node. Change replication factor to 8 for CF of interest. Start
That's correct: Cassandra expects a config file and does not expose an
alternative API for it.
I don't think it would be hard to add such an API, but so far there has been
no demand for it.
On Jun 4, 2010 8:01 PM, "Sten Roger Sandvik" wrote:
2010/6/4 Jonathan Ellis
>
> look at o.a.c.service.Embedd
Thanks for the correction about Keyspace versus ColumnFamily ... I knew that,
just mistyped.
I guess it should be stated (to be obvious) ... that when you are auto
bootstrapping a node ... the seed had better be alive. The scenario I'm dealing
with is that it might not be (reasons for that are tange
On Fri, Jun 4, 2010 at 11:04 AM, Philip Stanhope wrote:
>
> I am contemplating a situation where there may be 2N servers ... but only N
> online at any one time. But, for operational purposes, N+n (where n is 1 or
> 2), N may be occasionally greater than R.
>
Then Cassandra is probably not the
This is a data modeling question, not operational like my previous ones today.
I have a data model where I'm going to have some 1:1 relationship between a
CF1/Key1/Col value and another CF2/Key where the value in CF1/Key1/Col is the
CF2/Key.
CF1 will grow to have 1B+ keys. CF1 will have 10...N
I guess I'm thick ...
What would be the right choice? Our data demands have already been proven to
scale beyond what RDB can handle for our purposes. We are quite pleased with
Cassandra read/write/scale out. Just trying to understand the operational
considerations.
On Jun 4, 2010, at 2:11 PM,
On Jun 4, 2010, at 5:19 PM, Ben Browning wrote:
> How many subcolumns are in each supercolumn and how large are the
> values? Your example shows 8 subcolumns, but I didn't know if that was
> the actual number. I've been able to read columns out of Cassandra at
> an order of magnitude higher than
On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope wrote:
> I guess I'm thick ...
>
> What would be the right choice? Our data demands have already been proven to
> scale beyond what RDB can handle for our purposes. We are quite pleased with
> Cassandra read/write/scale out. Just trying to underst
2010/6/4 Ran Tavory
> Cassandra expects a config file and does not expose an alternative API, for
> this file, that's correct.
> I think it's not hard to add such API but so far the demand for it didn't
> exist.
>
I see that making a config api is not that hard. Will probably take a stab
at it :-
I'm trying to bring up a new node. The log on the new node says:
INFO 19:00:47,794 Joining: getting bootstrap token
INFO 19:00:49,003 New token will be 98546264352590127933376074587615077134 to
assume load from /xx.xxx.xx.xxx
I had expected to see messages about the beginning of anticompaction
Most of the streaming messages are DEBUG, so you'll have to amp up logging.
Gary.
On Fri, Jun 4, 2010 at 12:26, Aaron Lav wrote:
> I'm trying to bring up a new node. The log on the new node says:
>
> INFO 19:00:47,794 Joining: getting bootstrap token
> INFO 19:00:49,003 New token will be 98546
> Yes, I know. And I might end up doing this in the end. I do though have
> pretty hard upper limits of how many rows I will end up with for each key,
> but anyways it might be a good idea none the less. Thanks for the advice on
> that one.
You set count to Integer.MAX. Did you try with say 300
On Fri, Jun 04, 2010 at 12:35:51PM -0700, Gary Dusbabek wrote:
> Most of the streaming messages are DEBUG, so you'll have to amp up logging.
I've upped logging on the bootstrapping node, and I realize
that it's trying to assume load from two nodes. The other node
(ie the one not mentioned in the
> > Yes, I know. And I might end up doing this in the end. I do though have
> pretty hard upper limits of how many rows I will end up with for each key,
> but anyways it might be a good idea none the less. Thanks for the advice on
> that one.
>
> You set count to Integer.MAX. Did you try with say 3
if you have a relatively small, static set of subcolumns, that you
read as a group, then using supercolumns is reasonable
On Tue, Jun 1, 2010 at 7:33 PM, Peter Hsu wrote:
> I have a pretty simple data modeling question. I don't know whether or not
> to use a CF or SCF in one instance.
>
> Here'
Is there a mechanism to select a time range within a row range query? Is
this planned? For example, return the last 10 posts starting at 7:00pm
yesterday?
Nick
Hello everyone,
One of my nodes has a much higher load (10x) than the other two nodes.
I don't think it's because a few keys have a lot more columns than
others -- the keys are well distributed and I'm using the random
partitioner.
Could someone point me in the direction of what I should be chec
Hi,
I am not sure how to implement multiget or slice_range based on a
conditional predicate. For example, what if I want to get only keys
containing certain columns? Thanks.
--
Lev
Hi Fellows,
I have the following design for a system which holds basically key->value pairs
(aka Columns) for each user (SuperColumn Key) in different namespaces
(SuperColumnFamily row key).
Like this:
Namespace->user->column_name = column_value;
keyspaces:
- name: NKVP
replica_placeme
On Fri, Jun 04, 2010 at 12:35:51PM -0700, Gary Dusbabek wrote:
> Most of the streaming messages are DEBUG, so you'll have to amp up logging.
I upped the logging to DEBUG on the bootstrapping node and the
nodes being bootstrapped from, and the bootstrap completed fine, so I'm
not sure what was goin
as the stacktrace suggests, HintedHandoffManager does major
compactions of just the hints columnfamily after hint delivery
2010/6/4 casablinca126.com :
> hi,
>I have not used nodetool repair or nodetool compact . So how is
> MajorCompaction triggered?
>
> --
> casablinca12
The sections on ring management and token selection on
http://wiki.apache.org/cassandra/Operations will help.
On Fri, Jun 4, 2010 at 2:27 PM, Mike Subelsky wrote:
> Hello everyone,
>
> One of my nodes has a much higher load (10x) than the other two nodes.
> I don't think it's because a few keys
That's entirely up to you. If you make row keys that are time ordered
and include the time as a prefix in the key, you just use get_range()
as usual, start now, end 7pm yesterday, count of 10.
On Fri, Jun 4, 2010 at 2:23 PM, Nicholas Sun wrote:
> Is there a mechanism to select a time range withi
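As a rough sketch of the time-prefixed key idea above: if the timestamp prefix is zero-padded to a fixed width, string order matches time order, and a key range then corresponds to a time range. The class name, the key format and the TreeMap standing in for an ordered row store are assumptions for illustration. In a real cluster this only helps when the partitioner keeps row keys in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

final class TimePrefixedKeys {
    /** Zero-pads the timestamp to 13 digits so that string order matches time order. */
    static String rowKey(long epochMillis, String id) {
        return String.format(Locale.ROOT, "%013d:%s", epochMillis, id);
    }

    /** Stand-in for a range query: up to `count` keys in [startMillis, endMillis), oldest first. */
    static List<String> range(NavigableMap<String, String> rows, long startMillis, long endMillis, int count) {
        List<String> keys = new ArrayList<>();
        NavigableMap<String, String> slice =
                rows.subMap(rowKey(startMillis, ""), true, rowKey(endMillis, ""), false);
        for (String key : slice.keySet()) {
            if (keys.size() == count) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put(rowKey(1000L, "a"), "post a");
        rows.put(rowKey(2000L, "b"), "post b");
        rows.put(rowKey(10000L, "c"), "post c");
        System.out.println(range(rows, 1000L, 10001L, 2));
    }
}
```

Without the padding, "10000" would sort before "2000" as a string. The sketch returns keys oldest first. Reading newest first would need either a reversed scan or a key built from an inverted timestamp.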
https://issues.apache.org/jira/browse/CASSANDRA-16
Can someone (Jonathan?) help me understand the performance characteristics
of this patch?
Specifically: If I have an open ended CF, and I keep inserting with ever
increasing column names (for example current Time), will things generally
work out
If I may ask, why the need for frequent topology changes?
On Fri, Jun 4, 2010 at 1:21 PM, Benjamin Black wrote:
> On Fri, Jun 4, 2010 at 11:14 AM, Philip Stanhope wrote:
>> I guess I'm thick ...
>>
>> What would be the right choice? Our data demands have already been proven to
>> scale beyond
I never said it would be frequent. That was an assumption made by Ben.
I am trying to understand how to "set the dials" to ensure availability and
durability ... And understand the cost when the inevitable hardware failure
occurs.
Sent via BlackBerry from T-Mobile
-Original Message-
On Fri, Jun 4, 2010 at 6:51 PM, wrote:
> I never said it would be frequent. That was an assumption made by Ben.
>
You indicated in an earlier email that you expected half the nodes to
be offline at any time. It is unclear how you expected that to work
for either the consistency processes or the
Thanks Jonathan
On Wed, Jun 2, 2010 at 11:17 PM, Jonathan Ellis wrote:
> you're overcomplicating things.
>
> just connect to *a* node, and if it happens to be down, try a different
> one.
>
> nodes being down should be a rare event, not a normal condition. no
> need to optimize for it so much.
Cassandra can't start its service and fails with the following error. What's wrong with it?
ERROR 14:28:22,631 Exception encountered during startup.
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1937)
at org.apache.cassandra.d