I have a basic understanding of how Cassandra handles the file system (it flushes
Memtables out to SSTables, and SSTables get compacted) and I understand that old
files are only deleted when a node is restarted, when Java does a GC, or when
Cassandra feels like it is running out of space.
My question:
So, in summary, there is no way to predictably and efficiently tell Cassandra
to get rid of all of the extra space it is using on disk?
- Original Message -
From: "Jeffrey Kesselman"
To: user@cassandra.apache.org
Sent: Thursday, May 26, 2011 8:57:49 PM
Subject: Re: Forcing Cassandra to f
What is the ConsistencyLevel of your reads? A ConsistencyLevel.ONE remove
returns when it has deleted the record from at least 1 replica (and any other
ones will be deleted when they can be). It could be the case that you are deleting
the record off of one node and then reading it off of the other one.
Did you set the token values for your nodes? I remember having similar symptoms
when I had a token conflict.
- Original Message -
From: "David McNelis"
To: user@cassandra.apache.org
Sent: Friday, June 3, 2011 5:06:10 PM
Subject: Re: Setting up cluster and nodetool ring in 0.8.0
Edwa
You could try to roll your own. I managed to create a custom 0.8 RPM using the
spec file from the redhat directory. First check out the source. Then edit the
spec file with the following changes:
Set the Version and Release variables appropriately.
At the end of %install, add the following 2 lines:
The second error (the CQL select) is because you have different Key Validation
Class values for your two user columns. users is
org.apache.cassandra.db.marshal.BytesType, while users2 is
org.apache.cassandra.db.marshal.UTF8Type. The select is failing because you are
comparing a String to a byte buffer.
As I understand, it has to do with a node being up but missing the delete
message (remember, if you apply the delete at CL.QUORUM, you can have almost
half the replicas miss it and still succeed). Imagine that you have 3 nodes A,
B, and C, each of which has a column 'foo' with a value 'bar'. The
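A quick sketch of that quorum arithmetic (plain Python, not Cassandra code; the function names are mine):

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must acknowledge a CL.QUORUM operation."""
    return replication_factor // 2 + 1

def max_replicas_that_can_miss(replication_factor: int) -> int:
    """Replicas that may miss a write or delete while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)

# With RF=3, a QUORUM delete succeeds once 2 replicas apply it,
# so 1 replica can miss the tombstone entirely. With RF=5, 2 of the
# 5 replicas (almost half) can miss it.
```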
Do you mean that it is using all of the available heap? That is the expected
behavior of most long running Java applications. The JVM will not GC until it
needs memory (or you explicitly ask it to) and will only free up a bit of
memory at a time. That is very good behavior from a performance sta
A ColumnPath can contain a super column, so you should be fine inserting a
super column family (in fact I do that). Quoting cassandra.thrift:
struct ColumnPath {
    3: required string column_family,
    4: optional binary super_column,
    5: optional binary column,
}
- Original Message -
"2. Trying to reduce disk occupation I deleted CF which used 90% of available
space. After issuing a "drop column family User;" command
no *User*.db files were deleted. "nodetool compact" didn't help either. How
can that deletion be triggered?"
You have to wait for a garbage collect (or do a rolling restart).
In the Cassandra CLI tutorial(http://wiki.apache.org/cassandra/CassandraCli),
there is an example of creating a secondary index.
Konstantin
- Original Message -
From: "CASSANDRA learner"
To: user@cassandra.apache.org
Sent: Wednesday, July 20, 2011 9:47:28 AM
Subject: best example of ind
As mentioned, there is an init.d script in the RPM package to start and stop
Cassandra (it is what we use). If you do not use the RPM and don't want to or
cannot install the full package, you can get just the script at:
https://svn.apache.org/repos/asf/cassandra/trunk/redhat/cassandra
- Original Message -
I believe that what would happen is that whichever data center has the later
clock will win. Every modification you make gets a time stamp (generally set by
your client to the current time, if you are using one). I believe that whatever
modification happened with the last time stamp is canonical
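The last-write-wins reconciliation described above can be sketched like this (illustrative Python, not the actual Cassandra code):

```python
def reconcile(a, b):
    """Last-write-wins: the modification with the later timestamp is kept.
    Each version is a (timestamp, value) pair, with the timestamp
    generally set by the client to its current time."""
    return a if a[0] >= b[0] else b

dc1 = (1000, "written in DC1")
dc2 = (1005, "written in DC2")  # the data center with the later clock

# The later timestamp wins, regardless of which DC it came from:
winner = reconcile(dc1, dc2)
```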
I have had similar issues when I generated Cassandra for Erlang. It seems that
Thrift 0.6.1 (the latest stable version) does not work with Cassandra. Using
Thrift 0.7 does.
I had issues where it would give me run time errors when trying to send an
insert (it would not serialize correctly).
- Original Message -
> From: Konstantin Naryshkin
> To: user@cassandra.apache.org
> Cc:
> Sent: Thursday, August 4, 2011 10:36 AM
> Subject: Re: Problems using Thrift API in C
>
> I have had similar issues when I generated Cassandra for Erlang. It seems
> that
When I build cassandra, I use:
#ant
#ant release
It does produce a working cassandra.jar, though I am not sure if it will
fulfill your needs since I make mine to create an RPM out of it.
- Original Message -
From: "Norman Maurer"
To: user@cassandra.apache.org
Sent: Monday, August 8, 201
Would you consider adding an RSS feed to the site for the benefit of those who
like to use feed readers to keep track of unread posts and what not?
- Original Message -
From: "Lynn Bender"
To: user@cassandra.apache.org
Sent: Friday, August 12, 2011 2:18:45 PM
Subject: Planet Cassandra is
Thanks. I did not see a link to it when I was sending my message.
- Original Message -
From: "Zhu Han"
To: user@cassandra.apache.org
Sent: Saturday, August 13, 2011 12:11:37 AM
Subject: Re: Planet Cassandra is now live
On Sat, Aug 13, 2011 at 4:35 AM, Konstantin
1. The 100 row limit is for listing (i.e. how many rows that the list command
will print). You can give list another limit:
list User limit 1000;
This limit has nothing to do with any internal Cassandra limitation. I am not
aware of any limitation on the number of rows that you can have.
2. I be
Why are you keeping all your indexes in the same row? We do a similar thing
(maintain several indexes over the same data) and we just have an index column
family with keys like "dest192.168.0.1" which means destination index of
192.168.0.1. You can do rows like User_Keys_By_Last_Name_adams and
look them up with a range query starting with "adams_".
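A toy sketch of that index layout (Python dictionaries standing in for the index column family; all names are illustrative):

```python
# Index rows are keyed by "<index-name><value>", as in the
# "dest192.168.0.1" example above.
index_cf = {}

def index_put(index_name: str, value: str, item: str) -> None:
    """Record an item under its own index row, e.g.
    'User_Keys_By_Last_Name_adams', instead of one giant shared row."""
    index_cf.setdefault(index_name + value, set()).add(item)

index_put("dest", "192.168.0.1", "flow-42")
index_put("User_Keys_By_Last_Name_", "adams", "user-7")
```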
Am I right?
I want to know: what's the cost difference between a range query and a slice query?
If I can use either composite key or composite column name, which one gives me
less query cost?
2011/8/25 Konstantin Naryshkin < konstant...@a-bb.net
Yeah, I believe that Yan has a typo in his post. A CF is not read in one go, a
row is. As for the scalability of having all the columns being read at once, I
do not believe that it was ever meant to be. All the columns in a row are
stored together, on the same set of machines. This means that if
I think that Oleg may have misunderstood how replicas are selected. If you have
3 nodes in your cluster and a RF of 2, Cassandra first selects which two nodes
out of the 3 will get the data, and only then does it write it out. The
selection is based on the row key, the token of the node, and y
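A rough sketch of that selection in Python (toy tokens, clockwise walk from the first node token at or after the row's token; not Cassandra's actual code):

```python
def replicas(ring, row_token, rf):
    """SimpleStrategy-style selection: the first replica is the node whose
    token is the smallest one >= the row token (wrapping around), and the
    remaining replicas are the next nodes clockwise on the ring."""
    tokens = sorted(ring)
    start = next((i for i, t in enumerate(tokens) if t >= row_token), 0)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]

ring = {0: "A", 56: "B", 113: "C"}  # 3 nodes with toy tokens
# RF=2: a row with token 30 lands on B, then the next node clockwise, C.
# A row with token 120 wraps around and lands on A, then B.
```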
The ring wraps around, so the value before 0 is the max possible token. I
believe that it is 2**127 - 1.
- Original Message -
From: "Kyle Gibson"
To: user@cassandra.apache.org
Sent: Monday, September 12, 2011 3:30:20 PM
Subject: Re: Replace Live Node
What could you do if the initial_tok
Wait, his nodes are going SC, SC, AT, AT. Shouldn't they go SC, AT, SC, AT? By
which I mean that if he adds another node to the ring (or lowers the
replication factor), he will have a node that is under-utilized. The rings in
his data centers have the tokens:
SC: 0, 1
AT: 85070591730234615865843
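Evenly spaced, alternating token assignment can be sketched like this (plain Python; the ring size matches the 2**127 wrap-around of RandomPartitioner):

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens wrap at 2**127

def balanced_tokens(n):
    """Evenly spaced tokens for n nodes on the full ring."""
    return [i * RING_SIZE // n for i in range(n)]

# Alternating data centers around the ring (SC, AT, SC, AT rather than
# SC, SC, AT, AT) keeps both DCs evenly loaded if a node is added or
# the replication factor changes:
assignment = list(zip(["SC", "AT", "SC", "AT"], balanced_tokens(4)))
```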
I believe that minor compactions work on SSTables of the same or similar size,
so unless your tables fall within a small size range of each other, Cassandra
does not see an opportunity to run a minor compaction.
- Original Message -
From: "myreasoner"
To: cassandra-u...
One thing you can do is search over the range from "username:" to "username;".
"username:" is the first possible string starting with "username:". "username;"
is the first possible string after all of the strings that start with
"username:". This works because ';' is the character right after ':' in ASCII.
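The trick can be sketched in Python (illustrative only; `prefix_bounds` is my name for it):

```python
def prefix_bounds(prefix: str):
    """Half-open range covering every string that starts with `prefix`:
    from the prefix itself up to (but not including) the prefix with its
    last character bumped to the next character in the ordering.
    Assumes the last character is not the maximal one."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

lo, hi = prefix_bounds("username:")      # ("username:", "username;")
assert lo <= "username:alice" < hi       # matches the prefix
assert not (lo <= "usernamf" < hi)       # does not match the prefix
```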
Yes, they start Cassandra as a daemon in the background. It is running. You can
connect to it from the CLI or any other client. You can see what it is doing by
reading the logs. cassandra -f starts Cassandra in the foreground, that is why
it does not return a prompt when the server starts.
It picks sequentially (the two previous ones, I believe). So in your example it
would be 105.12 and 105.11.
- Original Message -
From: "Ramesh Natarajan"
To: user@cassandra.apache.org
Sent: Monday, October 3, 2011 5:06:10 PM
Subject: node selection for replication factor 3
I have 6 nod
Cassandra does not break apart a row. All of the columns of a row are kept on
the same nodes.
I believe that writing multiple columns of the same row is atomic, but not
isolated. By which I mean that if one column is written all the other ones
will be written as well, but if a read happens
Method 1 may also result in very wide rows if you have lots and lots of tags
and comments. This can be drastically inefficient in Cassandra (but again,
it depends on your data).
On Mon, Oct 17, 2011 at 05:40, Chintana Wilamuna wrote:
> Hi,
>
> Does anyone have an idea about the pros/cons with mod
We are setting up our application around Cassandra 0.8.0 (we will move to
Cassandra 1.0 in the near future). In production the application will
be running in a two (or more) node cluster with RF 2. In development,
we do not always have 2 machines to test on, so we may have to run a
Cassandra cluster con
You can do a column slice for columns between "image/" (the first
ASCII string that starts with that sub-string) and "image/~" (the last
printable ASCII string that starts with that sub-string).
On Thu, Oct 27, 2011 at 21:10, Jean-Nicolas Boulay Desjardins
wrote:
> Normally in SQL I would use "%"
I realize that it is not realistic to expect it, but it would be good
to have a Partitioner that supports both range slices and automatic
load balancing.
On Thu, Nov 3, 2011 at 13:57, Ertio Lew wrote:
> Provide an option to sort columns by timestamp i.e, in the order they have
> been added to the
I assume that Reports is the super column family; that the first 1: is the
report id, which in the topology is the row key; that the second 1: is the
report line, which in the Cassandra topology is the super column; and that
"value 1" is the column name. If this is not the case, maybe explain the
topology better.
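Under that assumption, the layout would look like this (a Python dict as an illustration; the names come from the post above):

```python
# row key -> super column -> column name -> column value
reports = {
    "1": {                        # report id = row key
        "1": {"value 1": None},   # report line = super column
    }
}
```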
It may be the case that your CL is the issue. You are writing it at
ONE, which means that out of the 4 replicas of that key (two in each
data center), you are only putting it on one of them. When you read at
CL ONE, it only looks at a single replica to see if the data is there.
In other words. If y
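The overlap rule behind this can be sketched in Python (counting replicas, not consistency-level names; a rule-of-thumb illustration, not Cassandra code):

```python
def reads_see_latest_write(write_replicas: int, read_replicas: int, rf: int) -> bool:
    """A read is guaranteed to overlap every write when R + W > RF."""
    return read_replicas + write_replicas > rf

# RF=4 (two replicas in each data center): writing and reading at ONE
# gives no overlap guarantee, while QUORUM (3) on both sides does,
# since 3 + 3 > 4.
assert not reads_see_latest_write(1, 1, 4)
assert reads_see_latest_write(3, 3, 4)
```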
Or just have two column families to do it: A CF idToName that has the
userIds as keys and the userName as the only column and a CF nameToId
that has the userNames as keys and the userId as the only column
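A toy sketch of that two-column-family scheme (Python dicts standing in for the CFs; the names follow the post):

```python
# CF idToName: userId -> userName; CF nameToId: userName -> userId
id_to_name = {}
name_to_id = {}

def add_user(user_id: str, user_name: str) -> None:
    """Write both directions so either value resolves the other."""
    id_to_name[user_id] = user_name
    name_to_id[user_name] = user_id

add_user("42", "adams")
```

Both lookups are then single-key reads, with no secondary index involved.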
On Mon, Nov 14, 2011 at 03:50, chovatia jaydeep
wrote:
> Check if Cassandra secondary index
I am running Cassandra 1.0.0. I am using cqlsh for inspecting my data
(very useful tool, thank you whoever wrote it). I notice that when I
query for the FIRST N REVERSED columns, it omits the column name
on the first column. For example,
cqlsh> SELECT FIRST 1 REVERSED * FROM netflow_raw;
'{"bo
The way that I understand it (and that seems to be consistent with what was
said in this discussion) is that each DC has its own data space. Using your
simplified 1-10 system:
   DC1   DC2
0  D1R1  D2R2
1  D1R1  D2R1
2  D1R1  D2R1
3  D1R1  D2R1
4  D1R1  D2R1
5  D1R2  D2R1
6  D1R2  D2R2
7  D1R2  D
I want to create a custom RPM of Cassandra (so I can deploy it pre-configured).
There is an RPM in the source tree, but it does not contain any details of the
setup required to create the RPM (what files should I have where). I have tried
to run rpmbuild -bi on the spec file and I am getting the
Subject: Making a custom Cassandra RPM
Your apache ant install is too old. The ant that comes with
rhel/centos 5.X isn't new enough to build cassandra. You will need to
install ant manually.
On Wed, May 4, 2011 at 2:01 PM, Konstantin Naryshkin
wrote:
> I want to create a custom RPM of Cassandra (
From: "Konstantin Naryshkin"
To: user@cassandra.apache.org
Sent: Friday, May 6, 2011 2:56:43 PM
Subject: Re: Making a custom Cassandra RPM
Sorry that I did not get back to you on the issue. Your suggestion worked and I
was able to get the RPM to build. Unfortunately, it still does not work for