> One thing to keep in mind is that SSTables are not actually removed
> from disk until the garbage collector has identified the relevant
> in-memory structures as garbage (there is a note on the wiki about
However I forgot that the 'load' reported by nodetool ring does not, I
think, represent on-
Wild guess here, but are you using start_token/end_token when you
should be using start_key? Looks to me like you are trying end_token
= ''.
HTH,
/thomas
On Thursday, August 5, 2010, Adam Crain wrote:
> Hi,
>
> I'm on 0.6.4. Previous tickets in the JIRA in searching the web indicated
> tha
Thomas,
That was indeed the source of the problem. I naively assumed that the token
range would help me avoid retrieving duplicate rows.
If you iterate over the keys, how do you avoid retrieving duplicate keys? I
tried this morning and I seem to get odd results. Maybe this is just a
consequenc
Funny you should ask... I just went through the same exercise.
You must use Cassandra 0.6.4. Otherwise you will get duplicate keys.
However, here is a snippet of perl that you can use.
our $WANTED_COLUMN_NAME = 'mycol';
get_key_to_one_column_map('myKeySpace', 'myColFamily', 'mySuperCol', QUORUM
Thanks Dave. I'm using 0.6.4 since I saw this issue in the JIRA, but I just
discovered that the client I'm using mutates the order of keys after retrieving
the result with the thrift API... pretty much making key iteration impossible.
So time to fork and see if they'll fix it :(.
I'll review yo
Sounds like what you're seeing is in the client, but there was another
duplicate bug with get_range_slice that was recently fixed on cassandra-0.6
branch. It's slated for 0.6.5 which will probably be out sometime this month,
based on previous minor releases.
https://issues.apache.org/jira/brow
If I create 3-4 keyspaces, will this impact performance and resources
(esp. memory and disk I/O) too much?
Thanks,
Zhong
On Aug 5, 2010, at 4:52 PM, Benjamin Black wrote:
On Thu, Aug 5, 2010 at 12:59 PM, Zhong Li wrote:
The big thing that bothers me is the initial ring token. We have some Column
Has anyone had any success using Cassandra 0.7 w/ ruby? I'm attempting
to use the fauna/cassandra gem (http://github.com/fauna/cassandra/)
which has explicit support for 0.7 but I keep receiving the following
error message when making a request.
Thrift::TransportException: end of file reached
Make sure the client and server are both using the same transport
(framed vs. non)
-ryan
On Fri, Aug 6, 2010 at 9:47 AM, Mark wrote:
> Has anyone had any success using Cassandra 0.7 w/ ruby? I'm attempting to
> use the fauna/cassandra gem (http://github.com/fauna/cassandra/) which has
> explicit
Wow.. fast answer AND correct. In cassandra.yaml
# Frame size for thrift (maximum field length).
# 0 disables TFramedTransport in favor of TSocket.
thrift_framed_transport_size_in_mb: 15
I just had to change that value to 0 and everything worked. Now for my
follow up question :) What is the dif
On Fri, Aug 6, 2010 at 9:57 AM, Mark wrote:
> Wow.. fast answer AND correct. In cassandra.yaml
>
> # Frame size for thrift (maximum field length).
> # 0 disables TFramedTransport in favor of TSocket.
> thrift_framed_transport_size_in_mb: 15
>
> I just had to change that value to 0 and everything wo
If nodetool loadbalance does not do what its name implies, should it be
renamed, or maybe even removed altogether, since the recommendation is to
_never_ use it in production?
Bill
On Thu, Aug 5, 2010 at 6:41 AM, aaron morton wrote:
> This comment from Ben Black may help...
>
> "I recommend you _n
We are defaulting to framed in 0.7 because it enables the fix to
https://issues.apache.org/jira/browse/CASSANDRA-475
I am strongly in favor of removing the unframed option entirely in 0.8
On Fri, Aug 6, 2010 at 12:57 PM, Mark wrote:
> Wow.. fast answer AND correct. In cassandra.yaml
>
> # Frame s
Hi All,
A little background about myself: I am an ETL engineer who has worked only
with relational databases.
I have been reading about and trying Cassandra for the last 3-4 weeks. I have
a basic understanding of the Cassandra data model, its structure, nodes, etc.
I also installed Cassandra and played around with it, like
cassandra>
If I am using batch_mutate to update/insert two columns in the same CF
and same key, is this an atomic operation?
I understand that an operation on a single key in a CF is atomic, but I'm
not sure whether the above scenario boils down to two operations or is
considered one operation.
thx
OK, I just saw the FAQ
(http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic)
Follow-up question ...
It states that "As a special case, mutations against a single key are
atomic, but more generally no" ... I interpret that to also mean "...
mutations against a single key in the same CF ..."
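The FAQ's guarantee is easier to see against the shape of the Thrift call. Below is a minimal Python sketch of the nested `{row_key: {column_family: [mutations]}}` map that batch_mutate takes; the key, CF, and column names are made up for illustration, and plain dicts stand in for the Thrift Mutation/Column structs.

```python
# Sketch of the nested map passed to batch_mutate (illustrative names;
# plain dicts stand in for the real Thrift structs).
import time

def make_mutation(name, value, timestamp):
    """Build a dict stand-in for a Thrift Mutation wrapping a Column."""
    return {"column": {"name": name, "value": value, "timestamp": timestamp}}

# Timestamps are conventionally microseconds since the epoch.
ts = int(time.time() * 1_000_000)

# {row_key: {column_family: [mutations...]}} -- both columns share one
# key and one CF, which is the "single key" case the FAQ calls atomic.
mutation_map = {
    "user42": {
        "Users": [
            make_mutation("email", "user42@example.com", ts),
            make_mutation("name", "Adam", ts),
        ]
    }
}
```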
In my opinion it's the wrong approach to ask how to migrate from MySQL to
Cassandra from a database-level view. The lack of joins in NoSQL should lead you
to think about what you want to get out of your persistent storage, and afterwards
think about how to migrate and, most of the time, how to denormalize.
Thanks for the reply,
I am sorry, it seems my question came out wrong.
* My question is: what considerations should I keep in mind to migrate
to Cassandra?
* Like we do in ETL: to extract data from a source we write a query and then
load it into our database after applying the desired transformations
Hi
I have a question about the internals of a Cassandra write.
Say, I already have the following in the database -
(row_x,col_y,val1)
Now if I try to insert
(row_x,col_y,val100), what will happen?
Will it overwrite the old data?
I mean, will it overwrite the data physically or will it keep both the
o
Hi Jeremy,
So, I fixed my client so it preserves the ordering and I get results that may
be related to the bug.
If I insert 30 keys into the random partitioner with names [key1, key2, ...
key30] and then start the iteration (with a batch size of 10) I get the
following debug output during the
If you're willing to try it out, the easiest way to check whether it is
resolved by the patch for CASSANDRA-1145 is to check out the 0.6 branch:
svn checkout http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.6/
cassandra-0.6
Then run `ant` to build the binaries.
On Aug 6, 20
Yes, imo, it should be renamed.
On Fri, Aug 6, 2010 at 10:10 AM, Bill Au wrote:
> If nodetool loadbalance does not do what its name implies, should it be
> renamed, or maybe even removed altogether, since the recommendation is to
> _never_ use it in production?
>
> Bill
>
> On Thu, Aug 5, 2010 at 6
On Fri, Aug 6, 2010 at 12:51 PM, Maifi Khan wrote:
> Hi
> I have a question about the internal of cassandra write.
> Say, I already have the following in the database -
> (row_x,col_y,val1)
>
> Now if I try to insert
> (row_x,col_y,val100), what will happen?
> Will it overwrite the old data?
> I m
"Additional keyspaces have very little overhead (unlike CFs)."
On Fri, Aug 6, 2010 at 9:42 AM, Zhong Li wrote:
>
> If I create 3-4 keyspaces, will this impact performance and resources (esp.
> memory and disk I/O) too much?
>
> Thanks,
>
> Zhong
>
> On Aug 5, 2010, at 4:52 PM, Benjamin Black wrot
http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/
http://www.slideshare.net/benjaminblack/cassandra-basics-indexing
On Fri, Aug 6, 2010 at 11:42 AM, sonia gehlot wrote:
> Thanks for reply,
>
> I am sorry It seems my question comes out wrong..
>
> * My question is w
Ryan,
I believe my branch was merged into fauna some time ago by jmhodges.
However, 0.7 support must be explicitly enabled by require
'cassandra/0.7' as it currently defaults to 0.6.
b
On Fri, Aug 6, 2010 at 10:02 AM, Ryan King wrote:
> On Fri, Aug 6, 2010 at 9:57 AM, Mark wrote:
>> Wow.. fas
On 8/5/10 11:51 AM, Peter Schuller wrote:
Also, the variation in disk space in your most recent post looks
entirely as expected to me and nothing really extreme. The temporary
disk space occupied during the compact/cleanup would easily be as high
as your original disk space usage to begin with, a
If you want to be able to get the data over time, you need to store it
in multiple columns. You can use TimeUUID columns if you need to be
able to get ranges of times through queries.
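As a sketch of that suggestion: Python's `uuid.uuid1()` produces the version-1 (time-based) UUIDs that Cassandra's TimeUUIDType compares by embedded timestamp, so columns named this way sort chronologically and a time range becomes a column slice. This is an illustration, not client-specific code.

```python
# Illustration: version-1 UUIDs embed a timestamp, which is what lets
# TimeUUIDType sort columns chronologically for range queries.
import uuid

u1 = uuid.uuid1()
u2 = uuid.uuid1()

# .time is a 60-bit count of 100-ns intervals since 1582-10-15;
# CPython guarantees it is strictly increasing within one process.
assert u1.time < u2.time
```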
-Original Message-
From: Maifi Khan [mailto:maifi.k...@gmail.com]
Sent: Friday, August 06, 2010 2:51 PM
On Thu, Jul 29, 2010 at 9:57 PM, Ryan Daum wrote:
>
> Barring this we (place where I work, Chango) will probably eventually fork
> Cassandra to have a RESTful interface and use the Jetty async HTTP client to
> connect to it. It's just ridiculous for us to have threads and associated
> resources t
I ran against the 0.6 branch and still see similarly odd results. My test cases
prove that the set of keys was successfully inserted, but usually I never see
the first key again, or I reach the first key before having seen all of the keys.
-Adam
-Original Message-
From: Jeremy Hanna [m
On 8/6/10 2:13 PM, Benjamin Black wrote:
Assuming the old version is already on disk in an SSTable, the new
version will not overwrite it, and both versions will be in the
system. A compaction will remove the old version, however.
To be clear, a compaction will only remove the old version if:
> Your post refers to "obsolete" sstables, but the only thing that makes them
> "obsolete" in this case is that they have been compacted?
Yes.
> As I understand Julie's case, she is :
>
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more dis
I'm a little lost. Can you explain this in a little more depth? What do you
mean by "index rows named..."? Do you mean create a separate ColumnFamily?
On Sat, Jul 31, 2010 at 9:32 PM, Benjamin Black wrote:
> Have the TimeUUID as the key, and then index rows named for the time
> intervals, each
> a) It is a major compaction [1]
> b) The old version was deleted/overwritten more than GCGraceSeconds ago [2]
c) and the memtable containing the delete/overwrite has been flushed.
(I suppose that's kinda obvious in retrospect, but it took me a little
bit to realize this was why a 'nodetool comp
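A toy model of the read and compaction behavior discussed above (an assumed simplification for illustration, not Cassandra internals): each version of a column carries a timestamp, the highest timestamp wins at read time, and a major compaction rewrites only the winner.

```python
# Toy model: SSTables hold (value, timestamp) versions of one column;
# reads reconcile by highest timestamp, compaction keeps the winner.
def reconcile(versions):
    """Return the newest (value, timestamp) pair."""
    return max(versions, key=lambda v: v[1])

def compact(versions):
    """Major compaction keeps only the reconciled winner."""
    return [reconcile(versions)]

# Two SSTables hold versions of the same (row_x, col_y) column:
sstable_versions = [("val1", 100), ("val100", 200)]
```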
Hey,
[junit] key24
[junit] Query w/ Range(key24,,10) result size: 10
[junit] key24
I think this is actually the expected result: whenever you are using
range_slices with start_key/end_key you must take the last key
you received and then use that as the next slice's start_key. I also
tried to u
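A small self-contained sketch of that paging scheme (`fake_range_slice` is a stand-in for get_range_slices, assuming an inclusive start_key): start each page at the last key seen and drop the repeated first element.

```python
# Sketch of duplicate-free key iteration over an inclusive-start range
# API. fake_range_slice stands in for get_range_slices.
def fake_range_slice(keys, start_key, count):
    """Return up to `count` sorted keys >= start_key (inclusive)."""
    eligible = [k for k in sorted(keys) if k >= start_key]
    return eligible[:count]

def iterate_keys(keys, batch_size=10):
    seen, start = [], ""
    while True:
        batch = fake_range_slice(keys, start, batch_size)
        if start:              # every page after the first repeats
            batch = batch[1:]  # its start key, so drop it
        if not batch:
            break
        seen.extend(batch)
        start = batch[-1]      # resume from the last key seen
    return seen

all_keys = ["key%02d" % i for i in range(1, 31)]
```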
> I think this is actually the expected result, whenever you are using
> range_slices with start_key/end_key you must increment the last key
> you received and then use that in the next slice start_key. I also
> tried to use token because of exactly that behaviour and the doc
> talking about inclus
On 8/6/10 3:30 PM, Peter Schuller wrote:
*However*,
and this is what I meant with my follow-up, that still does not
explain the data from her post unless 'nodetool ring' reports total
sstable size rather than the total size of live sstables.
Relatively limited time available to respond to this
> sstables waiting for the GC to trigger actual file removal. *However*,
> and this is what I meant with my follow-up, that still does not
> explain the data from her post unless 'nodetool ring' reports total
> sstable size rather than the total size of live sstables.
As far as I can tell, the inf
I took this approach... reject the first result of subsequent get_range_slice
requests. If you look back at output I posted (below) you'll notice that not
all of the 30 keys [key1...key30] get listed! The iteration dies and can't
proceed past key2.
1) 1st batch gets 10 unique keys.
2) 2nd batch
>
> Another way to do it is to filter results to exclude columns received
> twice due to being on iteration end points.
Well, it depends on the size of your rows; keeping lists of 1mil+ column
names will eventually become really slow (at least in Ruby).
>
> This is useful because it is not always
Would it be possible to backport the 0.7 feature, the ability to save and
preload row caches after a restart? I think that is a very nice and
important feature that would help users with very large caches that take a
long time to build up the proper hot set. For example we can get pretty good
cache r
Hi all,
I'm now checking reliability when inserting.
I set one cluster of 3 nodes with replication factor 2 and
OrderPreservingPartitioner.
I insert data (A,B,C in this order) with consistency level ONE to node No.3.
I immediately shut off (turn off the N/W I/F) node No.3 right after inserting
dat
On Sat, Aug 7, 2010 at 1:05 AM, Adam Crain
wrote:
> I took this approach... reject the first result of subsequent get_range_slice
> requests. If you look back at output I posted (below) you'll notice that not
> all of the 30 keys [key1...key30] get listed! The iteration dies and can't
> proceed
The way I understand how row caches work is that each node has an
independent cache, in that they do not share their cache contents with other
nodes. If that's the case, is it also true that when a new node is added to
the cluster it has to build up its own cache? If that's the case, I see that
as a po
ColumnFamily Standard: LogRecords, CompareWith=TimeUUIDType
Row Key "20100806":
Column Name: TimeUUID.new Value: JSON({'remote_addr':...,
'user_agent':, 'url':)
..., more Columns
In my case I chose to "partition" by day; if you are getting too man
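A minimal sketch of this layout in Python (the field names mirror the LogRecords example above; the helper is hypothetical): one row per day, TimeUUID column names, JSON payloads as values.

```python
# Sketch of the day-partitioned log layout: row key is the day,
# column name is a TimeUUID, value is a JSON blob of the record.
import json
import uuid
from datetime import datetime, timezone

def log_record(remote_addr, user_agent, url, now=None):
    now = now or datetime.now(timezone.utc)
    row_key = now.strftime("%Y%m%d")   # e.g. "20100806"
    column_name = uuid.uuid1()          # TimeUUID: sorts chronologically
    value = json.dumps({"remote_addr": remote_addr,
                        "user_agent": user_agent,
                        "url": url})
    return row_key, column_name, value

row, col, val = log_record("127.0.0.1", "curl", "/index")
```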
limit (it was also logging related). Solution
> was pretty simple, log data is immutable, so no SuperColumn needed.
>
> ColumnFamily Standard: LogRecords, CompareWith=TimeUUIDType
>
> Row Key "20100806":
> Column Name: TimeUUID.new Value: JSON({'remote_addr':..
ColumnFamily Standard "LogByRemoteAddrAndDate" CompareWith: TimeUUID
Row: "127.0.0.1:20100806" Column TimeUUID/JSON as usual. If you want
to "link" to the actual log record (to avoid writing if multiple
times) just insert the same timeuuid you inserted into the other CF
and leave the value empty. So yo
On 8/5/10 1:42 AM, Oleg Anastasjev wrote:
3.) When using the random partitioner how much difference should be expected
(or has been observed) between nodes? 2%? 10%?
This depends on the data. It will distribute keys almost equally between nodes,
but the sizes of row data can be different for different
Everything in the same key of a batch_mutate is atomic. (But not isolated.)
On Fri, Aug 6, 2010 at 2:15 PM, B. Todd Burruss wrote:
> ok i just saw the FAQ
> (http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic)
>
> follow up question ...
>
> it states that "As a special case, mutations agai
See comments to https://issues.apache.org/jira/browse/CASSANDRA-1256
On Fri, Jul 30, 2010 at 12:57 AM, Ryan Daum wrote:
> An asynchronous thrift client in Java would be something that we could
> really use; I'm trying to get a sense of whether this async client is usable
> with Cassandra at this
are you caching 100% of the CF?
if not this is not super useful.
On Fri, Aug 6, 2010 at 7:10 PM, Artie Copeland wrote:
> would it be possible to backport the 0.7 feature, the ability to save and
> preload row caches after a restart. i think that is a very nice and
> important feature that would
did you check the log on 3 for exceptions?
On Fri, Aug 6, 2010 at 7:10 PM, Ken Matsumoto wrote:
> Hi all,
>
> I'm now checking reliability when inserting.
>
> I set one cluster of 3 nodes with replication factor 2 and
> OrderPreservingPartitioner.
> I insert data (A,B,C in this order) with consis
rColumn needed.
>>
>> ColumnFamily Standard: LogRecords, CompareWith=TimeUUIDType
>>
>> Row Key "20100806":
>> Column Name: TimeUUID.new Value: JSON({'remote_addr':...,
>> 'user_agent':, 'url':)
>> ..., more Column
ColumnFamily Standard "LogByRemoteAddrAndDate" CompareWith: TimeUUID
Row: "127.0.0.1:20100806" Column TimeUUID/JSON as usual. If you want
to "link" to the actual log record (to avoid writing if multiple
times) just insert the same timeuuid you inserted into the other CF
and leave the value e
On Sat, Aug 7, 2010 at 6:00 AM, sonia gehlot wrote:
> Can you please help me how to move forward? How should I do all the setup
> for this?
My view is that Cassandra is fundamentally different from SQL databases. There
may be artefacts which are superficially similar between the two systems, bu
>> them out later in some map/reduce fashion. What you want is another
>> column Family and a similar structure.
>>
>> ColumnFamily Standard "LogByRemoteAddrAndDate" CompareWith: TimeUUID
>>
>> Row: "127.0.0.1:20100806" Column TimeUUID/JSON as usual. If yo
good morning;
On 2010-08-07, at 02:45 , Jonathan Ellis wrote:
Everything in the same key of a batch_mutate is atomic. (But not
isolated.)
What does the distinction mean in the context of Cassandra?
Is it that the execution of an operation with the same key could see
the effect of the 'f
ColumnFamily Standard "LogByRemoteAddrAndDate" CompareWith: TimeUUID
Row: "127.0.0.1:20100806" Column TimeUUID/JSON as usual. If you want
to "link" to the actual log record (to avoid writing if multiple
times) just insert the same timeuuid you inserted into the other CF
and leave t
In the "CassandraLimitations" wiki it states:
" Cassandra has two levels of indexes: key and column"
I understand how the column and subcolumn indexes work but can someone
explain to me how the key level index works?