Samal, that's pretty smart stuff.
From: samal [mailto:samalgo...@gmail.com]
Sent: Friday, June 01, 2012 11:24 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra Data Archiving
I believe you are talking about "HDD space" consumed by user-generated
data which is no longer required after 15 days, or may be required later.
The first option is to use TTL, which you don't want to use. The second, as
Aaron pointed out, is snapshotting the data; but the data still exists in the
cluster and is only used for backup.
I think of like
On Fri, Jun 1, 2012 at 12:28 PM, Harshvardhan Ojha <
harshvardhan.o...@makemytrip.com> wrote:
> Problem statement:
>
> We are keeping daily generated data(user generated content) in
> Cassandra, but our application is using only 15 days old data. So how can
> we archive data older than 15 days so that we can reduce load on the Cassandra ring?
Problem statement:
We are keeping daily generated data (user-generated content) in Cassandra, but
our application uses only the last 15 days of data. So how can we archive data
older than 15 days so that we can reduce the load on the Cassandra ring?
Note: we can't apply TTL, as this data may be needed in future.
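With TTL ruled out, one pattern worth sketching (my own suggestion, not something stated in the thread) is to bucket rows by day, so that whole buckets older than 15 days can be snapshotted, copied off-node, and then dropped. A minimal Python sketch of the bucket bookkeeping, with illustrative key names:

```python
from datetime import date, timedelta

def bucket_key(base, day):
    # Row key embeds the day, e.g. "content:2012-06-01" (naming is illustrative)
    return "%s:%s" % (base, day.isoformat())

def split_buckets(days, today, keep_days=15):
    # Buckets newer than the cutoff stay online; older ones can be
    # snapshotted, copied off-node, and dropped from the cluster.
    cutoff = today - timedelta(days=keep_days)
    live = [d for d in days if d >= cutoff]
    archivable = [d for d in days if d < cutoff]
    return live, archivable

today = date(2012, 6, 1)
days = [today - timedelta(days=n) for n in range(30)]
live, old = split_buckets(days, today)
```

Dropping a whole day-bucket is cheap compared with deleting individual rows, and the snapshot of that bucket can be restored later if the data is needed again.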
Could be this
https://issues.apache.org/jira/browse/CASSANDRA-4201
But that talks about segments not being cleared at startup. Does not explain
why they were allowed to get past the limit in the first place.
Can you share some logs from the time the commit log got out of control ?
Cheers
--
Sounds like
https://issues.apache.org/jira/browse/CASSANDRA-4219?attachmentOrder=desc
Drop back to 1.0.10 and have a play.
Good luck.
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 1/06/2012, at 6:38 AM, Chen, Simon wrote:
> Hi,
> I am new to Cassandra.
> The ring (2 in DC1, 1 in DC2) looks OK, but the load on the new node in DC2
> is almost 0%.
Yeah, that's the way it will look.
> But all the other rows are not in the new node. Do I need to copy the data
> files from a node in DC1 to the new node?
How did you add the node? (see
http://www.d
The default value for rpc_timeout is 1 - 10 seconds.
You want the socket timeout to be higher than the rpc_timeout, otherwise the
client will give up before the server does.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 1/06/2012, at 3:2
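Aaron's rule of thumb (client socket timeout above the server's rpc_timeout) can be captured as a small sanity check. The function name and the 1000 ms margin are my own illustration; 10000 ms is the usual default for rpc_timeout_in_ms in cassandra.yaml:

```python
def client_timeout_ok(socket_timeout_ms, rpc_timeout_ms, margin_ms=1000):
    # The client-side socket timeout should exceed the server's
    # rpc_timeout (plus a little margin for network latency), so the
    # server times out first and the client sees its error.
    return socket_timeout_ms >= rpc_timeout_ms + margin_ms

# With rpc_timeout_in_ms at its usual default of 10000:
ok = client_timeout_ok(12000, 10000)
too_low = client_timeout_ok(5000, 10000)
```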
I suggest creating a ticket on https://issues.apache.org/jira/browse/CASSANDRA
with the details.
If it is an immediate concern see if you can find someone in the #cassandra
chat room http://cassandra.apache.org/
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.
Look in the logs for errors or warnings. Also let us know what version you are
using.
I am guessing that node 2 still thought that node 1 was in the cluster when you
did the move, which should(?) have errored.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.t
> If you hash 4 composite keys, let's say ('A','B','C'), ('A','D','C'),
> ('A','E','X'), ('A','R','X'), you have only 4 hashes or you have more?
Four
> If it's 4, how come you are able to range query for example between
> start_column=('A', 'D') and end_column=('A','E') and get this column
> ('
I'm not sure about your needs, but the simplest thing to consider is snapshotting
and copying off-node.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 1/06/2012, at 12:23 AM, Shubham Srivastava wrote:
> I need to archive my Cassandra data i
If you want to do arbitrarily complex online / realtime queries look at DataStax
Enterprise, or https://github.com/tjake/Solandra or straight Solr.
Alternatively denormalise the model to materialise the results when you insert
so your query is a straight lookup. Or do some client-side filtering /
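The write-time denormalisation idea can be sketched in miniature. Plain Python dicts stand in for column families here, and every name is made up; the point is that each supported query gets its own materialised row, maintained at insert time:

```python
users_by_id = {}
users_by_country = {}   # materialised "view": country -> list of user ids

def insert_user(user_id, name, country):
    # One logical insert writes both the primary row and the view row.
    users_by_id[user_id] = {"name": name, "country": country}
    users_by_country.setdefault(country, []).append(user_id)

def users_in_country(country):
    # The query is now a single lookup; no filtering at read time.
    return users_by_country.get(country, [])

insert_user(1, "Ana", "PT")
insert_user(2, "Bob", "NZ")
insert_user(3, "Eve", "PT")
```

The trade-off is extra writes and storage in exchange for reads that never scan or filter.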
So this happened to me again, but it was only when the cluster had a node down
for a while. Then the commit logs started piling up past the limit I set in
the config file, and filled the drive.
After the node recovered and hints had replayed the space was never reclaimed.
A flush or drain did
But I think it's a bad idea, since hot data will be evenly distributed
between multiple sstables and filesystem pages.
On Thu, May 31, 2012 at 1:08 PM, crypto five wrote:
> You may also consider disabling key/row cache at all.
> 1mm rows * 400 bytes = 400MB of data, can easily be in fs cache, and
You may also consider disabling the key/row cache altogether.
1mm rows * 400 bytes = 400MB of data, which can easily sit in the fs cache, and
you will access your hot keys at thousands of qps without hitting disk at all.
Enabling compression can make the situation even better.
On Thu, May 31, 2012 at 12:01 PM, Gurpree
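A quick check of the arithmetic above, assuming the quoted figures (note 400 MB here is decimal; in binary units it is roughly 381 MiB, still well within page cache on a typical server):

```python
rows = 1000 * 1000            # "1mm" rows
row_size = 400                # bytes per row
working_set_bytes = rows * row_size            # 400,000,000 bytes
working_set_mib = working_set_bytes / (1024.0 * 1024.0)   # ~381 MiB
```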
Aaron,
Thanks for your email. The test kinda resembles how the actual application
will be.
It is going to be a simple key-value store with 500 million keys per node.
The traffic will be read heavy in steady state, and there will be some keys
that will have a lot more traffic than others. The expect
Hi,
I am new to Cassandra.
I have started a Cassandra instance (Cassandra.bat), played with it for a
while, created a keyspace Zodiac.
When I killed the Cassandra instance and restarted it, the keyspace was gone,
but when I tried to recreate it,
I got an 'org.apache.thrift.transport.TTransportException' error
Thanks Aaron.
I might use LOCAL_QUORUM to avoid the waiting on the ack from DC2.
Another question: after I set up a new node with token +1 in a new DC and
updated a CF with RF {DC1:2, DC2:1}, when I update a column on one node in
DC1, it's also updated in the new node in DC2. But all the other r
Thanks a lot Aaron for the very fast response!
I have increased the CassandraThriftSocketTimeout from 5000 to 9000. Is
this a reasonable setting?
configurator.setCassandraThriftSocketTimeout(9000);
Cheers,
Christof
2012/5/31 aaron morton
> There are two types of timeouts. The thrift TimedOutEx
Hi guys,
We're running a three-node cluster of Cassandra 1.1 servers, originally
1.0.7, and immediately after the upgrade the error logs of all three servers
began filling up with the following message:
ERROR [ReplicateOnWriteStage:177] 2012-05-31 08:17:02,236
CounterContext.java (line 381) invali
Let me elaborate a bit.
two node cluster
node1 has token 0
node2 has token 85070591730234615865843651857942052864
node1 goes down permanently.
do a nodetool move 0 on node2.
Monitoring with nodetool ring... it seems to be stuck in the Moving state forever.
From: Poziombka, Wade L
Sent: Tuesday, May 29, 2012 4:29 P
But sorry, I don't understand.
If you hash 4 composite keys, let's say
('A','B','C'), ('A','D','C'), ('A','E','X'), ('A','R','X'), do you have only 4
hashes or more?
If it's 4, how come you are able to range query for example between
start_column=('A', 'D') and end_column=('A','E') and get th
We want to use Cassandra to store complex data, but we can't figure out how to
organize indexes.
Our table (column family) looks like this:
Users = { RandomId int, Firstname varchar, Lastname varchar, Age int, Country
int, ChildCount int }
In our queries we have mandatory fields (Firstname,Lastn
It is hashed once.
To the partitioner it's just some bytes. Other parts of the code care about its
structure.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 31/05/2012, at 7:00 PM, Cyril Auburtin wrote:
> Thx for the answer
> 1 more th
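Aaron's point above — a composite key is hashed exactly once, while range queries over composite columns rely on sort order rather than hashing — can be sketched like this. The byte packing below is a stand-in, not Cassandra's real composite encoding:

```python
import hashlib

def partition_hash(parts):
    # The partitioner sees the serialized composite as opaque bytes and
    # hashes it exactly once (RandomPartitioner uses MD5).
    packed = b"\x00".join(p.encode("utf-8") for p in parts)
    return hashlib.md5(packed).hexdigest()

keys = [("A", "B", "C"), ("A", "D", "C"), ("A", "E", "X"), ("A", "R", "X")]
hashes = {partition_hash(k) for k in keys}   # four keys -> four hashes

# Column slices on composites work by sorted comparison, not hashing;
# Python tuple ordering mimics the composite comparator here.
start, end = ("A", "D"), ("A", "F")
in_range = [k for k in sorted(keys) if start <= k < end]
```

(Cassandra's slice ends are inclusive, which is why a real slice from ('A','D') to ('A','E') still returns ('A','E',...) columns; the half-open end above is just Python convenience.)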
> You can set the gc_grace_secs as a little value and force major compaction
> after the row is expired. After then please check whether the row still
> exists.
There are some downsides to major compactions. (There have been some recent
discussions).
You can provoke (some) minor compactions by
There are two types of timeouts. The thrift TimedOutException occurs when the
coordinator times out waiting for the CL level nodes to respond. The error is
transmitted back to the client and raised.
The second is a client-side socket timeout waiting for the coordinator to respond.
See the Cassandra
Agree.
Just happy to see people upgrade to something 1.X
A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 31/05/2012, at 8:24 AM, Rob Coli wrote:
> On Tue, May 29, 2012 at 10:29 PM, Pierre Chalamet wrote:
>> You'd better use version 1.0.9 (using
> Could you provide some guide on how to assign the tokens in this growing
> deployment phases?
background
http://www.datastax.com/docs/1.0/install/cluster_init#calculating-tokens-for-a-multi-data-center-cluster
Start with tokens for a 4 node cluster. Add the next 4 between each of
t
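The token arithmetic behind that doc can be sketched as follows. The offset-per-DC convention follows the DataStax guide linked above; the function name and offset value are illustrative:

```python
RING = 2 ** 127   # RandomPartitioner token space

def balanced_tokens(n, offset=0):
    # Evenly spaced initial tokens; a second data center typically uses
    # a small offset so its tokens don't collide with the first DC's.
    return [(i * RING // n + offset) % RING for i in range(n)]

dc1 = balanced_tokens(4)
dc2 = balanced_tokens(4, offset=1)
# Doubling the cluster: the 8-node tokens interleave with the 4-node
# ones, so existing nodes keep their tokens and new nodes slot between.
grown = balanced_tokens(8)
```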
Not directly.
* stop the cluster
* rename the /var/lib/cassandra/data/mykeyspace directory
* start the cluster
* create the keyspace with new name
* drop the keyspace with the old name
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 30/05/
> -Is there any other way to extract the content of SSTables, writing a
> Java program for example instead of using sstable2json?
Look at the code in sstable2json and copy it :)
> -I tried to get tombstones using the thrift API, but it seems to be not
> possible, is that right? When I try, the program thro
Hi,
yes, the work can be split between different mappers, but each one will process
one row at a time. In fact, the method
> public void map(ByteBuffer key, SortedMap columns,
> Context context)
processes 1 row, with the specified ByteBuffer key and the list of columns
SortedMap columns.
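The one-row-per-map-call semantics can be mimicked outside Hadoop. This plain Python sketch simplifies the scenario to a plain-text username column rather than BSON, with made-up names:

```python
def map_row(key, columns, target_user, sink):
    # Called once per row, as the Hadoop framework would call map():
    # it sees only this row's key and columns.
    if columns.get("username") == target_user:
        sink.append(key)

rows = {
    "sess-1": {"username": "alice", "data": "..."},
    "sess-2": {"username": "bob",   "data": "..."},
    "sess-3": {"username": "alice", "data": "..."},
}
matches = []
for key, columns in rows.items():
    map_row(key, columns, "alice", matches)
```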
Hi,
I'm working on some use cases to understand how cassandra-hadoop
integration works.
I have a very basic scenario: I have a column family that keeps the session
id and some bson data that contains the username in two separate columns. I
want to go through all rows and dump the row to a file wh
Thx for the answer
One more thing: a composite key is not hashed only once, I guess?
Is it hashed once per part the composite has?
So this means there are twice or three times (or more) as many keys as for
normal column keys, is that true?
On 31 May 2012 at 02:59, "aaron morton" wrote:
> Composite Columns com