I'm using a separate keyspace for each application.
What I want is to split them up by load pattern,
so that two apps with the same, very high load pattern are not clashing.
For other load patterns I want to use a different split.
Is there any best practice, or should I scale out, so that the co
Memtables reside in the heap, so the write rate impacts GC: more writes mean
more frequent and longer ParNew GC pauses.
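A minimal sketch, assuming a stock CMS-configured node with JMX on the default
port 7199, of watching the ParNew collector remotely; the contact host, sample
count and interval are placeholders:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ParNewWatcher {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes JMX on port 7199 by default; the host is an assumption.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            GarbageCollectorMXBean parNew = ManagementFactory.newPlatformMXBeanProxy(
                    conn, "java.lang:type=GarbageCollector,name=ParNew",
                    GarbageCollectorMXBean.class);
            long lastCount = 0, lastMs = 0;
            for (int i = 0; i < 10; i++) {
                long count = parNew.getCollectionCount();
                long ms = parNew.getCollectionTime();
                // Deltas between samples: a rising write rate (and memtable churn)
                // shows up as more collections and more accumulated pause time.
                System.out.printf("ParNew: +%d collections, +%d ms%n",
                        count - lastCount, ms - lastMs);
                lastCount = count;
                lastMs = ms;
                Thread.sleep(10000);
            }
        } finally {
            connector.close();
        }
    }
}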
From: Jay Svc [mailto:jaytechg...@gmail.com]
Sent: Friday, April 12, 2013 01:03
To: user@cassandra.apache.org
Subject: Does Memtable resides in Heap?
Hi Team,
I have got this 8GB of RAM o
Saw this in earlier versions. Our workaround was disable; drain; snap;
shutdown; delete; link from snap; restart;
-ljr
On Apr 11, 2013, at 9:45, wrote:
> I have formulated the following theory regarding C* 1.2.2 which may be
> relevant: Whenever there is a disk error during compaction of an
Hi,
I've been changing a benchmarking tool (YCSB) to vary the number of
clients throughout a workload execution and, for some reason, I believe
Cassandra is having some problems handling the variation (both up and
down) in the number of connections. Each client has a connection and
clients ar
Hi Team,
I have got this 8 GB of RAM, out of which 4 GB is allocated to the Java heap. My
question is: does the size of the Memtable contribute to the heap, or is it
off-heap?
Would a bigger Memtable have an impact on GC and overall memory management?
I am using DSE 3.0 / Cassandra 1.1.9.
Thanks
I am using 1.2.3. I used the default heap (2 GB) without JNA installed,
then modified the heap to 4 GB with a 400 MB young generation, and installed JNA.
The bloom filter on the CFs is lowered (more false positives, less disk space).
WARN [ScheduledTasks:1] 2013-04-11 11:09:41,899 GCInspector.java (line 142) Heap is
With that much data per node you have to raise the IndexInterval and adjust
the bloom filter settings. Although the bloom filters are off heap now,
having that much data can put a strain on physical memory.
On Thu, Apr 11, 2013 at 4:26 PM, aaron morton wrote:
> > The data will be huge, I am estim
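A minimal sketch, via CQL3 and the java-driver, of what adjusting the bloom
filter setting could look like; the keyspace/table names and the 0.1 value are
placeholders, and index_interval itself is a cassandra.yaml setting in 1.2:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RaiseBloomFpChance {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // A higher bloom_filter_fp_chance means more false positives (extra disk
        // seeks) but smaller bloom filters.
        session.execute("ALTER TABLE my_ks.my_cf WITH bloom_filter_fp_chance = 0.1");
        // index_interval is raised in cassandra.yaml, not per table, in 1.2.
        cluster.shutdown();
    }
}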
> The data will be huge, I am estimating 4-6 TB per server. I know this is not
> the best, but those are my resources.
You will have a very unhappy time.
The general rule of thumb / guideline for an HDD based system with 1G networking
is 300GB to 500GB per node. See previous discussions on this topic fo
> I will be using `Pelops client`
If you are starting out using Java I *strongly* suggest using this client
https://github.com/Netflix/astyanax/ see the documentation here
https://github.com/Netflix/astyanax/wiki
> My understanding was to create the cluster with all the `24 nodes` as I will
>
> Whenever there is a disk error during compaction of an SS table (e.g., bad
> block, out of disk space), that SStable’s files stick around forever after
>
Fixed in 1.1.1 https://issues.apache.org/jira/browse/CASSANDRA-2261
> We are using 1.1.5, besides that I have tried to run cleanup, with no
> Is it guaranteed that the rows are grouped by the value of the partition key?
> That is, is it guaranteed that I'll get
Your primary key (k1, k2) is considered in two parts: (partition_key,
grouping_columns). In your case the partition key is k1 and the grouping column
is k2. Columns are ordered by the grouping column within each partition.
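A minimal sketch of how to see this from the DataStax java-driver that comes up
elsewhere in this digest; the contact point and keyspace name are assumptions:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class GroupingCheck {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ks1"); // keyspace name is an assumption
        ResultSet rs = session.execute(
                "SELECT k1, k2, value FROM t WHERE k1 IN ('a', 'z')");
        // Rows sharing a k1 (the partition key) come back as one contiguous group,
        // ordered by k2 within the group; the order of the groups themselves
        // follows the partitioner's token order, not the order in the IN list.
        for (Row row : rs) {
            System.out.printf("%s %s %s%n",
                    row.getString("k1"), row.getString("k2"), row.getString("value"));
        }
        cluster.shutdown();
    }
}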
> Can you please elaborate on the specials of truncate?
I think ed was talking about this config setting in 1.2
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L484
> It works, only sometimes it silently fails (1 in 400 runs of the truncate,
> actually).
The data is left in p
For one project I will need to run cassandra on following dedicated servers:
Single CPU XEON, 4 cores, no hyper-threading, 8 GB RAM, 12 TB of locally
attached HDDs in some kind of RAID, visible as a single HDD.
I can do a cluster of 20-30 such servers, maybe even more.
The data will be huge, I am estimating 4-6 TB per server.
When created by the SSTableScanner, the dataStart passed in is the existing file
position, so the seek may not be necessary. But it is sane to do it anyway, and
the seek() call may not result in disk reads.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
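For illustration only, the same idea with a plain java.io.RandomAccessFile:
seek() only updates the file pointer, so the call can be guarded (or performed
cheaply) when the position already matches:

import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekGuard {
    // Skip the seek when the file is already positioned at dataStart; even when
    // it does run, seek() just moves the pointer and does not read from disk.
    static void positionAt(RandomAccessFile file, long dataStart) throws IOException {
        if (file.getFilePointer() != dataStart) {
            file.seek(dataStart);
        }
    }
}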
tables created without COMPACT STORAGE are still visible in cassandra-cli.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 11/04/2013, at 5:40 AM, Tyler Hobbs wrote:
>
> On Wed, Apr 10, 2013 at 11:09 AM, Vivek Mish
Hello,
I don't know what pelops is. I'm not sure why you want two clusters. I
would have two clusters if I want to have data stored on totally separate
servers for perhaps security reasons.
If you are going to have the servers in one location then you might as well
have one cluster. You'll have t
A node can only exist in one DC and one rack.
Use different keyspaces as suggested.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 12/04/2013, at 1:47 AM, Jabbar Azam wrote:
> Hello,
>
> I'm not an expert but I
Folks, any thoughts on this? I am still in the learning process, so any
guidance will be of great help.
*Raihan Jamal*
On Wed, Apr 10, 2013 at 10:39 PM, Raihan Jamal wrote:
> I have started working on a project in which I am using `Cassandra
> database`.
>
> Our production DBA's have setup
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.2.4.
Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:
http://cassandra.apache.org/
Hello,
Let us consider that we have a table t created as follows:
create table t(k1 varchar, k2 varchar, value varchar, primary key (k1, k2));
Its contents are (k1 k2 value):
a m x
a n y
z 0 9
z 1 8
and I perform a
select * from t where k1 in ('a', 'z');
Is it guaranteed that the rows are grouped by the value of the partition key?
Hi,
I have JNA (Cassandra only complains about an obsolete version: "Obsolete
version of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later".
I have the stock CentOS version, 3.2.4).
Usage of a separate CF for each test run is difficult to set up.
Can you please elaborate on the specials of truncate?
Thanks for the feedback! We will be going forward by implementing and
deploying the proposed model, and testing it out.
Cheers,
Coen
On Thu, Apr 11, 2013 at 12:21 PM, aaron morton wrote:
> Retrieving the latest 1000 tweets (of a given day) is trivial by
> requesting the streamTweets columnFamily.
If you do not have JNA, truncate has to fork an 'ln -s' command for the
snapshots. I think that makes it unpredictable. Truncate has its own
timeout value now (separate from the other timeouts). If possible I think
it is better to make each test use its own CF and avoid truncate entirely.
On T
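A minimal sketch of the "own CF per test" idea with the java-driver; the
keyspace, table layout and naming scheme are invented:

import java.util.UUID;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class PerTestTable {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test_ks"); // keyspace is a placeholder
        // Each run creates and drops its own uniquely named table, so no truncate
        // (and no snapshot fork) is ever needed.
        String table = "t_" + UUID.randomUUID().toString().replace("-", "_");
        session.execute("CREATE TABLE " + table + " (k varchar PRIMARY KEY, v varchar)");
        try {
            // ... run the test against `table` here ...
        } finally {
            session.execute("DROP TABLE " + table);
            cluster.shutdown();
        }
    }
}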
I have formulated the following theory regarding C* 1.2.2 which may be
relevant: Whenever there is a disk error during compaction of an SS table
(e.g., bad block, out of disk space), that SStable's files stick around forever
after, and do not subsequently get deleted by normal compaction (minor
Aaron,
It seems that we are in the same situation as Nury, we are storing a lot of
files of ~5MB in a CF.
This happens in a test cluster, with one node using cassandra 1.1.5, we
have commitlog in a different partition than the data directory. Normally
our tests use nearly 13 GB in data, but when
Bingo! Thanks to both of you. (the C* community rocks)
A few hours worth of work, and I've got a working REST-based photo
repository backed by C* using the CQL java driver. =)
rock on, thanks again,
-brian
On Thu, Apr 11, 2013 at 9:33 AM, Sylvain Lebresne wrote:
>
> I assume I'm doing someth
Hi,
I use C* 1.2.3 and CQL3.
I integrated Cassandra into our testing environment. In order to make the
tests repeatable, I truncate all the tables that need to be empty before the
test run, via an ssh session to the host Cassandra runs on, running cqlsh and
issuing the truncate.
It works, only sometimes it silently fails (1 in 400 runs of the truncate, actually).
Hello,
I read the source code of SSTableIdentityIterator in v1.0.9, and I thought
the following code is not necessary; did I miss anything?
RandomAccessReader file = (RandomAccessReader) input;
file.seek(this.dataStart);
here, the value of dataStart is assigned
Hello,
I'm not an expert, but I don't think you can do what you want. The way to
separate data for applications on the same cluster is to use different
tables for different applications, or to use multiple keyspaces, one keyspace
per application. The replication factor you specify for each keyspace
specifies how many replicas of that keyspace's data are kept.
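A minimal sketch of that layout in CQL3 via the java-driver; the keyspace
names, datacenter names and replication factors are examples only:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class PerAppKeyspaces {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // One keyspace per application; each keyspace carries its own replication
        // settings, so different apps can replicate to different DCs and factors.
        session.execute("CREATE KEYSPACE app_one WITH replication = " +
                "{'class': 'NetworkTopologyStrategy', 'DC1': 3}");
        session.execute("CREATE KEYSPACE app_two WITH replication = " +
                "{'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 3}");
        cluster.shutdown();
    }
}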
> I assume I'm doing something wrong in the select. Am I incorrectly using
> the ResultSet?
>
You're incorrectly using the returned ByteBuffer. But you should not feel
bad, that API kinda sucks.
The short version is that .array() returns the backing array of the
ByteBuffer. But there is no guarantee that the value starts at offset 0 of
that array, or that the array contains only the value's bytes.
That's right, there is some padding there...
So, instead of calling array(), you have to do something like:
ByteBuffer data = resultSet.one().getBytes("data");
int length = data.remaining();
byte[] blobBytes = new byte[length];
data.get(blobBytes, 0, length);
Gabi
On 4/11/13 4:09 PM, Brian O'Nei
Sylvain,
Interesting, when I look at the actual bytes returned, I see the byte array
is prefixed with the keyspace and table name.
I assume I'm doing something wrong in the select. Am I incorrectly using
the ResultSet?
-brian
On Thu, Apr 11, 2013 at 9:09 AM, Brian O'Neill wrote:
> Yep, it wor
Yep, it worked like a charm. (PreparedStatement avoided the hex conversion)
But now, I'm seeing a few extra bytes come back in the select….
(I'll keep digging, but maybe you have some insight?)
I see this:
ERROR [2013-04-11 13:05:03,461] com.skookle.dao.RepositoryDao:
repository.add() byte.lengt
Hi,
I would like to create big cluster for many applications.
Within this cluster I would like to separate the data for each application,
which can be easily done via different virtual datacenters and the correct
replication strategy.
What I would like to know is whether I can specify multiple datacenters for one node.
> Hopefully, the prepared statement doesn't do the conversion.
>
It does not.
> (I'm not sure if it is a limitation of the CQL protocol itself)
>
> thanks again,
> -brian
>
>
>
> ---
> Brian O'Neill
> Lead Architect, Software Development
> Health Market Science
> The Science of Better Results
>
Cool. That might be it. I'll take a look at PreparedStatement.
For query building, I took a look under the covers, and even when I was
passing in a ByteBuffer, it runs through the following code in the
java-driver:
Utils.java:
if (value instanceof ByteBuffer) {
sb.append("0x");
s
I'm not using the query builder but the PreparedStatement.
Here is the sample code: https://gist.github.com/devsprint/5363023
Gabi
On 4/11/13 3:27 PM, Brian O'Neill wrote:
Great!
Thanks Gabriel. Do you have an example? (are you using QueryBuilder?)
I couldn't find the part of the API that allowe
Great!
Thanks Gabriel. Do you have an example? (are you using QueryBuilder?)
I couldn't find the part of the API that allowed you to pass in the byte
array.
-brian
---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
2700 Horizon Drive King o
Hi Brian,
I'm using the blobs to store images in cassandra(1.2.3) using the
java-driver version 1.0.0-beta1.
There is no need to convert a byte array into hex.
Br,
Gabi
On 4/11/13 3:21 PM, Brian O'Neill wrote:
I started playing around with the CQL driver.
Has anyone used blobs with it yet?
I started playing around with the CQL driver.
Has anyone used blobs with it yet?
Are you forced to convert a byte[] to hex?
(e.g. I have a photo that I want to store in C* using the java-driver API)
-brian
--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mo
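A minimal round-trip sketch pulling the pieces of this thread together with the
java-driver; the keyspace, the photos table and the column names are invented
for illustration:

import java.nio.ByteBuffer;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class BlobRoundTrip {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("skookle"); // keyspace is a placeholder

        // Assumed schema: CREATE TABLE photos (id varchar PRIMARY KEY, data blob)
        byte[] photo = new byte[] {1, 2, 3};

        // Write: binding a ByteBuffer to a prepared statement sends the raw bytes,
        // with no hex string conversion.
        PreparedStatement insert = session.prepare(
                "INSERT INTO photos (id, data) VALUES (?, ?)");
        session.execute(insert.bind("photo-1", ByteBuffer.wrap(photo)));

        // Read: getBytes() returns a ByteBuffer; copy out only the remaining()
        // bytes instead of calling array(), whose backing array may be padded.
        ResultSet rs = session.execute("SELECT data FROM photos WHERE id = 'photo-1'");
        ByteBuffer data = rs.one().getBytes("data");
        byte[] bytes = new byte[data.remaining()];
        data.get(bytes);

        cluster.shutdown();
    }
}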
Fixed in 1.1.11 due out soon
https://issues.apache.org/jira/browse/CASSANDRA-5284
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 10/04/2013, at 7:35 PM, Winsdom Chen wrote:
> Hi,
> I've lot of assertion error i
cqlsh in cassandra 1.2 defaults to cql 3.
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 10/04/2013, at 6:55 PM, Gurminder Gill wrote:
> Ah ha. So, the client defaults to CQL 2. Any way of changing that? I tried
> libthrif
> I've already tried to set internode_compression: none in my yaml files.
What version are you on?
Have you set internode_compression to none and restarted? Can you double-check?
The code stack shows cassandra deciding that the connection should be
compressed.
Cheers
-
Aaron
> Is it true the coordinator node treats them as __independent__
> communications/requests to replicas (even if in that case, the replicas are
> the same for every request) ?
A row mutation is a request to store columns in one or more CF's using one row
key. It is treated as indivisible by the c
> b) the "batch_mutate" advantages are better, for the communication
> "client<=>coordinator node" __and__ for the communications "coordinator
> node<=>replicas".
Yes. A single row mutation can write to many CFs.
> Is there any experience out there about such data modeling (option_a vs
> optio
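For comparison, the CQL3 way to group several writes into a single request is a
batch; a minimal sketch with the java-driver, where the keyspace, table and
values are invented:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class BatchWrite {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks"); // keyspace is a placeholder
        // Several columns for the same row key sent as one request; writes that
        // share a partition key are applied together on the replicas.
        session.execute(
                "BEGIN BATCH " +
                "  INSERT INTO user_events (user, ts, kind) VALUES ('u1', 1, 'login'); " +
                "  INSERT INTO user_events (user, ts, kind) VALUES ('u1', 2, 'click'); " +
                "APPLY BATCH");
        cluster.shutdown();
    }
}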
> Retrieving the latest 1000 tweets (of a given day) is trivial by requesting
> the streamTweets columnFamily.
If you normally want to get the most recent items use a reverse comparator on
the column name
see http://thelastpickle.com/2011/10/03/Reverse-Comparators/
> Getting the latest tweets
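In CQL3 terms the same effect comes from a reversed clustering order; a minimal
sketch with the java-driver, where the keyspace, table and column names are
invented:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class LatestTweets {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("tweets_ks"); // keyspace is a placeholder
        // Reverse clustering order keeps the newest column first, so "latest N"
        // is a simple limited read from the start of the day's row.
        session.execute(
                "CREATE TABLE stream_tweets (" +
                "  day text, tweet_id timeuuid, body text," +
                "  PRIMARY KEY (day, tweet_id)" +
                ") WITH CLUSTERING ORDER BY (tweet_id DESC)");
        ResultSet latest = session.execute(
                "SELECT tweet_id, body FROM stream_tweets " +
                "WHERE day = '2013-04-11' LIMIT 1000");
        int n = 0;
        for (Row ignored : latest) {
            n++;
        }
        System.out.println(n + " tweets in the latest slice");
        cluster.shutdown();
    }
}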
To reduce the possibilities, have you changed a super CF to a standard CF recently?
Can you isolate this to a specific CF?
Have you changed the comparators / schema recently?
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle