Hi,
I am using the default LZ4Compressor; how can I further compress the data?
Does switching to SnappyCompressor or DeflateCompressor help? Are there
any comparison metrics available?
Has anyone implemented any new compression types that can reduce the data
size? I am using Cassandra 3.11.x.
Thanks
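For reference, the compressor is a table-level option, so trying DeflateCompressor
(usually a better ratio than LZ4, at a higher CPU cost) is a one-line change. A
minimal sketch, assuming Cassandra 3.11 option names and a hypothetical
keyspace/table:

ALTER TABLE myks.mytable
    WITH compression = {'class': 'DeflateCompressor',
                        'chunk_length_in_kb': 64};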
1. How do I shard the partition key so that partitions end up on
different nodes?
You could, for example, create a table with a bucket column added to the
partition key:
Table distinct(
    hourNumber int,
    bucket int, // could be a 5-minute bucket, for example
    key text,
    distinctValue long,
    primary
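The sketch above is cut off, so here is a minimal runnable version; the key
layout and table name are my assumptions, and bigint stands in for long (CQL's
64-bit integer type):

CREATE TABLE distinct_by_hour (
    hourNumber int,
    bucket int,            -- e.g. a 5-minute bucket
    key text,
    distinctValue bigint,
    PRIMARY KEY ((hourNumber, bucket), key)
);

The double parentheses make (hourNumber, bucket) a composite partition key, so
each hour is spread across as many partitions (and therefore nodes) as there
are bucket values.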
Two other questions:
1. How do I shard the partition key so that partitions end up on
different nodes?
2. If I set gc_grace_seconds to 0, would the row be replaced in the memtable
(so repeated rows are not saved in SSTables), or would that only happen at the
first compaction?
Can I set gc_grace_seconds to 0 in this case? Reappearing deleted data
has no impact on my business logic; I'm only ever creating a new row or
replacing the exact same row.
On Wed, 13 Jun 2018 03:41:51 +0430 Elliott Sims wrote:
If this is data that expires after a certain amount of time, you probably
want to look into using TWCS and TTLs to minimize the number of tombstones.
Decreasing gc_grace_seconds and then compacting will reduce the number of
tombstones, but at the cost of potentially resurrecting deleted data if the
ta
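Roughly what that TWCS + TTL suggestion could look like; every name and number
below is a placeholder, not something from the thread:

CREATE TABLE events_by_hour (
    hourNumber int,
    key text,
    value bigint,
    PRIMARY KEY (hourNumber, key)
) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'HOURS',
                     'compaction_window_size': 1}
  AND default_time_to_live = 604800;  -- rows expire after 7 days

Once every row in a time window has expired, TWCS can drop the whole SSTable
instead of leaving individual tombstones behind.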
Hi,
I need to save a distinct value per key for each hour. The problem with
saving everything and computing distincts in memory is that there
is too much repeated data.
Table schema:
Table distinct(
    hourNumber int,
    key text,
    distinctValue long,
    primary key (hourNumber)
)
I want t
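One point worth illustrating for the gc_grace_seconds question above: writes
to the same primary key are plain overwrites, not deletes. A hypothetical
example against the schema above (the table name is quoted because distinct is
a reserved word in CQL, and the values are invented):

INSERT INTO "distinct" (hourNumber, key, distinctValue)
VALUES (422072, 'user-42', 7);
-- running the identical INSERT again overwrites the row in place (upsert
-- semantics); no tombstone is written, so gc_grace_seconds is not involved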
Hi.
I am using Cassandra 1.2.15 and OpsCenter 5.0.2 (non-enterprise). We are
planning to upgrade in the upcoming week.
I found some old OpsCenter tables still lurking around.
/mnt/data/OpsCenter# du -h
68K   ./events_timeline
18M   ./rollups86400
172K  ./pdps
269M  ./rollups7200
68K   ./e
I think I get the basics of what you want to achieve. Side note: the sample
insert seems to have a typo in the transaction time.
For the first query, I would store the data using weatherstation_id as the
key. The create table statement might look like this:
CREATE TABLE weatherstation (
    weathers
Thanks for the response, Peter. I used the temperature table because it's the
most common example of CQL time series, and I thought I would reuse it. From
some of the responses, it looks like I was wrong.
event_time is the time the event happened. So yes, it is valid time. I was
trying to see if I can get
I've built several different bi-temporal databases over the years for a
variety of applications, so I have to ask: why are you modeling it this
way?
Having a temperatures table doesn't make sense to me. Normally a
bi-temporal database has transaction time and valid time. The transaction
time is th
I had forgotten, but there is a new tuple notation to iterate over more
than one clustering column in C* 2.0.6:
https://issues.apache.org/jira/browse/CASSANDRA-4851
For example,
SELECT ... WHERE (c1, c2) > (1, 0)
There's an example in the CQL spec:
https://cassandra.apache.org/doc/cql3/CQL.html
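A self-contained illustration (all names hypothetical) of what that looks like
with two clustering columns; the tuple comparison selects one contiguous slice
of the partition:

CREATE TABLE timeline (
    pk int,
    c1 int,
    c2 int,
    value text,
    PRIMARY KEY (pk, c1, c2)
);

-- rows are ordered by (c1, c2) within the partition; the tuple comparison
-- starts the slice just past (1, 0) and reads forward contiguously
SELECT * FROM timeline WHERE pk = 1 AND (c1, c2) > (1, 0);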
The simple, easy way to look at this is that you can use a range when the
data will be contiguous. Only by allowing just the last clustering column to
use a range can Cassandra be assured that the rows selected by the range
will be contiguous (a "slice"). The point is that Cassandra is designed for
As you point out, there's not really a node-based problem with your
query from a performance point of view. This is a limitation of CQL, in
that CQL wants to slice one section of a partition's rows (no matter how
big the section is). In your case, you are asking to slice multiple
sections of a p
Perhaps you should learn more about Cassandra before you ask such questions.
It's easy if you just look at the readily accessible docs.
ml
On Sat, Feb 14, 2015 at 6:05 PM, Raj N wrote:
> I don't think that solves my problem. The question really is why we can't
> use ranges for both time colum
I don't think that solves my problem. The question really is why we can't
use ranges for both time columns when they are part of the primary key.
They are in one row, after all. Is this just a CQL limitation?
-Raj
On Sat, Feb 14, 2015 at 3:35 AM, DuyHai Doan wrote:
> "I am trying to get the state
"I am trying to get the state as of a particular transaction_time"
--> In that case you should probably define your primary key in another
order for clustering columns
PRIMARY KEY (weatherstation_id, transaction_time, event_time)
Then, select * from temperatures where weatherstation_id = 'foo' an
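The query above is cut off; presumably it continued along these lines (the
timestamp literal is invented for illustration):

SELECT * FROM temperatures
WHERE weatherstation_id = 'foo'
  AND transaction_time <= '2015-02-14 00:00:00+0000';

With transaction_time as the first clustering column, this range scan reads a
single contiguous slice of the partition.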
Has anyone designed a bi-temporal table in Cassandra? It doesn't look like I
can do this using CQL for now. Taking the time series example from well-known
modeling tutorials in Cassandra:
CREATE TABLE temperatures (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weather
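The statement is cut off above; the well-known tutorial table it starts from
ends like this (my reconstruction), and per the replies in this thread the
bi-temporal variant would add a transaction_time timestamp column as a second
clustering column:

CREATE TABLE temperatures (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
);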
Hi everyone,
I am a bit stuck with my data model on Cassandra. What I am trying to do is
to be able to retrieve rows in groups, something similar to SQL's GROUP BY
but that works only on one attribute.
I am keeping data grouped together in a different CF (e.g. GROUP BY x has
its own CF, groupby_x),
>
>
> If the data is read from a slice of a partition that has been added to over
> time, there will be a part of that row in almost every sstable. That would
> mean all of them (multiple disk seeks, depending on clustering order, per
> sstable) would have to be read from in order to service the query.
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
On 07/05/2014, at 10:55 AM, Kevin Burton wrote:
> I'm looking at storing log data in Cassandra…
>
> Every record is a unique timestamp for the key, and then the log line for the
> value.
>
>
> …compaction options of your
> table as follows:
>
> compaction={'min_threshold': '0', 'class': 'SizeTieredCompactionStrategy',
> 'max_threshold': '0'}
>
> Regards
>
> Duy Hai DOAN
>
>
> On Wed, May 7, 201
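Spelled out as a complete statement, the quoted suggestion would be something
like this (the table name is hypothetical):

ALTER TABLE logs
    WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                       'min_threshold': '0',
                       'max_threshold': '0'};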
On Tue, May 6, 2014 at 7:55 PM, Kevin Burton wrote:
> I'm looking at storing log data in Cassandra…
>
> Every record is a unique timestamp for the key, and then the log line for
> the value.
>
> I think it would be best to just disable compactions.
>
> - th
…wrote:
> I'm looking at storing log data in Cassandra…
>
> Every record is a unique timestamp for the key, and then the log line for the
> value.
>
> I think it would be best to just disable compactions.
>
> - there will never be any deletes.
>
> - all the data wi
May 7, 2014 at 2:55 AM, Kevin Burton wrote:
> I'm looking at storing log data in Cassandra…
>
> Every record is a unique timestamp for the key, and then the log line for
> the value.
>
> I think it would be best to just disable compactions.
>
> - there will neve
I'm looking at storing log data in Cassandra…
Every record is a unique timestamp for the key, and then the log line for
the value.
I think it would be best to just disable compactions.
- there will never be any deletes.
- all the data will be accessed in time range (probably partit
Hi All,
I am facing a strange problem on my production box. I am using hector-client
1.2-5 to execute queries on Cassandra 1.2.11.
I am getting the logs below in my application.
[2014-02-18 20:48:26.340][KafkaConsumer-0][INFO][CommonCassandraService:246]*
MutationResult for saving entity UpdatedUploadI
> The problem occurs during the day, when updates can be sent that possibly
> contain older data than the nightly batch update.
If you have an application-level sequence for updates (I used that term to
avoid saying timestamp), you could use it as the Cassandra timestamp. As long as
you know
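In CQL, that idea maps onto the USING TIMESTAMP clause. A sketch with invented
names, where the application sequence number doubles as the write timestamp:

UPDATE accounts USING TIMESTAMP 1002
SET balance = 42
WHERE account_id = 'acct-1';
-- a late-arriving update carrying sequence 1001 would lose to this write
-- during conflict resolution, because 1001 < 1002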
That is, the data consists of an account id with a timestamp column that
indicates when the account was updated. This is not to be confused with the row
insertion/update timestamp maintained by Cassandra for conflict resolution
within the Cassandra nodes. Furthermore, the account has about 200 columns
an
On Fri, Nov 1, 2013 at 10:29 PM, Krishna Chaitanya wrote:
> I am a newbie to the Cassandra world. I am currently using
> Cassandra 2.0.0 with Thrift 0.8.0 for storing netflow packets using the
> libQtCassandra library. ... Is this a known issue? It did not occur
> when we were using Ca
Hello,
I am a newbie to the Cassandra world. I am currently using
Cassandra 2.0.0 with Thrift 0.8.0 for storing netflow packets using the
libQtCassandra library. Currently, I am generating about 1000 netflows/sec
and storing them into the database. The program is crashing with an
exceptio
Yes, but what is SolrCloud for then? That already provides clustering support,
so what's the need for Cassandra?
On Tue, Oct 1, 2013 at 2:06 AM, Sávio Teles wrote:
>
> Solr's index sitting on a single machine, even if that single machine can
>> vertically scale, is a single point of failure.
>>
>
> Solr's index sitting on a single machine, even if that single machine can
> vertically scale, is a single point of failure.
>
And what about SolrCloud?
2013/9/30 Ken Hancock
> Yes.
>
>
> On Mon, Sep 30, 2013 at 1:57 PM, Andrey Ilinykh wrote:
>
>>
>> Also, be aware that while Cassandra has knobs
Yes.
On Mon, Sep 30, 2013 at 1:57 PM, Andrey Ilinykh wrote:
>
> Also, be aware that while Cassandra has knobs to allow you to get
>> consistent read results (CL=QUORUM), DSE Search does not. If a node drops
>> messages for whatever reason (outage, mutation, etc.), its Solr indexes will
>> be inc
> Also, be aware that while Cassandra has knobs to allow you to get
> consistent read results (CL=QUORUM), DSE Search does not. If a node drops
> messages for whatever reason (outage, mutation, etc.), its Solr indexes will
> be inconsistent with other nodes in its replication group.
>
> Will repair
To clarify, Solr indexes are not distributed in the same way that Cassandra
data is stored.
With Cassandra, each node receives a fraction of the keyspace (based on
your replication factor and token assignment). With DSE Search, writes to
Cassandra are hooked and each node independently indexes it
On Mon, Sep 30, 2013 at 8:50 AM, Ertio Lew wrote:
> Solr's data is stored on the file system as a set of index files
> [http://stackoverflow.com/a/7685579/530153]. Then why do we need anything
> like Solandra or DataStax Enterprise Search? Isn't Solr a complete solution
> in itself? What do we ne
The main reason is scalability and performance.
If your Solr indexes fit fine on a single system and don't need to scale
out, Cassandra/HDFS isn't necessary.
On Mon, Sep 30, 2013 at 11:50 AM, Ertio Lew wrote:
> Solr's data is stored on the file system as a set of index files
> [http://stacko
Solr's data is stored on the file system as a set of index files
[http://stackoverflow.com/a/7685579/530153]. Then why do we need anything
like Solandra or DataStax Enterprise Search? Isn't Solr a complete solution
in itself? Why do we need to integrate it with Cassandra?
I need to store binary byte data in a Cassandra column family in all my
columns. Each column will have its own binary byte data. Below is the code
where I get the binary byte data. My row key is going to be a String,
but all my columns have to store binary blob data.
GenericDatumWriter
On 04/18/2013 12:06 AM, aaron morton wrote:
What version are you using?
And what JDBC driver?
Sounds like the driver is not converting the value to bytes for you.
I guess the problem may be because of undefined
key_validation_class, default_validation_class, comparator, etc.
If you are using
What version are you using?
And what JDBC driver?
Sounds like the driver is not converting the value to bytes for you.
> I guess the problem may be because of undefined
> key_validation_class, default_validation_class, comparator, etc.
If you are using CQL, these are not relevant.
Cheers
--
Hi,
When I try to insert data into a table using Java with JDBC, I
get the error:
InvalidRequestException(why:cannot parse 'Jo' as hex bytes)
My insert query is:
insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10);
This insert query runs successfully fro
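If the value column really is a blob, the literal has to be hex bytes rather
than a plain string. 'Jo' is 0x4A6F in ASCII, so (assuming CQL3 blob-literal
syntax) a working form would be:

insert into temp (id, name, value, url_id) VALUES (108, 'Aa', 0x4A6F, 10);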
Thanks for sharing this. We are also using Cassandra + Storm + queue
messaging (Kestrel for now) and are always glad to learn.
Alain
2012/11/9 Brian O'Neill
> For those looking to index data in Cassandra with Elastic Search, here
> is what we decided to do:
>
> http://brianon
For those looking to index data in Cassandra with Elastic Search, here
is what we decided to do:
http://brianoneill.blogspot.com/2012/11/big-data-quadfecta-cassandra-storm.html
-brian
--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog
Thnx a lot :) :)
On Sat, Apr 14, 2012 at 11:59 PM, aaron morton wrote:
> but using the CQL insert query is not working because I have fields in my
> table which have null values for the columns, and Cassandra would not take
> null values.
>
> You do not need to insert the null values. How they are h
> but using the CQL insert query is not working because I have fields in my
> table which have null values for the columns, and Cassandra would not take null
> values.
You do not need to insert the null values. How they are handled depends on the
.NET client you are using. If you really want to i
I'm able to connect to Cassandra and fetch rows from the Cassandra database.
Now I want to insert data from .NET into Cassandra,
but the CQL insert query is not working because I have fields in my
table which have null values for the columns, and Cassandra would not take
null values.
So now
Thanks :)
But finally I used Hector and it works fine :D
Date: Wed, 11 Apr 2012 17:19:15 +0200
From: berna...@gmail.com
To: user@cassandra.apache.org
Subject: Re: Inserting data in Cassandra
On 04/11/12 11:42, Aliou SOW wrote:
And I
On 04/11/12 11:42, Aliou SOW wrote:
And I used the tool json2sstable, but that does not work; I always
get an error:
java.lang.RuntimeException: Can't write Super columns to the Standard
Column Family.
So I have two questions:
1) What did I do wrong? Must I define the complete structure of m
Hello,
Any help or ideas?
Thanks
From: aliouji...@hotmail.com
To: user@cassandra.apache.org
Subject: Inserting data in Cassandra
Date: Wed, 11 Apr 2012 09:42:52 +
Hello all,
We would like to adopt a Cassandra solution for storing our biological data,
which are essentially microarray
Hello,
We currently have data stored in a bi-temporal fashion using an RDBMS.
I've been trying to find recommendations on how to model bi-temporal data using
a NoSQL database such as Cassandra, but without much success. Has anybody on the
list had success doing such a thing in Cassandra
I prototyped a very small Cassandra data browser on top of Wicket and
Hector. Try it :) http://goo.gl/lozFo
On 23 September 2011 08:41, mcasandra wrote:
> Are there any tools that let you scroll over data in Cassandra in HTML or a
> UI?
>
> We are planning to encrypt data befor
Hello,
I'm trying to save geo-data in Cassandra.
According to SimpleGeo, they did that using a nested tree:
http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
I wonder if someone has already implemented something like that, and how they
accomplished it without transaction sup
Short answer: yes, this is normal.
Longer answer: this was discussed at length on this list a few days
ago; check the archives.
On Wed, Jul 14, 2010 at 10:55 PM, Hendro Kaskus wrote:
> Hi everyone,
>
> I'm a newbie to Cassandra :D. I tried inserting data from MySQL into Cassandra.
> The data dump from My
It could be that your Cassandra nodes haven't fully compacted yet.
On Thu, Jul 15, 2010 at 5:55 AM, Hendro Kaskus wrote:
> Hi everyone,
>
> I'm a newbie to Cassandra :D. I tried inserting data from MySQL into Cassandra.
> The data dump from MySQL is about 11 MB (64,716 records), but when I insert it into
> Cas
Hi everyone,
I'm a newbie to Cassandra :D. I tried inserting data from MySQL into Cassandra.
The data dump from MySQL is about 11 MB (64,716 records), but when I insert it into
Cassandra, I think the data becomes bigger than in MySQL. Is that true?
Thanks
On Saturday, April 17, 2010, philip andrew wrote:
> Hi,
> Let's say I wanted to store 2-dimensional data in the database; each object
> has an X and Y location in a very large space.
> I want to query Cassandra for all objects within a rectangle.
>
You should look into Geohash (http://en.m.wikiped
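A sketch of how the Geohash idea can be laid out in CQL (everything below is
hypothetical): a short prefix becomes the partition key, and a rectangle query
turns into a handful of prefix lookups covering the rectangle.

CREATE TABLE points (
    geohash_prefix text,   -- e.g. the first 6 characters of the geohash
    geohash text,          -- full-precision geohash
    object_id uuid,
    x double,
    y double,
    PRIMARY KEY (geohash_prefix, geohash, object_id)
);

-- one cell of the rectangle cover; repeat for each covering prefix
SELECT * FROM points WHERE geohash_prefix = 'u4pruy';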
Hi,
Let's say I wanted to store 2-dimensional data in the database; each object
has an X and Y location in a very large space.
I want to query Cassandra for all objects within a rectangle.
My understanding is that my objects can only be indexed by one key, one key
for each single object in my tabl