data model for TWCS+TTL

2020-06-04 Thread Arvinder Dhillon
Hi everyone, in our use case we need to insert 200 million rows per day. By default we need to retain data for 10 days unless a certain condition is matched from the client within the same day (in that case we need to update ONE column and set the TTL to 1 day). In 98% of cases we will find that match and 2%

Re: Re: Data model storage optimization

2018-07-30 Thread James Shaw
Considering: row size large or not; updated a lot or not (an update is actually an insert); read heavy or not; overall read performance. If the row size is large, you may consider a table user_detail and add a column id in all tables. On the application side, merge/join by id. But you pay a read price: a 2nd query to user_de

Fwd: Re: Data model storage optimization

2018-07-29 Thread onmstester onmstester
How many rows on average per partition? Around 10K. Let me get this straight: you are bifurcating your partitions on either email or username, essentially potentially doubling the data, because you don’t have a way to manage a central system of record of users? We are just analyzing output log

Re: Data model storage optimization

2018-07-29 Thread Rahul Singh
a storage optimization problem - it’s a data architecture problem. Rahul On Jul 28, 2018, 3:11 AM -0400, onmstester onmstester , wrote: > The current data model described as table name: > ((partition_key),cluster_key),other_column1,other_column2,... > > user_by_name: ((time_bucket, u

Data model storage optimization

2018-07-28 Thread onmstester onmstester
The current data model, described as table name: ((partition_key),cluster_key),other_column1,other_column2,... user_by_name: ((time_bucket, username)),ts,request,email user_by_mail: ((time_bucket, email)),ts,request,username The reason that both keys (username, email) are repeated in all tables is

Data Model for One To Many - Itemcontainer - Items

2018-06-05 Thread malte
hi, i have two CFs "ItemContainer" and "Items". I used to have a secondary index in "Items" referring to the "Itemcontainer". Something like: CREATE table items (key uuid primary key, container uuid, slot int CREATE INDEX items_container ON items(container) i change the "container" cel

Re: Cassandra data model too many table

2018-02-18 Thread Jeff Jirsa
row as input at a rate of 10K per second. Each > row has 20 columns. Some queries should be answered on these inputs. Because > most of the queries need a different where, group by or order by, the final data > model ended up like this: > primary key for table of query1 : ((column1,

Re: Cassandra data model too many table

2018-02-18 Thread Rahul Singh
, 2018, 6:38 AM -0500, onmstester onmstester , wrote: > I have a single structured row as input at a rate of 10K per second. Each > row has 20 columns. Some queries should be answered on these inputs. Because > most of the queries need a different where, group by or order by, the final data

Cassandra data model too many table

2018-02-18 Thread onmstester onmstester
I have a single structured row as input at a rate of 10K per second. Each row has 20 columns. Some queries should be answered on these inputs. Because most of the queries need a different where, group by or order by, the final data model ended up like this: primary key for table of query1

Re: Data Model Suggestion Required

2017-07-11 Thread Siddharth Prakash Singh
Thanks Jeff for the suggestions. On Mon, Jul 10, 2017 at 9:50 PM Jeff Jirsa wrote: > > > On 2017-07-10 07:13 (-0700), Siddharth Prakash Singh > wrote: > > I am planning to build a user activity timeline. Users on our system > > generate different kinds of activity. For example - Search some product

Re: Data Model Suggestion Required

2017-07-10 Thread Jeff Jirsa
On 2017-07-10 07:13 (-0700), Siddharth Prakash Singh wrote: > I am planning to build a user activity timeline. Users on our system > generate different kinds of activity. For example - Search some product, > Calling our sales team, Marking favourite etc. > Now I would like to generate a timeline

Data Model Suggestion Required

2017-07-10 Thread Siddharth Prakash Singh
I am planning to build a user activity timeline. Users on our system generate different kinds of activity. For example - Search some product, Calling our sales team, Marking favourite etc. Now I would like to generate a timeline based on these activities. Timeline could be for all events, filtered on

Re: Cassandra data model right definition

2017-03-28 Thread Oskar Kjellin
It's not that easy as I recall this email thread https://groups.google.com/forum/m/#!topic/nosql-databases/ZLdgwCT_PNU /Oskar > On 30 Sep 2016, at 18:40, Carlos Alonso wrote: > > Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra > > Carlos Alonso | Software Engineer

Re: Time series data model and tombstones

2017-02-08 Thread DuyHai Doan
Thanks for the update. Good to know that TWCS gives you more stability. On Wed, Feb 8, 2017 at 6:20 PM, John Sanda wrote: > I wanted to provide a quick update. I was able to patch one of the > environments that is hitting the tombstone problem. It has been running > TWCS for five days now, and thi

Re: Time series data model and tombstones

2017-02-08 Thread John Sanda
I wanted to provide a quick update. I was able to patch one of the environments that is hitting the tombstone problem. It has been running TWCS for five days now, and things are stable so far. I also had a patch to the application code to implement date partitioning ready to go, but I wanted to see

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
In theory, you're right and Cassandra should possibly skip reading cells having time < 50. But it's all theory; in practice Cassandra reads chunks of xxx kilobytes worth of data (don't remember the exact value of xxx, maybe 64k or far less) so you may end up reading tombstones. On Sun, Jan 29, 2017

Re: Time series data model and tombstones

2017-01-29 Thread Jonathan Haddad
Check out our post on how to use TWCS before 3.0. http://thelastpickle.com/blog/2017/01/10/twcs-part2.html On Sun, Jan 29, 2017 at 11:20 AM John Sanda wrote: > It was with STCS. It was on a 2.x version before TWCS was available. > > On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan wrote: > > Did y

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
Thanks for the clarification. Let's say I have a partition in an SSTable where the values of time range from 100 to 10 and everything < 50 is expired. If I do a query with time < 100 and time >= 50, are there scenarios in which Cassandra will have to read cells where time < 50? In particular I am w

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
"Should the data be sorted by my time column regardless of the compaction strategy" --> It does What I mean is that an old "chunk" of expired data in SSTABLE-12 may be compacted together with a new chunk of SSTABLE-2 containing fresh data so in the new resulting SSTable will contain tombstones AND

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
> > Since STCS does not sort data based on timestamp, your wide partition may > span over multiple SSTables and inside each SSTable, old data (+ > tombstones) may sit on the same partition as newer data. Should the data be sorted by my time column regardless of the compaction strategy? I didn't t

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Ok so give it a try with TWCS. Since STCS does not sort data based on timestamp, your wide partition may span over multiple SSTables and inside each SSTable, old data (+ tombstones) may sit on the same partition as newer data. When reading by slice, even if you request for fresh data, Cassandra ha

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
It was with STCS. It was on a 2.x version before TWCS was available. On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan wrote: > Did you get this overwhelming tombstone behavior with STCS or with TWCS? > > If you're using DTCS, beware of its weird behavior and tricky > configuration. > > On Sun, Jan

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Did you get this overwhelming tombstone behavior with STCS or with TWCS? If you're using DTCS, beware of its weird behavior and tricky configuration. On Sun, Jan 29, 2017 at 3:52 PM, John Sanda wrote: > Your partitioning key is text. If you have multiple entries per id you are >> likely hitti

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
> > Your partitioning key is text. If you have multiple entries per id you are > likely hitting older cells that have expired. Descending only affects how > the data is stored on disk, if you have to read the whole partition to find > whichever time you are querying for you could potentially hit to

Re: Time series data model and tombstones

2017-01-29 Thread kurt greaves
Your partitioning key is text. If you have multiple entries per id you are likely hitting older cells that have expired. Descending only affects how the data is stored on disk, if you have to read the whole partition to find whichever time you are querying for you could potentially hit tombstones i

Re: Time series data model and tombstones

2017-01-28 Thread Benjamin Roth
> STCS. Your TTL'ed data is becoming a tombstone. TWCS is a better strategy > for this type of workload. > On Sat, Jan 28, 2017 at 8:30 AM John Sanda wrote: > >> I have a time series data model that is basically: >> >> CREATE TABLE metrics ( >> id text, >

Re: Time series data model and tombstones

2017-01-28 Thread John Sanda
ll be switching over to TWCS shortly. The suggestions raised are excellent ones, but I tend to think of them as optimizations that might not address my issue, which I think may be 1) a problem with my data model, 2) a problem with the queries used or 3) some misunderstanding of how Cassandra performs range scan

Re: Time series data model and tombstones

2017-01-28 Thread Jonathan Haddad
Since you didn't specify a compaction strategy I'm guessing you're using STCS. Your TTL'ed data is becoming a tombstone. TWCS is a better strategy for this type of workload. On Sat, Jan 28, 2017 at 8:30 AM John Sanda wrote: > I have a time series data model that is basi

Re: Time series data model and tombstones

2017-01-28 Thread DuyHai Doan
at 5:30 PM, John Sanda wrote: > I have a time series data model that is basically: > > CREATE TABLE metrics ( > id text, > time timeuuid, > value double, > PRIMARY KEY (id, time) > ) WITH CLUSTERING ORDER BY (time DESC); > > I do append-only writes,

Time series data model and tombstones

2017-01-28 Thread John Sanda
I have a time series data model that is basically: CREATE TABLE metrics ( id text, time timeuuid, value double, PRIMARY KEY (id, time) ) WITH CLUSTERING ORDER BY (time DESC); I do append-only writes, no deletes, and use a TTL of seven days. Data points are written every seconds
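Replies in this thread recommend TimeWindowCompactionStrategy (TWCS) for this append-only, TTL'ed workload. A minimal sketch of what that could look like for the table above, assuming a Cassandra version that ships TWCS; the one-day window is an illustrative assumption, not a value given in the thread:

```cql
-- Sketch only: the metrics table from the post, with TWCS and a table-level TTL.
-- The window unit and size are assumed values chosen for illustration.
CREATE TABLE metrics (
    id text,
    time timeuuid,
    value double,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time DESC)
  AND compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': 1
  }
  AND default_time_to_live = 604800;  -- seven days, in seconds
```

With time windows, data written in the same period tends to end up in the same SSTables, so expired data can be dropped together rather than mixed with fresh data.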

Re: Reply: Reply: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
>* wrote >>> >>> The solution maybe work. However, the play list will grow over time and >>> somebody maybe has ten thousands that will slow down the query and sort . >>> Do you mean the oldest one should be removed when a new play is added? >>> >>> BTW, the ve

Re: Reply: Reply: A difficult data model with C*

2016-11-10 Thread Gang Liu
that will slow down the query and sort . >> Do you mean the oldest one should be removed when a new play is added? >> >> BTW, the version is 2.1.16 in our live system. >> >> >> BRs, >> >> BEN >> -- >> >>

Re: Reply: Reply: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
> >>> On Wed, 09 Nov 2016 20:47:05 -0500*Diamond ben >>> >* wrote >>> >>> The solution maybe work. However, the play list will grow over time and >>> somebody maybe has ten thousands that will slow down the query and sort . >>> Do you

Re: Reply: Reply: A difficult data model with C*

2016-11-10 Thread Carlos Alonso
list will grow over time and >>> somebody maybe has ten thousands that will slow down the query and sort . >>> Do you mean the oldest one should be removed when a new play is added? >>> >>> BTW, the version is 2.1.16 in our live system. >>> >>> >

Re: Reply: Reply: A difficult data model with C*

2016-11-10 Thread Benjamin Roth
removed when a new play is added? >> >> BTW, the version is 2.1.16 in our live system. >> >> >> BRs, >> >> BEN >> -- >> >> *From:* Vladimir Yudovin >> *Sent:* 2016-11-09 18:11:26 >> *To:* user

Re: Reply: Reply: A difficult data model with C*

2016-11-10 Thread Carlos Alonso
ten thousands that will slow down the query and sort . > Do you mean the oldest one should be removed when a new play is added? > > BTW, the version is 2.1.16 in our live system. > > > BRs, > > BEN > -- > > *From:* Vladimir Yudovin >

Re: Reply: Reply: A difficult data model with C*

2016-11-10 Thread Vladimir Yudovin
is added? BTW, the version is 2.1.16 in our live system. BRs, BEN From: Vladimir Yudovin <vla...@winguzone.com> Sent: 2016-11-09 18:11:26 To: user Subject: Re: Reply: A difficult data model with C* You are welcome! ) >recent ten movies watched by the user within 30 days. In this ca

Reply: Reply: A difficult data model with C*

2016-11-09 Thread Diamond ben
From: Vladimir Yudovin Sent: 2016-11-09 18:11:26 To: user Subject: Re: Reply: A difficult data model with C* You are welcome! ) >recent ten movies watched by the user within 30 days. In this case you can't use PRIMARY KEY (user_name, video_id), as video_id is demanded to fetch row

Re: Reply: A difficult data model with C*

2016-11-09 Thread Vladimir Yudovin
BEN From: Vladimir Yudovin <vla...@winguzone.com> Sent: 2016-11-08 22:35:48 To: user Subject: Re: A difficult data model with C* Hi Ben, if you need a very limited number of positions (as you said, ten), maybe you can store them in a LIST of UDT? Or just as a JSON string? So you'll have

Reply: A difficult data model with C*

2016-11-08 Thread ben ben
e and continue to watch from the last position she/he did. BTW, a movie can be watched several times by a user and the last position is needed indeed. BRs, BEN From: Vladimir Yudovin Sent: 2016-11-08 22:35:48 To: user Subject: Re: A difficult data model with C* H

Re: A difficult data model with C*

2016-11-08 Thread Vladimir Yudovin
Hi Ben, if you need a very limited number of positions (as you said, ten), maybe you can store them in a LIST of UDT? Or just as a JSON string? So you'll have one row for each user-video pair. It can be something like this: CREATE TYPE play (position int, last_time timestamp); CREATE TABLE recent
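The table definition in this excerpt is cut off, so the following is only a guess at the general shape of the "LIST of UDT" idea, not the author's actual schema. The table name recent_sketch, the column name plays and the use of video_id as clustering column are assumptions made for illustration:

```cql
-- Hypothetical sketch of one row per user-video pair holding a short list of plays.
CREATE TYPE play (position int, last_time timestamp);

CREATE TABLE recent_sketch (
    user_name text,
    video_id text,
    plays list<frozen<play>>,  -- a UDT inside a collection has to be frozen
    PRIMARY KEY (user_name, video_id)
);

-- Append one play to the list for a given user and video.
UPDATE recent_sketch
SET plays = plays + [{position: 1200, last_time: '2016-11-08 22:35:48+0000'}]
WHERE user_name = 'ben' AND video_id = 'v1';
```

Keeping the list short (the thread mentions ten positions) matters here, since a collection is not meant to grow without bound.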

Re: A difficult data model with C*

2016-11-07 Thread Mickael Delanoë
Which version of Cassandra are you using? If this is 3.0 or higher, why don't you create a materialized view for your base table with the last time as the first clustering column? However: it needs to be confirmed whether this is an anti-pattern for Cassandra, as this materialized view will have a

Re: A difficult data model with C*

2016-11-07 Thread Dikang Gu
Agreed, changing last_time to descending order will help. You can also TTL the data, so that the old records will be purged by Cassandra. --Dikang. On Mon, Nov 7, 2016 at 10:39 PM, Alain Rastoul wrote: > On 11/08/2016 03:54 AM, ben ben wrote: > >> Hi guys, >>CREATE TABLE recent ( >>

Re: A difficult data model with C*

2016-11-07 Thread Alain Rastoul
On 11/08/2016 03:54 AM, ben ben wrote: Hi guys, CREATE TABLE recent ( user_name text, vedio_id text, position int, last_time timestamp, PRIMARY KEY (user_name, vedio_id) ) Hi Ben, May be a clustering columns order would help CREATE TABLE recent ( ... ) WITH CLUSTER

A difficult data model with C*

2016-11-07 Thread ben ben
Hi guys, We are maintaining a system for an on-line video service. ALL users' viewing records of every movie are stored in C*. So she/he can continue to enjoy the movie from the last point next time. The table is designed as below: CREATE TABLE recent ( user_name text, vedio_id text,

Re: time series data model

2016-10-24 Thread kurt Greaves
On 20 October 2016 at 09:29, wxn...@zjqunshuo.com wrote: > I do need to align the time windows to day bucket to prevent one row > become too big, and event_time is timestamp since unix epoch. If I use > bigint as type of event_time, can I do queries as you mentioned? Yes. Kurt Greaves k...@ins

Re: time series data model

2016-10-20 Thread wxn...@zjqunshuo.com
series data model If event_time is timestamps since unix epoch you 1. may want to use the in-built timestamps type, and 2. order by event_time DESC. 2 applies if you want to do queries such as "select * from eventdata where ... and event_time > x" (i.e; get latest events). Other t
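The advice quoted in this excerpt (a built-in timestamp type, descending clustering order, a day bucket to keep rows bounded) can be sketched roughly as follows. Only the column names deviceid, date, event_time and speed appear in the thread excerpts; the table name eventdata_sketch, the composite partition key and the type of speed are assumptions:

```cql
-- Hypothetical sketch: one partition per device per day, newest events first.
CREATE TABLE eventdata_sketch (
    deviceid int,
    date int,               -- day bucket, e.g. 20160928
    event_time timestamp,
    speed int,              -- assumed type
    PRIMARY KEY ((deviceid, date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- Latest events for one device on one day.
SELECT * FROM eventdata_sketch
WHERE deviceid = 186628 AND date = 20160928
  AND event_time > '2016-09-28 00:00:00+0000';
```

A range predicate on event_time works here because it is the first clustering column and both partition key columns are fixed with equality.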

Re: time series data model

2016-10-20 Thread wxn...@zjqunshuo.com
| speed --+--+---+-+---++--- 186628 | 20160928 | 1474992002005 | 48 | 30.343443 | 120.087514 |41 -Simon Wu From: kurt Greaves Date: 2016-10-20 16:23 To: user Subject: Re: time series data model Ah didn't pick up on that but looks like he's storing JSON within posit

Re: time series data model

2016-10-20 Thread kurt Greaves
eries data which is GPS trace from mysql to > C*. I want a wide row to hold one day data. I designed the data model as > below. Please help to see if there is any problem. Any suggestion is > appreciated. > > Table Model: > CREATE TABLE cargts.eventdata ( > dev

Re: time series data model

2016-10-20 Thread kurt Greaves
> > > On Thu, 20 Oct 2016 03:29:16 -0400* >* wrote > > Hi All, > I'm trying to migrate my time series data which is GPS trace from mysql to > C*. I want a wide row to hold one day data. I designed the data model as > below. Please help to see if there

Re: time series data model

2016-10-20 Thread Vladimir Yudovin
series data which is GPS trace from mysql to C*. I want a wide row to hold one day data. I designed the data model as below. Please help to see if there is any problem. Any suggestion is appreciated. Table Model: CREATE TABLE cargts.eventdata ( deviceid int, date int, event_

time series data model

2016-10-20 Thread wxn...@zjqunshuo.com
Hi All, I'm trying to migrate my time series data which is GPS trace from mysql to C*. I want a wide row to hold one day data. I designed the data model as below. Please help to see if there is any problem. Any suggestion is appreciated. Table Model: CREATE TABLE cargts.eventdata ( dev

Re: Cassandra data model right definition

2016-10-14 Thread selcuk mart
unsubscribe On 3.10.2016 at 16:25, Edward Capriolo wrote: The phrase is defensible, but that is the root of the problem. Take for example a skateboard. "A skateboard is like a bike because it has wheels and you ride on it." That is true and defensively true. :) However with not much more

Re: Cassandra data model right definition

2016-10-04 Thread Mehdi Bada
Hi all, Just to refocus the debate (because I'm at the origin of this very interesting exchange). I think for a good understanding of the data model of any DBMS, we (technical experts) have to decompose the data objects of the model and understand how the data is precisely store

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
I did not ascribe blame. I only empathised with their predicament; I don't want to listen to either of us, either! On 3 October 2016 at 19:45, Edward Capriolo wrote: > You know what don't "go low" and suggest the recent un-subscriber on me. > > If your so eager to deal with my pull request

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
You know what, don't "go low" and suggest the recent un-subscriber on me. If you're so eager to deal with my pull request please review this one: I would rather you review this pull request: https://issues.apache.org/jira/browse/CASSANDRA-10825 On Mon, Oct 3, 2016 at 1:04 PM, Benedict Elliott Sm

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
Nobody is disputing that the docs can and should be improved to avoid this misreading. I've invited Ed to file a JIRA and/or pull request twice now. You are of course just as welcome to do this. Perhaps you will actually do it, so we can all move on with our lives! On 3 October 2016 at 17:45

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
"X-store" refers to how data is stored, in almost every case it refers to what logical constructs are grouped together physically on disk. It has nothing to do with whether a database is relational or not. Cassandra does, in fact meet the definition of row-store, however, I would like to re-itera

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
I've met clients that read the cassandra docs and then said in a big meeting "it's just like relational database, it has tables just like sqlserver/oracle." I'm not putting words in other people's mouth either, but I've heard that said enough times to want to puke. Does the docs claim cassandra is

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
... and my response can be summed up as "you are not parsing English correctly." The word "like" does not mean what you think it means in this context. It does not mean "close relative." It is constrained to the similarities expressed, and no others. You don't seem to be reading any of my respo

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
It's a row store because its schemed (vs ad hoc documents), and data (rows) are stored together. What would you call the things you iterate over when you query a partition? Rows. That makes it a thing that stores "rows" of data, row store isn't some crazy stretch. On Mon, Oct 3, 2016 at 12:33 PM Jo

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
Nobody is claiming Cassandra is relational. I'm not sure why that keeps coming up. On Mon, Oct 3, 2016 at 10:53 AM Edward Capriolo wrote: > My original point can be summed up as: > > Do not define cassandra in terms of SIMILES & METAPHORS. Such words include > "like" and "close relative". > > For th

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
My original point can be summed up as: Do not define cassandra in terms of SIMILES & METAPHORS. Such words include "like" and "close relative". For the specifics: Any relational db could (and I'm sure one does!) allow for sparse fields as well. MySQL can be backed by rocksdb now, does that make it n

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
A couple things I would like to note: 1. Cassandra does not determine how data is stored on disk, the compaction strategy does. One could, in theory, (and I believe some are trying) could create a column-store compaction strategy. There is a large effort in the database community overall to sepa

Re: Cassandra data model right definition

2016-10-03 Thread Peter Lin
Whether a storage engine requires schema isn't really critical for row oriented storage. How about CSV that doesn't have a header row? CSV is probably the most commonly used row oriented storage and tons of businesses still use it for B2B transactions. As you pointed out, some traditional RDBMS ha

Re: Cassandra data model right definition

2016-10-03 Thread Jonathan Haddad
Sorry Ed, but you're really stretching here. A table in Cassandra is structured by a schema with the data for each row stored together in each data file. Just because it uses log structured storage, sparse fields, and semi-flexible collections doesn't disqualify it from calling it a "row store" Po

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
The equivalent statement would be: "Like a bike, a scooter has wheels." This is a really important linguistic distinction you seem to be glossing over. It is not saying "A is like X," it is saying "A has specific traits in common with X." For example "Like cancer, heart disease is a leading cau

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
Also every piece of technical information that describes a rowstore http://cs-www.cs.yale.edu/homes/dna/talks/abadi-sigmod08-slides.pdf https://en.wikipedia.org/wiki/Column-oriented_DBMS#Row-oriented_systems Does it like this: 001:10,Smith,Joe,4; 002:12,Jones,Mary,5; 003:11,Johnson,Cathy

Re: Cassandra data model right definition

2016-10-03 Thread Edward Capriolo
The phrase is defensible, but that is the root of the problem. Take for example a skateboard. "A skateboard is like a bike because it has wheels and you ride on it." That is true and defensively true. :) However with not much more text you can accurately describe what it is, as opposed to somethi

Re: Cassandra data model right definition

2016-10-03 Thread Benedict Elliott Smith
While that sentence leaves a lot to be desired (for me because it confers a different meaning on row store), it doesn't say "Cassandra is like a RDBMS" - it says "like an RDBMS, it organises data by rows and columns" - i.e., in this regard only it is like an RDBMS, not more generally. I believe it

Re: Cassandra data model right definition

2016-10-01 Thread Peter Lin
I'll second Ed's comment. The documentation should be more careful when using phrases "like relational databases". When we look at the history of relational databases, people expect certain things like ACID transactions, primary/foreign key constraints, query planners, joins and relational algebra

Re: Cassandra data model right definition

2016-10-01 Thread Edward Capriolo
https://github.com/apache/cassandra Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL. I generally do not know what to say about these high level

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
I can iterate over JSON data stored in mongo and present it as a table with rows and columns. It does not make mongo a rowstore. On Fri, Sep 30, 2016 at 9:16 PM, Edward Capriolo wrote: > The problem with calling it a row store: > > https://en.wikipedia.org/wiki/Row_(database) > > In the context

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
The problem with calling it a row store: https://en.wikipedia.org/wiki/Row_(database) In the context of a relational database , a *row*—also called a record or tuple

Re: Cassandra data model right definition

2016-09-30 Thread Russell Bradberry
-store, which is just utterly backwards as you mentioned. From: Benedict Elliott Smith Reply-To: Date: Friday, September 30, 2016 at 5:12 PM To: Subject: Re: Cassandra data model right definition Absolutely. A "partitioned row store" is exactly what I would call it. As it ha

Re: Cassandra data model right definition

2016-09-30 Thread Benedict Elliott Smith
Absolutely. A "partitioned row store" is exactly what I would call it. As it happens, our README thinks the same, which is fantastic. I thought I'd take a look at the rest of our cohort, and didn't get far before disappointment. HBase literally calls itself a "*column-oriented* store" - which i

Re: Cassandra data model right definition

2016-09-30 Thread Jonathan Haddad
+1000 to what Benedict says. I usually call it a "partitioned row store" which usually needs some extra explanation but is more accurate than "column family" or whatever other thrift era terminology people still use. On Fri, Sep 30, 2016 at 1:53 PM DuyHai Doan wrote: > I used to present Cassandra

Re: Cassandra data model right definition

2016-09-30 Thread Edward Capriolo
Then: Physically: A data store which physically structured-log-merge of SSTables (see) https://cloud.google.com/bigtable/. Now: One of the change made in Apache Cassandra 3.0 is a relatively important refactor of the storage engine . I say refac

Re: Cassandra data model right definition

2016-09-30 Thread DuyHai Doan
I used to present Cassandra as a NoSQL datastore with "distributed" table. This definition is closer to CQL and has some academic background (distributed hash table). On Fri, Sep 30, 2016 at 7:43 PM, Benedict Elliott Smith wrote: > Cassandra is not a "wide column store" anymore. It has a schem

Re: Cassandra data model right definition

2016-09-30 Thread Benedict Elliott Smith
Cassandra is not a "wide column store" anymore. It has a schema. Only thrift users no longer think they have a schema (though they do), and thrift is being deprecated. I really wish everyone would kill the term "wide column store" with fire. It seems to have never meant anything beyond "schema-l

Re: Cassandra data model right definition

2016-09-30 Thread Joaquin Casares
Hi Mehdi, I can help clarify a few things. As Carlos said, Cassandra is a Wide Column Store. Theoretically a row can have 2 billion columns, but in practice it shouldn't have more than 100 million columns. Cassandra partitions data to certain nodes based on the partition key(s), but does provide

Re: Cassandra data model right definition

2016-09-30 Thread Carlos Alonso
Cassandra is a Wide Column Store http://db-engines.com/en/system/Cassandra Carlos Alonso | Software Engineer | @calonso On 30 September 2016 at 18:24, Mehdi Bada wrote: > Hi all, > > I have a theoretical question: > - Is Apache Cassandra really a column store? > Co

Cassandra data model right definition

2016-09-30 Thread Mehdi Bada
Hi all, I have a theoretical question: - Is Apache Cassandra really a column store? Column store means storing the data as columns rather than as rows. In fact C* stores the data as rows, and data is partitioned by row key. Finally, for me, Cassandra is a row-oriented schemaless DBMS Is

Re: Optimising the data model for reads

2016-09-29 Thread Romain Hardouin
Hi Julian, The problem with any deletes here is that you can *read* potentially many tombstones. I mean you have two concerns: 1. Avoid reading tombstones during a query 2. How to evict tombstones as quickly as possible to reclaim disk space. The first point is a data model consideration

Optimising the data model for reads

2016-09-29 Thread Thomas Julian
he results for the queries stated at less than 10ms. I deployed this in production (Cassandra 2.1.13). It was working perfectly for a month or two. But now at times the queries are taking 5s to 10s. On analysing further, I found that a few users are deleting the files too frequently. This generates too many tombstones. I have set gc_grace_seconds to the default 10 days and I have chosen SizeTieredCompactionStrategy. I want to optimise this data model for read efficiency. Any help is much appreciated. Best Regards, Julian.

Re: Cassandra Data Model with Narrow partition

2015-10-30 Thread Jeff Jirsa
To: "user@cassandra.apache.org" Subject: Re: Cassandra Data Model with Narrow partition Hi Chandra, Narrow partition is probably your best choice, but you need to bucket data somehow, otherwise your partitions will soon become unmanageable and you'll have problems reading them, bo

Re: Cassandra Data Model with Narrow partition

2015-10-30 Thread Kai Wang
. There aren’t any updates/deletes to >> records inserted. Records are inserted with a TTL of 60 days (retention >> period) >> >> 2) The table has a single primary key which is a sequence number >> (27 digits) generated by source application >> >

Re: Cassandra Data Model with Narrow partition

2015-10-30 Thread Carlos Alonso
ed with a TTL of 60 days (retention > period) > > 2) The table has a single primary key which is a sequence number (27 > digits) generated by source application > > 3) There are only two access patterns used – one by using the > sequence number & the other using se

Cassandra Data Model with Narrow partition

2015-10-30 Thread chandrasekar.krc
sible) 4) My target data model in Cassandra is partitioned with sequence number as the primary key + event date as clustering columns to enable range scans on date. 5) The Table has close to 120+ columns and the average row size comes close to 32K bytes 6) Reads are very very

Re: Tag filtering data model

2015-09-19 Thread Artur Siekielski
I came to a similar conclusion, that is, if you have more than a few tags, then the problem is no longer simple "tagging" but more like regular "document search" with indexed words. There are too many word subsets to precompute matching documents, so you need to index documents individually and co

Re: Tag filtering data model

2015-09-16 Thread Naresh Yadav
We also had a similar use case. After a lot of trials with Cassandra, we finally created a Solr schema doc_id (unique key), tags (indexed) in Apache Solr for answering the search query "Get me matching docs by any given number of tags", and that solved our use case. We had a use case of millions of docs and in tags we c

Re: Tag filtering data model

2015-09-15 Thread Carlos Alonso
Really interesting question Artur. Have you gone any further? I think, based on my experience and recalling Cassandra's good practices, that full denormalisation is the Cassandra way to go. Cheers Carlos Alonso | Software Engineer | @calonso On 11 September 2015 at

Tag filtering data model

2015-09-11 Thread Artur Siekielski
I store documents submitted by users, with optional tags (lists of strings): CREATE TABLE doc ( user_id uuid, date text, // part of partition key, to distribute data better doc_id uuid, tags list<text>, contents text, PRIMARY KEY((user_id, date), doc_id) ); What is the best way to implemen

Re: CQL Data Model question

2015-05-14 Thread Alaa Zubaidi (PDF)
Thanks Ngoc, Jack On Tue, May 12, 2015 at 4:56 AM, Ngoc Minh VO wrote: > Hello, > > > > The problem with your approach is: you will need to specify all the 30 > filters (in the pre-defined order in PK) when querying. > > > > I would go for this data model: >

Re: CQL Data Model question

2015-05-12 Thread Jack Krupansky
Porting an SQL data model to Cassandra is an anti-pattern - don't do it! Instead, focus on developing a new data model that capitalizes on the key strengths of Cassandra - distributed, scalable, fast writes, fast direct access. Complex and ad-hoc queries are anti-patterns as well. I'll l

RE: CQL Data Model question

2015-05-12 Thread Ngoc Minh VO
Hello, The problem with your approach is: you will need to specify all the 30 filters (in the pre-defined order in PK) when querying. I would go for this data model: CREATE TABLE t ( name text, filter_name1 text, filter_value1 text, filter_name2 text, filter_value2 text

CQL Data Model question

2015-05-11 Thread Alaa Zubaidi (PDF)
Hi, I am trying to port an Oracle table to Cassandra. The table is a wide table (931 columns) and could have millions of rows. name, filter1, filter2...filter30, data1, data2...data900 The user would retrieve multiple rows from this table and filter (30 filter columns) by one or more (up to 3)

Re: Data model suggestions

2015-04-27 Thread Laing, Michael
ndra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__auto_snapshot >> >> >> >> >> >> *From:* Ali Akhtar [mailto:ali.rac...@gmail.com] >> *Sent:* Sunday, April 26, 2015 10:31 PM >> >> *To:* user@cassandra.apache.org >> *Subjec

Re: Data model suggestions

2015-04-27 Thread Ali Akhtar
...@gmail.com] > *Sent:* Sunday, April 26, 2015 10:31 PM > > *To:* user@cassandra.apache.org > *Subject:* Re: Data model suggestions > > > > Thanks Peer. I like the approach you're suggesting. > > > > Why do you recommend truncating the last active table rat

RE: Data model suggestions

2015-04-26 Thread Peer, Oded
/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__auto_snapshot From: Ali Akhtar [mailto:ali.rac...@gmail.com] Sent: Sunday, April 26, 2015 10:31 PM To: user@cassandra.apache.org Subject: Re: Data model suggestions Thanks Peer. I like the approach you're sugge
