Re: Time series data model and tombstones

2017-02-08 Thread DuyHai Doan
Thanks for the update. Good to know that TWCS gives you more stability. On Wed, Feb 8, 2017 at 6:20 PM, John Sanda wrote: > I wanted to provide a quick update. I was able to patch one of the > environments that is hitting the tombstone problem. It has been running > TWCS for five days now, and thi

Re: Time series data model and tombstones

2017-02-08 Thread John Sanda
I wanted to provide a quick update. I was able to patch one of the environments that is hitting the tombstone problem. It has been running TWCS for five days now, and things are stable so far. I also had a patch to the application code to implement date partitioning ready to go, but I wanted to see
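For context, the date partitioning John mentions is usually implemented by adding a time bucket to the partition key. A minimal CQL sketch (the table and column names here are illustrative assumptions, not John's actual schema):

```cql
-- Hypothetical date-partitioned metrics table: each (metric_id, day)
-- pair is its own partition, so a whole day of expired data ages out
-- together and reads never scan past older, tombstoned days.
CREATE TABLE metrics_by_day (
    metric_id text,
    day       date,       -- day bucket; part of the partition key
    time      timestamp,
    value     double,
    PRIMARY KEY ((metric_id, day), time)
) WITH CLUSTERING ORDER BY (time DESC);
```

Reads then target specific (metric_id, day) partitions instead of one ever-growing partition per metric.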

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
In theory, you're right and Cassandra should possibly skip reading cells having time < 50. But it's all theory; in practice Cassandra reads chunks of xxx kilobytes worth of data (don't remember the exact value of xxx, maybe 64k or far less), so you may end up reading tombstones. On Sun, Jan 29, 2017

Re: Time series data model and tombstones

2017-01-29 Thread Jonathan Haddad
Check out our post on how to use TWCS before 3.0. http://thelastpickle.com/blog/2017/01/10/twcs-part2.html On Sun, Jan 29, 2017 at 11:20 AM John Sanda wrote: > It was with STCS. It was on a 2.x version before TWCS was available. > > On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan wrote: > > Did y

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
Thanks for the clarification. Let's say I have a partition in an SSTable where the values of time range from 100 to 10 and everything < 50 is expired. If I do a query with time < 100 and time >= 50, are there scenarios in which Cassandra will have to read cells where time < 50? In particular I am w
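The scenario John is asking about corresponds to a slice query along these lines (schema and values are illustrative, mirroring the numbers in his example):

```cql
-- Partition clustered by time DESC; live cells span [50, 100],
-- everything below 50 has expired into tombstones.
SELECT * FROM metrics
WHERE id = 'some-id'
  AND time < 100 AND time >= 50;
-- The question: can this bounded slice still end up reading the
-- tombstoned cells below the lower bound of 50?
```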

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
"Should the data be sorted by my time column regardless of the compaction strategy" --> It does. What I mean is that an old "chunk" of expired data in SSTABLE-12 may be compacted together with a new chunk of SSTABLE-2 containing fresh data, so the new resulting SSTable will contain tombstones AND

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
> > Since STCS does not sort data based on timestamp, your wide partition may > span over multiple SSTables and inside each SSTable, old data (+ > tombstones) may sit on the same partition as newer data. Should the data be sorted by my time column regardless of the compaction strategy? I didn't t

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Ok so give it a try with TWCS. Since STCS does not sort data based on timestamp, your wide partition may span over multiple SSTables and inside each SSTable, old data (+ tombstones) may sit on the same partition as newer data. When reading by slice, even if you request for fresh data, Cassandra ha

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
It was with STCS. It was on a 2.x version before TWCS was available. On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan wrote: > Did you get this overwhelming tombstone behavior with STCS or with TWCS ? > > If you're using DTCS, beware of its weird behavior and tricky > configuration. > > On Sun, Jan

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Did you get this overwhelming tombstone behavior with STCS or with TWCS ? If you're using DTCS, beware of its weird behavior and tricky configuration. On Sun, Jan 29, 2017 at 3:52 PM, John Sanda wrote: > Your partitioning key is text. If you have multiple entries per id you are >> likely hitti

Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
> > Your partitioning key is text. If you have multiple entries per id you are > likely hitting older cells that have expired. Descending only affects how > the data is stored on disk, if you have to read the whole partition to find > whichever time you are querying for you could potentially hit to

Re: Time series data model and tombstones

2017-01-29 Thread kurt greaves
Your partitioning key is text. If you have multiple entries per id you are likely hitting older cells that have expired. Descending only affects how the data is stored on disk; if you have to read the whole partition to find whichever time you are querying for, you could potentially hit tombstones i

Re: Time series data model and tombstones

2017-01-28 Thread Benjamin Roth
Maybe trace your queries to see what's happening in detail. On 28.01.2017 at 21:32, "John Sanda" wrote: Thanks for the response. This version of the code is using STCS. gc_grace_seconds was set to one day and then I changed it to zero since RF = 1. I understand that expired data will still generat

Re: Time series data model and tombstones

2017-01-28 Thread John Sanda
Thanks for the response. This version of the code is using STCS. gc_grace_seconds was set to one day and then I changed it to zero since RF = 1. I understand that expired data will still generate tombstones and that STCS is not the best. More recent versions of the code use DTCS, and we'll be switc

Re: Time series data model and tombstones

2017-01-28 Thread Jonathan Haddad
Since you didn't specify a compaction strategy I'm guessing you're using STCS. Your TTL'ed data is becoming a tombstone. TWCS is a better strategy for this type of workload. On Sat, Jan 28, 2017 at 8:30 AM John Sanda wrote: > I have a time series data model that is basically: > > CREATE TABLE met
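Switching such a table to TWCS is a one-line schema change. A hedged sketch (the table name and window settings are examples; tune the window to your TTL):

```cql
-- Example only: group SSTables into 1-day windows so a whole window
-- of TTL-expired data can eventually be dropped as a unit.
ALTER TABLE metrics
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};
```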

Re: Time series data model and tombstones

2017-01-28 Thread DuyHai Doan
When the data expires (after the TTL of 7 days), at the next compaction it is transformed into tombstones, and these will still stay there during gc_grace_seconds. After that, the tombstones will be completely removed at the next compaction, if there is any ... So doing some maths, supposing that
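The lifetime described here can be made concrete in the table options (the 7-day TTL is from this thread; 10 days is Cassandra's default gc_grace_seconds, assumed here):

```cql
-- Illustrative worst case: a cell written at time T is live until
-- T + 7d (TTL), then sits as a tombstone until at least
-- T + 7d + gc_grace_seconds, and is only physically purged by a
-- compaction that runs after that point.
CREATE TABLE metrics (
    id    text,
    time  timestamp,
    value double,
    PRIMARY KEY (id, time)
) WITH default_time_to_live = 604800   -- 7 days, in seconds
  AND gc_grace_seconds = 864000;       -- 10 days (the default)
```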

Re: time series data model

2016-10-24 Thread kurt Greaves
On 20 October 2016 at 09:29, wxn...@zjqunshuo.com wrote: > I do need to align the time windows to day bucket to prevent one row > become too big, and event_time is timestamp since unix epoch. If I use > bigint as type of event_time, can I do queries as you mentioned? Yes. Kurt Greaves k...@ins
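The model the thread converges on can be sketched roughly as follows (names taken from the thread's eventdata example; exact columns are assumptions):

```cql
-- Day bucket in the partition key keeps rows bounded; event_time is
-- a bigint (milliseconds since the unix epoch), clustered DESC so
-- "latest events" range queries read from the front of the partition.
CREATE TABLE eventdata (
    deviceid   bigint,
    date       int,       -- day bucket, e.g. 20160928
    event_time bigint,    -- ms since unix epoch
    position   text,
    PRIMARY KEY ((deviceid, date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```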

Re: time series data model

2016-10-20 Thread wxn...@zjqunshuo.com
If event_time is timestamps since unix epoch you 1. may want to use the in-built timestamp type, and 2. order by event_time DESC. 2 applies if you want to do queries such as "select * from eventdata where ... and event_time > x" (i.e., get latest events). Other t

Re: time series data model

2016-10-20 Thread wxn...@zjqunshuo.com
A sample row: 186628 | 20160928 | 1474992002005 | 48 | 30.343443 | 120.087514 | 41 (the last column is speed). -Simon Wu From: kurt Greaves Date: 2016-10-20 16:23 To: user Subject: Re: time series data model Ah didn't pick up on that but looks like he's storing JSON within posit

Re: time series data model

2016-10-20 Thread kurt Greaves
If event_time is timestamps since unix epoch you 1. may want to use the in-built timestamp type, and 2. order by event_time DESC. 2 applies if you want to do queries such as "select * from eventdata where ... and event_time > x" (i.e., get latest events). Other than that your model seems workable,

Re: time series data model

2016-10-20 Thread kurt Greaves
Ah didn't pick up on that but looks like he's storing JSON within position. Is there any strong reason for this or as Vladimir mentioned can you store the fields under "position" in separate columns? Kurt Greaves k...@instaclustr.com www.instaclustr.com On 20 October 2016 at 08:17, Vladimir Yudov

Re: time series data model

2016-10-20 Thread Vladimir Yudovin
Hi Simon, Why is position text and not float? Text takes much more space. Also, speed and heading can be calculated based on the latest positions, so you may not need to store them. If you really need them in the database you can save them as floats, or compose a single float value like speed.heading: 41.173 (or
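Vladimir's suggestion, sketched in CQL (column names and types are assumptions about what the JSON "position" blob contains):

```cql
-- Typed columns instead of a JSON blob in a text column: numeric
-- types are smaller than text, and each field becomes individually
-- readable without parsing JSON client-side.
CREATE TABLE eventdata (
    deviceid   bigint,
    date       int,
    event_time bigint,
    latitude   double,
    longitude  double,
    speed      float,
    heading    float,
    PRIMARY KEY ((deviceid, date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
```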

Re: Time-series data model

2010-04-15 Thread Dan Di Spaltro
This is actually fairly similar to how we store metrics at Cloudkick. Below has a much more in-depth explanation of some of that: https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ So we store each natural point in the NumericArchive table. Our keys look like: . Anyway

Re: Time-series data model

2010-04-15 Thread Ted Zlatanov
On Thu, 15 Apr 2010 11:27:47 +0200 Jean-Pierre Bergamin wrote: JB> Am 14.04.2010 15:22, schrieb Ted Zlatanov: >> On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin" >> wrote: >> JB> The metrics are stored together with a timestamp. The queries we want to JB> perform are: JB> * The last

Re: Time-series data model

2010-04-15 Thread Jean-Pierre Bergamin
On 14.04.2010 at 15:22, Ted Zlatanov wrote: On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin" wrote: JB> The metrics are stored together with a timestamp. The queries we want to JB> perform are: JB> * The last value of a specific metric of a device JB> * The values of a specific m

Re: Time-series data model

2010-04-15 Thread Ilya Maykov
Hi Jean-Pierre, I'm investigating using Cassandra for a very similar use case, maybe we can chat and compare notes sometime. But basically, I think you want to pull the metric name into the row key and use simple CF instead of SCF. So, your example: "my_server_1": { "cpu_usage": {
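In the Thrift-era terms of this 2010 thread, that means one row per (server, metric) series. A modern CQL analogue (purely illustrative; CQL did not exist at the time of this message):

```cql
-- Restatement of Ilya's advice: pull the metric name into the
-- partition key so each (server, metric) series is one partition of
-- (timestamp -> value) cells, instead of a super column family.
CREATE TABLE metrics (
    server text,      -- e.g. 'my_server_1'
    metric text,      -- e.g. 'cpu_usage'
    ts     timestamp,
    value  double,
    PRIMARY KEY ((server, metric), ts)
);
```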

Re: Time-series data model

2010-04-14 Thread alex kamil
James, I'm a big fan of Cassandra, but have you looked at http://en.wikipedia.org/wiki/RRDtool? It is natively built for this type of problem. Alex On Wed, Apr 14, 2010 at 9:02 AM, Jean-Pierre Bergamin wrote: > Hello everyone > > We are currently evaluating a new DB system (replacing MySQL) to st

Re: Time-series data model

2010-04-14 Thread Ted Zlatanov
On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin" wrote: JB> The metrics are stored together with a timestamp. The queries we want to JB> perform are: JB> * The last value of a specific metric of a device JB> * The values of a specific metric of a device between two timestamps t1 and

Re: Time-series data model

2010-04-14 Thread Zhiguo Zhang
First of all, I am a newbie with NoSQL. I'll try to write down my opinions as references: If I were you, I would use 2 column families: 1. CF, key is devices 2. CF, key is timeuuid. What do you think about that? Mike On Wed, Apr 14, 2010 at 3:02 PM, Jean-Pierre Bergamin wrote: > Hello everyone > > We are