Re: Beginner Assumptions

Thomas Heller Sun, 13 Jun 2010 04:43:36 -0700

Hey,

I'm sorry, I think I didn't make myself clear enough. I'm using
cassandra only to store the _results_ (the calculated time series),
not the source data. Also, using "Beginner Assumptions" as the subject
probably wasn't the best choice, since I'm more interested in the inner
workings of cassandra than in how to use it. ;)


> And the per hour counts are stored as json?

No, they are stored as byte arrays with a fixed size (96 = 24x4byte integers).
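
A minimal sketch of that layout, assuming unsigned big-endian 4-byte
integers (the byte order and signedness are my assumption, and
pack_day/unpack_day are hypothetical helper names):

```ruby
# Hypothetical helpers: one column value holds 24 hourly counters,
# each encoded as a 4-byte unsigned big-endian integer (24 * 4 = 96 bytes).
HOURS_PER_DAY = 24

def pack_day(counts)
  raise ArgumentError, "expected #{HOURS_PER_DAY} counts" unless counts.size == HOURS_PER_DAY
  counts.pack("N#{HOURS_PER_DAY}")
end

def unpack_day(bytes)
  raise ArgumentError, "expected 96 bytes" unless bytes.bytesize == HOURS_PER_DAY * 4
  bytes.unpack("N#{HOURS_PER_DAY}")
end

# A day without any views and a very busy day take the same space.
empty = pack_day(Array.new(HOURS_PER_DAY, 0))
busy  = pack_day(Array.new(HOURS_PER_DAY, 1_000_000))
```

Both empty and busy are 96 bytes long, and unpack_day returns the 24
hourly counts again.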

>  cassandra.get("/page/1", Slice("20100612"..."20100613"))

I know how to do it in cassandra; I was just comparing it to others. I
was interested to know if

cassandra.get("/page/1", :start => "20100612", :count => 90)
is actually just as fast as
cassandra.get("/page/1", Slice("20100612", "20100613", ...)) with 90 keys
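
To make the comparison concrete: the explicit list in the second
variant would be 90 consecutive day names. A rough sketch of how that
list could be built (day_columns is a hypothetical helper, not part of
any cassandra client):

```ruby
require "date"

# Hypothetical helper: list `count` consecutive day names in YYYYMMDD form,
# starting at `start`. Fixed-width digit strings sort in date order.
def day_columns(start, count)
  first = Date.strptime(start, "%Y%m%d")
  (0...count).map { |offset| (first + offset).strftime("%Y%m%d") }
end

names = day_columns("20100612", 90)
```

Since there is always one column per day, a range starting at
"20100612" with a count of 90 should cover exactly these names. Whether
both variants are equally fast is the open question.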

>
>> Assumption #3:
> I doubt your data will grow at a fixed rate per row. (Unless you
> always have the same hit pattern for your pages.) But you should be
> able to calculate the maximum storage requirement. That said
> - I am wondering... where are you aggregating the counts per hour?

The data is currently just stored in logfiles, which are parsed once an
hour in a map/reduce-like fashion (not stored in cassandra). Even if
there are no values to be saved, there will still be a column for this
row with [0, 0, 0, ...]. I also do not need to increment any of those
counters live. Hit patterns don't matter, since 1 million views per hour
consume just the same space as 0 views (96 bytes fixed). I may at some
time remove the 0 values to save space, but right now there is always
one column per day per row.
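
As a back-of-the-envelope sketch of what that means for storage (the
page and day counts are made-up numbers, and this counts only the raw
96-byte values, not whatever cassandra adds per column):

```ruby
BYTES_PER_DAY = 24 * 4 # 96 bytes of counters per column value

# Raw value bytes only; cassandra's own per-column overhead is not included.
def raw_value_bytes(rows, days)
  rows * days * BYTES_PER_DAY
end

# Hypothetical example: 10,000 pages, one column per day for a year.
total = raw_value_bytes(10_000, 365)
```

With these example numbers, total comes out at 350,400,000 bytes,
regardless of how many views each page had.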

>
> So you want to increment those counters per hit? I don't think there
> is an atomic increment semantic in cassandra yet. (Someone else to
> confirm?)

No, see above. Each view generates one entry in a logfile, which is
append-only (much like the cassandra commitlog). Incrementing those
counters live is very unlikely to happen, since they are just one part
of the whole log map/reduce thing. The offline processing part is not
moving into cassandra anytime soon; I just wanna put the results
somewhere. SQL is fine for that (atm), but I was interested in some
NoSQL and this seemed like a good use case (very structured data, only
accessed by keys or key ranges, but the key is always known, aka no
dynamic queries).

Cheers,
/thomas
