Re: what happens if coordinator node fails during write

2013-06-29 Thread Shahab Yunus
Aaron,

Can you explain a bit when you say that the client needs to support Atomic
Batches in 1.2 and Hector doesn't support it? Does it mean that there is no
way of using atomic batch of inserts through Hector? Or did I misunderstand
you? Feel free to point me to any link or resource, thanks.

Regards,
Shahab

On Friday, June 28, 2013, aaron morton wrote:

> As far as I know, in 1.2 the coordinator logs the request before it
> updates the replicas.
>
> You may be thinking about atomic batches, which are enabled by default for
> 1.2 via CQL but must be explicitly supported by Thrift clients. I would
> guess Hector is not using them.
> These batch logs are stored on other machines, which then replay the
> mutations if they have not been removed within a certain time.
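For reference, a CQL3 atomic batch looks like this (a sketch; the table and
column names are invented for illustration):

```sql
-- In 1.2 the batch is written to the batchlog on other nodes before the
-- mutations are applied, so all statements eventually apply together
-- or not at all (atomic, but not isolated).
BEGIN BATCH
  INSERT INTO users (user_id, name) VALUES (1, 'tom');
  INSERT INTO users_by_name (name, user_id) VALUES ('tom', 1);
APPLY BATCH;
```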
>
>
>> I am writing data to Cassandra via a Thrift client (not Hector) and
>> wonder what happens if the coordinator node fails.
>
> How and when it fails is important.
> But let's say there was an OS-level OOM situation and the process was
> killed just after it sent messages to the remote replicas. In that case all
> you know is that the request was applied on 0 to RF replicas. So it's
> the same as a TimedOutException.
>
> The request did not complete at the requested CL, so reads of that data
> will be working with eventual consistency until the next successful write.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/06/2013, at 12:45 PM, Andrey Ilinykh wrote:
>
It depends on the Cassandra version. As far as I know, in 1.2 the
coordinator logs the request before it updates replicas. If it fails, it
will replay the log on startup.
In 1.1 you may end up with an inconsistent state, because only part of your
request is propagated to the replicas.
>
> Thank you,
>   Andrey
>
>
> On Tue, Jun 25, 2013 at 5:11 PM, Jiaan Zeng wrote:
>
>> Hi there,
>>
>> I am writing data to Cassandra via a Thrift client (not Hector) and
>> wonder what happens if the coordinator node fails. The same question
>> applies to the bulk loader, which uses the gossip protocol instead of the
>> Thrift protocol. In my understanding, hinted handoff only takes care of
>> replica node failures.
>>
>> Thanks.
>>
>> --
>> Regards,
>> Jiaan
>>
>
>
>


Re: Date range queries

2013-06-29 Thread Oleksandr Petrov
Maybe I'm a bit late to the party, but this may still be useful for future
reference.

We've tried to keep the documentation for the Clojure Cassandra driver as
thorough and generic as possible, and it contains raw CQL examples, so you
can refer to the docs even if you're using another driver.

Here's a range query guide:
http://clojurecassandra.info/articles/kv.html#toc_8 (it also covers
ordering a result set).
One more thing that may be useful is the data modelling guide:
http://clojurecassandra.info/articles/data_modelling.html#toc_2 which
describes the usage of compound keys (directly related to range queries,
too).



On Wed, Jun 26, 2013 at 3:05 AM, Colin Blower wrote:

>  You could just separate the history data from the current data. Then
> when the user's result is updated, just write into two tables.
>
> CREATE TABLE all_answers (
>   user_id uuid,
>   created timeuuid,
>   result text,
>   question_id varint,
>   PRIMARY KEY (user_id, created)
> )
>
> CREATE TABLE current_answers (
>   user_id uuid,
>   question_id varint,
>   created timeuuid,
>   result text,
>   PRIMARY KEY (user_id, question_id)
> )
>
>
> > select * FROM current_answers;
>
>  user_id                              | question_id | result | created
> --------------------------------------+-------------+--------+--------------------------------------
>  11b1e59c-ddfa-11e2-a28f-0800200c9a66 |           1 |     no | f9893ee0-ddfa-11e2-b74c-35d7be46b354
>  11b1e59c-ddfa-11e2-a28f-0800200c9a66 |           2 |   blah | f7af75d0-ddfa-11e2-b74c-35d7be46b354
>
> > select * FROM all_answers;
>
>  user_id                              | created                              | question_id | result
> --------------------------------------+--------------------------------------+-------------+--------
>  11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f0141234-ddfa-11e2-b74c-35d7be46b354 |           1 |    yes
>  11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f7af75d0-ddfa-11e2-b74c-35d7be46b354 |           2 |   blah
>  11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f9893ee0-ddfa-11e2-b74c-35d7be46b354 |           1 |     no
>
> This way you can get the history of answers if you want and there is a
> simple way to get the most current answers.
>
> Just a thought.
> -Colin B.
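Following Colin's suggestion, a logged batch can keep the two tables in
step on each new answer (a sketch; the values are illustrative, and note
that each now() call generates its own timeuuid, so generate the id
client-side if both tables must carry the identical value):

```sql
BEGIN BATCH
  INSERT INTO all_answers (user_id, created, question_id, result)
  VALUES (11b1e59c-ddfa-11e2-a28f-0800200c9a66, now(), 1, 'maybe');
  INSERT INTO current_answers (user_id, question_id, created, result)
  VALUES (11b1e59c-ddfa-11e2-a28f-0800200c9a66, 1, now(), 'maybe');
APPLY BATCH;
```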
>
>
>
> On 06/24/2013 03:28 PM, Christopher J. Bottaro wrote:
>
> Yes, that makes sense and that article helped a lot, but I still have a
> few questions...
>
>  The created_at in our answers table is basically used as a version id.
>  When a user updates his answer, we don't overwrite the old answer, but
> rather insert a new answer with a more recent timestamp (the version).
>
>  answers
> ---
> user_id | created_at | question_id | result
> ---
>   1 | 2013-01-01 | 1   | yes
>   1 | 2013-01-01 | 2   | blah
>   1 | 2013-01-02 | 1   | no
>
>  The queries we really want to run are "find me all the answers for a
> given user at a given time."  So given the date 2013-01-02 and user_id
> 1, we would want rows 2 and 3 returned (since row 3 obsoletes row 1).  Is
> it possible to do this with CQL given the current schema?
>
>  As an aside, we can do this in PostgreSQL using window functions; not
> standard SQL, but pretty neat.
>
>  We can alter our schema like so...
>
>  answers
> ---
> user_id | start_at | end_at | question_id | result
>
>  Where the start_at and end_at denote when an answer is active.  So the
> example above would become:
>
>  answers
> ---
> user_id | start_at   | end_at | question_id | result
> 
>   1 | 2013-01-01 | 2013-01-02 | 1   | yes
>   1 | 2013-01-01 | null   | 2   | blah
>   1 | 2013-01-02 | null   | 1   | no
>
>  Now we can query "SELECT * FROM answers WHERE user_id = 1 AND start_at
> >= '2013-01-02' AND (end_at < '2013-01-02' OR end_at IS NULL)".
>
>  How would one define the partitioning key and cluster columns in CQL to
> accomplish this?  Is it as simple as PRIMARY KEY (user_id, start_at,
> end_at, question_id) (remembering that we sometimes want to limit by
> question_id)?
>
>  Also, we are a bit worried about race conditions.  Consider two separate
> processes updating an answer for a given user_id / question_id.  There will
> be a race condition between the two to update the correct row's end_at
> field.  Does that make sense?  I can draw it out with ASCII tables, but I
> feel like this email is already too long... :P
>
>  Thanks for the help.
>
>
>
On Wed, Jun 19, 2013 at 2:28 PM, David McNelis wrote:
>
>> So, if you want to grab by created_at and occasionally limit by
>> question_id, that is why you'd use created_at.
>>
>>  The way primary keys work is that the first part of the primary key is
>> the partitioner key; that field is essentially what defines the single
>> Cassandra row.  The second key is the order-preserving key, so you can

Re: token() function in CQL3 (1.2.5)

2013-06-29 Thread Oleksandr Petrov
Tokens are very useful for pagination and whole-table iteration. For
example, when you want to scan an entire table, you want to use the
token() function.

You can refer to two guides we've written for the Clojure driver (although
they do not contain much Clojure-specific information).
The first is the Data Modelling / Static Tables guide:
http://clojurecassandra.info/articles/data_modelling.html#toc_1
and the second is the K/V guide's pagination section:
http://clojurecassandra.info/articles/kv.html#toc_7
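In raw CQL, the token-based iteration those guides describe looks roughly
like this (a sketch; the table and key names are invented):

```sql
-- First page of a full-table scan
SELECT * FROM users LIMIT 100;
-- Next page: resume after the last partition key seen on the previous page
SELECT * FROM users WHERE token(user_id) > token(1735) LIMIT 100;
```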


On Wed, Jun 19, 2013 at 5:06 PM, Tyler Hobbs wrote:

>
> On Wed, Jun 19, 2013 at 7:47 AM, Ben Boule wrote:
>
>>  Can anyone explain this to me?  I have been looking through the source
>> code but can't seem to find the answer.
>>
>> The documentation mentions using the token() function to change a value
>> into its token for use in queries.   It always mentions it as taking a
>> single parameter:
>>
>> SELECT * FROM posts WHERE token(userid) > token('tom') AND token(userid) < 
>> token('bob')
>>
>>
>> However on my 1.2.5 node I am getting the following error:
>>
>> e.g.
>>
>> create table foo (
>> organization text,
>> type text,
>> time timestamp,
>> id uuid,
>> primary key ((organization, type, time), id))
>>
>> select * from foo where organization = 'companyA' and type = 'typeB' and
>> token(time) < token('somevalue') and token(time) > token('othervalue')
>>
>> Bad Request: Invalid number of arguments in call to function token: 3
>> required but 1 provided
>>
>> What are the other two parameters?  We don't currently use the token
>> function, but I was experimenting to see if I could move the time into the
>> partition key for a table like this to better distribute the rows.  But I
>> can't seem to figure out how to get token() working.
>>
>
> token() acts on the entire partition key, which for you is (organization,
> type, time), hence the 3 required values.
>
> In order to better distribute the rows, I suggest using a time bucket as
> part of the partition key.  For example, you might use only the date
> portion of the timestamp as the time bucket.
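Applied to the table above, that bucketing might look like this (a sketch;
the 'day' column and the values are invented for illustration):

```sql
CREATE TABLE foo (
  organization text,
  type text,
  day text,        -- time bucket, e.g. '2013-06-19'
  time timestamp,
  id uuid,
  PRIMARY KEY ((organization, type, day), time, id)
);

-- Rows are spread across one partition per day, and a day's worth of data
-- can be range-queried on the time clustering column:
SELECT * FROM foo
WHERE organization = 'companyA' AND type = 'typeB' AND day = '2013-06-19'
  AND time >= '2013-06-19 00:00:00' AND time < '2013-06-19 12:00:00';
```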
>
> These posts talk about doing something similar with the Thrift API, but
> they will probably still be helpful:
> - http://rubyscale.com/2011/basic-time-series-with-cassandra/
> - http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
>
> --
> Tyler Hobbs
> DataStax 
>



-- 
alex p


Re: Data model for financial time series

2013-06-29 Thread Oleksandr Petrov
You can refer to the Data Modelling guide here:
http://clojurecassandra.info/articles/data_modelling.html
It includes several things you've mentioned (namely, range queries and
dynamic tables).

Also, it seems it would be useful for you to use indexes and filtering (for
things like "give me everything about symbol X"); for that you can refer to
the K/V operations guide:
http://clojurecassandra.info/articles/kv.html#toc_8 (range query section)
and http://clojurecassandra.info/articles/kv.html#toc_10 (filtering
section).

It looks like your data model fits really well with what Cassandra allows.
Especially with financial data, it sounds like you can use your "symbol" as
a partition key, which opens up a wide range of possibilities for querying.
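Concretely, that might look something like this (a sketch; the column names
and the day bucket are invented for illustration, and minTimeuuid /
maxTimeuuid require a recent 1.2.x release):

```sql
CREATE TABLE trades (
  symbol text,
  day text,          -- e.g. '2013-06-07'; bounds partition growth
  ts timeuuid,
  type text,
  price decimal,
  PRIMARY KEY ((symbol, day), ts)
);

-- All AAPL trades for one day between 7:00am and 3:00pm, in timestamp order:
SELECT * FROM trades
WHERE symbol = 'AAPL' AND day = '2013-06-07'
  AND ts >= minTimeuuid('2013-06-07 07:00:00')
  AND ts <= maxTimeuuid('2013-06-07 15:00:00');
```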


On Fri, Jun 7, 2013 at 9:16 PM, Jake Luciani wrote:

> We have built a similar system; you can read about our data model in CQL3
> here:
>
> http://www.slideshare.net/carlyeks/nyc-big-tech-day-2013
>
> We are going to be presenting a similar talk next week at the Cassandra
> Summit.
>
>
> On Fri, Jun 7, 2013 at 12:34 PM, Davide Anastasia wrote:
>
>>  Hi,
>>
>> I am trying to build the storage of stock prices in Cassandra. My queries
>> are ideally of three types:
>>
>> - give me everything between time A and time B;
>>
>> - give me everything about symbol X;
>>
>> - give me everything of type Y;
>>
>> …or an intersection of the three. Something I will be happy doing is:
>>
>> - give me all the trades about AAPL between 7:00am and 3:00pm of a
>> certain day.
>>
>> However, being a time series, I will be happy to retrieve the data in
>> ascending order of timestamp (from 7:00 to 3:00).
>>
>> I have tried to build my table with the timestamp (as timeuuid) as the
>> primary key; however, I cannot manage to get my data in order, and "order
>> by" in CQL3 raises an error and doesn't perform the query.
>>
>> Does anybody have a suggestion for a good design that fits my queries?
>> 
>>
>> Thanks,
>>
>> David
>>
>
>
>
> --
> http://twitter.com/tjake
>



-- 
alex p


Re: Dynamic column family using CQL2, possible?

2013-06-29 Thread Oleksandr Petrov
WITH COMPACT STORAGE should allow accessing your dataset from CQL2,
actually.
There's a newer driver that supports binary CQL, namely
https://github.com/iconara/cql-rb, which is written by the folks at Bart,
who know their stuff about Cassandra :)

We're using COMPACT STORAGE for tables we access through Thrift/Hadoop, and
it works perfectly well.
You can refer to Data Modelling guide if you want to learn more about how
to model your data to make it fit into Cassandra well:
http://clojurecassandra.info/articles/data_modelling.html


On Wed, May 29, 2013 at 12:44 AM, Matthew Hillsborough wrote:

> Hi all,
>
> I started building a schema using CQL3's interface following the
> instructions here: http://www.datastax.com/dev/blog/thrift-to-cql3
>
> In particular, the dynamic column family instructions did exactly what I
> need to model my data on that blog post.
>
> I created a schema that looks like the following:
>
> CREATE TABLE user_games (
>   g_sp_key text,
>   user_id int,
>   nickname text,
>   PRIMARY KEY (g_sp_key, user_id)
> ) WITH COMPACT STORAGE;
>
> Worked great. My problem is I tested everything in cqlsh. As soon as it
> came time to implement it in my application (a Ruby on Rails app using the
> cassandra-cql gem found at https://github.com/kreynolds/cassandra-cql), I
> realized cassandra-cql does not support CQL3 and I have to stick to CQL2.
>
> My question simply comes down to: is it possible to do what I was
> attempting above in CQL2? How would my schema above change? Do I have
> to go back to using a Thrift-based client?
>
> Thanks all.
>



-- 
alex p


Re: Mixing CAS UPDATE and non-CAS DELETE

2013-06-29 Thread Blair Zajac

On 6/26/13 10:26 AM, Sylvain Lebresne wrote:

On Tue, Jun 25, 2013 at 5:30 AM, Blair Zajac wrote:

But if I want to delete it regardless of v1, then this doesn't work:

   DELETE FROM test WHERE k = 0 IF EXISTS


That's correct, though we should probably fix that at some point. I've
opened https://issues.apache.org/jira/browse/CASSANDRA-5708 for that.


I was thinking about this and wondering whether there's even a point to 
having a CAS delete in this case.  Is "DELETE FROM test WHERE k = 0" the 
same as "DELETE FROM test WHERE k = 0 IF EXISTS" from a practical point 
of view?  Either one is still racing with any other updates to the row, 
whether with proposals from other CAS operations or in an implicit race 
with whoever has a later timestamp.


Does one save a row tombstone if one uses "IF EXISTS"?


So one is left to

   DELETE FROM test WHERE k = 0

How does this non-CAS DELETE mix with a CAS UPDATE for the same
partition key?  Will they properly not step over each other?


Depends on what you mean by "not step over each other". A CAS update
will end up inserting columns with a timestamp that is basically the one
from the start of the Paxos round used underneath. The delete itself
will be a tombstone with a timestamp of when you execute that delete. So
the basic rule of "the more recent wins" applies. Of course, if two such
operations contend, you can't really know which will win. But if you do
a delete at QUORUM, followed by a CAS update IF NOT EXISTS (and there is
no other concurrently running operation on that row), you are guaranteed
that your update will succeed.

I don't know if I've answered your question.
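The sequence Sylvain describes can be sketched in CQL (using the
hypothetical "test" table from the thread; the IF NOT EXISTS syntax is the
CAS support arriving with Cassandra 2.0, and the values are illustrative):

```sql
-- Unconditional delete, executed at QUORUM
DELETE FROM test WHERE k = 0;
-- A later CAS insert that only applies if the row is absent
INSERT INTO test (k, v1) VALUES (0, 1) IF NOT EXISTS;
```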


Yes, thanks.  I was wondering if the non-CAS delete ignores any 
outstanding proposals on the row.


Blair


Re: How to do a CAS UPDATE on single column CF?

2013-06-29 Thread Blair Zajac

On 6/24/13 8:23 PM, Blair Zajac wrote:

How does one do an atomic update in a column family with a single column?

I have this CF:

CREATE TABLE schema_migrations (
  version text PRIMARY KEY
) WITH compaction = {'class': 'LeveledCompactionStrategy'};


Anyone?  Should I raise this on the developer mailing list or open a ticket?

Blair
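One approach, assuming the CAS support that arrives with Cassandra 2.0's
CQL3: since the table's only column is the primary key itself, the
existence check is the whole update, e.g.:

```sql
-- Applies only if no row with this version exists yet;
-- the result indicates whether it was applied.
INSERT INTO schema_migrations (version) VALUES ('20130629') IF NOT EXISTS;
```

The '20130629' value is just an illustrative migration id.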


CorruptBlockException

2013-06-29 Thread Glenn Thompson
Hi,

I'm Glenn Thompson and new to Cassandra.  I have been trying to figure out
how to recover from a CorruptBlockException.  My travels have led me to
numerous email threads and trouble tickets.  I think I did the right
thing(s) based on my research.

My basic situation.

I'm running on non-enterprise hardware; my personal cloud playground: 8
identical mini-ITX Gigabyte GA-Z77N systems running CentOS 6.4 64-bit.
Each has 16GB of RAM, two 750GB WD Black laptop drives (RAID 0), and an
i3-3220 processor (2 cores, 4 threads).  Anyone interested in the hardware
can go here:
https://drive.google.com/folderview?id=0B54Jqmw0tKp0c19kYy1kUW54VVE&usp=sharing

I've been loading NOAA ISH data in an effort to learn/evaluate Cassandra.

One of my nodes must have a hardware problem, although I've been unable to
find anything wrong via logs, SMART, or MCE.

Cassandra discovered the error during a compaction.  My loading continued
so I let it finish.

Then I:

Flushed
repaired
scrubbed
and finally decommissioned the node.
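For reference, those steps map onto nodetool commands roughly like this (a
sketch; run against the affected node, and repair/scrub may also be given a
keyspace argument):

```shell
nodetool flush          # write memtables out to sstables
nodetool repair         # synchronize data with the other replicas
nodetool scrub          # rewrite sstables, skipping unreadable rows
nodetool decommission   # stream this node's ranges to the rest of the ring
```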

At no point did Cassandra declare any of the tokens as down or anything.
Other than the exceptions in the logs, Cassandra was happy.

The repair, scrub, and decommission all produced Exceptions related to the
same few corrupt files.

I plumb this whole thing with SaltStack, so I'm going to start over and
attempt another load with new RAM in the bad node.  I'll save my logs and
configs, and I'll post them on my Google Drive if anyone thinks they would
be useful.

Cheers,
Glenn