Re: design that mimics twitter tweet search

2012-03-19 Thread Chris Goffinet
We do not use Cassandra for search. We made modifications to Lucene.

Here is a blog post on our engineering section that talks about what we did:

http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html


On Sun, Mar 18, 2012 at 11:22 PM, Tharindu Mathew wrote:

> Sasha,
>
> It depends on the way you implement it, I guess... Maybe Twitter uses
> Solandra, which is very good at indexing these in different ways but has the
> power of Cassandra underneath...
>
> If you're doing your own implementation of indexing, be mindful that you can
> break the sentence into its four words and index those, or index the whole
> sentence. Both would produce different results, as they can mean completely
> different things based on the context.
>
>
> On Mon, Mar 19, 2012 at 7:35 AM, Andrey V. Panov wrote:
>
>> Why you suppose they did search on Cassandra?
>>
>>
>> On 19 March 2012 00:16, Sasha Dolgy  wrote:
>>
>>> yes -- but given I have two keywords, and want to find all tweets that
>>> have "cassandra" and "bestest" ... that means retrieving all columns + values
>>> in each row, iterating through both to see if tweet ids in one exist in
>>> the other, and finishing up with a consolidated list of the tweet ids that
>>> exist in both.  just seems clunky to me ... ?
>>>
>>>
>>> On Sun, Mar 18, 2012 at 4:12 PM, Benoit Perroud wrote:
>>>
 The simplest modeling you could have is using the keyword as the key, a
 timestamp/time UUID as the column name and the tweetid as the value

 -> cf['keyword']['timestamp'] = tweetid

 then you do a range query to get all tweetids sorted by time (you may
 want them in reverse order) and you can limit to the number of tweets
 displayed on the page.

 As some rows can become large, you could use key partitioning by
 concatenating, for instance, the keyword and the month and year.
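
(A rough Hector (Java) sketch of that layout -- keyword row key, TimeUUID column
name, tweet id as value -- assuming a CF named "keyword_index" with a TimeUUIDType
comparator; the cluster, keyspace and CF names here are only illustrative:)

import java.util.UUID;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.cassandra.utils.TimeUUIDUtils;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.SliceQuery;

public class KeywordIndexSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("Tweets", cluster);

        // write: cf['keyword'][timeUUID] = tweetid
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insert("cassandra", "keyword_index",
                HFactory.createColumn(TimeUUIDUtils.getUniqueTimeUUIDinMillis(), "tweet-1234",
                        UUIDSerializer.get(), StringSerializer.get()));

        // read: newest-first page of tweet ids for one keyword
        SliceQuery<String, UUID, String> query = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), UUIDSerializer.get(), StringSerializer.get());
        query.setColumnFamily("keyword_index").setKey("cassandra");
        query.setRange(null, null, true, 20);   // reversed = true -> newest first, 20 per page
        ColumnSlice<UUID, String> page = query.execute().get();
        for (HColumn<UUID, String> column : page.getColumns()) {
            System.out.println(column.getValue());   // tweet ids, newest first
        }
    }
}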


 2012/3/18 Sasha Dolgy :
 > Hi All,
 >
 > With twitter, when I search for words like: "cassandra is the bestest", 4
 > tweets will appear, including one I just did.  My understanding is that the
 > internals of twitter work in that each word in a tweet is allocated,
 > irrespective of the presence of a # hash tag, and the tweet id is assigned
 > to a row for that word.  What is puzzling to me -- and I am hopeful that
 > some smart people on here can shed some light on it -- is how this would
 > work with Cassandra?
 >
 > row [ cassandra ]: key -> tweetid  / timestamp
 > row [ bestest ]: key -> tweetid / timestamp
 >
 > I had thought that I could simply pull a list of all column names from each
 > row (representing each word) and flag all occurrences (tweet id's) that
 > exist in each row ... however, these rows would get quite long over time.
 >
 > Am I missing an easier way to get a list of all "tweetid's" that exist in
 > multiple rows?
 >
 > --
 > Sasha Dolgy
 > sasha.do...@gmail.com



 --
 sent from my Nokia 3210

>>>
>>>
>>>
>>> --
>>> Sasha Dolgy
>>> sasha.do...@gmail.com
>>>
>>
>>
>
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>


Re: 0.8.1 Vs 1.0.7

2012-03-19 Thread Chris Goffinet
When creating a new CF, the defaults now in fact have compression enabled.


On Sat, Mar 17, 2012 at 5:50 AM, R. Verlangen  wrote:

> Check your log for messages about rebuilding indices: that might grow your
> dataset some.
>
> One thing is for sure: the data import removed all the crap that remained in
> the 0.8.1 cluster (duplicates, tombstones etc). The decrease is fairly
> dramatic but not illogical at all.
>
>
> 2012/3/16 Jeremiah Jordan 
>
>>  I would guess more aggressive compaction settings, did you update rows
>> or insert some twice?
>> If you run major compaction a couple times on the 0.8.1 cluster does the
>> data size get smaller?
>>
>> You can use the "describe" command to check if compression got turned on.
>>
>> -Jeremiah
>>
>>  --
>> *From:* Ravikumar Govindarajan [ravikumar.govindara...@gmail.com]
>> *Sent:* Thursday, March 15, 2012 4:41 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* 0.8.1 Vs 1.0.7
>>
>>  Hi,
>>
>>  I ran some data import tests for cassandra 0.8.1 and 1.0.7. The results
>> were a little bit surprising
>>
>>  0.8.1, SimpleStrategy, Rep_Factor=3,QUORUM Writes, RP, SimpleSnitch
>>
>>  XXX.XXX.XXX.A  datacenter1 rack1   Up Normal  140.61 GB
>> 12.50%
>> XXX.XXX.XXX.B  datacenter1 rack1   Up Normal  139.92 GB
>> 12.50%
>> XXX.XXX.XXX.C  datacenter1 rack1   Up Normal  138.81 GB
>> 12.50%
>> XXX.XXX.XXX.D  datacenter1 rack1   Up Normal  139.78 GB
>> 12.50%
>> XXX.XXX.XXX.E  datacenter1 rack1   Up Normal  137.44 GB
>> 12.50%
>> XXX.XXX.XXX.F  datacenter1 rack1   Up Normal  138.48 GB
>> 12.50%
>> XXX.XXX.XXX.G  datacenter1 rack1   Up Normal  140.52 GB
>> 12.50%
>> XXX.XXX.XXX.H  datacenter1 rack1   Up Normal  145.24 GB
>> 12.50%
>>
>>  1.0.7, NTS, Rep_Factor{DC1:3, DC2:2}, LOCAL_QUORUM writes, RP [DC2 m/c
>> yet to join ring],
>> PropertyFileSnitch
>>
>>  XXX.XXX.XXX.A  DC1 RAC1   Up Normal   48.72  GB   12.50%
>> XXX.XXX.XXX.B  DC1 RAC1   Up Normal   51.23  GB   12.50%
>>
>> XXX.XXX.XXX.C  DC1 RAC1   Up Normal   52.4GB   12.50%
>>
>> XXX.XXX.XXX.D  DC1 RAC1   Up Normal   49.64  GB   12.50%
>>
>> XXX.XXX.XXX.E  DC1 RAC1   Up Normal   48.5GB   12.50%
>>
>> XXX.XXX.XXX.F  DC1 RAC1   Up Normal53.38  GB   12.50%
>>
>> XXX.XXX.XXX.G  DC1 RAC1   Up Normal   51.11  GB   12.50%
>>
>> XXX.XXX.XXX.H  DC1 RAC1   Up Normal   53.36  GB   12.50%
>>
>>  There seems to be 3X savings in size for the same dataset running
>> 1.0.7. I have not enabled compression for any of the CFs. Will it be
>> enabled by default when creating a new CF in 1.0.7? cassandra.yaml is also
>> mostly identical.
>>
>>  Thanks and Regards,
>> Ravi
>>
>
>


Re: design that mimics twitter tweet search

2012-03-19 Thread Sasha Dolgy
most excellent ... thanks Chris!

On Mon, Mar 19, 2012 at 9:23 AM, Chris Goffinet wrote:

> We do not use Cassandra for search. We made modifications to Lucene.
>
> Here is a blog post on our engineering section that talks about what we
> did:
>
>
> http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html
>
>
>


Re: 0.8.1 Vs 1.0.7

2012-03-19 Thread Sylvain Lebresne
On Mon, Mar 19, 2012 at 9:27 AM, Chris Goffinet  wrote:
> When creating a new CF, defaults are now in fact compression enabled.

For the record, that will be true starting in 1.1, but it isn't the
default before that.

--
Sylvain


> On Sat, Mar 17, 2012 at 5:50 AM, R. Verlangen  wrote:
>>
>> Check your log for messages about rebuilding indices: that might grow your
>> dataset some.
>>
>> One thing is for sure: the data import removed all the crap that remained in
>> the 0.8.1 cluster (duplicates, tombstones etc). The decrease is fairly
>> dramatic but not illogical at all.
>>
>>
>> 2012/3/16 Jeremiah Jordan 
>>>
>>> I would guess more aggressive compaction settings, did you update rows or
>>> insert some twice?
>>> If you run major compaction a couple times on the 0.8.1 cluster does the
>>> data size get smaller?
>>>
>>> You can use the "describe" command to check if compression got turned on.
>>>
>>> -Jeremiah
>>>
>>> 
>>> From: Ravikumar Govindarajan [ravikumar.govindara...@gmail.com]
>>> Sent: Thursday, March 15, 2012 4:41 AM
>>> To: user@cassandra.apache.org
>>> Subject: 0.8.1 Vs 1.0.7
>>>
>>> Hi,
>>>
>>> I ran some data import tests for cassandra 0.8.1 and 1.0.7. The results
>>> were a little bit surprising
>>>
>>> 0.8.1, SimpleStrategy, Rep_Factor=3,QUORUM Writes, RP, SimpleSnitch
>>>
>>> XXX.XXX.XXX.A  datacenter1 rack1       Up     Normal  140.61 GB
>>> 12.50%
>>> XXX.XXX.XXX.B  datacenter1 rack1       Up     Normal  139.92 GB
>>> 12.50%
>>> XXX.XXX.XXX.C  datacenter1 rack1       Up     Normal  138.81 GB
>>> 12.50%
>>> XXX.XXX.XXX.D  datacenter1 rack1       Up     Normal  139.78 GB
>>> 12.50%
>>> XXX.XXX.XXX.E  datacenter1 rack1       Up     Normal  137.44 GB
>>> 12.50%
>>> XXX.XXX.XXX.F  datacenter1 rack1       Up     Normal  138.48 GB
>>> 12.50%
>>> XXX.XXX.XXX.G  datacenter1 rack1       Up     Normal  140.52 GB
>>> 12.50%
>>> XXX.XXX.XXX.H  datacenter1 rack1       Up     Normal  145.24 GB
>>> 12.50%
>>>
>>> 1.0.7, NTS, Rep_Factor{DC1:3, DC2:2}, LOCAL_QUORUM writes, RP [DC2 m/c
>>> yet to join ring],
>>> PropertyFileSnitch
>>>
>>> XXX.XXX.XXX.A  DC1 RAC1       Up     Normal   48.72  GB       12.50%
>>> XXX.XXX.XXX.B  DC1 RAC1       Up     Normal   51.23  GB       12.50%
>>>
>>> XXX.XXX.XXX.C  DC1 RAC1       Up     Normal   52.4    GB       12.50%
>>>
>>> XXX.XXX.XXX.D  DC1 RAC1       Up     Normal   49.64  GB       12.50%
>>>
>>> XXX.XXX.XXX.E  DC1 RAC1       Up     Normal   48.5    GB       12.50%
>>>
>>> XXX.XXX.XXX.F  DC1 RAC1       Up     Normal    53.38  GB       12.50%
>>>
>>> XXX.XXX.XXX.G  DC1 RAC1       Up     Normal   51.11  GB       12.50%
>>>
>>> XXX.XXX.XXX.H  DC1 RAC1       Up     Normal   53.36  GB       12.50%
>>>
>>> There seems to be 3X savings in size for the same dataset running 1.0.7.
>>> I have not enabled compression for any of the CFs. Will it be enabled by
>>> default when creating a new CF in 1.0.7? cassandra.yaml is also mostly
>>> identical.
>>>
>>> Thanks and Regards,
>>> Ravi
>>
>>
>


Re: Question regarding secondary indices

2012-03-19 Thread aaron morton
> This way one can take advantage of the speedup that you get from reading across
> multiple drives.
> Or alternatively is it possible to run multiple instances of sstableloader on
> the same machine concurrently?
Without checking the code, I would assume you can run multiple instances.

Alternatively, place all the files on a RAID-0 volume for improved IO throughput.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/03/2012, at 2:44 PM, Sanjeev Kulkarni wrote:

> Thanks Aaron for the response. I see those logs.
> I had one more question. Looks like sstableloader takes only one directory at
> a time. Is it possible to load multiple directories in one call?
> Something like sstableloader /drive1/keyspace1 /drive2/keyspace1...
> This way one can take advantage of the speedup that you get from reading across
> multiple drives.
> Or alternatively is it possible to run multiple instances of sstableloader on
> the same machine concurrently?
> Thanks!
> 
> On Thu, Mar 15, 2012 at 6:54 PM, aaron morton  wrote:
> You should see a log line with "Index build of {} complete". 
> 
> You can also see which indexes are built using the describe command in 
> cassandra-cli. 
> 
> 
> [default@XX] describe;
> Keyspace: XX:
> ...
>   Column Families:
> ColumnFamily: XXX
>   ...
>   Built indexes: []
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/03/2012, at 10:04 AM, Sanjeev Kulkarni wrote:
> 
>> Hi,
>> I'm using a 4 node cassandra cluster running 0.8.10 with rf=3. It's a brand
>> new setup.
>> I have a single col family which contains about 10 columns. I have enabled 
>> secondary indices on 3 of them. I used sstableloader to bulk load some data 
>> into this cluster. 
>> I poked around the logs and saw the following messages
>> Submitting index build of attr_001 ..
>> which indicates that cassandra has started building indices. 
>> How will I know when the building of the indices is done? Are there some log
>> messages that I should look for?
>> Thanks!
> 
> 



Re: Single Node Cassandra Installation

2012-03-19 Thread aaron morton
> Even more: if you enable read repair, the chances of having bad writes
> decrease for any further reads. This will make your cluster become
> consistent again faster after some failure.
Under 1.0 the default RR probability was reduced to 10%, because Hinted Handoff
was changed to also store hints for nodes that fail to respond to a write.
Previously it only stored hints for nodes that were down when the request
started.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/03/2012, at 1:48 AM, R. Verlangen wrote:

> " By default Cassandra tries to write to both nodes, always. Writes will only 
> fail (on a node) if it is down, and even then hinted handoff will attempt to 
> keep both nodes in sync when the troubled node comes back up. The point of 
> having two nodes is to have read and write availability in the face of 
> transient failure. "
> 
> Even more: if you enable read repair, the chances of having bad writes
> decrease for any further reads. This will make your cluster become
> consistent again faster after some failure.
> 
> Also consider using different CLs for different operations. E.g. the
> Twitter timeline can miss some records; however, if you were to display
> my bank account I would prefer to see the right thing, or a nice error
> message.
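
(A minimal Hector (Java) sketch of that idea -- different consistency level policies
for different kinds of data -- assuming hypothetical keyspace and CF names:)

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class ConsistencySketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

        // relaxed policy: timeline-style data, read and write at ONE
        ConfigurableConsistencyLevel relaxed = new ConfigurableConsistencyLevel();
        relaxed.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
        relaxed.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
        Keyspace timelineKs = HFactory.createKeyspace("Demo", cluster, relaxed);

        // strict policy: bank-account-style data, read and write at QUORUM
        ConfigurableConsistencyLevel strict = new ConfigurableConsistencyLevel();
        strict.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        strict.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace accountKs = HFactory.createKeyspace("Demo", cluster, strict);

        // each write goes out at the consistency level of the keyspace handle used
        HFactory.createMutator(timelineKs, StringSerializer.get())
                .insert("user1", "Timeline", HFactory.createStringColumn("t1", "hello"));
        HFactory.createMutator(accountKs, StringSerializer.get())
                .insert("user1", "Accounts", HFactory.createStringColumn("balance", "100"));
    }
}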
> 
> 2012/3/16 Ben Coverston 
> Doing reads and writes at CL=1 with RF=2 N=2 does not imply that the reads 
> will be inconsistent. It's more complicated than the simple counting of 
> blocked replicas. It is easy to support the notion that it will be largely 
> consistent, in fact very consistent for most use cases.
> 
> By default Cassandra tries to write to both nodes, always. Writes will only 
> fail (on a node) if it is down, and even then hinted handoff will attempt to 
> keep both nodes in sync when the troubled node comes back up. The point of 
> having two nodes is to have read and write availability in the face of 
> transient failure.
> 
> If you are interested there is a good exposition of what 'consistency' means 
> in a system like Cassandra from the link below[1].
> 
> [1]
> http://www.eecs.berkeley.edu/~pbailis/projects/pbs/
> 
> 
> On Fri, Mar 16, 2012 at 6:50 AM, Thomas van Neerijnen  
> wrote:
> You'll need to either read or write at at least quorum to get consistent data 
> from the cluster so you may as well do both.
> Now that you mention it, I was wrong about downtime, with a two node cluster 
> reads or writes at quorum will mean both nodes need to be online. Perhaps you 
> could have an emergency switch in your application which flips to consistency 
> of 1 if one of your Cassandra servers goes down? Just make sure it's set back 
> to quorum when the second one returns or again you could end up with 
> inconsistent data.
> 
> 
> On Fri, Mar 16, 2012 at 2:04 AM, Drew Kutcharian  wrote:
> Thanks for the comments, I guess I will end up doing a 2 node cluster with 
> replica count 2 and read consistency 1.
> 
> -- Drew
> 
> 
> 
> On Mar 15, 2012, at 4:20 PM, Thomas van Neerijnen wrote:
> 
>> So long as data loss and downtime are acceptable risks a one node cluster is 
>> fine.
>> Personally this is usually only acceptable on my workstation, even my dev 
>> environment is redundant, because servers fail, usually when you least want 
>> them to, like for example when you've decided to save costs by waiting 
>> before implementing redundancy. Could a failure end up costing you more than 
>> you've saved? I'd rather get cheaper servers (maybe even used off ebay??) so 
>> I could have at least two of them.
>> 
>> If you do go with a one node solution, although I haven't tried it myself, Priam
>> looks like a good place to start for backups; otherwise roll your own with
>> incremental snapshotting turned on and a watch on the snapshot directory. 
>> Storage on something like S3 or Cloud Files is very cheap so there's no good 
>> excuse for no backups.
>> 
>> On Thu, Mar 15, 2012 at 7:12 PM, R. Verlangen  wrote:
>> Hi Drew,
>> 
>> One other disadvantage is the lack of "consistency level" and "replication". 
>> Both are part of the high availability / redundancy. So you would really
>> need to backup your single-node-"cluster" to some other external location.
>> 
>> Good luck!
>> 
>> 
>> 2012/3/15 Drew Kutcharian 
>> Hi,
>> 
>> We are working on a project that initially is going to have very little 
>> data, but we would like to use Cassandra to ease the future scalability. Due 
>> to budget constraints, we were thinking to run a single node Cassandra for 
>> now and then add more nodes as required.
>> 
>> I was wondering if it is recommended to run a single node cassandra in 
>> production? Are there any other issues besides lack of high availability?
>> 
>> Thanks,
>> 
>> Drew
>> 
>> 
>> 
> 
> 
> 
> 
> 
> -- 
> Ben Coverston
> DataStax -- The Apache Cassandra Company
> 
> 



Re: Secondary Index Validation Type Parse Error

2012-03-19 Thread aaron morton
> java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: 
> cannot parse 'subject' as hex bytes
This has to do with the create column family statement...

>   and comparator = 'BytesType'
Tells Cassandra that all column names in this CF should be interpreted as raw
bytes. BytesType expects string input to be hexadecimal-formatted strings.

>   and column_metadata=[{column_name: subject, validation_class: BytesType,  
> index_type: KEYS}]; 

Tells Cassandra to create a secondary index on the column named 'subject'.
Column names will be interpreted as hex however, and 'subject' is not a valid
hex string.

Assuming the column names are not the things you are validating in PHP,
consider changing the comparator to UTF8Type…

create column family subjects
  with column_type = 'Standard'
  and comparator = 'UTF8Type'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and rows_cached = 20.0
  and row_cache_save_period = 0
  and row_cache_keys_to_save = 2147483647
  and keys_cached = 20.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'SerializingCacheProvider'
  and compaction_strategy = 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and column_metadata=[{column_name: subject, validation_class: BytesType,  
index_type: KEYS}];
 

hope that helps.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com


On 19/03/2012, at 2:22 AM, Sam Hodgson wrote:

> Hi, me again - sorry, I've just read that BytesType will expect hex input so my
> question now is how to create a column that will accept non-validated text as
> input?  I think I can maybe get round this by forcing UTF8 encoding regardless
> of whether the string is already identified as UTF8 or not, however it
> seems like I'm missing some fundamental knowledge about Cassandra validation?
> 
> Cheers
> 
> Sam
> 
> From: hodgson_...@hotmail.com
> To: user@cassandra.apache.org
> Subject: Secondary Index Validation Type Parse Error
> Date: Sun, 18 Mar 2012 13:02:10 +
> 
> Hi All,
> 
> Getting the following parse error when trying to create a CF with a secondary 
> index using the bytestype attribute, the index is for a column called 
> 'subject':
> 
> java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: 
> cannot parse 'subject' as hex bytes
> 
> I'm doing all my validation in PHP, however I'm unable to validate some UTF8
> sources accurately (using mb_detect_encoding) - Cass picks up on bits of
> non-UTF8 compatible text that the PHP doesn't, so it's throwing exceptions.
> Figured I'd set everything to BytesType to try and effectively turn off
> validation in Cass?
> 
> Im using the following to try and build the CF:
> 
> create column family subjects
>   with column_type = 'Standard'
>   and comparator = 'BytesType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and rows_cached = 20.0
>   and row_cache_save_period = 0
>   and row_cache_keys_to_save = 2147483647
>   and keys_cached = 20.0
>   and key_cache_save_period = 14400
>   and read_repair_chance = 1.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and row_cache_provider = 'SerializingCacheProvider'
>   and compaction_strategy = 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and column_metadata=[{column_name: subject, validation_class: BytesType,  
> index_type: KEYS}];
> 
> Any help is greatly appreciated! :)
> 
> Cheers
> 
> Sam



Re: consistency level question

2012-03-19 Thread aaron morton
Some information on node failures, consistency levels and availability 
http://thelastpickle.com/2011/06/13/Down-For-Me/

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/03/2012, at 1:08 PM, Watanabe Maki wrote:

> Yes, read and write won't fail with single node failure.
> But your read may return old data.
> 
> maki
> 
> On 2012/03/19, at 1:08, Caleb Rackliffe  wrote:
> 
>> That sounds right to me :)
>> 
>> Caleb Rackliffe | Software Developer 
>> M 949.981.0159 | ca...@steelhouse.com
>> 
>> 
>> From: Tamar Fraenkel 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Sun, 18 Mar 2012 04:20:58 -0400
>> To: "user@cassandra.apache.org" 
>> Subject: Re: consistency level question
>> 
>> Thanks!
>> I updated the replication factor to 2, and now when I took one node down all
>> continued running (I did see Hector complaining about the node being down), but
>> things were saved to the db and read from it.
>> 
>> Just so I understand: now, having a replication factor of 2, if I have 2 out
>> of 3 nodes running, all my reads and writes with CL=1 should work, right?
>> 
>> 
>> Tamar Fraenkel 
>> Senior Software Engineer, TOK Media 
>> 
>> 
>> 
>> ta...@tok-media.com
>> Tel:   +972 2 6409736 
>> Mob:  +972 54 8356490 
>> Fax:   +972 2 5612956 
>> 
>> 
>> 
>> 
>> 
>> On Sun, Mar 18, 2012 at 9:57 AM, Watanabe Maki  
>> wrote:
>> Because your RF is 1, so you need all nodes up.
>> 
>> maki
>> 
>> 
>> On 2012/03/18, at 16:15, Tamar Fraenkel  wrote:
>> 
>>> Hi!
>>> I have a 3 node cassandra cluster.
>>> I use Hector API.
>>> 
>>> I give Hector one of the node's IP addresses
>>> I call setAutoDiscoverHosts(true) and setRunAutoDiscoveryAtStartup(true).
>>> 
>>> The describe on one node returns:
>>> 
>>> Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>>   Durable Writes: true
>>> Options: [replication_factor:1]
>>> 
>>> The odd thing is that when I take one of the nodes down, expecting all to 
>>> continue running smoothly, I get exceptions of the format seen below, and
>>> no read or write succeeds. When I bring the node back up, exceptions stop 
>>> and read and write resumes.
>>> 
>>> Any idea or explanation why this is the case?
>>> Thanks!
>>> 
>>> 
>>> me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be 
>>> enough replicas present to handle consistency level.
>>> at 
>>> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:66)
>>> at 
>>> me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:285)
>>> at 
>>> me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:268)
>>> at 
>>> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
>>> at 
>>> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
>>> at 
>>> me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
>>> at 
>>> me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289)
>>> at 
>>> me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
>>> at 
>>> me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
>>> at 
>>> me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>>> at 
>>> me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
>>> at 
>>> me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
>>> at 
>>> me.prettyprint.cassandra.service.ColumnSliceIterator.hasNext(ColumnSliceIterator.java:60)
>>> at 
>>> 
>>> 
>>> Tamar Fraenkel 
>>> Senior Software Engineer, TOK Media 
>>> 
>>> 
>>> 
>>> 
>>> ta...@tok-media.com
>>> Tel:   +972 2 6409736 
>>> Mob:  +972 54 8356490 
>>> Fax:   +972 2 5612956 
>>> 
>>> 
>>> 
>> 



repair broke TTL based expiration

2012-03-19 Thread Radim Kolar
I suspect that running cluster-wide repair interferes with TTL-based
expiration. I am running repair every 7 days and using a TTL expiration
time of 7 days too. Data are never deleted.
Stored data in cassandra are always growing (I have been watching them for 3
months) but they should not be. If I run a manual cleanup, some data are
deleted, but just about 5%. Currently there are about 3-5 times more rows than
I estimate.
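
(For reference, a minimal Hector (Java) sketch of writing a column with a 7-day TTL;
the keyspace, row key and column names are hypothetical, only the CF name is taken
from this thread:)

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class TtlWriteSketch {
    private static final int SEVEN_DAYS_IN_SECONDS = 7 * 24 * 60 * 60;   // TTL is given in seconds

    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("Demo", cluster);

        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        // the column is written with a TTL and expires 7 days after the write
        mutator.insert("row1", "resultcache",
                HFactory.createColumn("col1", "value1", SEVEN_DAYS_IN_SECONDS,
                        StringSerializer.get(), StringSerializer.get()));
    }
}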


I suspect that running repair on data with TTL can cause:

1. the time check for expired records is ignored and these data are streamed
to the other node, where they will be alive again

 or
2. streamed data are propagated with the full TTL. Let's say that I have a TTL
of 7 days and data are stored for 5 days and then repaired; they should be
sent to the other node with a TTL of 2 days, not 7.


Can someone test this case? I cannot play with the production
cluster too much.


Another consistency level problem

2012-03-19 Thread Everton Lima
Hello people, I was having the following problem:

I was running a single node of Cassandra, using Cassandra's consistency
level ALL. My program (in Java) is B-Tree like, and a node stores how many
children it has. On every update my app does, like an insert of a new
child, it does a commit, saving the update in the commitlog.  But the problem
is: when I try to recover the value of how many children a node has (when
it should answer 0) it answers 4 or 5 (not deterministic). The interesting
thing is that if I put in a Thread.sleep(1), it works right.

Does someone know why this happens? And a way to fix it?

Thanks in advance.

-- 

Everton Lima Aleixo
Bacharel em Ciencia da Computação
Universidade Federal de Goiás


Get few rows by composite key.

2012-03-19 Thread Michael Cherkasov
Hello,
Assume that we have table like this one:

Key:Columns names:
AA:AA 1:A 1:B 1:C 2:A 2:C
AA:BB 1:C 2:A 2:C
AA:CC 2:A 2:C
AA:DD 1:A 1:B 1:C
BB:AA 1:A 1:B 2:C
BB:BB 1:A 1:B 1:C 2:C
BB:CC 1:A  2:A 2:C
BB:DD 1:A  1:C 2:A 2:C

Is there any way to get rows whose first key part equals AA and whose second
part is greater than or equal to BB?
I'm interested in Hector code.


Max # of CFs

2012-03-19 Thread A J
How many Column Families are one too many for Cassandra ?
I created a db with 5000 CFs (I can go into the reasons later) but the
latency seems to be very erratic now. Not sure if it is because of the
number of CFs.

Thanks.


Re: Get few rows by composite key.

2012-03-19 Thread Michael Cherkasov
Also one more question:
Can someone show a query that will fetch all rows matching this condition:
( AA:(part2 >= BB) ) or ( key == (BB:CC) )?

2012/3/19 Michael Cherkasov 

> Hello,
> Assume that we have table like this one:
>
> Key:Columns names:
> AA:AA 1:A 1:B 1:C 2:A 2:C
> AA:BB 1:C 2:A 2:C
> AA:CC 2:A 2:C
> AA:DD 1:A 1:B 1:C
> BB:AA 1:A 1:B 2:C
> BB:BB 1:A 1:B 1:C 2:C
> BB:CC 1:A  2:A 2:C
> BB:DD 1:A  1:C 2:A 2:C
>
> Is there any way to take rows with first key's part equals AA and second
> more or equal BB?
> I'm interesting about Hector code.
>


RE: Get few rows by composite key.

2012-03-19 Thread Stephen Pope
I'm not sure about Hector code (somebody else can chime in here), but to find 
the keys you're after you can slice to get the keys from AA:BB to BB:AA.

Cheers,
Steve
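
(A rough Hector (Java) sketch of that kind of key slice, assuming the row keys are a
CompositeType of two UTF8 components and, importantly, an order-preserving/ByteOrdered
partitioner -- under RandomPartitioner key ranges do not come back in key order; the
cluster, keyspace and CF names are made up:)

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class CompositeKeySliceSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("Demo", cluster);
        CompositeSerializer keySerializer = new CompositeSerializer();

        Composite start = new Composite();            // AA:BB
        start.addComponent("AA", StringSerializer.get());
        start.addComponent("BB", StringSerializer.get());

        Composite end = new Composite();              // BB:AA
        end.addComponent("BB", StringSerializer.get());
        end.addComponent("AA", StringSerializer.get());

        RangeSlicesQuery<Composite, String, String> query = HFactory.createRangeSlicesQuery(
                keyspace, keySerializer, StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("MyCF");
        query.setKeys(start, end);                    // key range AA:BB .. BB:AA
        query.setRange(null, null, false, 100);       // up to 100 columns per row
        query.setRowCount(100);
        OrderedRows<Composite, String, String> rows = query.execute().get();
        System.out.println("rows returned: " + rows.getCount());
    }
}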

From: Michael Cherkasov [mailto:michael.cherka...@gmail.com]
Sent: Monday, March 19, 2012 9:30 AM
To: user@cassandra.apache.org
Subject: Get few rows by composite key.

Hello,
Assume that we have table like this one:

Key:Columns names:
AA:AA 1:A 1:B 1:C 2:A 2:C
AA:BB 1:C 2:A 2:C
AA:CC 2:A 2:C
AA:DD 1:A 1:B 1:C
BB:AA 1:A 1:B 2:C
BB:BB 1:A 1:B 1:C 2:C
BB:CC 1:A  2:A 2:C
BB:DD 1:A  1:C 2:A 2:C

Is there any way to take rows with first key's part equals AA and second more 
or equal BB?
I'm interesting about Hector code.


RE: Get few rows by composite key.

2012-03-19 Thread Stephen Pope
Those are going to have to be separate queries, since the first is a slice, and 
the second is a fetch.

Cheers,
Steve

From: Michael Cherkasov [mailto:michael.cherka...@gmail.com]
Sent: Monday, March 19, 2012 9:41 AM
To: user@cassandra.apache.org
Subject: Re: Get few rows by composite key.

Also one more question:
Can someone show query that will fetch all rows match to this condition:  ( 
AA:(part 2>= BB)  ) or ( key == (BB:CC) )
2012/3/19 Michael Cherkasov 
mailto:michael.cherka...@gmail.com>>
Hello,
Assume that we have table like this one:

Key:Columns names:
AA:AA 1:A 1:B 1:C 2:A 2:C
AA:BB 1:C 2:A 2:C
AA:CC 2:A 2:C
AA:DD 1:A 1:B 1:C
BB:AA 1:A 1:B 2:C
BB:BB 1:A 1:B 1:C 2:C
BB:CC 1:A  2:A 2:C
BB:DD 1:A  1:C 2:A 2:C

Is there any way to take rows with first key's part equals AA and second more 
or equal BB?
I'm interesting about Hector code.



Re: Max # of CFs

2012-03-19 Thread Alain RODRIGUEZ
This subject was already discussed; this may help you:
http://markmail.org/message/6dybhww56bxvufzf#query:+page:1+mid:6dybhww56bxvufzf+state:results


If you still have questions after reading this thread or some others about
the same topic, do not hesitate to ask again,

Alain

2012/3/19 A J 

> How many Column Families are one too many for Cassandra ?
> I created a db with 5000 CFs (I can go into the reasons later) but the
> latency seems to be very erratic now. Not sure if it is because of the
> number of CFs.
>
> Thanks.
>


Mutator or Template?

2012-03-19 Thread Tamar Fraenkel
Hi!
I am using Cassandra with Hector. Usually I use ColumnFamilyTemplate and
ColumnFamilyUpdater to update column families, but sometimes I use Mutator.

1. Is there a preference of using one vs. the other?
2. Are there any actions that can be done with only one of them?
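
(For comparison, a minimal Hector (Java) sketch of the same single-column write done
both ways; the cluster, keyspace, CF and column names are hypothetical:)

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.template.ColumnFamilyTemplate;
import me.prettyprint.cassandra.service.template.ColumnFamilyUpdater;
import me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class TemplateVsMutatorSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("Demo", cluster);

        // 1. ColumnFamilyTemplate / ColumnFamilyUpdater
        ColumnFamilyTemplate<String, String> template =
                new ThriftColumnFamilyTemplate<String, String>(keyspace, "Users",
                        StringSerializer.get(), StringSerializer.get());
        ColumnFamilyUpdater<String, String> updater = template.createUpdater("user1");
        updater.setString("email", "user1@example.com");
        template.update(updater);

        // 2. Mutator -- several insertions/deletions can be batched into one execute()
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.addInsertion("user1", "Users",
                HFactory.createStringColumn("email", "user1@example.com"));
        mutator.execute();
    }
}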

Thanks,


*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: cassandra-cli and "unreachable" status confusion

2012-03-19 Thread aaron morton
There is a server-side check to ensure that all available nodes share the same
schema version.

The migration will proceed, though, if any of the nodes are unavailable.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/03/2012, at 11:07 AM, Shoaib Mir wrote:

> Hi guys,
> 
> While creating the schema on our cluster today I didn't get any errors even when
> some of the hosts in the cluster were unreachable (not the ones in the same
> data centre but in another region). The cli kept on showing all nodes agreeing.
> 
> Now after this, when I did "describe cluster" I did get appropriate
> "unreachable" messages for the nodes that were timing out on connections.
> 
> Can someone please explain whether, at the time of schema creation, a node
> just talks to other nodes within its DC for agreement, or whether it has to
> talk to each and every node within the whole cluster before agreeing on schema
> changes?
> 
> cheers,
> Shoaib



Re: Token Ring Gaps in a 2 DC Setup

2012-03-19 Thread aaron morton
>  I've also run repair on a few nodes in both data centers, but the sizes are 
> still vastly different.
If repair is completing on all the nodes then the data is fully distributed. 

If you want to dig around…

Take a look at the data files on disk. Do the nodes in DC 1 have some larger,
older, data files ? These may be waiting for compaction to catch up with them.

If you have done any token moves, did you run cleanup afterwards ?


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/03/2012, at 8:35 PM, Caleb Rackliffe wrote:

> More detail…
> 
> I'm running 1.0.7 on these boxes, and the keyspace readout from the CLI looks 
> like this:
> 
> create keyspace Users
>   with placement_strategy = 'NetworkTopologyStrategy'
>   and strategy_options = {DC2 : 1, DC1 : 2}
>   and durable_writes = true;
> 
> Thanks!
> 
> Caleb Rackliffe | Software Developer  
> M 949.981.0159 | ca...@steelhouse.com
> 
> From: Caleb Rackliffe 
> Date: Sun, 18 Mar 2012 02:47:05 -0400
> To: "user@cassandra.apache.org" 
> Subject: Token Ring Gaps in a 2 DC Setup
> 
> Hi Everyone,
> 
> I have a cluster using NetworkTopologyStrategy that looks like this:
> 
> 10.41.116.22 DC1 RAC1 Up Normal  13.21 GB
> 10.00%  0   
> 10.54.149.202   DC2 RAC1 Up Normal  6.98 GB
> 0.00%   1   
> 10.41.116.20 DC1 RAC2 Up Normal  12.75 GB
> 10.00%  1701411830  
> 10.41.116.16 DC1 RAC3 Up Normal  12.62 GB
> 10.00%  3402823670  
> 10.54.149.203   DC2 RAC1 Up Normal  6.7 GB  
> 0.00%   3402823671  
> 10.41.116.18 DC1 RAC4 Up Normal  10.8 GB  
> 10.00%  5104235500  
> 10.41.116.14 DC1 RAC5 Up Normal  10.27 GB
> 10.00%  6805647340  
> 10.54.149.204   DC2 RAC1 Up Normal  6.7 GB 
> 0.00%   6805647341  
> 10.41.116.12 DC1 RAC6 Up Normal  10.58 GB
> 10.00%  8507059170  
> 10.41.116.10 DC1 RAC7 Up Normal  10.89 GB
> 10.00%  10208471000 
> 10.54.149.205   DC2 RAC1 Up Normal  7.51 GB   
> 0.00%   10208471001 
> 10.41.116.8   DC1 RAC8  Up Normal  10.48 GB
> 10.00%  11909882800 
> 10.41.116.24 DC1 RAC9 Up Normal  10.89 GB
> 10.00%  13611294700 
> 10.54.149.206   DC2 RAC1 Up Normal  6.37 GB   
> 0.00%   13611294701 
> 10.41.116.26 DC1 RAC10   Up Normal  11.17 GB
> 10.00%  15312706500
> 
> There are two data centers, one with 10 nodes/2 replicas and one with 5 
> nodes/1 replica.  What I've attempted to do with my token assignments is have 
> each node in the smaller DC handle 20% of the keyspace, and this would mean 
> that I should see roughly equal usage on all 15 boxes.  It just doesn't seem 
> to be happening that way, though.  It looks like the "1 replica" nodes are 
> carrying about half the data the "2 replica" nodes are.  It's almost as if 
> those nodes are only handling 10% of the keyspace instead of 20%.
> 
> Does anybody have any suggestions as to what might be going on?  I've run 
> nodetool getendpoints against a bunch of keys, and I always get back three 
> nodes, so I'm pretty confused.  I've also run repair on a few nodes in both 
> data centers, but the sizes are still vastly different.
> 
> Thanks!
> 
> Caleb Rackliffe | Software Developer  
> M 949.981.0159 | ca...@steelhouse.com



Hector counter question

2012-03-19 Thread Tamar Fraenkel
Hi!

Is there a way to read and increment counter column atomically, something
like incrementAndGet (Hector)?

Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: Token Ring Gaps in a 2 DC Setup

2012-03-19 Thread Caleb Rackliffe
Hey Aaron,

I've run cleanup jobs across all 15 nodes, and after that, I still have about a 
24 million to 15 million key ratio between the data centers.  The first DC is a 
few months older than the second, and it also began its life before 1.0.7 was 
out, whereas the second started at 1.0.7.  I wonder if running an
upgradesstables would be interesting?

Also, while I was digging around, I noticed that we do a LOT of reads
immediately after writes, and almost every read from the first DC was bringing 
a read-repair along with it.  (Possibly because the distant DC had not yet 
received certain mutations?)  I ended up turning RR off entirely, since I've 
got HH in place to handle short-duration failures :)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: aaron morton mailto:aa...@thelastpickle.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Mon, 19 Mar 2012 13:34:38 -0400
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Token Ring Gaps in a 2 DC Setup

 I've also run repair on a few nodes in both data centers, but the sizes are 
still vastly different.
If repair is completing on all the nodes then the data is fully distributed.

If you want to dig around…

Take a look at the data files on disk. Do the nodes in DC 1 have some larger, 
older, data files ? These may be waiting for compaction to catch up them.

If you have done any toke moves, did you run cleanup afterwards ?


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/03/2012, at 8:35 PM, Caleb Rackliffe wrote:

More detail…

I'm running 1.0.7 on these boxes, and the keyspace readout from the CLI looks 
like this:

create keyspace Users
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 2}
  and durable_writes = true;

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Caleb Rackliffe mailto:ca...@steelhouse.com>>
Date: Sun, 18 Mar 2012 02:47:05 -0400
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Token Ring Gaps in a 2 DC Setup

Hi Everyone,

I have a cluster using NetworkTopologyStrategy that looks like this:

10.41.116.22 DC1 RAC1 Up Normal  13.21 GB10.00% 
 0
10.54.149.202   DC2 RAC1 Up Normal  6.98 GB
0.00%   1
10.41.116.20 DC1 RAC2 Up Normal  12.75 GB10.00% 
 1701411830
10.41.116.16 DC1 RAC3 Up Normal  12.62 GB10.00% 
 3402823670
10.54.149.203   DC2 RAC1 Up Normal  6.7 GB  
0.00%   3402823671
10.41.116.18 DC1 RAC4 Up Normal  10.8 GB  
10.00%  5104235500
10.41.116.14 DC1 RAC5 Up Normal  10.27 GB10.00% 
 6805647340
10.54.149.204   DC2 RAC1 Up Normal  6.7 GB 
0.00%   6805647341
10.41.116.12 DC1 RAC6 Up Normal  10.58 GB10.00% 
 8507059170
10.41.116.10 DC1 RAC7 Up Normal  10.89 GB10.00% 
 10208471000
10.54.149.205   DC2 RAC1 Up Normal  7.51 GB   0.00% 
  10208471001
10.41.116.8   DC1 RAC8  Up Normal  10.48 GB
10.00%  11909882800
10.41.116.24 DC1 RAC9 Up Normal  10.89 GB10.00% 
 13611294700
10.54.149.206   DC2 RAC1 Up Normal  6.37 GB   0.00% 
  13611294701
10.41.116.26 DC1 RAC10   Up Normal  11.17 GB10.00%  
15312706500

There are two data centers, one with 10 nodes/2 replicas and one with 5 nodes/1 
replica.  What I've attempted to do with my token assignments is have each node 
in the smaller DC handle 20% of the keyspace, and this would mean that I should 
see roughly equal usage on all 15 boxes.  It just doesn't seem to be happening 
that way, though.  It looks like the "1 replica" nodes are carrying about half 
the data the "2 replica" nodes are.  It's almost as if those nodes are only 
handling 10% of the keyspace instead of 20%.

Does anybody have any suggestions as to what might be going on?  I've run 
nodetool getendpoints against a bunch of keys, and I always get back three 
nodes, so I'm pretty confused.  I've also run repair on a few nodes in both 
data centers, but the sizes are

RE: Hector counter question

2012-03-19 Thread Jeremiah Jordan
No,
Cassandra doesn't support atomic counters.  IIRC it is on the list of things 
for 1.2.

-Jeremiah
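
(A minimal Hector (Java) sketch showing that the increment and the read-back are two
separate calls -- there is no single read-and-increment operation; assuming a counter
CF named "Counters" with CounterColumnType validation, and hypothetical key/column
names:)

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.CounterQuery;

public class CounterSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("Demo", cluster);

        // 1. increment: applied as a server-side delta, no value is returned
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.incrementCounter("row1", "Counters", "page_views", 1L);

        // 2. read back the current value -- a separate, non-atomic step
        CounterQuery<String, String> query = HFactory.createCounterColumnQuery(
                keyspace, StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("Counters").setKey("row1").setName("page_views");
        HCounterColumn<String> column = query.execute().get();
        long value = (column == null) ? 0L : column.getValue();
        System.out.println("page_views = " + value);
    }
}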


From: Tamar Fraenkel [ta...@tok-media.com]
Sent: Monday, March 19, 2012 1:26 PM
To: cassandra-u...@incubator.apache.org
Subject: Hector counter question

Hi!

Is there a way to read and increment counter column atomically, something like 
incrementAndGet (Hector)?

Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956




Re: repair broke TTL based expiration

2012-03-19 Thread igor
Hello

Data size should decrease during minor compactions. Check the logs for
compaction results.

 



-Original Message-
From: Radim Kolar 
To: user@cassandra.apache.org
Sent: Mon, 19 Mar 2012 12:16
Subject: repair broke TTL based expiration

I suspect that running cluster wide repair interferes with TTL based 
expiration. I am running repair every 7 days and using TTL expiration 
time 7 days too. Data are never deleted.
Stored data in cassandra are always growing (watching them for 3 months) 
but they should not. If i run manual cleanup, some data are deleted but 
just about 5%. Currently there are about 3-5 times more rows then i 
estimate.

I suspect that running repair on data with TTL can cause:

1. time check for expired records is ignored and these data are streamed 
to other node and they will be alive again
  or
2. streaming data are propagated with full TTL. Lets say that i have ttl 
7 days, data are stored for 5 days and then repaired, they should be 
sent to other node with ttl 2 days not 7.

Can someone do testing on this case? I could not play with production 
cluster too much.


Re: repair broke TTL based expiration

2012-03-19 Thread Caleb Rackliffe
I've been wondering about this too, but every column has both a timestamp and a 
TTL.  Unless the timestamp is not preserved, there should be no need to adjust 
the TTL, assuming the expiration time is determined from these two variables.

Does that make sense?

My question is how often Cassandra checks for TTL expirations.  Does it happen 
at compaction time? Some other time?


Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: "i...@4friends.od.ua" 
mailto:i...@4friends.od.ua>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Mon, 19 Mar 2012 15:28:40 -0400
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: repair broke TTL based expiration


Hello

Datasize should decrease during minor compactions. Check logs for compactions 
results.





-Original Message-
From: Radim Kolar mailto:h...@filez.com>>
To: user@cassandra.apache.org
Sent: Mon, 19 Mar 2012 12:16
Subject: repair broke TTL based expiration


I suspect that running cluster wide repair interferes with TTL based
expiration. I am running repair every 7 days and using TTL expiration
time 7 days too. Data are never deleted.
Stored data in cassandra are always growing (watching them for 3 months)
but they should not. If i run manual cleanup, some data are deleted but
just about 5%. Currently there are about 3-5 times more rows then i
estimate.

I suspect that running repair on data with TTL can cause:

1. time check for expired records is ignored and these data are streamed
to other node and they will be alive again
  or
2. streaming data are propagated with full TTL. Lets say that i have ttl
7 days, data are stored for 5 days and then repaired, they should be
sent to other node with ttl 2 days not 7.

Can someone do testing on this case? I could not play with production
cluster too much.

replication in a 3 data center setup

2012-03-19 Thread Alexandru Sicoe
Hi everyone,

If you have 3 data centers (DC1,DC2 and DC3) with 3 nodes each and you have
a keyspace where the strategy options are such that each DC gets 2
replicas. If you only write to the nodes in DC1 what is the path the
replicas take? Assuming you've correctly interleaved the tokens of all the
nodes [(DC1: x,y,z), (DC2:x+1,y+1,z+1), (DC3:x+2,y+2,z+2)]?

More exactly, if you write a record in a node in DC1 will it send one
replica of it to DC2 and then another replica to DC3? Or will the node in
DC2 replicate the record to DC3 in a chain effect?

I understand that each DC handles its own internal replication (after a
node receives one replica).

I need to understand this because the connection between DC1 and DC2/DC3 is
limited and ideally I would only want to send a replica to DC2 and have DC2
send a replica to DC3. Is this possible?

Cheers,
Alex


Re: repair broke TTL based expiration

2012-03-19 Thread Radim Kolar

Dne 19.3.2012 20:28, i...@4friends.od.ua napsal(a):


Hello

Datasize should decrease during minor compactions. Check logs for 
compactions results.



They do, but not as much as I expect. Look at the sizes and file dates:

-rw-r--r--  1 root  wheel   5.4G Feb 23 17:03 resultcache-hc-27045-Data.db
-rw-r--r--  1 root  wheel   6.4G Feb 23 17:11 resultcache-hc-27047-Data.db
-rw-r--r--  1 root  wheel   5.5G Feb 25 06:40 resultcache-hc-27167-Data.db
-rw-r--r--  1 root  wheel   2.2G Mar  2 05:03 resultcache-hc-27323-Data.db
-rw-r--r--  1 root  wheel   2.0G Mar  5 09:15 resultcache-hc-27542-Data.db
-rw-r--r--  1 root  wheel   2.2G Mar 12 23:24 resultcache-hc-27791-Data.db
-rw-r--r--  1 root  wheel   468M Mar 15 03:27 resultcache-hc-27822-Data.db
-rw-r--r--  1 root  wheel   483M Mar 16 05:23 resultcache-hc-27853-Data.db
-rw-r--r--  1 root  wheel    53M Mar 17 05:33 resultcache-hc-27901-Data.db
-rw-r--r--  1 root  wheel   485M Mar 17 09:37 resultcache-hc-27930-Data.db
-rw-r--r--  1 root  wheel   480M Mar 19 00:45 resultcache-hc-27961-Data.db
-rw-r--r--  1 root  wheel    95M Mar 19 09:35 resultcache-hc-27967-Data.db
-rw-r--r--  1 root  wheel    98M Mar 19 17:04 resultcache-hc-27973-Data.db
-rw-r--r--  1 root  wheel    19M Mar 19 18:23 resultcache-hc-27974-Data.db
-rw-r--r--  1 root  wheel    19M Mar 19 19:50 resultcache-hc-27975-Data.db
-rw-r--r--  1 root  wheel    19M Mar 19 21:17 resultcache-hc-27976-Data.db
-rw-r--r--  1 root  wheel    19M Mar 19 22:05 resultcache-hc-27977-Data.db

I insert everything with a 7-day TTL + 10 days tombstone expiration.  This
means that in the ideal case there should be nothing older than Mar 2.


These 3 x 5 GB files wait to be compacted. Because they contain only
tombstones, cassandra should make some optimizations - mark the sstable as
tombstone-only, remember the time of the latest tombstone and delete the
entire sstable without needing to merge it first.


1. The question is why create a tombstone after row expiration at all, because
it will expire on all cluster nodes at the same time without needing to be deleted.
2. It's a super column family. When I dump the oldest sstable, I wonder why it
looks like this:


{
"772c61727469636c65736f61702e636f6d": {},
"7175616b652d34": {"1": {"deletedAt": -9223372036854775808, 
"subColumns": [["crc32","4f34455c",1328220892597002,"d"], 
["id","4f34455c",1328220892597000,"d"], 
["name","4f34455c",1328220892597001,"d"], 
["size","4f34455c",1328220892597003,"d"]]}, "2": {"deletedAt": 
-9223372036854775808, "subColumns": 
[["crc32","4f34455c",1328220892597007,"d"], 
["id","4f34455c",1328220892597005,"d"], 
["name","4f34455c",1328220892597006,"d"], 
["size","4f34455c",1328220892597008,"d"]]}, "3": {"deletedAt": 
-9223372036854775808, "subColumns":


* All subcolumns are deleted. Why keep their names in the table? Isn't
marking the column as deleted, i.e. "1": {"deletedAt":
-9223372036854775808}, enough?
* Another question is why the entire row was not tombstoned, because all its
members were expired.


Re: repair broke TTL based expiration

2012-03-19 Thread Radim Kolar

Dne 19.3.2012 21:46, Caleb Rackliffe napsal(a):
I've been wondering about this too, but every column has both a 
timestamp /and/ a TTL.  Unless the timestamp is not preserved, there 
should be no need to adjust the TTL, assuming the expiration time is 
determined from these two variables.
The timestamp is application defined; it can be anything. The expire time is
recorded into the sstable in node-local time.
Another question is why store the original TTL? I don't think that it is
that useful to read it back. It would be enough to read the expire time.


Re: cassandra-cli and "unreachable" status confusion

2012-03-19 Thread Shoaib Mir
On Tue, Mar 20, 2012 at 4:18 AM, aaron morton wrote:

> There is a server side check to ensure that all available nodes share the
> same schema version.
>
>
Is that checked using "describe cluster" ??

cheers,
Shoaib


Re: repair broke TTL based expiration

2012-03-19 Thread ruslan usifov
Do you run major compaction?

2012/3/19 Radim Kolar :
> I suspect that running cluster wide repair interferes with TTL based
> expiration. I am running repair every 7 days and using TTL expiration time 7
> days too. Data are never deleted.
> Stored data in cassandra are always growing (watching them for 3 months) but
> they should not. If i run manual cleanup, some data are deleted but just
> about 5%. Currently there are about 3-5 times more rows then i estimate.
>
> I suspect that running repair on data with TTL can cause:
>
> 1. time check for expired records is ignored and these data are streamed to
> other node and they will be alive again
>  or
> 2. streaming data are propagated with full TTL. Lets say that i have ttl 7
> days, data are stored for 5 days and then repaired, they should be sent to
> other node with ttl 2 days not 7.
>
> Can someone do testing on this case? I could not play with production
> cluster too much.


Re: repair broke TTL based expiration

2012-03-19 Thread Radim Kolar

Dne 19.3.2012 23:33, ruslan usifov napsal(a):

Do you make major compaction??

No, I do cleanups only. Major compaction kills my node with OOM.


Re: repair broke TTL based expiration

2012-03-19 Thread igor
You can try to play with compaction thresholds - it looks like your data waits too
long before size-tiered compaction starts to merge old large sstables. I have the
same scenario as you (no deletes, all data with TTL) and I use a script which
calls userDefinedCompaction on these old sstables.

-Original Message-
From: Radim Kolar 
To: user@cassandra.apache.org
Sent: Mon, 19 Mar 2012 23:48
Subject: Re: repair broke TTL based expiration

Dne 19.3.2012 20:28, i...@4friends.od.ua napsal(a):
>
> Hello
>
> Datasize should decrease during minor compactions. Check logs for 
> compactions results.
>
they do but not as much as i expect. Look at sizes and file dates:

-rw-r--r--  1 root  wheel   5.4G Feb 23 17:03 resultcache-hc-27045-Data.db
-rw-r--r--  1 root  wheel   6.4G Feb 23 17:11 resultcache-hc-27047-Data.db
-rw-r--r--  1 root  wheel   5.5G Feb 25 06:40 resultcache-hc-27167-Data.db
-rw-r--r--  1 root  wheel   2.2G Mar  2 05:03 resultcache-hc-27323-Data.db
-rw-r--r--  1 root  wheel   2.0G Mar  5 09:15 resultcache-hc-27542-Data.db
-rw-r--r--  1 root  wheel   2.2G Mar 12 23:24 resultcache-hc-27791-Data.db
-rw-r--r--  1 root  wheel   468M Mar 15 03:27 resultcache-hc-27822-Data.db
-rw-r--r--  1 root  wheel   483M Mar 16 05:23 resultcache-hc-27853-Data.db
-rw-r--r--  1 root  wheel    53M Mar 17 05:33 resultcache-hc-27901-Data.db
-rw-r--r--  1 root  wheel   485M Mar 17 09:37 resultcache-hc-27930-Data.db
-rw-r--r--  1 root  wheel   480M Mar 19 00:45 resultcache-hc-27961-Data.db
-rw-r--r--  1 root  wheel    95M Mar 19 09:35 resultcache-hc-27967-Data.db
-rw-r--r--  1 root  wheel    98M Mar 19 17:04 resultcache-hc-27973-Data.db
-rw-r--r--  1 root  wheel    19M Mar 19 18:23 resultcache-hc-27974-Data.db
-rw-r--r--  1 root  wheel    19M Mar 19 19:50 resultcache-hc-27975-Data.db
-rw-r--r--  1 root  wheel    19M Mar 19 21:17 resultcache-hc-27976-Data.db
-rw-r--r--  1 root  wheel    19M Mar 19 22:05 resultcache-hc-27977-Data.db

I insert everything with 7days TTL + 10 days tombstone expiration.  This 
means that there should not be in ideal case nothing older then Mar 2.

These 3x5 GB files waits to be compacted. Because they contains only 
tombstones, cassandra should make some optimalizations - mark sstable as 
tombstone only, remember time of latest tombstone and delete entire 
sstable without need to merge it first.

1. Question is why create tombstone after row expiration at all, because 
it will expire at all cluster nodes at same time without need to be deleted.
2. Its super column family. When i dump oldest sstable, i wonder why it 
looks like this:

{
"772c61727469636c65736f61702e636f6d": {},
"7175616b652d34": {"1": {"deletedAt": -9223372036854775808, 
"subColumns": [["crc32","4f34455c",1328220892597002,"d"], 
["id","4f34455c",1328220892597000,"d"], 
["name","4f34455c",1328220892597001,"d"], 
["size","4f34455c",1328220892597003,"d"]]}, "2": {"deletedAt": 
-9223372036854775808, "subColumns": 
[["crc32","4f34455c",1328220892597007,"d"], 
["id","4f34455c",1328220892597005,"d"], 
["name","4f34455c",1328220892597006,"d"], 
["size","4f34455c",1328220892597008,"d"]]}, "3": {"deletedAt": 
-9223372036854775808, "subColumns":

* all subcolums are deleted. why to keep their names in table? isnt 
marking column as deleted enough? "1": {"deletedAt": 
-9223372036854775808"} enough?
* another question is why was not tombstone entire row, because all its 
members were expired.


Storing Counters in Hive

2012-03-19 Thread Sunit Randhawa
I am trying to store Counters CF from cassandra to Hive. Below is the
CREATE TABLE syntax in Hive:

DROP TABLE IF EXISTS Counters;
create external table Counters(row_key string, column_name string, value
string)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key,:column,:value",
  "cassandra.ks.name" = "BAMSchema",
  "cassandra.ks.repfactor" = "1",
  "cassandra.ks.strategy" = "org.apache.cassandra.locator.SimpleStrategy",
  "cassandra.cf.name" = "Counters" ,
  "cassandra.host" = "127.0.0.1" ,
  "cassandra.port" = "9160",
  "cassandra.partitioner" = "org.apache.cassandra.dht.RandomPartitioner")
TBLPROPERTIES (
  "cassandra.input.split.size" = "64000",
  "cassandra.range.size" = "1000",
  "cassandra.slice.predicate.size" = "1000");

and Counter CF is defined as :

create column family Counters
with comparator = UTF8Type
and default_validation_class=CounterColumnType
  and replicate_on_write=true;


I am not able to import the counter value into Hive. I am getting the other
fields (row_key and column_name) properly.


Below is the output from Hive:

hive> select * from Counters;
OK
213_debit_1326691-sess_countd
213_debit_1326691-total_db_time
213_debit_1326691-total_exec_time
213_debit_1326691-txn_count
213_debit_1326692-sess_count
213_debit_1326692-total_db_time
213_debit_1326692-total_exec_time
213_debit_1326692-txn_count
Time taken: 0.263 seconds


Below is output from Cassandra:

[default@BAMSchema] list Counters;
Using default limit of 100
---
RowKey: 213_debit_132669
=> (counter=1-sess_count, value=100)
=> (counter=1-total_db_time, value=20)
=> (counter=1-total_exec_time, value=30)
=> (counter=1-txn_count, value=1)
=> (counter=2-sess_count, value=30)
=> (counter=2-total_db_time, value=30)
=> (counter=2-total_exec_time, value=30)
=> (counter=2-txn_count, value=1)


As you can see, a junk letter "d" is getting added in Hive when the import
happens from Cassandra to Hive. Wondering what I am missing.

Thanks for your help!


Re: repair broke TTL based expiration

2012-03-19 Thread ruslan usifov
Cleanup in your case doesn't make any sense. You write that repair works
for you, so you can stop the cassandra daemon, delete all data from the folder
that contains the problem data, start the cassandra daemon, and run nodetool
repair. But in this case you must have a replication factor > 3 for the
keyspace and use consistency level QUORUM for data manipulation.

2012/3/20 Radim Kolar :
> Dne 19.3.2012 23:33, ruslan usifov napsal(a):
>
>> Do you make major compaction??
>
> no, i do cleanups only. Major compactions kills my node with OOM.



Cassandra as Database for Role Based Access Control System

2012-03-19 Thread Maciej Miklas
Hi *,

I would like to know your opinion about using Cassandra to implement an
RBAC-like authentication & authorization model. We have simplified the
central relationship of the general model (
http://en.wikipedia.org/wiki/Role-based_access_control) to:

user ---n:m--- role ---n:m--- resource

user(s) and resource(s) are indexed with externally visible identifiers.
These identifiers need to be "re-ownable" (think: mail aliases), too.

The main reason to consider Cassandra is the availability, scalability and
(global) geo-redundancy. This is hard to achieve with an RDBMS.

On the other side, RBAC has many m:n relations. While some inconsistencies
may be acceptable, resource ownership (i.e. role=owner) must never ever be
mixed up.

What do you think? Is such relational model an antipattern for Cassandra
usage? Do you know similar solutions based on Cassandra?


Regards,

Maciej


ps. I've posted this question also on stackoverflow, but I would like to
also get feedback from Cassandra community.


Re: Hector counter question

2012-03-19 Thread Tamar Fraenkel
Thanks.
But the increment is thread-safe, right? If I have two threads trying to
increment a counter, they won't step on each other's toes?


*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, Mar 19, 2012 at 9:05 PM, Jeremiah Jordan <
jeremiah.jor...@morningstar.com> wrote:

>  No,
> Cassandra doesn't support atomic counters.  IIRC it is on the list of
> things for 1.2.
>
> -Jeremiah
>
>  --
> *From:* Tamar Fraenkel [ta...@tok-media.com]
> *Sent:* Monday, March 19, 2012 1:26 PM
> *To:* cassandra-u...@incubator.apache.org
> *Subject:* Hector counter question
>
>   Hi!
>
>  Is there a way to read and increment counter column atomically,
> something like incrementAndGet (Hector)?
>
>  Thanks,
>
>  *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>