Re: Single Node Cassandra Installation

2012-03-17 Thread R. Verlangen
" By default Cassandra tries to write to both nodes, always. Writes will
only fail (on a node) if it is down, and even then hinted handoff will
attempt to keep both nodes in sync when the troubled node comes back up.
The point of having two nodes is to have read and write availability in the
face of transient failure. "

Even more: if you enable read repair, each read also repairs stale
replicas, so the chance of a later read seeing a bad write decreases. This
helps your cluster become consistent again sooner after a failure.

Also consider using different CLs for different operations. E.g. a
Twitter timeline can miss some records, but if you were to display my
bank account I would prefer to see either the right thing or a nice
error message.
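
As a rough illustration of the consistency-level arithmetic discussed in
this thread, the sketch below computes the quorum size for a replication
factor and checks whether a read CL and a write CL are guaranteed to overlap
(R + W > RF). The function names are hypothetical and not part of any
Cassandra API. As the quoted messages point out, failing the overlap check
does not mean reads will be inconsistent in practice, only that overlap is
not guaranteed.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlap_guaranteed(read_replicas: int, write_replicas: int,
                       replication_factor: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return read_replicas + write_replicas > replication_factor


def tolerated_failures(required_replicas: int, replication_factor: int) -> int:
    """Replicas that can be down while the operation can still succeed."""
    return replication_factor - required_replicas


if __name__ == "__main__":
    rf = 2  # the two-node, RF=2 setup discussed in the thread
    q = quorum(rf)
    print("quorum size:", q)
    print("nodes that may be down at QUORUM:", tolerated_failures(q, rf))
    print("nodes that may be down at CL=1:", tolerated_failures(1, rf))
    print("overlap guaranteed at CL=1/CL=1:", overlap_guaranteed(1, 1, rf))
    print("overlap guaranteed at QUORUM/QUORUM:", overlap_guaranteed(q, q, rf))
```

With RF=2 the quorum is 2, so quorum reads or writes need both nodes up,
while CL=1 still works with one node down.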

2012/3/16 Ben Coverston 

> Doing reads and writes at CL=1 with RF=2 N=2 does not imply that the reads
> will be inconsistent. It's more complicated than the simple counting of
> blocked replicas. It is easy to support the notion that it will be largely
> consistent, in fact very consistent for most use cases.
>
> By default Cassandra tries to write to both nodes, always. Writes will
> only fail (on a node) if it is down, and even then hinted handoff will
> attempt to keep both nodes in sync when the troubled node comes back up.
> The point of having two nodes is to have read and write availability in the
> face of transient failure.
>
> If you are interested there is a good exposition of what 'consistency'
> means in a system like Cassandra from the link below[1].
>
> [1]
> http://www.eecs.berkeley.edu/~pbailis/projects/pbs/
>
>
> On Fri, Mar 16, 2012 at 6:50 AM, Thomas van Neerijnen <
> t...@bossastudios.com> wrote:
>
>> You'll need to either read or write at least at quorum to get consistent
>> data from the cluster so you may as well do both.
>> Now that you mention it, I was wrong about downtime, with a two node
>> cluster reads or writes at quorum will mean both nodes need to be online.
>> Perhaps you could have an emergency switch in your application which flips
>> to consistency of 1 if one of your Cassandra servers goes down? Just make
>> sure it's set back to quorum when the second one returns or again you could
>> end up with inconsistent data.
>>
>>
>> On Fri, Mar 16, 2012 at 2:04 AM, Drew Kutcharian  wrote:
>>
>>> Thanks for the comments, I guess I will end up doing a 2 node cluster
>>> with replica count 2 and read consistency 1.
>>>
>>> -- Drew
>>>
>>>
>>>
>>> On Mar 15, 2012, at 4:20 PM, Thomas van Neerijnen wrote:
>>>
>>> So long as data loss and downtime are acceptable risks a one node
>>> cluster is fine.
>>> Personally this is usually only acceptable on my workstation, even my
>>> dev environment is redundant, because servers fail, usually when you least
>>> want them to, like for example when you've decided to save costs by waiting
>>> before implementing redundancy. Could a failure end up costing you more
>>> than you've saved? I'd rather get cheaper servers (maybe even used off
>>> ebay??) so I could have at least two of them.
>>>
>>> If you do go with a one node solution, although I haven't tried it myself,
>>> Priam looks like a good place to start for backups; otherwise roll your own
>>> with incremental snapshotting turned on and a watch on the snapshot
>>> directory. Storage on something like S3 or Cloud Files is very cheap so
>>> there's no good excuse for no backups.
>>>
>>> On Thu, Mar 15, 2012 at 7:12 PM, R. Verlangen  wrote:
>>>
 Hi Drew,

 One other disadvantage is the lack of "consistency level" and
 "replication". Both are part of the high availability / redundancy. So you
 would really need to back up your single-node "cluster" to some other
 external location.

 Good luck!


 2012/3/15 Drew Kutcharian 

> Hi,
>
> We are working on a project that initially is going to have very
> little data, but we would like to use Cassandra to ease future
> scalability. Due to budget constraints, we were thinking of running a
> single-node Cassandra for now and then adding more nodes as required.
>
> I was wondering if it is recommended to run a single node cassandra in
> production? Are there any other issues besides lack of high availability?
>
> Thanks,
>
> Drew
>
>

>>>
>>>
>>
>
>
> --
> Ben Coverston
> DataStax -- The Apache Cassandra Company
>
>


Re: 0.8.1 Vs 1.0.7

2012-03-17 Thread R. Verlangen
Check your log for messages about rebuilding indices: that might grow your
dataset some.

One thing is for sure: the data import removed all the crap that had
lingered in the 0.8.1 cluster (duplicates, tombstones etc). The decrease is
fairly dramatic but not illogical at all.

2012/3/16 Jeremiah Jordan 

>  I would guess more aggressive compaction settings, did you update rows
> or insert some twice?
> If you run major compaction a couple times on the 0.8.1 cluster does the
> data size get smaller?
>
> You can use the "describe" command to check if compression got turned on.
>
> -Jeremiah
>
>  --
> *From:* Ravikumar Govindarajan [ravikumar.govindara...@gmail.com]
> *Sent:* Thursday, March 15, 2012 4:41 AM
> *To:* user@cassandra.apache.org
> *Subject:* 0.8.1 Vs 1.0.7
>
>  Hi,
>
>  I ran some data import tests for cassandra 0.8.1 and 1.0.7. The results
> were a little bit surprising
>
>  0.8.1, SimpleStrategy, Rep_Factor=3,QUORUM Writes, RP, SimpleSnitch
>
> XXX.XXX.XXX.A  datacenter1 rack1   Up Normal  140.61 GB  12.50%
> XXX.XXX.XXX.B  datacenter1 rack1   Up Normal  139.92 GB  12.50%
> XXX.XXX.XXX.C  datacenter1 rack1   Up Normal  138.81 GB  12.50%
> XXX.XXX.XXX.D  datacenter1 rack1   Up Normal  139.78 GB  12.50%
> XXX.XXX.XXX.E  datacenter1 rack1   Up Normal  137.44 GB  12.50%
> XXX.XXX.XXX.F  datacenter1 rack1   Up Normal  138.48 GB  12.50%
> XXX.XXX.XXX.G  datacenter1 rack1   Up Normal  140.52 GB  12.50%
> XXX.XXX.XXX.H  datacenter1 rack1   Up Normal  145.24 GB  12.50%
>
>  1.0.7, NTS, Rep_Factor{DC1:3, DC2:2}, LOCAL_QUORUM writes, RP [DC2 m/c
> yet to join ring], PropertyFileSnitch
>
> XXX.XXX.XXX.A  DC1 RAC1   Up Normal   48.72 GB  12.50%
> XXX.XXX.XXX.B  DC1 RAC1   Up Normal   51.23 GB  12.50%
> XXX.XXX.XXX.C  DC1 RAC1   Up Normal   52.4 GB   12.50%
> XXX.XXX.XXX.D  DC1 RAC1   Up Normal   49.64 GB  12.50%
> XXX.XXX.XXX.E  DC1 RAC1   Up Normal   48.5 GB   12.50%
> XXX.XXX.XXX.F  DC1 RAC1   Up Normal   53.38 GB  12.50%
> XXX.XXX.XXX.G  DC1 RAC1   Up Normal   51.11 GB  12.50%
> XXX.XXX.XXX.H  DC1 RAC1   Up Normal   53.36 GB  12.50%
>
>  There seems to be a 3X saving in size for the same dataset running 1.0.7.
> I have not enabled compression for any of the CFs. Will it be enabled by
> default when creating a new CF in 1.0.7? cassandra.yaml is also mostly
> identical.
>
>  Thanks and Regards,
> Ravi
>
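
For reference, the per-node loads quoted above can be totalled to check the
"3X" estimate. This is only arithmetic on the figures in the two ring
listings; the variable and function names are made up for illustration.

```python
# Per-node load in GB, copied from the two ring listings in the quoted message.
loads_081_gb = [140.61, 139.92, 138.81, 139.78, 137.44, 138.48, 140.52, 145.24]
loads_107_gb = [48.72, 51.23, 52.4, 49.64, 48.5, 53.38, 51.11, 53.36]


def total_load(loads_gb):
    """Sum of the per-node loads, in GB."""
    return sum(loads_gb)


def reduction_factor(before_gb, after_gb):
    """How many times smaller the second ring's total load is than the first."""
    return total_load(before_gb) / total_load(after_gb)


if __name__ == "__main__":
    print("0.8.1 total GB:", round(total_load(loads_081_gb), 2))
    print("1.0.7 total GB:", round(total_load(loads_107_gb), 2))
    print("reduction factor:", round(reduction_factor(loads_081_gb, loads_107_gb), 2))
```

On these figures the 1.0.7 ring holds roughly 2.7 times less data than the
0.8.1 ring, a little under the 3X estimate.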


Re: Composite Key Query in CLI

2012-03-17 Thread Tamar Fraenkel
I think you are doing ok,
I have a CF with the following schema
 ColumnFamily: tk_counters
  Key Validation Class:
org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UUIDType)
  Default column value validator:
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Row Cache Provider:
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 20.0/14400
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy:
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

*and the following cli query:*
get tk_counters['d:9eff24f7-949f-487b-a566-0dedd07656ce'];
*returns:*
=> (counter=no, value=1)
=> (counter=yes, value=2)
Returned 2 results.
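
To make the 'd:9eff...' key above more concrete, here is a minimal sketch of
how such a key would be laid out as bytes, assuming the CompositeType layout
in which each component is a 2-byte big-endian length, the value, and one
end-of-component byte. The helper names are hypothetical and this is not the
CLI's own code.

```python
import struct
import uuid

END_OF_COMPONENT = b"\x00"  # assumed end-of-component marker for a plain key


def encode_component(value: bytes) -> bytes:
    """One composite component: 2-byte length, value, end-of-component byte."""
    return struct.pack(">H", len(value)) + value + END_OF_COMPONENT


def encode_utf8_uuid_key(text: str, uuid_text: str) -> bytes:
    """Encode a (UTF8Type, UUIDType) composite key such as 'd:<uuid>'."""
    return (encode_component(text.encode("utf-8"))
            + encode_component(uuid.UUID(uuid_text).bytes))


key = encode_utf8_uuid_key("d", "9eff24f7-949f-487b-a566-0dedd07656ce")

if __name__ == "__main__":
    print(len(key), "bytes:", key.hex())
```

The ':' in the CLI string only separates the components; each part is parsed
according to its declared type before being packed this way.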



*Tamar Fraenkel *
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Tue, Mar 13, 2012 at 11:28 PM, Ali Basiri  wrote:

> Hey,
>
> I have a set of composite keys with data and am trying to query them
> through the CLI. However, the result set returned is always empty.
>
> The schema is like this:
>
>ColumnFamily: Routes
>   Key Validation Class:
> org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.TimeUUIDType,org.apache.cassandra.db.marshal.IntegerType)
>   Default column value validator:
> org.apache.cassandra.db.marshal.UTF8Type
>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>   Row Cache Provider:
> org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
>   ...
>
> The Data:
> ---
> RowKey: fd24a000-6d51-11e1-a260-109addb27473:4
> => (column=enabled, value=true, timestamp=1331673484419000)
> => (column=providerId, value=0575af10-6d52-11e1-a260-109addb27473,
> timestamp=1331673484419001)
> ---
> RowKey: fd24a000-6d51-11e1-a260-109addb27473:5
> => (column=enabled, value=true, timestamp=1331673476181000)
> => (column=providerId, value=0086b6c0-6d52-11e1-a260-109addb27473,
> timestamp=1331673476181001)
> ---
>
>
> The Query:
> >  get Routes['fd24a000-6d51-11e1-a260-109addb27473:4'];
> Returned 0 results.
> Elapsed time: 4 msec(s).
>
> The cli correctly identifies the composite key types if I type them wrong.
> For example, an 'a' instead of the '4'.
>
> What am I doing wrong?
>
> Thanks.
>