Re: can't start cqlsh on new Amazon node

2012-11-08 Thread Tamar Fraenkel
Hi
A bit more info on that
I have one working setup with
python-cql     1.0.9-1
python-thrift  0.6.0-2~riptano1
cassandra      1.0.8

The setup where cqlsh is not working has:
python-cql     1.0.10-1
python-thrift  0.6.0-2~riptano1
cassandra      1.0.11

Maybe this will give someone a hint of what the problem may be and how to
solve it.
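
For anyone comparing setups, a quick way to confirm which python-cql and thrift
modules the Python interpreter (and hence cqlsh) actually picks up is to import
them and print their locations. A minimal sketch, assuming only that both
packages are importable (the __version__ attribute may be absent, hence the
fallback):

import cql
import thrift

# Print each module's name, version string if it has one, and import path,
# so a broken node can be compared against the working setup above.
for mod in (cql, thrift):
    print mod.__name__, getattr(mod, '__version__', 'unknown'), mod.__file__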
Thanks!

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Thu, Nov 8, 2012 at 9:38 AM, Tamar Fraenkel  wrote:

> Nope...
> Same error:
>
> *cqlsh --debug --cql3 localhost 9160*
>
> Using CQL driver: <module 'cql' from '/usr/lib/pymodules/python2.6/cql/__init__.pyc'>
> Using thrift lib: <module 'thrift' from '/usr/lib/pymodules/python2.6/thrift/__init__.pyc'>
> Connection error: Invalid method name: 'set_cql_version'
>
> I believe it is some version mismatch. But this was DataStax AMI, I
> thought all should be coordinated, and I am not sure what to check for.
>
>
> Thanks,
>
> *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>
>
> On Thu, Nov 8, 2012 at 4:56 AM, Jason Wee  wrote:
>
>> should it be --cql3 ?
>> http://www.datastax.com/docs/1.1/dml/using_cql#start-cql3
>>
>>
>>
>> On Wed, Nov 7, 2012 at 11:16 PM, Tamar Fraenkel wrote:
>>
>>> Hi!
>>> I installed new cluster using DataStax AMI with --release 1.0.11, so I
>>> have cassandra 1.0.11 installed.
>>> Nodes have python-cql 1.0.10-1 and python2.6
>>>
>>> Cluster works well, BUT when I try to connect to the cqlsh I get:
>>> *cqlsh --debug --cqlversion=2 localhost 9160*
>>> Using CQL driver: <module 'cql' from '/usr/lib/pymodules/python2.6/cql/__init__.pyc'>
>>> Using thrift lib: <module 'thrift' from '/usr/lib/pymodules/python2.6/thrift/__init__.pyc'>
>>> Connection error: Invalid method name: 'set_cql_version'
>>> This is the same if I choose cqlversion=3.
>>>
>>> Any idea how to solve this?
>>>
>>> Thanks,
>>>
>>> Tamar Fraenkel
>>> Senior Software Engineer, TOK Media
>>>
>>>
>>> ta...@tok-media.com
>>> Tel:   +972 2 6409736
>>> Mob:  +972 54 8356490
>>> Fax:   +972 2 5612956
>>>
>>>
>>>
>>>
>>
>

Storage limit for a particular user on Cassandra

2012-11-08 Thread mallikharjun.vemana
Hi,

Is there a way we can limit the data of a particular user on the Cassandra 
cluster?

Say, for example, I have three users, namely Jsmith, Elvis, and Dilbert, configured
in my Cassandra deployment, and I want to limit their data usage as follows.

Jsmith - 1 GB
Elvis - 2 GB
Dilbert - 500 MB

Is there a way to achieve this by fine-tuning the configuration?
If not, any workarounds?

Thanks,
~Mallik.



Compact and Repair

2012-11-08 Thread Henrik Schröder
Hi,

We recently ran a major compaction across our cluster, which reduced the
storage used by about 50%. This is fine, since we do a lot of updates to
existing data, so that's the expected result.

The day after, we ran a full repair -pr across the cluster, and when that
finished, each storage node was at about the same size as before the major
compaction. Why does that happen? What gets transferred to other nodes, and
why does it suddenly take up a lot of space again?

We haven't run repair -pr regularly, so is this just something that happens
on the first weekly run, and can we expect a different result next week? Or
does repair always cause the data to grow on each node? To me it just
doesn't seem proportional?


/Henrik


Re: Strange delay in query

2012-11-08 Thread André Cruz
On Nov 7, 2012, at 12:15 PM, André Cruz  wrote:

> This error also happens on my application that uses pycassa, so I don't think 
> this is the same bug.

I have narrowed it down to a slice between two consecutive columns. Observe 
this behaviour using pycassa:

>>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
>>>  column_count=2, 
>>> column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 
Connection 52905488 (xxx:9160) was checked out from pool 51715344
DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 
Connection 52905488 (xxx:9160) was checked in to pool 51715344
[UUID('13957152-234b-11e2-92bc-e0db550199f4'), 
UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]

A two column slice took more than 2s to return. If I request the next 2 column 
slice:

>>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
>>>  column_count=2, 
>>> column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 
Connection 52904912 (xxx:9160) was checked out from pool 51715344
DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 
Connection 52904912 (xxx:9160) was checked in to pool 51715344
[UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), 
UUID('a364b028-2449-11e2-8882-e0db550199f4')]

This takes 20msec... Is there a rational explanation for this different 
behaviour? Is there some threshold that I'm running into? Is there any way to 
obtain more debugging information about this problem?
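
For what it's worth, the gap is easy to reproduce by timing the same two-column
slice from each start column; a minimal sketch, assuming the same
DISCO_CASS.col_fam_nsrev pycassa ColumnFamily and row key used above:

import time
import uuid

# Time a two-column slice starting from each of the two columns shown above,
# against the pycassa ColumnFamily used in this thread.
row_key = uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e')
for start in ('13957152-234b-11e2-92bc-e0db550199f4',
              '40b7ae4e-2449-11e2-8610-e0db550199f4'):
    t0 = time.time()
    cols = DISCO_CASS.col_fam_nsrev.get(row_key, column_count=2,
                                        column_start=uuid.UUID(start))
    print start, '->', len(cols), 'columns in %.3fs' % (time.time() - t0)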

Thanks,
André

Re: Compact and Repair

2012-11-08 Thread Henrik Schröder
No, we're not using columns with TTL, and I performed a major compaction
before the repair, so there shouldn't be vast amounts of tombstones moving
around.

And the increase happened during the repair, the nodes gained ~20-30GB each.


/Henrik


On Thu, Nov 8, 2012 at 12:40 PM, horschi  wrote:

> Hi,
>
> is it possible that your repair is overrepairing due to any of the issues
> discussed here:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html?
>
>
> I've seen repair increasing the load on my cluster, but what you are
> describing sounds like a lot to me.
>
> Does this increase happen due to repair entirely? Or was the load maybe
> increasing gradually over the week and you just checked for the first time?
>
> cheers,
> Christian
>
>
>
> On Thu, Nov 8, 2012 at 11:55 AM, Henrik Schröder wrote:
>
>> Hi,
>>
>> We recently ran a major compaction across our cluster, which reduced the
>> storage used by about 50%. This is fine, since we do a lot of updates to
>> existing data, so that's the expected result.
>>
>> The day after, we ran a full repair -pr across the cluster, and when that
>> finished, each storage node was at about the same size as before the major
>> compaction. Why does that happen? What gets transferred to other nodes, and
>> why does it suddenly take up a lot of space again?
>>
>> We haven't run repair -pr regularly, so is this just something that
>> happens on the first weekly run, and can we expect a different result next
>> week? Or does repair always cause the data to grow on each node? To me it
>> just doesn't seem proportional?
>>
>>
>> /Henrik
>>
>
>


Re: Compact and Repair

2012-11-08 Thread Alain RODRIGUEZ
Did you change the RF or had a node down since you repaired last time ?


2012/11/8 Henrik Schröder 

> No, we're not using columns with TTL, and I performed a major compaction
> before the repair, so there shouldn't be vast amounts of tombstones moving
> around.
>
> And the increase happened during the repair, the nodes gained ~20-30GB
> each.
>
>
> /Henrik
>
>
>
> On Thu, Nov 8, 2012 at 12:40 PM, horschi  wrote:
>
>> Hi,
>>
>> is it possible that your repair is overrepairing due to any of the issues
>> discussed here:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html?
>>
>>
>> I've seen repair increasing the load on my cluster, but what you are
>> describing sounds like a lot to me.
>>
>> Does this increase happen due to repair entirely? Or was the load maybe
>> increasing gradually over the week and you just checked for the first time?
>>
>> cheers,
>> Christian
>>
>>
>>
>> On Thu, Nov 8, 2012 at 11:55 AM, Henrik Schröder wrote:
>>
>>> Hi,
>>>
>>> We recently ran a major compaction across our cluster, which reduced the
>>> storage used by about 50%. This is fine, since we do a lot of updates to
>>> existing data, so that's the expected result.
>>>
>>> The day after, we ran a full repair -pr across the cluster, and when
>>> that finished, each storage node was at about the same size as before the
>>> major compaction. Why does that happen? What gets transferred to other
>>> nodes, and why does it suddenly take up a lot of space again?
>>>
>>> We haven't run repair -pr regularly, so is this just something that
>>> happens on the first weekly run, and can we expect a different result next
>>> week? Or does repair always cause the data to grow on each node? To me it
>>> just doesn't seem proportional?
>>>
>>>
>>> /Henrik
>>>
>>
>>
>


Re: Compact and Repair

2012-11-08 Thread Henrik Schröder
No, we haven't changed RF, but it's been a very long time since we repaired
last, so we're guessing this is an effect of not running repair regularly,
and that doing it regularly will fix it. It would just be nice to know.

Also, running major compaction after the repair made the data size shrink
back to what it was before, so clearly a lot of junk data was sent over on
that repair, most probably tombstones of some kind, as discussed in the
other thread.


/Henrik


On Thu, Nov 8, 2012 at 1:53 PM, Alain RODRIGUEZ  wrote:

> Did you change the RF or had a node down since you repaired last time ?
>
>
> 2012/11/8 Henrik Schröder 
>
>> No, we're not using columns with TTL, and I performed a major compaction
>> before the repair, so there shouldn't be vast amounts of tombstones moving
>> around.
>>
>> And the increase happened during the repair, the nodes gained ~20-30GB
>> each.
>>
>>
>> /Henrik
>>
>>
>>
>> On Thu, Nov 8, 2012 at 12:40 PM, horschi  wrote:
>>
>>> Hi,
>>>
>>> is it possible that your repair is overrepairing due to any of the
>>> issues discussed here:
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html?
>>>
>>>
>>> I've seen repair increasing the load on my cluster, but what you are
>>> describing sounds like a lot to me.
>>>
>>> Does this increase happen due to repair entirely? Or was the load maybe
>>> increasing gradually over the week and you just checked for the first time?
>>>
>>> cheers,
>>> Christian
>>>
>>>
>>>
>>> On Thu, Nov 8, 2012 at 11:55 AM, Henrik Schröder wrote:
>>>
 Hi,

 We recently ran a major compaction across our cluster, which reduced
 the storage used by about 50%. This is fine, since we do a lot of updates
 to existing data, so that's the expected result.

 The day after, we ran a full repair -pr across the cluster, and when
 that finished, each storage node was at about the same size as before the
 major compaction. Why does that happen? What gets transferred to other
 nodes, and why does it suddenly take up a lot of space again?

 We haven't run repair -pr regularly, so is this just something that
 happens on the first weekly run, and can we expect a different result next
 week? Or does repair always cause the data to grow on each node? To me it
 just doesn't seem proportional?


 /Henrik

>>>
>>>
>>
>


How to insert composite column in CQL3?

2012-11-08 Thread Alan Ristić
Hi there!

I'm struggling to figure out (for quite a few hours now) how I can insert, for
example, a column with a TimeUUID name and an empty value in CQL3 in a fictional
table. And what's the table design? I'm interested in the syntax (e.g. an example).

I'm trying to do something like Matt Dennis did here (*Cassandra NYC 2011:
Matt Dennis - Data Modeling Workshop*):
http://www.youtube.com/watch?v=OzBJrQZjge0&t=9m45s

Is that even possible in CQL3? Tnx.

Regards,
*Alan Ristić*


Re: Strange delay in query

2012-11-08 Thread Andrey Ilinykh
What is the size of columns? Probably those two are huge.


On Thu, Nov 8, 2012 at 4:01 AM, André Cruz  wrote:

> On Nov 7, 2012, at 12:15 PM, André Cruz  wrote:
>
> > This error also happens on my application that uses pycassa, so I don't
> think this is the same bug.
>
> I have narrowed it down to a slice between two consecutive columns.
> Observe this behaviour using pycassa:
>
> >>>
> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
> column_count=2,
> column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
> DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976
> Connection 52905488 (xxx:9160) was checked out from pool 51715344
> DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976
> Connection 52905488 (xxx:9160) was checked in to pool 51715344
> [UUID('13957152-234b-11e2-92bc-e0db550199f4'),
> UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]
>
> A two column slice took more than 2s to return. If I request the next 2
> column slice:
>
> >>>
> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
> column_count=2,
> column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
> DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976
> Connection 52904912 (xxx:9160) was checked out from pool 51715344
> DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976
> Connection 52904912 (xxx:9160) was checked in to pool 51715344
> [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'),
> UUID('a364b028-2449-11e2-8882-e0db550199f4')]
>
> This takes 20msec... Is there a rational explanation for this different
> behaviour? Is there some threshold that I'm running into? Is there any way
> to obtain more debugging information about this problem?
>
> Thanks,
> André


leveled compaction and tombstoned data

2012-11-08 Thread B. Todd Burruss
we are having the problem where we have huge SSTABLEs with tombstoned data
in them that is not being compacted soon enough (because size tiered
compaction requires, by default, 4 like sized SSTABLEs).  this is using
more disk space than we anticipated.

we are very write heavy compared to reads, and we delete the data after N
number of days (depends on the column family, but N is around 7 days)

my question is would leveled compaction help to get rid of the tombstoned
data faster than size tiered, and therefore reduce the disk space usage?

thx


Re: leveled compaction and tombstoned data

2012-11-08 Thread Radim Kolar

On 8.11.2012 19:12, B. Todd Burruss wrote:
my question is would leveled compaction help to get rid of the 
tombstoned data faster than size tiered, and therefore reduce the disk 
space usage?


leveled compaction will kill your performance. get patch from jira for 
maximum sstable size per CF and force cassandra to make smaller tables, 
they expire faster.




Re: How to insert composite column in CQL3?

2012-11-08 Thread Alan Ristić
Ok, this article answered all the confusion in my head:
http://www.datastax.com/dev/blog/thrift-to-cql3

It's a must-read for noobs (like me). It perfectly explains the mappings and
differences between the internals and CQL3 (abstractions). First read this and
THEN go study all the resources out there ;)
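
As a concrete example of that mapping, the Thrift-style "TimeUUID column name
with an empty value" becomes a clustering column in CQL3, one CQL row per
(partition key, timeuuid) pair. A minimal sketch using python-cql against a
hypothetical keyspace 'ks1' and table 'events' (assumes a python-cql version
that accepts the cql_version argument):

import cql
import uuid

# Speak CQL3 and model the "TimeUUID name, empty value" column as a
# clustering column; the keyspace and table names are placeholders.
conn = cql.connect('localhost', 9160, 'ks1', cql_version='3.0.0')
cur = conn.cursor()
cur.execute("CREATE TABLE events (user_id text, event_time timeuuid, "
            "PRIMARY KEY (user_id, event_time))")
cur.execute("INSERT INTO events (user_id, event_time) "
            "VALUES ('alan', %s)" % uuid.uuid1())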

Regards,
*Alan Ristić*

*m*: 040 423 688



2012/11/8 Alan Ristić 

> Hi there!
>
> I'm struggling to figure out (for quite a few hours now) how I can insert, for
> example, a column with a TimeUUID name and an empty value in CQL3 in a fictional
> table. And what's the table design? I'm interested in the syntax (e.g. an example).
>
> I'm trying to do something like Matt Dennis did here (*Cassandra NYC
> 2011: Matt Dennis - Data Modeling Workshop*):
> http://www.youtube.com/watch?v=OzBJrQZjge0&t=9m45s
>
> Is that even possible in CQL3? Tnx.
>
> Lp,
> *Alan Ristić*
>
>


Kundera 2.2 released

2012-11-08 Thread Amresh Kumar Singh
Hi All,

We are happy to announce release of Kundera 2.2.

Kundera is a JPA 2.0 based, object-datastore mapping library for NoSQL 
datastores. The idea behind Kundera is to make working with NoSQL Databases
drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB and 
relational databases.

Major Changes in this release:
---
* Geospatial Persistence and Queries for MongoDB
* Composite keys support for Cassandra and MongoDB
* Cassandra 1.1.6 migration
* Support for enum data type
* Named and Native queries support for REST based access

Github Issues Fixes (https://github.com/impetus-opensource/Kundera/issues):
--
Issue 136 - JPQL queries without WHERE clause or parameters fail
Issue 135 - MongoDB: enable WriteConcern, Safe mode and other properties on 
operation level.
Issue 133 - Externalize the database connection configuration
Issue 132 - problem in loading entity metadata when giving class name in class 
tag of persistence.xml
Issue 130 - Row not fully deleted from cassandra on em.remove(obj) - then 
cannot reinsert row with same key

We have revamped our wiki, so you might want to have a look at it here:
https://github.com/impetus-opensource/Kundera/wiki

To download, use or contribute to Kundera, visit:
http://github.com/impetus-opensource/Kundera

Latest released tag version is 2.2. Kundera maven libraries are now available 
at: https://oss.sonatype.org/content/repositories/releases/com/impetus

Sample codes and examples for using Kundera can be found here:
http://github.com/impetus-opensource/Kundera-Examples
and
https://github.com/impetus-opensource/Kundera/tree/trunk/kundera-tests

Thank you all for your contributions!

Regards,
Kundera Team.





Re: leveled compaction and tombstoned data

2012-11-08 Thread B. Todd Burruss
we are running Datastax enterprise and cannot patch it.  how bad is
"kill performance"?  if it is so bad, why is it an option?


On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar  wrote:
> Dne 8.11.2012 19:12, B. Todd Burruss napsal(a):
>
>> my question is would leveled compaction help to get rid of the tombstoned
>> data faster than size tiered, and therefore reduce the disk space usage?
>>
> leveled compaction will kill your performance. get patch from jira for
> maximum sstable size per CF and force cassandra to make smaller tables, they
> expire faster.
>


Re: leveled compaction and tombstoned data

2012-11-08 Thread Aaron Turner
"kill performance" is relative.  Leveled Compaction basically costs 2x disk
IO.  Look at iostat, etc and see if you have the headroom.

There are also ways to bring up a test node and just run Level Compaction
on that.  Wish I had a URL handy, but hopefully someone else can find it.

Also, if you're not using compression, check it out.

On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss  wrote:

> we are running Datastax enterprise and cannot patch it.  how bad is
> "kill performance"?  if it is so bad, why is it an option?
>
>
> On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar  wrote:
> > Dne 8.11.2012 19:12, B. Todd Burruss napsal(a):
> >
> >> my question is would leveled compaction help to get rid of the
> tombstoned
> >> data faster than size tiered, and therefore reduce the disk space usage?
> >>
> > leveled compaction will kill your performance. get patch from jira for
> > maximum sstable size per CF and force cassandra to make smaller tables,
> they
> > expire faster.
> >
>



-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
"carpe diem quam minimum credula postero"


Re: leveled compaction and tombstoned data

2012-11-08 Thread Jeremy Hanna
LCS works well in specific circumstances; this blog post gives some good
considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction

On Nov 8, 2012, at 1:33 PM, Aaron Turner  wrote:

> "kill performance" is relative.  Leveled Compaction basically costs 2x disk 
> IO.  Look at iostat, etc and see if you have the headroom.
> 
> There are also ways to bring up a test node and just run Level Compaction on 
> that.  Wish I had a URL handy, but hopefully someone else can find it.
> 
> Also, if you're not using compression, check it out.
> 
> On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss  wrote:
> we are running Datastax enterprise and cannot patch it.  how bad is
> "kill performance"?  if it is so bad, why is it an option?
> 
> 
> On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar  wrote:
> > Dne 8.11.2012 19:12, B. Todd Burruss napsal(a):
> >
> >> my question is would leveled compaction help to get rid of the tombstoned
> >> data faster than size tiered, and therefore reduce the disk space usage?
> >>
> > leveled compaction will kill your performance. get patch from jira for
> > maximum sstable size per CF and force cassandra to make smaller tables, they
> > expire faster.
> >
> 
> 
> 
> -- 
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & 
> Windows
> Those who would give up essential Liberty, to purchase a little temporary 
> Safety, deserve neither Liberty nor Safety.  
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
> 



Re: leveled compaction and tombstoned data

2012-11-08 Thread Brandon Williams
On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner  wrote:
> There are also ways to bring up a test node and just run Level Compaction on
> that.  Wish I had a URL handy, but hopefully someone else can find it.

This rather handsome fellow wrote a blog about it:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling

-Brandon


Re: leveled compaction and tombstoned data

2012-11-08 Thread Ben Coverston
http://www.datastax.com/docs/1.1/operations/tuning#testing-compaction-and-compression

Write Survey mode.

After you have it up and running you can modify the column family mbean to
use LeveledCompactionStrategy on that node to see how your hardware/load
fares with LCS.


On Thu, Nov 8, 2012 at 11:33 AM, Aaron Turner  wrote:

> "kill performance" is relative.  Leveled Compaction basically costs 2x
> disk IO.  Look at iostat, etc and see if you have the headroom.
>
> There are also ways to bring up a test node and just run Level Compaction
> on that.  Wish I had a URL handy, but hopefully someone else can find it.
>
> Also, if you're not using compression, check it out.
>
>
> On Thu, Nov 8, 2012 at 11:20 AM, B. Todd Burruss  wrote:
>
>> we are running Datastax enterprise and cannot patch it.  how bad is
>> "kill performance"?  if it is so bad, why is it an option?
>>
>>
>> On Thu, Nov 8, 2012 at 10:17 AM, Radim Kolar  wrote:
>> > Dne 8.11.2012 19:12, B. Todd Burruss napsal(a):
>> >
>> >> my question is would leveled compaction help to get rid of the
>> tombstoned
>> >> data faster than size tiered, and therefore reduce the disk space
>> usage?
>> >>
>> > leveled compaction will kill your performance. get patch from jira for
>> > maximum sstable size per CF and force cassandra to make smaller tables,
>> they
>> > expire faster.
>> >
>>
>
>
>
> --
> Aaron Turner
> http://synfin.net/ Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
> Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
> -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
>
>


-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: leveled compaction and tombstoned data

2012-11-08 Thread Ben Coverston
Also to answer your question, LCS is well suited to workloads where
overwrites and tombstones come into play. The tombstones are _much_ more
likely to be merged with LCS than STCS.

I would be careful with the patch that was referred to above, it hasn't
been reviewed, and from a glance it appears that it will cause an infinite
compaction loop if you get more than 4 SSTables at max size.



On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams  wrote:

> On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner  wrote:
> > There are also ways to bring up a test node and just run Level
> Compaction on
> > that.  Wish I had a URL handy, but hopefully someone else can find it.
>
> This rather handsome fellow wrote a blog about it:
>
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling
>
> -Brandon
>



-- 
Ben Coverston
DataStax -- The Apache Cassandra Company


Re: leveled compaction and tombstoned data

2012-11-08 Thread B. Todd Burruss
thanks for the links!  i had forgotten about live sampling

On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams  wrote:
> On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner  wrote:
>> There are also ways to bring up a test node and just run Level Compaction on
>> that.  Wish I had a URL handy, but hopefully someone else can find it.
>
> This rather handsome fellow wrote a blog about it:
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling
>
> -Brandon


Re: Strange delay in query

2012-11-08 Thread André Cruz
These are the two columns in question:

=> (super_column=13957152-234b-11e2-92bc-e0db550199f4,
 (column=attributes, value=, timestamp=1351681613263657)
 (column=blocks, 
value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q,
 timestamp=1351681613263657)
 (column=hash, 
value=8_p2QaeRaX_QwJbUWQ07ZqlNHei7ixu0MHxgu9oennfYOGfyH6EsEe_LYO8V8EC_1NPL44Gx8B7UhYV9VSb7Lg,
 timestamp=1351681613263657)
 (column=icon, value=image_jpg, timestamp=1351681613263657)
 (column=is_deleted, value=true, timestamp=1351681613263657)
 (column=is_dir, value=false, timestamp=1351681613263657)
 (column=mime_type, value=image/jpeg, timestamp=1351681613263657)
 (column=mtime, value=1351646803, timestamp=1351681613263657)
 (column=name, value=/Mobile Photos/Photo 2012-10-28 17_13_50.jpeg, 
timestamp=1351681613263657)
 (column=revision, value=13957152-234b-11e2-92bc-e0db550199f4, 
timestamp=1351681613263657)
 (column=size, value=1379001, timestamp=1351681613263657)
 (column=thumb_exists, value=true, timestamp=1351681613263657))
=> (super_column=40b7ae4e-2449-11e2-8610-e0db550199f4,
 (column=attributes, value={"posix": 420}, timestamp=1351790781154800)
 (column=blocks, 
value=9UCDkHNb8-8LuKr2bv9PjKcWCT0v7FCZa0ebNSflES4-o7QD6eYschVaweCKSbR29Dq2IeGl_Cu7BVnYJYphTQ,
 timestamp=1351790781154800)
 (column=hash, 
value=kao2EV8jw_wN4EBoMkCXZWCwg3qQ0X6m9_X9JIGkEkiGKJE_JeKgkdoTAkAefXgGtyhChuhWPlWMxl_tX7VZUw,
 timestamp=1351790781154800)
 (column=icon, value=text_txt, timestamp=1351790781154800)
 (column=is_dir, value=false, timestamp=1351790781154800)
 (column=mime_type, value=text/plain, timestamp=1351790781154800)
 (column=mtime, value=1351378576, timestamp=1351790781154800)
 (column=name, value=/Documents/VIMDocument.txt, timestamp=1351790781154800)
 (column=revision, value=40b7ae4e-2449-11e2-8610-e0db550199f4, 
timestamp=1351790781154800)
 (column=size, value=13, timestamp=1351790781154800)
 (column=thumb_exists, value=false, timestamp=1351790781154800))


I don't think their size is an issue here.

André

On Nov 8, 2012, at 6:04 PM, Andrey Ilinykh  wrote:

> What is the size of columns? Probably those two are huge.
> 
> 
> On Thu, Nov 8, 2012 at 4:01 AM, André Cruz  wrote:
> On Nov 7, 2012, at 12:15 PM, André Cruz  wrote:
> 
> > This error also happens on my application that uses pycassa, so I don't 
> > think this is the same bug.
> 
> I have narrowed it down to a slice between two consecutive columns. Observe 
> this behaviour using pycassa:
> 
> >>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
> >>>  column_count=2, 
> >>> column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
> DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849 139928791262976 
> Connection 52905488 (xxx:9160) was checked out from pool 51715344
> DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849 139928791262976 
> Connection 52905488 (xxx:9160) was checked in to pool 51715344
> [UUID('13957152-234b-11e2-92bc-e0db550199f4'), 
> UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]
> 
> A two column slice took more than 2s to return. If I request the next 2 
> column slice:
> 
> >>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
> >>>  column_count=2, 
> >>> column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
> DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849 139928791262976 
> Connection 52904912 (xxx:9160) was checked out from pool 51715344
> DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849 139928791262976 
> Connection 52904912 (xxx:9160) was checked in to pool 51715344
> [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'), 
> UUID('a364b028-2449-11e2-8882-e0db550199f4')]
> 
> This takes 20msec... Is there a rational explanation for this different 
> behaviour? Is there some threshold that I'm running into? Is there any way to 
> obtain more debugging information about this problem?
> 
> Thanks,
> André
> 



Re: leveled compaction and tombstoned data

2012-11-08 Thread B. Todd Burruss
@ben, thx, we will be deploying 2.2.1 of DSE soon and will try to
setup a traffic sampling node so we can test leveled compaction.

we essentially keep a rolling window of data written once.  it is
written, then after N days it is deleted, so it seems that leveled
compaction should help
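
for what it's worth, one way to express that rolling window without explicit
deletes is to write every column with a TTL so the data ages out on its own; a
minimal pycassa sketch, with the keyspace, column family, and 7-day window as
placeholders:

import pycassa

# Write each column with a 7-day TTL instead of deleting it after N days;
# expired columns become tombstones that can be purged after gc_grace.
pool = pycassa.ConnectionPool('ks1', server_list=['localhost:9160'])
events = pycassa.ColumnFamily(pool, 'events')
events.insert('some-row-key', {'payload': 'value'}, ttl=7 * 24 * 3600)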

On Thu, Nov 8, 2012 at 11:53 AM, B. Todd Burruss  wrote:
> thanks for the links!  i had forgotten about live sampling
>
> On Thu, Nov 8, 2012 at 11:41 AM, Brandon Williams  wrote:
>> On Thu, Nov 8, 2012 at 1:33 PM, Aaron Turner  wrote:
>>> There are also ways to bring up a test node and just run Level Compaction on
>>> that.  Wish I had a URL handy, but hopefully someone else can find it.
>>
>> This rather handsome fellow wrote a blog about it:
>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling
>>
>> -Brandon


Re: Hinted Handoff runs every ten minutes

2012-11-08 Thread Mike Heffner
Is there a ticket open for this for 1.1.6?

We also noticed this after upgrading from 1.1.3 to 1.1.6. Every node runs a
0 row hinted handoff every 10 minutes. N-1 nodes hint to the same node,
while that node hints to another node.


On Tue, Oct 30, 2012 at 1:35 PM, Vegard Berget  wrote:

> Hi,
>
> I have the exact same problem with 1.1.6.  HintsColumnFamily consists of
> one row (Rowkey 00, nothing more).   The "problem" started after upgrading
> from 1.1.4 to 1.1.6.  Every ten minutes HintedHandoffManager starts and
> finishes  after sending "0 rows".
>
> .vegard,
>
>
>
> - Original Message -
> From: user@cassandra.apache.org
> To: 
> Cc:
> Sent: Mon, 29 Oct 2012 23:45:30 +0100
> Subject: Re: Hinted Handoff runs every ten minutes
>
>
> Dne 29.10.2012 23:24, Stephen Pierce napsal(a):
> > I'm running 1.1.5; the bug says it's fixed in 1.0.9/1.1.0.
> >
> > How can I check to see why it keeps running HintedHandoff?
you have a tombstone in system.HintsColumnFamily; use the list command in
cassandra-cli to check
>
>


-- 

  Mike Heffner 
  Librato, Inc.


Multiple keyspaces vs Multiple CFs

2012-11-08 Thread sankalp kohli
Is it better to have 10 keyspaces with 10 CFs in each keyspace, or 100
keyspaces with 1 CF each?
I am talking in terms of memory footprint.
Also, I would be interested to know how much better one is over the other.

Thanks,
Sankalp


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo
it is better to have one keyspace unless you need to replicate the
keyspaces differently. The main reason for this is that changing
keyspaces requires an RPC operation. Having 10 keyspaces would mean
having 10 connection pools.

On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli  wrote:
> Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
> keyspaces with 1 CF each.
> I am talking in terms of memory footprint.
> Also I would be interested to know how much better one is over other.
>
> Thanks,
> Sankalp


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread sankalp kohli
Which connection pool are you talking about?


On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo wrote:

> it is better to have one keyspace unless you need to replicate the
> keyspaces differently. The main reason for this is that changing
> keyspaces requires an RPC operation. Having 10 keyspaces would mean
> having 10 connection pools.
>
> On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli 
> wrote:
> > Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
> > keyspaces with 1 CF each.
> > I am talking in terms of memory footprint.
> > Also I would be interested to know how much better one is over other.
> >
> > Thanks,
> > Sankalp
>


Read during digest mismatch

2012-11-08 Thread sankalp kohli
Hi,
Let's say I am reading with consistency TWO and my replication factor is 3. The
read is eligible for global read repair. It will send a request to get data
from one node and a digest request to two.
If there is a digest mismatch, from what I am reading in the code it looks like
it will get the data from all three nodes and resolve the data
before returning to the client.

Is this correct, or am I reading the code wrong?

Also, if this is correct, it looks like the read will slow down when the third
node is in another DC, even though the consistency is only TWO?

Thanks,
Sankalp


Re: Strange delay in query

2012-11-08 Thread Josep Blanquer
Can it be that you have tons and tons of tombstoned columns in the middle
of these two? I've seen plenty of performance issues with wide
rows littered with column tombstones (you could check by dumping the
sstables...)

Just a thought...

Josep M.

On Thu, Nov 8, 2012 at 12:23 PM, André Cruz  wrote:

> These are the two columns in question:
>
> => (super_column=13957152-234b-11e2-92bc-e0db550199f4,
>  (column=attributes, value=, timestamp=1351681613263657)
>  (column=blocks,
> value=A4edo5MhHvojv3Ihx_JkFMsF3ypthtBvAZkoRHsjulw06pez86OHch3K3OpmISnDjHODPoCf69bKcuAZSJj-4Q,
> timestamp=1351681613263657)
>  (column=hash,
> value=8_p2QaeRaX_QwJbUWQ07ZqlNHei7ixu0MHxgu9oennfYOGfyH6EsEe_LYO8V8EC_1NPL44Gx8B7UhYV9VSb7Lg,
> timestamp=1351681613263657)
>  (column=icon, value=image_jpg, timestamp=1351681613263657)
>  (column=is_deleted, value=true, timestamp=1351681613263657)
>  (column=is_dir, value=false, timestamp=1351681613263657)
>  (column=mime_type, value=image/jpeg, timestamp=1351681613263657)
>  (column=mtime, value=1351646803, timestamp=1351681613263657)
>  (column=name, value=/Mobile Photos/Photo 2012-10-28 17_13_50.jpeg,
> timestamp=1351681613263657)
>  (column=revision, value=13957152-234b-11e2-92bc-e0db550199f4,
> timestamp=1351681613263657)
>  (column=size, value=1379001, timestamp=1351681613263657)
>  (column=thumb_exists, value=true, timestamp=1351681613263657))
> => (super_column=40b7ae4e-2449-11e2-8610-e0db550199f4,
>  (column=attributes, value={"posix": 420}, timestamp=1351790781154800)
>  (column=blocks,
> value=9UCDkHNb8-8LuKr2bv9PjKcWCT0v7FCZa0ebNSflES4-o7QD6eYschVaweCKSbR29Dq2IeGl_Cu7BVnYJYphTQ,
> timestamp=1351790781154800)
>  (column=hash,
> value=kao2EV8jw_wN4EBoMkCXZWCwg3qQ0X6m9_X9JIGkEkiGKJE_JeKgkdoTAkAefXgGtyhChuhWPlWMxl_tX7VZUw,
> timestamp=1351790781154800)
>  (column=icon, value=text_txt, timestamp=1351790781154800)
>  (column=is_dir, value=false, timestamp=1351790781154800)
>  (column=mime_type, value=text/plain, timestamp=1351790781154800)
>  (column=mtime, value=1351378576, timestamp=1351790781154800)
>  (column=name, value=/Documents/VIMDocument.txt,
> timestamp=1351790781154800)
>  (column=revision, value=40b7ae4e-2449-11e2-8610-e0db550199f4,
> timestamp=1351790781154800)
>  (column=size, value=13, timestamp=1351790781154800)
>  (column=thumb_exists, value=false, timestamp=1351790781154800))
>
>
> I don't think their size is an issue here.
>
> André
>
> On Nov 8, 2012, at 6:04 PM, Andrey Ilinykh  wrote:
>
> What is the size of columns? Probably those two are huge.
>
>
> On Thu, Nov 8, 2012 at 4:01 AM, André Cruz  wrote:
>
>> On Nov 7, 2012, at 12:15 PM, André Cruz  wrote:
>>
>> > This error also happens on my application that uses pycassa, so I don't
>> think this is the same bug.
>>
>> I have narrowed it down to a slice between two consecutive columns.
>> Observe this behaviour using pycassa:
>>
>> >>>
>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
>> column_count=2,
>> column_start=uuid.UUID('13957152-234b-11e2-92bc-e0db550199f4')).keys()
>> DEBUG 2012-11-08 11:55:51,170 pycassa_library.pool:30 6849
>> 139928791262976 Connection 52905488 (xxx:9160) was checked out from pool
>> 51715344
>> DEBUG 2012-11-08 11:55:53,415 pycassa_library.pool:37 6849
>> 139928791262976 Connection 52905488 (xxx:9160) was checked in to pool
>> 51715344
>> [UUID('13957152-234b-11e2-92bc-e0db550199f4'),
>> UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')]
>>
>> A two column slice took more than 2s to return. If I request the next 2
>> column slice:
>>
>> >>>
>> DISCO_CASS.col_fam_nsrev.get(uuid.UUID('3cd88d97-ffde-44ca-8ae9-5336caaebc4e'),
>> column_count=2,
>> column_start=uuid.UUID('40b7ae4e-2449-11e2-8610-e0db550199f4')).keys()
>> DEBUG 2012-11-08 11:57:32,750 pycassa_library.pool:30 6849
>> 139928791262976 Connection 52904912 (xxx:9160) was checked out from pool
>> 51715344
>> DEBUG 2012-11-08 11:57:32,774 pycassa_library.pool:37 6849
>> 139928791262976 Connection 52904912 (xxx:9160) was checked in to pool
>> 51715344
>> [UUID('40b7ae4e-2449-11e2-8610-e0db550199f4'),
>> UUID('a364b028-2449-11e2-8882-e0db550199f4')]
>>
>> This takes 20msec... Is there a rational explanation for this different
>> behaviour? Is there some threshold that I'm running into? Is there any way
>> to obtain more debugging information about this problem?
>>
>> Thanks,
>> André
>
>
>
>


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo
Any connection pool. Imagine if you have 10 column families in 10
keyspaces. You pull a connection off the pool and the odds are 1 in 10
of it being connected to the keyspace you want. So 9 out of 10 times
you have to have a network round trip just to change the keyspace, or
you have to build a keyspace aware connection pool.
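To make that concrete with pycassa (a sketch only; the keyspace names and host
are placeholders): each ConnectionPool is bound to a single keyspace, so ten
keyspaces mean ten separate pools rather than one.

import pycassa

# One pool per keyspace: pooled Thrift connections are tied to the keyspace
# they were opened against, so every extra keyspace multiplies the pools.
keyspaces = ['ks%d' % i for i in range(10)]
pools = dict((ks, pycassa.ConnectionPool(ks, server_list=['cass1:9160']))
             for ks in keyspaces)
users = pycassa.ColumnFamily(pools['ks0'], 'Standard1')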
Edward

On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli  wrote:
> Which connection pool are you talking about?
>
>
> On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo 
> wrote:
>>
>> it is better to have one keyspace unless you need to replicate the
>> keyspaces differently. The main reason for this is that changing
>> keyspaces requires an RPC operation. Having 10 keyspaces would mean
>> having 10 connection pools.
>>
>> On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli 
>> wrote:
>> > Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
>> > keyspaces with 1 CF each.
>> > I am talking in terms of memory footprint.
>> > Also I would be interested to know how much better one is over other.
>> >
>> > Thanks,
>> > Sankalp
>
>


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread sankalp kohli
I am a bit confused. One connection pool I know is the one which
MessageService has to other nodes. Then there will be incoming connections
via thrift from clients. How are they affected by multiple keyspaces?


On Thu, Nov 8, 2012 at 3:14 PM, Edward Capriolo wrote:

> Any connection pool. Imagine if you have 10 column families in 10
> keyspaces. You pull a connection off the pool and the odds are 1 in 10
> of it being connected to the keyspace you want. So 9 out of 10 times
> you have to have a network round trip just to change the keyspace, or
> you have to build a keyspace aware connection pool.
> Edward
>
> On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli 
> wrote:
> > Which connection pool are you talking about?
> >
> >
> > On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo 
> > wrote:
> >>
> >> it is better to have one keyspace unless you need to replicate the
> >> keyspaces differently. The main reason for this is that changing
> >> keyspaces requires an RPC operation. Having 10 keyspaces would mean
> >> having 10 connection pools.
> >>
> >> On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli 
> >> wrote:
> >> > Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
> >> > keyspaces with 1 CF each.
> >> > I am talking in terms of memory footprint.
> >> > Also I would be interested to know how much better one is over other.
> >> >
> >> > Thanks,
> >> > Sankalp
> >
> >
>


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo
In the old days the API looked like this.

  client.insert("Keyspace1",
 key_user_id,
   new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
  "Chris Goffinet".getBytes("UTF-8"),
   timestamp,
   ConsistencyLevel.ONE);

but now it works like this

/pay attention to this below -/
client.set_keyspace("keyspace1");
/pay attention to this above -/
  client.insert(
 key_user_id,
 new ColumnPath("Standard1", null,
"name".getBytes("UTF-8")),
  "Chris Goffinet".getBytes("UTF-8"),
  timestamp,
  ConsistencyLevel.ONE);

So each time you switch keyspaces you make a network round trip.

On Thu, Nov 8, 2012 at 6:17 PM, sankalp kohli  wrote:
> I am a bit confused. One connection pool I know is the one which
> MessageService has to other nodes. Then there will be incoming connections
> via thrift from clients. How are they affected by multiple keyspaces?
>
>
> On Thu, Nov 8, 2012 at 3:14 PM, Edward Capriolo 
> wrote:
>>
>> Any connection pool. Imagine if you have 10 column families in 10
>> keyspaces. You pull a connection off the pool and the odds are 1 in 10
>> of it being connected to the keyspace you want. So 9 out of 10 times
>> you have to have a network round trip just to change the keyspace, or
>> you have to build a keyspace aware connection pool.
>> Edward
>>
>> On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli 
>> wrote:
>> > Which connection pool are you talking about?
>> >
>> >
>> > On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo 
>> > wrote:
>> >>
>> >> it is better to have one keyspace unless you need to replicate the
>> >> keyspaces differently. The main reason for this is that changing
>> >> keyspaces requires an RPC operation. Having 10 keyspaces would mean
>> >> having 10 connection pools.
>> >>
>> >> On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli 
>> >> wrote:
>> >> > Is it better to have 10 Keyspaces with 10 CF in each keyspace. or 100
>> >> > keyspaces with 1 CF each.
>> >> > I am talking in terms of memory footprint.
>> >> > Also I would be interested to know how much better one is over other.
>> >> >
>> >> > Thanks,
>> >> > Sankalp
>> >
>> >
>
>


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread sankalp kohli
I think this code is from the thrift part. I use hector. In hector, I can
create multiple keyspace objects for each keyspace and use them when I want
to talk to that keyspace. Why would it need to do a round trip to the server
for each switch?


On Thu, Nov 8, 2012 at 3:28 PM, Edward Capriolo wrote:

> In the old days the API looked like this.
>
>   client.insert("Keyspace1",
>  key_user_id,
>new ColumnPath("Standard1", null, "name".getBytes("UTF-8")),
>   "Chris Goffinet".getBytes("UTF-8"),
>timestamp,
>ConsistencyLevel.ONE);
>
> but now it works like this
>
> /pay attention to this below -/
> client.set_keyspace("keyspace1");
> /pay attention to this above -/
>   client.insert(
>  key_user_id,
>  new ColumnPath("Standard1", null,
> "name".getBytes("UTF-8")),
>   "Chris Goffinet".getBytes("UTF-8"),
>   timestamp,
>   ConsistencyLevel.ONE);
>
> So each time you switch keyspaces you make a network round trip.
>
> On Thu, Nov 8, 2012 at 6:17 PM, sankalp kohli 
> wrote:
> > I am a bit confused. One connection pool I know is the one which
> > MessageService has to other nodes. Then there will be incoming
> connections
> > via thrift from clients. How are they affected by multiple keyspaces?
> >
> >
> > On Thu, Nov 8, 2012 at 3:14 PM, Edward Capriolo 
> > wrote:
> >>
> >> Any connection pool. Imagine if you have 10 column families in 10
> >> keyspaces. You pull a connection off the pool and the odds are 1 in 10
> >> of it being connected to the keyspace you want. So 9 out of 10 times
> >> you have to have a network round trip just to change the keyspace, or
> >> you have to build a keyspace aware connection pool.
> >> Edward
> >>
> >> On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli 
> >> wrote:
> >> > Which connection pool are you talking about?
> >> >
> >> >
> >> > On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo <
> edlinuxg...@gmail.com>
> >> > wrote:
> >> >>
> >> >> it is better to have one keyspace unless you need to replicate the
> >> >> keyspaces differently. The main reason for this is that changing
> >> >> keyspaces requires an RPC operation. Having 10 keyspaces would mean
> >> >> having 10 connection pools.
> >> >>
> >> >> On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli <
> kohlisank...@gmail.com>
> >> >> wrote:
> >> >> > Is it better to have 10 Keyspaces with 10 CF in each keyspace. or
> 100
> >> >> > keyspaces with 1 CF each.
> >> >> > I am talking in terms of memory footprint.
> >> >> > Also I would be interested to know how much better one is over
> other.
> >> >> >
> >> >> > Thanks,
> >> >> > Sankalp
> >> >
> >> >
> >
> >
>


Re: Loading data on-demand in Cassandra

2012-11-08 Thread sal
Pierre Chalamet  chalamet.net> writes:

> 
> Hi,
>
> You do not need to have 700 GB of data in RAM. Cassandra is able to store data
> on disks and query from there if the data is not cached in memory. Caches are
> maintained by C* itself, but you still have to do some configuration.
>
> Supposing you want to store around 800 GB with RF=3, you will need at least 6
> servers if you want to store all the data of your db (keeping max 400 GB per
> server): 800x3/400=6.
>
> There is no native implementation of triggers in C*. Anyway, there is an
> extension bringing this feature:
> https://github.com/hmsonline/cassandra-triggers. This should allow you to be
> notified of mutations (i.e. not queries). Some people on this ML are involved
> in this, maybe they could help with this.
>
> Cheers,
> - Pierre
> From:  Oliver Plohmann  objectscape.org>
> 
> Date: Sun, 12 Aug 2012 21:24:43 +0200
> To:  cassandra.apache.org>
> ReplyTo:  user  cassandra.apache.org
> 
> Subject: Loading data on-demand in Cassandra
> 
> Hello,
> I'm looking a bit into Cassandra to
>   see whether it would be something to go with for my company. I
>   searched through the Internet, looked through the FAQs, etc. but
>   there are still some few open questions. Hope I don't bother
>   anybody with the usual beginner questions ...
> Is there a way to do load-on-demand
>   of data in Cassandra? For the time being, we cannot afford to
>   built up a cluster that holds our 700 GB SQL-Database in RAM. So
>   we need to be able to load data on-demand from our relational
>   database. Can this be done in Cassandra? Then there also needs to
>   be a way to unload data in order to reclaim RAM space. Would be
>   nice if it were possible to register for an asynchronous
>   notification in case some value was changed. Can this be done?
> Thanks for any answers.
>   Regards, Oliver
>   

I would consider looking into distributed caching technology (ehcache, gemfire)






unsubscribe

2012-11-08 Thread Jeremy McKay







Multiple Clusters' Keyspaces to one core cluster

2012-11-08 Thread ws
If I have multiple clusters, can I replicate a keyspace from each of those
clusters to a separate cluster?



Re: get_range_slice gets no rowcache support?

2012-11-08 Thread Manu Zhang
I did overlook something. get_range_slice will invoke cfs.getRawCachedRow
instead of cfs.getThroughCache. Hence, no row will be cached if it's not
present in the row cache. Well, this puzzles me further as to how the
range of rows is expected to get stored into the row cache in the first
place.

Would someone please clarify it for me? Thanks in advance.


On Thu, Nov 8, 2012 at 3:23 PM, Manu Zhang  wrote:

> I've asked this question before. And after reading the source codes, I
> find that get_range_slice doesn't query rowcache before reading from
> Memtable and SSTable. I just want to make sure whether I've overlooked
> something. If my observation is correct, what's the consideration here?


Re: Multiple keyspaces vs Multiple CFs

2012-11-08 Thread Edward Capriolo
It is not as bad with Hector, but still, each Keyspace object is
another socket open to Cassandra. If you have 500 webservers and 10
keyspaces, instead of having 500 connections you now have 5,000.

On Thu, Nov 8, 2012 at 6:35 PM, sankalp kohli  wrote:
> I think this code is from the thrift part. I use hector. In hector, I can
> create multiple keyspace objects for each keyspace and use them when I want
> to talk to that keyspace. Why will it need to do a round trip to the server
> for each switch.
>
>
> On Thu, Nov 8, 2012 at 3:28 PM, Edward Capriolo 
> wrote:
>>
>> In the old days the API looked like this.
>>
>>   client.insert("Keyspace1",
>>  key_user_id,
>>new ColumnPath("Standard1", null,
>> "name".getBytes("UTF-8")),
>>   "Chris Goffinet".getBytes("UTF-8"),
>>timestamp,
>>ConsistencyLevel.ONE);
>>
>> but now it works like this
>>
>> /pay attention to this below -/
>> client.set_keyspace("keyspace1");
>> /pay attention to this above -/
>>   client.insert(
>>  key_user_id,
>>  new ColumnPath("Standard1", null,
>> "name".getBytes("UTF-8")),
>>   "Chris Goffinet".getBytes("UTF-8"),
>>   timestamp,
>>   ConsistencyLevel.ONE);
>>
>> So each time you switch keyspaces you make a network round trip.
>>
>> On Thu, Nov 8, 2012 at 6:17 PM, sankalp kohli 
>> wrote:
>> > I am a bit confused. One connection pool I know is the one which
>> > MessageService has to other nodes. Then there will be incoming
>> > connections
>> > via thrift from clients. How are they affected by multiple keyspaces?
>> >
>> >
>> > On Thu, Nov 8, 2012 at 3:14 PM, Edward Capriolo 
>> > wrote:
>> >>
>> >> Any connection pool. Imagine if you have 10 column families in 10
>> >> keyspaces. You pull a connection off the pool and the odds are 1 in 10
>> >> of it being connected to the keyspace you want. So 9 out of 10 times
>> >> you have to have a network round trip just to change the keyspace, or
>> >> you have to build a keyspace aware connection pool.
>> >> Edward
>> >>
>> >> On Thu, Nov 8, 2012 at 5:36 PM, sankalp kohli 
>> >> wrote:
>> >> > Which connection pool are you talking about?
>> >> >
>> >> >
>> >> > On Thu, Nov 8, 2012 at 2:19 PM, Edward Capriolo
>> >> > 
>> >> > wrote:
>> >> >>
>> >> >> it is better to have one keyspace unless you need to replicate the
>> >> >> keyspaces differently. The main reason for this is that changing
>> >> >> keyspaces requires an RPC operation. Having 10 keyspaces would mean
>> >> >> having 10 connection pools.
>> >> >>
>> >> >> On Thu, Nov 8, 2012 at 4:59 PM, sankalp kohli
>> >> >> 
>> >> >> wrote:
>> >> >> > Is it better to have 10 Keyspaces with 10 CF in each keyspace. or
>> >> >> > 100
>> >> >> > keyspaces with 1 CF each.
>> >> >> > I am talking in terms of memory footprint.
>> >> >> > Also I would be interested to know how much better one is over
>> >> >> > other.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Sankalp
>> >> >
>> >> >
>> >
>> >
>
>


Indexing Data in Cassandra with Elastic Search

2012-11-08 Thread Brian O'Neill
For those looking to index data in Cassandra with Elastic Search, here
is what we decided to do:
http://brianoneill.blogspot.com/2012/11/big-data-quadfecta-cassandra-storm.html

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42


read request distribution

2012-11-08 Thread Wei Zhu
Hi All,
I am doing a benchmark on Cassandra. I have a three-node cluster with RF=3. I
generated 6M rows with sequence numbers from 1 to 6M, so the rows should be
evenly distributed among the three nodes, disregarding the replicas.

I am benchmarking with read-only requests, generating reads for randomly chosen
keys from 1 to 6M. Oddly, nodetool cfstats reports that one node has only half
the requests of another, with the third node sitting in the middle, so the
ratio is roughly 2:3:4. The node with the most read requests actually has the
smallest latency, and the one with the fewest read requests reports the largest
latency. The difference is pretty big; the fastest is almost double the slowest.

All three nodes have exactly the same hardware, and the data size on each node
is the same since the RF is three and all of them hold the complete data set. I
am using Hector as the client and the random read requests are in the millions.
I can't think of a reasonable explanation. Can someone please shed some light?

Thanks.
-Wei


Re: composite column validation_class question

2012-11-08 Thread Wei Zhu
Any thoughts?

Thanks.
-Wei





 From: Wei Zhu 
To: Cassandr usergroup  
Sent: Wednesday, November 7, 2012 12:47 PM
Subject: composite column validation_class question
 

Hi All,
I am trying to design my schema using composite columns. One thing I am a bit
confused about is how to define the validation_class for a composite column, or
whether there is a way to define it at all.
For a composite column, I might insert different values based on the column
name. For example, I will insert a date for the column "created":

set user[1]['7:1:100:created'] = 1351728000;

and a String for the description:

set user[1]['7:1:100:desc'] = my description;

I don't see a way to define a validation_class for a composite column. Am I right?
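
For reference, a minimal pycassa sketch of one common workaround (keyspace and
column family names are placeholders): declare the composite comparator for the
column names and a permissive BytesType default validator, then let the
application interpret each value based on the last component, since validation
applies per column rather than per composite component.

from pycassa.system_manager import SystemManager, BYTES_TYPE
from pycassa.types import CompositeType, IntegerType, UTF8Type

# Composite comparator matching names like 7:1:100:created; values stay raw
# bytes because there is no per-component validation_class.
sys_mgr = SystemManager('localhost:9160')
sys_mgr.create_column_family(
    'ks1', 'user',
    comparator_type=CompositeType(IntegerType(), IntegerType(),
                                  IntegerType(), UTF8Type()),
    default_validation_class=BYTES_TYPE)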

Thanks.
-Wei