Re: Enable compression Cassandra 1.1.2

2012-11-15 Thread Alain RODRIGUEZ
We just upgraded to C* 1.1.6.

We can now change anything using the CLI or cqlsh. So problem solved.

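For reference, the statements that had no effect on 1.1.2 (repeated from the report below, as a sketch) now take effect after the upgrade; the change can be checked with "show schema" in the CLI or the CompressionRatio attribute over JMX:

UPDATE COLUMN FAMILY data_action
  WITH compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};

ALTER TABLE data_action
  WITH compression_parameters:sstable_compression = 'SnappyCompressor'
  AND compression_parameters:chunk_length_kb = 64;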

2012/11/14 Alain RODRIGUEZ 

> Oh! That's obviously the exact same issue. I didn't find this thread while
> searching for my issue.
>
> We will upgrade.
>
> Thanks for the link.
>
>
> 2012/11/14 aaron morton 
>
>> Maybe https://issues.apache.org/jira/browse/CASSANDRA-4561
>>
>> Can you upgrade to 1.1.6 ?
>>
>> Cheers
>>
>>-
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 14/11/2012, at 11:39 PM, Alain RODRIGUEZ  wrote:
>>
>> Hi, I am running C* 1.1.2 and there is no way to turn on compression for
>> a CF.
>>
>> Here is the command I ran in the CLI:
>>
>> UPDATE COLUMN FAMILY data_action WITH
>> compression_options = {sstable_compression: SnappyCompressor,
>> chunk_length_kb: 64};
>>
>> Show schema:
>>
>> create column family data_action
>>   with column_type = 'Standard'
>>   and comparator = 'UTF8Type'
>>   and default_validation_class = 'UTF8Type'
>>   and key_validation_class = 'UTF8Type'
>>   and read_repair_chance = 1.0
>>   and dclocal_read_repair_chance = 0.0
>>   and gc_grace = 864000
>>   and min_compaction_threshold = 4
>>   and max_compaction_threshold = 32
>>   and replicate_on_write = true
>>   and compaction_strategy =
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>>   and caching = 'KEYS_ONLY';
>>
>> I also tried through cqlsh, without success ("ALTER TABLE data_action WITH
>> compression_parameters:sstable_compression = 'SnappyCompressor' AND
>> compression_parameters:chunk_length_kb = 64;").
>>
>> I have no error message, just these few lines in system.log:
>>
>> INFO 10:17:12,051 Completed flushing
>> /raid0/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-20-Data.db
>> (1367 bytes) for commitlog position
>> ReplayPosition(segmentId=721932060965088, position=116556860)
>>  INFO 10:17:12,052 Compacting
>> [SSTableReader(path='/raid0/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-20-Data.db'),
>> SSTableReader(path='/raid0/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-19-Data.db'),
>> SSTableReader(path='/raid0/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-17-Data.db'),
>> SSTableReader(path='/raid0/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-18-Data.db')]
>>  INFO 10:17:12,134 Compacted to
>> [/raid0/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-hd-21-Data.db,].
>>  50,928 to 46,827 (~91% of original) bytes for 3 keys at 0.544606MB/s.
>>  Time: 82ms.
>>
>> I also tried upgrading sstables, but compression is definitely not enabled
>> (same data size, and JMX shows a CompressionRatio of 0.0).
>>
>> Has anyone seen something similar?
>>
>> Alain
>>
>>
>>
>


Re: Offsets and Range Queries

2012-11-15 Thread Edward Capriolo
There are several reasons. First, there is no "absolute offset": the
rows are sorted by their data, so if someone inserts new data between your
last query and this one, the offsets have shifted.

Unless you are doing select queries inside a transaction with repeatable
read, and your database supports that, the query you mention does not
really have "absolute offsets" either. The results of the query can
change between reads.

In Cassandra we do not execute large queries (that might spill to
temp tables or whatever) and then let you page through them. Slices have a
fixed size; this ensures that the "query" does not execute for
arbitrary lengths of time.

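A minimal sketch of the start-key style of paging discussed here, assuming a
hypothetical CQL 3 table named "users" with partition key "key" and the token()
function: fetch one row more than the page size and start the next page at the
extra row's key.

-- page size 25: ask for 26 rows, keep the 26th key as the next start
SELECT * FROM users LIMIT 26;
-- suppose the 26th key returned was 'user_0451'; the next page starts there
SELECT * FROM users WHERE token(key) >= token('user_0451') LIMIT 26;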

On Thu, Nov 15, 2012 at 6:39 AM, Ravikumar Govindarajan
 wrote:
> Usually we do a SELECT * FROM  ORDER BY  LIMIT 26,25 for pagination
> purposes, but specifying an offset is not available for range queries in
> Cassandra.
>
> I always have to specify a start-key to achieve this. Are there reasons for
> choosing such an approach rather than providing an absolute offset?
>
> --
> Ravi


Re: unable to read saved rowcache from disk

2012-11-15 Thread Edward Capriolo
If the startup is taking a long time or not working and you believe the
saved cache to be corrupt in some way, it is safe to delete the saved cache
files. If you think the process is taking longer than it should, you could
try attaching a debugger to the process.

I try to avoid the row cache these days; even with cache auto-tuning
(which I am not using), one really wide row can cause issues. I like
letting the OS disk cache do its thing.

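A sketch of the cleanup and the knobs involved; the saved-cache path matches the
default directory seen later in this thread, the yaml settings are the standard
global cache options, and the values shown are only illustrative:

# stop the node, drop the saved row cache so it is not re-read on startup, restart
sudo service cassandra stop
sudo rm /var/lib/cassandra/saved_caches/*-RowCache
sudo service cassandra start

# cassandra.yaml: stop persisting the row cache, or shrink/disable it entirely
row_cache_save_period: 0      # 0 = never save the row cache to disk
row_cache_size_in_mb: 0       # 0 = row cache disabled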


On Thu, Nov 15, 2012 at 2:20 AM, Wz1975  wrote:
> Before shutdown, you saw the row cache held 500 MB across 1.6 M rows, so each
> row averages about 300 B; 700 K rows should then be a little over 200 MB, unless
> it is reading more (maybe tombstones?). Or the rows on disk have grown for some
> reason, but the row cache was not updated? Something else could also be eating
> up the memory. You may want to profile memory and see what consumes it.
>
>
> Thanks.
> -Wei
>
> Sent from my Samsung smartphone on AT&T
>
>
>  Original message 
> Subject: Re: unable to read saved rowcache from disk
> From: Manu Zhang 
> To: user@cassandra.apache.org
> CC:
>
>
> 3G, other jvm parameters are unchanged.
>
>
> On Thu, Nov 15, 2012 at 2:40 PM, Wz1975  wrote:
>>
>> How big is your heap?  Did you change the jvm parameter?
>>
>>
>>
>> Thanks.
>> -Wei
>>
>> Sent from my Samsung smartphone on AT&T
>>
>>
>>  Original message 
>> Subject: Re: unable to read saved rowcache from disk
>> From: Manu Zhang 
>> To: user@cassandra.apache.org
>> CC:
>>
>>
>> add a counter and print out myself
>>
>>
>> On Thu, Nov 15, 2012 at 1:51 PM, Wz1975  wrote:
>>>
>>> Curious where did you see this?
>>>
>>>
>>> Thanks.
>>> -Wei
>>>
>>> Sent from my Samsung smartphone on AT&T
>>>
>>>
>>>  Original message 
>>> Subject: Re: unable to read saved rowcache from disk
>>> From: Manu Zhang 
>>> To: user@cassandra.apache.org
>>> CC:
>>>
>>>
>>> OOM at deserializing 747321th row
>>>
>>>
>>> On Thu, Nov 15, 2012 at 9:08 AM, Manu Zhang 
>>> wrote:

 oh, as for the number of rows, it's 165. How long would you expect
 it to be read back?


 On Thu, Nov 15, 2012 at 3:57 AM, Wei Zhu  wrote:
>
> Good information Edward.
> For my case, we have good size of RAM (76G) and the heap is 8G. So I
> set the row cache to be 800M as recommended. Our column is kind of big, so
> the hit ratio for the row cache is around 20%, so according to DataStax, we
> might just turn the row cache off altogether.
> Anyway, for restart, it took about 2 minutes to load the row cache
>
>  INFO [main] 2012-11-14 11:43:29,810 AutoSavingCache.java (line 108)
> reading saved cache /var/lib/cassandra/saved_caches/XXX-f2-RowCache
>  INFO [main] 2012-11-14 11:45:12,612 ColumnFamilyStore.java (line 451)
> completed loading (102801 ms; 21125 keys) row cache for XXX.f2
>
> Just for comparison, our key is long, the disk usage for row cache is
> 253K. (it only stores key when row cache is saved to disk, so 253KB/ 
> 8bytes
> = 31625 number of keys). It's about right...
> So for 15MB, there could be a lot of "narrow" rows. (if the key is
> Long, could be more than 1M rows)
>
> Thanks.
> -Wei
> 
> From: Edward Capriolo 
> To: user@cassandra.apache.org
> Sent: Tuesday, November 13, 2012 11:13 PM
> Subject: Re: unable to read saved rowcache from disk
>
> http://wiki.apache.org/cassandra/LargeDataSetConsiderations
>
> A negative side-effect of a large row-cache is start-up time. The
> periodic saving of the row cache information only saves the keys that
> are cached; the data has to be pre-fetched on start-up. On a large
> data set, this is probably going to be seek-bound and the time it
> takes to warm up the row cache will be linear with respect to the row
> cache size (assuming sufficiently large amounts of data that the seek
> bound I/O is not subject to optimization by disks)
>
> Assuming a row cache 15MB and the average row is 300 bytes, that could
> be 50,000 entries. 4 hours seems like a long time to read back 50K
> entries. Unless the source table was very large and you can only do a
> small number / reads/sec.
>
> On Tue, Nov 13, 2012 at 9:47 PM, Manu Zhang 
> wrote:
> > "incorrect"... what do you mean? I think it's only 15MB, which is not
> > big.
> >
> >
> > On Wed, Nov 14, 2012 at 10:38 AM, Edward Capriolo
> > 
> > wrote:
> >>
> >> Yes, the row cache "could be" incorrect, so on startup cassandra verifies
> >> the saved row cache by re-reading it. It takes a long time, so do not save
> >> a big row cache.
> >>
> >>
> >> On Tuesday, November 13, 2012, Manu Zhang 
> >> wrote:
> >> > I have a rowcache provieded by SerializingCacheProvider.
> >> > The data that has been read into it is about 500MB, as claimed by
> >> > jconsole. After saving cache, it is around 15MB o

Re: unable to read saved rowcache from disk

2012-11-15 Thread aaron morton
For a row cache of 1,650,000 rows:

16 byte token
300 byte row key ?
and row data ?
multiplied by a Java fudge factor of 5 or 10.

Try deleting the saved cache and restarting.

Cheers
 
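Plugging in the figures from this thread (1.6 M rows, an 8-byte long key, roughly
300 B of row data per row) gives a rough idea of why a 3 GB heap runs out while
re-reading the cache; a back-of-the-envelope sketch, not a precise accounting:

1,600,000 rows x (16 B token + 8 B key + ~300 B row data) ~ 520 MB of raw data
520 MB x 5-10 (Java object overhead)                      ~ 2.6-5.2 GB on heap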


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/11/2012, at 8:20 PM, Wz1975  wrote:

> Before shut down,  you saw rowcache has 500m, 1.6m rows,  each row average 
> 300B, so 700k row should be a little over 200m, unless it is reading more,  
> maybe tombstone?  Or the rows on disk  have grown for some reason,  but row 
> cache was not updated?  Could be something else eats up the memory.  You may 
> profile memory and see who consumes the memory. 
> 
> 
> Thanks.
> -Wei
> 
> Sent from my Samsung smartphone on AT&T 
> 
> 
>  Original message 
> Subject: Re: unable to read saved rowcache from disk 
> From: Manu Zhang  
> To: user@cassandra.apache.org 
> CC: 
> 
> 
> 3G, other jvm parameters are unchanged. 
> 
> 
> On Thu, Nov 15, 2012 at 2:40 PM, Wz1975  wrote:
> How big is your heap?  Did you change the jvm parameter? 
> 
> 
> 
> Thanks.
> -Wei
> 
> Sent from my Samsung smartphone on AT&T 
> 
> 
>  Original message 
> Subject: Re: unable to read saved rowcache from disk 
> From: Manu Zhang  
> To: user@cassandra.apache.org 
> CC: 
> 
> 
> add a counter and print out myself
> 
> 
> On Thu, Nov 15, 2012 at 1:51 PM, Wz1975  wrote:
> Curious where did you see this? 
> 
> 
> Thanks.
> -Wei
> 
> Sent from my Samsung smartphone on AT&T 
> 
> 
>  Original message 
> Subject: Re: unable to read saved rowcache from disk 
> From: Manu Zhang  
> To: user@cassandra.apache.org 
> CC: 
> 
> 
> OOM at deserializing 747321th row
> 
> 
> On Thu, Nov 15, 2012 at 9:08 AM, Manu Zhang  wrote:
> oh, as for the number of rows, it's 165. How long would you expect it to 
> be read back?
> 
> 
> On Thu, Nov 15, 2012 at 3:57 AM, Wei Zhu  wrote:
> Good information Edward. 
> For my case, we have good size of RAM (76G) and the heap is 8G. So I set the 
> row cache to be 800M as recommended. Our column is kind of big, so the hit 
> ratio for row cache is around 20%, so according to datastax, might just turn 
> the row cache altogether. 
> Anyway, for restart, it took about 2 minutes to load the row cache
> 
>  INFO [main] 2012-11-14 11:43:29,810 AutoSavingCache.java (line 108) reading 
> saved cache /var/lib/cassandra/saved_caches/XXX-f2-RowCache
>  INFO [main] 2012-11-14 11:45:12,612 ColumnFamilyStore.java (line 451) 
> completed loading (102801 ms; 21125 keys) row cache for XXX.f2 
> 
> Just for comparison, our key is long, the disk usage for row cache is 253K. 
> (it only stores key when row cache is saved to disk, so 253KB/ 8bytes = 31625 
> number of keys). It's about right...
> So for 15MB, there could be a lot of "narrow" rows. (if the key is Long, 
> could be more than 1M rows)
>   
> Thanks.
> -Wei
> From: Edward Capriolo 
> To: user@cassandra.apache.org 
> Sent: Tuesday, November 13, 2012 11:13 PM
> Subject: Re: unable to read saved rowcache from disk
> 
> http://wiki.apache.org/cassandra/LargeDataSetConsiderations
> 
> A negative side-effect of a large row-cache is start-up time. The
> periodic saving of the row cache information only saves the keys that
> are cached; the data has to be pre-fetched on start-up. On a large
> data set, this is probably going to be seek-bound and the time it
> takes to warm up the row cache will be linear with respect to the row
> cache size (assuming sufficiently large amounts of data that the seek
> bound I/O is not subject to optimization by disks)
> 
> Assuming a row cache 15MB and the average row is 300 bytes, that could
> be 50,000 entries. 4 hours seems like a long time to read back 50K
> entries. Unless the source table was very large and you can only do a
> small number / reads/sec.
> 
> On Tue, Nov 13, 2012 at 9:47 PM, Manu Zhang  wrote:
> > "incorrect"... what do you mean? I think it's only 15MB, which is not big.
> >
> >
> > On Wed, Nov 14, 2012 at 10:38 AM, Edward Capriolo 
> > wrote:
> >>
> >> Yes the row cache "could be" incorrect so on startup cassandra verify they
> >> saved row cache by re reading. It takes a long time so do not save a big 
> >> row
> >> cache.
> >>
> >>
> >> On Tuesday, November 13, 2012, Manu Zhang  wrote:
> >> > I have a rowcache provieded by SerializingCacheProvider.
> >> > The data that has been read into it is about 500MB, as claimed by
> >> > jconsole. After saving cache, it is around 15MB on disk. Hence, I suppose
> >> > the size from jconsole is before serializing.
> >> > Now while restarting Cassandra, it's unable to read saved rowcache back.
> >> > By "unable", I mean around 4 hours and I have to abort it and remove 
> >> > cache
> >> > so as not to suspend other tasks.
> >> > Since the data aren't huge, why Cassandra can't read it back?
> >> > My Cassandra is 1.2.0-beta2.
> >
> >



Re: Upgrade 1.1.2 -> 1.1.6

2012-11-15 Thread aaron morton
Can you provide an example of the increase?

Can you provide the log from startup?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/11/2012, at 3:21 AM, Alain RODRIGUEZ  wrote:

> We had an issue with counters over-counting, even though we used the nodetool
> drain command before upgrading...
> 
> Here is my bash history
> 
>69  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
>70  cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
>71  sudo apt-get install cassandra
>72  nodetool disablethrift
>73  nodetool drain
>74  service cassandra stop
>75  cat /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
>76  vim /etc/cassandra/cassandra-env.sh
>77  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
>78  vim /etc/cassandra/cassandra.yaml
>79  service cassandra start
> 
> So I think I followed these steps 
> http://www.datastax.com/docs/1.1/install/upgrading#upgrade-steps
> 
> I merged my conf files with an external tool so consider I merged my conf 
> files on steps 76 and 78.
> 
> I saw that the "sudo apt-get install cassandra" stop the server and restart 
> it automatically. So it updated without draining and restart before I had the 
> time to reconfigure the conf files. Is this "normal" ? Is there a way to 
> avoid it ?
> 
> So for the second node I decided to try to stop C* before the upgrade.
> 
>   125  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
>   126  cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
>   127  nodetool disablegossip
>   128  nodetool disablethrift
>   129  nodetool drain
>   130  service cassandra stop
>   131  sudo apt-get install cassandra
> 
> //131 : This restarted cassandra
> 
>   132  nodetool disablethrift
>   133  nodetool disablegossip
>   134  nodetool drain
>   135  service cassandra stop
>   136  cat /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh.bak
>   137  cim /etc/cassandra/cassandra-env.sh
>   138  vim /etc/cassandra/cassandra-env.sh
>   139  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
>   140  vim /etc/cassandra/cassandra.yaml
>   141  service cassandra start
> 
> After both of these updates I saw my current counters increase without any 
> reason.
> 
> Did I do anything wrong ?
> 
> Alain
> 

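Regarding the question above about apt-get restarting Cassandra during the package
upgrade: a sketch of one way to block that on Debian/Ubuntu, using the standard
policy-rc.d hook (remove the file again once the config files have been merged):

# any init script invoked via invoke-rc.d during package operations will be refused
printf '#!/bin/sh\nexit 101\n' | sudo tee /usr/sbin/policy-rc.d
sudo chmod +x /usr/sbin/policy-rc.d
sudo apt-get install cassandra       # upgrades the package without (re)starting the node
sudo rm /usr/sbin/policy-rc.d        # allow service starts again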


Re: unable to read saved rowcache from disk

2012-11-15 Thread Wei Zhu
Just curious, why do you think the row key will take 300 bytes? If the row key is
a Long, doesn't it take 8 bytes?
In his case, the row cache was 500 MB with 1.6 M rows, so the row data is about
300 B per row. Did I miss something?

Thanks.
-Wei



 From: aaron morton 
To: user@cassandra.apache.org 
Sent: Thursday, November 15, 2012 12:15 PM
Subject: Re: unable to read saved rowcache from disk
 

For a row cache of 1,650,000:

16 byte token
300 byte row key ? 
and row data ? 
multiply by a java fudge factor or 5 or 10. 

Trying delete the saved cache and restarting.

Cheers
 



-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/11/2012, at 8:20 PM, Wz1975  wrote:

Before shut down,  you saw rowcache has 500m, 1.6m rows,  each row average 
300B, so 700k row should be a little over 200m, unless it is reading more,  
maybe tombstone?  Or the rows on disk  have grown for some reason,  but row 
cache was not updated?  Could be something else eats up the memory.  You may 
profile memory and see who consumes the memory. 
>
>
>Thanks.
>-Wei
>
>Sent from my Samsung smartphone on AT&T 
>
>
> Original message 
>Subject: Re: unable to read saved rowcache from disk 
>From: Manu Zhang  
>To: user@cassandra.apache.org 
>CC: 
>
>
>3G, other jvm parameters are unchanged. 
>
>
>
>On Thu, Nov 15, 2012 at 2:40 PM, Wz1975  wrote:
>
>How big is your heap?  Did you change the jvm parameter? 
>>
>>
>>
>>Thanks.
>>-Wei
>>
>>Sent from my Samsung smartphone on AT&T 
>>
>>
>> Original message 
>>Subject: Re: unable to read saved rowcache from disk 
>>From: Manu Zhang  
>>To: user@cassandra.apache.org 
>>CC: 
>>
>>
>>add a counter and print out myself
>>
>>
>>
>>On Thu, Nov 15, 2012 at 1:51 PM, Wz1975  wrote:
>>
>>Curious where did you see this? 
>>>
>>>
>>>Thanks.
>>>-Wei
>>>
>>>Sent from my Samsung smartphone on AT&T 
>>>
>>>
>>>
>>> Original message 
>>>Subject: Re: unable to read saved rowcache from disk 
>>>
>>>From: Manu Zhang  
>>>To: user@cassandra.apache.org 
>>>CC: 
>>>
>>>
>>>OOM at deserializing 747321th row
>>>
>>>
>>>
>>>On Thu, Nov 15, 2012 at 9:08 AM, Manu Zhang  wrote:
>>>
>>>oh, as for the number of rows, it's 165. How long would you expect it to 
>>>be read back?



On Thu, Nov 15, 2012 at 3:57 AM, Wei Zhu  wrote:

Good information Edward. 
>For my case, we have good size of RAM (76G) and the heap is 8G. So I set 
>the row cache to be 800M as recommended. Our column is kind of big, so the 
>hit ratio for row cache is around 20%, so according to datastax, might 
>just turn the row cache altogether. 
>Anyway, for restart, it took about 2 minutes to load the row cache
>
>
> INFO [main] 2012-11-14 11:43:29,810 AutoSavingCache.java (line 108) 
>reading saved cache /var/lib/cassandra/saved_caches/XXX-f2-RowCache
> INFO [main] 2012-11-14 11:45:12,612 ColumnFamilyStore.java (line 451) 
>completed loading (102801 ms; 21125 keys) row cache for XXX.f2 
>
>
>Just for comparison, our key is long, the disk usage for row cache is 
>253K. (it only stores key when row cache is saved to disk, so 253KB/ 
>8bytes = 31625 number of keys). It's about right...
>So for 15MB, there could be a lot of "narrow" rows. (if the key is Long, 
>could be more than 1M rows)
>  
>Thanks.
>-Wei
>
>
> From: Edward Capriolo 
>To: user@cassandra.apache.org 
>Sent: Tuesday, November 13, 2012 11:13 PM
>Subject: Re: unable to read saved rowcache from disk
> 
>
>http://wiki.apache.org/cassandra/LargeDataSetConsiderations
>
>A negative side-effect of a large row-cache is start-up time. The
>periodic saving of the row cache information only saves the keys that
>are cached; the data has to be pre-fetched on start-up. On a large
>data set, this is probably going to be seek-bound and the time it
>takes to warm up the row cache will be linear with respect to the row
>cache size (assuming sufficiently large amounts of data that the seek
>bound I/O is not subject to optimization by disks)
>
>Assuming a row cache 15MB and the average row is 300 bytes, that could
>be 50,000 entries. 4 hours seems like a long time to read back 50K
>entries. Unless the source table was very large and you can only do a
>small number / reads/sec.
>
>On Tue, Nov 13, 2012 at 9:47 PM, Manu Zhang  
>wrote:
>> "incorrect"... what do you mean? I think it's only 15MB, which is not 
>> big.
>>
>>
>> On Wed, Nov 14, 2012 at 10:38 AM, Edward Capriolo 
>> wrote:
>>>
>>> Yes the row cache "could be" incorrect so on startup cassandra verify 
>>> they
>>> saved row cache by re reading. It takes a long time so do not save a 
>>> big row
>>> cache.
>>>
>>>
>>> On Tuesday, Novemb

Re: Looking for a good Ruby client

2012-11-15 Thread Harry Wilkinson
Update on this: someone just pointed me towards the Cequel gem:
https://github.com/brewster/cequel

The way it's described in the readme, it looks like exactly what I was
looking for: a modern, CQL-based gem that is in active development and
also follows the ActiveModel pattern. I'd be very interested to hear if
anybody has used it, whether it's stable/reliable, etc.

Thanks.

Harry

On 2 August 2012 00:31, Thorsten von Eicken  wrote:

>  Harry, we're in a similar situation and are starting to work out our own
> ruby client. The biggest issue is that it doesn't make much sense to build
> a higher level abstraction on anything other than CQL3, given where things
> are headed. At least this is our opinion.
> At the same time, CQL3 is just barely becoming usable and still seems
> rather deficient in wide-row usage. The tricky part is that with the
> current CQL3 you have to construct quite complex iterators to retrieve a
> large result set, which means that you end up having to either parse incoming
> CQL3 to insert the iteration stuff, or pass CQL3 fragments in and compose them
> together with iterator clauses. Not fun either way.
> The only good solution I see is to switch to a streaming protocol (or
> build some form of "continue" on top of thrift) such that the client can
> ask for a huge result set and the cassandra coordinator can break it into
> sub-queries as it sees fit and return results chunk-by-chunk. If this is
> really the path forward then all abstractions built above CQL3 before that
> will either have a good piece of complex code that can be deleted or worse,
> will have an interface that is no longer best practice.
> Good luck!
> Thorsten
>
>
>
> On 8/1/2012 1:47 PM, Harry Wilkinson wrote:
>
> Hi,
>
>  I'm looking for a Ruby client for Cassandra that is pretty high-level.
>  I am really hoping to find a Ruby gem of high quality that allows a
> developer to create models like you would with ActiveModel.
>
>  So far I have figured out that the canonical Ruby client for Cassandra
> is Twitter's Cassandra gem of the
> same name.  It looks great - mature, still in active development, etc.  No
> stated support for Ruby 1.9.3 that I can see, but I can probably live with
> that for now.
>
>  What I'm looking for is a higher-level gem built on that gem that works
> like ActiveModel in that you just include a module in your model class and
> that gives you methods to declare your model's serialized attributes and
> also the usual ActiveModel methods like 'save!', 'valid?', 'find', etc.
>
>  I've been trying out some different NoSQL databases recently, and for
> example there is an official Ruby client for Riak with a domain model
> that is close to Riak's, but then there's also
> a gem called 'Ripple' that uses a
> domain model that is closer to what most Ruby developers are used to.  So
> it looks like Twitter's Cassandra gem is the one that stays close to the
> domain model of Cassandra, and what I'm looking for is a gem that's a
> Cassandra equivalent of Ripple.
>
>  From some searching I found 
> cassandra_object,
> which has been inactive for a couple of years, but there's a
> fork that looks like it's being
> maintained, but I have not found any kind of
> information to suggest the maintained fork is in general use yet.  I have
> found quite a lot of gems of a similar style that people have started and
> then not really got very far with.
>
>  So, does anybody know of a suitable gem?  Would you recommend it?  Or
> perhaps you would recommend not using such a gem and sticking with the
> lower-level client gem?
>
>  Thanks in advance for your advice.
>
>  Harry
>
>
>


Re: Upgrade 1.1.2 -> 1.1.6

2012-11-15 Thread Alain RODRIGUEZ
Here is an example of the increase for one counter (counting events per
hour):

time (UTC)    0   1   2   3   4    5    6    7    8    9   10   11   12   13
Good value   88  44  26  35  26   86  187  251  455  389  473  367  453  373
C* counter  149  82  45  68  38  146  329  414  746  566  473  377  453  373

I finished my Cassandra 1.1.6 upgrades at 9:30 UTC.

The wrong values start at 20:00 UTC the day before (counters from earlier hours
are good).

Here are the startup logs:
Server 1: http://pastebin.com/WyCm6Ef5 (This one is from the same server as
the first bash history on my first mail)
Server 2: http://pastebin.com/gBe2KL2b  (This one is from the same server
as the second bash history on my first mail)

Alain

2012/11/15 aaron morton 

> Can you provide an example of the increase ?
>
> Can you provide the log from startup ?
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/11/2012, at 3:21 AM, Alain RODRIGUEZ  wrote:
>
> We had an issue with counters over-counting even using the nodetool drain
> command before upgrading...
>
> Here is my bash history
>
>69  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
>70  cp /etc/cassandra/cassandra-env.sh
> /etc/cassandra/cassandra-env.sh.bak
>71  sudo apt-get install cassandra
>72  nodetool disablethrift
>73  nodetool drain
>74  service cassandra stop
>75  cat /etc/cassandra/cassandra-env.sh
> /etc/cassandra/cassandra-env.sh.bak
>76  vim /etc/cassandra/cassandra-env.sh
>77  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
>78  vim /etc/cassandra/cassandra.yaml
>79  service cassandra start
>
> So I think I followed these steps
> http://www.datastax.com/docs/1.1/install/upgrading#upgrade-steps
>
> I merged my conf files with an external tool so consider I merged my conf
> files on steps 76 and 78.
>
> I saw that the "sudo apt-get install cassandra" stop the server and
> restart it automatically. So it updated without draining and restart before
> I had the time to reconfigure the conf files. Is this "normal" ? Is there a
> way to avoid it ?
>
> So for the second node I decided to try to stop C*before the upgrade.
>
>   125  cp /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
>   126  cp /etc/cassandra/cassandra-env.sh
> /etc/cassandra/cassandra-env.sh.bak
>   127  nodetool disablegossip
>   128  nodetool disablethrift
>   129  nodetool drain
>   130  service cassandra stop
>   131  sudo apt-get install cassandra
>
> //131 : This restarted cassandra
>
>   132  nodetool disablethrift
>   133  nodetool disablegossip
>   134  nodetool drain
>   135  service cassandra stop
>   136  cat /etc/cassandra/cassandra-env.sh
> /etc/cassandra/cassandra-env.sh.bak
>   137  cim /etc/cassandra/cassandra-env.sh
>   138  vim /etc/cassandra/cassandra-env.sh
>   139  cat /etc/cassandra/cassandra.yaml /etc/cassandra/cassandra.yaml.bak
>   140  vim /etc/cassandra/cassandra.yaml
>   141  service cassandra start
>
> After both of these updates I saw my current counters increase without any
> reason.
>
> Did I do anything wrong ?
>
> Alain
>
>
>


Datatype Conversion in CQL-Client?

2012-11-15 Thread Timmy Turner
Does the official/built-in Cassandra CQL client (in 1.2) offer any built-in
option to get direct values/objects when reading a field, instead of just a
byte array?


Admin for cassandra?

2012-11-15 Thread Kevin Burton
Is there an IDE for a Cassandra database? Similar to the SQL Server
Management Studio for SQL server. I mainly want to execute queries and see
the results. Preferably that runs under a Windows OS.

 

Thank you.

 



cassandra-sharp 2.0-ALPHA available

2012-11-15 Thread Pierre Chalamet
Hi all,

 

I'm happy to announce a new release of cassandra-sharp (version 2.0-alpha),
a .NET client for Cassandra - all bits are available at
http://code.google.com/p/cassandra-sharp/.

 

This version is quite special because only the CQL Binary Protocol interface
is supported! Yes, Thrift is dead for real!

Obviously, Cassandra 1.2 is the only supported version at the moment.

 

All commands are supported except REGISTER/EVENT/Compression - these will come
in a future release. Some features from 1.0 are not supported (retry,
topology discovery) but they will come back for sure later.

This version also supports only basic POCO mapping - but since this is
extensible, you can bring your own if you want (contributions welcome, btw).

Documentation is totally out of sync, all my apologies - the samples in the
project should explain how the new interface works.

 

I'm confident there are some bugs hanging around, so feel free to file
issues/improvements at http://code.google.com/p/cassandra-sharp/issues/list.

 

Hope you will enjoy it!

- Pierre Chalamet

 



Question regarding the need to run nodetool repair

2012-11-15 Thread Dwight Smith
I have a 4-node cluster, version 1.1.2, replication factor of 4,
read/write consistency of 3, leveled compaction. Several questions.

 

1)  Should nodetool repair be run regularly to assure it has
completed before gc_grace?  If it is not run, what are the exposures?

2)  If a node goes down, and is brought back up prior to the 1 hour
hinted handoff expiration, should repair be run immediately?

3)  If the hinted handoff has expired, the plan is to remove the
node and start a fresh node in its place.  Does this approach cause
problems?

 

Thanks

 



Re: Question regarding the need to run nodetool repair

2012-11-15 Thread Edward Capriolo
On Thursday, November 15, 2012, Dwight Smith 
wrote:
> I have a 4 node cluster,  version 1.1.2, replication factor of 4,
read/write consistency of 3, level compaction. Several questions.
>
>
>
> 1)  Should nodetool repair be run regularly to assure it has
completed before gc_grace?  If it is not run, what are the exposures?

Yes. Lost tombstones could cause deleted data to reappear.
>
> 2)  If a node goes down, and is brought back up prior to the 1 hour
hinted handoff expiration, should repair be run immediately?

If the node is brought up within the 1 hour, you should let the hints replay.
Repair is always safe to run.
>
> 3)  If the hinted handoff has expired, the plan is to remove the node
and start a fresh node in its place.  Does this approach cause problems?
>
You only need to bootstrap a fresh node if the node was down longer than
gc_grace. The default is 10 days.
>
>
> Thanks
>
>

If you read and write at quorum and run repair regularly, you can worry less
about the things above because they are essentially non-factors.

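A minimal sketch of "run repair regularly": a weekly cron entry per node,
staggered across nodes so repairs do not overlap, using the -pr flag so each node
repairs only its primary range. The schedule, user, and log path are illustrative
only; keep the interval well inside gc_grace.

# /etc/cron.d/cassandra-repair (hypothetical)
0 2 * * 0  root  /usr/bin/nodetool repair -pr >> /var/log/cassandra/repair.log 2>&1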

RE: Question regarding the need to run nodetool repair

2012-11-15 Thread Dwight Smith
Thanks

 

From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Thursday, November 15, 2012 4:30 PM
To: user@cassandra.apache.org
Subject: Re: Question regarding the need to run nodetool repair

 



On Thursday, November 15, 2012, Dwight Smith
 wrote:
> I have a 4 node cluster,  version 1.1.2, replication factor of 4,
read/write consistency of 3, level compaction. Several questions.
>
>  
>
> 1)  Should nodetool repair be run regularly to assure it has
completed before gc_grace?  If it is not run, what are the exposures?

Yes. Lost tombstones could cause deleted data to re appear.
>
> 2)  If a node goes down, and is brought back up prior to the 1
hour hinted handoff expiration, should repair be run immediately?

If node is brought up prior to 1 hour. You should let the hints replay.
Repair is always safe to run.
>
> 3)  If the hinted handoff has expired, the plan is to remove the
node and start a fresh node in its place.  Does this approach cause
problems?
>
You only need to join a fresh mode if the node was down longer then gc
grace. Default is 10 days.
>  
>
> Thanks
>
>

If you read and write at quorum and run repair regularly you can worry
less about the things above because they are essentially non factors. 



Re: Question regarding the need to run nodetool repair

2012-11-15 Thread Rob Coli
On Thu, Nov 15, 2012 at 4:12 PM, Dwight Smith
 wrote:
> I have a 4 node cluster,  version 1.1.2, replication factor of 4, read/write
> consistency of 3, level compaction. Several questions.

Hinted Handoff is broken in your version [1] (and all versions between
1.0.0 and 1.0.3 [2]). Upgrade to 1.1.6 ASAP so that the answers below
actually apply, because working Hinted Handoff is involved.

> 1)  Should nodetool repair be run regularly to assure it has completed
> before gc_grace?  If it is not run, what are the exposures?

If you do DELETE logical operations, yes. If not, no. gc_grace_seconds
only applies to tombstones, and if you do not delete you have no
tombstones. If you only DELETE in one columnfamily, that is the only
one you have to repair within gc_grace.

Exposure is zombie data, where a node missed a DELETE (and associated
tombstone) but had a previous value for that column or row and this
zombie value is resurrected and propagated by read repair.

> 2)  If a node goes down, and is brought back up prior to the 1 hour
> hinted handoff expiration, should repair be run immediately?

In theory, if hinted handoff is working, no. This is a good thing
because otherwise simply restarting a node would trigger the need for
repair. In practice I would be shocked if anyone has scientifically
tested it to the degree required to be certain all edge cases are
covered, so I'm not sure I would rely on this being true. Especially
as key components of this guarantee such as Hinted Handoff can be
broken for 3-5 point releases before anyone notices.

It is because of this uncertainty that I recommend periodic repair
even in clusters that don't do DELETE.

> 3)  If the hinted handoff has expired, the plan is to remove the node
> and start a fresh node in its place.  Does this approach cause problems?

Yes.

1) You've lost any data that was only ever replicated to this node.
With RF>=3, this should be relatively rare, even with CL.ONE, because
writes are much more likely to succeed-but-report-they-failed than
vice versa. If you run periodic repair, you cover the case where
something gets under-replicated and then even less replicated as nodes
are replaced.
2) When you replace the node in its place (presumably using
replace_token) you will only stream the relevant data from a single
other replica. This means that, given 3 nodes A B C where datum X is
on A and B, and B fails, it might be bootstrapped using C as a source,
decreasing your replica count of X by 1.

In order to deal with these issues, you need to run a repair of the
affected node after bootstrapping/replace_tokening. Until this repair
completes, CL.ONE reads might be stale or missing. I think what
operators really want is a path by which they can bootstrap and then
repair, before returning the node to the cluster. Unfortunately there
are significant technical reasons which prevent this from being
trivial.

As such, I suggest increasing gc_grace_seconds and
max_hint_window_in_ms to reduce the amount of repair you need to run.
The negative to increasing gc_grace is that you store tombstones for
longer before purging them. The negative to increasing
max_hint_window_in_ms is that hints for a given token are stored in
one row, and very wide rows can exhibit pathological behavior.

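For reference, a sketch of where those two settings live; the values and the
column family name are illustrative, not recommendations:

# cassandra.yaml (per node): keep hints for 3 hours instead of the default 1 hour
max_hint_window_in_ms: 10800000

# cassandra-cli (per column family): raise gc_grace from the 10-day default to 20 days
UPDATE COLUMN FAMILY my_cf WITH gc_grace = 1728000;
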
Also if you set max_hint_window_in_ms too high, you could cause
cascading failure as nodes fill with hints, become less performant...
thereby increasing the cluster-wide hint rate. Unless you have a very
high write rate or really lazy ops people who leave nodes down for
very long times, the cascading failure case is relatively unlikely.

=Rob

[1] https://issues.apache.org/jira/browse/CASSANDRA-4772
[2] https://issues.apache.org/jira/browse/CASSANDRA-3466


-- 
=Robert Coli
AIM&GTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


RE: Admin for cassandra?

2012-11-15 Thread Wz1975
Cqlsh is probably the closest you will get. Or pay big bucks to hire someone to 
develop one for you:)


Thanks.
-Wei

Sent from my Samsung smartphone on AT&T

 Original message 
Subject: Admin for cassandra? 
From: Kevin Burton  
To: user@cassandra.apache.org 
CC:  

Is there an IDE for a Cassandra database? Similar to the SQL Server Management 
Studio for SQL server. I mainly want to execute queries and see the results. 
Preferably that runs under a Windows OS.

 

Thank you.

 

Re: Admin for cassandra?

2012-11-15 Thread Edward Capriolo
We should build an eclipse plugin named Eclipsandra or something.

On Thu, Nov 15, 2012 at 9:45 PM, Wz1975  wrote:
> Cqlsh is probably the closest you will get. Or pay big bucks to hire someone
> to develop one for you:)
>
>
> Thanks.
> -Wei
>
> Sent from my Samsung smartphone on AT&T
>
>
>  Original message 
> Subject: Admin for cassandra?
> From: Kevin Burton 
> To: user@cassandra.apache.org
> CC:
>
>
> Is there an IDE for a Cassandra database? Similar to the SQL Server
> Management Studio for SQL server. I mainly want to execute queries and see
> the results. Preferably that runs under a Windows OS.
>
>
>
> Thank you.
>
>


Re: Offsets and Range Queries

2012-11-15 Thread Ravikumar Govindarajan
Thanks Ed, for the clarifications

Yes, you are correct that the apps have to handle repeatable reads, and not
the databases themselves, when using absolute offsets, but SQL databases do
provide such an option, at the app's peril!

"Slices have a fixed size, this ensures that the the "query" does not
execute for arbitrary lengths of time."

I assume that is because the read-time iterators, which merge/reduce/collate
results one by one, are not well suited to jumping to arbitrary offsets,
given the practically huge number of columns involved. Did I understand it
correctly?

We are now faced with persisting both the first and last key of each page for
prev/next navigation. The problem gets complex quickly when we have to
support multiple pages per user. I just wanted to know if there are any
known work-arounds for this.

--
Ravi

On Thu, Nov 15, 2012 at 9:03 PM, Edward Capriolo wrote:

> There are several reasons. First there is no "absolute offset". The
> rows are sorted by the data. If someone inserts new data between your
> query and this query the rows have changed.
>
> Unless you doing select queries inside a transaction with repeatable
> read and your database supports this the query you mention does not
> really have "absolute offsets " either. The results of the query can
> change between reads.
>
> In cassandra we do not execute large queries (that might results to
> temp tables or whatever) and allow you to page them. Slices have a
> fixed size, this ensures that the the "query" does not execute for
> arbitrary lengths of time.
>
>
> On Thu, Nov 15, 2012 at 6:39 AM, Ravikumar Govindarajan
>  wrote:
> > Usually we do a SELECT * FROM  ORDER BY  LIMIT 26,25 for
> pagination
> > purpose, but specifying offset is not available for range queries in
> > cassandra.
> >
> > I always have to specify a start-key to achieve this. Are there reasons
> for
> > choosing such an approach rather than providing an absolute offset?
> >
> > --
> > Ravi
>


Re: Admin for cassandra?

2012-11-15 Thread Timmy Turner
I think an eclipse plugin would be the wrong way to go here. Most people
probably just want to browse through the columnfamilies and see whether
their queries work out or not. This functionality is imho best implemented
as some form of a light-weight editor, not a full blown IDE.

I do have something of this kind scheduled as a small part of a larger
project (seeing as there is currently no properly working tool that
provides this functionality), but concrete results are probably still a few
months out.


2012/11/16 Edward Capriolo 

> We should build an eclipse plugin named Eclipsandra or something.
>
> On Thu, Nov 15, 2012 at 9:45 PM, Wz1975  wrote:
> > Cqlsh is probably the closest you will get. Or pay big bucks to hire
> someone
> > to develop one for you:)
> >
> >
> > Thanks.
> > -Wei
> >
> > Sent from my Samsung smartphone on AT&T
> >
> >
> >  Original message 
> > Subject: Admin for cassandra?
> > From: Kevin Burton 
> > To: user@cassandra.apache.org
> > CC:
> >
> >
> > Is there an IDE for a Cassandra database? Similar to the SQL Server
> > Management Studio for SQL server. I mainly want to execute queries and
> see
> > the results. Preferably that runs under a Windows OS.
> >
> >
> >
> > Thank you.
> >
> >
>