Recommended way of data migration

2013-09-07 Thread Renat Gilfanov
Hello,

Let's say we have a simple CQL3 table:

CREATE TABLE example (
    id UUID PRIMARY KEY,
    timestamp TIMESTAMP,
    data ASCII
);

And I need to mutate (for example, encrypt) the values in the "data" column
for all rows.

What's the recommended approach to perform such a migration programmatically?

To me, the general approach is:

1. Create another column family.
2. Extract a batch of records.
3. For each extracted record, perform the mutation, insert it into the new CF,
and delete it from the old one.
4. Repeat until the source CF is empty.

Is this the correct approach, and if so, how do I implement some kind of paging
for step 2?
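
For step 2, the only way I see to pull a batch in CQL3 is something like
(just a sketch; the batch size of 1000 is arbitrary):

SELECT id, timestamp, data FROM example LIMIT 1000;

but it's not obvious to me how to then ask for the next batch.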


Re: Recommended way of data migration

2013-09-07 Thread Edward Capriolo
I would do something like you are suggesting, but I would not do the deletes
until all the rows are moved. Since writes in Cassandra are idempotent, you
can even run the migration process multiple times without harm.
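
For the paging in step 2, the usual CQL3 trick is to walk the table in token
order, since rows come back sorted by token(id). Below is a rough sketch using
the DataStax Java driver; the keyspace name, the example_v2 target table, and
the encrypt() helper are placeholders, not anything standard:

import java.util.List;
import java.util.UUID;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class DataMigration {

    // Placeholder: swap in the real encryption routine.
    static String encrypt(String plaintext) {
        return plaintext;
    }

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        PreparedStatement nextPage = session.prepare(
            "SELECT id, timestamp, data FROM example " +
            "WHERE token(id) > token(?) LIMIT 1000");
        PreparedStatement insert = session.prepare(
            "INSERT INTO example_v2 (id, timestamp, data) VALUES (?, ?, ?)");

        // First page: rows come back ordered by token(id).
        List<Row> rows = session.execute(
            "SELECT id, timestamp, data FROM example LIMIT 1000").all();

        while (!rows.isEmpty()) {
            UUID lastId = null;
            for (Row row : rows) {
                lastId = row.getUUID("id");
                // INSERT is an upsert, so replaying a page just rewrites the
                // same values; that is why re-running the job is harmless.
                session.execute(insert.bind(lastId,
                        row.getDate("timestamp"),
                        encrypt(row.getString("data"))));
            }
            // Next page: everything whose token follows the last row seen.
            rows = session.execute(nextPage.bind(lastId)).all();
        }

        cluster.shutdown();
    }
}

Once every row is present in the new table, you can truncate or drop the old
one rather than issuing per-row deletes.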


On Sat, Sep 7, 2013 at 5:31 PM, Renat Gilfanov  wrote:

> Hello,
>
> [...]


[ANN] Cassaforte 1.2.0 is released

2013-09-07 Thread Oleksandr Petrov
Cassaforte [1] is a Clojure client for Apache Cassandra 1.2+. It is built
around CQL 3
and focuses on ease of use. You will likely find that using Cassandra from
Clojure has
never been so easy.

1.2.0 is a minor release that introduces one minor feature, fixes a couple
of bugs, and
makes Cassaforte compatible with Cassandra 2.0.

Release notes:
http://blog.clojurewerkz.org/blog/2013/09/07/cassaforte-1-dot-2-0-is-released/

1. http://clojurecassandra.info/ 

--
Alex P

https://github.com/ifesdjeen
https://twitter.com/ifesdjeen


Re: row cache

2013-09-07 Thread Edward Capriolo
I have found the row cache to be more trouble than benefit.

The term "fool's gold" comes to mind.

Using the key cache and leaving more main memory free seems stable and does
not have as many complications.
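
If you go that route, it is mostly configuration. A sketch against the
1.2-era settings (the table name and values here are illustrative; check your
version's cassandra.yaml for the exact knobs):

# cassandra.yaml: leave the global row cache off, give the key cache room
row_cache_size_in_mb: 0
key_cache_size_in_mb: 100

-- per table, in CQL3: cache keys but not rows
ALTER TABLE mytable WITH caching = 'keys_only';
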
On Wednesday, September 4, 2013, S C  wrote:
> Thank you all for your valuable comments and information.
>
> -SC
>
>
>> Date: Tue, 3 Sep 2013 12:01:59 -0400
>> From: chris.burrou...@gmail.com
>> To: user@cassandra.apache.org
>> CC: fsareshw...@quantcast.com
>> Subject: Re: row cache
>>
>> On 09/01/2013 03:06 PM, Faraaz Sareshwala wrote:
>> > Yes, that is correct.
>> >
>> > The SerializingCacheProvider stores row cache contents off heap. I believe
>> > you need JNA enabled for this though. Someone please correct me if I am
>> > wrong here.
>> >
>> > The ConcurrentLinkedHashCacheProvider stores row cache contents on the
>> > Java heap itself.
>> >
>>
>> Naming things is hard. Both caches are in memory and are backed by a
>> ConcurrentLinkedHashMap. In the case of the SerializingCacheProvider,
>> the *values* are stored in off-heap buffers. Both must store a half
>> dozen or so objects (on heap) per entry
>> (org.apache.cassandra.cache.RowCacheKey,
>> com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue,
>> java.util.concurrent.ConcurrentHashMap$HashEntry, etc.). It would
>> probably be better to call this a "mixed-heap" rather than an off-heap
>> cache. You may find the number of entries you can hold without gc
>> problems to be surprisingly low (relative to, say, memcached or physical
>> memory on modern hardware).
>>
>> Invalidating a column with SerializingCacheProvider invalidates the
>> entire row, while with ConcurrentLinkedHashCacheProvider it does not.
>> SerializingCacheProvider does not require JNA.
>>
>> Both also use memory estimation of the size (of the values only) to
>> determine the total number of entries retained. Estimating the size of
>> the totally on-heap ConcurrentLinkedHashCacheProvider has historically
>> been dicey since we switched from sizing in entries, and it has been
>> removed in 2.0.0.
>>
>> As said elsewhere in this thread, the utility of the row cache varies
>> from "absolutely essential" to "source of numerous problems" depending
>> on the specifics of the data model and request distribution.
>>
>>
>


w00tw00t.at.ISC.SANS.DFind not found

2013-09-07 Thread Tim Dunphy
Hey all,

I'm seeing this exception in my Cassandra logs:

Exception during http request
mx4j.tools.adaptor.http.HttpException: file
mx4j/tools/adaptor/http/xsl/w00tw00t.at.ISC.SANS.DFind:) not found
    at mx4j.tools.adaptor.http.XSLTProcessor.notFoundElement(XSLTProcessor.java:314)
    at mx4j.tools.adaptor.http.HttpAdaptor.findUnknownElement(HttpAdaptor.java:800)
    at mx4j.tools.adaptor.http.HttpAdaptor$HttpClient.run(HttpAdaptor.java:976)

Do I need to be concerned about the security of this server? How can I
correct/eliminate this error message? I've just upgraded to Cassandra 2.0,
and this is the first time I've seen this error.

Thanks!
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: row cache

2013-09-07 Thread Mohit Anchlia
I agree. We've had a similar experience.

Sent from my iPhone

On Sep 7, 2013, at 6:05 PM, Edward Capriolo  wrote:

> I have found the row cache to be more trouble than benefit.
> 
> The term "fool's gold" comes to mind.
> 
> Using the key cache and leaving more main memory free seems stable and does
> not have as many complications.
>
> [...]