Hadoop + Cassandra

2012-01-06 Thread Alain RODRIGUEZ
Hello.

I have a 4 nodes cluster running Cassandra (without Datastax Brisk) in
production.

Now I want to add hadoop (and maybe Pig / Hive ?) to be able to perform
some analytics.
I don't know how to get started. Is there a tutorial explaining how to
install, configure and use Hadoop, and the advantages of running it as a
Cassandra overlay versus on separate nodes
(http://www.slideshare.net/jeromatron/cassandrahadoop-4399672)?

I am already able to read a lot of statistics in real time thanks to
Cassandra alone and to the way I model my CFs, but I also have a lot of raw
data I would like to use in order to get more statistics.

I'll be glad to learn about any interesting things you learnt with your own
experiences with hadoop + Cassandra.

Thanks in advance.


Re: Composite column docs

2012-01-06 Thread Shimi Kiviti
On Thu, Jan 5, 2012 at 9:13 PM, aaron morton wrote:

> What client are you using ?
>
I am writing a client.


> For example pycassa has some sweet documentation
> http://pycassa.github.com/pycassa/assorted/composite_types.html
>
It is sweet documentation, but it doesn't help me. I need lower-level
documentation.


> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/01/2012, at 12:48 AM, Shimi Kiviti wrote:
>
> Is there a doc for using composite columns with thrift?
> Is
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java
>  the
> only doc?
> does the client need to add the length to the get \ get_slice... queries
> or is it taken care of on the server side?
>
> Shimi
>
>
>
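Reading CompositeType.java, the wire format appears to be: for each component, a two-byte big-endian length, the component's raw bytes, and a one-byte end-of-component flag (0 for an exact match; -1/1 to widen slice bounds). The client builds these bytes itself; the server just compares them. A minimal sketch (the function name is mine, not part of any client API):

```python
import struct

def encode_composite(*components, eoc=0):
    """Serialize components into the CompositeType wire format:
    per component, a 2-byte big-endian length, the bytes, and a
    1-byte end-of-component flag (0 = exact; -1/1 widen slice bounds)."""
    out = bytearray()
    for i, comp in enumerate(components):
        if isinstance(comp, str):
            comp = comp.encode("utf-8")
        out += struct.pack(">H", len(comp))  # 2-byte length prefix
        out += comp                           # component bytes
        # only the last component's EOC matters for slice queries
        flag = eoc if i == len(components) - 1 else 0
        out += struct.pack(">b", flag)
    return bytes(out)
```

So for get/get_slice over Thrift, the column names you send are exactly these encoded byte strings.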


Re: Hadoop + Cassandra

2012-01-06 Thread Jeremy Hanna
I would first look at http://wiki.apache.org/cassandra/HadoopSupport - you'll
want to look at the section on cluster configuration.  DataStax also has a
product that makes it pretty simple to use Hadoop with Cassandra if you don't
mind paying for it - http://www.datastax.com/products/enterprise  Where I work,
we've been using Hadoop with Cassandra for almost a year now, but we're looking
into using DataStax Enterprise right now.

On Jan 6, 2012, at 4:35 AM, Alain RODRIGUEZ wrote:

> Hello.
> 
> I have a 4 nodes cluster running Cassandra (without Datastax Brisk) in 
> production.
> 
> Now I want to add hadoop (and maybe Pig / Hive ?) to be able to perform some 
> analytics.
> I don't know how to get started ? Is there a tutorial explaining how to 
> install, configure and use hadoop andadvantages using it as a cassandra 
> overlay or on separated nodes 
> http://www.slideshare.net/jeromatron/cassandrahadoop-4399672 ?
> 
> I am already able to read a lot of statistics in real time thanks to 
> Cassandra only and to the way I model my CFs but I also have a lot of raw 
> data I would like to use them in order to get more statistics.
> 
> I'll be glad to learn about any interesting things you learnt with your own 
> experiences with hadoop + Cassandra.
> 
> Thanks in advance.



Re: Should I throttle deletes?

2012-01-06 Thread Vitalii Tymchyshyn

05.01.12 22:29, Philippe wrote:


Then I do have a question, what do people generally use as the
batch size?

I used to do batches from 500 to 2000, like you do.
After investigating issues such as the one you've encountered, I've
moved to batches of 20 for writes and 256 for reads. Everything is a
lot smoother: no more timeouts.


I'd rather reduce the mutation thread pool with the concurrent_writes setting.
This lowers server load no matter how many clients are sending batches,
while you still get good batching.


Best regards, Vitalii Tymchyshyn
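The client-side batching discussed above can be sketched as splitting a large mutation list into small fixed-size batches, so each call stays well under the server's RPC timeout. The names here are hypothetical; `send` stands in for a real batch_mutate call:

```python
def chunked(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def send_in_batches(mutations, send, write_batch_size=20):
    """Send mutations in small batches (e.g. 20 for writes, as above)
    so each call completes quickly instead of one huge batch timing out."""
    for batch in chunked(mutations, write_batch_size):
        send(batch)  # e.g. client.batch_mutate(...) in a real client
```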


Re: is it bad to have lots of column families?

2012-01-06 Thread Vitalii Tymchyshyn
Yes, as far as I know. Note that it's not a full index, but a "sampling"
one; see the index_interval configuration parameter and its description.
As for bloom filters, they are not configurable now, but there is a ticket
with a patch that should make them configurable.


05.01.12 22:45, Carlo Pires wrote:

Must the index for CFs fit in the node's memory?

2012/1/5 Віталій Тимчишин <tiv...@gmail.com>:



2012/1/5 Michael Cetrulo <mail2sa...@gmail.com>:

in a traditional database it's not a good idea to have
hundreds of tables, but is it also bad to have hundreds of
column families in cassandra? thank you.


As far as I can see, this may raise memory requirements for you,
since you need to have an index/bloom filter for each column family
in memory.


--
Best regards,
 Vitalii Tymchyshyn
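A rough back-of-envelope for the sampled index mentioned above: Cassandra keeps about one key sample in memory per index_interval rows (default 128), per column family. The helper and the numbers below are illustrative only:

```python
def index_sample_entries(row_count, index_interval=128):
    """Approximate in-memory index samples for one CF: roughly one
    sampled key per index_interval rows (ceiling division)."""
    return -(-row_count // index_interval)

# Illustrative: 100M rows at the default interval is ~781k sampled
# entries per CF; with hundreds of CFs this memory adds up.
```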


Re: Should I throttle deletes?

2012-01-06 Thread Philippe
But you will then get timeouts.
On 6 Jan 2012 15:17, "Vitalii Tymchyshyn" wrote:

> 05.01.12 22:29, Philippe wrote:
>
>  Then I do have a question, what do people generally use as the batch
>> size?
>>
>  I used to do batches from 500 to 2000 like you do.
> After investigating issues such as the one you've encountered I've moved
> to batches of 20 for writes and 256 for reads. Everything is a lot smoother
> : no more timeouts.
>
>  I'd better reduce mutation thread pool with concurrent_writes setting.
> This will lower server load no matter, how many clients are sending
> batches, at the same time you still have good batching.
>
> Best regards, Vitalii Tymchyshyn
>


Re: Should I throttle deletes?

2012-01-06 Thread Vitalii Tymchyshyn
Do you mean on writes? Yes, your timeout must be set so that your write
batch can complete before the timeout elapses. But this will lower the
write load, so reads should not time out.


Best regards, Vitalii Tymchyshyn

06.01.12 17:37, Philippe wrote:


But you will then get timeouts.

On 6 Jan 2012 15:17, "Vitalii Tymchyshyn" wrote:


05.01.12 22:29, Philippe wrote:


Then I do have a question, what do people generally use as
the batch size?

I used to do batches from 500 to 2000 like you do.
After investigating issues such as the one you've encountered
I've moved to batches of 20 for writes and 256 for reads.
Everything is a lot smoother : no more timeouts.


I'd better reduce mutation thread pool with concurrent_writes
setting. This will lower server load no matter, how many clients
are sending batches, at the same time you still have good batching.

Best regards, Vitalii Tymchyshyn





Re: Cannot start cassandra node anymore

2012-01-06 Thread Peter Schuller
(too much to quote)

Looks to me like this should be fixable by removing hints from the
node in question (I don't remember whether this is a bug that's been
identified and fixed or not).

I may be wrong because I'm just basing this on a quick look at the
stack trace, but it seems to me there are hints on the node, for other
nodes, that contain writes to a deleted column family.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Dynamic columns in a column family?

2012-01-06 Thread Frank Yang
Hi everyone,

I am wondering whether it is possible not to define the column
metadata when creating a column family, but to specify the columns when
the client updates data, for example:

CREATE COLUMN FAMILY products WITH default_validation_class= UTF8Type
AND key_validation_class=UTF8Type AND comparator=UTF8Type;
set products['1001']['brand'] = 'Sony';

In other words, we don't want to fix the column definitions when
creating a column family, as we might have to insert new columns into
the column family later.  Is it possible to achieve this?

Thanks,
Fan


How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Drew Kutcharian
Hi Everyone,

What's the best way to reliably get unique-constraint-like functionality with
Cassandra? I have the following (which I think should be a very common) use case.

User CF
Row Key: user email
Columns: userId: UUID, etc...

UserAttribute1 CF:
Row Key: userId (which is the uuid that's mapped to user email)
Columns: ...

UserAttribute2 CF:
Row Key: userId (which is the uuid that's mapped to user email)
Columns: ...

The issue is we need to guarantee that no two people register with the same 
email address. In addition, without locking, potentially a malicious user can 
"hijack" another user's account by registering using the user's email address.

I know that this can be done using a lock manager such as ZooKeeper or 
HazelCast, but the issue with using either of them is that if ZooKeeper or 
HazelCast is down, then you can't be sure about the reliability of the lock. So 
this potentially, in the very rare instance where the lock manager is down and 
two users are registering with the same email, can cause major issues.

In addition, I know this can be done with other tools such as Redis (use Redis 
for this use case, and Cassandra for everything else), but I'm interested in 
hearing if anyone has solved this issue using Cassandra only.

Thanks,

Drew

Re: Dynamic columns in a column family?

2012-01-06 Thread Jonathan Ellis
Yes.

(Please keep these to the user list rather than dev.)

On Fri, Jan 6, 2012 at 11:59 AM, Frank Yang  wrote:
> Hi everyone,
>
> I am wondering whether it is possible to not to define the column
> metadata when creating a column family, but to specify the column when
> client updates data, for example:
>
> CREATE COLUMN FAMILY products WITH default_validation_class= UTF8Type
> AND key_validation_class=UTF8Type AND comparator=UTF8Type;
> set products['1001']['brand']= ‘Sony’;
>
> In other words, we don't want to fix the columns definition when
> creating a column family, as we might have to insert new columns into
> the column family.  Is it possible to achieve it?
>
> Thanks,
> Fan



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Pending on ReadStage

2012-01-06 Thread Daning Wang
Hi all,

We have a 5-node cluster (0.8.6), but the performance of one node is way
behind the others. I checked tpstats, and it always shows a non-zero pending
ReadStage; I don't see this problem on the other nodes.

What causes the problem? I/O? Memory? CPU usage is still low. How can I fix
it?

~/bin/nodetool -h localhost tpstats
Pool Name                Active   Pending   Completed   Blocked   All time blocked
ReadStage                    11        15       56960         0                  0
RequestResponseStage          0         0      606695         0                  0
MutationStage                 0         0      538634         0                  0
ReadRepairStage               0         0          17         0                  0
ReplicateOnWriteStage         0         0           0         0                  0
GossipStage                   0         0        5734         0                  0
AntiEntropyStage              0         0           0         0                  0
MigrationStage                0         0           0         0                  0
MemtablePostFlusher           0         0           7         0                  0
StreamStage                   0         0           0         0                  0
FlushWriter                   0         0           8         0                  0
MiscStage                     0         0           0         0                  0
FlushSorter                   0         0           0         0                  0
InternalResponseStage         0         0           0         0                  0
HintedHandoff                 1         4           0         0                  0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ  9082
MUTATION 0
REQUEST_RESPONSE 0

Thank you in advance.

Daning
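To spot which pools are backed up without eyeballing the whole table, the tpstats output can be parsed mechanically. A small sketch (it assumes the 0.8-era column layout of Name/Active/Pending/Completed/Blocked/All-time-blocked):

```python
def pending_by_pool(tpstats_text):
    """Parse `nodetool tpstats` output and return {pool: pending}
    for every thread pool with a non-zero Pending count."""
    pending = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        # pool rows look like: Name Active Pending Completed Blocked Alltime
        if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
            if int(parts[2]) > 0:
                pending[parts[0]] = int(parts[2])
    return pending
```

Running it on the output above would flag ReadStage (15 pending) and HintedHandoff (4 pending) as the pools to investigate.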


Re: Pending on ReadStage

2012-01-06 Thread Mohit Anchlia
Are all your nodes equally balanced in terms of read requests? Are you
using RandomPartitioner? Are you reading using indexes?

First thing you can do is compare iostat -x output between the 2 nodes
to rule out any io issues assuming your read requests are equally
balanced.

On Fri, Jan 6, 2012 at 10:11 AM, Daning Wang  wrote:
> Hi all,
>
> We have 5 nodes cluster(0.8.6), but the performance from one node is way
> behind others, I checked tpstats, It always show non-zero pending ReadStage,
> I don't see this problem on other nodes.
>
> What caused the problem? I/O? Memory? Cpu usage is still low. How to fix
> this problem?
>
> ~/bin/nodetool -h localhost tpstats
> Pool Name                Active   Pending   Completed   Blocked   All time blocked
> ReadStage                    11        15       56960         0                  0
> RequestResponseStage          0         0      606695         0                  0
> MutationStage                 0         0      538634         0                  0
> ReadRepairStage               0         0          17         0                  0
> ReplicateOnWriteStage         0         0           0         0                  0
> GossipStage                   0         0        5734         0                  0
> AntiEntropyStage              0         0           0         0                  0
> MigrationStage                0         0           0         0                  0
> MemtablePostFlusher           0         0           7         0                  0
> StreamStage                   0         0           0         0                  0
> FlushWriter                   0         0           8         0                  0
> MiscStage                     0         0           0         0                  0
> FlushSorter                   0         0           0         0                  0
> InternalResponseStage         0         0           0         0                  0
> HintedHandoff                 1         4           0         0                  0
>
> Message type   Dropped
> RANGE_SLICE  0
> READ_REPAIR  0
> BINARY   0
> READ  9082
> MUTATION 0
> REQUEST_RESPONSE 0
>
> Thanks you in advance.
>
> Daning
>


Is this correct way to create a Composite Type

2012-01-06 Thread investtr

I am trying to understand the composite type.
Is this the right way to create composite data?

Follower_For_Users
  {"userID", n}: "followerID"

For simplicity I have replaced userID by followerID.

regards,
Ramesh


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Mohit Anchlia
On Fri, Jan 6, 2012 at 10:03 AM, Drew Kutcharian  wrote:
> Hi Everyone,
>
> What's the best way to reliably have unique constraints like functionality 
> with Cassandra? I have the following (which I think should be very common) 
> use case.
>
> User CF
> Row Key: user email
> Columns: userId: UUID, etc...
>
> UserAttribute1 CF:
> Row Key: userId (which is the uuid that's mapped to user email)
> Columns: ...
>
> UserAttribute2 CF:
> Row Key: userId (which is the uuid that's mapped to user email)
> Columns: ...
>
> The issue is we need to guarantee that no two people register with the same 
> email address. In addition, without locking, potentially a malicious user can 
> "hijack" another user's account by registering using the user's email address.

It could be as simple as reading before writing to make sure the
email doesn't exist. But I think you are asking how to handle 2
concurrent requests for the same email? The only way I can think of is:

1) Create new CF say tracker
2) write email and time uuid to CF tracker
3) read from CF tracker
4) if you find a row other than yours then wait and read again from
tracker after few ms
5) read from USER CF
6) write if no rows in USER CF
7) delete from tracker

Please note you might have to modify this logic a little, but it
should give you some idea of how to approach this problem without
locking.

Regarding hijacking accounts, can you elaborate a little more?
>
> I know that this can be done using a lock manager such as ZooKeeper or 
> HazelCast, but the issue with using either of them is that if ZooKeeper or 
> HazelCast is down, then you can't be sure about the reliability of the lock. 
> So this potentially, in the very rare instance where the lock manager is down 
> and two users are registering with the same email, can cause major issues.
>
> In addition, I know this can be done with other tools such as Redis (use 
> Redis for this use case, and Cassandra for everything else), but I'm 
> interested in hearing if anyone has solved this issue using Cassandra only.
>
> Thanks,
>
> Drew
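The 7-step tracker idea above can be sketched with in-memory dicts standing in for the two column families. This is only a simulation of the control flow; a real implementation would use QUORUM reads/writes and a TimeUUID column in the tracker CF, and the function/variable names here are mine:

```python
import uuid

tracker = {}   # stands in for the "tracker" CF: email -> {claim_uuid: timestamp}
users = {}     # stands in for the User CF: email -> user record

def try_register(email, user_record):
    """Claim the email in the tracker, then write the user row only if
    ours is the earliest claim and no row exists yet (steps 1-7 above)."""
    claim = uuid.uuid1()                       # time-based UUID orders claims
    tracker.setdefault(email, {})[claim] = claim.time
    # re-read the tracker: the earliest claim wins the race
    earliest = min(tracker[email], key=lambda c: (c.time, c.bytes))
    won = earliest == claim and email not in users
    if won:
        users[email] = user_record
    del tracker[email][claim]                  # step 7: clean up our claim
    return won
```

As Bryce notes later in the thread, getting this truly correct under failures is subtle, so treat this as an illustration of the approach rather than a proven protocol.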


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Drew Kutcharian
Yes, my issue is with handling concurrent requests. I'm not sure how your logic 
will work with eventual consistency. I'm going to have the same issue in the 
"tracker" CF too, no?


On Jan 6, 2012, at 10:38 AM, Mohit Anchlia wrote:

> On Fri, Jan 6, 2012 at 10:03 AM, Drew Kutcharian  wrote:
>> Hi Everyone,
>> 
>> What's the best way to reliably have unique constraints like functionality 
>> with Cassandra? I have the following (which I think should be very common) 
>> use case.
>> 
>> User CF
>> Row Key: user email
>> Columns: userId: UUID, etc...
>> 
>> UserAttribute1 CF:
>> Row Key: userId (which is the uuid that's mapped to user email)
>> Columns: ...
>> 
>> UserAttribute2 CF:
>> Row Key: userId (which is the uuid that's mapped to user email)
>> Columns: ...
>> 
>> The issue is we need to guarantee that no two people register with the same 
>> email address. In addition, without locking, potentially a malicious user 
>> can "hijack" another user's account by registering using the user's email 
>> address.
> 
> It could be as simple as reading before writing to make sure that
> email doesn't exist. But I think you are looking at how to handle 2
> concurrent requests for same email? Only way I can think of is:
> 
> 1) Create new CF say tracker
> 2) write email and time uuid to CF tracker
> 3) read from CF tracker
> 4) if you find a row other than yours then wait and read again from
> tracker after few ms
> 5) read from USER CF
> 6) write if no rows in USER CF
> 7) delete from tracker
> 
> Please note you might have to modify this logic a little bit, but this
> should give you some ideas of how to approach this problem without
> locking.
> 
> Regarding hijacking accounts, can you elaborate little more?
>> 
>> I know that this can be done using a lock manager such as ZooKeeper or 
>> HazelCast, but the issue with using either of them is that if ZooKeeper or 
>> HazelCast is down, then you can't be sure about the reliability of the lock. 
>> So this potentially, in the very rare instance where the lock manager is 
>> down and two users are registering with the same email, can cause major 
>> issues.
>> 
>> In addition, I know this can be done with other tools such as Redis (use 
>> Redis for this use case, and Cassandra for everything else), but I'm 
>> interested in hearing if anyone has solved this issue using Cassandra only.
>> 
>> Thanks,
>> 
>> Drew



Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Mohit Anchlia
I don't think you will, if you read and write with QUORUM.

On Fri, Jan 6, 2012 at 11:01 AM, Drew Kutcharian  wrote:
> Yes, my issue is with handling concurrent requests. I'm not sure how your 
> logic will work with eventual consistency. I'm going to have the same issue 
> in the "tracker" CF too, no?
>
>
> On Jan 6, 2012, at 10:38 AM, Mohit Anchlia wrote:
>
>> On Fri, Jan 6, 2012 at 10:03 AM, Drew Kutcharian  wrote:
>>> Hi Everyone,
>>>
>>> What's the best way to reliably have unique constraints like functionality 
>>> with Cassandra? I have the following (which I think should be very common) 
>>> use case.
>>>
>>> User CF
>>> Row Key: user email
>>> Columns: userId: UUID, etc...
>>>
>>> UserAttribute1 CF:
>>> Row Key: userId (which is the uuid that's mapped to user email)
>>> Columns: ...
>>>
>>> UserAttribute2 CF:
>>> Row Key: userId (which is the uuid that's mapped to user email)
>>> Columns: ...
>>>
>>> The issue is we need to guarantee that no two people register with the same 
>>> email address. In addition, without locking, potentially a malicious user 
>>> can "hijack" another user's account by registering using the user's email 
>>> address.
>>
>> It could be as simple as reading before writing to make sure that
>> email doesn't exist. But I think you are looking at how to handle 2
>> concurrent requests for same email? Only way I can think of is:
>>
>> 1) Create new CF say tracker
>> 2) write email and time uuid to CF tracker
>> 3) read from CF tracker
>> 4) if you find a row other than yours then wait and read again from
>> tracker after few ms
>> 5) read from USER CF
>> 6) write if no rows in USER CF
>> 7) delete from tracker
>>
>> Please note you might have to modify this logic a little bit, but this
>> should give you some ideas of how to approach this problem without
>> locking.
>>
>> Regarding hijacking accounts, can you elaborate little more?
>>>
>>> I know that this can be done using a lock manager such as ZooKeeper or 
>>> HazelCast, but the issue with using either of them is that if ZooKeeper or 
>>> HazelCast is down, then you can't be sure about the reliability of the 
>>> lock. So this potentially, in the very rare instance where the lock manager 
>>> is down and two users are registering with the same email, can cause major 
>>> issues.
>>>
>>> In addition, I know this can be done with other tools such as Redis (use 
>>> Redis for this use case, and Cassandra for everything else), but I'm 
>>> interested in hearing if anyone has solved this issue using Cassandra only.
>>>
>>> Thanks,
>>>
>>> Drew
>


Re: Pending on ReadStage

2012-01-06 Thread Daning Wang
Thanks for your reply.

Nodes are equally balanced, and it is RandomPartitioner. I think that
machine is slower. Are you saying it is an I/O issue?

Daning

On Fri, Jan 6, 2012 at 10:25 AM, Mohit Anchlia wrote:

> Are all your nodes equally balanced in terms of read requests? Are you
> using RandomPartitioner? Are you reading using indexes?
>
> First thing you can do is compare iostat -x output between the 2 nodes
> to rule out any io issues assuming your read requests are equally
> balanced.
>
> On Fri, Jan 6, 2012 at 10:11 AM, Daning Wang  wrote:
> > Hi all,
> >
> > We have 5 nodes cluster(0.8.6), but the performance from one node is way
> > behind others, I checked tpstats, It always show non-zero pending
> ReadStage,
> > I don't see this problem on other nodes.
> >
> > What caused the problem? I/O? Memory? Cpu usage is still low. How to fix
> > this problem?
> >
> > ~/bin/nodetool -h localhost tpstats
> > Pool Name                Active   Pending   Completed   Blocked   All time blocked
> > ReadStage                    11        15       56960         0                  0
> > RequestResponseStage          0         0      606695         0                  0
> > MutationStage                 0         0      538634         0                  0
> > ReadRepairStage               0         0          17         0                  0
> > ReplicateOnWriteStage         0         0           0         0                  0
> > GossipStage                   0         0        5734         0                  0
> > AntiEntropyStage              0         0           0         0                  0
> > MigrationStage                0         0           0         0                  0
> > MemtablePostFlusher           0         0           7         0                  0
> > StreamStage                   0         0           0         0                  0
> > FlushWriter                   0         0           8         0                  0
> > MiscStage                     0         0           0         0                  0
> > FlushSorter                   0         0           0         0                  0
> > InternalResponseStage         0         0           0         0                  0
> > HintedHandoff                 1         4           0         0                  0
> >
> > Message type   Dropped
> > RANGE_SLICE  0
> > READ_REPAIR  0
> > BINARY   0
> > READ  9082
> > MUTATION 0
> > REQUEST_RESPONSE 0
> >
> > Thanks you in advance.
> >
> > Daning
> >
>


Beginner Question - Super Column Family Alternative

2012-01-06 Thread investtr

Please help me understand this.
I am not sure if this is the right place to ask such question.

I read that it is not safe to use Super Column Families.
And the alternative, I found, is to use composite column names.

Many managers will have many employees:

Manager_Employee (SCF)
  managerID
    employeeID

So with composite columns, should it look like this?

Manager_Employee
  managerID:employeeID

And is this the right approach to deal with not using SCFs? Also, won't it
create a bigger row size, in case I have many thousands of messages
associated with each user and millions of users?



regards,
Ramesh


Re: Beginner Question - Super Column Family Alternative

2012-01-06 Thread investtr

On 01/06/2012 01:48 PM, investtr wrote:

Please help me understand this.
I am not sure if this is the right place to ask such question.

I read that it is not safe to use Super Column Families.
And the alternative, I found, is to use composite column names.

Many managers will have many employees:

Manager_Employee (SCF)
  managerID
    employeeID

So with composite columns, should it look like this?

Manager_Employee
  managerID:employeeID

And is this the right approach to deal with not using SCFs? Also, won't it
create a bigger row size, in case I have many thousands of messages
associated with each user and millions of users?



regards,
Ramesh

I found the answer to my question in this example.

HotelByCity (CF) Key: city:state {
  key: Phoenix:AZ {AZC_053: -, AZC_011: -}
  key: San Francisco:CA {CAS_021: -}
  key: New York:NY {NYN_042: -}
}
and this slide
http://www.slideshare.net/edanuff/indexing-in-cassandra
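The city:state row keys in that example can be assembled and split with plain string composites. (Real CompositeType keys are length-prefixed bytes rather than delimited strings; these helpers are just an illustration of the modeling idea.)

```python
def make_key(*parts, sep=":"):
    """Join components into a composite row key like 'Phoenix:AZ'."""
    return sep.join(parts)

def split_key(key, sep=":"):
    """Recover the components from a composite row key."""
    return key.split(sep)
```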

Thanks
Ramesh


OutOfMemory Errors with Cassandra 1.0.5

2012-01-06 Thread Caleb Rackliffe
Hi Everybody,

I have a 10-node cluster running 1.0.5.  The hardware/configuration for each 
box looks like this:

Hardware: 4 GB RAM, 400 GB SATAII HD for commitlog, 50 GB SATAIII SSD for data 
directory, 1 GB SSD swap partition
OS: CentOS 6, vm.swappiness = 0
Cassandra: disk access mode = standard, max memtable size = 128 MB, max new 
heap = 800 MB, max heap = 2 GB, stack size = 128k

I explicitly didn't put JNA on the classpath because I had a hard time figuring 
out how much native memory it would actually need.

After a node runs for a couple of days, my swap partition is almost completely 
full, and even though the resident size of my Java process is right under 3 GB, 
I get this sequence in the logs, with death coming on a failure to allocate 
another thread…

 WARN [pool-1-thread-1] 2012-01-05 09:06:38,078 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 65.58206914005034
 WARN [pool-1-thread-1] 2012-01-05 09:08:14,405 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 1379.0945945945946
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,593 GCInspector.java (line 146) 
Heap is 0.7523060581548427 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,611 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [pool-1-thread-1] 2012-01-05 13:45:29,934 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.004297106677189052
 WARN [pool-1-thread-1] 2012-01-06 02:23:18,175 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.0018187309961539236
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,202 GCInspector.java (line 146) 
Heap is 0.7635993298476305 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,203 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,588 GCInspector.java (line 146) 
Heap is 0.7617639564886326 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,612 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
ERROR [CompactionExecutor:6880] 2012-01-06 19:45:49,336 
AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
Thread[CompactionExecutor:6880,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1325)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(ParallelCompactionIterable.java:190)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced(ParallelCompactionIterable.java:164)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced(ParallelCompactionIterable.java:144)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:116)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext(ParallelCompactionIterable.java:103)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext(ParallelCompactionIterable.java:90)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.has
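Note that "unable to create new native thread" is native-memory exhaustion, not Java-heap exhaustion: each new thread needs its own stack outside the heap. A rough budget with the numbers from this post (4 GB RAM, 2 GB heap, 128 KB stacks); the 1 GB "overhead" figure is purely an assumed placeholder for JVM internals, mmapped files, and other processes:

```python
def max_extra_threads(ram_mb, heap_mb, offheap_overhead_mb, stack_kb):
    """Rough upper bound on additional native threads before the OS
    refuses to allocate another thread stack."""
    free_kb = (ram_mb - heap_mb - offheap_overhead_mb) * 1024
    return free_kb // stack_kb

# 4 GB box, 2 GB heap, ~1 GB assumed non-heap overhead:
# (4096 - 2048 - 1024) * 1024 / 128 = 8192 stacks at most,
# and far fewer once swap pressure and fragmentation kick in.
```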

Re: OutOfMemory Errors with Cassandra 1.0.5

2012-01-06 Thread Caleb Rackliffe
One other item…

java -version

java version "1.7.0_01"
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe <ca...@steelhouse.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Fri, 6 Jan 2012 15:28:30 -0500
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: OutOfMemory Errors with Cassandra 1.0.5


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Bryce Allen
On Fri, 6 Jan 2012 10:38:17 -0800
Mohit Anchlia  wrote:
> It could be as simple as reading before writing to make sure that
> email doesn't exist. But I think you are looking at how to handle 2
> concurrent requests for same email? Only way I can think of is:
> 
> 1) Create new CF say tracker
> 2) write email and time uuid to CF tracker
> 3) read from CF tracker
> 4) if you find a row other than yours then wait and read again from
> tracker after few ms
> 5) read from USER CF
> 6) write if no rows in USER CF
> 7) delete from tracker
> 
> Please note you might have to modify this logic a little bit, but this
> should give you some ideas of how to approach this problem without
> locking.

Distributed locking is pretty subtle; I haven't seen a correct solution
that uses just Cassandra, even with QUORUM read/write. I suspect it's
not possible.

With the above proposal, in step 4 two processes could both have
inserted an entry in the tracker before either gets a chance to check,
so you need a way to order the requests. I don't think the timestamp
works for ordering, because it's set by the client (even the internal
timestamp is set by the client), and will likely be different from
when the data is actually committed and available to read by other
clients.

For example:

* At time 0ms, client 1 starts insert of u...@example.org
* At time 1ms, client 2 also starts insert for u...@example.org
* At time 2ms, client 2 data is committed
* At time 3ms, client 2 reads tracker and sees that it's the only one,
  so enters the critical section
* At time 4ms, client 1 data is committed
* At time 5ms, client 1 reads tracker, and sees that it is not the only
  one, but since it has the lowest timestamp (0ms vs 1ms), it enters
  the critical section.

I don't think Cassandra counters work for ordering either.

This approach is similar to the Zookeeper lock recipe:
http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
but zookeeper has sequence nodes, which provide a consistent way of
ordering the requests. Zookeeper also avoids the busy waiting.

I'd be happy to be proven wrong. But even if it is possible, if it
involves a lot of complexity and busy waiting it's probably not worth
it. There's a reason people are using Zookeeper with Cassandra.

-Bryce
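The interleaving above can be reproduced with a small in-memory model (plain
Python standing in for Cassandra; no real client or Cassandra API is involved,
and all names are made up for illustration). It shows why client-chosen
timestamps cannot safely order the tracker entries: commit order and timestamp
order disagree, and both clients end up believing they hold the lock.

```python
# In-memory stand-in for the "tracker" CF: maps email -> {client_id: timestamp}.
tracker = {}

def insert_tracker(email, client_id, ts):
    """Commit a tracker entry (step 2 of the recipe)."""
    tracker.setdefault(email, {})[client_id] = ts

def believes_it_won(email, client_id):
    """Step 4: a client enters the critical section if it is alone in the
    tracker, or if it holds the lowest (client-chosen) timestamp."""
    entries = tracker.get(email, {})
    if set(entries) == {client_id}:
        return True
    return min(entries, key=entries.get) == client_id

email = "user@example.org"

# t=0ms: client 1 picks timestamp 0; t=1ms: client 2 picks timestamp 1.
# t=2ms: client 2's entry is committed first.
insert_tracker(email, "client2", ts=1)
# t=3ms: client 2 checks, sees only itself, enters the critical section.
c2_entered = believes_it_won(email, "client2")
# t=4ms: client 1's entry is committed (late, but with the LOWER timestamp).
insert_tracker(email, "client1", ts=0)
# t=5ms: client 1 checks, is not alone, but holds the lowest timestamp,
# so it also enters the critical section.
c1_entered = believes_it_won(email, "client1")

print(c2_entered and c1_entered)  # prints True: mutual exclusion is violated
```

Neither "alone in the tracker" nor "lowest timestamp" alone is safe, and
combining them still admits this schedule, which is the point of the example.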




Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Bryce Allen
On Fri, 6 Jan 2012 10:03:38 -0800
Drew Kutcharian  wrote:
> I know that this can be done using a lock manager such as ZooKeeper
> or HazelCast, but the issue with using either of them is that if
> ZooKeeper or HazelCast is down, then you can't be sure about the
> reliability of the lock. So this potentially, in the very rare
> instance where the lock manager is down and two users are registering
> with the same email, can cause major issues.

For most applications, if the lock manager is down, you don't acquire
the lock, so you don't enter the critical section. Rather than allowing
inconsistency, you become unavailable (at least to writes that require
a lock).

-Bryce




Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Jeremiah Jordan
Correct, any kind of locking in Cassandra requires clocks that are in 
sync, and requires you to wait "possible clock out of sync time" before 
reading to check if you got the lock, to prevent the issue you describe 
below.


There was a pretty detailed discussion of locking with only Cassandra a 
month or so back on this list.


-Jeremiah

On 01/06/2012 02:42 PM, Bryce Allen wrote:

On Fri, 6 Jan 2012 10:38:17 -0800
Mohit Anchlia  wrote:

It could be as simple as reading before writing to make sure that
email doesn't exist. But I think you are looking at how to handle 2
concurrent requests for same email? Only way I can think of is:

1) Create new CF say tracker
2) write email and time uuid to CF tracker
3) read from CF tracker
4) if you find a row other than yours then wait and read again from
tracker after few ms
5) read from USER CF
6) write if no rows in USER CF
7) delete from tracker

Please note you might have to modify this logic a little bit, but this
should give you some ideas of how to approach this problem without
locking.

Distributed locking is pretty subtle; I haven't seen a correct solution
that uses just Cassandra, even with QUORUM read/write. I suspect it's
not possible.

With the above proposal, in step 4 two processes could both have
inserted an entry in the tracker before either gets a chance to check,
so you need a way to order the requests. I don't think the timestamp
works for ordering, because it's set by the client (even the internal
timestamp is set by the client), and will likely be different from
when the data is actually committed and available to read by other
clients.

For example:

* At time 0ms, client 1 starts insert of u...@example.org
* At time 1ms, client 2 also starts insert for u...@example.org
* At time 2ms, client 2 data is committed
* At time 3ms, client 2 reads tracker and sees that it's the only one,
   so enters the critical section
* At time 4ms, client 1 data is committed
* At time 5ms, client 2 reads tracker, and sees that is not the only
   one, but since it has the lowest timestamp (0ms vs 1ms), it enters
   the critical section.

I don't think Cassandra counters work for ordering either.

This approach is similar to the Zookeeper lock recipe:
http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
but zookeeper has sequence nodes, which provide a consistent way of
ordering the requests. Zookeeper also avoids the busy waiting.

I'd be happy to be proven wrong. But even if it is possible, if it
involves a lot of complexity and busy waiting it's probably not worth
it. There's a reason people are using Zookeeper with Cassandra.

-Bryce


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Jeremiah Jordan
Since a ZooKeeper cluster is a quorum-based system similar to Cassandra, 
it only goes down when it loses a majority of its nodes.  And the same way 
you have to stop writing to Cassandra when a quorum of replicas is down (if 
using QUORUM), your app will have to wait for the ZooKeeper cluster to come 
online again before it can proceed.
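As a quick sanity check on the quorum arithmetic above (plain Python, nothing
ZooKeeper-specific; the function names are made up): an ensemble of n nodes
needs a strict majority, floor(n/2)+1, to stay available, so it tolerates
floor((n-1)/2) failures.

```python
def quorum_size(n):
    """Smallest strict majority of an n-node ensemble."""
    return n // 2 + 1

def failures_tolerated(n):
    """Nodes that can be down while a quorum still survives."""
    return n - quorum_size(n)

# A 3-node ensemble tolerates 1 failure, 5 nodes tolerate 2, 7 tolerate 3.
for n in (3, 5, 7):
    print(n, failures_tolerated(n))
```

This is also why ensembles are usually sized with an odd node count: going
from 3 to 4 nodes raises the quorum size without tolerating any extra failure.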


On 01/06/2012 12:03 PM, Drew Kutcharian wrote:

Hi Everyone,

What's the best way to reliably get unique-constraint-like functionality with 
Cassandra? I have the following (which I think should be very common) use case.

User CF
Row Key: user email
Columns: userId: UUID, etc...

UserAttribute1 CF:
Row Key: userId (which is the uuid that's mapped to user email)
Columns: ...

UserAttribute2 CF:
Row Key: userId (which is the uuid that's mapped to user email)
Columns: ...

The issue is we need to guarantee that no two people register with the same email 
address. In addition, without locking, potentially a malicious user can 
"hijack" another user's account by registering using the user's email address.

I know that this can be done using a lock manager such as ZooKeeper or 
HazelCast, but the issue with using either of them is that if ZooKeeper or 
HazelCast is down, then you can't be sure about the reliability of the lock. So 
this potentially, in the very rare instance where the lock manager is down and 
two users are registering with the same email, can cause major issues.

In addition, I know this can be done with other tools such as Redis (use Redis 
for this use case, and Cassandra for everything else), but I'm interested in 
hearing if anyone has solved this issue using Cassandra only.

Thanks,

Drew


java.lang.IllegalArgumentException occurred when creating a keyspace with replication factor

2012-01-06 Thread Sajith Kariyawasam
Hi all,

I tried creating a keyspace with the replication factor 3, using cli
interface ... in Cassandra 1.0.6  (earlier tried in 0.8.2 and failed too)

But I'm getting an exception

"java.lang.IllegalArgumentException: No enum const class
org.apache.cassandra.cli.CliClient$AddKeyspaceArgument.REPLICATION_FACTOR"

The command I used was

[default@unknown] create keyspace testkeyspace with replication_factor=3;

What has gone wrong  ?

Many thanks in advance
-- 
Best Regards
Sajith


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Bryce Allen
This looks like it:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html

There's also some interesting JIRA tickets related to locking/CAS:
https://issues.apache.org/jira/browse/CASSANDRA-2686
https://issues.apache.org/jira/browse/CASSANDRA-48

-Bryce

On Fri, 06 Jan 2012 14:53:21 -0600
Jeremiah Jordan  wrote:
> Correct, any kind of locking in Cassandra requires clocks that are in 
> sync, and requires you to wait "possible clock out of sync time"
> before reading to check if you got the lock, to prevent the issue you
> describe below.
> 
> There was a pretty detailed discussion of locking with only Cassandra
> a month or so back on this list.
> 
> -Jeremiah
> 
> On 01/06/2012 02:42 PM, Bryce Allen wrote:
> > On Fri, 6 Jan 2012 10:38:17 -0800
> > Mohit Anchlia  wrote:
> >> It could be as simple as reading before writing to make sure that
> >> email doesn't exist. But I think you are looking at how to handle 2
> >> concurrent requests for same email? Only way I can think of is:
> >>
> >> 1) Create new CF say tracker
> >> 2) write email and time uuid to CF tracker
> >> 3) read from CF tracker
> >> 4) if you find a row other than yours then wait and read again from
> >> tracker after few ms
> >> 5) read from USER CF
> >> 6) write if no rows in USER CF
> >> 7) delete from tracker
> >>
> >> Please note you might have to modify this logic a little bit, but
> >> this should give you some ideas of how to approach this problem
> >> without locking.
> > Distributed locking is pretty subtle; I haven't seen a correct
> > solution that uses just Cassandra, even with QUORUM read/write. I
> > suspect it's not possible.
> >
> > With the above proposal, in step 4 two processes could both have
> > inserted an entry in the tracker before either gets a chance to
> > check, so you need a way to order the requests. I don't think the
> > timestamp works for ordering, because it's set by the client (even
> > the internal timestamp is set by the client), and will likely be
> > different from when the data is actually committed and available to
> > read by other clients.
> >
> > For example:
> >
> > * At time 0ms, client 1 starts insert of u...@example.org
> > * At time 1ms, client 2 also starts insert for u...@example.org
> > * At time 2ms, client 2 data is committed
> > * At time 3ms, client 2 reads tracker and sees that it's the only
> > one, so enters the critical section
> > * At time 4ms, client 1 data is committed
> > * At time 5ms, client 2 reads tracker, and sees that is not the only
> >one, but since it has the lowest timestamp (0ms vs 1ms), it
> > enters the critical section.
> >
> > I don't think Cassandra counters work for ordering either.
> >
> > This approach is similar to the Zookeeper lock recipe:
> > http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
> > but zookeeper has sequence nodes, which provide a consistent way of
> > ordering the requests. Zookeeper also avoids the busy waiting.
> >
> > I'd be happy to be proven wrong. But even if it is possible, if it
> > involves a lot of complexity and busy waiting it's probably not
> > worth it. There's a reason people are using Zookeeper with
> > Cassandra.
> >
> > -Bryce




How to find out when a nodetool operation has ended?

2012-01-06 Thread Maxim Potekhin

Suppose I start a repair on one or a few nodes in my cluster,
from an interactive machine in the office, and leave for the day
(which is a very realistic scenario imho).

Is there a way to know, from a remote machine, when a particular
action, such as compaction or repair, has been finished?

I figured that compaction stats can be mum at times, thus
it's not a reliable indicator.

Many thanks,

Maxim



Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Mohit Anchlia
This looks like the right way to do it. But remember this still doesn't
guarantee correctness if your clocks drift too much. It's a trade-off
between having to manage one additional component and using something
internal to C*. It would be good to see similar functionality implemented
in C* so that clients don't have to deal with it explicitly.

On Fri, Jan 6, 2012 at 1:16 PM, Bryce Allen  wrote:
> This looks like it:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html
>
> There's also some interesting JIRA tickets related to locking/CAS:
> https://issues.apache.org/jira/browse/CASSANDRA-2686
> https://issues.apache.org/jira/browse/CASSANDRA-48
>
> -Bryce
>
> On Fri, 06 Jan 2012 14:53:21 -0600
> Jeremiah Jordan  wrote:
>> Correct, any kind of locking in Cassandra requires clocks that are in
>> sync, and requires you to wait "possible clock out of sync time"
>> before reading to check if you got the lock, to prevent the issue you
>> describe below.
>>
>> There was a pretty detailed discussion of locking with only Cassandra
>> a month or so back on this list.
>>
>> -Jeremiah
>>
>> On 01/06/2012 02:42 PM, Bryce Allen wrote:
>> > On Fri, 6 Jan 2012 10:38:17 -0800
>> > Mohit Anchlia  wrote:
>> >> It could be as simple as reading before writing to make sure that
>> >> email doesn't exist. But I think you are looking at how to handle 2
>> >> concurrent requests for same email? Only way I can think of is:
>> >>
>> >> 1) Create new CF say tracker
>> >> 2) write email and time uuid to CF tracker
>> >> 3) read from CF tracker
>> >> 4) if you find a row other than yours then wait and read again from
>> >> tracker after few ms
>> >> 5) read from USER CF
>> >> 6) write if no rows in USER CF
>> >> 7) delete from tracker
>> >>
>> >> Please note you might have to modify this logic a little bit, but
>> >> this should give you some ideas of how to approach this problem
>> >> without locking.
>> > Distributed locking is pretty subtle; I haven't seen a correct
>> > solution that uses just Cassandra, even with QUORUM read/write. I
>> > suspect it's not possible.
>> >
>> > With the above proposal, in step 4 two processes could both have
>> > inserted an entry in the tracker before either gets a chance to
>> > check, so you need a way to order the requests. I don't think the
>> > timestamp works for ordering, because it's set by the client (even
>> > the internal timestamp is set by the client), and will likely be
>> > different from when the data is actually committed and available to
>> > read by other clients.
>> >
>> > For example:
>> >
>> > * At time 0ms, client 1 starts insert of u...@example.org
>> > * At time 1ms, client 2 also starts insert for u...@example.org
>> > * At time 2ms, client 2 data is committed
>> > * At time 3ms, client 2 reads tracker and sees that it's the only
>> > one, so enters the critical section
>> > * At time 4ms, client 1 data is committed
>> > * At time 5ms, client 2 reads tracker, and sees that is not the only
>> >    one, but since it has the lowest timestamp (0ms vs 1ms), it
>> > enters the critical section.
>> >
>> > I don't think Cassandra counters work for ordering either.
>> >
>> > This approach is similar to the Zookeeper lock recipe:
>> > http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
>> > but zookeeper has sequence nodes, which provide a consistent way of
>> > ordering the requests. Zookeeper also avoids the busy waiting.
>> >
>> > I'd be happy to be proven wrong. But even if it is possible, if it
>> > involves a lot of complexity and busy waiting it's probably not
>> > worth it. There's a reason people are using Zookeeper with
>> > Cassandra.
>> >
>> > -Bryce


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Drew Kutcharian
Bryce, 

I'm not sure about ZooKeeper, but I know that if you have a partition between 
HazelCast nodes, then the nodes can acquire the same lock independently in each 
divided partition. How does ZooKeeper handle this situation?

-- Drew


On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:

> On Fri, 6 Jan 2012 10:03:38 -0800
> Drew Kutcharian  wrote:
>> I know that this can be done using a lock manager such as ZooKeeper
>> or HazelCast, but the issue with using either of them is that if
>> ZooKeeper or HazelCast is down, then you can't be sure about the
>> reliability of the lock. So this potentially, in the very rare
>> instance where the lock manager is down and two users are registering
>> with the same email, can cause major issues.
> 
> For most applications, if the lock managers is down, you don't acquire
> the lock, so you don't enter the critical section. Rather than allowing
> inconsistency, you become unavailable (at least to writes that require
> a lock).
> 
> -Bryce



Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Bryce Allen
I don't think it's just clock drift. There is also the period of time
between when the client selects a timestamp, and when the data ends up
committed to cassandra. That drift seems harder to control, when the
nodes and/or clients are under load.

I agree that it would be nice to have something like this in Cassandra
core, but from the JIRA tickets it looks like this has been tried
before, and for various reasons was not added. It's definitely
non-trivial to get right.

On Fri, 6 Jan 2012 13:33:02 -0800
Mohit Anchlia  wrote:
> This looks like right way to do it. But remember this still doesn't
> gurantee if your clocks drifts way too much. But it's trade-off with
> having to manage one additional component or use something internal to
> C*. It would be good to see similar functionality implemented in C* so
> that clients don't have to deal with it explicitly.
> 
> On Fri, Jan 6, 2012 at 1:16 PM, Bryce Allen 
> wrote:
> > This looks like it:
> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html
> >
> > There's also some interesting JIRA tickets related to locking/CAS:
> > https://issues.apache.org/jira/browse/CASSANDRA-2686
> > https://issues.apache.org/jira/browse/CASSANDRA-48
> >
> > -Bryce
> >
> > On Fri, 06 Jan 2012 14:53:21 -0600
> > Jeremiah Jordan  wrote:
> >> Correct, any kind of locking in Cassandra requires clocks that are
> >> in sync, and requires you to wait "possible clock out of sync time"
> >> before reading to check if you got the lock, to prevent the issue
> >> you describe below.
> >>
> >> There was a pretty detailed discussion of locking with only
> >> Cassandra a month or so back on this list.
> >>
> >> -Jeremiah
> >>
> >> On 01/06/2012 02:42 PM, Bryce Allen wrote:
> >> > On Fri, 6 Jan 2012 10:38:17 -0800
> >> > Mohit Anchlia  wrote:
> >> >> It could be as simple as reading before writing to make sure
> >> >> that email doesn't exist. But I think you are looking at how to
> >> >> handle 2 concurrent requests for same email? Only way I can
> >> >> think of is:
> >> >>
> >> >> 1) Create new CF say tracker
> >> >> 2) write email and time uuid to CF tracker
> >> >> 3) read from CF tracker
> >> >> 4) if you find a row other than yours then wait and read again
> >> >> from tracker after few ms
> >> >> 5) read from USER CF
> >> >> 6) write if no rows in USER CF
> >> >> 7) delete from tracker
> >> >>
> >> >> Please note you might have to modify this logic a little bit,
> >> >> but this should give you some ideas of how to approach this
> >> >> problem without locking.
> >> > Distributed locking is pretty subtle; I haven't seen a correct
> >> > solution that uses just Cassandra, even with QUORUM read/write. I
> >> > suspect it's not possible.
> >> >
> >> > With the above proposal, in step 4 two processes could both have
> >> > inserted an entry in the tracker before either gets a chance to
> >> > check, so you need a way to order the requests. I don't think the
> >> > timestamp works for ordering, because it's set by the client
> >> > (even the internal timestamp is set by the client), and will
> >> > likely be different from when the data is actually committed and
> >> > available to read by other clients.
> >> >
> >> > For example:
> >> >
> >> > * At time 0ms, client 1 starts insert of u...@example.org
> >> > * At time 1ms, client 2 also starts insert for u...@example.org
> >> > * At time 2ms, client 2 data is committed
> >> > * At time 3ms, client 2 reads tracker and sees that it's the only
> >> > one, so enters the critical section
> >> > * At time 4ms, client 1 data is committed
> >> > * At time 5ms, client 2 reads tracker, and sees that is not the
> >> > only one, but since it has the lowest timestamp (0ms vs 1ms), it
> >> > enters the critical section.
> >> >
> >> > I don't think Cassandra counters work for ordering either.
> >> >
> >> > This approach is similar to the Zookeeper lock recipe:
> >> > http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
> >> > but zookeeper has sequence nodes, which provide a consistent way
> >> > of ordering the requests. Zookeeper also avoids the busy waiting.
> >> >
> >> > I'd be happy to be proven wrong. But even if it is possible, if
> >> > it involves a lot of complexity and busy waiting it's probably
> >> > not worth it. There's a reason people are using Zookeeper with
> >> > Cassandra.
> >> >
> >> > -Bryce




Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Mohit Anchlia
On Fri, Jan 6, 2012 at 1:41 PM, Bryce Allen  wrote:
> I don't think it's just clock drift. There is also the period of time
> between when the client selects a timestamp, and when the data ends up
> committed to cassandra. That drift seems harder to control, when the
> nodes and/or clients are under load.

As suggested, you control that by sleeping before reading. You are
worried about an edge case, but this should work well for the use case
posted by the original poster. For example: how many people will try to
create an account with the same email at the same time, in a way that
defeats all of the safety checks?

Your use case might be different, with no tolerance whatsoever for this.
In that case C* is probably not the right thing to use anyway.

>
> I agree that it would be nice to have something like this in Cassandra
> core, but from the JIRA tickets it looks like this has been tried
> before, and for various reasons was not added. It's definitely
> non-trivial to get right.
>
> On Fri, 6 Jan 2012 13:33:02 -0800
> Mohit Anchlia  wrote:
>> This looks like right way to do it. But remember this still doesn't
>> gurantee if your clocks drifts way too much. But it's trade-off with
>> having to manage one additional component or use something internal to
>> C*. It would be good to see similar functionality implemented in C* so
>> that clients don't have to deal with it explicitly.
>>
>> On Fri, Jan 6, 2012 at 1:16 PM, Bryce Allen 
>> wrote:
>> > This looks like it:
>> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html
>> >
>> > There's also some interesting JIRA tickets related to locking/CAS:
>> > https://issues.apache.org/jira/browse/CASSANDRA-2686
>> > https://issues.apache.org/jira/browse/CASSANDRA-48
>> >
>> > -Bryce
>> >
>> > On Fri, 06 Jan 2012 14:53:21 -0600
>> > Jeremiah Jordan  wrote:
>> >> Correct, any kind of locking in Cassandra requires clocks that are
>> >> in sync, and requires you to wait "possible clock out of sync time"
>> >> before reading to check if you got the lock, to prevent the issue
>> >> you describe below.
>> >>
>> >> There was a pretty detailed discussion of locking with only
>> >> Cassandra a month or so back on this list.
>> >>
>> >> -Jeremiah
>> >>
>> >> On 01/06/2012 02:42 PM, Bryce Allen wrote:
>> >> > On Fri, 6 Jan 2012 10:38:17 -0800
>> >> > Mohit Anchlia  wrote:
>> >> >> It could be as simple as reading before writing to make sure
>> >> >> that email doesn't exist. But I think you are looking at how to
>> >> >> handle 2 concurrent requests for same email? Only way I can
>> >> >> think of is:
>> >> >>
>> >> >> 1) Create new CF say tracker
>> >> >> 2) write email and time uuid to CF tracker
>> >> >> 3) read from CF tracker
>> >> >> 4) if you find a row other than yours then wait and read again
>> >> >> from tracker after few ms
>> >> >> 5) read from USER CF
>> >> >> 6) write if no rows in USER CF
>> >> >> 7) delete from tracker
>> >> >>
>> >> >> Please note you might have to modify this logic a little bit,
>> >> >> but this should give you some ideas of how to approach this
>> >> >> problem without locking.
>> >> > Distributed locking is pretty subtle; I haven't seen a correct
>> >> > solution that uses just Cassandra, even with QUORUM read/write. I
>> >> > suspect it's not possible.
>> >> >
>> >> > With the above proposal, in step 4 two processes could both have
>> >> > inserted an entry in the tracker before either gets a chance to
>> >> > check, so you need a way to order the requests. I don't think the
>> >> > timestamp works for ordering, because it's set by the client
>> >> > (even the internal timestamp is set by the client), and will
>> >> > likely be different from when the data is actually committed and
>> >> > available to read by other clients.
>> >> >
>> >> > For example:
>> >> >
>> >> > * At time 0ms, client 1 starts insert of u...@example.org
>> >> > * At time 1ms, client 2 also starts insert for u...@example.org
>> >> > * At time 2ms, client 2 data is committed
>> >> > * At time 3ms, client 2 reads tracker and sees that it's the only
>> >> > one, so enters the critical section
>> >> > * At time 4ms, client 1 data is committed
>> >> > * At time 5ms, client 2 reads tracker, and sees that is not the
>> >> > only one, but since it has the lowest timestamp (0ms vs 1ms), it
>> >> > enters the critical section.
>> >> >
>> >> > I don't think Cassandra counters work for ordering either.
>> >> >
>> >> > This approach is similar to the Zookeeper lock recipe:
>> >> > http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
>> >> > but zookeeper has sequence nodes, which provide a consistent way
>> >> > of ordering the requests. Zookeeper also avoids the busy waiting.
>> >> >
>> >> > I'd be happy to be proven wrong. But even if it is possible, if
>> >> > it involves a lot of complexity and busy waiting it's probably
>> >> > not worth it. There's a reason people are using Zook

Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Jeremiah Jordan
By using quorum.  One of the partitions may be able to acquire 
locks; the other one won't...


On 01/06/2012 03:36 PM, Drew Kutcharian wrote:

Bryce,

I'm not sure about ZooKeeper, but I know if you have a partition between 
HazelCast nodes, than the nodes can acquire the same lock independently in each 
divided partition. How does ZooKeeper handle this situation?

-- Drew


On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:


On Fri, 6 Jan 2012 10:03:38 -0800
Drew Kutcharian  wrote:

I know that this can be done using a lock manager such as ZooKeeper
or HazelCast, but the issue with using either of them is that if
ZooKeeper or HazelCast is down, then you can't be sure about the
reliability of the lock. So this potentially, in the very rare
instance where the lock manager is down and two users are registering
with the same email, can cause major issues.

For most applications, if the lock managers is down, you don't acquire
the lock, so you don't enter the critical section. Rather than allowing
inconsistency, you become unavailable (at least to writes that require
a lock).

-Bryce


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Bryce Allen
That's a good question, and I'm not sure - I'm fairly new to both ZK
and Cassandra. I found this wiki page:
http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
and I think the lock recipe still works, even if a stale read happens.
Assuming that wiki page is correct.

There is still subtlety to locking with ZK though; see the "Locks based
on ephemeral nodes" thread from the ZK mailing list in October:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0

-Bryce

On Fri, 6 Jan 2012 13:36:52 -0800
Drew Kutcharian  wrote:
> Bryce, 
> 
> I'm not sure about ZooKeeper, but I know if you have a partition
> between HazelCast nodes, than the nodes can acquire the same lock
> independently in each divided partition. How does ZooKeeper handle
> this situation?
> 
> -- Drew
> 
> 
> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
> 
> > On Fri, 6 Jan 2012 10:03:38 -0800
> > Drew Kutcharian  wrote:
> >> I know that this can be done using a lock manager such as ZooKeeper
> >> or HazelCast, but the issue with using either of them is that if
> >> ZooKeeper or HazelCast is down, then you can't be sure about the
> >> reliability of the lock. So this potentially, in the very rare
> >> instance where the lock manager is down and two users are
> >> registering with the same email, can cause major issues.
> > 
> > For most applications, if the lock managers is down, you don't
> > acquire the lock, so you don't enter the critical section. Rather
> > than allowing inconsistency, you become unavailable (at least to
> > writes that require a lock).
> > 
> > -Bryce
> 




Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Drew Kutcharian
Thanks everyone for the replies. Seems like there is no easy way to handle 
this. It's very surprising that no one seems to have solved such a common use 
case.

-- Drew

On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:

> That's a good question, and I'm not sure - I'm fairly new to both ZK
> and Cassandra. I found this wiki page:
> http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
> and I think the lock recipe still works, even if a stale read happens.
> Assuming that wiki page is correct.
> 
> There is still subtlety to locking with ZK though, see (Locks based
> on ephemeral nodes) from the zk mailing list in October:
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
> 
> -Bryce
> 
> On Fri, 6 Jan 2012 13:36:52 -0800
> Drew Kutcharian  wrote:
>> Bryce, 
>> 
>> I'm not sure about ZooKeeper, but I know if you have a partition
>> between HazelCast nodes, than the nodes can acquire the same lock
>> independently in each divided partition. How does ZooKeeper handle
>> this situation?
>> 
>> -- Drew
>> 
>> 
>> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
>> 
>>> On Fri, 6 Jan 2012 10:03:38 -0800
>>> Drew Kutcharian  wrote:
 I know that this can be done using a lock manager such as ZooKeeper
 or HazelCast, but the issue with using either of them is that if
 ZooKeeper or HazelCast is down, then you can't be sure about the
 reliability of the lock. So this potentially, in the very rare
 instance where the lock manager is down and two users are
 registering with the same email, can cause major issues.
>>> 
>>> For most applications, if the lock managers is down, you don't
>>> acquire the lock, so you don't enter the critical section. Rather
>>> than allowing inconsistency, you become unavailable (at least to
>>> writes that require a lock).
>>> 
>>> -Bryce
>> 



Re: java.lang.IllegalArgumentException occurred when creating a keyspcace with replication factor

2012-01-06 Thread R. Verlangen
Try this:

create keyspace testkeyspace;
update keyspace testkeyspace with placement_strategy =
'org.apache.cassandra.locator.SimpleStrategy' and strategy_options =
{replication_factor:3};

Good luck!

2012/1/6 Sajith Kariyawasam 

> Hi all,
>
> I tried creating a keyspace with the replication factor 3, using cli
> interface ... in Cassandra 1.0.6  (earlier tried in 0.8.2 and failed too)
>
> But I'm getting an exception
>
> "java.lang.IllegalArgumentException: No enum const class
> org.apache.cassandra.cli.CliClient$AddKeyspaceArgument.REPLICATION_FACTOR"
>
> The command I used was
>
> [default@unknown] create keyspace testkeyspace with replication_factor=3;
>
> What has gone wrong  ?
>
> Many thanks in advance
> --
> Best Regards
> Sajith
>
>


Re: How to find out when a nodetool operation has ended?

2012-01-06 Thread R. Verlangen
You might consider:
- installing DataStax OpsCenter ( http://www.datastax.com/products/opscenter
 )
- starting the repair in a linux screen (so you can attach to the screen
from another location)

I prefer the OpsCenter.
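The "linux screen" suggestion above can be sketched like this (a minimal sketch with hypothetical file and keyspace names, not a definitive recipe): wrap the long-running command so its completion gets recorded in a log you can check from anywhere.

```shell
# Sketch: wrap a long-running maintenance command so its completion is
# recorded, then run it detached in a screen session.
: > maintenance.log
run_and_mark() {
  # run the given command; on success, append a "done" marker to the log
  "$@" && echo "done: $*" >> maintenance.log
}
# In practice (assumed usage; adjust the keyspace name):
#   screen -dmS repair bash -c 'run_and_mark nodetool repair MyKeyspace'
# Demo with a stand-in for the long-running command:
run_and_mark sleep 0.1
cat maintenance.log
# → done: sleep 0.1
```

You can then `ssh` in later and check `maintenance.log` (or re-attach with `screen -r repair`) to see whether the repair finished.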

2012/1/6 Maxim Potekhin 

> Suppose I start a repair on one or a few nodes in my cluster,
> from an interactive machine in the office, and leave for the day
> (which is a very realistic scenario imho).
>
> Is there a way to know, from a remote machine, when a particular
> action, such as compaction or repair, has been finished?
>
> I figured that compaction stats can be mum at times, thus
> it's not a reliable indicator.
>
> Many thanks,
>
> Maxim
>
>


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Narendra Sharma
>>>It's very surprising that no one seems to have solved such a common use
case.
I would say people have solved it using RIGHT tools for the task.



On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian  wrote:

> Thanks everyone for the replies. Seems like there is no easy way to handle
> this. It's very surprising that no one seems to have solved such a common
> use case.
>
> -- Drew
>
> On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:
>
> > That's a good question, and I'm not sure - I'm fairly new to both ZK
> > and Cassandra. I found this wiki page:
> > http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
> > and I think the lock recipe still works, even if a stale read happens.
> > Assuming that wiki page is correct.
> >
> > There is still subtlety to locking with ZK though, see (Locks based
> > on ephemeral nodes) from the zk mailing list in October:
> >
> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
> >
> > -Bryce
> >
> > On Fri, 6 Jan 2012 13:36:52 -0800
> > Drew Kutcharian  wrote:
> >> Bryce,
> >>
> >> I'm not sure about ZooKeeper, but I know if you have a partition
> >> between HazelCast nodes, than the nodes can acquire the same lock
> >> independently in each divided partition. How does ZooKeeper handle
> >> this situation?
> >>
> >> -- Drew
> >>
> >>
> >> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
> >>
> >>> On Fri, 6 Jan 2012 10:03:38 -0800
> >>> Drew Kutcharian  wrote:
>  I know that this can be done using a lock manager such as ZooKeeper
>  or HazelCast, but the issue with using either of them is that if
>  ZooKeeper or HazelCast is down, then you can't be sure about the
>  reliability of the lock. So this potentially, in the very rare
>  instance where the lock manager is down and two users are
>  registering with the same email, can cause major issues.
> >>>
> >>> For most applications, if the lock managers is down, you don't
> >>> acquire the lock, so you don't enter the critical section. Rather
> >>> than allowing inconsistency, you become unavailable (at least to
> >>> writes that require a lock).
> >>>
> >>> -Bryce
> >>
>
>


-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com *
*http://narendrasharma.blogspot.com/*


Re: How to find out when a nodetool operation has ended?

2012-01-06 Thread Maxim Potekhin

Thanks, so I take it there is no solution outside of OpsCenter.

I mean, of course I can redirect the output, with additional timestamps
if needed, to a log file -- which I can access remotely. I just thought
there would be some "status" command by chance, to tell me what
maintenance the node is doing. Too bad there is not!

Maxim


On 1/6/2012 5:40 PM, R. Verlangen wrote:

You might consider:
- installing DataStax OpsCenter ( 
http://www.datastax.com/products/opscenter )
- starting the repair in a linux screen (so you can attach to the 
screen from another location)






Re: What is the future of supercolumns ?

2012-01-06 Thread Aklin_81
Any comments please ?

On Thu, Jan 5, 2012 at 11:07 AM, Aklin_81  wrote:
> I have seen supercolumns usage been discouraged most of the times.
> However sometimes the supercolumns seem to fit the scenario most
> appropriately not only in terms of how the data is stored but also in
> terms of how is it retrieved. Some of the queries supported by SCs are
> uniquely capable of doing the task which no other alternative schema
> could do.(Like recently I asked about getting the equivalent of
> retrieving a list of (full)supercolumns by name, through use of
> composite columns, unfortunately there was no way to do this without
> reading lots of extra columns).
>
> So I am really confused whether:
>
> 1. Should I really not use the supercolumns for any case at all,
> however appropriate, or I just need to be just careful while realizing
> that supercolumns fit my use case appropriately or what!?
>
> 2. Are there any performance concerns with supercolumns even in the
> cases where they are used most appropriately. Like when you need to
> retrieve the entire supercolumns everytime & max. no of subcolumns
> vary between 0-10.
> (I don't write all the subcolumns inside supercolumn, at once though!
> Does this also matter?)
>
> 3. What is their future? Are they going to be deprecated or may be
> enhanced later?


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Drew Kutcharian
So what are the common RIGHT solutions/tools for this?


On Jan 6, 2012, at 2:46 PM, Narendra Sharma wrote:

> >>>It's very surprising that no one seems to have solved such a common use 
> >>>case.
> I would say people have solved it using RIGHT tools for the task.
> 
> 
> 
> On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian  wrote:
> Thanks everyone for the replies. Seems like there is no easy way to handle 
> this. It's very surprising that no one seems to have solved such a common use 
> case.
> 
> -- Drew
> 
> On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:
> 
> > That's a good question, and I'm not sure - I'm fairly new to both ZK
> > and Cassandra. I found this wiki page:
> > http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
> > and I think the lock recipe still works, even if a stale read happens.
> > Assuming that wiki page is correct.
> >
> > There is still subtlety to locking with ZK though, see (Locks based
> > on ephemeral nodes) from the zk mailing list in October:
> > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
> >
> > -Bryce
> >
> > On Fri, 6 Jan 2012 13:36:52 -0800
> > Drew Kutcharian  wrote:
> >> Bryce,
> >>
> >> I'm not sure about ZooKeeper, but I know if you have a partition
> >> between HazelCast nodes, than the nodes can acquire the same lock
> >> independently in each divided partition. How does ZooKeeper handle
> >> this situation?
> >>
> >> -- Drew
> >>
> >>
> >> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
> >>
> >>> On Fri, 6 Jan 2012 10:03:38 -0800
> >>> Drew Kutcharian  wrote:
>  I know that this can be done using a lock manager such as ZooKeeper
>  or HazelCast, but the issue with using either of them is that if
>  ZooKeeper or HazelCast is down, then you can't be sure about the
>  reliability of the lock. So this potentially, in the very rare
>  instance where the lock manager is down and two users are
>  registering with the same email, can cause major issues.
> >>>
> >>> For most applications, if the lock managers is down, you don't
> >>> acquire the lock, so you don't enter the critical section. Rather
> >>> than allowing inconsistency, you become unavailable (at least to
> >>> writes that require a lock).
> >>>
> >>> -Bryce
> >>
> 
> 
> 
> 
> -- 
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
> 
> 



Re: OutOfMemory Errors with Cassandra 1.0.5

2012-01-06 Thread Caleb Rackliffe
I saw this article - http://comments.gmane.org/gmane.comp.db.cassandra.user/2225

I'm using the Hector client (for connection pooling), with ~3200 threads active 
according to JConsole.
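A back-of-envelope check (using the ~3200 thread count above and the 128k stack size from the configuration quoted below; these are estimates, not measurements) suggests thread stacks alone claim a sizable chunk of native memory outside the 2 GB heap:

```python
# Estimate native memory used by thread stacks (this lives OUTSIDE the
# Java heap). Assumptions from this thread: -Xss128k and ~3200 live threads.
threads = 3200
stack_kb = 128
total_mb = threads * stack_kb // 1024
print(f"~{total_mb} MB of native stack memory for {threads} threads")
# → ~400 MB of native stack memory for 3200 threads
```

That, plus heap, mmapped files, and other native allocations, can push the process toward the OS limits where `unable to create new native thread` shows up.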

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Caleb Rackliffe <ca...@steelhouse.com>
Date: Fri, 6 Jan 2012 15:40:26 -0500
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: OutOfMemory Errors with Cassandra 1.0.5

One other item…

java -version

java version "1.7.0_01"
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe <ca...@steelhouse.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Fri, 6 Jan 2012 15:28:30 -0500
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: OutOfMemory Errors with Cassandra 1.0.5

Hi Everybody,

I have a 10-node cluster running 1.0.5.  The hardware/configuration for each 
box looks like this:

Hardware: 4 GB RAM, 400 GB SATAII HD for commitlog, 50 GB SATAIII SSD for data 
directory, 1 GB SSD swap partition
OS: CentOS 6, vm.swapiness = 0
Cassandra: disk access mode = standard, max memtable size = 128 MB, max new 
heap = 800 MB, max heap = 2 GB, stack size = 128k

I explicitly didn't put JNA on the classpath because I had a hard time figuring 
out how much native memory it would actually need.

After a node runs for a couple of days, my swap partition is almost completely 
full, and even though the resident size of my Java process is right under 3 GB, 
I get this sequence in the logs, with death coming on a failure to allocate 
another thread…

 WARN [pool-1-thread-1] 2012-01-05 09:06:38,078 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 65.58206914005034
 WARN [pool-1-thread-1] 2012-01-05 09:08:14,405 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 1379.0945945945946
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,593 GCInspector.java (line 146) 
Heap is 0.7523060581548427 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,611 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [pool-1-thread-1] 2012-01-05 13:45:29,934 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.004297106677189052
 WARN [pool-1-thread-1] 2012-01-06 02:23:18,175 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.0018187309961539236
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,202 GCInspector.java (line 146) 
Heap is 0.7635993298476305 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,203 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,588 GCInspector.java (line 146) 
Heap is 0.7617639564886326 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,612 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
ERROR [CompactionExecutor:6880] 2012-01-06 19:45:49,336 
AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
Thread[CompactionExecutor:6880,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:691)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:943)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1325)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(ParallelCompactionIterable.java:190)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced(ParallelCompactionIterable.java:164)
at 
org.apache.cassandra.db.compaction.ParallelCompactionIte

Re: OutOfMemory Errors with Cassandra 1.0.5 (fixed)

2012-01-06 Thread Caleb Rackliffe
Okay, it looks like I was slightly underestimating the number of connections 
open on the cluster.  This probably won't be a problem after I tighten up the 
Hector pool maximums.

Sorry for the spam…

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe <ca...@steelhouse.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Fri, 6 Jan 2012 20:13:37 -0500
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: OutOfMemory Errors with Cassandra 1.0.5

I saw this article - http://comments.gmane.org/gmane.comp.db.cassandra.user/2225

I'm using the Hector client (for connection pooling), with ~3200 threads active 
according to JConsole.

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

From: Caleb Rackliffe <ca...@steelhouse.com>
Date: Fri, 6 Jan 2012 15:40:26 -0500
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: OutOfMemory Errors with Cassandra 1.0.5

One other item…

java -version

java version "1.7.0_01"
Java(TM) SE Runtime Environment (build 1.7.0_01-b08)
Java HotSpot(TM) 64-Bit Server VM (build 21.1-b02, mixed mode)

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com


From: Caleb Rackliffe <ca...@steelhouse.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Fri, 6 Jan 2012 15:28:30 -0500
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: OutOfMemory Errors with Cassandra 1.0.5

Hi Everybody,

I have a 10-node cluster running 1.0.5.  The hardware/configuration for each 
box looks like this:

Hardware: 4 GB RAM, 400 GB SATAII HD for commitlog, 50 GB SATAIII SSD for data 
directory, 1 GB SSD swap partition
OS: CentOS 6, vm.swapiness = 0
Cassandra: disk access mode = standard, max memtable size = 128 MB, max new 
heap = 800 MB, max heap = 2 GB, stack size = 128k

I explicitly didn't put JNA on the classpath because I had a hard time figuring 
out how much native memory it would actually need.

After a node runs for a couple of days, my swap partition is almost completely 
full, and even though the resident size of my Java process is right under 3 GB, 
I get this sequence in the logs, with death coming on a failure to allocate 
another thread…

 WARN [pool-1-thread-1] 2012-01-05 09:06:38,078 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 65.58206914005034
 WARN [pool-1-thread-1] 2012-01-05 09:08:14,405 Memtable.java (line 174) 
setting live ratio to maximum of 64 instead of 1379.0945945945946
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,593 GCInspector.java (line 146) 
Heap is 0.7523060581548427 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-05 09:08:31,611 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [pool-1-thread-1] 2012-01-05 13:45:29,934 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.004297106677189052
 WARN [pool-1-thread-1] 2012-01-06 02:23:18,175 Memtable.java (line 169) 
setting live ratio to minimum of 1.0 instead of 0.0018187309961539236
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,202 GCInspector.java (line 146) 
Heap is 0.7635993298476305 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 06:10:05,203 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,588 GCInspector.java (line 146) 
Heap is 0.7617639564886326 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-01-06 14:59:49,612 StorageService.java (line 
2535) Flushing CFS(Keyspace='Users', ColumnFamily='CounterCF') to relieve 
memory pressure
ERROR [CompactionExecutor:6880] 2012-01-06 19:45:49,336 
AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
Thread[CompactionExecutor:6880,1,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
   

How does Cassandra decide when to do a minor compaction?

2012-01-06 Thread Maxim Potekhin

The subject says it all -- pointers appreciated.

Thanks

Maxim



Re: What is the future of supercolumns ?

2012-01-06 Thread Terje Marthinussen
Please realize that I do not make any decisions here and I am not part of the 
core Cassandra developer team.

What has been said before is that they will most likely go away and at least 
under the hood be replaced by composite columns.

Jonathan has, however, stated that he would like the supercolumn API/abstraction 
to remain, at least for backwards compatibility.

Please understand that under the hood, supercolumns are merely groups of 
columns serialized as a single block of data. 


The fact that there is a specialized and hardcoded way to serialize these 
column groups into supercolumns is a problem however and they should probably 
go away to make space for a more generic implementation allowing more flexible 
data structures and less code specific for one special data structure.

Today there are tons of extra code to deal with the slight difference in 
serialization and features of supercolumns vs columns and hopefully most of 
that could go away if things got structured a bit different.

I also hope that we keep APIs to allow simple access to groups of key/value 
pairs to simplify application logic as working with just columns can add a lot 
of application code which should not be needed.

If you almost always need all or mostly all of the columns in a supercolumn, 
and you normally update all of them at the same time, they will most likely be 
faster than normal columns.

Processing-wise, you will actually do a bit more work on 
serialization/deserialization of SCs, but the I/O part will usually be better 
grouped/require fewer operations.

I think we did some benchmarks on some heavy use cases with ~30 small columns 
per SC some time back, and I think we ended up with SCs being 10-20% faster.
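To illustrate the "groups of columns serialized as a single block" point, here is a toy sketch (plain Python, not Cassandra code; all names are illustrative): a supercolumn groups subcolumns under one name, while the composite-column alternative flattens the same data into single columns whose names are (group, subcolumn) pairs kept in sorted order.

```python
# A supercolumn row: each supercolumn name maps to a group of subcolumns.
supercolumn_row = {"user1": {"name": "a", "age": "30"},
                   "user2": {"name": "b"}}

# Same data modeled as composite columns: one flat, sorted list of columns
# whose names are (group, subcolumn) tuples -- the comparator sorts by the
# full tuple, so a group's columns stay contiguous.
composite_row = sorted((group, sub, val)
                       for group, subs in supercolumn_row.items()
                       for sub, val in subs.items())

# Reading "all of supercolumn user1" becomes a slice over the name prefix.
user1 = {sub: val for g, sub, val in composite_row if g == "user1"}
assert user1 == {"name": "a", "age": "30"}
```

The prefix slice is why composites can replace most SC reads, while per-group reads by *name list* (as discussed elsewhere in this thread) remain awkward.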


Terje

On Jan 5, 2012, at 2:37 PM, Aklin_81 wrote:

> I have seen supercolumns usage been discouraged most of the times.
> However sometimes the supercolumns seem to fit the scenario most
> appropriately not only in terms of how the data is stored but also in
> terms of how is it retrieved. Some of the queries supported by SCs are
> uniquely capable of doing the task which no other alternative schema
> could do.(Like recently I asked about getting the equivalent of
> retrieving a list of (full)supercolumns by name, through use of
> composite columns, unfortunately there was no way to do this without
> reading lots of extra columns).
> 
> So I am really confused whether:
> 
> 1. Should I really not use the supercolumns for any case at all,
> however appropriate, or I just need to be just careful while realizing
> that supercolumns fit my use case appropriately or what!?
> 
> 2. Are there any performance concerns with supercolumns even in the
> cases where they are used most appropriately. Like when you need to
> retrieve the entire supercolumns everytime & max. no of subcolumns
> vary between 0-10.
> (I don't write all the subcolumns inside supercolumn, at once though!
> Does this also matter?)
> 
> 3. What is their future? Are they going to be deprecated or may be
> enhanced later?



Re: What is the future of supercolumns ?

2012-01-06 Thread Aklin_81
I read all the columns inside the supercolumns at any time, but as for
writing them, I write the columns at different times. I don't have the
need to update them, except that they die after their TTL period of 60 days.
But since they are going to be deprecated, I don't know if it would be
really advisable to use them right now.

I believe that if it were possible to do wildcard querying for a list of
column names, then the supercolumn use cases could be easily replaced by
normal columns. Could that be practically possible in the future?

On Sat, Jan 7, 2012 at 8:05 AM, Terje Marthinussen
 wrote:
> Please realize that I do not make any decisions here and I am not part of the 
> core Cassandra developer team.
>
> What has been said before is that they will most likely go away and at least 
> under the hood be replaced by composite columns.
>
> Jonathan have however stated that he would like the supercolumn 
> API/abstraction to remain at least for backwards compatibility.
>
> Please understand that under the hood, supercolumns are merely groups of 
> columns serialized as a single block of data.
>
>
> The fact that there is a specialized and hardcoded way to serialize these 
> column groups into supercolumns is a problem however and they should probably 
> go away to make space for a more generic implementation allowing more 
> flexible data structures and less code specific for one special data 
> structure.
>
> Today there are tons of extra code to deal with the slight difference in 
> serialization and features of supercolumns vs columns and hopefully most of 
> that could go away if things got structured a bit different.
>
> I also hope that we keep APIs to allow simple access to groups of key/value 
> pairs to simplify application logic as working with just columns can add a 
> lot of application code which should not be needed.
>
> If you almost always need all or mostly all of the columns in a supercolumn, 
> and you normally update all of them at the same time, they will most likely 
> be faster than normal columns.
>
> Processing wise, you will actually do a bit more work on 
> serialization/deserialization of SC's but the I/O part will usually be better 
> grouped/require less operations.
>
> I think we did some benchmarks on some heavy use cases with ~30 small columns 
> per SC some time back and I think we ended up with  SCs being 10-20% faster.
>
>
> Terje
>
> On Jan 5, 2012, at 2:37 PM, Aklin_81 wrote:
>
>> I have seen supercolumns usage been discouraged most of the times.
>> However sometimes the supercolumns seem to fit the scenario most
>> appropriately not only in terms of how the data is stored but also in
>> terms of how is it retrieved. Some of the queries supported by SCs are
>> uniquely capable of doing the task which no other alternative schema
>> could do.(Like recently I asked about getting the equivalent of
>> retrieving a list of (full)supercolumns by name, through use of
>> composite columns, unfortunately there was no way to do this without
>> reading lots of extra columns).
>>
>> So I am really confused whether:
>>
>> 1. Should I really not use the supercolumns for any case at all,
>> however appropriate, or I just need to be just careful while realizing
>> that supercolumns fit my use case appropriately or what!?
>>
>> 2. Are there any performance concerns with supercolumns even in the
>> cases where they are used most appropriately. Like when you need to
>> retrieve the entire supercolumns everytime & max. no of subcolumns
>> vary between 0-10.
>> (I don't write all the subcolumns inside supercolumn, at once though!
>> Does this also matter?)
>>
>> 3. What is their future? Are they going to be deprecated or may be
>> enhanced later?
>


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Narendra Sharma
Instead of trying to solve the generic problem of uniqueness, I would focus
on the specific problem.

For example, let's consider your use case of user registration with email
address as key. You can do the following:
1. Create a CF (Users) where the row key is a UUID and which has user-info
specific columns.
2. Whenever a user registers, create a row in this CF with the user status
flag set to "waiting for confirmation".
3. Send an email to the user's email address with a link that contains the
UUID (or encrypted UUID).
4. When the user clicks on the link, use the UUID (or decrypted UUID) to
look up the user.
5. If the user exists with the given UUID and status "waiting for
confirmation", then update the status and create an entry in another CF
(EmailUUIDIndex) representing the email address to UUID mapping.
6. For authentication you can look up in the index to get the UUID and
proceed.
7. If a malicious user registers with someone else's email id, then he will
never be able to confirm and will never have an entry in EmailUUIDIndex. As
an additional check, if an entry for the email id already exists in
EmailUUIDIndex, then the request for registration can be rejected right away.

Make sense?
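The flow can be sketched in a few lines of Python, with plain dicts standing in for the two column families (all names are illustrative, not a real Cassandra client API):

```python
import uuid

users = {}        # Users CF: uuid -> {"email": ..., "status": ...}
email_index = {}  # EmailUUIDIndex CF: email -> uuid (only confirmed users)

def register(email):
    if email in email_index:               # step 7: already confirmed, reject
        raise ValueError("email already registered")
    uid = str(uuid.uuid4())
    users[uid] = {"email": email, "status": "waiting"}  # steps 1-2
    return uid                             # step 3: uid goes into the email link

def confirm(uid):
    user = users.get(uid)                  # steps 4-5
    if user and user["status"] == "waiting":
        user["status"] = "confirmed"
        email_index[user["email"]] = uid   # index entry created only here
        return True
    return False

uid = register("a@example.com")
assert confirm(uid)
assert email_index["a@example.com"] == uid
```

The key point is that the EmailUUIDIndex entry is only written after confirmation, so a race between two unconfirmed registrations never produces two confirmed owners of the same email.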

-Naren

On Fri, Jan 6, 2012 at 4:00 PM, Drew Kutcharian  wrote:

> So what are the common RIGHT solutions/tools for this?
>
>
> On Jan 6, 2012, at 2:46 PM, Narendra Sharma wrote:
>
> >>>It's very surprising that no one seems to have solved such a common use
> case.
> I would say people have solved it using RIGHT tools for the task.
>
>
>
> On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian  wrote:
>
>> Thanks everyone for the replies. Seems like there is no easy way to
>> handle this. It's very surprising that no one seems to have solved such a
>> common use case.
>>
>> -- Drew
>>
>> On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:
>>
>> > That's a good question, and I'm not sure - I'm fairly new to both ZK
>> > and Cassandra. I found this wiki page:
>> > http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
>> > and I think the lock recipe still works, even if a stale read happens.
>> > Assuming that wiki page is correct.
>> >
>> > There is still subtlety to locking with ZK though, see (Locks based
>> > on ephemeral nodes) from the zk mailing list in October:
>> >
>> http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
>> >
>> > -Bryce
>> >
>> > On Fri, 6 Jan 2012 13:36:52 -0800
>> > Drew Kutcharian  wrote:
>> >> Bryce,
>> >>
>> >> I'm not sure about ZooKeeper, but I know if you have a partition
>> >> between HazelCast nodes, than the nodes can acquire the same lock
>> >> independently in each divided partition. How does ZooKeeper handle
>> >> this situation?
>> >>
>> >> -- Drew
>> >>
>> >>
>> >> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
>> >>
>> >>> On Fri, 6 Jan 2012 10:03:38 -0800
>> >>> Drew Kutcharian  wrote:
>>  I know that this can be done using a lock manager such as ZooKeeper
>>  or HazelCast, but the issue with using either of them is that if
>>  ZooKeeper or HazelCast is down, then you can't be sure about the
>>  reliability of the lock. So this potentially, in the very rare
>>  instance where the lock manager is down and two users are
>>  registering with the same email, can cause major issues.
>> >>>
>> >>> For most applications, if the lock managers is down, you don't
>> >>> acquire the lock, so you don't enter the critical section. Rather
>> >>> than allowing inconsistency, you become unavailable (at least to
>> >>> writes that require a lock).
>> >>>
>> >>> -Bryce
>> >>
>>
>>
>
>
> --
> Narendra Sharma
> Software Engineer
> *http://www.aeris.com *
> *http://narendrasharma.blogspot.com/*
>
>
>
>


-- 
Narendra Sharma
Software Engineer
*http://www.aeris.com *
*http://narendrasharma.blogspot.com/*


Re: How to reliably achieve unique constraints with Cassandra?

2012-01-06 Thread Drew Kutcharian
It makes great sense. You're a genius!!


On Jan 6, 2012, at 10:43 PM, Narendra Sharma wrote:

> Instead of trying to solve the generic problem of uniqueness, I would focus 
> on the specific problem. 
> 
> For eg lets consider your usecase of user registration with email address as 
> key. You can do following:
> 1. Create CF (Users) where row key is UUID and has user info specific columns.
> 2. Whenever user registers create a row in this CF with user status flag as 
> waiting for confirmation.
> 3. Send email to the user's email address with link that contains the UUID 
> (or encrypted UUID)
> 4. When user clicks on the link, use the UUID (or decrypted UUID) to lookup 
> user
> 5. If the user exists with given UUID and status as waiting for confirmation 
> then update the status  and create a entry in another CF (EmailUUIDIndex) 
> representing email address to UUID mapping.
> 6. For authentication you can lookup in the index to get UUID and proceed.
> 7. If a malicious user registers with someone else's email id then he will 
> never be able to confirm and will never have an entry in EmailUUIDIndex. As a 
> additional check if the entry for email id exists in EmailUUIDIndex then the 
> request for registration can be rejected right away.
> 
> Make sense?
> 
> -Naren
> 
> On Fri, Jan 6, 2012 at 4:00 PM, Drew Kutcharian  wrote:
> So what are the common RIGHT solutions/tools for this?
> 
> 
> On Jan 6, 2012, at 2:46 PM, Narendra Sharma wrote:
> 
>> >>>It's very surprising that no one seems to have solved such a common use 
>> >>>case.
>> I would say people have solved it using RIGHT tools for the task.
>> 
>> 
>> 
>> On Fri, Jan 6, 2012 at 2:35 PM, Drew Kutcharian  wrote:
>> Thanks everyone for the replies. Seems like there is no easy way to handle 
>> this. It's very surprising that no one seems to have solved such a common 
>> use case.
>> 
>> -- Drew
>> 
>> On Jan 6, 2012, at 2:11 PM, Bryce Allen wrote:
>> 
>> > That's a good question, and I'm not sure - I'm fairly new to both ZK
>> > and Cassandra. I found this wiki page:
>> > http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
>> > and I think the lock recipe still works, even if a stale read happens.
>> > Assuming that wiki page is correct.
>> >
>> > There is still subtlety to locking with ZK though, see (Locks based
>> > on ephemeral nodes) from the zk mailing list in October:
>> > http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0
>> >
>> > -Bryce
>> >
>> > On Fri, 6 Jan 2012 13:36:52 -0800
>> > Drew Kutcharian  wrote:
>> >> Bryce,
>> >>
>> >> I'm not sure about ZooKeeper, but I know if you have a partition
>> >> between HazelCast nodes, than the nodes can acquire the same lock
>> >> independently in each divided partition. How does ZooKeeper handle
>> >> this situation?
>> >>
>> >> -- Drew
>> >>
>> >>
>> >> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
>> >>
>> >>> On Fri, 6 Jan 2012 10:03:38 -0800
>> >>> Drew Kutcharian  wrote:
>>  I know that this can be done using a lock manager such as ZooKeeper
>>  or HazelCast, but the issue with using either of them is that if
>>  ZooKeeper or HazelCast is down, then you can't be sure about the
>>  reliability of the lock. So this potentially, in the very rare
>>  instance where the lock manager is down and two users are
>>  registering with the same email, can cause major issues.
>> >>>
>> >>> For most applications, if the lock managers is down, you don't
>> >>> acquire the lock, so you don't enter the critical section. Rather
>> >>> than allowing inconsistency, you become unavailable (at least to
>> >>> writes that require a lock).
>> >>>
>> >>> -Bryce
>> >>
>> 
>> 
>> 
>> 
>> -- 
>> Narendra Sharma
>> Software Engineer
>> http://www.aeris.com
>> http://narendrasharma.blogspot.com/
>> 
>> 
> 
> 
> 
> 
> -- 
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
> 
>