Re: Keys for deleted rows visible in CLI

2011-12-14 Thread Radim Kolar

Dne 14.12.2011 1:15, Maxim Potekhin napsal(a):

Thanks. It could be hidden from a human operator, I suppose :)

I agree. Open a JIRA for it.


Re: configurable bloom filters (like hbase)

2011-12-14 Thread Radim Kolar

Dne 11.11.2011 7:55, Radim Kolar napsal(a):
I have a problem with a large CF (about 200 billion entries per node).
While I can configure index_interval to lower memory requirements, I
still have to stick with huge bloom filters.


Ideal would be to have bloom filters configurable like in HBase.
The Cassandra standard is about a 1.05% false-positive rate, but in my
case I would be fine even with a 20% false-positive rate. The data is not
often read back; most of it will never be read before it expires via TTL.
Does anybody else have the problem that bloom filters use too much memory
in applications which do not need to read written data often?


I am looking at the memory used by bloom filters, and it would be ideal to
have, in cassandra-1.1, the ability to shrink bloom filters to about 1/10 of
their size. Would it be possible to code something like this: save bloom
filters to disk as usual, but during load transform them into something
smaller at the cost of an increased FP rate?
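For reference, the standard bloom-filter sizing formula makes the trade-off concrete: an optimally built filter needs about -ln(p)/(ln 2)^2 bits per key for a target false-positive rate p. A quick sketch (plain Python, not Cassandra code, using the FP figures from the message above) shows that relaxing ~1% to 20% shrinks the filter to roughly a third of its size; a fixed 1/10 cut would push the FP rate far higher than 20%:

```python
import math

def bloom_bits_per_key(p):
    """Optimal bloom filter size, in bits per key, for target FP rate p."""
    return -math.log(p) / (math.log(2) ** 2)

# ~1% FP needs ~9.6 bits/key; 20% FP needs ~3.3 bits/key -- about a
# third of the size, not a tenth.
for p in (0.01, 0.1, 0.2):
    print(p, round(bloom_bits_per_key(p), 2))
```
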


Re: One ColumnFamily places data on only 3 out of 4 nodes

2011-12-14 Thread Bart Swedrowski
Anyone?

On 12 December 2011 15:25, Bart Swedrowski  wrote:

> Hello everyone,
>
> I seem to have come across a rather weird (at least for me!) problem /
> behaviour with Cassandra.
>
> I am running a 4-node cluster on Cassandra 0.8.7.  For the keyspace in
> question, I have RF=3, SimpleStrategy, with multiple ColumnFamilies inside
> the KeySpace.  One of the ColumnFamilies, however, seems to have its data
> distributed across only 3 out of 4 nodes.
>
> Apart from the problematic ColumnFamily, the data on the cluster seems to
> be distributed more or less evenly.
>
> # nodetool -h localhost ring
> Address       DC           Rack   Status  State   Load     Owns    Token
>                                                                    127605887595351923798765477786913079296
> 192.168.81.2  datacenter1  rack1  Up      Normal  7.27 GB  25.00%  0
> 192.168.81.3  datacenter1  rack1  Up      Normal  7.74 GB  25.00%  42535295865117307932921825928971026432
> 192.168.81.4  datacenter1  rack1  Up      Normal  7.38 GB  25.00%  85070591730234615865843651857942052864
> 192.168.81.5  datacenter1  rack1  Up      Normal  7.32 GB  25.00%  127605887595351923798765477786913079296
>
> Schema for the relevant bits of the keyspace is as follows:
>
> [default@A] show schema;
> create keyspace A
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = [{replication_factor : 3}];
> [...]
>  create column family UserDetails
>   with column_type = 'Standard'
>   and comparator = 'IntegerType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and memtable_operations = 0.571875
>   and memtable_throughput = 122
>   and memtable_flush_after = 1440
>   and rows_cached = 0.0
>   and row_cache_save_period = 0
>   and keys_cached = 20.0
>   and key_cache_save_period = 14400
>   and read_repair_chance = 1.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider';
>
> And now the symptoms - output of 'nodetool -h localhost cfstats' on each
> node.  Please note the figures on node1.
>
> *node1*
> Column Family: UserDetails
> SSTable count: 0
> Space used (live): 0
> Space used (total): 0
> Number of Keys (estimate): 0
> Memtable Columns Count: 0
> Memtable Data Size: 0
> Memtable Switch Count: 0
> Read Count: 0
> Read Latency: NaN ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Tasks: 0
> Key cache capacity: 20
> Key cache size: 0
> Key cache hit rate: NaN
> Row cache: disabled
> Compacted row minimum size: 0
> Compacted row maximum size: 0
> Compacted row mean size: 0
>
> *node2*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 112952788
> Space used (total): 164953743
> Number of Keys (estimate): 384
> Memtable Columns Count: 159419
> Memtable Data Size: 74910890
> Memtable Switch Count: 59
> Read Count: 135307426
> Read Latency: 25.900 ms.
> Write Count: 3474673
> Write Latency: 0.040 ms.
> Pending Tasks: 0
> Key cache capacity: 20
> Key cache size: 120
> Key cache hit rate: 0.71684189041
> Row cache: disabled
> Compacted row minimum size: 42511
> Compacted row maximum size: 74975550
> Compacted row mean size: 42364305
>
> *node3*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 112953137
> Space used (total): 112953137
> Number of Keys (estimate): 384
> Memtable Columns Count: 159421
> Memtable Data Size: 74693445
> Memtable Switch Count: 56
> Read Count: 135304486
> Read Latency: 25.552 ms.
> Write Count: 3474616
> Write Latency: 0.036 ms.
> Pending Tasks: 0
> Key cache capacity: 20
> Key cache size: 109
> Key cache hit rate: 0.716840888175
> Row cache: disabled
> Compacted row minimum size: 42511
> Compacted row maximum size: 74975550
> Compacted row mean size: 42364305
>
> *node4*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 117070926
> Space used (total): 119479484
> Number of Keys (estimate): 384
> Memtable Columns Count: 159979
> Memtable Data Size: 75029672
> Memtable Switch Count: 60
> Read Count: 135294878
> Read Latency: 19.455 ms.
> Write Count: 3474982
> Write Latency: 0.028 ms.
> Pending Tasks: 0
> Key cache capacity: 20
> Key cache size: 119
> Key cache hit rate: 0.752235777154
> Row cache: disabled
> Compacted row minimum size: 2346800
> Compacted row maximum size: 62479625
> Compacted row mean size: 42591803
>
> When I go to the 'data' directory on node1, there are no files for the
> UserDetails ColumnFamily.
>
> I tried performing a manual repair in the hope it would heal the
> situation, but without any luck.
>
> # nodetool -h localhost repair A UserDetails
>  INFO 15:19:54,611 Starting repair command #8, repairing 3 ranges.
>  INFO 15:19:54,647 Sending AEService tree for # manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec, /192.168.81.2,
> (A,UserDetails),
> (85070591730234615865843651857942052864,127605887595351923798765477786913079296]>
>  INFO 15:19:54,742 Endpo

Counters and Top 10

2011-12-14 Thread cbert...@libero.it
Hi all,
I'm using Cassandra in production for a small social network (~10,000 people).
Now I have to assign some "credits" to each user operation (login, write a
post and so on) and then be able to provide, at any moment, the top 10 most
active users. I'm on Cassandra 0.7.6 and I'd like to migrate to a newer
version in order to use Counters for the user points, but ... what about the
top 10?
I was thinking about a specific row that always keeps the 10 most active users
... but I think it would be heavy (to write, and to handle in a thread-safe
way) ... can counters provide something like a "value-ordered list"?

Thanks for any help. 
Best regards,

Carlo
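Counters alone don't give a value-ordered view. One common workaround (a sketch, assuming the per-user counter values can be read back in bulk) is to keep the counters as the single source of truth and periodically recompute the top 10 on the client, rather than maintaining a contended "top 10" row:

```python
import heapq

def top_n(user_scores, n=10):
    """Return the n highest-scoring (user, score) pairs from a {user: score} dict.

    user_scores would come from reading the counter column family; recomputing
    every few seconds avoids the thread-safety problems of a hand-maintained
    'top 10' row while keeping writes to a single counter increment.
    """
    return heapq.nlargest(n, user_scores.items(), key=lambda kv: kv[1])
```
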




Re: One ColumnFamily places data on only 3 out of 4 nodes

2011-12-14 Thread igor
Do you use RandomPartitioner? What does nodetool getendpoints show for several
random keys?
 


Re: One ColumnFamily places data on only 3 out of 4 nodes

2011-12-14 Thread Bart Swedrowski
On 14 December 2011 13:02,  wrote:

> Do you use randompartitiner? What nodetool getendpoints show for several
> random keys
>
Yes, RandomPartitioner it is.

Thanks for the hint re 'nodetool getendpoints'.  I have queried a few keys
and, to my surprise, 192.168.82.2 (node1) is showing up as an endpoint for a
few of them:

bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4547246
192.168.81.3
192.168.81.4
192.168.81.5
bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4549279
192.168.81.5
192.168.81.2
192.168.81.3
bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4549749
192.168.81.2
192.168.81.3
192.168.81.4
bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4545027
192.168.81.5
192.168.81.2
192.168.81.3

Any idea why the hell those are not stored on node1, though?

bart@node1:~$ nodetool -h localhost cfstats
[…]
Column Family: UserDetails
SSTable count: 0
Space used (live): 0
Space used (total): 0
Number of Keys (estimate): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 20
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
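For background on what getendpoints is reporting: under RandomPartitioner with SimpleStrategy, a key's primary replica is the node owning the first token at or after md5(key) on the ring, and the remaining RF-1 replicas are the next nodes clockwise. A rough illustration of that placement (plain Python, not Cassandra internals; the ring tokens are taken from the nodetool output above):

```python
import hashlib
from bisect import bisect_left

# Token ring from this thread: four nodes at even quarters of the MD5 space.
RING = [
    (0, "192.168.81.2"),
    (42535295865117307932921825928971026432, "192.168.81.3"),
    (85070591730234615865843651857942052864, "192.168.81.4"),
    (127605887595351923798765477786913079296, "192.168.81.5"),
]

def md5_token(key: bytes) -> int:
    """RandomPartitioner token: |MD5(key)| read as a signed 128-bit integer."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))

def replicas(ring, key: bytes, rf: int = 3):
    """SimpleStrategy placement: the primary replica is the node whose token is
    the smallest token >= md5_token(key) (wrapping around the ring), followed
    by the next rf-1 distinct nodes clockwise."""
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, md5_token(key)) % len(ring)
    return [ring[(i + k) % len(ring)][1] for k in range(rf)]
```

With RF=3 on four evenly spaced tokens, every key maps to three consecutive nodes and skips exactly one, so individual keys can legitimately omit node1; but across many keys node1 should still be a replica for roughly three quarters of them, which is why a completely empty data directory there remains anomalous.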


Re: One ColumnFamily places data on only 3 out of 4 nodes

2011-12-14 Thread Bart Swedrowski
On 14 December 2011 14:45, Bart Swedrowski  wrote:

> I have queried few, and to my surprise, 192.168.82.2 (node1)


The IP is supposed to be 192.168.81.2


Re: One ColumnFamily places data on only 3 out of 4 nodes

2011-12-14 Thread igor
No idea, try to check logs for errors, and increase verbosity level on that 
node.

 



Re: One ColumnFamily places data on only 3 out of 4 nodes

2011-12-14 Thread Bart Swedrowski
On 14 December 2011 14:58,  wrote:

> No idea, try to check logs for errors, and increase verbosity level on
> that node.
>
No errors at all; just a few warnings about HEAP size, that's it.

Okay, thanks.

Does anyone else have any ideas on how to push this forward?


Cassandra C client implementation

2011-12-14 Thread Vlad Paiu
Hello,

I am trying to integrate some Cassandra-related ops (insert, get, etc.) into
an application written entirely in C, so C++ is not an option.

Is there any C client library for Cassandra?

I have also tried to generate Thrift c_glib code for Cassandra, but on
wiki.apache.org/cassandra/ThriftExamples I cannot find an example for C.

Can anybody suggest a C client library for Cassandra, or provide some working
examples for Thrift in C?

Thanks and Regards,
Vlad

Re: Cassandra C client implementation

2011-12-14 Thread i
Try libcassandra, but it doesn't support connection pooling

Best Regards,
Yi "Steve" Yang
~~~
+1-401-441-5086
+86-13910771510

Sent via BlackBerry® from China Mobile

Re: Cassandra C client implementation

2011-12-14 Thread i
BTW please use 
https://github.com/eyealike/libcassandra



Counters != Counts

2011-12-14 Thread Alain RODRIGUEZ
Hi everybody.

I'm using a lot of counters to make statistics on a 4 nodes cluster (ec2
m1.small) with phpcassa (cassandra v1.0.2).

I store some events and increment counters at the same time.

Counters give me over-counts compared with the count of the corresponding
events.

I am sure that my non-counter counts are correct.

I'm not sure why these over-counts happen, but I heard that recovering from
commitlogs can produce this.
I see some timeouts from phpcassa written to my apache logs while a
compaction is running. However, I am always able to write at QUORUM, so I
guess I shouldn't have to recover from the Cassandra commitlogs.

Where can these over-counts come from?

Alain
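One well-known source of counter over-counts, separate from commitlog replay, is client retries: a counter increment is not idempotent, so if the increment lands server-side but the client only sees a timeout (as with the phpcassa timeouts mentioned above), retrying applies the delta twice. A toy simulation of that failure mode (plain Python, hypothetical classes, not phpcassa's API):

```python
class Counter:
    """Stand-in for a server-side counter column."""
    def __init__(self):
        self.value = 0

    def increment(self, n, timeout=False):
        self.value += n          # the write is applied server-side...
        if timeout:
            raise TimeoutError   # ...but the client only observes a timeout

def increment_with_retry(counter, n):
    """Naive retry loop: safe for idempotent writes, wrong for counters."""
    try:
        counter.increment(n, timeout=True)  # first attempt times out after applying
    except TimeoutError:
        counter.increment(n)                # retry double-applies the delta
    return counter.value

c = Counter()
increment_with_retry(c, 1)  # logical count is 1, but the counter now reads 2
```
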


Re: Keys for deleted rows visible in CLI

2011-12-14 Thread Brandon Williams
http://wiki.apache.org/cassandra/FAQ#range_ghosts

On Wed, Dec 14, 2011 at 4:36 AM, Radim Kolar  wrote:
> Dne 14.12.2011 1:15, Maxim Potekhin napsal(a):
>
>> Thanks. It could be hidden from a human operator, I suppose :)
>
> I agree. Open JIRA for it.


Re: configurable bloom filters (like hbase)

2011-12-14 Thread Brandon Williams
https://issues.apache.org/jira/browse/CASSANDRA-3497



Re: Cassandra C client implementation

2011-12-14 Thread Vlad Paiu
Hello,

Thanks for your answer.
Unfortunately libcassandra is C++; I'm looking for something written in ANSI C.

I've searched a lot and my guess is that the Thrift c_glib generator is my
only option, but I could not find even one example of how to make a connection
and run some queries against Cassandra using it.
Does anyone have experience / some examples of this?

Regards,
Vlad



Re: commit log size

2011-12-14 Thread Maxim Potekhin

Alexandru, Jeremiah --

what setting needs to be tweaked, and what's the recommended value?

I observed similar behavior this morning.

Maxim


On 11/28/2011 2:53 PM, Jeremiah Jordan wrote:
Yes, the low volume memtables are causing the problem.  Lower the 
thresholds for those tables if you don't want the commit logs to go 
crazy.


-Jeremiah

On 11/28/2011 11:11 AM, Alexandru Dan Sicoe wrote:

Hello everyone,

4-node Cassandra 0.8.5 cluster with RF=2, replica placement strategy
= SimpleStrategy, write consistency level = ANY,
memtable_flush_after_mins = 1440; memtable_operations_in_millions = 0.1;
memtable_throughput_in_mb = 40; max_compaction_threshold = 32;
min_compaction_threshold = 4;


I have one keyspace with 1 CF for all the data and 3 other small CFs 
for metadata. I am using Datastax OpsCenter to monitor my cluster so 
there is another keyspace for monitoring.


Everything works OK; the only thing I've noticed is that this morning the
commitlog of one node was 52 GB, one was 25 GB, and the others were
around 3 GB. I left everything untouched and looked a couple of hours
later: the 52 GB one was down to about 3 GB, the 25 GB one had grown to
29 GB, and the other two were about the same as before.


Are my commit logs growing because of small memtables which don't get
flushed because they don't reach the operations and throughput
limits? If so, why do only some nodes exhibit this behaviour?


It would also be interesting to understand how to control the size of the
commitlog, so I know how to size my commitlog disks!


Thanks,
Alex
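If the cause is indeed low-traffic memtables holding commitlog segments open, one option is lowering the time-based flush threshold on the quiet CFs so they flush more often; for example via the CLI, using the same `memtable_flush_after` attribute shown in the schemas earlier in this digest (the CF name and the 60-minute value below are illustrative only):

```
[default@MyKeyspace] update column family SmallMetadataCF with memtable_flush_after = 60;
```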




Re: Keys for deleted rows visible in CLI

2011-12-14 Thread Maxim Potekhin
Thanks, it makes perfect sense now. Well, an option in Cassandra could
make it optional as far as display is concerned, without a performance hit --
of course, this is all unimportant.


Thanks again

Maxim


On 12/14/2011 11:30 AM, Brandon Williams wrote:

http://wiki.apache.org/cassandra/FAQ#range_ghosts

On Wed, Dec 14, 2011 at 4:36 AM, Radim Kolar  wrote:

Dne 14.12.2011 1:15, Maxim Potekhin napsal(a):


Thanks. It could be hidden from a human operator, I suppose :)

I agree. Open JIRA for it.




RE: Cassandra C client implementation

2011-12-14 Thread Don Smith
Virgil apparently lets you access Cassandra via a RESTful interface:

   http://code.google.com/a/apache-extras.org/p/virgil/

Depending on your performance needs and the maturity of Virgil's code (I
think it's alpha), that may work.

You could always fork a Java process and pipe to it.

 Don



Re: Cassandra C client implementation

2011-12-14 Thread Jeremiah Jordan

If you are OK linking against a C++-based library, you can look at:
https://github.com/minaguib/libcassandra/tree/kickstart-libcassie-0.7/libcassie
It is wrapper code around libcassandra which exports a C interface.
If you look at the function names etc. in the other language bindings, just
use the similar functions from the c_glib Thrift code.
If you are going to mess with the c_glib Thrift generator, make sure to
check out its JIRA component; it is new and has some issues:

https://issues.apache.org/jira/browse/THRIFT/component/12313854




[RELEASE] Apache Cassandra 1.0.6 released

2011-12-14 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.0.6.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1]. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter
any problems.

Have fun!

[1]: http://goo.gl/Pl1TE (CHANGES.txt)
[2]: http://goo.gl/9xHEC (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: One ColumnFamily places data on only 3 out of 4 nodes

2011-12-14 Thread Mohit Anchlia
> bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4545027
> 192.168.81.5
> 192.168.81.2
> 192.168.81.3

Can you see what happens if you stop C* say on node .5 and write and
read at quorum?

On Wed, Dec 14, 2011 at 7:06 AM, Bart Swedrowski  wrote:
>
>
> On 14 December 2011 14:58,  wrote:
>>
>> No idea, try to check logs for errors, and increase verbosity level on
>> that node.
>
> No errors at all, few warnings about HEAP size, that's it.
>
> Okay, thanks.
>
> Anyone else have got any ideas on how to push this forward?


Re: Cassandra C client implementation

2011-12-14 Thread Vlad Paiu
Hello,

Thanks very much for your suggestions.
Libcassie seems nice, but it doesn't appear to be actively maintained and I'm
not sure it's compatible with the latest Cassandra versions. I'll give it a
try though.

I was looking through the generated Thrift .c files and I can't seem to find
which function to call to initialise a connection to a Cassandra instance.
Any ideas?

Thanks and Regards,
Vlad



Re: 1.0.3 CLI oddities

2011-12-14 Thread Janne Jalkanen

Correct. 1.0.6 fixes this for me.

/Janne

On 12 Dec 2011, at 02:57, Chris Burroughs wrote:

> Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3558 and the
> other tickets reference there.
> 
> On 11/28/2011 05:05 AM, Janne Jalkanen wrote:
>> Hi!
>> 
>> (Asked this on IRC too, but didn't get anyone to respond, so here goes...)
>> 
>> Is it just me, or are these real bugs? 
>> 
>> On 1.0.3, from CLI: "update column family XXX with gc_grace = 36000;" just 
>> says "null" with nothing logged.  Previous value is the default.
>> 
>> Also, on 1.0.3, "update column family XXX with 
>> compression_options={sstable_compression:SnappyCompressor,chunk_length_kb:64};"
>>  returns "Internal error processing system_update_column_family" and log 
>> says "Invalid negative or null chunk_length_kb" (stack trace below)
>> 
>> Setting the compression options worked on 1.0.0 when I was testing (though 
>> my 64 kB became 64 MB, but I believe this was fixed in 1.0.3.)
>> 
>> Did the syntax change between 1.0.0 and 1.0.3? Or am I doing something 
>> wrong? 
>> 
>> The database was upgraded from 0.6.13 to 1.0.0, then scrubbed, then 
>> compression options set to some CFs, then upgraded to 1.0.3 and trying to 
>> set compression on other CFs.
>> 
>> Stack trace:
>> 
>> ERROR [pool-2-thread-68] 2011-11-28 09:59:26,434 Cassandra.java (line 4038) 
>> Internal error processing system_update_column_family
>> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
>> java.io.IOException: org.apache.cassandra.config.ConfigurationException: 
>> Invalid negative or null chunk_length_kb
>>  at 
>> org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:898)
>>  at 
>> org.apache.cassandra.thrift.CassandraServer.system_update_column_family(CassandraServer.java:1089)
>>  at 
>> org.apache.cassandra.thrift.Cassandra$Processor$system_update_column_family.process(Cassandra.java:4032)
>>  at 
>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>>  at 
>> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>  at java.lang.Thread.run(Thread.java:680)
>> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 
>> org.apache.cassandra.config.ConfigurationException: Invalid negative or null 
>> chunk_length_kb
>>  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>  at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>  at 
>> org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:890)
>>  ... 7 more
>> Caused by: java.io.IOException: 
>> org.apache.cassandra.config.ConfigurationException: Invalid negative or null 
>> chunk_length_kb
>>  at 
>> org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
>>  at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
>>  at 
>> org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
>>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>  ... 3 more
>> Caused by: org.apache.cassandra.config.ConfigurationException: Invalid 
>> negative or null chunk_length_kb
>>  at 
>> org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
>>  at 
>> org.apache.cassandra.io.compress.CompressionParameters.create(CompressionParameters.java:52)
>>  at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:796)
>>  at 
>> org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:74)
>>  ... 7 more
>> ERROR [MigrationStage:1] 2011-11-28 09:59:26,434 
>> AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
>> Thread[MigrationStage:1,5,main]
>> java.io.IOException: org.apache.cassandra.config.ConfigurationException: 
>> Invalid negative or null chunk_length_kb
>>  at 
>> org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
>>  at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
>>  at 
>> org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
>>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>  at java.lang.Thread.run(Thread.java:680)
>> Caused by: org.apache.cassandra.config.ConfigurationException: Invalid

[RELEASE] Apache Cassandra 0.8.9 released

2011-12-14 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 0.8.9.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1]. Please pay attention to the
release notes[2] before upgrading and let us know[3] if you were to encounter
any problem.

Have fun!


[1]: http://goo.gl/Kx7d0 (CHANGES.txt)
[2]: http://goo.gl/Tv2NW (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Asymmetric load

2011-12-14 Thread Maxim Potekhin

What could be the reason I see unequal loads on a 3-node cluster?
This all started happening during repairs (which again are not going 
smoothly).


Maxim



Crazy compactionstats

2011-12-14 Thread Maxim Potekhin

Hello

I ran repair like this:

nohup repair.sh &

where repair.sh contains simply nodetool repair plus timestamp.

The process dies while dumping this:
Exception in thread "main" java.io.IOException: Repair command #1: some 
repair session(s) failed (see log for details).
at 
org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1613)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
at 
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
at 
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)

at sun.rmi.transport.Transport$1.run(Transport.java:159)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)


I still see pending tasks in nodetool compactionstats, and their number 
goes into the hundreds, which I haven't seen before.

What's going on?

Thanks

Maxim



Re: Cassandra C client implementation

2011-12-14 Thread Vlad Paiu
Hi,

Just tried libcassie and seems it's not compatible with latest cassandra, as 
even simple inserts and fetches fail with InvalidRequestException...

So can anybody please provide a very simple example in C for connecting & 
fetching columns with thrift ?

Regards,
Vlad

Vlad Paiu  wrote:

>Hello,
>
>Thanks very much for your suggestions.
>Libcassie seems nice but doesn't seem like it's actively maintained and I'm 
>not sure if it's compatible with latest Cassandra versions. Will give it a try 
>though.
>
>I was looking through the generated thrift .c files and I can't seem to find 
>what function to call to init a connection to a Cassandra instance. Any ideas ?
>
>Thanks and Regards,
>Vlad
>
>Jeremiah Jordan  wrote:
>
>>If you are OK linking to a C++ based library you can look at:
>>https://github.com/minaguib/libcassandra/tree/kickstart-libcassie-0.7/libcassie
>>It is wrapper code around libcassandra which exports a C++ interface.
>>If you look at the function names etc in the other languages, just use 
>>the similar functions from the c_glib thrift...
>>If you are going to mess with using the c_glib thrift, make sure to 
>>check out the JIRA for it, it is new and has some issues...
>>https://issues.apache.org/jira/browse/THRIFT/component/12313854
>>
>>
>>On 12/14/2011 09:11 AM, Vlad Paiu wrote:
>>> Hello,
>>>
>>> I am trying to integrate some Cassandra related ops ( insert, get, etc ) 
>>> into an application written entirely in C, so C++ is not an option.
>>>
>>> Is there any C client library for cassandra ?
>>>
>>>   I have also tried to generate thrift glibc code for Cassandra, but on 
>>> wiki.apache.org/cassandra/ThriftExamples I cannot find an example for C.
>>>
>>> Can anybody suggest a C client library for Cassandra or provide some 
>>> working examples for Thrift in C ?
>>>
>>> Thanks and Regards,
>>> Vlad


Re: Cassandra C client implementation

2011-12-14 Thread Eric Tamme

On 12/14/2011 04:18 PM, Vlad Paiu wrote:

Hi,

Just tried libcassie and seems it's not compatible with latest cassandra, as 
even simple inserts and fetches fail with InvalidRequestException...

So can anybody please provide a very simple example in C for connecting & 
fetching columns with thrift ?

Regards,
Vlad

Vlad Paiu  wrote:



Vlad,

We have written a specific cassandra db module for usrloc with opensips 
and have open sourced it on github.  We use the thrift generated c++ 
bindings and extern stuff to c.  I spoke to bogdan about this a while 
ago, and gave him the github link, but here it is for your reference   
https://github.com/junction/db_jnctn_usrloc


Hopefully that helps.  I idle in #opensips too,  just ask about 
cassandra in there and I'll probably see it.


- Eric Tamme



Re: Cassandra C client implementation

2011-12-14 Thread Vlad Paiu
Hello Eric,

We have that, thanks a lot for the contribution.
The idea is not to play around with including C++ code in a C app if there's 
an alternative ( the thrift c_glib ).

Unfortunately, since thrift does not generate a skeleton for the c_glib code, I 
don't know how to find out what the API functions are called, and guessing them 
is not going that well :)

I'll wait a little longer & see if anybody can help with the C thrift, or at 
least tell me it's not working. :)

Regards,
Vlad

Eric Tamme  wrote:

>On 12/14/2011 04:18 PM, Vlad Paiu wrote:
>> Hi,
>>
>> Just tried libcassie and seems it's not compatible with latest cassandra, as 
>> even simple inserts and fetches fail with InvalidRequestException...
>>
>> So can anybody please provide a very simple example in C for connecting & 
>> fetching columns with thrift ?
>>
>> Regards,
>> Vlad
>>
>> Vlad Paiu  wrote:
>>
>
>Vlad,
>
>We have written a specific cassandra db module for usrloc with opensips 
>and have open sourced it on github.  We use the thrift generated c++ 
>bindings and extern stuff to c.  I spoke to bogdan about this a while 
>ago, and gave him the github link, but here it is for your reference   
>https://github.com/junction/db_jnctn_usrloc
>
>Hopefully that helps.  I idle in #opensips too,  just ask about 
>cassandra in there and I'll probably see it.
>
>- Eric Tamme
>
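Vlad's problem above -- figuring out what the generated functions are called -- can be attacked mechanically: every API function is declared as a prototype in the generated header, so scanning that file is enough. Below is a rough Python sketch that pulls function names out of C prototype declarations. The `cassandra_client_*` names and the `gen-c_glib/cassandra.h` path mentioned in the comments are assumptions for illustration only; the real names depend on your IDL and thrift version.

```python
import re

def list_prototypes(header_text):
    """Return the function names declared as prototypes in a C header.

    Very rough heuristic: matches lines of the shape `ret_type name (args);`,
    which is how thrift-generated c_glib headers declare their API.
    """
    pattern = re.compile(r'^\s*[A-Za-z_][\w\s\*]*?\b(\w+)\s*\([^;]*\)\s*;',
                         re.MULTILINE)
    return pattern.findall(header_text)

# A made-up fragment in the shape of a generated header; in practice you
# would read e.g. gen-c_glib/cassandra.h instead of this sample string.
sample = """
gboolean cassandra_client_get (CassandraClient *client, ...);
gboolean cassandra_client_insert (CassandraClient *client, ...);
"""
print(list_prototypes(sample))
# -> ['cassandra_client_get', 'cassandra_client_insert']
```

A plain `grep ' (' gen-c_glib/*.h` gets you most of the way too; the script is only handy if you want to post-process the names.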


Re: Crazy compactionstats

2011-12-14 Thread Peter Schuller
> Exception in thread "main" java.io.IOException: Repair command #1: some
> repair session(s) failed (see log for details).

For why repair failed you unfortunately need to look at the logs, as it suggests.

> I still see pending tasks in nodetool compactionstats, and their number goes
> into hundreds which I haven't seen before.
> What's going on?

The compaction "pending" count is not very useful. It just says how many
tasks are pending that MIGHT compact. Typically it will either be 0,
or it will be steadily increasing while compactions are happening,
until suddenly snapping back to 0 again once compactions catch up.

Whether or not non-zero is a problem depends on the Cassandra version,
how many concurrent compactors you are running, and your column
families/data sizes/flushing speeds etc. (Sorry, kind of a long story)

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Cassandra C client implementation

2011-12-14 Thread Mina Naguib

Hi Vlad

I'm the author of libcassie.

For what it's worth, it's in production where I work, consuming a heavily-used 
cassandra 0.7.9 cluster.

We do have plans to upgrade the cluster to 1.x, to benefit from all the 
improvements, CQL, etc... but that includes revising all our clients (across 
several programming languages).

So, it's definitely on my todo list to address our C clients by either 
upgrading libcassie, or possibly completely rewriting it.

Currently it's a wrapper around the C++ parent project libcassandra.  I haven't 
been fond of having that many layered abstractions, and the thrift Glib2 
interface has definitely piqued my interest, so I'm leaning towards a complete 
rewrite.

While we're at it, it would also be nice to have features like asynchronous 
modes for popular event loops, connection pooling, etc.

Unfortunately, I have no milestones set for any of this, nor the time 
(currently) to experiment and proof-of-concept it.

I'd be curious to hear from other C hackers whether they've experimented with 
the thrift Glib2 interface and gotten a "hello world" to work against cassandra 
1.x.  Perhaps there's room for some code sharing/collaboration on a new library 
to supersede the existing libcassie+libcassandra.


On 2011-12-14, at 5:16 PM, Vlad Paiu wrote:

> Hello Eric,
> 
> We have that, thanks a lot for the contribution.
> The idea is not to play around with including C++ code in a C app if there's 
> an alternative ( the thrift c_glib ).
> 
> Unfortunately, since thrift does not generate a skeleton for the c_glib code, 
> I don't know how to find out what the API functions are called, and guessing 
> them is not going that well :)
> 
> I'll wait a little longer & see if anybody can help with the C thrift, or at 
> least tell me it's not working. :)
> 
> Regards,
> Vlad
> 
> Eric Tamme  wrote:
> 
>> On 12/14/2011 04:18 PM, Vlad Paiu wrote:
>>> Hi,
>>> 
>>> Just tried libcassie and seems it's not compatible with latest cassandra, 
>>> as even simple inserts and fetches fail with InvalidRequestException...
>>> 
>>> So can anybody please provide a very simple example in C for connecting & 
>>> fetching columns with thrift ?
>>> 
>>> Regards,
>>> Vlad
>>> 
>>> Vlad Paiu  wrote:
>>> 
>> 
>> Vlad,
>> 
>> We have written a specific cassandra db module for usrloc with opensips 
>> and have open sourced it on github.  We use the thrift generated c++ 
>> bindings and extern stuff to c.  I spoke to bogdan about this a while 
>> ago, and gave him the github link, but here it is for your reference   
>> https://github.com/junction/db_jnctn_usrloc
>> 
>> Hopefully that helps.  I idle in #opensips too,  just ask about 
>> cassandra in there and I'll probably see it.
>> 
>> - Eric Tamme
>> 



tmp files in /var/lib/cassandra/data

2011-12-14 Thread Ramesh Natarajan
We are using leveled compaction running Cassandra 1.0.6.  I checked
the data directory (/var/lib/cassandra/data) and I see these 0-byte
tmp files.
What are these files?

thanks
Ramesh

-rw-r--r-- 1 root root        0 Dec 14 17:15 uid-tmp-hc-106-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:15 uid-tmp-hc-106-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:23 uid-tmp-hc-117-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:23 uid-tmp-hc-117-Index.db
-rw-r--r-- 1 root root        0 Dec 14 15:51 uid-tmp-hc-11-Data.db
-rw-r--r-- 1 root root        0 Dec 14 15:51 uid-tmp-hc-11-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:31 uid-tmp-hc-129-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:31 uid-tmp-hc-129-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-142-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-142-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-145-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-145-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-158-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-158-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-162-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-162-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-175-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-175-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-179-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-179-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-193-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-193-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-197-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-197-Index.db
-rw-r--r-- 1 root root        0 Dec 14 16:02 uid-tmp-hc-19-Data.db
-rw-r--r-- 1 root root        0 Dec 14 16:02 uid-tmp-hc-19-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-200-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-200-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:11 uid-tmp-hc-213-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:11 uid-tmp-hc-213-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:11 uid-tmp-hc-217-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:11 uid-tmp-hc-217-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:19 uid-tmp-hc-230-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:19 uid-tmp-hc-230-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:19 uid-tmp-hc-235-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:19 uid-tmp-hc-235-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:27 uid-tmp-hc-249-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:27 uid-tmp-hc-249-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:27 uid-tmp-hc-253-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:27 uid-tmp-hc-253-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:28 uid-tmp-hc-257-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:28 uid-tmp-hc-257-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:35 uid-tmp-hc-270-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:35 uid-tmp-hc-270-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:36 uid-tmp-hc-275-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:36 uid-tmp-hc-275-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:44 uid-tmp-hc-288-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:44 uid-tmp-hc-288-Index.db
-rw-r--r-- 1 root root        0 Dec 14 16:10 uid-tmp-hc-28-Data.db
-rw-r--r-- 1 root root        0 Dec 14 16:10 uid-tmp-hc-28-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:44 uid-tmp-hc-293-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:44 uid-tmp-hc-293-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:52 uid-tmp-hc-307-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:52 uid-tmp-hc-307-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:52 uid-tmp-hc-310-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:52 uid-tmp-hc-310-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:52 uid-tmp-hc-315-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:52 uid-tmp-hc-315-Index.db
-rw-r--r-- 1 root root        0 Dec 14 19:00 uid-tmp-hc-328-Data.db
-rw-r--r-- 1 root root        0 Dec 14 19:00 uid-tmp-hc-328-Index.db
-rw-r--r-- 1 root root        0 Dec 14 19:00 uid-tmp-hc-333-Data.db
-rw-r--r-- 1 root root        0 Dec 14 19:00 uid-tmp-hc-333-Index.db
-rw-r--r-- 1 root root        0 Dec 14 19:08 uid-tmp-hc-347-Data.db
-rw-r--r-- 1 root root        0 Dec 14 19:08 uid-tmp-hc-347-Index.db
-rw-r--r-- 1 root root        0 Dec 14 19:08 uid-tmp-hc-353-Data.db
-rw-r--r-- 1 root root        0 Dec 14 19:08 uid-tmp-hc-353-Index.db
-rw-r--r-- 1 root root        0 Dec 14 19:09 uid-tmp-hc-357-Data.db
-rw-r--r-- 1 root root        0 Dec 14 19:09 uid-tmp-hc-357-Index.db
-rw-r--r-- 1 root root        0 Dec 14 19:17 uid-tmp-hc-370-Data.db
-rw-r--r-- 1 root root        0 Dec 14 19:17 uid-tmp-hc-370-Index.db
-rw-r--r-- 1 root root 

RE: tmp files in /var/lib/cassandra/data

2011-12-14 Thread Bryce Godfrey
I'm seeing this also, and my nodes have started crashing with "too many open 
files" errors.  Running lsof I see lots of these open tmp files.

java   8185  root  911u  REG   8,32 38  
129108266 
/opt/cassandra/data/MonitoringData/Properties-tmp-hc-268721-CompressionInfo.db
java   8185  root  912u  REG   8,32  0  
155320741 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1092-Data.db
java   8185  root  913u  REG   8,32  0  
155320742 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Index.db
java   8185  root  914u  REG   8,32  0  
155320743 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Data.db
java   8185  root  916u  REG   8,32  0  
155320754 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1113-Data.db
java   8185  root  918u  REG   8,32  0  
155320744 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Index.db
java   8185  root  919u  REG   8,32  0  
155320745 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Data.db
java   8185  root  920u  REG   8,32  0  
155320755 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Index.db
java   8185  root  921u  REG   8,32  0  
129108272 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-Data.db
java   8185  root  922u  REG   8,32 38  
129108273 
/opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-CompressionInfo.db
java   8185  root  923u  REG   8,32  0  
155320756 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Data.db
java   8185  root  929u  REG   8,32 38  
129108262 
/opt/cassandra/data/MonitoringData/Properties-tmp-hc-268822-CompressionInfo.db
java   8185  root  947u  REG   8,32  0  
129108284 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-Data.db
java   8185  root  948u  REG   8,32 38  
129108285 
/opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-CompressionInfo.db
java   8185  root  954u  REG   8,32  0  
155320746 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Index.db
java   8185  root  955u  REG   8,32  0  
155320747 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Data.db

Going to try rolling back to 1.0.5 for the time being even though I was hoping 
to use one of the fixes in 1.0.6

-Original Message-
From: Ramesh Natarajan [mailto:rames...@gmail.com] 
Sent: Wednesday, December 14, 2011 6:03 PM
To: user@cassandra.apache.org
Subject: tmp files in /var/lib/cassandra/data

We are using leveled compaction running Cassandra 1.0.6.  I checked the data 
directory (/var/lib/cassandra/data) and I see these 0-byte tmp files.
What are these files?

thanks
Ramesh

-rw-r--r-- 1 root root        0 Dec 14 17:15 uid-tmp-hc-106-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:15 uid-tmp-hc-106-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:23 uid-tmp-hc-117-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:23 uid-tmp-hc-117-Index.db
-rw-r--r-- 1 root root        0 Dec 14 15:51 uid-tmp-hc-11-Data.db
-rw-r--r-- 1 root root        0 Dec 14 15:51 uid-tmp-hc-11-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:31 uid-tmp-hc-129-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:31 uid-tmp-hc-129-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-142-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-142-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-145-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-145-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-158-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-158-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-162-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-162-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-175-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-175-Index.db
-rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-179-Data.db
-rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-179-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-193-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-193-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-197-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-197-Index.db
-rw-r--r-- 1 root root        0 Dec 14 16:02 uid-tmp-hc-19-Data.db
-rw-r--r-- 1 root root        0 Dec 14 16:02 uid-tmp-hc-19-Index.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-200-Data.db
-rw-r--r-- 1 root root        0 Dec 14 18:03 uid-tmp-hc-200-Index.db
-r

Best way to implement indexing for high-cardinality values?

2011-12-14 Thread Maxim Potekhin

I now have a CF with extremely skinny rows (in the current implementation),
and the application will want to query by more than one column's values.
The problem is that the values will, in a lot of cases, be high cardinality.
One other factor is that I want to rotate data in and out of the system
in one-day buckets -- LILO in effect. The date will be one of the columns
as well.

I had 9 indexes in mind, but I think I can pare it down to 5. At least
one of the columns I will need to query by has values that are guaranteed
to be unique -- there are effectively two ways to identify data for very
different parts of the complete system. Indexing on that would be bad,
wouldn't it?

Any advice would be appreciated.

Thanks

Maxim
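To make the "indexing a unique column would be bad" intuition concrete, here is a toy back-of-envelope model in plain Python. This is NOT how Cassandra implements secondary indexes, and the "date"/"uid" column names are made up; the point is only the counting: a secondary index is roughly a map from indexed value to the row keys carrying it, so with a unique-valued column the index degenerates to one entry per row -- it is as large as the data itself, and you pay the maintenance cost without the index narrowing anything.

```python
from collections import defaultdict

def build_index(rows, column):
    """Toy secondary index: indexed value -> set of row keys.

    Only a counting model, not Cassandra's actual implementation.
    """
    index = defaultdict(set)
    for key, row in rows.items():
        index[row[column]].add(key)
    return index

# Made-up data: a low-cardinality "date" bucket and a unique "uid" per row.
rows = {"row%d" % i: {"date": "2011-12-14", "uid": "uid%d" % i}
        for i in range(1000)}

date_idx = build_index(rows, "date")
uid_idx = build_index(rows, "uid")

print(len(date_idx))  # one fat index row; a lookup narrows 1000 rows to a bucket
print(len(uid_idx))   # 1000 index entries of size 1: as big as the data itself
```

If a value is guaranteed unique, using it directly as (part of) the row key gives the same lookup for free, which is why indexing it buys nothing.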



Re: tmp files in /var/lib/cassandra/data

2011-12-14 Thread Ramesh Natarajan
Yep, so far it looks like a file descriptor leak.  Not sure if GC or
some other event like compaction would close these files...

[root@CAP-VM-1 ~]# ls -al /proc/31134/fd   | grep MSA | wc -l
540
[root@CAP-VM-1 ~]# ls -al /proc/31134/fd   | grep MSA | wc -l
542
[root@CAP-VM-1 ~]# ls -al /proc/31134/fd   | grep MSA | wc -l
554
[root@CAP-VM-1 ~]# ls -al /proc/31134/fd   | grep MSA | wc -l
558
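The manual `ls /proc/<pid>/fd | grep ... | wc -l` polling above can be scripted. Here is a small Python sketch of the same check; it is Linux-only (it reads /proc), and the pid and the filter string are placeholders you would point at the Cassandra process and the suspect file pattern (e.g. "-tmp-").

```python
import os

def fd_count(pid, match=None):
    """Count the open file descriptors of process `pid` by listing
    /proc/<pid>/fd (Linux only), optionally keeping only descriptors
    whose target path contains `match` -- the scripted equivalent of
    `ls -al /proc/<pid>/fd | grep <pattern> | wc -l`."""
    fd_dir = "/proc/%d/fd" % pid
    count = 0
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor closed between listdir and readlink
        if match is None or match in target:
            count += 1
    return count

if __name__ == "__main__":
    # Substitute the cassandra pid, and e.g. match="-tmp-" to watch the leak.
    print(fd_count(os.getpid()))
```

Run it from cron or a loop and a steadily growing count (as in the numbers above) points at a leak rather than a transient spike.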



On Wed, Dec 14, 2011 at 8:28 PM, Bryce Godfrey
 wrote:
> I'm seeing this also, and my nodes have started crashing with "too many open 
> files" errors.  Running lsof I see lots of these open tmp files.
>
> java       8185      root  911u      REG               8,32         38  
> 129108266 
> /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268721-CompressionInfo.db
> java       8185      root  912u      REG               8,32          0  
> 155320741 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1092-Data.db
> java       8185      root  913u      REG               8,32          0  
> 155320742 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Index.db
> java       8185      root  914u      REG               8,32          0  
> 155320743 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Data.db
> java       8185      root  916u      REG               8,32          0  
> 155320754 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1113-Data.db
> java       8185      root  918u      REG               8,32          0  
> 155320744 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Index.db
> java       8185      root  919u      REG               8,32          0  
> 155320745 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Data.db
> java       8185      root  920u      REG               8,32          0  
> 155320755 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Index.db
> java       8185      root  921u      REG               8,32          0  
> 129108272 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-Data.db
> java       8185      root  922u      REG               8,32         38  
> 129108273 
> /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-CompressionInfo.db
> java       8185      root  923u      REG               8,32          0  
> 155320756 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Data.db
> java       8185      root  929u      REG               8,32         38  
> 129108262 
> /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268822-CompressionInfo.db
> java       8185      root  947u      REG               8,32          0  
> 129108284 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-Data.db
> java       8185      root  948u      REG               8,32         38  
> 129108285 
> /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-CompressionInfo.db
> java       8185      root  954u      REG               8,32          0  
> 155320746 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Index.db
> java       8185      root  955u      REG               8,32          0  
> 155320747 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Data.db
>
> Going to try rolling back to 1.0.5 for the time being even though I was 
> hoping to use one of the fixes in 1.0.6
>
> -Original Message-
> From: Ramesh Natarajan [mailto:rames...@gmail.com]
> Sent: Wednesday, December 14, 2011 6:03 PM
> To: user@cassandra.apache.org
> Subject: tmp files in /var/lib/cassandra/data
>
> We are using leveled compaction running cassandra 1.0.6.  I checked the data 
> directory (/var/lib/cassandra/data) and i see  these 0 bytes tmp files.
> What are these files?
>
> thanks
> Ramesh
>
> -rw-r--r-- 1 root root        0 Dec 14 17:15 uid-tmp-hc-106-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 17:15 uid-tmp-hc-106-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 17:23 uid-tmp-hc-117-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 17:23 uid-tmp-hc-117-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 15:51 uid-tmp-hc-11-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 15:51 uid-tmp-hc-11-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 17:31 uid-tmp-hc-129-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 17:31 uid-tmp-hc-129-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-142-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-142-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-145-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 17:40 uid-tmp-hc-145-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-158-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-158-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-162-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 17:47 uid-tmp-hc-162-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-175-Data.db
> -rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-175-Index.db
> -rw-r--r-- 1 root root        0 Dec 14 17:55 uid-tmp-hc-179-Data.db
> -rw-r--r-- 1 root roo