Re: Keys for deleted rows visible in CLI
On 14.12.2011 1:15, Maxim Potekhin wrote: > Thanks. It could be hidden from a human operator, I suppose :) I agree. Open a JIRA for it.
Re: configurable bloom filters (like hbase)
On 11.11.2011 7:55, Radim Kolar wrote: I have a problem with a large CF (about 200 billion entries per node). While I can configure index_interval to lower memory requirements, I still have to stick with huge bloom filters. Ideal would be to have bloom filters configurable like in HBase. The Cassandra standard is about 1.05% false positives, but in my case I would be fine even with a 20% false-positive rate. Data are not often read back; most of them will never be read before they expire via TTL. Does anybody else have the problem that bloom filters use too much memory in applications which do not need to read written data often? I am looking at the bloom filter memory used, and it would be ideal to have in cassandra-1.1 the ability to shrink bloom filters to about 1/10 of their size. Is it possible to code something like this: save bloom filters to disk as usual, but during load transform them into something smaller at the cost of an increased FP rate?
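For reference, the usual back-of-envelope sizing for an optimally-hashed bloom filter shows how memory scales with the target false-positive rate: going from ~1% to 20% FP cuts the filter to roughly a third of its size, while shrinking to 1/10 would push the optimal FP rate above 50%. A quick sketch using the standard formulas (not Cassandra's actual implementation):

```python
import math

# Standard bloom filter sizing formulas; illustrative only, not
# Cassandra's actual bloom filter code.

def bloom_bits_per_element(fp_rate):
    """Bits per element for an optimally-hashed bloom filter."""
    return -math.log(fp_rate) / (math.log(2) ** 2)

def bloom_size_bytes(n_entries, fp_rate):
    """Approximate filter size in bytes for n_entries keys."""
    return n_entries * bloom_bits_per_element(fp_rate) / 8
```

At a 1% FP rate the filter needs about 9.6 bits per key; at 20% it needs about 3.3, so relaxing the FP target shrinks memory to roughly a third, not a tenth.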
Re: One ColumnFamily places data on only 3 out of 4 nodes
Anyone?

On 12 December 2011 15:25, Bart Swedrowski wrote:
> Hello everyone,
>
> I seem to have come across a rather weird (at least for me!) problem /
> behaviour with Cassandra.
>
> I am running a 4-node cluster on Cassandra 0.8.7. For the keyspace in
> question, I have RF=3, SimpleStrategy with multiple ColumnFamilies inside
> the KeySpace. One of the ColumnFamilies however seems to have data
> distributed across only 3 out of 4 nodes.
>
> The data on the cluster besides the problematic ColumnFamily seems to be
> more or less equal and even.
>
> # nodetool -h localhost ring
> Address        DC           Rack   Status  State    Load      Owns     Token
>                                                                        127605887595351923798765477786913079296
> 192.168.81.2   datacenter1  rack1  Up      Normal   7.27 GB   25.00%   0
> 192.168.81.3   datacenter1  rack1  Up      Normal   7.74 GB   25.00%   42535295865117307932921825928971026432
> 192.168.81.4   datacenter1  rack1  Up      Normal   7.38 GB   25.00%   85070591730234615865843651857942052864
> 192.168.81.5   datacenter1  rack1  Up      Normal   7.32 GB   25.00%   127605887595351923798765477786913079296
>
> Schema for the relevant bits of the keyspace is as follows:
>
> [default@A] show schema;
> create keyspace A
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = [{replication_factor : 3}];
> [...]
> create column family UserDetails
>   with column_type = 'Standard'
>   and comparator = 'IntegerType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and memtable_operations = 0.571875
>   and memtable_throughput = 122
>   and memtable_flush_after = 1440
>   and rows_cached = 0.0
>   and row_cache_save_period = 0
>   and keys_cached = 20.0
>   and key_cache_save_period = 14400
>   and read_repair_chance = 1.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider';
>
> And now the symptoms - output of 'nodetool -h localhost cfstats' on each
> node. Please note the figures on node1.
>
> *node1*
> Column Family: UserDetails
> SSTable count: 0
> Space used (live): 0
> Space used (total): 0
> Number of Keys (estimate): 0
> Memtable Columns Count: 0
> Memtable Data Size: 0
> Memtable Switch Count: 0
> Read Count: 0
> Read Latency: NaN ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Tasks: 0
> Key cache capacity: 20
> Key cache size: 0
> Key cache hit rate: NaN
> Row cache: disabled
> Compacted row minimum size: 0
> Compacted row maximum size: 0
> Compacted row mean size: 0
>
> *node2*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 112952788
> Space used (total): 164953743
> Number of Keys (estimate): 384
> Memtable Columns Count: 159419
> Memtable Data Size: 74910890
> Memtable Switch Count: 59
> Read Count: 135307426
> Read Latency: 25.900 ms.
> Write Count: 3474673
> Write Latency: 0.040 ms.
> Pending Tasks: 0
> Key cache capacity: 20
> Key cache size: 120
> Key cache hit rate: 0.71684189041
> Row cache: disabled
> Compacted row minimum size: 42511
> Compacted row maximum size: 74975550
> Compacted row mean size: 42364305
>
> *node3*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 112953137
> Space used (total): 112953137
> Number of Keys (estimate): 384
> Memtable Columns Count: 159421
> Memtable Data Size: 74693445
> Memtable Switch Count: 56
> Read Count: 135304486
> Read Latency: 25.552 ms.
> Write Count: 3474616
> Write Latency: 0.036 ms.
> Pending Tasks: 0
> Key cache capacity: 20
> Key cache size: 109
> Key cache hit rate: 0.716840888175
> Row cache: disabled
> Compacted row minimum size: 42511
> Compacted row maximum size: 74975550
> Compacted row mean size: 42364305
>
> *node4*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 117070926
> Space used (total): 119479484
> Number of Keys (estimate): 384
> Memtable Columns Count: 159979
> Memtable Data Size: 75029672
> Memtable Switch Count: 60
> Read Count: 135294878
> Read Latency: 19.455 ms.
> Write Count: 3474982
> Write Latency: 0.028 ms.
> Pending Tasks: 0
> Key cache capacity: 20
> Key cache size: 119
> Key cache hit rate: 0.752235777154
> Row cache: disabled
> Compacted row minimum size: 2346800
> Compacted row maximum size: 62479625
> Compacted row mean size: 42591803
>
> When I go to the 'data' directory on node1 there are no files regarding the
> UserDetails ColumnFamily.
>
> I tried performing a manual repair in the hope it would heal the situation,
> however without any luck.
>
> # nodetool -h localhost repair A UserDetails
> INFO 15:19:54,611 Starting repair command #8, repairing 3 ranges.
> INFO 15:19:54,647 Sending AEService tree for # manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec, /192.168.81.2, (A,UserDetails), (85070591730234615865843651857942052864,127605887595351923798765477786913079296]>
> INFO 15:19:54,742 Endpo
Counters and Top 10
Hi all, I'm using Cassandra in production for a small social network (~10,000 people). Now I have to assign some "credits" to each user operation (login, write post and so on) and then be capable of providing, at any moment, the top 10 most active users. I'm on Cassandra 0.7.6; I'd like to migrate to a new version in order to use Counters for the user points, but ... what about the top 10? I was thinking about a specific ROW that always keeps the 10 most active users ... but I think it would be heavy (to write, and to handle in thread-safe mode) ... can counters provide something like a "value-ordered list"? Thanks for any help. Best regards, Carlo
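Counters cannot be read back in value order, so the ordering has to happen client-side (or be written periodically into a dedicated ranking row). For ~10,000 users, scanning the per-user counters and ranking in the client is cheap. A hedged sketch; the dict of scores stands in for whatever your client library returns:

```python
import heapq

# Client-side ranking sketch: fetch per-user counter values, then rank.
# The input dict is a stand-in for the counter CF read; names are mine.

def top_n(user_scores, n=10):
    """Return (user, score) pairs for the n highest scores, best first."""
    return heapq.nlargest(n, user_scores.items(), key=lambda kv: kv[1])
```

A periodic job could write the result of `top_n` into a single "ranking" row, so reads of the top 10 stay O(1) and only the job pays the scan cost.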
Re: One ColumnFamily places data on only 3 out of 4 nodes
Do you use RandomPartitioner? What does nodetool getendpoints show for several random keys?
Re: One ColumnFamily places data on only 3 out of 4 nodes
On 14 December 2011 13:02, wrote:
> Do you use RandomPartitioner? What does nodetool getendpoints show for
> several random keys?

Yes, RandomPartitioner it is.

Thanks for the hint re 'nodetool getendpoints'. I have queried a few and, to my surprise, 192.168.82.2 (node1) is showing up as an endpoint for a few of them:

bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4547246
192.168.81.3
192.168.81.4
192.168.81.5
bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4549279
192.168.81.5
192.168.81.2
192.168.81.3
bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4549749
192.168.81.2
192.168.81.3
192.168.81.4
bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4545027
192.168.81.5
192.168.81.2
192.168.81.3

Any idea why the hell those are not stored on node1, though?

bart@node1:~$ nodetool -h localhost cfstats
[…]
Column Family: UserDetails
SSTable count: 0
Space used (live): 0
Space used (total): 0
Number of Keys (estimate): 0
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 20
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
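With RandomPartitioner and SimpleStrategy at RF=3 on a 4-node ring, each node should be a replica for three of the four token ranges, so node1 appearing in getendpoints while holding no data points at a delivery problem rather than placement. A rough, illustrative sketch of the placement logic (function names are mine, not Cassandra's actual code):

```python
import bisect
import hashlib

# Illustrative sketch of RandomPartitioner + SimpleStrategy placement;
# details simplified, names are hypothetical.

def key_token(key):
    # RandomPartitioner derives tokens from the md5 of the key
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 127)

def replicas_for_token(ring, tok, rf=3):
    """ring: list of (token, node) sorted by token. SimpleStrategy walks
    the ring clockwise from the key's token, taking the next rf nodes."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_right(tokens, tok) % len(ring)
    return [ring[(i + k) % len(ring)][1] for k in range(rf)]

def replicas_for_key(ring, key, rf=3):
    return replicas_for_token(ring, key_token(key), rf)
```

With four equally spaced tokens, every node ends up in three of the four possible replica sets, which matches the getendpoints output above: node1 is a legitimate endpoint for some keys.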
Re: One ColumnFamily places data on only 3 out of 4 nodes
On 14 December 2011 14:45, Bart Swedrowski wrote: > I have queried few, and to my surprise, 192.168.82.2 (node1) The IP is supposed to be 192.168.81.2
Re: One ColumnFamily places data on only 3 out of 4 nodes
No idea; try to check the logs for errors, and increase the verbosity level on that node.
Re: One ColumnFamily places data on only 3 out of 4 nodes
On 14 December 2011 14:58, wrote: > No idea, try to check logs for errors, and increase verbosity level on that node. No errors at all, a few warnings about HEAP size, that's it. Okay, thanks. Does anyone else have any ideas on how to push this forward?
Cassandra C client implementation
Hello, I am trying to integrate some Cassandra-related ops (insert, get, etc.) into an application written entirely in C, so C++ is not an option. Is there any C client library for Cassandra? I have also tried to generate Thrift c_glib code for Cassandra, but on wiki.apache.org/cassandra/ThriftExamples I cannot find an example for C. Can anybody suggest a C client library for Cassandra or provide some working examples for Thrift in C? Thanks and Regards, Vlad
Re: Cassandra C client implementation
Try libcassandra, but it doesn't support connection pooling. Best Regards, Yi "Steve" Yang ~~~ +1-401-441-5086 +86-13910771510 Sent via BlackBerry® from China Mobile
Re: Cassandra C client implementation
BTW please use https://github.com/eyealike/libcassandra Best Regards, Yi "Steve" Yang Sent via BlackBerry® from China Mobile
Counters != Counts
Hi everybody. I'm using a lot of counters to make statistics on a 4-node cluster (EC2 m1.small) with phpcassa (Cassandra v1.0.2). I store some events and increment counters at the same time. Counters give me over-counts compared with the count of every corresponding event. I'm sure that my non-counter counts are good. I'm not sure why these over-counts happen, but I heard that recovering from commitlogs can produce this. I have some timeouts in phpcassa which are written in my Apache logs while a compaction is running. However, I am always able to write at QUORUM, so I guess I shouldn't have to recover from Cassandra commitlogs. Where can these over-counts come from? Alain
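One classic source of over-counts, independent of commitlog replay: counter increments are not idempotent, so a client (or client library) that retries after a write timeout can apply the same logical increment twice when the first attempt actually landed. A toy illustration with hypothetical names:

```python
# Toy model (hypothetical names): the server applies the increment but the
# reply times out, so the client retries and the event is counted twice.

class FlakyCounter:
    def __init__(self):
        self.value = 0

    def incr(self):
        self.value += 1                   # the increment is applied...
        raise TimeoutError("reply lost")  # ...but the client never hears back

def incr_with_retry(counter, retries=1):
    for _ in range(retries + 1):
        try:
            counter.incr()
            return
        except TimeoutError:
            continue  # cannot tell whether the increment applied; retry anyway
```

Given the phpcassa timeouts during compactions mentioned above, it is worth checking whether the client retries timed-out counter writes.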
Re: Keys for deleted rows visible in CLI
http://wiki.apache.org/cassandra/FAQ#range_ghosts On Wed, Dec 14, 2011 at 4:36 AM, Radim Kolar wrote: > On 14.12.2011 1:15, Maxim Potekhin wrote: >> Thanks. It could be hidden from a human operator, I suppose :) > I agree. Open a JIRA for it.
Re: configurable bloom filters (like hbase)
https://issues.apache.org/jira/browse/CASSANDRA-3497 On Wed, Dec 14, 2011 at 4:52 AM, Radim Kolar wrote: > [...] it would be ideal to have in cassandra-1.1 the ability to shrink bloom filters to about 1/10 of their size.
Re: Cassandra C client implementation
Hello, Thanks for your answer. Unfortunately libcassandra is C++; I'm looking for something written in ANSI C. I've searched a lot and my guess is that c_glib Thrift is my only option, but I could not find even one example of how to make a connection and some queries to Cassandra using c_glib Thrift. Does anyone have experience/some examples for this? Regards, Vlad
Re: commit log size
Alexandru, Jeremiah -- what setting needs to be tweaked, and what's the recommended value? I observed similar behavior this morning. Maxim On 11/28/2011 2:53 PM, Jeremiah Jordan wrote: Yes, the low-volume memtables are causing the problem. Lower the thresholds for those tables if you don't want the commit logs to go crazy. -Jeremiah On 11/28/2011 11:11 AM, Alexandru Dan Sicoe wrote: Hello everyone, 4-node Cassandra 0.8.5 cluster with RF=2, replica placement strategy = SimpleStrategy, write consistency level = ANY, memtable_flush_after_mins = 1440; memtable_operations_in_millions = 0.1; memtable_throughput_in_mb = 40; max_compaction_threshold = 32; min_compaction_threshold = 4. I have one keyspace with 1 CF for all the data and 3 other small CFs for metadata. I am using DataStax OpsCenter to monitor my cluster, so there is another keyspace for monitoring. Everything works OK; the only thing I've noticed is that this morning the commitlog of one node was 52 GB, one was 25 GB and the others were around 3 GB. I left everything untouched, looked a couple of hours later, and the 52 GB one is now about 3 GB, the 25 GB one is now 29 GB and the other two about the same as before. Are my commit logs growing because of small memtables which don't get flushed because they don't reach the operations and throughput limits? Then why do only some nodes exhibit this behaviour? It would be interesting to understand how to control the size of the commitlog, and also to know how to size my commitlog disks! Thanks, Alex
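Following Jeremiah's suggestion, the 0.8-era per-CF memtable thresholds can be lowered from cassandra-cli so the low-volume CFs flush sooner and let their commitlog segments be recycled. The keyspace/CF names and the exact values below are illustrative only:

```
[default@MyKeyspace] update column family SmallMetadataCF
    with memtable_flush_after = 60
    and memtable_operations = 0.01
    and memtable_throughput = 8;
```

(memtable_flush_after is in minutes, memtable_operations in millions of operations, memtable_throughput in MB, matching the units quoted in the thread.)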
Re: Keys for deleted rows visible in CLI
Thanks, it makes perfect sense now. Well, an option in Cassandra could make it optional as far as display is concerned, without a performance hit -- of course this is all unimportant. Thanks again, Maxim On 12/14/2011 11:30 AM, Brandon Williams wrote: http://wiki.apache.org/cassandra/FAQ#range_ghosts
RE: Cassandra C client implementation
Virgil apparently lets you access Cassandra via a RESTful interface: http://code.google.com/a/apache-extras.org/p/virgil/ Depending on your performance needs and the maturity of Virgil's code (I think it's alpha), that may work. You could always fork a Java process and pipe to it. Don
Re: Cassandra C client implementation
If you are OK linking to a C++-based library you can look at: https://github.com/minaguib/libcassandra/tree/kickstart-libcassie-0.7/libcassie It is wrapper code around libcassandra which exports a C interface. If you look at the function names etc. in the other languages, just use the similar functions from the c_glib Thrift... If you are going to mess with using the c_glib Thrift, make sure to check out the JIRA for it; it is new and has some issues... https://issues.apache.org/jira/browse/THRIFT/component/12313854 On 12/14/2011 09:11 AM, Vlad Paiu wrote: Hello, I am trying to integrate some Cassandra related ops (insert, get, etc) into an application written entirely in C, so C++ is not an option. Is there any C client library for Cassandra?
[RELEASE] Apache Cassandra 1.0.6 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.0.6. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassandra.apache.org/ Downloads of source and binary distributions are listed in our download section: http://cassandra.apache.org/download/ This version is a maintenance/bug-fix release[1]. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems. Have fun! [1]: http://goo.gl/Pl1TE (CHANGES.txt) [2]: http://goo.gl/9xHEC (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: One ColumnFamily places data on only 3 out of 4 nodes
> bart@node1:~$ nodetool -h localhost getendpoints A UserDetails 4545027
> 192.168.81.5
> 192.168.81.2
> 192.168.81.3

Can you see what happens if you stop C*, say on node .5, and write and read at QUORUM?

On Wed, Dec 14, 2011 at 7:06 AM, Bart Swedrowski wrote:
> On 14 December 2011 14:58, wrote:
>> No idea, try to check logs for errors, and increase verbosity level on
>> that node.
>
> No errors at all, a few warnings about HEAP size, that's it.
>
> Okay, thanks.
>
> Anyone else have got any ideas on how to push this forward?
Re: Cassandra C client implementation
Hello, Thanks very much for your suggestions. Libcassie seems nice but doesn't seem to be actively maintained, and I'm not sure it's compatible with the latest Cassandra versions. Will give it a try though. I was looking through the generated Thrift .c files and I can't seem to find what function to call to init a connection to a Cassandra instance. Any ideas? Thanks and Regards, Vlad
Re: 1.0.3 CLI oddities
Correct. 1.0.6 fixes this for me. /Janne On 12 Dec 2011, at 02:57, Chris Burroughs wrote: > Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3558 and the > other tickets reference there. > > On 11/28/2011 05:05 AM, Janne Jalkanen wrote: >> Hi! >> >> (Asked this on IRC too, but didn't get anyone to respond, so here goes...) >> >> Is it just me, or are these real bugs? >> >> On 1.0.3, from CLI: "update column family XXX with gc_grace = 36000;" just >> says "null" with nothing logged. Previous value is the default. >> >> Also, on 1.0.3, "update column family XXX with >> compression_options={sstable_compression:SnappyCompressor,chunk_length_kb:64};" >> returns "Internal error processing system_update_column_family" and log >> says "Invalid negative or null chunk_length_kb" (stack trace below) >> >> Setting the compression options worked on 1.0.0 when I was testing (though >> my 64 kB became 64 MB, but I believe this was fixed in 1.0.3.) >> >> Did the syntax change between 1.0.0 and 1.0.3? Or am I doing something >> wrong? >> >> The database was upgraded from 0.6.13 to 1.0.0, then scrubbed, then >> compression options set to some CFs, then upgraded to 1.0.3 and trying to >> set compression on other CFs. 
>> Stack trace:
>>
>> ERROR [pool-2-thread-68] 2011-11-28 09:59:26,434 Cassandra.java (line 4038) Internal error processing system_update_column_family
>> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOException: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>> at org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:898)
>> at org.apache.cassandra.thrift.CassandraServer.system_update_column_family(CassandraServer.java:1089)
>> at org.apache.cassandra.thrift.Cassandra$Processor$system_update_column_family.process(Cassandra.java:4032)
>> at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>> at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:680)
>> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> at org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:890)
>> ... 7 more
>> Caused by: java.io.IOException: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>> at org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
>> at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
>> at org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> ... 3 more
>> Caused by: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>> at org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
>> at org.apache.cassandra.io.compress.CompressionParameters.create(CompressionParameters.java:52)
>> at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:796)
>> at org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:74)
>> ... 7 more
>> ERROR [MigrationStage:1] 2011-11-28 09:59:26,434 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[MigrationStage:1,5,main]
>> java.io.IOException: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>> at org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
>> at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
>> at org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:680)
>> Caused by: org.apache.cassandra.config.ConfigurationException: Invalid
[RELEASE] Apache Cassandra 0.8.9 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 0.8.9. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassandra.apache.org/ Downloads of source and binary distributions are listed in our download section: http://cassandra.apache.org/download/ This version is a maintenance/bug-fix release[1]. Please pay attention to the release notes[2] before upgrading, and let us know[3] if you encounter any problems. Have fun! [1]: http://goo.gl/Kx7d0 (CHANGES.txt) [2]: http://goo.gl/Tv2NW (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA
Asymmetric load
What could be the reason I see unequal loads on a 3-node cluster? This all started happening during repairs (which again are not going smoothly). Maxim
Crazy compactionstats
Hello I ran repair like this: nohup repair.sh & where repair.sh contains simply nodetool repair plus timestamp. The process dies while dumping this: Exception in thread "main" java.io.IOException: Repair command #1: some repair session(s) failed (see log for details). at org.apache.cassandra.service.StorageService.forceTableRepair(StorageService.java:1613) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) I still see pending tasks in nodetool compactionstats, and their number goes into hundreds which I haven't seen before. What's going on? Thanks Maxim
Re: Cassandra C client implementation
Hi, Just tried libcassie and it seems it's not compatible with the latest Cassandra, as even simple inserts and fetches fail with InvalidRequestException... So can anybody please provide a very simple example in C for connecting & fetching columns with thrift? Regards, Vlad Vlad Paiu wrote: >Hello, > >Thanks very much for your suggestions. >Libcassie seems nice but doesn't seem to be actively maintained, and I'm >not sure if it's compatible with the latest Cassandra versions. Will give it a try >though. > >I was looking through the generated thrift .c files and I can't seem to find >what function to call to init a connection to a Cassandra instance. Any ideas? > >Thanks and Regards, >Vlad > >Jeremiah Jordan wrote: > >>If you are OK linking to a C++ based library you can look at: >>https://github.com/minaguib/libcassandra/tree/kickstart-libcassie-0.7/libcassie >>It is wrapper code around libcassandra which exports a C++ interface. >>If you look at the function names etc. in the other languages, just use >>the similar functions from the c_glib thrift... >>If you are going to mess with using the c_glib thrift, make sure to >>check out the JIRA for it; it is new and has some issues... >>https://issues.apache.org/jira/browse/THRIFT/component/12313854 >> >> >>On 12/14/2011 09:11 AM, Vlad Paiu wrote: >>> Hello, >>> >>> I am trying to integrate some Cassandra-related ops (insert, get, etc.) >>> into an application written entirely in C, so C++ is not an option. >>> >>> Is there any C client library for Cassandra? >>> >>> I have also tried to generate thrift c_glib code for Cassandra, but on >>> wiki.apache.org/cassandra/ThriftExamples I cannot find an example for C. >>> >>> Can anybody suggest a C client library for Cassandra or provide some >>> working examples for Thrift in C? >>> >>> Thanks and Regards, >>> Vlad
Re: Cassandra C client implementation
On 12/14/2011 04:18 PM, Vlad Paiu wrote: Hi, Just tried libcassie and it seems it's not compatible with the latest Cassandra, as even simple inserts and fetches fail with InvalidRequestException... So can anybody please provide a very simple example in C for connecting & fetching columns with thrift? Regards, Vlad Vlad Paiu wrote: Vlad, We have written a specific Cassandra db module for usrloc with OpenSIPS and have open sourced it on GitHub. We use the thrift-generated C++ bindings and extern stuff to C. I spoke to Bogdan about this a while ago and gave him the GitHub link, but here it is for your reference: https://github.com/junction/db_jnctn_usrloc Hopefully that helps. I idle in #opensips too; just ask about Cassandra in there and I'll probably see it. - Eric Tamme
Re: Cassandra C client implementation
Hello Eric, We have that, thanks a lot for the contribution. The idea is to avoid mixing C++ code into a C app if there's an alternative (the thrift c_glib). Unfortunately, since thrift does not generate a skeleton for the c_glib code, I don't know how to find out what the API functions are called, and guessing them is not going that well :) I'll wait a little longer & see if anybody can help with the C thrift, or at least tell me it's not working. :) Regards, Vlad Eric Tamme wrote: >On 12/14/2011 04:18 PM, Vlad Paiu wrote: >> Hi, >> >> Just tried libcassie and it seems it's not compatible with the latest Cassandra, as >> even simple inserts and fetches fail with InvalidRequestException... >> >> So can anybody please provide a very simple example in C for connecting & >> fetching columns with thrift? >> >> Regards, >> Vlad >> >> Vlad Paiu wrote: >> > >Vlad, > >We have written a specific cassandra db module for usrloc with opensips >and have open sourced it on github. We use the thrift generated c++ >bindings and extern stuff to c. I spoke to bogdan about this a while >ago, and gave him the github link, but here it is for your reference >https://github.com/junction/db_jnctn_usrloc > >Hopefully that helps. I idle in #opensips too, just ask about >cassandra in there and I'll probably see it. > >- Eric Tamme >
Re: Crazy compactionstats
> Exception in thread "main" java.io.IOException: Repair command #1: some > repair session(s) failed (see log for details). For why the repair failed you unfortunately need to look at the logs, as it suggests. > I still see pending tasks in nodetool compactionstats, and their number goes > into hundreds which I haven't seen before. > What's going on? The "pending" compactions count is not very useful. It just says how many tasks are pending that MIGHT compact. Typically it will either be 0, or it will be steadily increasing while compactions are happening until suddenly snapping back to 0 again once compactions catch up. Whether or not non-zero is a problem depends on the Cassandra version, how many concurrent compactors you are running, your column families/data sizes/flushing speeds, etc. (Sorry, kind of a long story.) -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: Cassandra C client implementation
Hi Vlad I'm the author of libcassie. For what it's worth, it's in production where I work, consuming a heavily-used cassandra 0.7.9 cluster. We do have plans to upgrade the cluster to 1.x, to benefit from all the improvements, CQL, etc... but that includes revising all our clients (across several programming languages). So, it's definitely on my todo list to address our C clients by either upgrading libcassie, or possibly completely rewriting it. Currently it's a wrapper around the C++ parent project libcassandra. I haven't been fond of having that many layered abstractions, and the thrift Glib2 interface has definitely piqued my interest, so I'm leaning towards a complete rewrite. While we're at it, it would also be nice to have features like asynchronous modes for popular event loops, connection pooling, etc. Unfortunately, I have no milestones set for any of this, nor the time (currently) to experiment and proof-of-concept it. I'd be curious to hear from other C hackers whether they've experimented with the thrift Glib2 interface and gotten a "hello world" to work against cassandra 1.x. Perhaps there's room for some code sharing/collaboration on a new library to supersede the existing libcassie+libcassandra. On 2011-12-14, at 5:16 PM, Vlad Paiu wrote: > Hello Eric, > > We have that, thanks alot for the contribution. > The idea is to not play around with including C++ code in a C app, if there's > an alternative ( the thrift g_libc ). > > Unfortunately, since thrift does not generate a skeleton for the glibc code, > I don't know how to find out what the API functions are called, and guessing > them is not going that good :) > > I'll wait a little longer & see if anybody can help with the C thrift, or at > least tell me it's not working. 
:) > > Regards, > Vlad > > Eric Tamme wrote: > >> On 12/14/2011 04:18 PM, Vlad Paiu wrote: >>> Hi, >>> >>> Just tried libcassie and seems it's not compatible with latest cassandra, >>> as even simple inserts and fetches fail with InvalidRequestException... >>> >>> So can anybody please provide a very simple example in C for connecting& >>> fetching columns with thrift ? >>> >>> Regards, >>> Vlad >>> >>> Vlad Paiu wrote: >>> >> >> Vlad, >> >> We have written a specific cassandra db module for usrloc with opensips >> and have open sourced it on github. We use the thrift generated c++ >> bindings and extern stuff to c. I spoke to bogdan about this a while >> ago, and gave him the github link, but here it is for your reference >> https://github.com/junction/db_jnctn_usrloc >> >> Hopefully that helps. I idle in #opensips too, just ask about >> cassandra in there and I'll probably see it. >> >> - Eric Tamme >>
tmp files in /var/lib/cassandra/data
We are using leveled compaction, running Cassandra 1.0.6. I checked the data directory (/var/lib/cassandra/data) and I see these 0-byte tmp files. What are these files? thanks Ramesh

-rw-r--r-- 1 root root 0 Dec 14 17:15 uid-tmp-hc-106-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:15 uid-tmp-hc-106-Index.db
-rw-r--r-- 1 root root 0 Dec 14 17:23 uid-tmp-hc-117-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:23 uid-tmp-hc-117-Index.db
-rw-r--r-- 1 root root 0 Dec 14 15:51 uid-tmp-hc-11-Data.db
-rw-r--r-- 1 root root 0 Dec 14 15:51 uid-tmp-hc-11-Index.db
-rw-r--r-- 1 root root 0 Dec 14 17:31 uid-tmp-hc-129-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:31 uid-tmp-hc-129-Index.db
-rw-r--r-- 1 root root 0 Dec 14 17:40 uid-tmp-hc-142-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:40 uid-tmp-hc-142-Index.db
-rw-r--r-- 1 root root 0 Dec 14 17:40 uid-tmp-hc-145-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:40 uid-tmp-hc-145-Index.db
-rw-r--r-- 1 root root 0 Dec 14 17:47 uid-tmp-hc-158-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:47 uid-tmp-hc-158-Index.db
-rw-r--r-- 1 root root 0 Dec 14 17:47 uid-tmp-hc-162-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:47 uid-tmp-hc-162-Index.db
-rw-r--r-- 1 root root 0 Dec 14 17:55 uid-tmp-hc-175-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:55 uid-tmp-hc-175-Index.db
-rw-r--r-- 1 root root 0 Dec 14 17:55 uid-tmp-hc-179-Data.db
-rw-r--r-- 1 root root 0 Dec 14 17:55 uid-tmp-hc-179-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:03 uid-tmp-hc-193-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:03 uid-tmp-hc-193-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:03 uid-tmp-hc-197-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:03 uid-tmp-hc-197-Index.db
-rw-r--r-- 1 root root 0 Dec 14 16:02 uid-tmp-hc-19-Data.db
-rw-r--r-- 1 root root 0 Dec 14 16:02 uid-tmp-hc-19-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:03 uid-tmp-hc-200-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:03 uid-tmp-hc-200-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:11 uid-tmp-hc-213-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:11 uid-tmp-hc-213-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:11 uid-tmp-hc-217-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:11 uid-tmp-hc-217-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:19 uid-tmp-hc-230-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:19 uid-tmp-hc-230-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:19 uid-tmp-hc-235-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:19 uid-tmp-hc-235-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:27 uid-tmp-hc-249-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:27 uid-tmp-hc-249-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:27 uid-tmp-hc-253-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:27 uid-tmp-hc-253-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:28 uid-tmp-hc-257-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:28 uid-tmp-hc-257-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:35 uid-tmp-hc-270-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:35 uid-tmp-hc-270-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:36 uid-tmp-hc-275-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:36 uid-tmp-hc-275-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:44 uid-tmp-hc-288-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:44 uid-tmp-hc-288-Index.db
-rw-r--r-- 1 root root 0 Dec 14 16:10 uid-tmp-hc-28-Data.db
-rw-r--r-- 1 root root 0 Dec 14 16:10 uid-tmp-hc-28-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:44 uid-tmp-hc-293-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:44 uid-tmp-hc-293-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:52 uid-tmp-hc-307-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:52 uid-tmp-hc-307-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:52 uid-tmp-hc-310-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:52 uid-tmp-hc-310-Index.db
-rw-r--r-- 1 root root 0 Dec 14 18:52 uid-tmp-hc-315-Data.db
-rw-r--r-- 1 root root 0 Dec 14 18:52 uid-tmp-hc-315-Index.db
-rw-r--r-- 1 root root 0 Dec 14 19:00 uid-tmp-hc-328-Data.db
-rw-r--r-- 1 root root 0 Dec 14 19:00 uid-tmp-hc-328-Index.db
-rw-r--r-- 1 root root 0 Dec 14 19:00 uid-tmp-hc-333-Data.db
-rw-r--r-- 1 root root 0 Dec 14 19:00 uid-tmp-hc-333-Index.db
-rw-r--r-- 1 root root 0 Dec 14 19:08 uid-tmp-hc-347-Data.db
-rw-r--r-- 1 root root 0 Dec 14 19:08 uid-tmp-hc-347-Index.db
-rw-r--r-- 1 root root 0 Dec 14 19:08 uid-tmp-hc-353-Data.db
-rw-r--r-- 1 root root 0 Dec 14 19:08 uid-tmp-hc-353-Index.db
-rw-r--r-- 1 root root 0 Dec 14 19:09 uid-tmp-hc-357-Data.db
-rw-r--r-- 1 root root 0 Dec 14 19:09 uid-tmp-hc-357-Index.db
-rw-r--r-- 1 root root 0 Dec 14 19:17 uid-tmp-hc-370-Data.db
-rw-r--r-- 1 root root 0 Dec 14 19:17 uid-tmp-hc-370-Index.db
-rw-r--r-- 1 root root
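[Editor's note] The "-tmp-" files are in-progress SSTable components: compaction and flush write output under a temporary name and rename it on success, so 0-byte leftovers indicate writes that never completed. As a rough way to inventory them, here is a small sketch (Python; the data-directory path and the helper name `zero_byte_tmp_sstables` are this note's own, not a Cassandra API). Deleting leftovers is generally only safe while the node is stopped.

```python
from pathlib import Path

def zero_byte_tmp_sstables(data_dir):
    """List 0-byte temporary SSTable components (names containing
    "-tmp-") under a Cassandra data directory."""
    return sorted(
        p for p in Path(data_dir).rglob("*-tmp-*.db")
        if p.is_file() and p.stat().st_size == 0
    )

# e.g. for p in zero_byte_tmp_sstables("/var/lib/cassandra/data"): print(p)
```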
RE: tmp files in /var/lib/cassandra/data
I'm seeing this also, and my nodes have started crashing with "too many open file errors". Running lsof I see lots of these open tmp files. java 8185 root 911u REG 8,32 38 129108266 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268721-CompressionInfo.db java 8185 root 912u REG 8,32 0 155320741 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1092-Data.db java 8185 root 913u REG 8,32 0 155320742 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Index.db java 8185 root 914u REG 8,32 0 155320743 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Data.db java 8185 root 916u REG 8,32 0 155320754 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1113-Data.db java 8185 root 918u REG 8,32 0 155320744 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Index.db java 8185 root 919u REG 8,32 0 155320745 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Data.db java 8185 root 920u REG 8,32 0 155320755 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Index.db java 8185 root 921u REG 8,32 0 129108272 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-Data.db java 8185 root 922u REG 8,32 38 129108273 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-CompressionInfo.db java 8185 root 923u REG 8,32 0 155320756 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Data.db java 8185 root 929u REG 8,32 38 129108262 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268822-CompressionInfo.db java 8185 root 947u REG 8,32 0 129108284 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-Data.db java 8185 root 948u REG 8,32 38 129108285 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-CompressionInfo.db java 8185 root 954u REG 8,32 0 155320746 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Index.db java 8185 root 955u REG 8,32 0 155320747 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Data.db Going to try rolling back to 1.0.5 for the time being even though I was hoping to use one of 
the fixes in 1.0.6 -Original Message- From: Ramesh Natarajan [mailto:rames...@gmail.com] Sent: Wednesday, December 14, 2011 6:03 PM To: user@cassandra.apache.org Subject: tmp files in /var/lib/cassandra/data We are using leveled compaction running cassandra 1.0.6. I checked the data directory (/var/lib/cassandra/data) and i see these 0 bytes tmp files. What are these files? thanks Ramesh -rw-r--r-- 1 root root0 Dec 14 17:15 uid-tmp-hc-106-Data.db -rw-r--r-- 1 root root0 Dec 14 17:15 uid-tmp-hc-106-Index.db -rw-r--r-- 1 root root0 Dec 14 17:23 uid-tmp-hc-117-Data.db -rw-r--r-- 1 root root0 Dec 14 17:23 uid-tmp-hc-117-Index.db -rw-r--r-- 1 root root0 Dec 14 15:51 uid-tmp-hc-11-Data.db -rw-r--r-- 1 root root0 Dec 14 15:51 uid-tmp-hc-11-Index.db -rw-r--r-- 1 root root0 Dec 14 17:31 uid-tmp-hc-129-Data.db -rw-r--r-- 1 root root0 Dec 14 17:31 uid-tmp-hc-129-Index.db -rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-142-Data.db -rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-142-Index.db -rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-145-Data.db -rw-r--r-- 1 root root0 Dec 14 17:40 uid-tmp-hc-145-Index.db -rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-158-Data.db -rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-158-Index.db -rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-162-Data.db -rw-r--r-- 1 root root0 Dec 14 17:47 uid-tmp-hc-162-Index.db -rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-175-Data.db -rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-175-Index.db -rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-179-Data.db -rw-r--r-- 1 root root0 Dec 14 17:55 uid-tmp-hc-179-Index.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-193-Data.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-193-Index.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-197-Data.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-197-Index.db -rw-r--r-- 1 root root0 Dec 14 16:02 uid-tmp-hc-19-Data.db -rw-r--r-- 1 root root0 Dec 14 16:02 uid-tmp-hc-19-Index.db -rw-r--r-- 1 root root0 Dec 14 18:03 
uid-tmp-hc-200-Data.db -rw-r--r-- 1 root root0 Dec 14 18:03 uid-tmp-hc-200-Index.db -r
Best way to implement indexing for high-cardinality values?
I now have a CF with extremely skinny rows (in the current implementation), and the application will want to query by more than one column value. The problem is that the values will in a lot of cases be high cardinality. One other factor is that I want to rotate data into and out of the system in one-day buckets -- LILO in effect. The date will be one of the columns as well. I had 9 indexes in mind, but I think I can pare it down to 5. At least one of the columns I will need to query by has values that are guaranteed to be unique -- there are effectively two ways to identify data for very different parts of the complete system. Indexing on that would be bad, wouldn't it? Any advice would be appreciated. Thanks Maxim
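[Editor's note] One common alternative to built-in secondary indexes for this kind of workload is a hand-maintained index CF whose row keys are bucketed by day, so that expiring a day's data also expires its index rows. A minimal sketch of the key scheme (Python; the function names and field names here are illustrative, not from any Cassandra API):

```python
from datetime import date

def index_row_key(field, value, day):
    """Row key for a hand-rolled index CF: one row per (field, value, day).

    Bucketing by day means expiring a day's data is just deleting that
    day's index rows alongside the data rows."""
    return "%s:%s:%s" % (field, value, day.isoformat())

def index_mutations(entity_key, columns, day, indexed_fields):
    """Yield (index_row_key, entity_key) pairs to write alongside the data row."""
    for field in indexed_fields:
        if field in columns:
            yield index_row_key(field, columns[field], day), entity_key
```

For the guaranteed-unique column, every index row would hold exactly one entry, so a plain lookup CF keyed directly on that value is usually a better fit than an index of either kind.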
Re: tmp files in /var/lib/cassandra/data
yep, so far it looks like a file descriptor leak. Not sure if gc or some other event like compaction would close these files.. [root@CAP-VM-1 ~]# ls -al /proc/31134/fd | grep MSA | wc -l 540 [root@CAP-VM-1 ~]# ls -al /proc/31134/fd | grep MSA | wc -l 542 [root@CAP-VM-1 ~]# ls -al /proc/31134/fd | grep MSA | wc -l 554 [root@CAP-VM-1 ~]# ls -al /proc/31134/fd | grep MSA | wc -l 558 On Wed, Dec 14, 2011 at 8:28 PM, Bryce Godfrey wrote: > I'm seeing this also, and my nodes have started crashing with "too many open > file errors". Running lsof I see lots of these open tmp files. > > java 8185 root 911u REG 8,32 38 > 129108266 > /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268721-CompressionInfo.db > java 8185 root 912u REG 8,32 0 > 155320741 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1092-Data.db > java 8185 root 913u REG 8,32 0 > 155320742 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Index.db > java 8185 root 914u REG 8,32 0 > 155320743 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1097-Data.db > java 8185 root 916u REG 8,32 0 > 155320754 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1113-Data.db > java 8185 root 918u REG 8,32 0 > 155320744 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Index.db > java 8185 root 919u REG 8,32 0 > 155320745 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1102-Data.db > java 8185 root 920u REG 8,32 0 > 155320755 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Index.db > java 8185 root 921u REG 8,32 0 > 129108272 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-Data.db > java 8185 root 922u REG 8,32 38 > 129108273 > /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268781-CompressionInfo.db > java 8185 root 923u REG 8,32 0 > 155320756 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1118-Data.db > java 8185 root 929u REG 8,32 38 > 129108262 > /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268822-CompressionInfo.db > java 8185 root 947u REG 8,32 0 > 
129108284 /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-Data.db > java 8185 root 948u REG 8,32 38 > 129108285 > /opt/cassandra/data/MonitoringData/Properties-tmp-hc-268854-CompressionInfo.db > java 8185 root 954u REG 8,32 0 > 155320746 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Index.db > java 8185 root 955u REG 8,32 0 > 155320747 /opt/cassandra/data/system/HintsColumnFamily-tmp-hc-1107-Data.db > > Going to try rolling back to 1.0.5 for the time being even though I was > hoping to use one of the fixes in 1.0.6 > > -Original Message- > From: Ramesh Natarajan [mailto:rames...@gmail.com] > Sent: Wednesday, December 14, 2011 6:03 PM > To: user@cassandra.apache.org > Subject: tmp files in /var/lib/cassandra/data > > We are using leveled compaction running cassandra 1.0.6. I checked the data > directory (/var/lib/cassandra/data) and i see these 0 bytes tmp files. > What are these files? > > thanks > Ramesh > > -rw-r--r-- 1 root root 0 Dec 14 17:15 uid-tmp-hc-106-Data.db > -rw-r--r-- 1 root root 0 Dec 14 17:15 uid-tmp-hc-106-Index.db > -rw-r--r-- 1 root root 0 Dec 14 17:23 uid-tmp-hc-117-Data.db > -rw-r--r-- 1 root root 0 Dec 14 17:23 uid-tmp-hc-117-Index.db > -rw-r--r-- 1 root root 0 Dec 14 15:51 uid-tmp-hc-11-Data.db > -rw-r--r-- 1 root root 0 Dec 14 15:51 uid-tmp-hc-11-Index.db > -rw-r--r-- 1 root root 0 Dec 14 17:31 uid-tmp-hc-129-Data.db > -rw-r--r-- 1 root root 0 Dec 14 17:31 uid-tmp-hc-129-Index.db > -rw-r--r-- 1 root root 0 Dec 14 17:40 uid-tmp-hc-142-Data.db > -rw-r--r-- 1 root root 0 Dec 14 17:40 uid-tmp-hc-142-Index.db > -rw-r--r-- 1 root root 0 Dec 14 17:40 uid-tmp-hc-145-Data.db > -rw-r--r-- 1 root root 0 Dec 14 17:40 uid-tmp-hc-145-Index.db > -rw-r--r-- 1 root root 0 Dec 14 17:47 uid-tmp-hc-158-Data.db > -rw-r--r-- 1 root root 0 Dec 14 17:47 uid-tmp-hc-158-Index.db > -rw-r--r-- 1 root root 0 Dec 14 17:47 uid-tmp-hc-162-Data.db > -rw-r--r-- 1 root root 0 Dec 14 17:47 uid-tmp-hc-162-Index.db > -rw-r--r-- 1 root root 0 Dec 14 17:55 
uid-tmp-hc-175-Data.db > -rw-r--r-- 1 root root 0 Dec 14 17:55 uid-tmp-hc-175-Index.db > -rw-r--r-- 1 root root 0 Dec 14 17:55 uid-tmp-hc-179-Data.db > -rw-r--r-- 1 root roo
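[Editor's note] To see whether descriptors are actually leaking, rather than being closed lazily (e.g. by GC finalizers), it helps to count them programmatically instead of re-running ls by hand. A small Linux-only sketch (it reads /proc, and the helper names are this note's own):

```python
import os

def open_fd_count(pid):
    """Total file descriptors currently open by pid (Linux /proc)."""
    return len(os.listdir("/proc/%d/fd" % pid))

def matching_fd_count(pid, needle):
    """Count descriptors whose target path contains needle -- the
    programmatic equivalent of `ls -al /proc/<pid>/fd | grep <needle>`."""
    fd_dir = "/proc/%d/fd" % pid
    count = 0
    for fd in os.listdir(fd_dir):
        try:
            if needle in os.readlink(os.path.join(fd_dir, fd)):
                count += 1
        except OSError:
            continue  # fd was closed between listdir and readlink
    return count
```

Sampling these counts every few seconds and watching whether the "-tmp-" total only ever grows would distinguish a true leak from lazy cleanup.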