Re: Tree Search in Cassandra
I am not worried about getting the occasional wrong result - if I were, I couldn't use Cassandra. I am only worried about breaking the index as a whole. If concurrent changes to the tree happen to modify the same record, I don't mind if one of them "wins", as long as the result is a working tree.

On Tue, Jun 8, 2010 at 1:20 AM, Tatu Saloranta wrote:
> On Mon, Jun 7, 2010 at 3:09 PM, Ian Soboroff wrote:
> > I was going to say, if ordered trees are your problem, Cassandra is not
> > your solution. Try building something with Berkeley DB.
>
> Also -- while there are no official plans for this, there have been
> discussions on the Voldemort list, wrt. possible future work to make some
> use of their pluggable backends.
> The most commonly used configuration is that of using BDBs, and
> supposedly it is not totally out of the question to consider adding
> backend-dependent functionality in the future.
> So it might make sense to ping the Voldemort-ians too; it is another
> actively developed distributed key/value store implementation, with
> slightly different trade-offs.
>
> -+ Tatu +-
Re: Using Cassandra via the Erlang Thrift Client API (HOW ??)
Greetings, I am also exploring Erlang and Cassandra via Thrift, but when inserting I've encountered this error:

(t...@ubuntu)11> thrift_client:call(C, 'insert', ["Keyspace1", "1",
    #columnPath{column_family="Standard1", column="email"},
    "t...@example.com", 1, 1]).

=ERROR REPORT==== 8-Jun-2010::15:07:58 ===
** Generic server <0.118.0> terminating
** Last message in was {call,insert,
                        ["Keyspace1","1",
                         {columnPath,"Standard1",undefined,"email"},
                         "t...@example.com",1,1]}
** When Server state == {state,cassandra_thrift,
                         {protocol,thrift_binary_protocol,
                          {binary_protocol,
                           {transport,thrift_buffered_transport,<0.119.0>},
                           true,true}},
                         0}
** Reason for termination ==
** {'module could not be loaded',
    [{cassandra_thrift,function_info,[insert,params_type]},
     {thrift_client,send_function_call,3},
     {thrift_client,'-handle_call/3-fun-0-',3},
     {thrift_client,catch_function_exceptions,2},
     {thrift_client,handle_call,3},
     {gen_server,handle_msg,5},
     {proc_lib,init_p_do_apply,3}]}
** exception exit: undef
     in function  cassandra_thrift:function_info/2
        called as cassandra_thrift:function_info(insert,params_type)
     in call from thrift_client:send_function_call/3
     in call from thrift_client:'-handle_call/3-fun-0-'/3
     in call from thrift_client:catch_function_exceptions/2
     in call from thrift_client:handle_call/3
     in call from gen_server:handle_msg/5
     in call from proc_lib:init_p_do_apply/3

Has anyone encountered the problem above? :) Thanks in advance :)

- Niel Riddle :)
Re: Tree Search in Cassandra
On Tue, Jun 8, 2010 at 12:07 AM, David Boxenhorn wrote:
> I am not worried about getting the occasional wrong result - if I were, I
> couldn't use Cassandra. I am only worried about breaking the index as a
> whole. If concurrent changes to the tree happen to modify the same record,
> I don't mind if one of them "wins" as long as the result is a working tree.

Right: I would expect it not to be just an occasional missing or extra update, but rather corruption of the whole thing. The whole point of b-tree (and similar) structures is to bucket up sets of things, splitting and merging buckets. Seemingly minor flaws during that processing can FUBAR the structure itself.

Or maybe I am completely misunderstanding how you were thinking of implementing this.

-+ Tatu +-
Re: Tree Search in Cassandra
As I said above, I was wondering if I could come up with a robust algorithm (e.g. creating the new super columns and then attaching them at the end) that will not FUBAR my index if it fails.

On Tue, Jun 8, 2010 at 10:53 AM, Tatu Saloranta wrote:
> Right: I would expect it not to be just an occasional missing or extra
> update, but rather corruption of the whole thing. The whole point of
> b-tree (and similar) structures is to bucket up sets of things, splitting
> and merging buckets. Seemingly minor flaws during that processing
> can FUBAR the structure itself.
>
> Or maybe I am completely misunderstanding how you were thinking of
> implementing this.
>
> -+ Tatu +-
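The ordering David describes can be sketched concretely: fully populate the replacement super columns under fresh names first, then make one final write that links them into the tree. A crash before that last write leaves the old, still-valid tree; a crash after it leaves the new one. Below is a minimal Python sketch of that ordering; the insert(row_key, super_column, columns) helper and the node/"toc" layout are hypothetical stand-ins, not the actual schema discussed in the thread.

def split_bucket(insert, node_row, bucket, entries):
    # entries: sorted (key, value) pairs from the over-full bucket
    mid = len(entries) // 2
    left, right = entries[:mid], entries[mid:]

    # Step 1: write both halves under fresh super column names. Readers
    # still resolve through the old bucket, so a crash here leaves a
    # working tree plus some unreferenced garbage.
    insert(node_row, bucket + '.0', dict(left))
    insert(node_row, bucket + '.1', dict(right))

    # Step 2: the single commit point - repoint the parent's table of
    # contents at the new halves. Whether readers can observe a torn
    # update here depends on per-row write atomicity, which should be
    # verified for the Cassandra version in use.
    insert(node_row, 'toc', {bucket: bucket + '.0,' + bucket + '.1'})

    # Step 3 (optional, lazy): delete the old bucket later. Leaving it
    # around wastes space but never corrupts the structure.

The design property is that every intermediate state is a valid tree; a failure only ever leaks garbage, which a background sweep can reclaim.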
Re: Cassandra on flash storage
On 08/06/10 03:17, Shuai Yuan wrote:
> Would you please tell us what performance you measured? Although I don't
> have any experience with flash drives, I'm very interested in switching
> to SSD.

I don't have any benchmarks around, but I can tell you that those io-drives are incredibly fast, not to mention the access time, which is amazing. There are some tests on the MySQL Performance Blog, for example:

http://www.mysqlperformanceblog.com/2009/05/01/raid-vs-ssd-vs-fusionio/

Our system is memory-constrained (just 16GB per machine), so I thought the io-drives would help a lot. The random I/O is basically free.

Regards
Duplicate a node (replication).
Hi.

I have a cluster with only 1 node and a lot of data (500 GB). I want to add a new node with the same data (with a ReplicationFactor of 2).

The normal method is:
stop the node.
add a node.
change the replication factor to 2.
start the nodes.
use nodetool repair.

But I don't know whether this other method is valid, and whether it can be faster:
stop the node.
copy all SSTables.
change the replication factor.
start the nodes and use nodetool repair.

Do you have an idea of the fastest valid method?

Thx.
Re: Cassandra on flash storage
cassandra is designed to do less random i/o than b-tree based systems like tokyo cabinet, so ssds are not as useful for most workloads.

On Mon, Jun 7, 2010 at 8:37 AM, Héctor Izquierdo wrote:
> Hi everyone.
>
> I wanted to know if anybody has had any experience with cassandra on flash
> storage. At work we have a cluster of 6 machines running Tokyo Tyrant on
> flash-io drives (320 GB each), but performance is not what we expected, and
> we're having some issues with replication and availability. It's also hard
> to manage, and adding/removing nodes is pure hell.
>
> We can't afford test hardware with flash storage right now, so could
> somebody share his experience?
>
> Thank you very much
>
> Héctor Izquierdo

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Handling disk-full scenarios
And three days later, AE stages are still running full-bore. So I conclude this is not a very good approach.

I wonder what will happen when I lose a disk (which is essentially the same as what I did -- rm the data directory). What happens if I lose a disk while the AE stages are running? Since my RF is 3, I assume that I have data loss once three disks are gone.

Not very happy. I'm going to blow away what I have, do another reload, then try dropping a disk again, just to confirm the results... I can't really believe this is how it should happen.

Ian

On Fri, Jun 4, 2010 at 12:50 PM, Ian Soboroff wrote:
> Story continued, in hopes this experience is useful to someone...
>
> I shut down the node, removed the huge file, restarted the node, and told
> everybody to repair. Two days later, AE stages are still running.
>
> Ian
>
> On Thu, Jun 3, 2010 at 2:21 AM, Jonathan Ellis wrote:
>> this is why JBOD configuration is contraindicated for cassandra.
>> http://wiki.apache.org/cassandra/CassandraHardware
>>
>> On Tue, Jun 1, 2010 at 1:08 PM, Ian Soboroff wrote:
>>> My nodes have 5 disks and are using them separately as data disks. The
>>> usage on the disks is not uniform, and one is nearly full. Is there some
>>> way to manually balance the files across the disks? Pretty much anything
>>> done via nodetool incurs an anticompaction, which obviously fails.
>>> system/ is not the problem, it's in my data's keyspace.
>>>
>>> Ian
Re: Perl/Thrift/Cassandra strangeness
On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook wrote:

JS> The point is to get the "last" super-column.
...
JS> Is the Perl Thrift client problematic, or is there something else that
JS> I am missing?

Try Net::Cassandra::Easy; if it does what you want, look at the debug output or trace the code to see how the predicate is specified so you can duplicate that in your own code.

In general, yes, the Perl Thrift interface is problematic. It's slow and semantically inconsistent.

Ted
Cassandra won't start after node crash
Hello,

I've had a server crash, and after rebooting I cannot start the Cassandra instance; it's a one-node cluster. I'm running Cassandra 0.6.1 on Debian Linux and JRE 1.6.0_12.

Is my data lost? Should I recreate the DB?

The error message is:

 INFO 12:46:30,823 Auto DiskAccessMode determined to be standard
 INFO 12:46:31,084 Sampling index for /usr/local/cassandra/data/system/LocationInfo-9-Data.db
 INFO 12:46:31,084 Sampling index for /usr/local/cassandra/data/system/LocationInfo-10-Data.db
 INFO 12:46:31,084 Sampling index for /usr/local/cassandra/data/system/LocationInfo-11-Data.db
 INFO 12:46:31,135 Sampling index for /usr/local/cassandra/data/Empire/CampaignCampaignRuns-469-Data.db
 INFO 12:46:31,135 Sampling index for /usr/local/cassandra/data/Empire/CampaignCampaignRuns-470-Data.db
 INFO 12:46:31,135 Sampling index for /usr/local/cassandra/data/Empire/Open-85-Data.db
 INFO 12:46:35,772 Sampling index for /usr/local/cassandra/data/Empire/Open-106-Data.db
 INFO 12:46:36,864 Sampling index for /usr/local/cassandra/data/Empire/Open-283-Data.db
 INFO 12:46:37,228 Sampling index for /usr/local/cassandra/data/Empire/Open-372-Data.db
 INFO 12:46:37,436 Sampling index for /usr/local/cassandra/data/Empire/Open-526-Data.db
 INFO 12:46:37,644 Sampling index for /usr/local/cassandra/data/Empire/Open-535-Data.db
 INFO 12:46:37,644 Sampling index for /usr/local/cassandra/data/Empire/Open-536-Data.db
 INFO 12:46:37,644 Sampling index for /usr/local/cassandra/data/Empire/Open-537-Data.db
ERROR 12:46:37,644 Corrupt file /usr/local/cassandra/data/Empire/Open-537-Data.db; skipped
java.io.UTFDataFormatException: malformed input around byte 0
        at java.io.DataInputStream.readUTF(DataInputStream.java:639)
        at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
        at org.apache.cassandra.io.SSTableReader.loadIndexFile(SSTableReader.java:261)
        at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:125)
        at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:114)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:178)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:248)
        at org.apache.cassandra.db.Table.<init>(Table.java:338)
        at org.apache.cassandra.db.Table.open(Table.java:199)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:91)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)
 INFO 12:46:37,644 Sampling index for /usr/local/cassandra/data/Empire/CampaignRunClickStream-9-Data.db
 INFO 12:46:37,644 Sampling index for /usr/local/cassandra/data/Empire/CampaignRunClickStream-454-Data.db
 INFO 12:46:37,696 Sampling index for /usr/local/cassandra/data/Empire/CampaignRunOpenStream-9-Data.db
 INFO 12:46:37,696 Sampling index for /usr/local/cassandra/data/Empire/CampaignRunOpenStream-14-Data.db
 INFO 12:46:37,696 Sampling index for /usr/local/cassandra/data/Empire/CampaignRunOpenStream-27-Data.db
 INFO 12:46:37,748 Sampling index for /usr/local/cassandra/data/Empire/CampaignRunOpenStream-456-Data.db
ERROR 12:46:37,748 Corrupt file /usr/local/cassandra/data/Empire/CampaignRunOpenStream-456-Data.db; skipped
java.io.UTFDataFormatException: malformed input around byte 48
        at java.io.DataInputStream.readUTF(DataInputStream.java:617)
        at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
        at org.apache.cassandra.io.SSTableReader.loadIndexFile(SSTableReader.java:261)
        at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:125)
        at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:114)
        at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:178)
        at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:248)
        at org.apache.cassandra.db.Table.<init>(Table.java:338)
        at org.apache.cassandra.db.Table.open(Table.java:199)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:91)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)
 INFO 12:46:37,748 Sampling index for /usr/local/cassandra/data/Empire/Click-21-Data.db
 INFO 12:46:38,788 Sampling index for /usr/local/cassandra/data/Empire/Click-26-Data.db
 INFO 12:46:39,048 Sampling index for /usr/local/cassandra/data/Empire/Click-259-Data.db
 INFO 12:46:39,412 Sampling index for /usr/local/cassandra/data/Empire/Click-476-Data.db
 INFO 12:46:39,464 Sampling index for /usr/local/cassandra/data/Empire/Click-477-Data.db
 INFO 12:46:39,464 Sampling index for /usr/local/cassandra/data/Empire/Click-478-Data.db
 INFO 12:46:39,464 Sampling index for /usr/local/cassandra/data/Empire/CampaignRunUniqueOpen-9-Data.db
 INFO 12:46:39,464 Sampling index for /usr/local/cassandr
Re: Tree Search in Cassandra
On Tue, Jun 8, 2010 at 1:28 AM, David Boxenhorn wrote:
> As I said above, I was wondering if I could come up with a robust algorithm,
> e.g. creating the new super columns and then attaching them at the end,
> which will not FUBAR my index if it fails.

Is this append-only? That is, never delete or insert in the middle? If so, it might be easier to build something like this.

-+ Tatu +-
Re: Tree Search in Cassandra
No, there will be deletes and inserts in the middle. But I can assume that the index will only grow. There will be few deletes.

On Tue, Jun 8, 2010 at 7:04 PM, Tatu Saloranta wrote:
> Is this append-only? That is, never delete or insert in the middle? If
> so, it might be easier to build something like this.
>
> -+ Tatu +-
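Given that deletes are rare, one low-risk option - an editor's sketch in the same hypothetical helper style as the split above, not something proposed in the thread - is lazy deletion: overwrite the entry with a tombstone marker, so that splitting remains the only structural mutation and buckets never merge.

def delete_entry(insert, node_row, bucket, entry_key):
    # Mark rather than restructure: readers filter out tombstoned values,
    # and buckets never shrink or merge, trading some wasted space for
    # never running a risky merge path.
    insert(node_row, bucket, {entry_key: '__TOMBSTONE__'})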
Re: Perl/Thrift/Cassandra strangeness
I was misreading the result with the original slice range. I should have been expecting exactly 2 ColumnOrSuperColumns, which is what I got. I was erroneously expecting only 1.

Thanks!
Jonathan

2010/6/8 Ted Zlatanov:
> Try Net::Cassandra::Easy; if it does what you want, look at the debug
> output or trace the code to see how the predicate is specified so you
> can duplicate that in your own code.
>
> In general, yes, the Perl Thrift interface is problematic. It's slow and
> semantically inconsistent.
>
> Ted
Re: Duplicate a node (replication).
yes, if you're going from 1 to 2 then

1. nodetool drain & stop original node
2. copy everything from *your keyspaces* in the data/ directories (but not the system keyspace!) to the new node
3. start both nodes with replicationfactor=2 and autobootstrap=false [the default]

will be faster.

On Tue, Jun 8, 2010 at 7:12 AM, xavier manach wrote:
> I have a cluster with only 1 node and a lot of data (500 GB).
> I want to add a new node with the same data (with a ReplicationFactor of 2).
> [...]

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
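Step 2 amounts to copying every keyspace directory except system. A Python sketch of that copy; the source and destination paths are assumptions, so substitute your own DataFileDirectory and however you stage files onto the new node:

import os, shutil

DATA_DIR = '/var/lib/cassandra/data'    # assumed DataFileDirectory
DEST_DIR = '/mnt/newnode/data'          # assumed staging area for the new node

for keyspace in os.listdir(DATA_DIR):
    if keyspace == 'system':
        # the system keyspace holds this node's identity (token etc.);
        # copying it would make the new node impersonate the old one
        continue
    src = os.path.join(DATA_DIR, keyspace)
    if os.path.isdir(src):
        shutil.copytree(src, os.path.join(DEST_DIR, keyspace))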
Re: Compaction bringing a node to its knees
I'm curious, did this help at all?

On Sat, May 29, 2010 at 3:03 PM, Jonathan Ellis wrote:
> You could try setting the compaction thread to a lower priority. You
> could add a thread priority to NamedThreadPool, and pass that up from
> the CompactionExecutor constructor. According to
> http://www.javamex.com/tutorials/threads/priority_what.shtml you have
> to run as root and add a JVM option to get this to work.
>
> On Sat, May 29, 2010 at 2:55 PM, James Golick wrote:
>> I just experienced a compaction that brought a node to 100% of its IO
>> capacity and made its responses incredibly slow.
>> It wasn't enough to make the node actually appear as down, though, so it
>> slowed down the operation of the cluster considerably.
>> The CF being compacted contains a lot of relatively wide rows (hundreds of
>> thousands or millions of columns on the big end). Is that the problem?
>> Any suggestions on how to minimize impact here?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Perl/Thrift/Cassandra strangeness
Possible bug...

Using a slice range with the empty sentinel values and a count of 1 sometimes yields 2 ColumnOrSuperColumns, sometimes 1. The inconsistency had led me to believe that the count was not working, hence the additional confusion.

There was a particular key which returned exactly 2 ColumnOrSuperColumns. This happened repeatedly, even when other data was inserted before or after. All of the other keys were returning the expected 1 ColumnOrSuperColumn.

Once I added a 4th super column to the key in question, it started behaving the same as the others, yielding exactly 1 ColumnOrSuperColumn.

Here is the code for the predicate:

my $predicate = new Cassandra::SlicePredicate();
my $slice_range = new Cassandra::SliceRange();
$slice_range->{start} = '';
$slice_range->{finish} = '';
$slice_range->{reversed} = 1;
$slice_range->{count} = 1;
$predicate->{slice_range} = $slice_range;

The columns are in the right order (reversed), so I'll get what I need by accessing only the first result in each slice. If I wanted to iterate the returned list of slices, it would manifest as a bug in my client.

(Cassandra 0.6.1/Thrift/Perl)

On Tue, Jun 8, 2010 at 11:18 AM, Jonathan Shook wrote:
> I was misreading the result with the original slice range.
> I should have been expecting exactly 2 ColumnOrSuperColumns, which is
> what I got. I was erroneously expecting only 1.
> [...]
Re: Perl/Thrift/Cassandra strangeness
that does sound like a bug. can you give us the data to insert that allows reproducing this?

On Tue, Jun 8, 2010 at 10:20 AM, Jonathan Shook wrote:
> Possible bug...
>
> Using a slice range with the empty sentinel values and a count of 1
> sometimes yields 2 ColumnOrSuperColumns, sometimes 1.
> [...]
> (Cassandra 0.6.1/Thrift/Perl)

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Perl/Thrift/Cassandra strangeness
I can't divulge this particular test data, as it was borrowed from a dataset which is not public. I will see if I can reproduce the scenario, however, using other data suitable for a bug report.

On Tue, Jun 8, 2010 at 2:18 PM, Jonathan Ellis wrote:
> that does sound like a bug. can you give us the data to insert that
> allows reproducing this?
> [...]
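For the eventual bug report, a minimal reproduction skeleton in Python against the 0.6-era Thrift bindings might look like the following. The keyspace, column family, and key are placeholders, and the generated module names may differ depending on how the Thrift bindings were built:

from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import (ColumnParent, SlicePredicate, SliceRange,
                              ConsistencyLevel)

socket = TSocket.TSocket('localhost', 9160)
transport = TTransport.TBufferedTransport(socket)
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

# Same predicate as the Perl version above: unbounded reversed slice
# with count=1, which should return at most one super column.
predicate = SlicePredicate(slice_range=SliceRange(
    start='', finish='', reversed=True, count=1))
result = client.get_slice('Keyspace1', 'problem-key',
                          ColumnParent(column_family='Super1'),
                          predicate, ConsistencyLevel.ONE)
print len(result)  # the reported bug: occasionally 2 instead of 1
transport.close()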
Re: Handling disk-full scenarios
Sounds like you ran into https://issues.apache.org/jira/browse/CASSANDRA-1169. The only workaround until that is fixed is to re-run repair.

On Tue, Jun 8, 2010 at 7:17 AM, Ian Soboroff wrote:
> And three days later, AE stages are still running full-bore. So I conclude
> this is not a very good approach.
>
> I wonder what will happen when I lose a disk (which is essentially the same
> as what I did -- rm the data directory). What happens if I lose a disk
> while the AE stages are running? Since my RF is 3, I assume that I have
> data loss once three disks are gone.
> [...]

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Cassandra 0.6.2 with thrift hector-0.6.0-13
Java transports buffer internally; there is no TBufferedTransport the way there is in C#.

(moving to user@)

On Tue, Jun 8, 2010 at 10:31 AM, Subrata Roy wrote:
> We are using Cassandra 0.6.2 with the hector/thrift client, and our
> application performance is really slow. We are not sure whether it is
> because of the hector/thrift connection or not. Jonathan E. and other
> people have suggested that by using "TBufferedTransport(TSocket) instead
> of a TSocket directly" performance has drastically improved. If that is
> the case, why is TBufferedTransport(TSocket) not added as part of the
> hector client by default? Is there any technical reason not to add it as
> part of the Cassandra 0.6.2 / hector-0.6.0-13 release?
>
> Thanks in advance for your valuable input.
>
> Regards
> Subrata
>
> /// Code snippet from Hector client: CassandraClientFactory.java
>
> private Cassandra.Client createThriftClient(String url, int port)
>     throws TTransportException, TException {
>   log.debug("Creating a new thrift connection to {}:{}", url, port);
>   TTransport tr;
>   if (useThriftFramedTransport) {
>     tr = new TFramedTransport(new TSocket(url, port, timeout));
>   } else {
>     tr = new TSocket(url, port, timeout); // --- change to TBufferedTransport() for better performance
>   }
>   TProtocol proto = new TBinaryProtocol(tr);
>   Cassandra.Client client = new Cassandra.Client(proto);
>   try {
>     tr.open();
>   } catch (TTransportException e) {
>     // Thrift exceptions aren't very good in reporting, so we have to
>     // catch the exception here and add details to it.
>     log.error("Unable to open transport to " + url + ":" + port, e);
>     clientMonitor.incCounter(Counter.CONNECT_ERROR);
>     throw new TTransportException("Unable to open transport to " + url
>         + ":" + port + " , " + e.getLocalizedMessage(), e);
>   }
>   return client;
> }

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Cassandra won't start after node crash
Sounds like you had some bad hardware take down your index files. (Cassandra fsyncs them after writing them and before renaming them to being live, so if it's missing pieces then it's always been hardware at fault, in the cases I have seen.)

You could try rebuilding your index files from the data files, but they may be toast, too. So: step 1, run bin/sstable2json to make sure your data files are actually okay. Step 2, rebuild your index files from your data files.

I can never muster up the energy to make an index rebuilder in Java. So here's one in Python. (I recommend testing this on an sstable + index pair that are known to be good, before trusting it to rebuild a damaged index. In particular I think it might be broken with a 32-bit Python instead of 64-bit. Works On My Machine!)

# usage: buildindex.py <data-file> <index-file>
import sys, struct, stat, os

infname, outfname = sys.argv[1:3]
if '-Data' not in infname:
    raise Exception('%s does not look like a Cassandra data filename' % infname)
inf = open(infname, 'rb')
outf = open(outfname, 'wb')
fsize = os.stat(infname)[stat.ST_SIZE]

while inf.tell() < fsize:
    # read current row key and write index entry
    dataposition = inf.tell()
    keysize, = struct.unpack('>H', inf.read(2))
    key = inf.read(keysize)
    outf.write(struct.pack('>H', keysize))
    outf.write(key)
    outf.write(struct.pack('>q', dataposition))
    # skip to the next row
    datasize, = struct.unpack('>i', inf.read(4))
    inf.seek(inf.tell() + datasize)

On Tue, Jun 8, 2010 at 8:56 AM, Lucas Di Pentima wrote:
> Hello,
>
> I've had a server crash, and after rebooting I cannot start the Cassandra
> instance; it's a one-node cluster. I'm running Cassandra 0.6.1 on Debian
> Linux and JRE 1.6.0_12.
>
> Is my data lost? Should I recreate the DB?
> [...]
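If the script above is saved as buildindex.py, a run against the first corrupt file from the log would look something like: python buildindex.py /usr/local/cassandra/data/Empire/Open-537-Data.db /usr/local/cassandra/data/Empire/Open-537-Index.db. The -Index output name mirrors Cassandra's file-naming convention; work on copies, and verify the data file with bin/sstable2json first, as suggested above.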
Re: Getting keys in a range sorted with respect to last access time
On Mon, Jun 7, 2010 at 9:04 AM, Utku Can Topçu wrote:
> Hey All,
>
> First of all I'll start with some questions on the default behavior of the
> get_range_slices method defined in the thrift API.
>
> Given a keyrange with start-key "kstart" and end-key "kend", assuming
> kstart < kend:
>
> * Is it true that I'll get the range [kstart,kend) (kstart inclusive, kend
> exclusive)?

[start, end] -- both ends are inclusive.

> * What's the default order of the rows in the result list? (assuming I am
> using an OPP)

lexically by unicode code point

> * (How) can we reverse the sorting order?

write your own ReversedOPP. but maybe you mean "how do we scan in reversed order," in which case the answer is, "extend ColumnFamilyStore.getRangeRows" (not for the faint of heart, but not impossible).

> * What would be the behavior in the case kstart > kend? Will I get an empty
> result list?

pretty sure it will error out. easy to verify experimentally.

> Secondly, I have a use case where I need to access the latest updated rows.
> How can this be possible? Writing a new partitioner?

No. You'd want to maintain a separate row containing the most recent updates.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
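That "separate row" can be as simple as one well-known row whose column names are timestamps, read back with a reversed slice. A sketch using the 0.6-era Python Thrift bindings; the row key and column family are illustrative, and the CF is assumed to compare columns as raw bytes (BytesType) or longs so that packed timestamps sort chronologically:

import time, struct
from cassandra.ttypes import (ColumnPath, ColumnParent, SlicePredicate,
                              SliceRange, ConsistencyLevel)

RECENT_ROW = 'recent-updates'  # assumed well-known row key

def touch(client, keyspace, updated_key):
    # record "updated_key was touched now"; big-endian packing keeps
    # the column names in chronological order under BytesType
    now = int(time.time() * 1000000)
    client.insert(keyspace, RECENT_ROW,
                  ColumnPath(column_family='Standard1',
                             column=struct.pack('>q', now)),
                  updated_key, now, ConsistencyLevel.QUORUM)

def latest(client, keyspace, n=10):
    # reversed slice: the newest n columns, i.e. most recently updated rows
    predicate = SlicePredicate(slice_range=SliceRange(
        start='', finish='', reversed=True, count=n))
    cols = client.get_slice(keyspace, RECENT_ROW,
                            ColumnParent(column_family='Standard1'),
                            predicate, ConsistencyLevel.QUORUM)
    return [c.column.value for c in cols]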
Re: Performance Characteristics of CASSANDRA-16 (Memory Efficient Compactions)
of course. compaction is always O(N) with the size of the data.

On Mon, Jun 7, 2010 at 9:51 AM, Jeremy Davis wrote:
> Reads, ok.. What about compactions? Is the cost of compacting going to be
> ever-increasing with the number of columns?
>
> On Sat, Jun 5, 2010 at 7:30 AM, Jonathan Ellis wrote:
>> #16 is very simple: it allows you to make very large rows. That is all.
>>
>> Other things being equal, doing reads from really big rows will be
>> slower (since the row index will take longer to read) and this patch
>> does not change this.
>>
>> On Fri, Jun 4, 2010 at 5:47 PM, Jeremy Davis wrote:
>>> https://issues.apache.org/jira/browse/CASSANDRA-16
>>>
>>> Can someone (Jonathan?) help me understand the performance
>>> characteristics of this patch?
>>> Specifically: If I have an open-ended CF, and I keep inserting with
>>> ever-increasing column names (for example current time), will things
>>> generally work out ok performance-wise? Or will I pay some
>>> ever-increasing penalty with the number of entries?
>>>
>>> My assumption is that you have bucketed things up for me by column name
>>> order, and as long as I don't delete/modify/create a column in one of
>>> the old buckets, then things will work out ok. Or is this not at all
>>> what is going on?
>>>
>>> Thanks,
>>> -JD

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Cassandra on flash storage
But I think using SSDs can boost read performance, which is the main problem for us with Cassandra right now.

On Tue, Jun 8, 2010 at 10:16 PM, Jonathan Ellis wrote:
> cassandra is designed to do less random i/o than b-tree based systems
> like tokyo cabinet, so ssds are not as useful for most workloads.
> [...]
Re: Re: Range search on keys not working?
what's the mean of opp? And How can i make the "start" and "finish" useful and make sense? 2010-06-09 9527 发件人: Ben Browning 发送时间: 2010-06-02 21:08:57 收件人: user 抄送: 主题: Re: Range search on keys not working? They exist because when using OPP they are useful and make sense. On Wed, Jun 2, 2010 at 8:59 AM, David Boxenhorn wrote: > So why do the "start" and "finish" range parameters exist? > > On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning wrote: >> >> Martin, >> >> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller >> wrote: >> > I think you can specify an end key, but it should be a key which does >> > exist >> > in your column family. >> >> >> Logically, it doesn't make sense to ever specify an end key with >> random partitioner. If you specified a start key of "aaa" and and end >> key of "aac" you might get back as results "aaa", "zfc", "hik", etc. >> And, even if you have a key of "aab" it might not show up. Key ranges >> only make sense with order-preserving partitioner. The only time to >> ever use a key range with random partitioner is when you want to >> iterate over all keys in the CF. >> >> Ben >> >> >> > But maybe I'm off the track here and someone else here knows more about >> > this >> > key range stuff. >> > >> > Martin >> > >> > >> > From: David Boxenhorn [mailto:da...@lookin2.com] >> > Sent: Wednesday, June 02, 2010 2:30 PM >> > To: user@cassandra.apache.org >> > Subject: Re: Range search on keys not working? >> > >> > In other words, I should check the values as I iterate, and stop >> > iterating >> > when I get out of range? >> > >> > I'll try that! >> > >> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller >> > wrote: >> >> >> >> When not using OOP, you should not use something like 'CATEGORY/' as >> >> the >> >> end key. >> >> Use the empty string as the end key and limit the number of returned >> >> keys, >> >> as you did with >> >> the 'max' value. >> >> >> >> If I understand correctly, the end key is used to generate an end token >> >> by >> >> hashing it, and >> >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/' >> >> as >> >> for >> >> hash('CATEGORY') and hash('CATEGORY/'). >> >> >> >> At least, this was the explanation I gave myself when I had the same >> >> problem. >> >> >> >> The solution is to iterate through the keys by always using the last >> >> key >> >> returned as the >> >> start key for the next call to get_range_slices, and the to drop the >> >> first >> >> element from >> >> the result. >> >> >> >> HTH, >> >> Martin >> >> >> >> >> >> From: David Boxenhorn [mailto:da...@lookin2.com] >> >> Sent: Wednesday, June 02, 2010 2:01 PM >> >> To: user@cassandra.apache.org >> >> Subject: Re: Range search on keys not working? >> >> >> >> The previous thread where we discussed this is called, "key is sorted?" >> >> >> >> >> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn >> >> wrote: >> >>> >> >>> I'm not using OPP. But I was assured on earlier threads (I asked >> >>> several >> >>> times to be sure) that it would work as stated below: the results >> >>> would not >> >>> be ordered, but they would be correct. >> >>> >> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt >> >>> wrote: >> >> Sounds like you are not using an order preserving partitioner? >> >> On Wed, Jun 2, 2010 at 13:48, David Boxenhorn >> wrote: >> > Range search on keys is not working for me. I was assured in >> > earlier >> > threads >> > that range search would work, but the results would not be ordered. 
>> > >> > I'm trying to get all the rows that start with "CATEGORY." >> > >> > I'm doing: >> > >> > String start = "CATEGORY."; >> > . >> > . >> > . >> > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start, >> > "CATEGORY/", max) >> > . >> > . >> > . >> > >> > in a loop, setting start to the last key each time - but I'm >> > getting >> > rows >> > that don't start with "CATEGORY."!! >> > >> > How do I get all rows that start with "CATEGORY."? >> >>> >> >> >> > >> > > > __ Information from ESET NOD32 Antivirus, version of virus signature database 5164 (20100601) __ The message was checked by ESET NOD32 Antivirus. http://www.eset.com
Data loss and corruption
Hi all,

We're starting to prototype Cassandra for use in a production system and became concerned about data corruption after reading the excellent article:

http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/

where Evan Weaver writes:

"Cassandra is an alpha product and could, theoretically, lose your data. In particular, if you change the schema specified in the storage-conf.xml file, you must follow these instructions carefully, or corruption will occur (this is going to be fixed). Also, the on-disk storage format is subject to change, making upgrading a bit difficult."

Is database corruption a well-known or common problem with Cassandra? What sources of information would you recommend to help devise a strategy to minimize corruption risk, and to detect and recover when corruption does occur?

Thanks,
Hector Urroz
Seeds and AutoBoostrap
Hi,

Just a quick question on seed nodes and auto bootstrap. Am I correct that a seed node won't be able to AutoBootstrap? And if so, won't a seed node newly added to an existing cluster take a long time before it actually starts getting any work? I mean, if it doesn't start by moving some data to itself, it will have to wait until new data comes in that is determined to live on that new node. Correct?

/Per