DataStax Brisk
How far does Brisk lag behind the Cassandra release cycle? If Cassandra 0.8.1 was released yesterday, when (if it hasn't already) will the Brisk distribution pick up 0.8.1? -sd -- Sasha Dolgy sasha.do...@gmail.com
word_count example in Cassandra 0.8.0
Hello, I am running into the following problem: I am running a single-node Cassandra setup (out of the box, so to speak) and was trying out the code in apache-cassandra-0.8.0-src/examples/hadoop_word_count. The bin/word_count_setup seems to work fine, as cassandra-cli reports that there are 1000 rows when I do list input_words limit 2000; (after connecting via connect 127.0.0.1/9160 and use wordcount;) However, after running bin/word_count it seems the reducer is not writing into Cassandra, as list output_words returns 0 rows. When setting the output reducer to filesystem I get results in /tmp/word_count0, /tmp/word_count1, etc. Has anybody observed the same problem and have an idea what might be wrong? Thanks. -- Markus
Re: Re: Re: get_range_slices result
What I want is to get the records back in the same order in which they were inserted. How can I get this with any comparator type? If there is Java code for this it would be useful. From: aaron morton To: user@cassandra.apache.org Sent: Tuesday, 28 June 2011, 12:40 Subject: Re: Re: Re: get_range_slices result First thing is you really should upgrade from 0.6, the current release is 0.8. Info on time UUIDs: http://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java If you are using a higher-level client like Hector or Pelops it will take care of encoding for you. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 28 Jun 2011, at 22:20, karim abbouh wrote: Can I have an example of using TimeUUIDType as a comparator in Java client code? > >From: karim abbouh >To: "user@cassandra.apache.org" >Sent: Monday, 27 June 2011, 17:59 >Subject: Re: Re: get_range_slices result > >I used TimeUUIDType as the type in the storage-conf.xml file >and I used it as the comparator in my Java code, >but at execution time I get an exception: > >Error -- java.io.UnsupportedEncodingException: TimeUUIDType > >How should I write it? > >BR > >From: David Boxenhorn >To: user@cassandra.apache.org >Cc: karim abbouh >Sent: Friday, 24 June 2011, 11:25 >Subject: Re: Re: get_range_slices result > >You can get the best of both worlds by repeating the key in a column, >and creating a secondary index on that column. > >On Fri, Jun 24, 2011 at 1:16 PM, Sylvain Lebresne wrote: >> On Fri, Jun 24, 2011 at 10:21 AM, karim abbouh wrote: >>> I want the get_range_slices() function to return records sorted (ordered) by the >>> key (rowId) used during the insertion. >>> Is that possible? >> >> You will have to use the OrderPreservingPartitioner. This is not >> without inconvenience, however. >> See for instance >> http://wiki.apache.org/cassandra/StorageConfiguration#line-100 or >> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ >> which give more details on the pros and cons (the short version being >> that the main advantage of >> OrderPreservingPartitioner is what you're asking for, but its main >> drawback is that load-balancing >> the cluster will likely be very, very hard). >> >> In general the advice is to stick with RandomPartitioner and design a >> data model that avoids needing >> range slices (or at least needing the result to be sorted). This is >> very often not too hard, more >> efficient, and much simpler than dealing with the load balancing >> problems of OrderPreservingPartitioner. >> >> -- >> Sylvain >> >>> >>> From: aaron morton >>> To: user@cassandra.apache.org >>> Sent: Thursday, 23 June 2011, 20:30 >>> Subject: Re: get_range_slices result >>> >>> Not sure what your question is. >>> Does this help? http://wiki.apache.org/cassandra/FAQ#range_rp >>> Cheers >>> - >>> Aaron Morton >>> Freelance Cassandra Developer >>> @aaronmorton >>> http://www.thelastpickle.com >>> On 23 Jun 2011, at 21:59, karim abbouh wrote: >>> >>> How can the get_range_slices() function return keys in sorted order? >>> BR
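The UnsupportedEncodingException above usually means the comparator name was passed to String.getBytes() as if it were a character encoding; with a TimeUUIDType comparator the column name has to be the raw 16 bytes of a version-1 UUID. A minimal sketch of the FAQ approach, assuming the com.eaio UUID library the FAQ references is on the classpath (Hector and Pelops ship their own TimeUUID helpers that do the same):

    import java.nio.ByteBuffer;
    import java.util.UUID;

    public class TimeUUIDExample {
        // Wrap a freshly generated version-1 (time-based) UUID in java.util.UUID.
        public static UUID getTimeUUID() {
            return UUID.fromString(new com.eaio.uuid.UUID().toString());
        }

        // Thrift column names are raw bytes; TimeUUIDType expects the 16 UUID bytes.
        public static byte[] asBytes(UUID uuid) {
            ByteBuffer buf = ByteBuffer.allocate(16);
            buf.putLong(uuid.getMostSignificantBits());
            buf.putLong(uuid.getLeastSignificantBits());
            return buf.array();
        }
    }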
Row cache
Hi, I am running Cassandra 0.7.4 and I monitor the nodes using JConsole. I am trying to figure out where Cassandra reads the returned rows from, and there are a few strange things... 1. I am reading a few rows (using Hector) and the org.apache.cassandra.db.ColumnFamilies...ReadCount remains 0 - it remains 0 with the memtable and after flushing the memtable. 2. The column family is configured to run with row cache and key cache, and although I am reading the same row over and over the row-cache size/requests remain 0. The key-cache size/requests attributes do change. Why does Cassandra not cache a row that was requested a few times? What does the ReadCount attribute in ColumnFamilies indicate, and why does it remain zero? How can I know where Cassandra read a row from (the memtable, the row cache or an SSTable)? Is the following correct? In a read operation Cassandra looks for the row in the memtable - if not found it looks in the row cache - if not found it looks in the SSTable (after looking in the key cache to optimize the access to the SSTable)? 10x
Re: hadoop results
I think I'll do the former, thanks! On Wed, Jun 29, 2011 at 11:16 PM, aaron morton wrote: > How about get_slice() with reversed == true and count = 1 to get the > highest time UUID ? > > Or you can also store a column with a magic name that have the value of the > timeuuid that is the current metric to use. > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 30 Jun 2011, at 06:35, William Oberman wrote: > > > I'll start with my question: given a CF with comparator TimeUUIDType, > what is the most efficient way to get the greatest column's value? > > > > Context: I've been running cassandra for a couple of months now, so > obviously it's time to start layering more on top :-) In my test > environment, I managed to get pig/hadoop running, and developed a few > scripts to collect metrics I've been missing since I switched from MySQL to > cassandra (including the ever useful "select count(*) from table" > equivalent). > > > > I was hoping to dump the results of this processing back into cassandra > for use in other tools/processes. My initial thought was: new CF called > "stats" with comparator TimeUUIDType. The basic idea being I'd store: > > stat_name -> time stat was computed (as UUID) -> value > > That way I can also see a historical perspective of any given stat for > auditing (and for cumulative stats to see trends). The stat_name itself is > a URI that is composed of "what" and any constraints on the "what" > (including an optional time range, if the stat supports it). E.g. > ClassOfSomething/ID/MetricName/OptionalTimeRange (or something, still > deciding on the format of the URI). But, right now, the only way I know to > get the "current" stat value would be to iterate over all columns (the > TimeUUIDs) and then return the last one. > > > > Thanks for any tips, > > > > will > > -- Will Oberman Civic Science, Inc. 3030 Penn Avenue., First Floor Pittsburgh, PA 15201 (M) 412-480-7835 (E) ober...@civicscience.com
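A rough sketch of the reversed slice Aaron suggests, written against Hector's 0.7/0.8-era API (the class and method names here are assumptions to verify against your client version):

    import java.util.UUID;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.serializers.UUIDSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.QueryResult;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class LatestStat {
        // Newest column of a TimeUUIDType-compared row, or null if the row is empty.
        public static HColumn<UUID, String> latest(Keyspace ks, String cf, String statName) {
            SliceQuery<String, UUID, String> q = HFactory.createSliceQuery(
                    ks, StringSerializer.get(), UUIDSerializer.get(), StringSerializer.get());
            q.setColumnFamily(cf);
            q.setKey(statName);
            q.setRange(null, null, true, 1);  // reversed = true, count = 1
            QueryResult<ColumnSlice<UUID, String>> r = q.execute();
            return r.get().getColumns().isEmpty() ? null : r.get().getColumns().get(0);
        }
    }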
Re: word_count example in Cassandra 0.8.0
fixed in 0.8.1. https://issues.apache.org/jira/browse/CASSANDRA-2727 On Thu, Jun 30, 2011 at 3:09 AM, Markus Mock wrote: > Hello, > I am running into the following problem: I am running a single node > cassandra setup (out of the box so to speak) and was trying out the code > in apache-cassandra-0.8.0-src/examples/hadoop_word_count. > The bin/word_count_setup seems to work fine as cassandra-cli reports that > there are 1000 rows when I do > list input_words limit 2000; > (after connecting via connect 127.0.01/9160 and use wordcount;) > However, after running bin/word_count it seems the reducer is not writing > into cassandra as list output_words returns 0 rows. > When setting the output reducer to filesystem I get results in > /tmp/word_count0 /tmp/word_count1 etc . > Has anybody observed the same problem and has an idea what might be wrong? > Thanks. > -- Markus > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
SimpleAuthenticator
Hi, I am encountering an error while trying to set up simple authentication in a test environment. BACKGROUND Cassandra Version: ReleaseVersion: 0.7.2-0ubuntu4~lucid1 OS Level: Linux cassandra1 2.6.32-32-server #62-Ubuntu SMP Wed Apr 20 22:07:43 UTC 2011 x86_64 GNU/Linux 2 node cluster Properties files exist in the following directory: > /etc/cassandra/access.properties > /etc/cassandra/passwd.properties The authenticator element in the /etc/cassandra/cassandra.yaml file is set to: authenticator: org.apache.cassandra.auth.SimpleAuthenticator The authority element in the /etc/cassandra/cassandra.yaml file is set to: authority: org.apache.cassandra.auth.SimpleAuthority The cassandra.in.sh file located in /usr/share/cassandra has been updated to show the location of the properties files in the following manner: # Location of access.properties and passwd.properties JVM_OPTS=" -Dpasswd.properties=/etc/cassandra/passwd.properties -Daccess.properties=/etc/cassandra/access.properties" Also, the destination of the configuration directory: CASSANDRA_CONF=/etc/cassandra ERROR After setting DEBUG mode, I get the following error message in the system.log: INFO [main] 2011-06-30 10:12:01,365 AbstractCassandraDaemon.java (line 249) Cassandra shutting down... INFO [main] 2011-06-30 10:12:01,366 CassandraDaemon.java (line 159) Stop listening to thrift clients INFO [main] 2011-06-30 10:13:14,186 AbstractCassandraDaemon.java (line 77) Logging initialized INFO [main] 2011-06-30 10:13:14,196 AbstractCassandraDaemon.java (line 97) Heap size: 510263296/511311872 WARN [main] 2011-06-30 10:13:14,227 CLibrary.java (line 93) Obsolete version of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later WARN [main] 2011-06-30 10:13:14,227 CLibrary.java (line 93) Obsolete version of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later WARN [main] 2011-06-30 10:13:14,228 CLibrary.java (line 125) Unknown mlockall error 0 INFO [main] 2011-06-30 10:13:14,234 DatabaseDescriptor.java (line 121) Loading settings from file:/etc/cassandra/cassandra.yaml INFO [main] 2011-06-30 10:13:14,337 DatabaseDescriptor.java (line 181) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap ERROR [main] 2011-06-30 10:13:14,342 DatabaseDescriptor.java (line 405) Fatal configuration error org.apache.cassandra.config.ConfigurationException: When using org.apache.cassandra.auth.SimpleAuthenticator passwd.properties properties must be defined. at org.apache.cassandra.auth.SimpleAuthenticator.validateConfiguration(SimpleAuthenticator.java:148) at org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:200) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:100) at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:217) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160) Data from the output.log: INFO 10:12:01,365 Cassandra shutting down... INFO 10:12:01,366 Stop listening to thrift clients INFO 10:13:14,186 Logging initialized INFO 10:13:14,196 Heap size: 510263296/511311872 WARN 10:13:14,227 Obsolete version of JNA present; unable to read errno. 
Upgrade to JNA 3.2.7 or later WARN 10:13:14,227 Obsolete version of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later WARN 10:13:14,228 Unknown mlockall error 0 INFO 10:13:14,234 Loading settings from file:/etc/cassandra/cassandra.yaml INFO 10:13:14,337 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap ERROR 10:13:14,342 Fatal configuration error org.apache.cassandra.config.ConfigurationException: When using org.apache.cassandra.auth.SimpleAuthenticator passwd.properties properties must be defined. at org.apache.cassandra.auth.SimpleAuthenticator.validateConfiguration(SimpleAuthenticator.java:148) at org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:200) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:100) at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:217) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160) When using org.apache.cassandra.auth.S
RE: custom reconciling columns?
The reason to break it up is that the information will then be on different servers, so you can have server 1 spending time retrieving row 1, while you have server 2 retrieving row 2, and server 3 retrieving row 3... So instead of getting 3000 things from one server, you get 1000 from 3 servers in parallel... From: Yang [mailto:tedd...@gmail.com] Sent: Wednesday, June 29, 2011 12:07 AM To: user@cassandra.apache.org Subject: Re: custom reconciling columns? ok, here is the profiling result. I think this is consistent (having been trying to recover how to effectively use yourkit ...) see attached picture since I actually do not use the thrift interface, but just directly use the thrift.CassandraServer and run my code in the same JVM as cassandra, and was running the whole thing on a single box, there is no message serialization/deserialization cost. but more columns did add on to more time. the time was spent in the ConcurrentSkipListMap operations that implement the memtable. regarding breaking up the row, I'm not sure it would reduce my run time, since our requirement is to read the entire rolling window history (we already have the TTL enabled , so the history is limited to a certain length, but it is quite long: over 1000 , in some cases, can be 5000 or more ) . I think accessing roughly 1000 items is not an uncommon requirement for many applications. in our case, each column has about 30 bytes of data, besides the meta data such as ttl, timestamp. at history length of 3000, the read takes about 12ms (remember this is completely in-memory, no disk access) I just took a look at the expiring column logic, it looks that the expiration does not come into play until when the CassandraServer.internal_get()===>thriftifyColumns() gets called. so the above memtable access time is still spent. yes, then breaking up the row is going to be helpful, but only to the degree of preventing accessing expired columns (btw if this is actually built into cassandra code it would be nicer, so instead of spending multiple key lookups, I locate to the row once, and then within the row, there are different "generation" buckets, so those old generation buckets that are beyond expiration are not read ); currently just accessing the 3000 live columns is already quite slow. I'm trying to see whether there are some easy magic bullets for a drop-in replacement for concurrentSkipListMap... Yang On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall wrote: I agree with Aaron's suggestion on data model and query here. Since there is a time component, you can split the row on a fixed duration for a given user, so the row key would become userId_[timestamp rounded to day]. This provides you an easy way to roll up the information for the date ranges you need since the key suffix can be created without a read. This also benefits from spreading the read load over the cluster instead of just the replicas since you have 30 rows in this case instead of one. On Tue, Jun 28, 2011 at 5:55 PM, aaron morton wrote: > Can you provide some more info: > - how big are the rows, e.g. number of columns and column size ? > - how much data are you asking for ? > - what sort of read query are you using ? > - what sort of numbers are you seeing ? > - are you deleting columns or using TTL ? > I would consider issues with the data churn, data model and query before > looking at serialisation. 
> Cheers > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > On 29 Jun 2011, at 10:37, Yang wrote: > > I can see that as my user history grows, the reads time proportionally ( or > faster than linear) grows. > if my business requirements ask me to keep a month's history for each user, > it could become too slow.- I was suspecting that it's actually the > serializing and deserializing that's taking time (I can definitely it's cpu > bound) > > > On Tue, Jun 28, 2011 at 3:04 PM, aaron morton > wrote: >> >> There is no facility to do custom reconciliation for a column. An append >> style operation would run into many of the same problems as the Counter >> type, e.g. not every node may get an append and there is a chance for lost >> appends unless you go to all the trouble Counter's do. >> >> I would go with using a row for the user and columns for each item. Then >> you can have fast no look writes. >> >> What problems are you seeing with the reads ? >> >> Cheers >> >> >> - >> Aaron Morton >> Freelanc
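For illustration, a small sketch of the day-bucket row keys Nate describes (the userId_[yyyyMMdd] format is just one possible convention, not anything Cassandra mandates):

    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;
    import java.util.TimeZone;

    public class DayBuckets {
        private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

        // Row key for the day bucket a given event timestamp falls into.
        public static String rowKey(String userId, long timestampMillis) {
            SimpleDateFormat day = new SimpleDateFormat("yyyyMMdd");  // not thread-safe, so create per call
            day.setTimeZone(TimeZone.getTimeZone("UTC"));
            return userId + "_" + day.format(new Date(timestampMillis));
        }

        // All bucket keys covering [from, to]; a 30-day window becomes ~30 rows read in parallel.
        public static List<String> rowKeys(String userId, long fromMillis, long toMillis) {
            List<String> keys = new ArrayList<String>();
            for (long t = fromMillis - (fromMillis % DAY_MILLIS); t <= toMillis; t += DAY_MILLIS) {
                keys.add(rowKey(userId, t));
            }
            return keys;
        }
    }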
Re: Re: get_range_slices result
It should of course be noted that how hard it is to load balance depends a lot on your dataset. Some datasets load balance reasonably well even when ordered, and use of the OPP is not a big problem at all (on the contrary), and in quite a few use cases with current HW, read performance really isn't your problem in any case. You may for instance find it more useful to simplify adding nodes for growing data capacity at the "end" of the token range using OPP than to get extra performance you don't really need. Terje On Fri, Jun 24, 2011 at 7:16 PM, Sylvain Lebresne wrote: > On Fri, Jun 24, 2011 at 10:21 AM, karim abbouh wrote: > > I want the get_range_slices() function to return records sorted (ordered) by the > > key (rowId) used during the insertion. > > Is it possible? > > You will have to use the OrderPreservingPartitioner. This is not > without inconvenience, however. > See for instance > http://wiki.apache.org/cassandra/StorageConfiguration#line-100 or > > http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/ > which give more details on the pros and cons (the short version being > that the main advantage of > OrderPreservingPartitioner is what you're asking for, but its main > drawback is that load-balancing > the cluster will likely be very, very hard). > > In general the advice is to stick with RandomPartitioner and design a > data model that avoids needing > range slices (or at least needing the result to be sorted). This is > very often not too hard, more > efficient, and much simpler than dealing with the load balancing > problems of OrderPreservingPartitioner. > > -- > Sylvain > > > > From: aaron morton > > To: user@cassandra.apache.org > > Sent: Thursday, 23 June 2011, 20:30 > > Subject: Re: get_range_slices result > > > > Not sure what your question is. > > Does this help? http://wiki.apache.org/cassandra/FAQ#range_rp > > Cheers > > - > > Aaron Morton > > Freelance Cassandra Developer > > @aaronmorton > > http://www.thelastpickle.com > > On 23 Jun 2011, at 21:59, karim abbouh wrote: > > > > How can the get_range_slices() function return keys in sorted order? > > BR
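For reference, the partitioner is a cluster-wide setting chosen before any data is loaded (it cannot be switched later without reloading the data); roughly, the relevant line looks like this in each config format:

    # cassandra.yaml (0.7 and later)
    partitioner: org.apache.cassandra.dht.OrderPreservingPartitioner

    <!-- storage-conf.xml (0.6) -->
    <Partitioner>org.apache.cassandra.dht.OrderPreservingPartitioner</Partitioner>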
Re: custom reconciling columns?
thanks. but then the client application has the responsibility to sort the 3 segments (assuming that I need to order the "user browsing history" in the example), I guess the total time would not be significantly different. also this results in 3 times more seeks while the original way needs only one seek. this is probably fine if my cluster is mostly idle, but if it's mostly busy, the load is going to increase. now my thinking is that the read path does not really need a map (the thrift api is a list of columns anyway, sorted), so it's a luxury to construct a map (in fact a sortedmap) in the internal process. we could very well just use a sorted list to do the read path, which would be much faster. (hacking out this idea today ...) yang On Thu, Jun 30, 2011 at 8:27 AM, Jeremiah Jordan < jeremiah.jor...@morningstar.com> wrote: > ** > The reason to break it up is that the information will then be on different > servers, so you can have server 1 spending time retrieving row 1, while you > have server 2 retrieving row 2, and server 3 retrieving row 3... So instead > of getting 3000 things from one server, you get 1000 from 3 servers in > parallel... > > -- > *From:* Yang [mailto:tedd...@gmail.com] > *Sent:* Wednesday, June 29, 2011 12:07 AM > *To:* user@cassandra.apache.org > *Subject:* Re: custom reconciling columns? > > ok, here is the profiling result. I think this is consistent (having been > trying to recover how to effectively use yourkit ...) see attached picture > > since I actually do not use the thrift interface, but just directly use the > thrift.CassandraServer and run my code in the same JVM as cassandra, > and was running the whole thing on a single box, there is no message > serialization/deserialization cost. but more columns did add on to more > time. > > the time was spent in the ConcurrentSkipListMap operations that implement > the memtable. > > > regarding breaking up the row, I'm not sure it would reduce my run time, > since our requirement is to read the entire rolling window history (we > already have > the TTL enabled , so the history is limited to a certain length, but it is > quite long: over 1000 , in some cases, can be 5000 or more ) . I think > accessing roughly 1000 items is not an uncommon requirement for many > applications. in our case, each column has about 30 bytes of data, besides > the meta data such as ttl, timestamp. > at history length of 3000, the read takes about 12ms (remember this is > completely in-memory, no disk access) > > I just took a look at the expiring column logic, it looks that the > expiration does not come into play until when the > CassandraServer.internal_get()===>thriftifyColumns() gets called. so the > above memtable access time is still spent. yes, then breaking up the row is > going to be helpful, but only to the degree of preventing accessing > expired columns (btw if this is actually built into cassandra code it > would be nicer, so instead of spending multiple key lookups, I locate to the > row once, and then within the row, there are different "generation" buckets, > so those old generation buckets that are beyond expiration are not read ); > currently just accessing the 3000 live columns is already quite slow. > > I'm trying to see whether there are some easy magic bullets for a drop-in > replacement for concurrentSkipListMap... > > Yang > > > > > On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall wrote: > >> I agree with Aaron's suggestion on data model and query here. 
Since >> there is a time component, you can split the row on a fixed duration >> for a given user, so the row key would become userId_[timestamp >> rounded to day]. >> >> This provides you an easy way to roll up the information for the date >> ranges you need since the key suffix can be created without a read. >> This also benefits from spreading the read load over the cluster >> instead of just the replicas since you have 30 rows in this case >> instead of one. >> >> On Tue, Jun 28, 2011 at 5:55 PM, aaron morton >> wrote: >> > Can you provide some more info: >> > - how big are the rows, e.g. number of columns and column size ? >> > - how much data are you asking for ? >> > - what sort of read query are you using ? >> > - what sort of numbers are you seeing ? >> > - are you deleting columns or using TTL ? >> > I would consider issues with the data churn, data model and query before >> > looking at serialisation. >> > Cheers >> > - >> > Aaron Morton >> > Freelance Cassandra Developer >> > @aaronmorton >> > http://www.thelastpickle.com >> > On 29 Jun 2011, at 10:37, Yang wrote: >> > >> > I can see that as my user history grows, the reads time proportionally ( >> or >> > faster than linear) grows. >> > if my business requirements ask me to keep a month's history for each >> user, >> > it could become too slow.- I was suspecting that it's actually the >> > serializing and deserializing that's taking time (I can defin
Re: Row cache
Here's my understanding of things ... (this applies only to the regular heap implementation of the row cache) > Why Cassandra does not cache a row that was requested few times? What does the cache capacity read? Is it > 0? > What the ReadCount attribute in ColumnFamilies indicates and why it remains > zero. Hm, I had that too at one point (the read count wouldn't go up while there were reads), but I didn't have the time to debug it. > How can I know from where Cassandra read a row (from MEMTable,RowCache or > SSTable)? It will always read either from the row cache or from the memtable(s) and sstable(s); JMX should tell you which (hits go up). > does the following correct? In read operation Cassandra looks for the row in > the MEMTable - if not found it looks in the row-cache - if not found it looks > in SSTable (after looking in the key-cache to optimize the access to the > SSTable)? No. If the row cache capacity is > 0 then a read will check whether the row is in the cache; if not, it reads the entire row and caches it. Then / or if the row was in the cache already, it will read from there and apply the respective filter to the cached CF. Writes update the memtable and the row cache when the row is cached. I must admit that I still don't quite understand why there's no race here. I haven't found any cache lock, so someone else should explain why a concurrent read / write cannot produce a lost update in the cached row. If capacity is 0 then it will read from the current memtable, the memtable(s) that are being flushed and all sstables that may contain the row (filtered by bloom filter). Hope that's correct and helps. Cheers, Daniel
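A toy model of that decision flow, with stand-in types rather than the real ColumnFamilyStore code, just to make the branching explicit:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;

    // Toy model of the read path described above; stand-in types, not Cassandra internals.
    public class ReadPathSketch {
        private final int rowCacheCapacity;
        private final Map<String, TreeMap<String, String>> rowCache = new HashMap<String, TreeMap<String, String>>();
        // Memtables and sstables collapsed into one map for the sake of the sketch.
        private final Map<String, TreeMap<String, String>> tables = new HashMap<String, TreeMap<String, String>>();

        public ReadPathSketch(int rowCacheCapacity) {
            this.rowCacheCapacity = rowCacheCapacity;
        }

        public TreeMap<String, String> read(String rowKey, Set<String> wantedColumns) {
            if (rowCacheCapacity > 0) {
                TreeMap<String, String> cached = rowCache.get(rowKey);
                if (cached == null) {
                    cached = readFromMemtablesAndSSTables(rowKey);  // whole row, not just the filter
                    rowCache.put(rowKey, cached);
                }
                return applyFilter(cached, wantedColumns);          // answer the query from the cached row
            }
            // Capacity 0: collate memtable(s) and bloom-filtered sstables for just this query.
            return applyFilter(readFromMemtablesAndSSTables(rowKey), wantedColumns);
        }

        private TreeMap<String, String> readFromMemtablesAndSSTables(String rowKey) {
            TreeMap<String, String> row = tables.get(rowKey);
            return row == null ? new TreeMap<String, String>() : new TreeMap<String, String>(row);
        }

        private TreeMap<String, String> applyFilter(TreeMap<String, String> row, Set<String> wanted) {
            TreeMap<String, String> result = new TreeMap<String, String>(row);
            result.keySet().retainAll(wanted);
            return result;
        }
    }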
Alternative Row Cache Implementation
Hi all - or rather devs we have been working on an alternative implementation to the existing row cache(s) We have 2 main goals: - Decrease memory -> get more rows in the cache without suffering a huge performance penalty - Reduce gc pressure This sounds a lot like we should be using the new serializing cache in 0.8. Unfortunately our workload consists of loads of updates which would invalidate the cache all the time. The second unfortunate thing is that the idea we came up with doesn't fit the new cache provider api... It looks like this: Like the serializing cache we basically only cache the serialized byte buffer. we don't serialize the bloom filter and try to do some other minor compression tricks (var ints etc not done yet). The main difference is that we don't deserialize but use the normal sstable iterators and filters as in the regular uncached case. So the read path looks like this: return filter.collectCollatedColumns(memtable iter, cached row iter) The write path is not affected. It does not update the cache During flush we merge all memtable updates with the cached rows. These are early test results: - Depending on row width and value size the serialized cache takes between 30% - 50% of memory compared with cached CF. This might be optimized further - Read times increase by 5 - 10% We haven't tested the effects on gc but hope that we will see improvements there because we only cache a fraction of objects (in terms of numbers) in old gen heap which should make gc cheaper. Of course there's also the option to use native mem like serializing cache does. We believe that this approach is quite promising but as I said it is not compatible with the current cache api. So my question is: does that sound interesting enough to open a jira or has that idea already been considered and rejected for some reason? Cheers, Daniel
Re: Alternative Row Cache Implementation
On Thu, Jun 30, 2011 at 12:44 PM, Daniel Doubleday wrote: > Hi all - or rather devs > > we have been working on an alternative implementation to the existing row > cache(s) > > We have 2 main goals: > > - Decrease memory -> get more rows in the cache without suffering a huge > performance penalty > - Reduce gc pressure > > This sounds a lot like we should be using the new serializing cache in 0.8. > Unfortunately our workload consists of loads of updates which would > invalidate the cache all the time. > > The second unfortunate thing is that the idea we came up with doesn't fit > the new cache provider api... > > It looks like this: > > Like the serializing cache we basically only cache the serialized byte > buffer. we don't serialize the bloom filter and try to do some other minor > compression tricks (var ints etc not done yet). The main difference is that > we don't deserialize but use the normal sstable iterators and filters as in > the regular uncached case. > > So the read path looks like this: > > return filter.collectCollatedColumns(memtable iter, cached row iter) > > The write path is not affected. It does not update the cache > > During flush we merge all memtable updates with the cached rows. > > These are early test results: > > - Depending on row width and value size the serialized cache takes between > 30% - 50% of memory compared with cached CF. This might be optimized further > - Read times increase by 5 - 10% > > We haven't tested the effects on gc but hope that we will see improvements > there because we only cache a fraction of objects (in terms of numbers) in > old gen heap which should make gc cheaper. Of course there's also the option > to use native mem like serializing cache does. > > We believe that this approach is quite promising but as I said it is not > compatible with the current cache api. > > So my question is: does that sound interesting enough to open a jira or has > that idea already been considered and rejected for some reason? > > Cheers, > Daniel > The problem I see with the row cache implementation is more of a JVM problem. This problem is not Cassandra-localized (IMHO), as I hear HBase people have similar large cache / Xmx issues. Personally, I feel this is a sign of Java showing its age. "Let us worry about the pointers" was a great solution when systems had 32MB of memory, because the cost of walking the object graph was small and possible in small time windows. But JVMs already cannot handle 13+ GB of RAM well, and it is quite common to see systems with 32-64GB of physical memory. I am very curious to see how Java is going to evolve on systems with 128GB or even higher memory. The G1 collector will help somewhat, however I do not see that really pushing Xmx higher than it is now. HBase has even gone the route of using an off-heap cache, https://issues.apache.org/jira/browse/HBASE-4018 , and some Jira mentions Cassandra exploring this alternative as well. Doing whatever is possible to shrink the current size of items in the cache is awesome. Anything that delivers more bang for the buck is +1. However, I feel that the VFS cache is the only way to effectively cache large datasets. I was quite disappointed when I upped a machine from 16GB to 48GB of physical memory. I said to myself "Awesome! Now I can shave off a couple of GB for larger row caches." I changed Xmx from 9GB to 13GB, upped the caches, and restarted. I found the system spending a lot of time managing heap, and also found that my compaction processes that did 200GB in 4 hours were now taking 6 or 8 hours. 
I had heard that JVMs "top out around 20GB" but I found they "top out" much lower. VFS cache +1
RE: bulk load
I have been working with Cassandra for the last 4 weeks and am trying to load a large amount of data. I am trying to use the bulk loading technique but am not clear on the process. Could someone explain the process for bulk loading? Also, is the new bulk loading utility discussed in the previous posts available? Could someone help me in this regard? Priyanka -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/bulk-load-tp6505627p6534280.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: CQL injection attacks?
The CQL drivers are all still sitting on top of the execute_cql_query Thrift API method for now. On Wed, Jun 29, 2011 at 2:12 PM, wrote: > > Someone asked a while ago whether Cassandra was vulnerable to injection > attacks: > > http://stackoverflow.com/questions/5998838/nosql-injection-php-phpcassa-cassandra > > With Thrift, the answer was 'no'. > > With CQL, presumably the situation is different, at least until prepared > statements are possible (CASSANDRA-2475) ? > > Has there been any discussion on this already that someone could point me to, > please? I couldn't see anything on JIRA (searching for CQL AND injection, CQL > AND security, etc). > > Thanks. >
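Until prepared statements land (CASSANDRA-2475), anything a client interpolates into a CQL string has to be escaped by hand; a minimal sketch, assuming the usual SQL-style rule that a single quote inside a string literal is written as two single quotes:

    public class CqlEscapeExample {
        // Naive concatenation: a value like "x' OR '1'='1" changes the statement.
        static String unsafe(String name) {
            return "SELECT * FROM users WHERE KEY = '" + name + "'";
        }

        // Minimal escaping: double any single quote inside the literal.
        static String escape(String value) {
            return value.replace("'", "''");
        }

        static String safer(String name) {
            return "SELECT * FROM users WHERE KEY = '" + escape(name) + "'";
        }

        public static void main(String[] args) {
            System.out.println(unsafe("jsmith' OR '1'='1"));  // injected condition
            System.out.println(safer("jsmith' OR '1'='1"));   // quotes doubled, treated as data
        }
    }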
RE: Cassandra ACID
For your Consistency case, it is actually an ALL read that is needed, not an ALL write. An ALL read, with whatever consistency level of write you need (to support machines dying), is the only way to get consistent results in the face of a failed write at > ONE that went to one node but not the others. From: AJ [mailto:a...@dude.podzone.net] Sent: Friday, June 24, 2011 11:28 PM To: user@cassandra.apache.org Subject: Re: Cassandra ACID Ok, here it is reworked; consider it a summary of the thread. If I left out an important point that you think is 100% correct even if you already mentioned it, then make some noise about it and provide some evidence so it's captured sufficiently. And, if you're in a debate, please try and get to a resolution; all will appreciate it. It will be evident below that Consistency is not the only thing that is "tunable", at least indirectly. Unfortunately, you still can't tunafish. Ar ar ar. Atomicity All individual writes are atomic at the row level. So, a batch mutate for one specific key will apply updates to all the columns for that one specific row atomically. If part of the single-key batch update fails, then all of the updates will be reverted since they all pertained to one key/row. Notice, I said 'reverted' not 'rolled back'. Note: atomicity and isolation are related to the topic of transactions but one does not imply the other. Even though row updates are atomic, they are not isolated from other users' updates or reads. Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Consistency Cassandra does not provide the same scope of Consistency as defined in the ACID standard. Consistency in C* does not include referential integrity since C* is not a relational database. Any referential integrity required would have to be handled by the client. Also, even though the official docs say that QUORUM writes/reads is the minimal consistency_level setting to guarantee full consistency, this assumes that the write preceding the read does not fail (see comments below). Therefore, an ALL write would be necessary prior to a QUORUM read of the same data. For a multi-dc scenario use an ALL write followed by an EACH_QUORUM read. Refs: http://wiki.apache.org/cassandra/ArchitectureOverview Isolation NOTHING is isolated, because there is no transaction support in the first place. This means that two or more clients can update the same row at the same time. Their updates of the same or different columns may be interleaved and leave the row in a state that may not make sense depending on your application. Note: this doesn't mean to say that two updates of the same column will be corrupted, obviously; columns are the smallest atomic unit ('atomic' in the more general thread-safe context). Refs: None that directly address this explicitly and clearly and in one place. Durability Updates are made highly durable, at a level comparable to a DBMS, by the use of the commit log. However, this requires "commitlog_sync: batch" in cassandra.yaml. For "some" performance improvement with "some" cost in durability you can specify "commitlog_sync: periodic". See discussion below for more details. Refs: Plenty + this thread. On 6/24/2011 1:46 PM, Jim Newsham wrote: On 6/23/2011 8:55 PM, AJ wrote: Can any Cassandra contributors/gurus confirm my understanding of Cassandra's degree of support for the ACID properties? I provide official references when known. Please let me know if I missed some good official documentation. 
Atomicity All individual writes are atomic at the row level. So, a batch mutate for one specific key will apply updates to all the columns for that one specific row atomically. If part of the single-key batch update fails, then all of the updates will be reverted since they all pertained to one key/row. Notice, I said 'reverted' not 'rolled back'. Note: atomicity and isolation are related to the topic of transactions but one does not imply the other. Even though row updates are atomic, they are not isolated from other users' updates or reads. Refs: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Consistency If you want 100% consistency, use consistency level QUORUM for both reads and writes and EACH_QUORUM in a multi-dc scenario. Refs: http://wiki.apache.org/cassandra/ArchitectureOverview This is a pretty narrow interpretation of consistency. In a traditional database, consistency prevents you from getting into a logically inconsistent state, where records in one table do not agree with records in another table. This includes referential integrity, cascading deletes, etc. It seems to me Cassandra has no support for this concept whatsoever.
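For reference, a sketch of pinning default consistency levels with Hector so reads and writes use the levels being discussed (class names follow Hector's 0.7/0.8-era API and should be verified against your client version):

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class ConsistencySetup {
        // Build a Keyspace handle whose operations default to the given read/write levels.
        public static Keyspace keyspace(Cluster cluster, String ksName) {
            ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
            // QUORUM writes + QUORUM reads give overlapping replica sets; the thread above
            // discusses when ALL is needed on one side to cover partially-failed writes.
            policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
            policy.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
            return HFactory.createKeyspace(ksName, cluster, policy);
        }
    }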
Meaning of 'nodetool repair has to run within GCGraceSeconds'
I am a little confused about the reason why nodetool repair has to run within GCGraceSeconds. The documentation at: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair is not very clear to me. How can a delete be 'unforgotten' if I don't run nodetool repair? (I understand that if a node is down for more than GCGraceSeconds, I should not bring it up without resyncing it completely. Otherwise deletes may reappear. http://wiki.apache.org/cassandra/DistributedDeletes ) But I am not sure how exactly nodetool repair ties into this mechanism of distributed deletes. Thanks for any clarifications.
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
On Thu, Jun 30, 2011 at 4:25 PM, A J wrote: > I am little confused of the reason why nodetool repair has to run > within GCGraceSeconds. > > The documentation at: > http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair > is not very clear to me. > > How can a delete be 'unforgotten' if I don't run nodetool repair? (I > understand that if a node is down for more than GCGraceSeconds, I > should not get it up without resynching is completely. Otherwise > deletes may reappear.http://wiki.apache.org/cassandra/DistributedDeletes > ) > But not sure how exactly nodetool repair ties into this mechanism of > distributed deletes. > > Thanks for any clarifications. > Read repair does NOT repair tombstones. Failed writes/tombstones with a TimedOutException do not get hinted even if HH is on: https://issues.apache.org/jira/browse/CASSANDRA-2034. Thus tombstones can get lost. Because of this, the only way to find lost tombstones is to run anti-entropy repair. If you do not repair within the GC grace period, a node could miss a tombstone and the row could be read-repaired and resurrected. In our case we are lucky: we delete rows when they get old and stale. While it is not great if a deleted row reappears, it is not harmful, thus I can live with less repairing than most.
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
As I understand, it has to do with a node being up but missing the delete message (remember, if you apply the delete at CL.QUORUM, you can have almost half the replicas miss it and still succeed). Imagine that you have 3 nodes A, B, and C, each of which has a column 'foo' with a value 'bar'. Their state would be: A: 'foo':'bar' B: 'foo':'bar' C: 'foo':'bar' We attempt to delete column 'foo', and it succeeds on nodes A and B (meaning that we succeeded on CL.QUORUM). Unfortunately the packet going to node C runs afoul of the network gods and gets zapped in transit. The state is now: A: 'foo':deleted B: 'foo':deleted C: 'foo':'bar' If we try a read at this point, at CL.QUORUM, we are guaranteed to get at least one record that 'foo' was deleted and because of timestamps we know to tell the client as much. After GCGraceSeconds and a compaction, the state of the nodes will be: A: None B: None C: 'foo':'bar' Some time later, we attempt a read and just happen to get C's response first. The response will be that 'foo' is storing 'bar'. Not only that, but read repair happens as well, so the state will become: A: 'foo':'bar' B: 'foo':'bar' C: 'foo':'bar' We have the infamous undelete. - Original Message - From: "A J" To: user@cassandra.apache.org Sent: Thursday, June 30, 2011 8:25:29 PM Subject: Meaning of 'nodetool repair has to run within GCGraceSeconds' I am little confused of the reason why nodetool repair has to run within GCGraceSeconds. The documentation at: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair is not very clear to me. How can a delete be 'unforgotten' if I don't run nodetool repair? (I understand that if a node is down for more than GCGraceSeconds, I should not get it up without resynching is completely. Otherwise deletes may reappear.http://wiki.apache.org/cassandra/DistributedDeletes ) But not sure how exactly nodetool repair ties into this mechanism of distributed deletes. Thanks for any clarifications.
Re: SimpleAuthenticator
Found the fix myself, and wanted to share the resolution. Documentation states that the "cassandra.in.sh" file needs to be updated with the following values, if the properties files exist in the directory I've stipulated: JVM_OPTS="$JVM_OPTS -Dpasswd.properties=/etc/cassandra/passwd.properties" JVM_OPTS="$JVM_OPTS -Daccess.properties=/etc/cassandra/access.properties" Turns out that "cassandra.in.sh" was not being called at all during start up. Not sure if this is a bug or not, but to get around the issue I inserted the two lines above into the "cassandra-env.sh" file, started up the instance and ... the database comes up and I get prompted the following: root@cassandra1:/etc/cassandra# cassandra-cli Welcome to cassandra CLI. Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] connect 11.1.11.111/9160; Login failure. Did you specify 'keyspace', 'username' and 'password'? [default@unknown] -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/SimpleAuth-enticator-tp6534645p6534942.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
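For reference, a sketch of the pieces involved; the passwd.properties format below follows the sample file shipped in conf/, and the cli login syntax varies by version (in 0.7-era clis the credentials go on the use statement), so treat both as assumptions to check against your release:

    # cassandra-env.sh - append to JVM_OPTS rather than overwriting it
    JVM_OPTS="$JVM_OPTS -Dpasswd.properties=/etc/cassandra/passwd.properties"
    JVM_OPTS="$JVM_OPTS -Daccess.properties=/etc/cassandra/access.properties"

    # /etc/cassandra/passwd.properties - one username=password per line
    jsmith=havebadpass

    # cassandra-cli login (see the commented sample access.properties in conf/ for the permission syntax)
    connect 11.1.11.111/9160;
    use Keyspace1 jsmith 'havebadpass';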
Repair doesn't work after upgrading to 0.8.1
Hi all, I have upgraded all my cluster to 0.8.1. Today one of the disks in one of the nodes died. After replacing the disk I tried running repair, but this message appears: INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.77 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (170141183460469231731687303715884105727,28356863910078205288614550619314017621]: manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098 completed. INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.79 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (141784319550391026443072753096570088105,170141183460469231731687303715884105727]: manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf completed. INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 20:36:25,087 AntiEntropyService.java (line 782) No neighbors to repair with for sbs on (113427455640312821154458202477256070484,141784319550391026443072753096570088105]: manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a completed. What can I do?
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo wrote: > Read repair does NOT repair tombstones. It does, but you can't rely on RR to repair _all_ tombstones, because RR only happens if the row in question is requested by a client. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
Thanks all! In other words, I think it is safe to say that a node as a whole can be made consistent only on 'nodetool repair'. Has there been enough interest in providing anti-entropy without compaction as a separate operation (nodetool repair does both)? On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote: > On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo > wrote: >> Read repair does NOT repair tombstones. > > It does, but you can't rely on RR to repair _all_ tombstones, because > RR only happens if the row in question is requested by a client. > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
Re: custom reconciling columns?
OK, I kind of found the magic bullet, but you can only use it to shoot your enemy at really close range :) For the read path, the thrift API already limits the output to a list of columns, so it does not make sense to use maps in the internal operations. Plus, the returned CF on the read path is not going to be modified/shared by any other threads, so synchronization is not necessary. So the solution is to modify ColumnFamilyStore so that getTopLevelColumns takes a returnCF param, instead of always constructing it inside with ColumnFamily.create(); that way only read path behavior is changed. In the read path, we pass in a FastColumnFamily implementation, which uses an ArrayList internally to store sorted columns, does a binary search to insert, and a merge for addAll(column). I tried this out; it's about 50% faster on rows with 3000 cols. Jonathan: do you think this is a viable approach? The only disadvantage is a slight change to getTopLevelColumns, so we have 2 flavors of this method. Thanks Yang On Wed, Jun 29, 2011 at 5:51 PM, Jonathan Ellis wrote: > On Tue, Jun 28, 2011 at 10:06 PM, Yang wrote: > > I'm trying to see whether there are some easy magic bullets for a drop-in > > replacement for concurrentSkipListMap... > > I'm highly interested if you find one. :) > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
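A self-contained sketch of the sorted-ArrayList idea with a toy element type (the real FastColumnFamily would hold IColumn instances and use the column family's comparator):

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    // Toy version of a read-path-only column container: a sorted ArrayList instead of
    // a ConcurrentSkipListMap, since the result CF is never shared between threads.
    public class FastColumnList<C> {
        private final List<C> columns = new ArrayList<C>();
        private final Comparator<C> comparator;

        public FastColumnList(Comparator<C> comparator) {
            this.comparator = comparator;
        }

        // O(log n) search + O(n) shift; fine for building a result set once.
        public void add(C column) {
            int idx = Collections.binarySearch(columns, column, comparator);
            if (idx >= 0) {
                columns.set(idx, column);        // same name: replace the existing entry
            } else {
                columns.add(-idx - 1, column);   // insertion point keeps the list sorted
            }
        }

        public void addAll(Collection<C> sorted) {
            for (C c : sorted) {
                add(c);                          // a real impl would merge two sorted runs instead
            }
        }

        public List<C> asList() {
            return Collections.unmodifiableList(columns);
        }
    }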
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
It would be helpful if this was automated somehow.
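One low-tech way to automate it is a cron entry on each node that runs repair comfortably inside gc_grace_seconds (10 days by default), staggered so that neighbouring nodes don't repair at the same time; for example (the cassandra user and log path are placeholders):

    # /etc/cron.d/cassandra-repair - weekly repair, Saturday 03:00, logged
    0 3 * * 6  cassandra  nodetool -h localhost repair >> /var/log/cassandra/repair.log 2>&1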
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote: > On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo > wrote: > > Read repair does NOT repair tombstones. > > It does, but you can't rely on RR to repair _all_ tombstones, because > RR only happens if the row in question is requested by a client. > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > Doh! Right. I was thinking about range scans and read repair. http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Alternative-to-repair-td6098108.html
Re: Alternative Row Cache Implementation
We had a visitor from Intel a month ago. One question from him was "What could you do if we gave you a server 2 years from now that had 16TB of memory?" I went: Eh... using Java? 2 years is maybe unrealistic, but you can already get some quite acceptable prices even on servers in the 100GB memory range now if you buy in larger quantities (30-50 servers and more in one go). I don't think it is unrealistic that we will start seeing high-end commodity (x64) servers with TBs of memory in a few years, and I really wonder where that puts Java-based software. Terje On Fri, Jul 1, 2011 at 2:25 AM, Edward Capriolo wrote: > > > On Thu, Jun 30, 2011 at 12:44 PM, Daniel Doubleday < > daniel.double...@gmx.net> wrote: > >> Hi all - or rather devs >> >> we have been working on an alternative implementation to the existing row >> cache(s) >> >> We have 2 main goals: >> >> - Decrease memory -> get more rows in the cache without suffering a huge >> performance penalty >> - Reduce gc pressure >> >> This sounds a lot like we should be using the new serializing cache in >> 0.8. >> Unfortunately our workload consists of loads of updates which would >> invalidate the cache all the time. >> >> The second unfortunate thing is that the idea we came up with doesn't fit >> the new cache provider api... >> >> It looks like this: >> >> Like the serializing cache we basically only cache the serialized byte >> buffer. we don't serialize the bloom filter and try to do some other minor >> compression tricks (var ints etc not done yet). The main difference is that >> we don't deserialize but use the normal sstable iterators and filters as in >> the regular uncached case. >> >> So the read path looks like this: >> >> return filter.collectCollatedColumns(memtable iter, cached row iter) >> >> The write path is not affected. It does not update the cache >> >> During flush we merge all memtable updates with the cached rows. >> >> These are early test results: >> >> - Depending on row width and value size the serialized cache takes between >> 30% - 50% of memory compared with cached CF. This might be optimized further >> - Read times increase by 5 - 10% >> >> We haven't tested the effects on gc but hope that we will see improvements >> there because we only cache a fraction of objects (in terms of numbers) in >> old gen heap which should make gc cheaper. Of course there's also the option >> to use native mem like serializing cache does. >> >> We believe that this approach is quite promising but as I said it is not >> compatible with the current cache api. >> >> So my question is: does that sound interesting enough to open a jira or >> has that idea already been considered and rejected for some reason? >> >> Cheers, >> Daniel >> > > > > The problem I see with the row cache implementation is more of a JVM > problem. This problem is not Cassandra localized (IMHO) as I hear Hbase > people with similar large cache/ Xmx issues. Personally, I feel this is a > sign of Java showing age. "Let us worry about the pointers" was a great > solution when systems had 32MB memory, because the cost of walking the > object graph was small and possible and small time windows. But JVM's > already can not handle 13+ GB of RAM and it is quite common to see systems > with 32-64GB physical memory. I am very curious to see how java is going to > evolve on systems with 128GB or even higher memory. > > The G1 collector will help somewhat, however I do not see that really > pushing Xmx higher then it is now. 
HBase has even gone the route of using an > off-heap cache, https://issues.apache.org/jira/browse/HBASE-4018 , and > some JIRA tickets mention Cassandra exploring this alternative as well. > > Doing whatever is possible to shrink the current size of an item in the cache is > awesome. Anything that delivers more bang for the buck is +1. However, I feel > that the VFS cache is the only way to effectively cache large datasets. I was > quite disappointed when I upped a machine from 16GB to 48GB of physical > memory. I said to myself "Awesome! Now I can shave off a couple of GB for > larger row caches." I changed Xmx from 9GB to 13GB, upped the caches, and > restarted. I found the system spending a lot of time managing the heap, and also > found that my compaction processes that did 200GB in 4 hours were now taking > 6 or 8 hours. > > I had heard that JVMs "top out around 20GB" but I found they "top out" much > lower. VFS cache +1 > > >
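For readers skimming this thread, here is a minimal, self-contained Java sketch of the idea Daniel describes above: keep the cached row only as a serialized byte buffer and, on read, collate it with the still-unflushed (memtable) columns, newest timestamp winning. It is emphatically not Cassandra's internal API -- every class and method name below is invented for illustration, and the real code path works over SSTable/memtable iterators and query filters rather than simple lists.

    // Illustrative sketch only -- NOT Cassandra's internal API.
    import java.io.*;
    import java.util.*;

    public class SerializedRowCacheSketch {

        // A column: name, value and write timestamp (newest timestamp wins on merge).
        record Column(String name, String value, long timestamp) {}

        // Serialize a row's columns into one compact byte[] -- this is the cache entry.
        static byte[] serialize(Collection<Column> columns) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(columns.size());
            for (Column c : columns) {
                out.writeUTF(c.name());
                out.writeUTF(c.value());
                out.writeLong(c.timestamp());
            }
            return bos.toByteArray();
        }

        // Lazily deserialize the cached bytes back into columns at read time.
        static List<Column> deserialize(byte[] cached) throws IOException {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(cached));
            int n = in.readInt();
            List<Column> columns = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                columns.add(new Column(in.readUTF(), in.readUTF(), in.readLong()));
            }
            return columns;
        }

        // Collate unflushed memtable columns with the cached row; newest timestamp wins.
        static Collection<Column> collate(List<Column> memtable, List<Column> cachedRow) {
            SortedMap<String, Column> merged = new TreeMap<>();
            for (Column c : cachedRow) merged.put(c.name(), c);
            for (Column c : memtable)
                merged.merge(c.name(), c, (old, nu) -> nu.timestamp() >= old.timestamp() ? nu : old);
            return merged.values();
        }

        public static void main(String[] args) throws IOException {
            // The flushed row lives in the cache only as bytes ...
            byte[] cached = serialize(List.of(new Column("a", "1", 10), new Column("b", "2", 10)));
            // ... newer writes sit in the memtable and are merged in at read time.
            List<Column> memtable = List.of(new Column("b", "3", 20), new Column("c", "4", 20));
            System.out.println(collate(memtable, deserialize(cached)));
        }
    }

The point of the exercise is that the cache then holds one byte[] per row instead of a graph of column objects, which is where the claimed memory savings and reduced GC pressure come from.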
Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'
Repair doesn't compact. Those are different processes already. maki On 2011/07/01, at 7:21, A J wrote: > Thanks all ! > In other words, I think it is safe to say that a node as a whole can > be made consistent only on 'nodetool repair'. > > Has there been enough interest in providing anti-entropy without > compaction as a separate operation (nodetool repair does both) ? > > > On Thu, Jun 30, 2011 at 5:27 PM, Jonathan Ellis wrote: >> On Thu, Jun 30, 2011 at 3:47 PM, Edward Capriolo >> wrote: >>> Read repair does NOT repair tombstones. >> >> It does, but you can't rely on RR to repair _all_ tombstones, because >> RR only happens if the row in question is requested by a client. >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >>
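As a practical note for anyone finding this thread later: the usual way to satisfy the "repair within GCGraceSeconds" rule is simply to schedule nodetool repair on every node more often than gc_grace_seconds (864000 seconds, i.e. 10 days, by default). A hedged sketch of a cron entry -- the keyspace name and timing below are placeholders, not part of the original thread:

    # run anti-entropy repair on this node every Sunday at 03:00, comfortably
    # inside the default gc_grace_seconds of 10 days (adjust keyspace and schedule)
    0 3 * * 0  nodetool -h localhost repair MyKeyspace

This is the anti-entropy repair being discussed above; it is independent of compaction, which Cassandra schedules on its own.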
Re: Alternative Row Cache Implementation
I'm interested. :) On Thu, Jun 30, 2011 at 11:44 AM, Daniel Doubleday wrote: > Hi all - or rather devs > > we have been working on an alternative implementation to the existing row > cache(s) > > We have 2 main goals: > > - Decrease memory -> get more rows in the cache without suffering a huge > performance penalty > - Reduce gc pressure > > This sounds a lot like we should be using the new serializing cache in 0.8. > Unfortunately our workload consists of loads of updates which would > invalidate the cache all the time. > > The second unfortunate thing is that the idea we came up with doesn't fit the > new cache provider api... > > It looks like this: > > Like the serializing cache we basically only cache the serialized byte > buffer. we don't serialize the bloom filter and try to do some other minor > compression tricks (var ints etc not done yet). The main difference is that > we don't deserialize but use the normal sstable iterators and filters as in > the regular uncached case. > > So the read path looks like this: > > return filter.collectCollatedColumns(memtable iter, cached row iter) > > The write path is not affected. It does not update the cache > > During flush we merge all memtable updates with the cached rows. > > These are early test results: > > - Depending on row width and value size the serialized cache takes between > 30% - 50% of memory compared with cached CF. This might be optimized further > - Read times increase by 5 - 10% > > We haven't tested the effects on gc but hope that we will see improvements > there because we only cache a fraction of objects (in terms of numbers) in > old gen heap which should make gc cheaper. Of course there's also the option > to use native mem like serializing cache does. > > We believe that this approach is quite promising but as I said it is not > compatible with the current cache api. > > So my question is: does that sound interesting enough to open a jira or has > that idea already been considered and rejected for some reason? > > Cheers, > Daniel > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: SimpleAuthenticator
cassandra.in.sh is old skool 0.6 series, 0.7 series uses cassandra-env.sh. The packages put it in /etc/cassandra. This works for me at the end of cassandra-env.sh JVM_OPTS="$JVM_OPTS -Dpasswd.properties=/etc/cassandra/passwd.properties" JVM_OPTS="$JVM_OPTS -Daccess.properties=/etc/cassandra/access.properties" btw at a minimum you should upgrade from 0.7.2 to 0.7.6-2 see https://github.com/apache/cassandra/blob/cassandra-0.7.6-2/NEWS.txt#L61 Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 1 Jul 2011, at 02:20, Earl Barnes wrote: > Hi, > > I am encountering an error while trying to set up simple authentication in a > test environment. > > BACKGROUND > Cassandra Version: ReleaseVersion: 0.7.2-0ubuntu4~lucid1 > OS Level: Linux cassandra1 2.6.32-32-server #62-Ubuntu SMP Wed Apr 20 > 22:07:43 UTC 2011 x86_64 GNU/Linux > 2 node cluster > Properties file exist in the following directory: > > > /etc/cassandra/access.properties > > /etc/cassandra/passwd.properties > The authenticator element in the /etc/cassandra/cassandra.yaml file is set to: > authenticator: org.apache.cassandra.auth.SimpleAuthenticator > The authority element in the /etc/cassandra/cassandra.yaml file is set to: > authority: org.apache.cassandra.auth.SimpleAuthority > > The cassandra.in.sh file located in /usr/share/cassandra has been updated to > show the location of the properties files in the following manner: > > # Location of access.properties and passwd.properties > JVM_OPTS=" > -Dpasswd.properties=/etc/cassandra/passwd.properties > -Daccess.properties=/etc/cassandra/access.properties" > > Also, the destination of the configuration directory: > CASSANDRA_CONF=/etc/cassandra > > ERROR > After setting DEBUG mode, I get the following error message in the system.log: > > INFO [main] 2011-06-30 10:12:01,365 AbstractCassandraDaemon.java (line 249) > Cassandra shutting down... > INFO [main] 2011-06-30 10:12:01,366 CassandraDaemon.java (line 159) Stop > listening to thrift clients > INFO [main] 2011-06-30 10:13:14,186 AbstractCassandraDaemon.java (line 77) > Logging initialized > INFO [main] 2011-06-30 10:13:14,196 AbstractCassandraDaemon.java (line 97) > Heap size: 510263296/511311872 > WARN [main] 2011-06-30 10:13:14,227 CLibrary.java (line 93) Obsolete version > of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later > WARN [main] 2011-06-30 10:13:14,227 CLibrary.java (line 93) Obsolete version > of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later > WARN [main] 2011-06-30 10:13:14,228 CLibrary.java (line 125) Unknown > mlockall error 0 > INFO [main] 2011-06-30 10:13:14,234 DatabaseDescriptor.java (line 121) > Loading settings from file:/etc/cassandra/cassandra.yaml > INFO [main] 2011-06-30 10:13:14,337 DatabaseDescriptor.java (line 181) > DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap > ERROR [main] 2011-06-30 10:13:14,342 DatabaseDescriptor.java (line 405) Fatal > configuration error > org.apache.cassandra.config.ConfigurationException: When using > org.apache.cassandra.auth.SimpleAuthenticator passwd.properties properties > must be defined. 
> at > org.apache.cassandra.auth.SimpleAuthenticator.validateConfiguration(SimpleAuthenticator.java:148) > at > org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:200) > at > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:100) > at > org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:217) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160) > Data from the output.log: > > INFO 10:12:01,365 Cassandra shutting down... > INFO 10:12:01,366 Stop listening to thrift clients > INFO 10:13:14,186 Logging initialized > INFO 10:13:14,196 Heap size: 510263296/511311872 > WARN 10:13:14,227 Obsolete version of JNA present; unable to read errno. > Upgrade to JNA 3.2.7 or later > WARN 10:13:14,227 Obsolete version of JNA present; unable to read errno. > Upgrade to JNA 3.2.7 or later > WARN 10:13:14,228 Unknown mlockall error 0 > INFO 10:13:14,234 Loading settings from file:/etc/cassandra/cassandra.yaml > INFO 10:13:14,337 DiskAccessMode 'auto' determined to be mmap, > indexAccessMode is mmap > ERROR 10:13:14,342 Fatal configuration error > org.apache.cassandra.config.ConfigurationException: When using > org.apache.cassandra.auth.SimpleAuthenticator passwd.properties properties > must be defined. > at > org.apache.cassandra.auth.Si
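For completeness, a rough sketch of the two properties files SimpleAuthenticator/SimpleAuthority read once the -D options actually reach the JVM (per Aaron's suggestion, append them in cassandra-env.sh rather than cassandra.in.sh). The user, password and keyspace below are placeholders; the authoritative syntax is the sample passwd.properties and access.properties shipped in the conf/ directory of the matching release, so verify against your 0.7.x version.

    # /etc/cassandra/passwd.properties -- one "username=password" per line
    jsmith=havebadpass

    # /etc/cassandra/access.properties -- per-keyspace grants; check the bundled
    # sample for the exact syntax of your release (it uses <ro>/<rw> permission keys)
    Keyspace1.<rw>=jsmith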
Re: Repair doesn't work after upgrading to 0.8.1
This seems to be a known issue related to https://issues.apache.org/jira/browse/CASSANDRA-2818 e.g. https://issues.apache.org/jira/browse/CASSANDRA-2768 There was some discussion on IRC today; driftx said the simple fix was a full cluster restart. Or perhaps a rolling restart with the 2818 patch applied may work. Starting with "-Dcassandra.load_ring_state=false" causes the node to rediscover the ring, which may help (just a guess really). But if there is bad node state being passed around in gossip it will just get the bad state again. Anyone else ? - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 1 Jul 2011, at 09:11, Héctor Izquierdo Seliva wrote: > Hi all, > > I have upgraded all my cluster to 0.8.1. Today one of the disks in one > of the nodes died. After replacing the disk I tried running repair, but > this message appears: > > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.77 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair > with for sbs on > (170141183460469231731687303715884105727,28356863910078205288614550619314017621]: > manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098 completed. > INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.79 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair > with for sbs on > (141784319550391026443072753096570088105,170141183460469231731687303715884105727]: > manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf completed. > INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > 20:36:25,087 AntiEntropyService.java (line 782) No neighbors to repair > with for sbs on > (113427455640312821154458202477256070484,141784319550391026443072753096570088105]: > manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a completed. > > What can I do? >
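For anyone who wants to try that suggestion: load_ring_state is a JVM system property, so (as a sketch -- the /etc/cassandra path follows the Debian/Ubuntu package layout mentioned elsewhere in this digest) it can be appended to cassandra-env.sh before restarting the affected node:

    # /etc/cassandra/cassandra-env.sh
    # discard the locally saved ring state on the next start so the node
    # re-learns the ring from gossip; remove this line again after the restart
    JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"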
cassandra inside beanstalk?
Hello, I would like to run Cassandra inside Beanstalk (http://aws.amazon.com/elasticbeanstalk/) along with the distributed client application. This blog advises against it: http://www.evidentsoftware.com/embedding-cassandra-within-tomcat-for-testing/ Is it really so? Can Cassandra be tuned not to be a resource hog and co-exist peacefully with the client? Can the client take advantage of proximity by running on the same node as Cassandra? Thank you, Andrei.
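If anyone does go down the embedded route, the Cassandra source tree ships an EmbeddedCassandraService helper (used by Cassandra's own tests and by the blog post linked above) that starts the daemon inside the client's JVM. A hedged sketch -- the exact method names have shifted between releases, and the minimal cassandra.yaml used here is an assumption, so verify against the version you actually run:

    // Sketch: run Cassandra in-process next to the client, in the spirit of the
    // linked blog post. Requires the Cassandra jars on the classpath and a
    // stripped-down cassandra.yaml reachable as a classpath resource.
    import org.apache.cassandra.service.EmbeddedCassandraService;

    public class InProcessCassandra {
        public static void main(String[] args) throws Exception {
            // point the daemon at a minimal config (small heap, small caches, local dirs)
            System.setProperty("cassandra.config", "cassandra.yaml");
            EmbeddedCassandraService cassandra = new EmbeddedCassandraService();
            cassandra.start(); // Thrift now listens on the rpc_address/rpc_port from cassandra.yaml
            // ... the co-located client can connect to 127.0.0.1:9160 from here ...
        }
    }

Whether that is wise on Beanstalk is a separate question; the sketch only shows that co-locating the two in one JVM is mechanically possible.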
Re: Repair doesn't work after upgrading to 0.8.1
This isn't 2818 -- (a) the 0.8.1 protocol is identical to 0.8.0 and (b) the whole cluster is on the same version. On Thu, Jun 30, 2011 at 9:35 PM, aaron morton wrote: > This seems to be a known issue related > to https://issues.apache.org/jira/browse/CASSANDRA-2818 e.g. https://issues.apache.org/jira/browse/CASSANDRA-2768 > There was some discussion on the IRC list today, driftx said the simple fix > was a full cluster restart. Or perhaps a rolling restart with the 2818 patch > applied may work. > Starting with "Dcassandra.load_ring_state=false" causes the node to > rediscover the ring which may help (just a guess really). But if there is > bad node start been passed around in gossip it will just get the bad state > again. > Anyone else ? > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > On 1 Jul 2011, at 09:11, Héctor Izquierdo Seliva wrote: > > Hi all, > > I have upgraded all my cluster to 0.8.1. Today one of the disks in one > of the nodes died. After replacing the disk I tried running repair, but > this message appears: > > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.77 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair > with for sbs on > (170141183460469231731687303715884105727,28356863910078205288614550619314017621]: > manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098 completed. > INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.79 > from repair because it is on version 0.7 or sooner. You should consider > updating this node before running repair again. > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair > with for sbs on > (141784319550391026443072753096570088105,170141183460469231731687303715884105727]: > manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf completed. 
> INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > 20:36:25,087 AntiEntropyService.java (line 782) No neighbors to repair > with for sbs on > (113427455640312821154458202477256070484,141784319550391026443072753096570088105]: > manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a completed. > > What can I do? > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Repair doesn't work after upgrading to 0.8.1
Unless it is a 0.8.1 RC or beta On Fri, Jul 1, 2011 at 12:57 PM, Jonathan Ellis wrote: > This isn't 2818 -- (a) the 0.8.1 protocol is identical to 0.8.0 and > (b) the whole cluster is on the same version. > > On Thu, Jun 30, 2011 at 9:35 PM, aaron morton > wrote: > > This seems to be a known issue related > > to https://issues.apache.org/jira/browse/CASSANDRA-2818 e.g. > https://issues.apache.org/jira/browse/CASSANDRA-2768 > > There was some discussion on the IRC list today, driftx said the simple > fix > > was a full cluster restart. Or perhaps a rolling restart with the 2818 > patch > > applied may work. > > Starting with "Dcassandra.load_ring_state=false" causes the node to > > rediscover the ring which may help (just a guess really). But if there is > > bad node start been passed around in gossip it will just get the bad > state > > again. > > Anyone else ? > > > > - > > Aaron Morton > > Freelance Cassandra Developer > > @aaronmorton > > http://www.thelastpickle.com > > On 1 Jul 2011, at 09:11, Héctor Izquierdo Seliva wrote: > > > > Hi all, > > > > I have upgraded all my cluster to 0.8.1. Today one of the disks in one > > of the nodes died. After replacing the disk I tried running repair, but > > this message appears: > > > > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 > > from repair because it is on version 0.7 or sooner. You should consider > > updating this node before running repair again. > > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 > > from repair because it is on version 0.7 or sooner. You should consider > > updating this node before running repair again. > > INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.80 > > from repair because it is on version 0.7 or sooner. You should consider > > updating this node before running repair again. > > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > > 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.77 > > from repair because it is on version 0.7 or sooner. You should consider > > updating this node before running repair again. > > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > > 20:36:25,085 AntiEntropyService.java (line 179) Excluding /10.20.13.76 > > from repair because it is on version 0.7 or sooner. You should consider > > updating this node before running repair again. > > INFO [manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098] 2011-06-30 > > 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair > > with for sbs on > > > (170141183460469231731687303715884105727,28356863910078205288614550619314017621]: > > manual-repair-26f5a7dd-cf12-44de-9f8f-6b6335bdd098 completed. > > INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > > 20:36:25,086 AntiEntropyService.java (line 179) Excluding /10.20.13.79 > > from repair because it is on version 0.7 or sooner. You should consider > > updating this node before running repair again. > > INFO [manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf] 2011-06-30 > > 20:36:25,086 AntiEntropyService.java (line 782) No neighbors to repair > > with for sbs on > > > (141784319550391026443072753096570088105,170141183460469231731687303715884105727]: > > manual-repair-bdb4055a-d370-4d2a-a1dd-70a7e4fa60cf completed. 
> > INFO [manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a] 2011-06-30 > > 20:36:25,087 AntiEntropyService.java (line 782) No neighbors to repair > > with for sbs on > > > (113427455640312821154458202477256070484,141784319550391026443072753096570088105]: > > manual-repair-2a11d01c-e1e4-4f1e-b8cd-00a9a3fd2f4a completed. > > > > What can I do? > > > > > > > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >