Reconfiguring nodes - getting bootstrap error
I had to reconfigure my Cassandra nodes today to allow us to use Lucandra and made the following changes:

* Shut down ALL Cassandra instances
* For each node:
  o Added the Lucandra Keyspace
  o Changed the Partitioner to OrderPreservingPartitioner
  o Deleted the folders in my Data File Directory
  o Deleted the files in my Commit Log Directory
* Started each node individually

Now it seems that I'm getting bootstrap errors cascading to each node, starting with the first node started (error below).

I understand this is because they are not new nodes, but I thought deleting the data and commit log files would correct this - as there is no data to get. Are there some other files to remove?

The system had been running with the RandomPartitioner without problem for the last week.

INFO 16:56:47,322 Auto DiskAccessMode determined to be mmap
INFO 16:56:48,045 Saved Token not found. Using ADZAODw5LiJt6juc
INFO 16:56:48,045 Saved ClusterName not found. Using MAMBO Space
INFO 16:56:48,053 Creating new commitlog segment /var/cassandra/log/CommitLog-1277189808053.log
INFO 16:56:48,096 Starting up server gossip
INFO 16:56:48,119 Joining: getting load information
INFO 16:56:48,119 Sleeping 9 ms to wait for load information...
INFO 16:56:48,165 Node /172.28.1.138 is now part of the cluster
INFO 16:56:48,170 Node /172.28.1.139 is now part of the cluster
INFO 16:56:48,171 Node /172.28.2.136 is now part of the cluster
INFO 16:56:48,172 Node /172.28.1.141 is now part of the cluster
INFO 16:56:49,136 InetAddress /172.28.1.141 is now UP
INFO 16:56:49,145 InetAddress /172.28.2.136 is now UP
INFO 16:56:49,146 InetAddress /172.28.1.138 is now UP
INFO 16:56:49,147 InetAddress /172.28.1.139 is now UP
INFO 16:57:06,772 Node /172.28.2.138 is now part of the cluster
INFO 16:57:07,538 InetAddress /172.28.2.138 is now UP
INFO 16:57:12,179 InetAddress /172.28.1.138 is now dead.
INFO 16:57:19,195 error writing to /172.28.1.138
INFO 16:57:24,205 error writing to /172.28.1.139
INFO 16:57:26,209 InetAddress /172.28.1.139 is now dead.
INFO 16:57:43,242 error writing to /172.28.1.141
INFO 16:57:50,254 InetAddress /172.28.1.141 is now dead.
INFO 16:57:59,271 error writing to /172.28.2.136
INFO 16:58:05,280 InetAddress /172.28.2.136 is now dead.
INFO 16:58:18,136 Joining: getting bootstrap token
ERROR 16:58:18,139 Exception encountered during startup.
java.lang.RuntimeException: No other nodes seen! Unable to bootstrap
        at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:120)
        at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:102)
        at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:97)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:356)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)
Exception encountered during startup.
java.lang.RuntimeException: No other nodes seen! Unable to bootstrap
        at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:120)
        at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:102)
        at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:97)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:356)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)

Anthony Ikeda
Java Analyst/Programmer
Cardlink Services Limited
Level 4, 3 Rider Boulevard
Rhodes NSW 2138

Web: www.cardlink.com.au | Tel: + 61 2 9646 9221 | Fax: + 61 2 9646 9283
Deletion and batch_mutate
Hi everyone,
I'm a new user of Cassandra, and during my tests I've encountered a problem with deleting rows from CFs. I'm using Cassandra 0.6.2 and coding in Java against the native Thrift API.

The way my application works, I need to delete multiple rows at a time (just like reads and writes). Obviously, in terms of performance, I'd rather use batch_mutate to delete several rows at once than issue a remove command on each and every row. So far, all attempts at doing so have failed. The following configurations have been tested:

1. Deletion without a SuperColumn or SlicePredicate set. I get this error: InvalidRequestException(why:A Deletion must have a SuperColumn, a SlicePredicate or both.)
2. Deletion with a SlicePredicate set, where the SlicePredicate has no column names or SliceRange set. I get this error: InvalidRequestException(why:A SlicePredicate must be given a list of Columns, a SliceRange, or both)
3. Deletion with a SlicePredicate set, where the SlicePredicate has a SliceRange with empty start and finish values. I get this error: InvalidRequestException(why:Deletion does not yet support SliceRange predicates.)

At this point I'm left with no other alternatives (since I want to delete a whole row, not specific columns/supercolumns within a row), and using the remove command in a loop has serious implications in terms of performance. Is there any solution to this problem?

Thanks,
Ron
Re: java.lang.OutOfMemoryError: Map failed
> Daniel:
>
> Thanks. That thread helped me solve my problem.
>
> I was able to run a 700k MySQL record import without a single memory error.
>
> I changed the following sections in storage-conf.xml to fix the OutOfMemory errors:
>
> standard
> batch
> 1

Going to standard mode is not very good. It imposes a performance penalty of about 30% on uncached reads, according to my own homegrown stress test.

Do you use a 32-bit or 64-bit JVM? With a 32-bit JVM you have no choice but to use standard mode. If you're using a 64-bit JVM and getting this error, it looks like you have limited virtual memory space. On Linux you can fix it by running "ulimit -v unlimited" in the same shell just before launching the Cassandra node.
Re: Deletion and batch_mutate
Take a look at https://issues.apache.org/jira/browse/CASSANDRA-494 https://issues.apache.org/jira/browse/CASSANDRA-1027 On 22.06.2010 19:00, Ron wrote: > Hi everyone, > I'm a new user of Cassandra, and during my tests, I've encountered a > problem with deleting rows from CFs. > I use Cassandra 0.6.2 and coding in Java, using the native Java Thrift API. > > The way my application works, I need to delete multiple rows at a time > (just like reads and writes). > Obviously, in terms of performance, I'd rather use batch_mutate and > delete several rows and not issue a remove command on each and every row. > So far, all attempts doing so have failed. > The following command configurations have been tested: > >1. Deletion, without a Supercolumn or SlicePredicate set. I get this > error: InvalidRequestException(why:A Deletion must have a > SuperColumn, a SlicePredicate or both.) >2. Deletion, with a SlicePredicate set. The SlicePredicate is without > column names or SliceRange set. I get this error: > InvalidRequestException(why:A SlicePredicate must be given a list > of Columns, a SliceRange, or both) >3. Deletion, with a SlicePredicate set. The SlicePredicate is set > with SliceRange. The SliceRange is set with empty start and finish > values. I get this error: InvalidRequestException(why:Deletion > does not yet support SliceRange predicates.) > > At this point I'm left with no other alternatives (since I want to > delete a whole row and not specific columns/supercolumns within a row). > Using the remove command in a loop has serious implications in terms of > performance. > Is there any solution for this problems? > Thanks, > Ron
unsubscribe
unsubscribe
d...@dintran.com
Dean Steele
Reason: too much mail volume; I would prefer a weekly case study review.
[OT] Re: unsubscribe
Hey Dean ...and everyone else not managing to unsubscribe (and sending mails to the list instead):

If you don't know how to unsubscribe, you can always look at the List-Unsubscribe: header of any of the list emails. These days you will usually find that an address with an "-unsubscribe" suffix is used, rather than sending the word in the subject or body. Many of the archives also provide RSS feeds for mailing lists in case you just want to read.

HTH

cheers
--
Torsten

On Tue, Jun 22, 2010 at 13:10, Dean Steele wrote:
> unsubscribe
> d...@dintran.com
> Dean Steele
> Reason: too much mail volume, I would prefer an weekly case study review.
OrderPreservingPartitioner and manual token assignment
Hello!

I use OrderPreservingPartitioner and assign tokens manually.

Questions:

1) Why is the range sorted in alphabetical order rather than numeric order? It was OK with RandomPartitioner.

Address       Status  Load     Range  Ring
                                84
172.19.0.35   Up      2.47 GB  0      |<--|
172.19.0.31   Up      1.85 GB  112    |   ^
172.19.0.33   Up      1.46 GB  142    v   |
172.19.0.30   Up      1.44 GB  28     |   ^
172.19.0.32   Up      2.63 GB  56     v   |
172.19.0.34   Up      3.29 GB  84     |-->|

2) What is the token range? For example, all our keys start with a customer number (a few digits), but digits are only a small part of the ASCII table. What is the best way to assign tokens manually when using OrderPreservingPartitioner?

--
Best regards,
Maxim                          mailto:maxi...@trackstudio.com

LinkedIn Profile: http://www.linkedin.com/in/maximkr
Google Talk/Jabber: maxi...@gmail.com
ICQ number: 307863079
Skype Chat: maxim.kramarenko
Yahoo! Messenger: maxim_kramarenko
Re: New to cassandra
And this one is useful : https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP 2010/6/22 Shahan Khan > The wiki is a great place: > > http://wiki.apache.org/cassandra/FrontPage > > Getting Started: http://wiki.apache.org/cassandra/GettingStarted > > Cassandra interfaces with PHP via thrift > > http://wiki.apache.org/cassandra/ThriftExamples > > Shahan > > On Mon, 21 Jun 2010 15:16:51 +0530, Ajay Singh > wrote: > > Hi > > I am a php developer, I am new to cassandra. Is there any starting guide or > tutorial from where i can begin > > Thanks > Ajay > > >
Re: OrderPreservingPartitioner and manual token assignment
2010/6/22 Maxim Kramarenko :
> Hello!
>
> I use OrderPreservingPartitioner and assign tokens manually.
>
> Questions are:
>
> 1) Why range sorted in alphabetical order, not numeric order ?
> It was ok with RandomPartitioner

With RandomPartitioner, tokens are md5 hashes, i.e. numbers, so the comparison between two tokens is numeric. With OrderPreservingPartitioner, tokens are the keys themselves, that is to say Strings, and the comparison is (utf8) String comparison (hence the alphabetical sorting).

Note that, as such, when switching from RP to OPP you almost certainly don't want to keep the same tokens, as they represent very different things (md5 hashes vs string keys).

> Address       Status  Load     Range  Ring
>                                 84
> 172.19.0.35   Up      2.47 GB  0      |<--|
> 172.19.0.31   Up      1.85 GB  112    |   ^
> 172.19.0.33   Up      1.46 GB  142    v   |
> 172.19.0.30   Up      1.44 GB  28     |   ^
> 172.19.0.32   Up      2.63 GB  56     v   |
> 172.19.0.34   Up      3.29 GB  84     |-->|
>
> 2) what is the token range ? For example, all our keys starts with customer
> number (a few digits), but number is only small part of ASCII table.
>
> What is the best way to assign tokens manually when using
> OrderPreservingPartitioner ?

The first step is to find (more likely, estimate) the domain and distribution of the keys you will use. Note that this is really the hard part: most of the time you can only guess what the distribution will be, and most of the time you will be wrong anyway and get bad load balancing. But once you know that, you just assign as tokens the particular keys that split this distribution as evenly as possible (where "split" is with respect to (utf8) string comparison).

--
Sylvain

> --
> Best regards,
> Maxim mailto:maxi...@trackstudio.com
>
> LinkedIn Profile: http://www.linkedin.com/in/maximkr
> Google Talk/Jabber: maxi...@gmail.com
> ICQ number: 307863079
> Skype Chat: maxim.kramarenko
> Yahoo! Messenger: maxim_kramarenko
>
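To make that last point concrete, here is a rough sketch (not from the thread; the class and method names are invented) of deriving initial OPP tokens from a sorted sample of representative keys, taking the keys at evenly spaced quantiles as the split points. How well this balances load depends entirely on how representative the sample is, which is exactly the hard part described above.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OppTokenPicker
{
    // Given a representative sample of row keys and a node count, return one
    // token per node, chosen so the sorted sample is split as evenly as possible
    // under plain string comparison (the ordering OPP uses for tokens).
    public static List<String> pickTokens(List<String> sampleKeys, int nodeCount)
    {
        List<String> sorted = new ArrayList<String>(sampleKeys);
        Collections.sort(sorted);

        List<String> tokens = new ArrayList<String>(nodeCount);
        for (int i = 1; i <= nodeCount; i++)
        {
            // the key at the i-th quantile becomes that node's initial token
            int index = Math.min(sorted.size() - 1, (i * sorted.size()) / nodeCount - 1);
            tokens.add(sorted.get(Math.max(0, index)));
        }
        return tokens;
    }
}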
Re: django or pylons
What problems did you run into? On Mon, Jun 21, 2010 at 6:32 AM, Eugenio Minardi wrote: > Hi, I had gave a look to django + cassandra I found the twissandra project > (a django version of twitter based on cassandra). > But since I am new to django I couldnt make it work. If you find it > interesting please give me a hint on how to proceed to make it work :) > Eugenio > > On Mon, Jun 21, 2010 at 3:01 AM, S Ahmed wrote: >> >> aren't you guys using django though? :) >> >> On Sun, Jun 20, 2010 at 7:40 PM, Joe Stump wrote: >>> >>> A lot of the magic that Django brings to the table is derived from the >>> ORM. If you're skipping that then Pylons likely makes more sense. >>> >>> --Joe >>> On Jun 20, 2010, at 5:08 PM, Charles Woerner >>> wrote: >>> >>> I recently looked into this and came to the same conclusion, but I'm not >>> an expert in either Django or Pylons so I'd also be interested in hearing >>> what someone with more Python experience would say. >>> >>> On Sun, Jun 20, 2010 at 1:42 PM, S Ahmed wrote: Seeing as I will be using a different ORM, would it make more sense to use pylons over django? From what I understand, pylons assumes less as compared to django. >>> >>> >>> -- >>> --- >>> Thanks, >>> >>> Charles Woerner >> > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
UUIDs whose alphanumeric order is the same as their chronological order
I want to use UUIDs whose alphanumeric order is the same as their chronological order. So I'm generating Version 4 UUIDs (http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Version_4_.28random.29) as follows:

public class Id
{
    static Random random = new Random();

    public static String next()
    {
        // Format: ...-4xxx-8xxx-...

        long high = (System.currentTimeMillis() << 16) | 0x4000 | random.nextInt(4096);
        long low = (random.nextLong() >>> 4) | 0x8000L;

        UUID uuid = new UUID(high, low);

        return uuid.toString();
    }
}

Is there anything wrong with this idea?
Re: get_range_slices confused about token ranges after decommissioning a node
What I would expect to have happen is for the removed node to disappear from the ring and for nodes that are supposed to get more data to start streaming it over. I would expect it to be hours before any new data started appearing anywhere when you are anticompacting 80+GB prior to the streaming part. http://wiki.apache.org/cassandra/Streaming On Tue, Jun 22, 2010 at 12:57 AM, Joost Ouwerkerk wrote: > Yes, although "forget" implies that we once knew we were supposed to do so. > Given the following before-and-after states, on which nodes are we supposed > to run repair? Should the cluster be restarted? Is there anything else we > should be doing, or not doing? > > 1. Node is down due to hardware failure > > 192.168.1.104 Up 111.75 GB > 8954799129498380617457226511362321354 | ^ > 192.168.1.106 Up 113.25 GB > 17909598258996761234914453022724642708 v | > 192.168.1.107 Up 75.65 GB > 22386997823745951543643066278405803385 | ^ > 192.168.1.108 Down 75.77 GB > 26864397388495141852371679534086964062 v | > 192.168.1.109 Up 76.14 GB > 35819196517993522469828906045449285416 | ^ > 192.168.1.110 Up 75.9 GB > 40296596082742712778557519301130446093 v | > 192.168.1.111 Up 95.21 GB > 49251395212241093396014745812492767447 | ^ > > 2. nodetool removetoken 26864397388495141852371679534086964062 > > 192.168.1.104 Up 111.75 GB > 8954799129498380617457226511362321354 | ^ > 192.168.1.106 Up 113.25 GB > 17909598258996761234914453022724642708 v | > 192.168.1.107 Up 75.65 GB > 22386997823745951543643066278405803385 | ^ > 192.168.1.109 Up 76.14 GB > 35819196517993522469828906045449285416 | ^ > 192.168.1.110 Up 75.9 GB > 40296596082742712778557519301130446093 v | > 192.168.1.111 Up 95.21 GB > 49251395212241093396014745812492767447 | ^ > > At this point we're expecting 192.168.1.107 to pick up the slack for the > removed token, and for 192.168.1.109 and/or 192.168.1.110 to start streaming > data to 192.168.1.107 since they are holding the replicated data for that > range. > > 3. nodetool repair ? > > On Tue, Jun 22, 2010 at 12:03 AM, Benjamin Black wrote: >> >> Did you forget to run repair? >> >> On Mon, Jun 21, 2010 at 7:02 PM, Joost Ouwerkerk >> wrote: >> > I believe we did nodetool removetoken on nodes that were already down >> > (due >> > to hardware failure), but I will check to make sure. We're running >> > Cassandra >> > 0.6.2. >> > >> > On Mon, Jun 21, 2010 at 9:59 PM, Joost Ouwerkerk >> > wrote: >> >> >> >> Greg, can you describe the steps we took to decommission the nodes? >> >> >> >> -- Forwarded message -- >> >> From: Rob Coli >> >> Date: Mon, Jun 21, 2010 at 8:08 PM >> >> Subject: Re: get_range_slices confused about token ranges after >> >> decommissioning a node >> >> To: user@cassandra.apache.org >> >> >> >> >> >> On 6/21/10 4:57 PM, Joost Ouwerkerk wrote: >> >>> >> >>> We're seeing very strange behaviour after decommissioning a node: when >> >>> requesting a get_range_slices with a KeyRange by token, we are getting >> >>> back tokens that are out of range. >> >> >> >> What sequence of actions did you take to "decommission" the node? What >> >> version of Cassandra are you running? >> >> >> >> =Rob >> >> >> > >> > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Deletion and batch_mutate
right. in other words, you can delete entire rows w/ batch_mutate in 0.6.3 or trunk, but for 0.6.2 the best workaround is to issue multiple remove commands. On Tue, Jun 22, 2010 at 5:09 AM, Mishail wrote: > Take a look at > > https://issues.apache.org/jira/browse/CASSANDRA-494 > > https://issues.apache.org/jira/browse/CASSANDRA-1027 > > > On 22.06.2010 19:00, Ron wrote: >> Hi everyone, >> I'm a new user of Cassandra, and during my tests, I've encountered a >> problem with deleting rows from CFs. >> I use Cassandra 0.6.2 and coding in Java, using the native Java Thrift API. >> >> The way my application works, I need to delete multiple rows at a time >> (just like reads and writes). >> Obviously, in terms of performance, I'd rather use batch_mutate and >> delete several rows and not issue a remove command on each and every row. >> So far, all attempts doing so have failed. >> The following command configurations have been tested: >> >> 1. Deletion, without a Supercolumn or SlicePredicate set. I get this >> error: InvalidRequestException(why:A Deletion must have a >> SuperColumn, a SlicePredicate or both.) >> 2. Deletion, with a SlicePredicate set. The SlicePredicate is without >> column names or SliceRange set. I get this error: >> InvalidRequestException(why:A SlicePredicate must be given a list >> of Columns, a SliceRange, or both) >> 3. Deletion, with a SlicePredicate set. The SlicePredicate is set >> with SliceRange. The SliceRange is set with empty start and finish >> values. I get this error: InvalidRequestException(why:Deletion >> does not yet support SliceRange predicates.) >> >> At this point I'm left with no other alternatives (since I want to >> delete a whole row and not specific columns/supercolumns within a row). >> Using the remove command in a loop has serious implications in terms of >> performance. >> Is there any solution for this problems? >> Thanks, >> Ron > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
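For readers of the archive, a rough sketch of both approaches against the raw 0.6 Thrift interface follows. The keyspace and column family names are invented, the code is untested, and the whole-row Deletion (a Deletion carrying only a timestamp) assumes the behaviour added for 0.6.3 by CASSANDRA-1027.

// Assumes a connected Cassandra.Client named "client" (0.6 Thrift API);
// "Keyspace1"/"Standard1" are placeholder names.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.*;

public class RowDeletes
{
    // 0.6.3+/trunk: delete whole rows in a single batch_mutate call
    static void deleteRows(Cassandra.Client client, List<String> keys) throws Exception
    {
        long ts = System.currentTimeMillis() * 1000;
        Map<String, Map<String, List<Mutation>>> mutationMap =
            new HashMap<String, Map<String, List<Mutation>>>();
        for (String key : keys)
        {
            Deletion deletion = new Deletion();
            deletion.setTimestamp(ts); // no SuperColumn, no SlicePredicate => whole row
            Mutation mutation = new Mutation();
            mutation.setDeletion(deletion);

            List<Mutation> mutations = new ArrayList<Mutation>();
            mutations.add(mutation);
            Map<String, List<Mutation>> cfMap = new HashMap<String, List<Mutation>>();
            cfMap.put("Standard1", mutations);
            mutationMap.put(key, cfMap);
        }
        client.batch_mutate("Keyspace1", mutationMap, ConsistencyLevel.QUORUM);
    }

    // 0.6.2 workaround: one remove() per row (a ColumnPath naming only the CF removes the row)
    static void deleteRowsOneByOne(Cassandra.Client client, List<String> keys) throws Exception
    {
        long ts = System.currentTimeMillis() * 1000;
        ColumnPath wholeRow = new ColumnPath("Standard1");
        for (String key : keys)
            client.remove("Keyspace1", key, wholeRow, ts, ConsistencyLevel.QUORUM);
    }
}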
Re: Reconfiguring nodes - getting bootstrap error
sounds like a problem with your seed configuration On Tue, Jun 22, 2010 at 3:06 AM, Anthony Ikeda < anthony.ik...@cardlink.com.au> wrote: > I had to reconfigure my Cassandra nodes today to allow us to use Lucandra > and made the following changes: > > · Shutdown ALL Cassandra instances > > · For each node: > > o Added in Lucandra Keyspace > > o Changed the Partitioner to OrderPreservingPartitioner > > o Deleted the folders in my Data File Directory > > o Deleted the files in my Commit Log Directory > > · Started each node individually > > > > Now it seems that I’m getting bootstrap errors coalescing to each node > starting with the first node started (Error below) > > > > I understand this is because they are not new nodes but I thought deleting > the data and commit log files would correct this – as there is no data to > get. Is there some other files to remove? > > > > The system had been running with the RandomPartitioner without problem for > the last week. > > > > > > INFO 16:56:47,322 Auto DiskAccessMode determined to be mmap > > INFO 16:56:48,045 Saved Token not found. Using ADZAODw5LiJt6juc > > INFO 16:56:48,045 Saved ClusterName not found. Using MAMBO Space > > INFO 16:56:48,053 Creating new commitlog segment > /var/cassandra/log/CommitLog-1277189808053.log > > INFO 16:56:48,096 Starting up server gossip > > INFO 16:56:48,119 Joining: getting load information > > INFO 16:56:48,119 Sleeping 9 ms to wait for load information... > > INFO 16:56:48,165 Node /172.28.1.138 is now part of the cluster > > INFO 16:56:48,170 Node /172.28.1.139 is now part of the cluster > > INFO 16:56:48,171 Node /172.28.2.136 is now part of the cluster > > INFO 16:56:48,172 Node /172.28.1.141 is now part of the cluster > > INFO 16:56:49,136 InetAddress /172.28.1.141 is now UP > > INFO 16:56:49,145 InetAddress /172.28.2.136 is now UP > > INFO 16:56:49,146 InetAddress /172.28.1.138 is now UP > > INFO 16:56:49,147 InetAddress /172.28.1.139 is now UP > > INFO 16:57:06,772 Node /172.28.2.138 is now part of the cluster > > INFO 16:57:07,538 InetAddress /172.28.2.138 is now UP > > INFO 16:57:12,179 InetAddress /172.28.1.138 is now dead. > > INFO 16:57:19,195 error writing to /172.28.1.138 > > INFO 16:57:24,205 error writing to /172.28.1.139 > > INFO 16:57:26,209 InetAddress /172.28.1.139 is now dead. > > INFO 16:57:43,242 error writing to **/172.28.1.141 > > INFO 16:57:50,254 InetAddress /172.28.1.141 is now dead. > > INFO 16:57:59,271 error writing to /172.28.2.136 > > INFO 16:58:05,280 InetAddress /172.28.2.136 is now dead. > > INFO 16:58:18,136 Joining: getting bootstrap token > > ERROR 16:58:18,139 Exception encountered during startup. > > java.lang.RuntimeException: No other nodes seen! Unable to bootstrap > > at > org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:120) > > at > org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:102) > > at > org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:97) > > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:356) > > at > org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99) > > at > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177) > > Exception encountered during startup. > > java.lang.RuntimeException: No other nodes seen! 
Unable to bootstrap
>         at org.apache.cassandra.dht.BootStrapper.getBootstrapSource(BootStrapper.java:120)
>         at org.apache.cassandra.dht.BootStrapper.getBalancedToken(BootStrapper.java:102)
>         at org.apache.cassandra.dht.BootStrapper.getBootstrapToken(BootStrapper.java:97)
>         at org.apache.cassandra.service.StorageService.initServer(StorageService.java:356)
>         at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99)
>         at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)
>
> Anthony Ikeda
> Java Analyst/Programmer
> Cardlink Services Limited
> Level 4, 3 Rider Boulevard
> Rhodes NSW 2138
>
> Web: www.cardlink.com.au | Tel: + 61 2 9646 9221 | Fax: + 61 2 9646 9283
Re: UUIDs whose alphanumeric order is the same as their chronological order
Why not just use version 1 UUIDs and TimeUUIDType? On Tue, Jun 22, 2010 at 8:58 AM, David Boxenhorn wrote: > I want to use UUIDs whose alphanumeric order is the same as their > chronological order. So I'm generating Version 4 UUIDs ( > http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Version_4_.28random.29 > ) as follows: > > public class Id > { > static Random random = new Random(); > > public static String next() > { > // Format: --4xxx-8xxx- > > long high = (System.currentTimeMillis() << 16) | 0x4000 | > random.nextInt(4096); > long low = (random.nextLong() >>> 4) | 0x8000L; > > UUID uuid = new UUID(high, low); > > return uuid.toString(); > } > } > > Is there anything wrong with this idea? > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: UUIDs whose alphanumeric order is the same as their chronological order
As I understand it, the string value of TimeUUIDType does not sort alphanumerically in chronological order. Isn't that right? I want to use these ids in Oracle as well as Cassandra, and I want them to sort in chronological order. In Oracle they will have to be varchars (I think). Even in Cassandra alone, what is the advantage of TimeUUIDType over UTF8Type, if done this way? Is TimeUUIDType faster than UTF8Type? As far as I can tell, the class I give below looks much easier and faster than the one recommended here: http://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java - which looks really cumbersome, in addition to the fact that it uses a 3rd party library and presumably machine dependent code! On Tue, Jun 22, 2010 at 5:18 PM, Jonathan Ellis wrote: > Why not just use version 1 UUIDs and TimeUUIDType? > > On Tue, Jun 22, 2010 at 8:58 AM, David Boxenhorn > wrote: > > I want to use UUIDs whose alphanumeric order is the same as their > > chronological order. So I'm generating Version 4 UUIDs ( > > > http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Version_4_.28random.29 > > ) as follows: > > > > public class Id > > { > >static Random random = new Random(); > > > >public static String next() > >{ > > // Format: --4xxx-8xxx- > > > > long high = (System.currentTimeMillis() << 16) | 0x4000 | > > random.nextInt(4096); > > long low = (random.nextLong() >>> 4) | 0x8000L; > > > > UUID uuid = new UUID(high, low); > > > > return uuid.toString(); > >} > > } > > > > Is there anything wrong with this idea? > > > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com >
Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
Gary Dusbabek gmail.com> writes: > > *Hopefully* fixed. I was never able to duplicate the problem on my > workstation, but I had a pretty good idea what was causing the > problem. Julie, if you're in a position to apply and test the fix, it > would help help us make sure we've got this one nailed down. > > Gary. > > On Thu, Jun 17, 2010 at 00:33, Jonathan Ellis gmail.com> wrote: > > That is consistent with the > > https://issues.apache.org/jira/browse/CASSANDRA-1169 bug I mentioned, > > that is fixed in the 0.6 svn branch. > > > > On Wed, Jun 16, 2010 at 10:51 PM, Julie nextcentury.com> wrote: > >> The loop is in IncomingStreamReader.java, line 62, a 3-line while loop. > >> bytesRead is not changing. pendingFile.getExpectedBytes() returns > >> 7,161,538,639 but bytesRead is stuck at 2,147,483,647. > >> > > Thanks for your help, Gary and Jonathon. We updated the JVM to get rid of the Value too large exception (which it did - yay!) and are still running with Cassandra 0.6.2 I have not been able to duplicate the tight loop problem. We did try out the in-progress Cassandra 0.6.3 (off the 0.6 SVN) yesterday but unfortunately can't tell you if the problem is gone because I am seeing a lot more timeouts on reads (retrying up to 10 times then quitting) so I haven't been able to get the database fully populated. I'm trying it again this morning. If problems persist with the in-progress version, I'll drop back to 0.6.2 and do some torture writing and see if the problem truly went away just by updating the JVM. Thanks for all of your suggestions! Julie
Re: get_range_slices confused about token ranges after decommissioning a node
I don't mind missing data for a few hours, it's the weird behaviour of get_range_slices that's bothering me. I added some logging to ColumnFamilyRecordReader to see what's going on: Split startToken=67160993471237854630929198835217410155, endToken=68643623863384825230116928934887817211 ... Getting batch for range: 67965855060996012099315582648654139032 to 68643623863384825230116928934887817211 Token for last row is: 50448492574454416067449808504057295946 Getting batch for range: 50448492574454416067449808504057295946 to 68643623863384825230116928934887817211 ... Notice how the get_range_slices response is invalid since it returns an out-of-range row. This poisons the batching loop and causes the task to spin out of control. /joost On Tue, Jun 22, 2010 at 9:09 AM, Jonathan Ellis wrote: > What I would expect to have happen is for the removed node to > disappear from the ring and for nodes that are supposed to get more > data to start streaming it over. I would expect it to be hours before > any new data started appearing anywhere when you are anticompacting > 80+GB prior to the streaming part. > http://wiki.apache.org/cassandra/Streaming > > On Tue, Jun 22, 2010 at 12:57 AM, Joost Ouwerkerk > wrote: > > Yes, although "forget" implies that we once knew we were supposed to do > so. > > Given the following before-and-after states, on which nodes are we > supposed > > to run repair? Should the cluster be restarted? Is there anything else > we > > should be doing, or not doing? > > > > 1. Node is down due to hardware failure > > > > 192.168.1.104 Up 111.75 GB > > 8954799129498380617457226511362321354 | ^ > > 192.168.1.106 Up 113.25 GB > > 17909598258996761234914453022724642708 v | > > 192.168.1.107 Up 75.65 GB > > 22386997823745951543643066278405803385 | ^ > > 192.168.1.108 Down75.77 GB > > 26864397388495141852371679534086964062 v | > > 192.168.1.109 Up 76.14 GB > > 35819196517993522469828906045449285416 | ^ > > 192.168.1.110 Up 75.9 GB > > 40296596082742712778557519301130446093 v | > > 192.168.1.111 Up 95.21 GB > > 49251395212241093396014745812492767447 | ^ > > > > 2. nodetool removetoken 26864397388495141852371679534086964062 > > > > 192.168.1.104 Up 111.75 GB > > 8954799129498380617457226511362321354 | ^ > > 192.168.1.106 Up 113.25 GB > > 17909598258996761234914453022724642708 v | > > 192.168.1.107 Up 75.65 GB > > 22386997823745951543643066278405803385 | ^ > > 192.168.1.109 Up 76.14 GB > > 35819196517993522469828906045449285416 | ^ > > 192.168.1.110 Up 75.9 GB > > 40296596082742712778557519301130446093 v | > > 192.168.1.111 Up 95.21 GB > > 49251395212241093396014745812492767447 | ^ > > > > At this point we're expecting 192.168.1.107 to pick up the slack for the > > removed token, and for 192.168.1.109 and/or 192.168.1.110 to start > streaming > > data to 192.168.1.107 since they are holding the replicated data for that > > range. > > > > 3. nodetool repair ? > > > > On Tue, Jun 22, 2010 at 12:03 AM, Benjamin Black wrote: > >> > >> Did you forget to run repair? > >> > >> On Mon, Jun 21, 2010 at 7:02 PM, Joost Ouwerkerk > >> wrote: > >> > I believe we did nodetool removetoken on nodes that were already down > >> > (due > >> > to hardware failure), but I will check to make sure. We're running > >> > Cassandra > >> > 0.6.2. > >> > > >> > On Mon, Jun 21, 2010 at 9:59 PM, Joost Ouwerkerk < > jo...@openplaces.org> > >> > wrote: > >> >> > >> >> Greg, can you describe the steps we took to decommission the nodes? 
> >> >> > >> >> -- Forwarded message -- > >> >> From: Rob Coli > >> >> Date: Mon, Jun 21, 2010 at 8:08 PM > >> >> Subject: Re: get_range_slices confused about token ranges after > >> >> decommissioning a node > >> >> To: user@cassandra.apache.org > >> >> > >> >> > >> >> On 6/21/10 4:57 PM, Joost Ouwerkerk wrote: > >> >>> > >> >>> We're seeing very strange behaviour after decommissioning a node: > when > >> >>> requesting a get_range_slices with a KeyRange by token, we are > getting > >> >>> back tokens that are out of range. > >> >> > >> >> What sequence of actions did you take to "decommission" the node? > What > >> >> version of Cassandra are you running? > >> >> > >> >> =Rob > >> >> > >> > > >> > > > > > > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com >
Re: get_range_slices confused about token ranges after decommissioning a node
Ah, that sounds like https://issues.apache.org/jira/browse/CASSANDRA-1198. That it happened after removetoken is just that that happened to change your ring topology enough to make your queries start hitting it. On Tue, Jun 22, 2010 at 10:39 AM, Joost Ouwerkerk wrote: > I don't mind missing data for a few hours, it's the weird behaviour of > get_range_slices that's bothering me. I added some logging to > ColumnFamilyRecordReader to see what's going on: > > Split startToken=67160993471237854630929198835217410155, > endToken=68643623863384825230116928934887817211 > > ... > > Getting batch for range: 67965855060996012099315582648654139032 to > 68643623863384825230116928934887817211 > > Token for last row is: 50448492574454416067449808504057295946 > > Getting batch for range: 50448492574454416067449808504057295946 to > 68643623863384825230116928934887817211 > > ... > > Notice how the get_range_slices response is invalid since it returns an > out-of-range row. This poisons the batching loop and causes the task to > spin out of control. > /joost > > On Tue, Jun 22, 2010 at 9:09 AM, Jonathan Ellis wrote: >> >> What I would expect to have happen is for the removed node to >> disappear from the ring and for nodes that are supposed to get more >> data to start streaming it over. I would expect it to be hours before >> any new data started appearing anywhere when you are anticompacting >> 80+GB prior to the streaming part. >> http://wiki.apache.org/cassandra/Streaming >> >> On Tue, Jun 22, 2010 at 12:57 AM, Joost Ouwerkerk >> wrote: >> > Yes, although "forget" implies that we once knew we were supposed to do >> > so. >> > Given the following before-and-after states, on which nodes are we >> > supposed >> > to run repair? Should the cluster be restarted? Is there anything else >> > we >> > should be doing, or not doing? >> > >> > 1. Node is down due to hardware failure >> > >> > 192.168.1.104 Up 111.75 GB >> > 8954799129498380617457226511362321354 | ^ >> > 192.168.1.106 Up 113.25 GB >> > 17909598258996761234914453022724642708 v | >> > 192.168.1.107 Up 75.65 GB >> > 22386997823745951543643066278405803385 | ^ >> > 192.168.1.108 Down 75.77 GB >> > 26864397388495141852371679534086964062 v | >> > 192.168.1.109 Up 76.14 GB >> > 35819196517993522469828906045449285416 | ^ >> > 192.168.1.110 Up 75.9 GB >> > 40296596082742712778557519301130446093 v | >> > 192.168.1.111 Up 95.21 GB >> > 49251395212241093396014745812492767447 | ^ >> > >> > 2. nodetool removetoken 26864397388495141852371679534086964062 >> > >> > 192.168.1.104 Up 111.75 GB >> > 8954799129498380617457226511362321354 | ^ >> > 192.168.1.106 Up 113.25 GB >> > 17909598258996761234914453022724642708 v | >> > 192.168.1.107 Up 75.65 GB >> > 22386997823745951543643066278405803385 | ^ >> > 192.168.1.109 Up 76.14 GB >> > 35819196517993522469828906045449285416 | ^ >> > 192.168.1.110 Up 75.9 GB >> > 40296596082742712778557519301130446093 v | >> > 192.168.1.111 Up 95.21 GB >> > 49251395212241093396014745812492767447 | ^ >> > >> > At this point we're expecting 192.168.1.107 to pick up the slack for the >> > removed token, and for 192.168.1.109 and/or 192.168.1.110 to start >> > streaming >> > data to 192.168.1.107 since they are holding the replicated data for >> > that >> > range. >> > >> > 3. nodetool repair ? >> > >> > On Tue, Jun 22, 2010 at 12:03 AM, Benjamin Black wrote: >> >> >> >> Did you forget to run repair? 
>> >> >> >> On Mon, Jun 21, 2010 at 7:02 PM, Joost Ouwerkerk >> >> wrote: >> >> > I believe we did nodetool removetoken on nodes that were already down >> >> > (due >> >> > to hardware failure), but I will check to make sure. We're running >> >> > Cassandra >> >> > 0.6.2. >> >> > >> >> > On Mon, Jun 21, 2010 at 9:59 PM, Joost Ouwerkerk >> >> > >> >> > wrote: >> >> >> >> >> >> Greg, can you describe the steps we took to decommission the nodes? >> >> >> >> >> >> -- Forwarded message -- >> >> >> From: Rob Coli >> >> >> Date: Mon, Jun 21, 2010 at 8:08 PM >> >> >> Subject: Re: get_range_slices confused about token ranges after >> >> >> decommissioning a node >> >> >> To: user@cassandra.apache.org >> >> >> >> >> >> >> >> >> On 6/21/10 4:57 PM, Joost Ouwerkerk wrote: >> >> >>> >> >> >>> We're seeing very strange behaviour after decommissioning a node: >> >> >>> when >> >> >>> requesting a get_range_slices with a KeyRange by token, we are >> >> >>> getting >> >> >>> back tokens that are out of range. >> >> >> >> >> >> What sequence of actions did you take to "decommission" the node? >> >> >> What >> >> >> version of Cassandra are you running? >> >> >> >> >> >> =Rob >> >> >> >> >> > >> >> > >> > >> > >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of Riptano, the source for professional Cassandra support >> http://riptano.com > > --
Finding new Cassandra data
In my system, I have a Cassandra front end and an Oracle back end. Some information is created in the back end and pushed out to the front end, and some information is created in the front end and pulled into the back end.

Question: How do I locate new rows that have been created in Cassandra, for import into Oracle?

I'm thinking of having a special column family "newRows" that contains only the keys of the new rows. The offline process would look there to see what's new, then delete those rows. The "newRows" CF would have no data! (The data would be in the "real" CF.)

Is this a good solution? It seems weird to have a CF with rows but no data, but I can't think of a better way.

Any thoughts?
Re: Finding new Cassandra data
On Tue, Jun 22, 2010 at 09:59, David Boxenhorn wrote: > In my system, I have a Cassandra front end, and an Oracle back end. Some > information is created in the back end, and pushed out to the front end, and > some information is created in the front end and pulled into the back end. > > Question: How do I locate new rows that have been crated in Cassandra, for > import into Oracle? > > I'm thinking of having a special column family "newRows" that contains only > the keys of the new rows. The offline process would look there to see what's > new, then delete those rows. The "newRows" CF would have no data! (The data > would be in the "real" CF.) I've never tried an empty row, but I'm pretty sure you need at least one column. > > Is this a good solution? It seems weird to have a CF with rows but no data. > But I can't think of a better way. > > Any thoughts? Another approach would be to have a CF with a single row whose column names refer to the new row ids. This would allow you efficient slicing. The downside is that you'd need to make sure the row doesn't get too wide. So depending on your throughput and application behavior, this may or may not work. Gary.
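As a rough illustration of the single-row index idea (an editorial sketch, not code from the thread; the keyspace, column family, and row key names are all invented), the writer adds a marker column named after each new row key, and the export job slices that row to find pending keys. Deleting the processed marker columns afterwards, so the row does not grow without bound, would go through remove() or batch_mutate as discussed in the deletion thread above.

// Sketch only: assumes a connected 0.6 Thrift Cassandra.Client named "client"
// and a column family "NewRows" compared with UTF8Type. All names are invented.
import java.util.List;
import org.apache.cassandra.thrift.*;

public class NewRowIndex
{
    static final String KEYSPACE = "Keyspace1";
    static final String INDEX_ROW = "pendingExport";

    // Called alongside every insert into the "real" CF.
    static void markNew(Cassandra.Client client, String newRowKey) throws Exception
    {
        ColumnPath path = new ColumnPath("NewRows");
        path.setColumn(newRowKey.getBytes("UTF-8"));
        client.insert(KEYSPACE, INDEX_ROW, path,
                      new byte[0],                      // empty value; the column name is the payload
                      System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);
    }

    // Called by the offline export job: read a page of pending row keys.
    static List<ColumnOrSuperColumn> pendingKeys(Cassandra.Client client, int pageSize) throws Exception
    {
        SliceRange range = new SliceRange(new byte[0], new byte[0], false, pageSize);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);
        return client.get_slice(KEYSPACE, INDEX_ROW, new ColumnParent("NewRows"),
                                predicate, ConsistencyLevel.QUORUM);
    }
}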
Re: Hector vs cassandra-java-client
"Dop Sun" writes: > Updated. the first Cassandra client lib to make it into the Maven repositories will probably end up with a big audience. :-) -Bjørn
Re: Finding new Cassandra data
I can envision two fundamentally different approaches: 1. A CF that is CompareWith LONG ... use microsecond timestamps as your keys ... then you can filter by time ranges. This implies that you are willing to do a double write (once for the original data and then again for the logging). And a third read of a range_slice (which will most likely require pagination) to determine what to then push into your other system. Which begs a question ... if you know you are inserting and generating keys ... and you know the keyname ... why not simply push the key into a queue (non-Cassandra) and do processing against that. So ... 2. Don't store new row keys in a CF ... at the point of using the thrift API simply build a log of new keys and process that log asynchronously. This approach causes you to ask yourself another question: of the nodes in my cluster, am I willing to declare that some of those nodes are only available for write-thru processing. It's not Cassandra's job to make these decisions for you ... it's an applications decision. If you allow all nodes to perform writes, then you'll either have to consolidate logs or introduce some form of common queue for coordination of the async updates to non-Cassandra data stores. -phil On Jun 22, 2010, at 11:18 AM, Gary Dusbabek wrote: > On Tue, Jun 22, 2010 at 09:59, David Boxenhorn wrote: >> In my system, I have a Cassandra front end, and an Oracle back end. Some >> information is created in the back end, and pushed out to the front end, and >> some information is created in the front end and pulled into the back end. >> >> Question: How do I locate new rows that have been crated in Cassandra, for >> import into Oracle? >> >> I'm thinking of having a special column family "newRows" that contains only >> the keys of the new rows. The offline process would look there to see what's >> new, then delete those rows. The "newRows" CF would have no data! (The data >> would be in the "real" CF.) > > I've never tried an empty row, but I'm pretty sure you need at least one > column. > >> >> Is this a good solution? It seems weird to have a CF with rows but no data. >> But I can't think of a better way. >> >> Any thoughts? > > Another approach would be to have a CF with a single row whose column > names refer to the new row ids. This would allow you efficient > slicing. The downside is that you'd need to make sure the row doesn't > get too wide. So depending on your throughput and application > behavior, this may or may not work. > > Gary.
Re: UUIDs whose alphanumeric order is the same as their chronological order
On Tue, Jun 22, 2010 at 5:58 AM, David Boxenhorn wrote:
> I want to use UUIDs whose alphanumeric order is the same as their
> chronological order. So I'm generating Version 4 UUIDs (
...
> Is there anything wrong with this idea?

If you want to keep it completely ordered, it's probably not enough to rely on System.currentTimeMillis(): it seems likely that it would sometimes return the same clock value twice. This is easy to solve locally (just use an additional counter - that's what UUID packages do to get to 100-nanosecond resolution), and it might not matter in the concurrent case (intra-node ordering is arbitrary, but close enough). The other theoretical problem is the reduction in the random value space, but 75 bits of randomness may well be enough.

-+ Tatu +-
Re: UUIDs whose alphanumeric order is the same as their chronological order
A little bit of time fuzziness on the order of a few milliseconds is fine with me. This is user-generated data, so it only has to be time-ordered at the level that a user can perceive. I have no worries about my solution working - I'm sure it will work. I just wonder if TimeUUIDType isn't superior for some reason that I don't know about. (TimeUUIDType seems so bad in so many ways that I wonder why anyone uses it. There must be some reason!) On Tue, Jun 22, 2010 at 7:04 PM, Tatu Saloranta wrote: > On Tue, Jun 22, 2010 at 5:58 AM, David Boxenhorn > wrote: > > I want to use UUIDs whose alphanumeric order is the same as their > > chronological order. So I'm generating Version 4 UUIDs ( > ... > > Is there anything wrong with this idea? > > If you want to keep it completely ordered, it's probably not enough to > rely on System.currentTimeMillis(). It seems likely that it would > sometimes be called twice for same clock value? This is easy to solve > locally (just use an additional counter, that's what UUID packages do > to get to 100 nanosecond resolution); and it might not matter in > concurrent case (intra-node ordering is arbitrary but close enough). > The other theoretical problem is reduction in random value space, but > 75 bits of randomness may be well is enough. > > -+ Tatu +- >
Re: Uneven distribution using RP
This node's load is now growing at a ridiculous rate. It is at 105GB, with the next most loaded node at 70.63GB. Given that RF=3, I would assume that the replicas' nodes would grow relatively quickly too? On Mon, Jun 21, 2010 at 6:44 AM, aaron morton wrote: > According to http://wiki.apache.org/cassandra/Operations nodetool repair > is used to perform a major compaction and compare data between the nodes, > repairing any conflicts. Not sure that would improve the load balance, > though it may reduce some wasted space on the nodes. > > nodetool loadbalance will remove the node from the ring after streaming > it's data to the remaining nodes and the add it back in the busiest part. > I've used it before and it seems to do the trick. > > Also consider the size of the rows. Are they generally similar or do you > have some that are much bigger? The keys will be distributed without > considering the size of the data. > > The RP is random though, i do not think it tries to evenly distribute the > keys. So some variance with a small number of nodes should be expected IMHO. > > Aaron > > On 21 Jun 2010, at 02:31, James Golick wrote: > > I ran cleanup on all of them and the distribution looked roughly even after > that, but a couple of days later, it's looking pretty uneven. > > On Sun, Jun 20, 2010 at 10:21 AM, Jordan Pittier - Rezel > wrote: > >> Hi, >> Have you tried nodetool repair (or cleanup) on your nodes ? >> >> >> On Sun, Jun 20, 2010 at 4:16 PM, James Golick wrote: >> >>> I just increased my cluster from 2 to 4 nodes, and RF=2 to RF=3, using >>> RP. >>> >>> The tokens seem pretty even on the ring, but two of the nodes are far >>> more heavily loaded than the others. I understand that there are a variety >>> of possible reasons for this, but I'm wondering whether anybody has >>> suggestions for now to tweak the tokens such that this problem is >>> alleviated. Would it be better to just add 2 more nodes? >>> >>> Address Status Load Range >>> Ring >>> >>> 170141183460469231731687303715884105728 >>> 10.36.99.140 Up 61.73 GB >>> 43733172796241720623128947447312912170 |<--| >>> 10.36.99.134 Up 69.7 GB >>> 85070591730234615865843651857942052864 | | >>> 10.36.99.138 Up 54.08 GB >>> 128813844387867495544257452469445200073| | >>> 10.36.99.136 Up 54.75 GB >>> 170141183460469231731687303715884105728|-->| >>> >>> >> >> > >
Re: Uneven distribution using RP
On 6/22/10 10:07 AM, James Golick wrote:
> This node's load is now growing at a ridiculous rate. It is at 105GB, with
> the next most loaded node at 70.63GB.
>
> Given that RF=3, I would assume that the replicas' nodes would grow
> relatively quickly too?

What Replica Placement Strategy are you using (RackUnaware, RackAware, etc.)? The current implementation of RackAware is pretty simple and relies on careful placement of nodes in multiple DCs along the ring to avoid hotspots.

http://wiki.apache.org/cassandra/Operations#Replication
"
RackAwareStrategy: replica 2 is placed in the first node along the ring that belongs in another data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the same rack as the first.

Note that with RackAwareStrategy, succeeding nodes along the ring should alternate data centers to avoid hot spots. For instance, if you have nodes A, B, C, and D in increasing Token order, and instead of alternating you place A and B in DC1 and C and D in DC2, then nodes C and A will have disproportionately more data on them, because they will be the replica destination for every Token range in the other data center.
"

https://issues.apache.org/jira/browse/CASSANDRA-785 is also related, and marked Fix Version 0.8.

=Rob
Re: Uneven distribution using RP
RackUnaware, currently On Tue, Jun 22, 2010 at 1:26 PM, Robert Coli wrote: > On 6/22/10 10:07 AM, James Golick wrote: > >> This node's load is now growing at a ridiculous rate. It is at 105GB, with >> the next most loaded node at 70.63GB. >> >> Given that RF=3, I would assume that the replicas' nodes would grow >> relatively quickly too? >> > What Replica Placement Strategy are you using (Rackunaware, Rackaware, > etc?)? The current implementation of Rackaware is pretty simple and relies > on careful placement of nodes in multiple DCs along the ring to avoid > hotspots. > > http://wiki.apache.org/cassandra/Operations#Replication > " > RackAwareStrategy: replica 2 is placed in the first node along the ring the > belongs in another data center than the first; the remaining N-2 replicas, > if any, are placed on the first nodes along the ring in the same rack as the > first > > Note that with RackAwareStrategy, succeeding nodes along the ring should > alternate data centers to avoid hot spots. For instance, if you have nodes > A, B, C, and D in increasing Token order, and instead of alternating you > place A and B in DC1, and C and D in DC2, then nodes C and A will have > disproportionately more data on them because they will be the replica > destination for every Token range in the other data center. > " > > https://issues.apache.org/jira/browse/CASSANDRA-785 > > Is also related, and marked Fix Version 0.8. > > =Rob > >
forum application data model conversion
Converting a forum application to Cassandra's data model.

Tables:

Posts [postID, threadID, userID, subject, body, created, lastmodified]

So this table contains the actual question subject and body. When a user logs in, they want to see a list of their questions, ordered by the last-modified date (to see if people responded to their question).

How would you best do this in Cassandra, seeing as the question/answer text is stored in another table?

I know you could make a CF like:

userID { postID1, postID2, ... }

and somehow order by last-modified, but then on the actual web page you would have to first query for postIDs owned by the user, ordered by last-modified, and THEN fetch the post data from the Posts collection.

Is this the only way? I mean, other than repeating the post subject+body in the user-to-postID index CF.
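One common way to express the index CF sketched in that message (an editorial illustration, not a reply from the thread; the keyspace and CF names are invented) is to key a UserPosts CF by userID and use the last-modified time as the column name, with the postID as the value, so a reversed slice returns a user's posts newest-first. The page render still needs a second lookup into Posts unless the subject is denormalized into the value, which is exactly the trade-off the question raises; re-indexing a modified post would also have to delete its old timestamp column.

// Sketch only: assumes a connected 0.6 Thrift Cassandra.Client named "client".
// "Forum"/"UserPosts" are invented names; UserPosts would be a CF CompareWith LongType.
import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.*;

public class UserPostIndex
{
    // Write an index entry whenever a post is created or modified.
    static void indexPost(Cassandra.Client client, String userId, String postId,
                          long lastModifiedMillis) throws Exception
    {
        // Column name = last-modified timestamp (8 bytes, big-endian) so columns sort by time;
        // column value = the postID to look up in the Posts data.
        byte[] name = ByteBuffer.allocate(8).putLong(lastModifiedMillis).array();
        ColumnPath path = new ColumnPath("UserPosts");
        path.setColumn(name);
        client.insert("Forum", userId, path, postId.getBytes("UTF-8"),
                      System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);
    }
}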
Write Rate / Second
How do I find performance metrics such as the write rate per second and the read rate per second? I could not work them out from the tpstats and cfstats commands.

Are there any attributes in JMX? Can someone please help me?

Thanks,
Mubarak
Re: Write Rate / Second
rate = operations / latency On Tue, Jun 22, 2010 at 2:50 PM, Mubarak Seyed wrote: > How to find out the performance metrics such as write rate per second, and > read rate per second. I could not find out from tpstats and cfstats command. > > Are there any attributes in JMX? Can someone please help me. > > Thanks, > Mubarak > > > > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
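Expanding on that one-liner: in practice you can sample a cumulative operation counter over JMX twice and divide the delta by the elapsed time. The sketch below shows only the mechanics; the JMX port, MBean name, and attribute name are illustrative guesses rather than confirmed API, so browse the org.apache.cassandra domains in jconsole and substitute whatever your version actually exposes.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class WriteRateSampler
{
    public static void main(String[] args) throws Exception
    {
        // Port and object/attribute names below are assumptions, not confirmed values.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = connector.getMBeanServerConnection();

        ObjectName cf = new ObjectName(
            "org.apache.cassandra.db:type=ColumnFamilyStores,keyspace=Keyspace1,columnfamily=Standard1");
        String attribute = "WriteCount"; // any monotonically increasing operation counter works

        long count1 = ((Number) mbs.getAttribute(cf, attribute)).longValue();
        long t1 = System.currentTimeMillis();
        Thread.sleep(10000);
        long count2 = ((Number) mbs.getAttribute(cf, attribute)).longValue();
        long t2 = System.currentTimeMillis();

        double writesPerSecond = (count2 - count1) * 1000.0 / (t2 - t1);
        System.out.printf("writes/sec over the last %.1f s: %.1f%n",
                          (t2 - t1) / 1000.0, writesPerSecond);
        connector.close();
    }
}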
Re: how to implement the function similar to inbox search?
Not having an index doesn't matter if you're going to read all the subcolumns back at once, which IIANM is the idea here. On Mon, Jun 21, 2010 at 12:20 PM, hu wei wrote: > in datamodel wiki: > You can think of each super column name as a term and the columns within as > the docids with rank info and other attributes being a part of it. If you > have keys as the userids then you can have a per-user index stored in this > form. This is how the per user index for term search is laid out for Inbox > search at Facebook. > a question: because subcolumn does't has index ,does it has performance > bottleneck? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: bulk loading
I looked at the thrift service implementation and got it working. (Much faster import!) Thanks! On Mon, Jun 21, 2010 at 13:09, Oleg Anastasjev wrote: > Torsten Curdt vafer.org> writes: > >> >> First I tried with my one "cassandra -f" instance then I saw this >> requires a separate IP. (Why?) > > This is because your import program becomes a special member of cassandra > cluster to be able to speak internal protocol. And each memboer of cassandra > cluster must have its own IP. > >> But even with a separate IPs >> "StorageService.instance.getNaturalEndpoints" does not return an >> endpoint. > > Did you defined -Dstorage-config for your import program to point to the same > configuration your normal cassandra nodes use ? > > Did you initialized client-mode storage service, like below ? > // init cassandra proxy > try > { > StorageService.instance.initClient(); > } > catch (IOException e) > { > throw new RuntimeException(e); > } > try > { > Thread.sleep(10*1000); > } > catch (InterruptedException e) > { > throw new RuntimeException(e); > } > > > >
Re: UUIDs whose alphanumeric order is the same as their chronological order
On Tue, Jun 22, 2010 at 9:12 AM, David Boxenhorn wrote:
> A little bit of time fuzziness on the order of a few milliseconds is fine
> with me. This is user-generated data, so it only has to be time-ordered at
> the level that a user can perceive.

Ok, so mostly ordered. :-)

> I have no worries about my solution working - I'm sure it will work. I just
> wonder if TimeUUIDType isn't superior for some reason that I don't know
> about. (TimeUUIDType seems so bad in so many ways that I wonder why anyone
> uses it. There must be some reason!)

Thinking about it rationally, a random-number-based UUID is the best choice, provided one has a good random number generator. But there is something intuitive about preferring the location + time-based alternative, given the tiny chance of collision that any (pseudo) random-number-based scheme has. So it just seems intuitively safer to use time-uuids, I think -- it isn't, it just feels that way. :-)

A secondary reason is probably the ordering, and the desire to stay standards compliant. As to ordering, if you wanted to use time-uuids, comparators that give time-based ordering are trivial, and no slower than lexical sorting. Java Uuid Generator (2.0) defaults to such a comparator, as I agree that it makes more sense than whatever sorting you would otherwise get. It is unfortunate that the clock chunks are ordered in a weird way by the UUID specification; there is no reason it couldn't have been laid out the "right way" so that the hex representation would sort nicely.

-+ Tatu +-
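For readers who want the "trivial comparator" spelled out, here is a minimal editorial sketch (not from the thread) that orders java.util.UUID values of version 1 by their embedded 100-nanosecond timestamp rather than by their lexical/numeric layout:

import java.util.Comparator;
import java.util.UUID;

// Orders version 1 (time-based) UUIDs chronologically, using the timestamp that
// UUID.timestamp() reassembles from the time_low/time_mid/time_hi fields.
public class TimeUuidComparator implements Comparator<UUID>
{
    public int compare(UUID a, UUID b)
    {
        long ta = a.timestamp(); // throws UnsupportedOperationException for non-version-1 UUIDs
        long tb = b.timestamp();
        if (ta != tb)
            return ta < tb ? -1 : 1;
        // same timestamp: fall back to the full value so the ordering is still total
        return a.compareTo(b);
    }
}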
SQL Server to Cassandra Schema Design - Ideas Anyone?
I'm having a bit of a block in converting an existing SQL Server schema that we have into Cassandra Keyspace(s). The whole key-value thing has just not clicked yet. Do any of you know of any good examples that are more complex than the example in the readme file?

We are looking to report on web traffic, so things like hits, page views, unique visitors... all the basic web stuff. I'm very sure that one of you, likely many more, is already doing this.

Here are two queries just to give you a few key words related to the metrics that we want to move into Cassandra:

/* Data logged */
select t.Datetime, c.CustomerNumber, ct.cust_type, ws.SiteNumber, ws.SiteName,
       f.Session, wa.page, wa.Note, f.CPUTime, f.DCLWaitTime, f.DCLRequestCount,
       'clientip' = dbo.u_IpInt2Str(ClientIP)
from warehouse.dbo.fact_WebHit f
join Warehouse.dbo.dim_Time t on t.ID = f.TimeID
join Warehouse.dbo.dim_CustomerType ct on ct.ID = f.CustomerTypeID
join Warehouse.dbo.dim_Customer c on c.ID = f.CustomerID
join Warehouse.dbo.dim_Symbol s on s.ID = f.SymbolID
join Warehouse.dbo.dim_WebAction wa on wa.ID = f.WebActionID
join Warehouse.dbo.dim_WebSite ws on ws.ID = f.WebSiteID

/* Data with surrogate keys */
select f.Timeid, f.CustomerID, f.CustomerTypeID, f.WebSiteID,
       f.Session, f.WebActionID, f.CPUTime,
       f.DCLWaitTime, f.DCLRequestCount, ClientIP
from warehouse.dbo.fact_WebHit f

Any good info would be appreciated. I have of course checked the main web sites, but I could have missed something along the way.

Craig
Cassandra Health Monitoring
All,

We have been working through some operations scenarios so that we are ready to deploy our first Cassandra cluster into production in the coming months. During this process our operations folks have asked us to provide a Health Check service. I am using the word "service" here very liberally - really we just need to provide a way for the folks in our NOC to know that not only is the Cassandra process running (which they will get with their monitoring tools), but that it is actually alive and well. We do not intend to verify that the data is valid, just that every node in the cluster that is known to be running is actually alive and healthy.

My questions are: What does it mean for a Cassandra node to be healthy? What are the lowest-impact things (in terms of performance cost to the node) we can check to make sure that a node is not a zombie?

Any and all input is greatly appreciated.

Thanks,
Andrew
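One minimal liveness probe, sketched here against the 0.6 Thrift API on the default port 9160, is simply to ask the node a trivial question over the client interface. This only shows that the node is answering client requests; it says nothing about data validity or ring health, and the port and timeout below are assumptions.

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class CassandraHealthCheck
{
    // Returns true if the node accepts a Thrift connection and answers a
    // trivial request within the 2 second timeout.
    public static boolean isAlive(String host)
    {
        TTransport transport = new TSocket(host, 9160, 2000);
        try
        {
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            return client.describe_cluster_name() != null;
        }
        catch (Exception e)
        {
            return false;
        }
        finally
        {
            transport.close();
        }
    }
}

A deeper check could read and write a small canary row at a low consistency level, at the cost of touching the storage path on every probe.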
Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
Gary Dusbabek gmail.com> writes:
> *Hopefully* fixed. I was never able to duplicate the problem on my
> workstation, but I had a pretty good idea what was causing the
> problem. Julie, if you're in a position to apply and test the fix, it
> would help us make sure we've got this one nailed down.
>
> Gary.

Gary,

I have run a full write test with the SVN 0.6 Cassandra from yesterday, which I'll call 0.6.3 beta. I am definitely not seeing the problem in 0.6.3 beta.

I am seeing something different than in 0.6.2 that is probably totally unrelated. I can tell you more if you are interested and want details. The headline is that when I run my 8 write clients (all on separate nodes) against 10 Cassandra nodes, with the clients writing as fast as they can at consistency=ALL, I get timeouts within 2 minutes of starting my run on 0.6.3 beta. When I drop back to 0.6.2, I do not get the timeouts even after 30 minutes of running, all else the same. I have CPU usage and disk I/O stats on all my Cassandra nodes during the 0.6.3 beta run if they would be helpful.

I am going to go back to 0.6.2 until 0.6.3 is officially released. Just wanted to try it out and let you know if I saw the problem. Interestingly, since updating our JVM, I'm not seeing the tight-loop problem in 0.6.2 either.

Thank you for your help!
Julie
Re: Uneven distribution using RP
Turns out that this is due to a larger proportion of the wide rows in the system being located on that node. I moved its token over a little to compensate for it, but it doesn't seem to have helped at this point.

What's confusing about this is that RF=3, and no other node's load is growing as quickly as that one.

- James

On Tue, Jun 22, 2010 at 1:31 PM, James Golick wrote:
> RackUnaware, currently
>
> On Tue, Jun 22, 2010 at 1:26 PM, Robert Coli wrote:
>> On 6/22/10 10:07 AM, James Golick wrote:
>>> This node's load is now growing at a ridiculous rate. It is at 105GB,
>>> with the next most loaded node at 70.63GB.
>>>
>>> Given that RF=3, I would assume that the replicas' nodes would grow
>>> relatively quickly too?
>>
>> What Replica Placement Strategy are you using (RackUnaware, RackAware, etc.)?
>> The current implementation of RackAware is pretty simple and relies on
>> careful placement of nodes in multiple DCs along the ring to avoid hotspots.
>>
>> http://wiki.apache.org/cassandra/Operations#Replication
>> "
>> RackAwareStrategy: replica 2 is placed in the first node along the ring
>> that belongs in another data center than the first; the remaining N-2
>> replicas, if any, are placed on the first nodes along the ring in the same
>> rack as the first
>>
>> Note that with RackAwareStrategy, succeeding nodes along the ring should
>> alternate data centers to avoid hot spots. For instance, if you have nodes
>> A, B, C, and D in increasing Token order, and instead of alternating you
>> place A and B in DC1, and C and D in DC2, then nodes C and A will have
>> disproportionately more data on them because they will be the replica
>> destination for every Token range in the other data center.
>> "
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-785
>>
>> Is also related, and marked Fix Version 0.8.
>>
>> =Rob
Re: Uneven distribution using RP
On Tue, Jun 22, 2010 at 4:08 PM, James Golick wrote: > Turns out that this is due to a larger proportion of the wide rows in the > system being located on that node. I moved its token over a little to > compensate for it, but it doesn't seem to have helped at this point. > What's confusing about this is that RF=3 and no other node's load is growing > as quickly as that one. Maybe it's failing to compact for some reason?
Re: Uneven distribution using RP
It's compacting at a ridiculously fast rate. The pending compactions have been growing for a while. It's also flushing memtables really quickly for a particular CF. Like, really quickly. Like, one every minute. I increased the thresholds by 10x and it's still going fast. On Tue, Jun 22, 2010 at 5:27 PM, Jeremy Dunck wrote: > On Tue, Jun 22, 2010 at 4:08 PM, James Golick > wrote: > > Turns out that this is due to a larger proportion of the wide rows in the > > system being located on that node. I moved its token over a little to > > compensate for it, but it doesn't seem to have helped at this point. > > What's confusing about this is that RF=3 and no other node's load is > growing > > as quickly as that one. > > Maybe it's failing to compact for some reason? >
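For anyone trying to reproduce this, the flush thresholds being referred to are presumably the global memtable settings in 0.6's storage-conf.xml. The snippet below shows the stock defaults purely for orientation - it is not a recommendation, and the element names should be double-checked against your own configuration.

<!-- Flush a memtable once it holds roughly this much data... -->
<MemtableThroughputInMB>64</MemtableThroughputInMB>
<!-- ...or this many operations (in millions)... -->
<MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
<!-- ...or after this many minutes, whichever limit is hit first. -->
<MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>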
nodetool loadbalance: Streams Continue on Non-Acceptance of New Token
Hi,

Please confirm whether this is an issue that should be reported, or whether I am doing something wrong. I could not find anything relevant in JIRA.

Playing with a 0.7 nightly (today's build), I set up a 3-node cluster this way:
- Added one node;
- Loaded the default schema with RF 1 from YAML using JMX;
- Loaded 2M keys using py_stress;
- Bootstrapped a second node;
- Cleaned up the first node;
- Bootstrapped a third node;
- Cleaned up the second node.

I got the following ring:

Address       Status  Load       Range                                      Ring
                                 154293670372423273273390365393543806425
10.50.26.132  Up      518.63 MB  69164917636305877859094619660693892452    |<--|
10.50.26.134  Up      234.8 MB   111685517405103688771527967027648896391   |   |
10.50.26.133  Up      235.26 MB  154293670372423273273390365393543806425   |-->|

Now I ran:

nodetool --host 10.50.26.132 loadbalance

It's been going for a while. I checked the streams:

nodetool --host 10.50.26.134 streams
Mode: Normal
Not sending any streams.
Streaming from: /10.50.26.132
  Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-3-Data.db/[(0,22206096), (22206096,27271682)]
  Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-4-Data.db/[(0,15180462), (15180462,18656982)]
  Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-5-Data.db/[(0,353139829), (353139829,433883659)]
  Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-6-Data.db/[(0,366336059), (366336059,450095320)]

nodetool --host 10.50.26.132 streams
Mode: Leaving: streaming data to other nodes
Streaming to: /10.50.26.134
  /var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]
Not receiving any streams.

These have been going for the past 2 hours. In the logs of the node with the .134 IP address I saw this:

INFO [GOSSIP_STAGE:1] 2010-06-22 16:30:54,679 StorageService.java (line 603) Will not change my token ownership to /10.50.26.132

So, to my understanding from the wikis, loadbalance is supposed to decommission the node, sending its ranges to other nodes, and then bootstrap it again. It has been stuck streaming for the past 2 hours and the size of the ring has not changed. The log on the first node says it started streaming hours ago:

INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 72) Beginning transfer process to /10.50.26.134 for ranges (154293670372423273273390365393543806425,69164917636305877859094619660693892452]
INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 82) Flushing memtables for Keyspace1...
INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,266 StreamOut.java (line 128) Stream context metadata [/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]] 1 sstables.
INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 135) Sending a stream initiate message to /10.50.26.134 ...
INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 140) Waiting for transfer to /10.50.26.134 to complete
INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 359) LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1277249454413.log', position=720)
INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 622) Enqueuing flush of Memtable(LocationInfo)@1637794189
INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,370 Memtable.java (line 149) Writing Memtable(LocationInfo)@1637794189
INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,528 Memtable.java (line 163) Completed flushing /var/lib/cassandra/data/system/LocationInfo-d-9-Data.db
INFO [MEMTABLE-POST-FLUSHER:1] 2010-06-22 17:36:53,529 ColumnFamilyStore.java (line 374) Discarding 1000

Nothing more after this line. Am I doing something wrong?

Best Regards,
-Arya
Hector - Java doc
Where can I find the Javadoc for the Hector Java client? Do I need to build it from source? -- Thanks, Mubarak Seyed.
Never ending compaction
We had to take a node down for an upgrade last night. When we brought it back online in the morning, it got slammed by HH data all day so badly that it was compacting near constantly, and the pending compactions pool was piling up. I shut most of the writes down to let things catch up, which they mostly have, but in an effort to minimize downtime of the component that relies on cassandra, I restarted the reading and writing with 4 pending compactions. Now, the writing is taking place quickly enough that compactions just keep queueing up. That basically means that at this pace, compactions will *never* complete. And compactions are expensive. They essentially make a node useless. So, we're left with 3/4 of a cluster, since we only have 4 nodes. Since then, another node in the cluster has started queueing up compactions. This is on pretty beefy hardware, too: 2 x E5620, 24GB, 2 x 15kRPM SAS disks in RAID1 for data, and 1 x 7200RPM SATA for commit logs. I guess we need more nodes? But, we only have about 80GB total per node, which doesn't really seem like that much for that kind of hardware? - James
Re: Hector - Java doc
I couldn't find the docs online but the Ant build script here in the source: http://github.com/rantav/hector/blob/master/build.xml has a javadoc target you can run to generate them... hope that helps... Jon. On 22 June 2010 21:25, Mubarak Seyed wrote: > Where can i find the java doc for Hector java client? Do i need to build > one from source? > > -- > Thanks, > Mubarak Seyed.
Re: Hector - Java doc
There isn't an online javadoc page, but the code is online and well documented and there's a wiki and all sorts of documents and examples http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/Keyspace.java http://wiki.github.com/rantav/hector/ On Wed, Jun 23, 2010 at 8:11 AM, Jonathan Holloway < jonathan.hollo...@gmail.com> wrote: > I couldn't find the docs online but the Ant build script here in the > source: > > http://github.com/rantav/hector/blob/master/build.xml > > has a javadoc target you can run to generate them... hope that helps... > > Jon. > > > On 22 June 2010 21:25, Mubarak Seyed wrote: > >> Where can i find the java doc for Hector java client? Do i need to build >> one from source? >> >> -- >> Thanks, >> Mubarak Seyed. > >
Re: UUIDs whose alphanumeric order is the same as their chronological order
Having a physical location encoded in the UUID *increases* the chance of a collision, because it means fewer random bits. There definitely will be more than one UUID created in the same clock unit on the same machine! The same bits that you use to encode your few servers can be used for over 100 trillion random numbers! "As to ordering, if you wanted to use time-uuids, comparators that do give time-based ordering are trivial, and no slower than lexical sorting." "No slower" isn't a good reason to use it! I am willing to take a (reasonable) time *penalty* to use lexically ordered UUIDs that will work both in Cassandra and Oracle (and which are human-readable - always good for debugging)! I am also willing to take a reasonable penalty to avoid using weird third-party code for generating UUIDs in the first place. On Tue, Jun 22, 2010 at 10:05 PM, Tatu Saloranta wrote: > On Tue, Jun 22, 2010 at 9:12 AM, David Boxenhorn > wrote: > > A little bit of time fuzziness on the order of a few milliseconds is fine > > with me. This is user-generated data, so it only has to be time-ordered > at > > the level that a user can perceive. > > Ok, so mostly ordered. :-) > > > I have no worries about my solution working - I'm sure it will work. I > just > > wonder if TimeUUIDType isn't superior for some reason that I don't know > > about. (TimeUUIDType seems so bad in so many ways that I wonder why > anyone > > uses it. There must be some reason!) > > I think that rationally thinking random-number based UUID is the best, > provided one has a good random number generator. > But there is something intuitive about rather using location + > time-based alternative, based on tiny chance of collision that any > (pseudo) random number based system has. > So it just seems intuitive safer to use time-uuids, I think -- it > isn't, it just feels that way. :-) > > Secondary reason is probably the ordering, and desire to stay > standards compliant. > As to ordering, if you wanted to use time-uuids, comparators that do > give time-based ordering are trivial, and no slower than lexical > sorting. > Java Uuid Generator (2.0) defaults to such comparator, as I agree that > this makes more sense than whatever sorting you would otherwise get. > It is unfortunate that clock chunks are ordered in weird way by uuid > specification; there is no reason it couldn't have been made "right > way" so that hex representation would sort nicely. > > -+ Tatu +- >
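For readers wondering what a lexically ordered, time-prefixed identifier might look like, here is one possible construction - a sketch of the general idea only, not necessarily the scheme David is using: a fixed-width hex timestamp prefix followed by random hex, so that plain string order tracks creation order down to the millisecond.

import java.security.SecureRandom;

public class SortableId
{
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds an identifier whose lexical order matches creation order at
    // millisecond granularity; IDs minted in the same millisecond sort
    // arbitrarily among themselves.
    public static String next()
    {
        // 16 hex digits of milliseconds keeps the prefix fixed-width far
        // beyond any realistic date.
        StringBuilder sb = new StringBuilder(String.format("%016x", System.currentTimeMillis()));
        byte[] random = new byte[8];
        RANDOM.nextBytes(random);
        for (byte b : random)
            sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }
}

Stored as plain strings, such keys sort the same way under a lexical column comparator in Cassandra and in an Oracle VARCHAR column, which appears to be the portability being discussed.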