"Internal error processing get" during bootstrap
Hello,

I'm evaluating Cassandra for use in my system. I could add approximately 16 million items using a single node. I'm using libcassandra (I can find my way through its code when I need to) to connect to it, and I already have some infrastructure for handling and adding those items (I was using Tokyo Cabinet before).

I couldn't find much documentation regarding how to make a cluster, but it seemed simple enough. At Cassandra server A (10.0.0.2) I had seeds: "localhost". At server B (10.0.0.3) I configured seeds: "10.0.0.2" and auto_bootstrap: true. Then I created a keyspace and a few column families in it.

I immediately began to add items and to get all these "Internal error processing get" errors. I found it quite odd; I thought it had to do with the load I was putting in, seeing that a few small tests had worked before. I spent quite some time debugging before I finally decided to write this e-mail. I wanted to double-check things, so I ran nodetool to see if everything was right. To my surprise, only one of the nodes was available. It took a little while for the other one to show up as Joining and then as Normal. After I waited out that period, I was able to insert items into the cluster with no errors at all.

Is that expected behaviour? What is the recommended way to set up a cluster? Should it be done manually: setting up the machines, creating all keyspaces and column families, then checking nodetool and waiting for it to get stable?

On a side note, sometimes I get "Default TException" (that seems to happen when the machine is under a heavier load than usual); commonly, retrying the read or insert right after works fine. Is that what's supposed to happen? Perhaps I should raise some timeout somewhere?
This is what ./bin/nodetool -h localhost ring reports:

Address    DC          Rack   Status  State   Load     Owns    Token
                                                               119105113551249187083945476614048008053
10.0.0.3   datacenter1 rack1  Up      Normal  3.43 GB  65.90%  61078635599166706937511052402724559481
10.0.0.2   datacenter1 rack1  Up      Normal  1.77 GB  34.10%  119105113551249187083945476614048008053

It's still adding stuff. I have no idea why B owns so many more keys than A.

I'm sorry if what I'm asking is trivial, but I have been having a hard time finding documentation. I've found a lot of outdated stuff, which was frustrating. I hope you guys have the time to help me out or -- if not -- I hope you can give me good reading material.

Thank you,
Rafael
Re: Kundera 2.0.2 Released
On Saturday, July 30, 2011, Amresh Singh wrote:
> We are happy to announce release of Kundera 2.0.2
>
> Kundera is a JPA 2.0 compliant, Object-Datastore Mapping Library for
> NoSQL Datastores. The idea behind Kundera is to make working with
> NoSQL Databases drop-dead simple and fun. It currently supports
> Cassandra, HBase and MongoDB. New features added in this release are:
>
> 1. Kundera is now JPA 2.0 compliant.

Interesting. I thought that, in order to be JPA compliant, you must support transactions. Does Kundera implement transactions on top of Cassandra? I could be mixing things up; I have worked with EJB and with Hibernate inside an EJB environment. Maybe the transaction requirement came from the EJB part of the specification, not JPA. If I recall correctly, JPA started out as part of the EJB specification, right?

Also, you got the entire JPA query language to work with Cassandra, MongoDB and HBase? That's impressive!
How tokens work?
Hello,

I have computers in my cluster that are better than others. In particular, there's one which is much better, and I'd like to give it more load than the others. Is that possible? I'm using RandomPartitioner; should I use another? Should I select tokens in some particular way? How is load distribution implemented in RandomPartitioner with respect to tokens?

Thank you,
Rafael
Re: How tokens work?
On Saturday, July 30, 2011, Rafael Almeida wrote:
> Hello,
>
> I have computers in my cluster that are better than others. In particular,
> there's one which is much better, and I'd like to give it more load than
> the others. Is that possible? I'm using RandomPartitioner; should I use
> another? Should I select tokens in some particular way? How is load
> distribution implemented in RandomPartitioner with respect to tokens?

I'm answering myself this time. I think I've got things figured out, at least for RandomPartitioner.

The token space goes from 0 to 2^127, so there are 2^127 possible tokens. The load a node receives is proportional to the number of tokens assigned to it. If you assign 2^127 / 2 tokens to a node, it will be responsible for half the load in the system; if you assign 2^127 / 3 tokens to a node, it will be responsible for 1/3 of the load, and so on.

But you assign only one token in Cassandra's configuration file! True, but that is just one endpoint of the node's range: each node owns the range of tokens from the previous node's token (exclusive) up to its own initial_token from cassandra.yaml (inclusive), and the ring wraps around from the highest token back to the lowest.

I find it hard to explain that without an example. So let's say the token space is actually from 0 to 100 and we have 4 nodes (to make things more manageable). In our example, we have the following initial_tokens:

node A = 0
node B = 20
node C = 70
node D = 90

Node A would own the wrapping range from 90 up to 0, i.e. 10 tokens (10% of the load). Node B would own the range from 0 to 20, i.e. 20 tokens (20% of the load). Node C would own 70 - 20 = 50 tokens (50% of the load) and, finally, node D would own 90 - 70 = 20 tokens (20% of the load). See how that works?

Note that you can't really leave tokens unhandled. Let's say you set up initial_token like this:

node A = 10
node B = 20
node C = 70
node D = 90

Because the ring wraps around, node A simply owns the wrapping range from 90 up to 10 (20% of the load); no token is ever missing an owner. I've tested this with two nodes and the ring was always fully covered between them.

I hope I've been clear. Please correct me if I misunderstood something.
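To make the arithmetic concrete, here is a small sketch of the ownership calculation, assuming each node owns the span from the previous node's token (exclusive) up to its own token (inclusive), with the ring wrapping around. The function name is made up for illustration; it just re-derives the toy 0-100 example.

```python
# Hypothetical sketch: each node's share of a token ring under
# RandomPartitioner-style ownership (previous token, own token].

RING_SIZE = 2 ** 127  # RandomPartitioner token space

def ownership(tokens, ring_size=RING_SIZE):
    """Map each initial_token to the fraction of the ring its node owns."""
    ordered = sorted(tokens)
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # index -1 wraps to the last token
        shares[tok] = ((tok - prev) % ring_size) / ring_size
    return shares

# The toy 0-100 ring from the example above:
shares = ownership([0, 20, 70, 90], ring_size=100)
# token 0 owns (90, 0]  -> 10%;  token 20 owns (0, 20]  -> 20%
# token 70 owns (20, 70] -> 50%; token 90 owns (70, 90] -> 20%
```

To give one node a bigger share, you would place its token further (clockwise) from its predecessor's token than the other nodes' tokens are from theirs.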
Re: "Internal error processing get" during bootstrap
I'm going to tell you guys the answers I could find so far.

On Tuesday, July 26, 2011, Rafael Almeida wrote:
> I couldn't find much documentation regarding how to make a cluster, but it
> seemed simple enough. At Cassandra server A (10.0.0.2) I had seeds:
> "localhost". At server B (10.0.0.3) I configured seeds:
> "10.0.0.2" and auto_bootstrap: true. Then I created a keyspace and a
> few column families in it.
>
> I immediately began to add items and to get all these "Internal error
> processing get" errors. I found it quite odd; I thought it had to do with
> the load I was putting in, seeing that a few small tests had worked before.
> I spent quite some time debugging before I finally decided to write this
> e-mail. I wanted to double-check things, so I ran nodetool to see if
> everything was right. To my surprise, only one of the nodes was available.
> It took a little while for the other one to show up as Joining and then as
> Normal.
>
> After I waited out that period, I was able to insert items into the cluster
> with no errors at all. Is that expected behaviour? What is the recommended
> way to set up a cluster? Should it be done manually: setting up the
> machines, creating all keyspaces and column families, then checking
> nodetool and waiting for it to get stable?

The problem I was having was mainly because I had set node A as the seed of B and B as the seed of A. I don't know what possessed me!

Regarding the schema configuration: I made a schema file and I load it using:

cassandra-cli -h localhost --batch < schema-file

It works alright.

> On a side note, sometimes I get "Default TException" (that seems to
> happen when the machine is under a heavier load than usual); commonly,
> retrying the read or insert right after works fine. Is that what's supposed
> to happen? Perhaps I should raise some timeout somewhere?

I still don't get why that error was so frequent. At first I was testing on workstations, where people would compile stuff and run all sorts of software.
I think that slowed things down considerably, and the system was having a hard time managing connections from the application. After I moved it to dedicated computers, those problems ceased to happen.

> This is what ./bin/nodetool -h localhost ring reports:
>
> Address    DC          Rack   Status  State   Load     Owns    Token
>                                                                119105113551249187083945476614048008053
> 10.0.0.3   datacenter1 rack1  Up      Normal  3.43 GB  65.90%  61078635599166706937511052402724559481
> 10.0.0.2   datacenter1 rack1  Up      Normal  1.77 GB  34.10%  119105113551249187083945476614048008053
>
> It's still adding stuff. I have no idea why B owns so many more keys than A.

It happened due to my weird double-seed configuration. Now everything is fine. I've explained how tokens work on a different thread.

Cheers,
Rafael
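Since retrying right away usually works, the workaround can live in a small wrapper. A minimal sketch (the exception type, retry count and backoff values here are placeholders, not anything Cassandra-specific):

```python
import time

def with_retries(fn, retries=3, delay=0.1, transient=(Exception,)):
    """Call fn(), retrying a few times on transient errors, e.g. a Thrift
    TException raised while a node is under heavy load."""
    for attempt in range(retries):
        try:
            return fn()
        except transient:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay * (2 ** attempt))  # simple exponential backoff

# Example: a flaky operation that fails twice, then succeeds.
calls = {"n": 0}
def flaky_get():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("Default TException")  # stand-in for a Thrift error
    return "value"

result = with_retries(flaky_get, delay=0.01, transient=(IOError,))
```

Raising rpc_timeout_in_ms on the server helps with genuine slowness, but a retry loop like this also covers momentary hiccups without hiding persistent failures.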
Re: Problems using Thrift API in C
----- Original Message -----
> From: Konstantin Naryshkin
> To: user@cassandra.apache.org
> Sent: Thursday, August 4, 2011 10:36 AM
> Subject: Re: Problems using Thrift API in C
>
> I have had similar issues when I generated Cassandra for Erlang. It seems
> that Thrift 0.6.1 (the latest stable version) does not work with Cassandra.
> Using Thrift 0.7 does.
>
> I had issues where it would give me run time errors when trying to send an
> insert (it would not serialize correctly).

I have a problem using Thrift in C as well. I'm using Thrift 0.5, and if I try to add a row to a column family that doesn't exist, the exception I get is "Default TException", which is very unspecific. Is that an issue with Cassandra? Is there perhaps something wrong with my setup? I was hoping to get a "Column family not found" message or something along those lines.
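Part of why the message is so unhelpful: if client code only catches the Thrift base class, the specific cause can be thrown away. A toy illustration of the difference (the classes below are stand-ins modeled on Thrift's exception hierarchy, not real Thrift imports):

```python
# Stand-in classes mirroring the Thrift exception hierarchy; in real code
# these come from the Thrift runtime and the generated Cassandra bindings.
class TException(Exception):
    pass

class InvalidRequestException(TException):
    """Server-side rejection carrying a human-readable reason in `why`."""
    def __init__(self, why):
        super().__init__(why)
        self.why = why

def insert_into_missing_cf():
    # What a server would raise for a write to an unconfigured column family.
    raise InvalidRequestException("unconfigured columnfamily Foo")

# Catching only the base class still works, but the useful detail lives in
# the subclass's `why` field; older bindings that don't deserialize the
# subclass can only surface the generic "Default TException" text.
try:
    insert_into_missing_cf()
except TException as e:
    message = getattr(e, "why", "Default TException")
```

That pattern is consistent with the 0.5-vs-0.7 observation above: if the generated code can't deserialize the specific exception, all that's left is the generic default.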
Re: Best indexing solution for Cassandra
From Anthony Ikeda:
> Well, we go live with our project very soon and we are now looking into
> what we will be doing for the next phase. One of the enhancements we would
> like to consider is an indexing platform to start building searches into
> our application.
>
> Right now we are just using column families to index the information
> (different views based on what we want to find); however, it is proving to
> be quite a task to keep the index views in sync with the data - although
> not a showstopper, it isn't something we want to be handling all the time,
> especially since operations like deletions require changes to multiple
> column families.
>
> I've heard of Solandra and Lucandra but I want to understand the
> experiences of people that may have used them, or other suggestions.

I've had some experience with that. My main problem was that I had a limited vocabulary and a large number of documents. It seems like Solandra kept all my documents in the same row for a given term. That means the documents don't get spread out throughout the cluster, and search was painfully slow. We ended up rolling our own solution and not using Cassandra at all for that purpose (although we still use it for storage).
Creating column families per client
Hello,

I am evaluating the usage of Cassandra for my system. I will have several clients who won't share data with each other. My idea is to create one column family per client. When a new client comes in and adds data to the system, I'd like to create a column family dynamically. Is that reliable? Can I create a column family on a node, immediately add new data to that column family, and be confident that the data added will eventually become visible to a read?

[]'s
Rafael
Re: List all keys with RandomPartitioner
> From: Franc Carter
> To: user@cassandra.apache.org
> Sent: Wednesday, February 22, 2012 9:24 AM
> Subject: Re: List all keys with RandomPartitioner
>
> On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti wrote:
>
>> I need to iterate over all the rows in a column family stored with
>> RandomPartitioner. When I reach the end of a key slice, I need to find
>> the token of the last key in order to ask for the next slice. I saw in an
>> old email that the token for a specific key can be recovered through
>> FBUtilities.hash(). That class however is inside the full Cassandra jar,
>> not inside the client-specific part. Is there a way to iterate over all
>> the keys which does not require the server-side Cassandra jar?
>
> Does this help?
>
> http://wiki.apache.org/cassandra/FAQ#iter_world

I don't get it. It says to use the last key read as the start key, but what should be used as the end key?
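If I read the FAQ recipe right, the end key stays empty ("") on every call: you page by passing the last key you saw as the next start key and dropping the duplicated first row of each batch. A self-contained sketch with a fake in-memory range query (the function names are made up; the MD5 ordering mimics how RandomPartitioner orders rows, so no server-side jar is needed):

```python
import hashlib

def token(key):
    # RandomPartitioner orders rows by the MD5-derived token of the key.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

def get_range_slice(rows, start_key, count):
    """Fake server call: up to `count` rows at or after start_key in token
    order; an empty start key means 'from the beginning of the ring'."""
    ordered = sorted(rows, key=token)
    if start_key:
        i = next(j for j, k in enumerate(ordered) if k == start_key)
        ordered = ordered[i:]
    return ordered[:count]

def iter_all_keys(rows, page=3):
    start, seen = "", []
    while True:
        batch = get_range_slice(rows, start, page)
        if start:
            batch = batch[1:]  # first row repeats the previous last row
        if not batch:
            return seen       # an empty page means we've wrapped the ring
        seen.extend(batch)
        start = batch[-1]     # last key read becomes the next start key
    # The end key is always "" -- ranges are open-ended on the right.

keys = iter_all_keys(["a", "b", "c", "d", "e", "f", "g"])
```

The loop terminates when a page yields nothing new, so no end key (and no client-side FBUtilities.hash equivalent) is ever needed.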
Re: Please advise -- 750MB object possible?
Keep them where?

> From: Mohit Anchlia
> To: user@cassandra.apache.org
> Cc: potek...@bnl.gov
> Sent: Wednesday, February 22, 2012 3:44 PM
> Subject: Re: Please advise -- 750MB object possible?
>
> In my opinion, if you are a busy site or application, keep blobs out of
> the database.
>
> On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff wrote:
>
>> Chunking is a good idea, but you'll have to do it yourself. A few of the
>> columns in our application got quite large (maybe ~150MB) and the failure
>> mode was RPC timeout exceptions. Nodes couldn't always move that much
>> data across our data center interconnect in the default 10 seconds. With
>> enough heap and a faster network you could probably get by without
>> chunking, but it's not ideal.
>>
>> On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin wrote:
>>
>>> Hello everybody,
>>>
>>> I'm being asked whether we can serve an "object", which I assume is a
>>> blob, of 750MB size? I guess the real question is how to chunk it and/or
>>> whether it's even possible to chunk it.
>>>
>>> Thanks!
>>>
>>> Maxim
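Since do-it-yourself chunking came up, here is a minimal sketch of splitting a blob into fixed-size pieces and reassembling them. The chunk size and the (index, bytes) layout are arbitrary illustrative choices, not a Cassandra convention; in practice each piece would become one column under the blob's row key.

```python
CHUNK = 5 * 2 ** 20  # e.g. 5 MB per piece, comfortably under RPC limits

def split_blob(blob, chunk=CHUNK):
    """Split a blob into (index, bytes) pieces for storage as columns."""
    return [(i // chunk, blob[i:i + chunk]) for i in range(0, len(blob), chunk)]

def join_blob(chunks):
    """Reassemble pieces fetched back in any order (sort by index first)."""
    return b"".join(data for _, data in sorted(chunks))

blob = bytes(range(256)) * 100                 # a 25,600-byte test blob
chunks = split_blob(blob, chunk=10_000)        # 3 pieces: 10000+10000+5600
assert join_blob(list(reversed(chunks))) == blob
```

Keeping a "chunk count" column alongside the pieces makes the read side simple: fetch that many indexed columns and join them, which also keeps each individual RPC well under the timeout that bit the ~150MB columns mentioned above.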