Re: Some questions about using Binary Memtable to import data.
Thanks for the information. I have looked at some of the source code for the implementation, but two questions remain:

1. How do I know that a binary write message sent to an endpoint succeeded?
2. What happens if some of the natural endpoints are dead?

Thanks again.

On Wed, May 19, 2010 at 2:26 PM, Jonathan Ellis wrote:
> 1. yes
> 2. yes
> 3. compaction will slow down the load
> 4. it will flush the memtable
>
> On Tue, May 18, 2010 at 12:24 AM, Peng Guo wrote:
> > Hi All:
> >
> > I am trying to use Binary Memtable to import a large amount of data.
> >
> > After reading the wiki intro, http://wiki.apache.org/cassandra/BinaryMemtable,
> > I have some questions about using BinaryMemtable:
> >
> > 1. Will the data be replicated automatically?
> > 2. Can we modify data that already exists in Cassandra?
> > 3. What will happen if we do not turn off compaction?
> > 4. What will happen if the data exceeds the BinaryMemtableThroughputInMB limit?
> >
> > Thanks.
> >
> > --
> > Regards
> > Peng Guo
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

--
Regards
Peng Guo
Cassandra compaction disk space logic
Hi!

We have a mail archive application, so we have a lot of data (30 TB across multiple nodes) and need to delete data after a few months of storage.

Questions:

1) Compaction requires extra space to run. What happens if a node has no extra space for compaction? Will it crash, or just stop the compaction process?
2) Is it possible to limit the maximum SSTable file size? I am worried about the following situation: we have a 1 TB disk, 600 GB of data in a single file, and need to delete 50 GB of outdated data. Compaction could then generate another ~550 GB data file, which cannot fit on the disk.
3) If we have 30 TB of data plus replicas, how much disk space is required to handle this, including adding new data, deleting old data, compaction, etc.?
4) What happens if we run decommission but the target node does not have enough disk space?
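To put rough numbers on question 2, here is a back-of-the-envelope sketch in Python, assuming the old SSTable is only removed after the merged SSTable has been fully written (an assumption about the general mechanism, not a documented figure; the exact bookkeeping depends on the Cassandra version):

    # Back-of-the-envelope sketch of peak disk usage during a compaction.
    # Assumption (not an official Cassandra figure): the old SSTable is only
    # removed once the merged output has been fully written, so for a short
    # time both files coexist on disk.

    def compaction_peak_gb(existing_sstable_gb, reclaimable_gb):
        """Estimate peak disk usage while compacting one SSTable."""
        new_sstable_gb = existing_sstable_gb - reclaimable_gb  # merged output
        return existing_sstable_gb + new_sstable_gb            # old + new coexist briefly

    # The scenario from question 2: 600 GB file, 50 GB of outdated data.
    peak = compaction_peak_gb(600, 50)
    print(peak)            # ~1150 GB
    print(peak <= 1000)    # False: does not fit on a 1 TB disk

In other words, with a single 600 GB file the peak can briefly approach 600 + 550 GB, which is exactly the situation I am worried about.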
Strange error with data reading
Hello!

I have a 3-node cluster: node1, node2, node3, with replication factor = 2. I ran decommission on node3 and it is in progress, moving data to node1. Ring on all nodes shows all 3 nodes up, no problems (but node1 responds with a 3-5 second delay).

I tried to execute a few "get" statements using the cli, like

get MailArchive.Meta['ec3-n2:1274046482!5C/9B-05558-11860FB4!c']

On node 1 and node 3 everything works fine, but on node 2 the cli always returns "Exception null". The data is 3 days old, so it doesn't seem like a temporary effect. There are no errors in the log, and restarting node 2 doesn't help either. tpstats returns 0 active/pending in all rows.

What is going on?
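If it helps to narrow this down, here is a rough diagnostic sketch (Python; read_key is a placeholder you would wire to your actual client, not a real API): read the same key through each node at ConsistencyLevel ONE and again at QUORUM, to see whether a replica is actually missing the data or whether one coordinator just can't reach a live replica during the decommission.

    # Hypothetical diagnostic sketch. read_key is a stand-in you would implement
    # with your real client (Thrift, etc.). Reading through each node at ONE and
    # QUORUM helps distinguish "a replica is missing the data" from "this
    # coordinator cannot reach a live replica".

    KEY = "ec3-n2:1274046482!5C/9B-05558-11860FB4!c"
    NODES = ["node1", "node2", "node3"]

    def read_key(node, consistency_level):
        """Placeholder: hook up your Cassandra client here."""
        raise NotImplementedError("wire this to a real client call against `node`")

    def probe_all(nodes=NODES):
        results = {}
        for node in nodes:
            for level in ("ONE", "QUORUM"):
                try:
                    results[(node, level)] = read_key(node, level)
                except Exception as exc:   # timeouts, unavailable, etc.
                    results[(node, level)] = "ERROR: %r" % exc
        return results

    # for (node, level), value in sorted(probe_all().items()):
    #     print(node, level, value)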
Re: Data migration from mysql to cassandra
Thanks Jonathan, using mysql as an id sequence generator is definitely a good option. One thing though: does using sequential ids defeat the purpose of the random partitioner?

On Tue, May 18, 2010 at 11:25 PM, Jonathan Ellis wrote:
> Those are 2 of the 3 options (the other one being: continue to
> generate incrementing IDs, either by continuing to use mysql solely for
> that purpose, or by using another system like redis for that).
>
> On Mon, May 17, 2010 at 10:48 PM, Beier Cai wrote:
> > I'm currently moving my existing mysql database to cassandra. One particular
> > problem I have is migrating all those integer auto-increment ids to
> > keys generated in code (like UUIDs). One way I can do this is to dump all
> > the existing records into Cassandra and start with UUIDs for new records, but
> > this will leave a mix of id styles. Another way I can think of is to
> > re-create the existing records using UUIDs and deal with all
> > those referential keys. Either way seems kind of awkward. Is there any
> > good practice for dealing with this? I know many people here come from mysql,
> > what did you do?
> >
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
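On the partitioner question: with the random partitioner the row key is run through MD5 before placement, so even strictly sequential ids end up scattered around the ring. A small standard-library-only Python sketch of that idea (md5_token is a rough stand-in, not the exact token computation Cassandra uses):

    # Sketch: why sequential ids do not defeat the random partitioner.
    # A random partitioner places a row by an MD5 hash of its key, so
    # consecutive keys land at unrelated positions on the ring.

    import hashlib
    import uuid

    def md5_token(key):
        """Rough stand-in for deriving a token from a row key via MD5."""
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    # Sequential legacy ids hash to widely scattered tokens:
    for legacy_id in ("1000001", "1000002", "1000003"):
        print(legacy_id, md5_token(legacy_id))

    # New records could use client-generated UUIDs instead:
    print(uuid.uuid1())   # time-based UUID
    print(uuid.uuid4())   # random UUID

So mixing old sequential ids with UUIDs for new records should not hurt data distribution; the awkwardness is only in the application having two key styles.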
key path vs super column
We are currently working on a prototype that uses Cassandra for a realtime-ish statistics system. This seems to be quite a common use case, so if people are interested, maybe it would be worth collaborating on this beyond design discussions on the list. But first let me explain our approach and where we could use some input.

We are storing the raw events in minute buckets:

<minute bucket> => {
  <event key> => { 'id' => 1, 'attrA' => 'a1', 'attrB' => 'b1' },
  <event key> => { 'id' => 2, 'attrA' => 'a2', 'attrB' => 'b1' }
  ...
}

The number of attributes is quite limited currently (below 20), and for now we only plan to have no more than 1000 events per minute. So this should really be a piece of cake for Cassandra, and with this little data using a super column should be no problem.

Now the idea is to iterate over the minute buckets and build hour, day, month and year aggregates. With that, getting the totals across a certain time frame is nothing more than a few gets (or a multiget) and summing it all up. I guess the idea is straightforward.

One could use a super column to store and access the aggregated data from the time buckets:

<time bucket> => {
  'id/1' => { 'count' => 12 },
  'id/2' => { 'count' => 21 }
  ...
}

While this feels natural, the hierarchy might not be the best choice with the current Cassandra if the number of different ids becomes too large, IIUC. One could also move the id part into the row key space instead:

<time bucket> + 'id/1' => 12
<time bucket> + 'id/2' => 21

...at least as long as we don't have to access all data for one time slot (like one hour in this case). (This should still be possible with a row key range query, though, if the ordered partitioner is being used.)

Q: Is the only difference the limitation on row size? What performance considerations weigh in for one or the other approach? Does Cassandra first have to load the whole row into memory before one can access e.g. "id/1" with the super column approach?

cheers
--
Torsten
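For what it's worth, here is a short sketch of the hourly roll-up under the second layout (id moved into the row key). The client object and its get_row/insert methods are hypothetical stand-ins rather than a real driver API; only the key and column naming follows the scheme above.

    # Sketch of building hourly aggregates from minute buckets under the
    # "id in the row key" layout. `client.get_row` and `client.insert` are
    # hypothetical stand-ins for a real Cassandra client.

    from collections import defaultdict

    def rollup_hour(client, hour_bucket, minute_buckets):
        """Sum per-id event counts from minute buckets into one hourly bucket."""
        totals = defaultdict(int)
        for minute in minute_buckets:
            events = client.get_row("Events", minute)   # { event_key: {'id': ..., ...} }
            for event in events.values():
                totals["id/%s" % event["id"]] += 1
        # One row per (hour bucket, id) pair, e.g. "<hour bucket> + id/1" => 12
        for id_key, count in totals.items():
            client.insert("HourlyCounts",
                          "%s + %s" % (hour_bucket, id_key),
                          {"count": count})
        return totals

The super-column variant would instead write all the id/N subcolumns under a single <time bucket> row, which is where the row-size and "whole row in memory" questions come in.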
Re: Cassandra compaction disk space logic
2010/5/19 Maxim Kramarenko:
> Hi!
>
> We have a mail archive application, so we have a lot of data (30 TB across
> multiple nodes) and need to delete data after a few months of storage.
>
> Questions:
>
> 1) Compaction requires extra space to run. What happens if a node has no
> extra space for compaction? Will it crash, or just stop the compaction process?

Stop compaction.

> 2) Is it possible to limit the maximum SSTable file size? I am worried about the
> following situation: we have a 1 TB disk, 600 GB of data in a single file, and
> need to delete 50 GB of outdated data. Compaction could then generate another
> ~550 GB data file, which cannot fit on the disk.

Not currently. You have to manage this operationally.

> 3) If we have 30 TB of data plus replicas, how much disk space is required to
> handle this, including adding new data, deleting old data, compaction, etc.?

All of that depends on the cardinality of those operations (how much is
deleted, etc.). You're going to have to benchmark it.

> 4) What happens if we run decommission but the target node does not have enough
> disk space?

I don't know. Please let me know when you find out. :)

-ryan
Ring out of sync, cassandra_UnavailableException being thrown
In a 5-node cluster, I noticed in our client error log that one of the nodes was consistently throwing cassandra_UnavailableException during a read operation. Looking at JMX, it was obvious that one node's view of the ring was out of sync.

$ nodetool -host 192.168.20.150 ring
Address         Status  Load     Range                                        Ring
                                 139508497374977076191526400448759597506
192.168.20.156  Up      5.73 GB  733665530305941485083898696792520436        |<--|
192.168.20.158  Up      3.41 GB  9629533262984150011756238989685472219       |   ^
192.168.20.154  Up      2.44 GB  31048334058970902242412812423471654868      v   |
192.168.20.150  Up      4.89 GB  105769574715070648260922426249777160699     |   ^
192.168.20.152  Up      5.24 GB  139508497374977076191526400448759597506     |-->|

$ nodetool -host 192.168.20.158 ring
Address         Status  Load     Range                                        Ring
192.168.20.158  Up      3.41 GB  9629533262984150011756238989685472219       |<--|

Looking at the CF stats on that node, it is obvious that reads and writes are happening, but I have to assume those are coming from proxy connections via the other nodes. When restarting that node, the error logs on the other cluster nodes show that they detect the server going away and then coming back into the ring:

INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:39,448 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:55,475 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
INFO [GMFD:1] 2010-05-19 21:27:56,481 Gossiper.java (line 582) Node /192.168.20.158 has restarted, now UP again
INFO [GMFD:1] 2010-05-19 21:27:56,482 StorageService.java (line 538) Node /192.168.20.158 state jump to normal

Any ideas on how to kick that node and remind it of its buddies?

thanks!
-keith
Re: Disk usage doubled after nodetool decommission and node still in ring
Run nodetool streams.

On May 18, 2010 4:14 PM, "Maxim Kramarenko" wrote:

Hi!

After nodetool decommission, the data size on all nodes grew to twice its previous size; the node is still up and in the ring, and there is no streaming and no tmp SSTables now.

BTW, I have an ssh connection to the server, so after running nodetool decommission I assume the server has received the command, then press Ctrl-C and close the shell. Is that correct?

What is the best way to check the current node state, i.e. to check whether decommission has finished? Should the node accept new data after I run the "decommission" command?
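For the "has it finished" question, one low-tech option is simply to keep re-running nodetool streams until nothing is left. A minimal Python sketch, assuming you would rather eyeball the output than parse it (the output format varies by version):

    # Sketch: periodically re-run `nodetool streams` against a host and print
    # the output, so you can watch the decommission drain. No parsing is
    # attempted because the output format differs between versions.

    import subprocess
    import time

    def watch_streams(host, interval_seconds=60):
        while True:
            result = subprocess.run(["nodetool", "-h", host, "streams"],
                                    capture_output=True, text=True)
            print(result.stdout)
            time.sleep(interval_seconds)

    # watch_streams("node1")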
Re: ConcurrentModificationException in gossiper while decommissioning another node
that sounds like it, thanks

On Tue, May 18, 2010 at 3:53 PM, roger schildmeijer wrote:
> This is hopefully fixed in trunk (CASSANDRA-757, revision 938597):
> "Replace synchronization in Gossiper with concurrent data structures and
> volatile fields."
>
> // Roger Schildmeijer
>
>
> On Tue, May 18, 2010 at 1:55 PM, Ran Tavory wrote:
>
>> While the node 192.168.252.61 was in the process of decommissioning I see
>> this error in two other nodes:
>>
>> INFO [Timer-1] 2010-05-18 06:01:12,048 Gossiper.java (line 179) InetAddress /192.168.252.62 is now dead.
>> INFO [GMFD:1] 2010-05-18 06:04:00,189 Gossiper.java (line 568) InetAddress /192.168.252.62 is now UP
>> INFO [Timer-1] 2010-05-18 06:11:45,311 Gossiper.java (line 401) FatClient /192.168.252.61 has been silent for 360ms, removing from gossip
>> ERROR [Timer-1] 2010-05-18 06:11:45,315 CassandraDaemon.java (line 88) Fatal exception in thread Thread[Timer-1,5,main]
>> java.lang.RuntimeException: java.util.ConcurrentModificationException
>>     at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:97)
>>     at java.util.TimerThread.mainLoop(Timer.java:512)
>>     at java.util.TimerThread.run(Timer.java:462)
>> Caused by: java.util.ConcurrentModificationException
>>     at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
>>     at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:382)
>>     at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:91)
>>     ... 2 more
>>
>>
>> .61 is the decommissioned node. .62 was under load (streams transferred to
>> it from .61).
>>
>> I simply ran nodetool decommission on the .61 node and then (after an hour,
>> I guess) I saw this error in two other live nodes.
>>
>> Does this ring any bell? It's either a bug, or I wasn't
>> running decommission correctly...
>>
>
>
Re: decommission and org.apache.thrift.TApplicationException: get_slice failed: unknown result
My decommission was progressing OK, although very slowly, but I'll send another question to the list about that... The exception must have been a hiccup; I hope I won't get it again, I suppose...

On Tue, May 18, 2010 at 4:10 PM, Gary Dusbabek wrote:
> If I had to guess, I'd say that something at the transport layer had
> trouble. Possibly some kind of thrift hiccup that we haven't seen
> before.
>
> Your description makes it sound as if the decommission is proceeding
> normally though.
>
> Gary.
>
> On Tue, May 18, 2010 at 04:42, Ran Tavory wrote:
> > What's the correct way to remove a node from a cluster?
> > According to this page http://wiki.apache.org/cassandra/Operations a
> > decommission call should be enough.
> > When decommissioning one of the nodes from my cluster I see an error in the
> > client:
> >
> > org.apache.thrift.TApplicationException: get_slice failed: unknown result
> >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_slice(Cassandra.java:407)
> >     at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:367)
> >
> > The client isn't talking to the decommissioned node; it's connected to
> > another node, so I'd expect all operations to continue as normal (although
> > slower), right?
> > I simply called "nodetool -h ... decommission" on the host and waited. After
> > a while, while the node was still decommissioning, I saw the error at the
> > client.
> > The current state of the node is Decommissioned and it's not in the ring
> > now. It is still moving streams to other hosts, though. I can't be sure,
> > though, whether the error happened during the time it was Leaving the ring or
> > whether it was already Decommissioned.
> > The server logs don't show anything of note (no errors or warnings).
> > What do you think?
>
how to decommission two slow nodes?
In my cluster setup I have two datacenters, with 5 hosts in one DC and 3 in the other. In the 5-host DC I'd like to remove two hosts so I'd end up with 3 and 3 in each. The two nodes I'd like to decommission have less RAM than the other 3, so they operate more slowly. What's the most effective way to decommission them?

At first I thought I'd decommission the first and then, when it's done, decommission the second. The problem was that when I decommissioned the first, it started streaming its data to the second node (as well as others, I think), and since the second node was under heavy load and did not have enough RAM, it was busy GCing and worked horribly slowly. Eventually, after almost 24h of horribly slow streaming, I gave up. This also caused the entire cluster to operate horribly slowly.

So, is there a better way to decommission the two under-provisioned nodes without slowing down the cluster, or at least with minimum effect?

My replication factor is 2 and I'm using RackAwareStrategy, so (if everything is configured correctly with the EndPointSnitch) at any given time two copies of the data exist, one in each DC.

Thanks