Re: compression
Hi Ran, I think there's no compression on the server end. I am doing the gzip compression on the client side myself. Cheers, Cao Jiguang 2010-04-01 casablinca126.com From: Ran Tavory Sent: 2010-04-01 14:37:59 To: user@cassandra.apache.org Cc: Subject: compression What sort of compression (if any) is performed by cassandra? Does the thrift client compress anything before sending to the server to preserve bandwidth? Does the server compress the values in the columns to preserve disk or memory? ... I assume compaction, performed on the server side, is different than compression... however, does compaction include any compression features as well? Thanks
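For readers wondering what client-side gzip looks like in practice, here is a minimal sketch, not Cao's actual code. It assumes column values are plain byte strings; `pack_value` and `unpack_value` are hypothetical helper names, and the Cassandra read/write calls are left out entirely.

```python
import gzip


def pack_value(value: bytes) -> bytes:
    """Compress a column value on the client before it is written."""
    return gzip.compress(value)


def unpack_value(stored: bytes) -> bytes:
    """Decompress a column value on the client after it is read back."""
    return gzip.decompress(stored)


# Hypothetical payload; repetitive data compresses well.
payload = b"cassandra " * 1000
packed = pack_value(payload)
restored = unpack_value(packed)
```

The server only ever sees the compressed bytes, so every client reading the column has to know it must decompress.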
Re: Read Performance
Hi James, I don't know how to get the statistics below, or how to calculate the access times (read/write in ms) from your previous mails. Can you explain a little? I'd like to work on it too. CD On Thu, Apr 1, 2010 at 4:15 AM, Jonathan Ellis wrote: > On Wed, Mar 31, 2010 at 6:21 PM, James Golick > wrote: > > Keyspace: ActivityFeed > > Read Count: 699443 > > Read Latency: 16.11017477192566 ms. > > > Column Family: Events > > Read Count: 232378 > > Read Latency: 0.396 ms. > > Row cache capacity: 50 > > Row cache size: 62768 > > Row cache hit rate: 0.007716049382716049 > > This says that > > - recent queries to Events are much faster than the lifetime average > for your Keyspace > - even though you have almost no row cache hits (~1700 out of 232000 > reads) > > Not sure what to make of that, tbh. If it were me I would try to > reproduce on a test machine w/o all that pesky live traffic confusing > things. > > -Jonathan >
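The "almost no row cache hits" figure in the quoted reply can be reproduced from the numbers shown. The sketch below only does that arithmetic; it assumes the reported hit rate applies to the same reads as the reported read count, which the thread does not confirm. The figures look like the per-keyspace output of nodetool's cfstats command, but that is also an assumption, since the thread does not say where they came from.

```python
# Numbers copied from the quoted statistics for the Events column family.
read_count = 232378
row_cache_hit_rate = 0.007716049382716049

# Assumption: the hit rate covers the same reads as read_count.
hits = round(read_count * row_cache_hit_rate)
misses = read_count - hits

print(f"row cache hits:   {hits}")
print(f"row cache misses: {misses}")
print(f"hit percentage:   {100 * row_cache_hit_rate:.2f}%")
```

That comes to about 1,800 hits, in line with the "~1700 out of 232000" estimate, or well under 1% of reads.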
RE: compression
The Thrift client doesn’t seem to compress anything unless you change the Thrift protocol or use a transport that supports compression. I modified TSocket to support compression, but it occasionally hits a broken pipe error due to crappy Java zlib support (so clients have to reconnect to get around the socket error). Because this support lives in the transport layer, you get compression for all traffic or for none. The Cassandra server doesn’t seem to support compression either; we are doing that for the memory cache by plugging memcached into Cassandra. Still testing… -Weijun From: Ran Tavory [mailto:ran...@gmail.com] Sent: Wednesday, March 31, 2010 11:37 PM To: user@cassandra.apache.org Subject: compression What sort of compression (if any) is performed by cassandra? Does the thrift client compress anything before sending to the server to preserve bandwidth? Does the server compress the values in the columns to preserve disk or memory? ... I assume compaction, performed on the server side, is different than compression... however, does compaction include any compression features as well? Thanks
Re: expiring data out of Cassandra/time to live
> On that topic, what exactly is keeping this feature out of the official > releases? The patch changes the Thrift API. Among possibly other reasons, I think that is one reason why it wasn't even considered for inclusion in the 0.6 branch. As for trunk (and thus the future 0.7), there are internal changes scheduled (vector clocks and changes to the SSTable format at least) that will force this patch to be rewritten somewhat. I think that is part of the reason why it is not yet included. But of course, that being said, I'm all for inclusion. (As a side note, patches for the 0.6 version are (now) attached to the Jira ticket. That should make it much easier for those who want to test than checking out the old svn version and merging back to 0.6.) > > On Wed, Mar 31, 2010 at 3:43 PM, Daniel Kluesing wrote: >> >> We also applied this patch to the 0.6 branch and have been running it for >> a bit over a week. Works well, would love to see it get into trunk/0.7 >> proper. >> >> >> >> From: Ryan Daum [mailto:r...@thimbleware.com] >> Sent: Wednesday, March 31, 2010 11:49 AM >> To: user@cassandra.apache.org >> Subject: Re: expiring data out of Cassandra/time to live >> >> >> >> I was able to successfully merge this patch into the 0.6 branch a few >> weeks ago by doing the following: >> >> >> >> Downloading the patch >> Checking out the trunk of Cassandra from github >> Rolling back (checking out) the git repo to the same date that the patch >> was submitted to Jira >> Applying the patch >> Committing to Git >> Merging forward to the 0.6 branch >> Resolve one or two minor conflicts. >> >> >> >> R >> >> >> >> On Wed, Mar 31, 2010 at 2:46 PM, Jonathan Ellis wrote: >> >> Sounds like you want to follow >> https://issues.apache.org/jira/browse/CASSANDRA-699. There is a patch >> there but I wouldn't recommend merging it if Java scares you. 
:) >> >> On Wed, Mar 31, 2010 at 1:39 PM, Mike Gallamore >> wrote: >> > Hello everyone, >> > >> > I saw a thread on the incubator user chat that started a few months ago: >> > >> > http://www.mail-archive.com/cassandra-u...@incubator.apache.org/msg02047.html >> > . It looks like this is the new official user mailing list so I'll add >> > my >> > thoughts/question here. >> > >> > Is there any way to set a TTL on data stored in Cassandra? Deleting old >> > SSTables isn't enough for my needs. I need the data to go away after a >> > fixed >> > period of time. Here is what I'm trying to do and my reasoning why I >> > think >> > Cassandra and not something like Flare/Memcache meets my need: >> > >> > I'm building a reputation system. We get lots of data at my work (in the >> > 10's of GB of reputation data a day). The trick is that old data is not >> > useful as a sender's IP address might have changed, they might have had a >> > bot >> > on their system and now have removed it, etc. So I need to be able to >> > keep >> > data for a fixed period of time and then afterwards it isn't >> > needed/ideally >> > would be GC'd out. >> > >> > We want to do one thing if we either never heard of the individual or at >> > least not since the expiry time, and another thing based on the >> > reputation >> > data that is stored in Cassandra if it is current. So ideally a >> > Cassandra >> > call for a key for someone whose reputation is expired would return >> > nothing >> > and we'd reply with our default reputation for that individual. There >> > really >> > is no point using network bandwidth to return all the fields associated >> > with >> > that key only to look at a timestamp and end up ignoring it anyways. >> > Similarly the latency of requesting first the timestamp and then the >> > data in >> > two separate requests is prohibitive. >> > >> > Why Cassandra: >> > >> > Our data is complex and is hard to handle completely in a key/value >> > sense. 
>> > In the past we were doing this and just encoding the complex structure >> > inside of JSON but this isn't ideal. It is very nice algorithmically to >> > be >> > able to say: give me this column, or update this element of this hash >> > etc, >> > rather than having to pull the old version, decode, modify, re-encode >> > and >> > push back to a cache based system. >> > Our data is large (in the low TB's at the moment, but expected to grow >> > to >> > 50-100TB of live data) >> > Need quick response for both searches and writes: typically for each >> > thing >> > we track we get a request for the reputation, the message gets processed >> > and >> > then we get feedback back from the recipient. So reads and writes are >> > symmetric. >> > High request rate: millions per hour >> > hundreds of millions of unique reputations (this is why crawling through >> > the >> > data with a script purging old data doesn't make sense) >> > Availability/load balancing a must. Data needs to be replicated; a disk >> > copy >> > is useful so if we have a power outage we don't lose the system. >> > It would be interesting to keep a local subset of our data at customers >> > sites and
Re: Cassandra data file corrupt
Dear David Timothy Strauss, Could you tell me more about backups? As far as I know, Cassandra compacts its data files as new data arrives, so they can change many times. How should I back up Cassandra data? Thanks. On Wed, Mar 31, 2010 at 8:16 AM, David Timothy Strauss < da...@fourkitchens.com> wrote: > Cassandra has always supported two great ways to prevent data loss: > > * Replication > * Backups > > I doubt Cassandra will ever focus extensively on single-node recovery when > it's so easy to wipe and rebuild any node from the cluster. > -- > From: JKnight JKnight > Date: Wed, 31 Mar 2010 03:48:01 -0400 > To: > Subject: Cassandra data file corrupt > > Dear all, > > My Cassandra data file had a problem and I cannot get data from this file. > And all rows after the error row cannot be accessed. So I lost a lot of data. > > Will the next version of Cassandra implement a way to prevent data loss? > Maybe we could use a checkpoint. If the data file is corrupt, we will read from the > next checkpoint. > > If not, can you suggest a way for me to implement this function? > > -- > Best regards, > JKnight > -- Best regards, JKnight
Re: compression
To Cao Jiguang I was watching this presentation on bigtable yesterday http://video.google.com/videoplay?docid=7278544055668715642# and Jeff mentioned that they compared three different compression libraries BMDiff, LZO and gzip. Apparently, gzip was the most cpu intensive and they ended up going with BMDiff. I didn't find any Open source / Free implementation of BMDiff but I found LZO. http://www.oberhumer.com/opensource/lzo/ Thanks -Venu On Thu, Apr 1, 2010 at 3:07 AM, Weijun Li wrote: > Thrift client doesn’t seem to compress anything unless you change thrift > protocol or use a transport that support compression. I modified TSocket to > support compression but it occasionally has broken pipe error due to crappy > Java zlib support (so that clients has to reconnect to get around the socket > error). This is a support in transport layer meaning you’ll get compression > support for all or none. > > > > Cassandra server doesn’t seem to support compression either and we are > doing that for memory cache by plugging memcached into Cassandra. Still > testing… > > > > -Weijun > > > > *From:* Ran Tavory [mailto:ran...@gmail.com] > *Sent:* Wednesday, March 31, 2010 11:37 PM > > *To:* user@cassandra.apache.org > *Subject:* compression > > > > What sort of compression (if any) is performed by cassandra? > > Does the thrift client compress anything before sending to the server to > preserve bandwidth? > > Does the server compress the values in the columns to preserve disk or > memory? > > > > ... I assume compaction, performed on the server side, is different than > compression... however, does compaction include any compression features as > well? > > > > Thanks >
Stalled Bootstrapping Process
So we are adding another node to the cluster with the latest 0.6 branch (RC1). It seems to be hung in some limbo state. Before bootstrapping, our cluster had 50-60GB spread fairly evenly across 4 machines, with RF=3. One machine had more load than the others, and sure enough bootstrapping selected that node. That is the red machine. The light blue machine is the new machine. I have attached a graph to illustrate when the bootstrap process started. In jconsole the streamingservice status was "performing anticompaction..." for over 18-24 hrs. It is currently in "nothing is happening". It did have 1 active STREAM-STAGE task, but the machine had to be rebooted for something unrelated to Cassandra. Now the light blue machine appears to be getting data, but it's growing at virtually the same rate as the other machines, which makes me think it is part of the cluster and not actually streaming data from the machine it's supposed to. Any other ideas on how to debug? -- Dan Di Spaltro
Re: compression
On Thu, Apr 1, 2010 at 8:27 AM, Rao Venugopal wrote: > To Cao Jiguang > > I was watching this presentation on bigtable yesterday > http://video.google.com/videoplay?docid=7278544055668715642# > > and Jeff mentioned that they compared three different compression libraries > BMDiff, LZO and gzip. Apparently, gzip was the most cpu intensive and they > ended up going with BMDiff. > I didn't find any Open source / Free implementation of BMDiff but I found > LZO. > http://www.oberhumer.com/opensource/lzo/ Another IMO good alternative is LZF -- it has characteristics similar to LZO. Gzip (i.e. deflate) is a two-phase compressor, with the usual Lempel-Ziv pass first, then Huffman coding (one of the oldest statistical encodings). LZO, LZF and most other newer variants, which are simpler but compress less, usually only do Lempel-Ziv. Why LZF? Because there are simple free and open Java implementations: H2 has a codec, I ported it to Voldemort, and I think there was talk of generalizing the one from H2 as a stand-alone codec for reuse. Others may have ported it to other libs/frameworks too (there were multiple Jira issues for adding some of these to Hadoop). The block format itself is simple, and it is possible to decode adjacent blocks separately by skipping encoded blocks without decoding them: this can be used to allow some level of random access (access a random block, decode it, access something inside the block). Performance-wise, simpler codecs are fast enough to add less overhead than the fastest parsing of textual formats (JSON, XML), but more importantly, they are MUCH faster to write (once again, not much more overhead than format encoding). It is compression speed that really kills gzip, esp. since it is often the server that has to do it, for small requests and large responses. -+ Tatu +-
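The block-skipping idea in Tatu's message can be sketched as follows. This is an illustration only: it uses zlib from the Python standard library as a stand-in codec (LZF is not in the standard library), and the 4-byte length prefix, the block size, and the function names are all assumptions made for the example, not the real LZF block format.

```python
import struct
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Compress each block independently and prefix it with its encoded length."""
    out = bytearray()
    for start in range(0, len(data), block_size):
        chunk = zlib.compress(data[start:start + block_size])
        out += struct.pack(">I", len(chunk)) + chunk
    return bytes(out)


def read_block(stream: bytes, index: int) -> bytes:
    """Decode only block `index`, skipping earlier blocks via their length prefixes."""
    offset = 0
    for _ in range(index):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4 + length  # jump over the encoded block without decoding it
    (length,) = struct.unpack_from(">I", stream, offset)
    return zlib.decompress(stream[offset + 4:offset + 4 + length])


data = bytes(range(256)) * 100  # 25,600 bytes, i.e. 7 blocks of up to 4,096 bytes
stream = compress_blocks(data)
third = read_block(stream, 3)   # decodes one block, skips the three before it
```

Because each block is compressed on its own, a reader pays the decompression cost only for the block it needs, at the price of a slightly worse overall compression ratio.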
Re: Stalled Bootstrapping Process
Does the JMX StreamingService list any incoming/outgoing files/hosts on the sending/receiving nodes? Gary. On Thu, Apr 1, 2010 at 10:26, Dan Di Spaltro wrote: > So we are adding another node to the cluster with the latest 0.6 branch > (RC1). It seems to be hung in some limbo state. > Before bootstrapping our cluster had 50-60GB spread fairly evenly across 4 > machines, with RF=3. One machine had more load than the others, and sure > enough bootstrapping selected that node. That is the red machine. The > light blue machine is the new machine. > I have attached a graph to illustrate when the bootstrap process started. > In jconsole the streamingservice status was "performing anticompaction..." > for over 18-24 hrs. It is currently in "nothing is happening". It did > have 1 active STREAM-STAGE task, but the machine had to be rebooted for > something unrelated to cassandra. Now the light blue machine appears to be > getting data, but its growing at virtually the same rate as the other > machines which makes me think it is part of the cluster and not actually > streaming data from the machine its supposed to. > Any other ideas on how to debug? > > -- > Dan Di Spaltro >
LazyBoy question
I am trying out the lazyboy library to access Cassandra. I was able to get data in and out using the Record save/load functions. Is there a way to get a slice, or all the records under a CF, so I can iterate? It is probably a naive question, as I am just getting into this field. Thanks, Gary
Re: Stalled Bootstrapping Process
The light-blue machine is in Operation Mode: Bootstrap On Thu, Apr 1, 2010 at 9:26 AM, Dan Di Spaltro wrote: > So we are adding another node to the cluster with the latest 0.6 branch > (RC1). It seems to be hung in some limbo state. > > Before bootstrapping our cluster had 50-60GB spread fairly evenly across 4 > machines, with RF=3. One machine had more load than the others, and sure > enough bootstrapping selected that node. That is the red machine. The > light blue machine is the new machine. > > I have attached a graph to illustrate when the bootstrap process started. > > In jconsole the streamingservice status was "performing anticompaction..." > for over 18-24 hrs. It is currently in "nothing is happening". It did > have 1 active STREAM-STAGE task, but the machine had to be rebooted for > something unrelated to cassandra. Now the light blue machine appears to be > getting data, but its growing at virtually the same rate as the other > machines which makes me think it is part of the cluster and not actually > streaming data from the machine its supposed to. > > Any other ideas on how to debug? > > > -- > Dan Di Spaltro > -- Dan Di Spaltro
Re: Stalled Bootstrapping Process
which node rebooted, the red one, or the blue one? On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro wrote: > So we are adding another node to the cluster with the latest 0.6 branch > (RC1). It seems to be hung in some limbo state. > Before bootstrapping our cluster had 50-60GB spread fairly evenly across 4 > machines, with RF=3. One machine had more load than the others, and sure > enough bootstrapping selected that node. That is the red machine. The > light blue machine is the new machine. > I have attached a graph to illustrate when the bootstrap process started. > In jconsole the streamingservice status was "performing anticompaction..." > for over 18-24 hrs. It is currently in "nothing is happening". It did > have 1 active STREAM-STAGE task, but the machine had to be rebooted for > something unrelated to cassandra. Now the light blue machine appears to be > getting data, but its growing at virtually the same rate as the other > machines which makes me think it is part of the cluster and not actually > streaming data from the machine its supposed to. > Any other ideas on how to debug? > > -- > Dan Di Spaltro >
Re: Stalled Bootstrapping Process
Red one. Gary - both say nothing is happening with no destinations or sources. On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis wrote: > which node rebooted, the red one, or the blue one? > > On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro > wrote: > > So we are adding another node to the cluster with the latest 0.6 branch > > (RC1). It seems to be hung in some limbo state. > > Before bootstrapping our cluster had 50-60GB spread fairly evenly across > 4 > > machines, with RF=3. One machine had more load than the others, and > sure > > enough bootstrapping selected that node. That is the red machine. The > > light blue machine is the new machine. > > I have attached a graph to illustrate when the bootstrap process started. > > In jconsole the streamingservice status was "performing > anticompaction..." > > for over 18-24 hrs. It is currently in "nothing is happening". It did > > have 1 active STREAM-STAGE task, but the machine had to be rebooted for > > something unrelated to cassandra. Now the light blue machine appears to > be > > getting data, but its growing at virtually the same rate as the other > > machines which makes me think it is part of the cluster and not actually > > streaming data from the machine its supposed to. > > Any other ideas on how to debug? > > > > -- > > Dan Di Spaltro > > > -- Dan Di Spaltro
Re: Stalled Bootstrapping Process
Before the Red one rebooted it had 1 active STREAM-STAGE. Now it has 0 in STREAM-STAGE. On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro wrote: > Red one. > > Gary - both say nothing is happening with no destinations or sources. > > > On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis wrote: > >> which node rebooted, the red one, or the blue one? >> >> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro >> wrote: >> > So we are adding another node to the cluster with the latest 0.6 branch >> > (RC1). It seems to be hung in some limbo state. >> > Before bootstrapping our cluster had 50-60GB spread fairly evenly across >> 4 >> > machines, with RF=3. One machine had more load than the others, and >> sure >> > enough bootstrapping selected that node. That is the red machine. The >> > light blue machine is the new machine. >> > I have attached a graph to illustrate when the bootstrap process >> started. >> > In jconsole the streamingservice status was "performing >> anticompaction..." >> > for over 18-24 hrs. It is currently in "nothing is happening". It did >> > have 1 active STREAM-STAGE task, but the machine had to be rebooted for >> > something unrelated to cassandra. Now the light blue machine appears to >> be >> > getting data, but its growing at virtually the same rate as the other >> > machines which makes me think it is part of the cluster and not actually >> > streaming data from the machine its supposed to. >> > Any other ideas on how to debug? >> > >> > -- >> > Dan Di Spaltro >> > >> > > > > -- > Dan Di Spaltro > -- Dan Di Spaltro
Re: Stalled Bootstrapping Process
Bootstrap source restarting will always fail bootstrap. You'll need to restart the blue one too now, I'm afraid. On Thu, Apr 1, 2010 at 2:01 PM, Dan Di Spaltro wrote: > Before the Red one rebooted it had 1 active STREAM-STAGE. Now it has 0 in > STREAM-STAGE. > > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro > wrote: >> >> Red one. >> Gary - both say nothing is happening with no destinations or sources. >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis wrote: >>> >>> which node rebooted, the red one, or the blue one? >>> >>> On Thu, Apr 1, 2010 at 11:26 AM, Dan Di Spaltro >>> wrote: >>> > So we are adding another node to the cluster with the latest 0.6 branch >>> > (RC1). It seems to be hung in some limbo state. >>> > Before bootstrapping our cluster had 50-60GB spread fairly evenly >>> > across 4 >>> > machines, with RF=3. One machine had more load than the others, and >>> > sure >>> > enough bootstrapping selected that node. That is the red machine. >>> > The >>> > light blue machine is the new machine. >>> > I have attached a graph to illustrate when the bootstrap process >>> > started. >>> > In jconsole the streamingservice status was "performing >>> > anticompaction..." >>> > for over 18-24 hrs. It is currently in "nothing is happening". It >>> > did >>> > have 1 active STREAM-STAGE task, but the machine had to be rebooted for >>> > something unrelated to cassandra. Now the light blue machine appears to >>> > be >>> > getting data, but its growing at virtually the same rate as the other >>> > machines which makes me think it is part of the cluster and not >>> > actually >>> > streaming data from the machine its supposed to. >>> > Any other ideas on how to debug? >>> > >>> > -- >>> > Dan Di Spaltro >>> > >> >> >> >> -- >> Dan Di Spaltro > > > > -- > Dan Di Spaltro >
Re: Stalled Bootstrapping Process
Okay, so should I run any more commands like cleanup before? On Thu, Apr 1, 2010 at 12:09 PM, Jonathan Ellis wrote: > Bootstrap source restarting will always fail bootstrap. You'll need > to restart the blue one too now, I'm afraid. -- Dan Di Spaltro
Re: Stalled Bootstrapping Process
There shouldn't be anything to clean up. (The temporary streaming files it anticompacted are automatically removed on restart) On Thu, Apr 1, 2010 at 2:17 PM, Dan Di Spaltro wrote: > Okay, so should I run any more commands like cleanup before?
Re: Stalled Bootstrapping Process
But I didn't restart the red one. On Thu, Apr 1, 2010 at 12:18 PM, Jonathan Ellis wrote: > There shouldn't be anything to clean up. (The temporary streaming > files it anticompacted are automatically removed on restart) -- Dan Di Spaltro
Re: Stalled Bootstrapping Process
On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro wrote: > But I didn't restart the red one. >> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro >> >> > >> >> > wrote: >> >> >> >> >> >> Red one. >> >> >> >> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis >> >> >> wrote: >> >> >>> >> >> >>> which node rebooted, the red one, or the blue one? I'm confused.
Re: Stalled Bootstrapping Process
Sorry, I meant the red one restarted about a day ago. The graph shows the dip in disk space, but it has nowhere near returned to the previous amount of disk usage. I was referring to how the red one didn't reclaim all its space (I figure about 60GB actually belongs on that machine). Is that normal (it's currently taking up about 100GB)? 2 minutes ago, I restarted the blue one. Now the streamservice task is performing anti-compaction on the red one. On Thu, Apr 1, 2010 at 12:25 PM, Jonathan Ellis wrote: > > On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro wrote: > > But I didn't restart the red one. > > >> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Red one. > >> >> >> > >> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis > >> >> >> wrote: > >> >> >>> > >> >> >>> which node rebooted, the red one, or the blue one? > > I'm confused. -- Dan Di Spaltro
Re: Stalled Bootstrapping Process
So it looks like it's still performing anti-compaction. Is the compactionmanager the best way to track this? On Thu, Apr 1, 2010 at 12:31 PM, Dan Di Spaltro wrote: > Sorry I meant the red one restarted about a day ago. The graph shows > the dip in disk space. But it no where near returned to the previous > amount of disk usage. I was referring to how the red one didn't > reclaim all its space (I figure about 60gb actually belong on that > machine) Is that normal (its currently taking up about 100gb)? > > 2 minutes ago, I restarted the blue one. > > Now the streamservice task is performing anti-compaction on the red one. > > On Thu, Apr 1, 2010 at 12:25 PM, Jonathan Ellis wrote: >> >> On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro >> wrote: >> > But I didn't restart the red one. >> >> >> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> Red one. >> >> >> >> >> >> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis >> >> >> >> wrote: >> >> >> >>> >> >> >> >>> which node rebooted, the red one, or the blue one? >> >> I'm confused. > > -- > Dan Di Spaltro > -- Dan Di Spaltro
Re: Stalled Bootstrapping Process
Right. On Thu, Apr 1, 2010 at 3:15 PM, Dan Di Spaltro wrote: > So it looks like its still performing anti-compaction. The > compactionmanager is the best way to track this? > > On Thu, Apr 1, 2010 at 12:31 PM, Dan Di Spaltro > wrote: >> Sorry I meant the red one restarted about a day ago. The graph shows >> the dip in disk space. But it no where near returned to the previous >> amount of disk usage. I was referring to how the red one didn't >> reclaim all its space (I figure about 60gb actually belong on that >> machine) Is that normal (its currently taking up about 100gb)? >> >> 2 minutes ago, I restarted the blue one. >> >> Now the streamservice task is performing anti-compaction on the red one. >> >> On Thu, Apr 1, 2010 at 12:25 PM, Jonathan Ellis wrote: >>> >>> On Thu, Apr 1, 2010 at 2:22 PM, Dan Di Spaltro >>> wrote: >>> > But I didn't restart the red one. >>> >>> >> >> > On Thu, Apr 1, 2010 at 11:57 AM, Dan Di Spaltro >>> >> >> > >>> >> >> > wrote: >>> >> >> >> >>> >> >> >> Red one. >>> >> >> >> >>> >> >> >> On Thu, Apr 1, 2010 at 11:55 AM, Jonathan Ellis >>> >> >> >> wrote: >>> >> >> >>> >>> >> >> >>> which node rebooted, the red one, or the blue one? >>> >>> I'm confused. >> >> -- >> Dan Di Spaltro >> > > > > -- > Dan Di Spaltro >
Re: Stalled Bootstrapping Process
Seems to be doing more stuff now. I've attached an updated screenshot. On Thu, Apr 1, 2010 at 1:16 PM, Jonathan Ellis wrote: > Right. -- Dan Di Spaltro
Re: Read Performance
I don't have the additional hardware to try to isolate this issue atm, so I decided to push some code that performs 20% of reads directly from cassandra. The cache hit rate has gone up to about 88% now and it's still climbing, albeit slowly. There remains plenty of free cache space. So far, the average time to multi_get those 20 rows is still hovering around 35-45ms. I'll report back with more info as it comes in. On Thu, Apr 1, 2010 at 12:06 AM, Cemal Dalar wrote: > Hi James, > > I don't know how to get the below statistics data and calculate the access > times (read/write in ms) in your previous mails. Can you explain a little? > Iike to work on it also. > > CD > > > On Thu, Apr 1, 2010 at 4:15 AM, Jonathan Ellis wrote: > >> On Wed, Mar 31, 2010 at 6:21 PM, James Golick >> wrote: >> > Keyspace: ActivityFeed >> > Read Count: 699443 >> > Read Latency: 16.11017477192566 ms. >> >> > Column Family: Events >> > Read Count: 232378 >> > Read Latency: 0.396 ms. >> > Row cache capacity: 50 >> > Row cache size: 62768 >> > Row cache hit rate: 0.007716049382716049 >> >> This says that >> >> - recent queries to Events are much faster than the lifetime average >> for your Keyspace >> - even though you have almost no row cache hits (~1700 out of 232000 >> reads) >> >> Not sure what to make of that, tbh. If it were me I would try to >> reproduce on a test machine w/o all that pesky live traffic confusing >> things. >> >> -Jonathan >> > >
Re: Read Performance
Taking our flamewar offline. :-D On Thu, Apr 1, 2010 at 1:36 PM, James Golick wrote: > I don't have the additional hardware to try to isolate this issue atm You'd be able to spin up hardware to isolate that issue on AWS. ;) --Joe
Re: Read Performance
Or rackspace. ;) On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: > Taking our flamewar offline. :-D > > On Thu, Apr 1, 2010 at 1:36 PM, James Golick wrote: >> I don't have the additional hardware to try to isolate this issue atm > > You'd be able to spin up hardware to isolate that issue on AWS. ;) > > --Joe >
Re: Read Performance
Damnit! On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck wrote: > Or rackspace. ;) > > On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: > > Taking our flamewar offline. :-D > > > > On Thu, Apr 1, 2010 at 1:36 PM, James Golick > wrote: > >> I don't have the additional hardware to try to isolate this issue atm > > > > You'd be able to spin up hardware to isolate that issue on AWS. ;) > > > > --Joe > > >
Re: Stalled Bootstrapping Process
I would turn debug logging on globally on the new node; that will answer more questions than enabling it for just the streaming package.
Re: Read Performance
pwned. On Thu, Apr 1, 2010 at 2:09 PM, James Golick wrote: > Damnit! > > > On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck wrote: > >> Or rackspace. ;) >> >> On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: >> > Taking our flamewar offline. :-D >> > >> > On Thu, Apr 1, 2010 at 1:36 PM, James Golick >> wrote: >> >> I don't have the additional hardware to try to isolate this issue atm >> > >> > You'd be able to spin up hardware to isolate that issue on AWS. ;) >> > >> > --Joe >> > >> > >
Proxy instances?
Is it possible to have Cassandra instances that serve only as proxies to the rest of the cluster, but have no storage themselves? Maybe with a keyspace length of 0?
Re: Proxy instances?
On Thu, Apr 1, 2010 at 7:19 PM, David King wrote: > Is it possible to have Cassandra instances that serve only as proxies to > the rest of the cluster, but have no storage themselves? Maybe with a > keyspace length of 0? contrib/client_only is what you're looking for. -Brandon
Creating a Total Ordered Queue in Cassandra
I'm in the process of implementing a Totally Ordered Queue in Cassandra, and wanted to bounce my ideas off the list and also see if there are any other suggestions. I've come up with an external source of IDs that are always increasing (but not monotonic), and I've also used external synchronization to ensure only one writer to a given queue. And I handle de-duping in the app. My current solution is (simplified): Use the "QueueId" to key into a row of a CF. Then every column in that CF corresponds to a new entry in the queue, with a custom Comparator to sort the columns by my external ID that is always increasing. Technically I never delete data from the queue, and I just page through it from a given ID using a SliceRange, etc. The obvious problem is that the row needs to get compacted, so I started bucketizing with multiple rows for a given queue (for example one per day; again I'm simplifying), so the key is now "QueueId+Day". Does this seem reasonable? It's solvable, but is starting to seem complicated to implement. It would be very easy if I didn't have to have multiple buckets. My other thought is to store one entry per row, and perform get_range_slices and specify a KeyRange, with the OrderPreservingPartitioner. But it isn't exactly clear to me what the order of the keys is in this system, so I don't know how to construct my key and queries appropriately. Is this lexical string order? Or something else? So for example, assuming my QueueIds are longs, and my IDs are also longs, my key would be (in Java): long queueId; long msgId; key = "" + queueId + ":" + msgId; And if I wanted to do a query my key range might be from start = "" + queueId + ":0" to end = "" + queueId + ":" + Long.MAX_VALUE; (Will I have to left-pad the msgIds with 0's?) And is this going to be efficient if my msgId isn't monotonically increasing? Thanks, -JD
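On the left-padding question, here is a minimal sketch, assuming the partitioner compares row keys in plain lexical string order (which is what the question supposes; check this against your Cassandra version). The class and method names (QueueKeys, rowKey, naiveKey) are made up for illustration and are not part of any Cassandra or Thrift API. It only shows why unpadded decimal longs sort incorrectly as strings and how fixed-width padding avoids that.

```java
import java.util.Locale;

// Hypothetical helper, not part of Cassandra: builds row keys whose
// lexical (string) order matches the numeric order of the ids.
final class QueueKeys {
    // Long.MAX_VALUE has 19 decimal digits, so 19 is wide enough
    // for any non-negative long.
    static final int WIDTH = 19;

    // Left-pad a non-negative long with zeros to a fixed width.
    static String pad(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative ids are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    // Padded key: "<queueId>:<msgId>", both fixed width.
    static String rowKey(long queueId, long msgId) {
        return pad(queueId) + ":" + pad(msgId);
    }

    // The unpadded key from the original post, kept for comparison.
    static String naiveKey(long queueId, long msgId) {
        return "" + queueId + ":" + msgId;
    }

    // Inclusive bounds covering every message of one queue.
    static String rangeStart(long queueId) {
        return rowKey(queueId, 0L);
    }

    static String rangeEnd(long queueId) {
        return rowKey(queueId, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Unpadded: "7:10" compares as smaller than "7:9", so message 10
        // would be returned before message 9.
        System.out.println(naiveKey(7, 10).compareTo(naiveKey(7, 9)) < 0);
        // Padded: message 9 compares as smaller than message 10.
        System.out.println(rowKey(7, 9).compareTo(rowKey(7, 10)) < 0);
        System.out.println(rangeStart(7) + " .. " + rangeEnd(7));
    }
}
```

With fixed-width keys, rangeStart and rangeEnd would be the start and end of the key range for one queue. Whether that range scan is efficient when msgIds have gaps is a separate question that this sketch does not answer.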
Re: Creating a Total Ordered Queue in Cassandra
you mention never deleting from the queue, so what purpose is this serving? (if you don't pop off the front, is it really a queue?) seems if guaranteed order of messages is required, there are many other projects which are focused towards that problem (rabbitmq, kestrel, activemq, etc) or am i misunderstanding your needs here? -keith
Re: Re: compression
Hi, Great! Thanks to Rao and Tatu :) I will test them and let you know what I find. Regards, Cao Jiguang - From: Tatu Saloranta Sent: 2010-04-02 01:08:52 To: u...@cassandra.apache.org Cc: Subject: Re: compression On Thu, Apr 1, 2010 at 8:27 AM, Rao Venugopal wrote: > To Cao Jiguang > > I was watching this presentation on bigtable yesterday > http://video.google.com/videoplay?docid=7278544055668715642# > > and Jeff mentioned that they compared three different compression libraries > BMDiff, LZO and gzip. Apparently, gzip was the most cpu intensive and they > ended up going with BMDiff. > I didn't find any Open source / Free implementation of BMDiff but I found > LZO. > http://www.oberhumer.com/opensource/lzo/ Another IMO good alternative is LZF -- it has characteristics similar to LZO. Gzip (i.e. deflate) is a two-phase compressor, with the usual Lempel-Ziv first, then Huffman (the oldest statistical encoding). LZO, LZF and most other newer, simpler but less-compressing variants usually only do Lempel-Ziv. Why LZF? Because there are simple free and open Java implementations: H2 has a codec, I ported it to Voldemort, and I think there was talk of generalizing the one from H2 as a stand-alone codec for reuse. Possibly others have ported it for other libs/frameworks too (there were multiple jira issues for adding some of these to hadoop). The block format itself is simple, and it is possible to decode adjacent blocks separately by skipping encoded blocks without decoding them: this can be used to allow some level of random access (access a random block, decode it, access something inside the block). Performance-wise, simpler codecs are fast enough to add less overhead than the fastest parsing of textual formats (json, xml), but more importantly, they are MUCH faster to write (once again, not much more overhead than format encoding). It is compression speed that really kills gzip, esp. since it is often the server that has to do it, for small-request, large-response workloads. -+ Tatu +-
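For reference, a minimal sketch of the client-side gzip approach mentioned earlier in this thread: compress a value before writing it as a column value, and decompress it after reading. It uses only java.util.zip from the JDK. The class and method names (ClientSideGzip, compress, decompress) are assumptions for illustration, not part of Cassandra or Thrift, and nothing here says how any poster's client actually does it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip a value on the client before storing it,
// and gunzip it after fetching it back.
final class ClientSideGzip {
    // Compress raw bytes with gzip (deflate: Lempel-Ziv followed by Huffman coding).
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed above, so the trailer has been written.
        return out.toByteArray();
    }

    // Reverse of compress(): read the gzip stream to the end.
    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("some repetitive column value ");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(raw, decompress(packed)));
    }
}
```

Because the server never sees the uncompressed bytes, every reader of the column has to know the value is gzipped. Swapping in a faster codec such as LZF or LZO would only change the two method bodies.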
Re: Read Performance
Well, folks, I'm feeling a little stupid right now (adding to the injury inflicted by one Mr. Stump :-P). So, here's the story. The cache hit rate is up around 97% now. The ruby code is down to around 20-25ms to multiget the 20 rows. I did some profiling, though, and realized that a lot of time was being spent in thrift. Turns out, that's where pretty much all the time was going. I just ran the same test using java (scala) and the load is taking around 2-4ms. On Thu, Apr 1, 2010 at 4:37 PM, Peter Chang wrote: > pwned.
Re: Read Performance
On Thu, Apr 1, 2010 at 9:37 PM, James Golick wrote: > Well, folks, I'm feeling a little stupid right now (adding to the injury > inflicted by one Mr. Stump :-P). > > So, here's the story. The cache hit rate is up around 97% now. The ruby > code is down to around 20-25ms to multiget the 20 rows. I did some > profiling, though, and realized that a lot of time was being spent in > thrift. Turns out, that's where pretty much all the time was going. > > I just ran the same test using java (scala) and the load is taking around > 2-4ms. > That's with the binary accelerated thrift for ruby? -Brandon
Re: Creating a Total Ordered Queue in Cassandra
You are correct, it is not a queue in the classic sense. I'm storing the entire "conversation" with a client in perpetuity, and then playing it back in the order received. RabbitMQ, ActiveMQ, etc. all have about the same throughput (3-6K persistent messages/sec) and are not good for storing the conversation forever. Also, I can easily scale Cassandra past that message rate and not have to worry about which message broker or cluster I'm connecting to, or which one has the conversation. On Thu, Apr 1, 2010 at 7:02 PM, Keith Thornhill wrote: > you mention never deleting from the queue, so what purpose is this > serving? (if you don't pop off the front, is it really a queue?) > > seems if guaranteed order of messages is required, there are many > other projects which are focused towards that problem (rabbitmq, > kestrel, activemq, etc) > > or am i misunderstanding your needs here? > > -keith
Re: Read Performance
Yes. J. On 2010-04-01, at 9:21 PM, Brandon Williams wrote: > That's with the binary accelerated thrift for ruby? > -Brandon
Re: Creating a Total Ordered Queue in Cassandra
Since Twitter is everyone's favorite analogy: it's like Twitter, but faster and with bigger messages that I may need to go back and replay in order to mine for more details at a later date. Thus I call it a queue, because the order of messages is important, but it is not anything like a message broker, pub-sub, topic, etc. -JD
best practice for migrating data
When adding or changing a column in a column family that already holds data in Cassandra, what's a good way to do it? Thanks, -aj -- AJ Chen, PhD Chair, Semantic Web SIG, sdforum.org http://web2express.org twitter @web2express Palo Alto, CA, USA
Re: Creating a Total Ordered Queue in Cassandra
On Thu, Apr 1, 2010 at 9:43 PM, Jeremy Davis wrote: > > You are correct, it is not a queue in the classic sense... I'm storing the > entire "conversation" with a client in perpetuity, and then playing it back > in the order received. > > Rabbitmq/activemq etc all have about the same throughput 3-6K persistent > messages/sec, and are not good for storing the conversation forever... Also > I can easily scale cassandra past that message rate and not have to worry > about which message broker/cluster I'm connecting to/has the > conversation/etc. Also: I think RabbitMQ specifically does not have distributed message stores -- each message lives in just one queue node, meaning that when that node is down (or gets wiped out), so are the messages for that particular queue. Otherwise it seems like a really nice queuing system. The other potential concern is that all message metadata has to fit in the main memory of the host that owns the message (the message payload can be persisted, I think). So while RabbitMQ and ActiveMQ are obviously better matches for queuing (with very powerful semantics, optional transactionality, etc.), Cassandra seems to have better distribution and fault-tolerance properties. This could be useful for some scenarios. In fact I wonder if "traditional" MQs could be considered quite a bit like RDBMSs regarding distribution, horizontal scaling by adding nodes (or lack thereof), and the cost of ACID features (high expressive power vs. simple scalability). I am also interested in similar aspects, namely using a queue name and sequence identifier to implement queue-like constructs, and was happy to see this question. But in my case I would want to eventually delete messages as well, so I would not have to rely as much on the monotonically increasing IDs. This would allow a many-senders, single-receiver use case with little or no external synchronization. -+ Tatu +-