Storing files in blob into Cassandra

2011-06-22 Thread Damien Picard
Hi, I have to store some files (images, documents, etc.) for my users in a webapp. I use Cassandra for all of my data and I would like to know whether it is a good idea to store these files as blobs in a Cassandra CF. Are there any contraindications, or special things to know to achieve this? Tha

Re: Storing files in blob into Cassandra

2011-06-22 Thread Sasha Dolgy
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Storing-photos-images-docs-etc-td6078278.html Of significance from that link (which was great until feeling lucky was removed...): Google of terms cassandra large files + feeling lucky http://www.google.com/search?q=cassandra+large+

Re: solandra or pig or....?

2011-06-22 Thread Sasha Dolgy
First, thanks everyone for the input. Appreciate it. The number crunching would already have been completed, and all statistics per game defined, and inserted into the appropriate CF/row/cols ... So, that being said, Solandra appears to be the right way to go ... except, this would require that

Re: Storing files in blob into Cassandra

2011-06-22 Thread Sylvain Lebresne
Let's be more precise in saying that this all depends on the expected size of the documents. If you know that the documents will be around the few-hundred-kilobyte mark on average and no more than a few megabytes (say < 5MB, even though there is no magic number), then storing them as blobs will work p
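
To make the size guidance above concrete, here is a minimal Python sketch of a check an application might run before writing a file as a single blob. The 5 MB default is only the rough figure from the post (which itself says there is no magic number), and the name `fits_as_blob` is hypothetical, not part of any Cassandra client.

```python
# Rough upper bound taken from the post ("say < 5MB"); not a hard Cassandra limit.
MAX_BLOB_BYTES = 5 * 1024 * 1024


def fits_as_blob(size_bytes, max_blob_bytes=MAX_BLOB_BYTES):
    """Return True if a file of this size falls inside the blob-friendly range."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return size_bytes < max_blob_bytes
```

A caller would typically pass `os.path.getsize(path)` and fall back to some other storage path when the check fails.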

Re: Storing files in blob into Cassandra

2011-06-22 Thread Damien Picard
>store your images / documents / etc. somewhere and reference them >in Cassandra. That's the consensus that's been bandied about on this >list quite frequently Thank you for your answers. I think I have to detail my configuration. On every server of my cluster, I deploy : - a Cassandra node -

Strange Connection error of nodetool

2011-06-22 Thread 박상길
Hi. I'm running 5 cassandra nodes. Say, the addresses are 112.234.123.111 ~ 112.234.123.115; the real address is different. When I run nodetool, the one node of address 112.234.123.112 has failed to connect. Showing error message like this. iPark:~ hayarobi$ nodetool --host 112.234.123.112 ri

BytesType vs. UTF8Type

2011-06-22 Thread Jeesoo Shin
BytesType vs. UTF8Type. which is better in performance? I assume Bytes be faster in compare.. but how much faster is it? For large large large data set, will it have significant different? I love to use UTF8 and be able to read value from cli. :-) *IF* it doesn't degrade performance too much. ps

Re: Storing files in blob into Cassandra

2011-06-22 Thread aaron morton
> I think I have to detail my configuration. On every server of my cluster, I > deploy : > - a Cassandra node > - a Tomcat instance > - the webapp, deployed on Tomcat > - Apache httpd, in front of Tomcat with mod_jakarta You will have a bunch of services on the machine competing with each oth

Re: BytesType vs. UTF8Type

2011-06-22 Thread Sylvain Lebresne
On Wed, Jun 22, 2011 at 11:19 AM, Jeesoo Shin wrote: > BytesType vs. UTF8Type. which is better in performance? > I assume Bytes be faster in compare.. but how much faster is it? They don't differ at all as far as comparison is involved. They actually use the exact same function to do the compare.
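
One way to see why a byte-wise compare can serve UTF8 names as well: UTF-8 is designed so that ordering encoded bytes gives the same result as ordering by code point. The sketch below checks that property in plain Python. It illustrates the encoding only and is not Cassandra's comparator code; the sample names are made up.

```python
def sort_by_utf8_bytes(names):
    """Sort strings by their UTF-8 encoded bytes (unsigned, lexicographic)."""
    return sorted(names, key=lambda s: s.encode("utf-8"))


names = ["zebra", "apple", "\u00e9clair", "\u65e5\u672c", "Apple"]

# Python compares str values by code point, so both orderings should agree.
same_order = sort_by_utf8_bytes(names) == sorted(names)
```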

Re: solandra or pig or....?

2011-06-22 Thread Jake Luciani
Well solandra is running Cassandra so you can use Cassandra as you do today, but index some of the data in solr. On Jun 22, 2011, at 3:41 AM, Sasha Dolgy wrote: > First, thanks everyone for the input. Appreciate it. The number > crunching would already have been completed, and all statistics

Re: solandra or pig or....?

2011-06-22 Thread Santiago Basulto
Wouldn't it be useful to store your data somewhere structured (Cassandra is obviously an option) and then use MapReduce to store statistics? 2011/6/22 Jake Luciani : > Well solandra is running Cassandra so you can use Cassandra as you do today, > but index some of the data in solr. > > On Jun 22

Re: Storing files in blob into Cassandra

2011-06-22 Thread Damien Picard
2011/6/22 aaron morton > I think I have to detail my configuration. On every server of my cluster, I > deploy : > - a Cassandra node > - a Tomcat instance > - the webapp, deployed on Tomcat > - Apache httpd, in front of Tomcat with mod_jakarta > > > You will have a bunch of services on the ma

Column value type

2011-06-22 Thread osishkin osishkin
Is there a limitation on the data type of a column value (not column name) in cassandra? I'm saving data using a pycassa client, for a UTF8 column family, and I get an error when I try saving integer data values. Only when I convert the values to strings can I save the data. Looking at the pycassa cod

AW: Column value type

2011-06-22 Thread Roland Gude
There is a comparator type (for the name) and a validation type (for the value). If you have set the validation to UTF8, you can only store data that is valid UTF8 there. The default validation is BytesType, so it should accept everything unless otherwise specified. I cannot tell anything regard
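
As a rough illustration of what "valid UTF8" means for a value, the Python sketch below compares two encodings of the same integer. It uses only the standard library and does not reproduce the pycassa error from the question; `is_valid_utf8` is a hypothetical helper, and the 8-byte big-endian packing is just one plausible binary layout chosen for the example.

```python
import struct


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


packed_int = struct.pack(">q", 255)  # 8-byte big-endian integer, last byte is 0xff
as_text = str(255).encode("utf-8")   # the same number written out as text
```

The packed form contains a byte that can never appear in UTF-8, so a UTF8 validator would reject it, while the text form passes. Note that some packed integers happen to be valid UTF-8 by accident, so a byte check like this is not a type check.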

Re: Secondary indexes performance

2011-06-22 Thread Wojciech Pietrzok
OK, got some results (below). 2 nodes, one on localhost, second on LAN, reading with ConsistencyLevel.ONE, buffer_size=512 rows (that's how many rows pycassa will get on one connection; then it will use the last row_id as the start row for the next query). Query types: 1) get_range - just added limit of 1024

Re: Storing files in blob into Cassandra

2011-06-22 Thread aaron morton
> If the Cassandra JVM is down, Tomcat and Httpd will continue to handle > requests. And Pelops will redirect these requests to another Cassandra node > on another server (maybe am I wrong with this assertion). I was thinking of the server being turned off / broken / rebooting / disconnected fro

OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
I woke up this morning to all 4 of 4 of my cassandra instances reporting they were down in my cluster. I quickly started them all, and everything seems fine. I'm doing a postmortem now, but it appears they all OOM'd at roughly the same time, which was not reported in any cassandra log, but I disc

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Sasha Dolgy
We had a similar problem last month and found that the OS eventually killed the Cassandra process on each of our nodes ... I've upgraded to 0.8.0 from 0.7.6-2 and have not had the problem since, but I do see consumption levels rising consistently from one day to the next on each node .

Re: Storing files in blob into Cassandra

2011-06-22 Thread Damien Picard
In this case, the load balancer has to detect (or is configured) that the server is down and does not route request to this one anymore. 2011/6/22 aaron morton > If the Cassandra JVM is down, Tomcat and Httpd will continue to handle > requests. And Pelops will redirect these requests to another

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
Well, I managed to run 50 days before an OOM, so any changes I make will take a while to test ;-) I've seen the GCInspector log lines appear periodically in my logs, but I didn't see a correlation with the crash. I'll read the instructions on how to properly do a rolling upgrade today, practice o

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Sasha Dolgy
Yes ... this is because it was the OS that killed the process, and wasn't related to Cassandra "crashing". Reviewing our monitoring, we saw that memory utilization was pegged at 100% for days and days before it was finally killed because 'apt' was fighting for resource. At least, that's as far as

Re: Cassandra Clients for Java

2011-06-22 Thread Daniel Colchete
Thank you Vivek. I'll start playing with the clients today. Thank you very much! Best, Daniel On Tue, Jun 21, 2011 at 9:33 AM, Vivek Mishra wrote: > Hi Daniel, > > Just saw your email regarding kundera download. > > > > Kundera snapshot jar is available at: > > > > > http://kundera.googlecode.c

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
I was wondering/I figured that /var/log/kern indicated the OS was killing java (versus an internal OOM). The nodetool repair is interesting. My application never deletes, so I didn't bother running it. But, if that helps prevent OOMs as well, I'll add it to the crontab (plan A is still upgr

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Jake Luciani
Are you running with the default heap settings? what else is running on the boxes? On Wed, Jun 22, 2011 at 9:06 AM, William Oberman wrote: > I was wondering/I figured that /var/log/kern indicated the OS was killing > java (versus an internal OOM). > > The nodetool repair is interesting. My app

Re: insufficient space to compact even the two smallest files, aborting

2011-06-22 Thread Héctor Izquierdo Seliva
Hi All. I set the compaction threshold at minimum 2, maximum 2 and try to run compact, but it's not doing anything. There are over 69 sstables now, read performance is horrible, and it's taking an insane amount of space. Maybe I don't quite get how the new per bucket stuff works, but I think this i

Re: Storing Accounting Data

2011-06-22 Thread Edward Capriolo
On Tue, Jun 21, 2011 at 10:58 PM, AJ wrote: > ** > On 6/21/2011 3:36 PM, Stephen Connolly wrote: > > writes are not atomic. > > the first side can succeed at quorum, and the second side can fail > completely... you'll know it failed, but now what... you retry, still > failed... erh I'll store it

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread William Oberman
The CLI is posted, I assume that's the defaults (I didn't touch anything). The machines basically just run cassandra (and standard Centos5 background stuff). will On Wed, Jun 22, 2011 at 9:49 AM, Jake Luciani wrote: > Are you running with the default heap settings? what else is running on the >

rpm from 0.7.x -> 0.8?

2011-06-22 Thread William Oberman
I'm running 0.7.4 from rpm (riptano). If I do a yum upgrade, it's trying to do 0.7.6. To get 0.8.x I have to do "install apache-cassandra08". But that is going to install two copies. Is there a semi-official way of properly upgrading to 0.8 via rpm? -- Will Oberman Civic Science, Inc. 3030 Pe

No Transactions: An Example

2011-06-22 Thread Trevor Smith
Hello, I was wondering if anyone had architectural thoughts on creating a simple bank account program that does not use transactions. I think creating an example project like this would be a good thing to have for a lot of the discussions that pop up about transactions and Cassandra (and non-transa

Re: Storing Accounting Data

2011-06-22 Thread Oleg Anastastasyev
> > Is C* suitable for storing customer account (financial) data, as well as > billing, payroll, etc? This is a new company so migration is not an > issue... starting from scratch. If you need only store them - then yes, but if you require transactions spanning multiple rows or column families

Re: No Transactions: An Example

2011-06-22 Thread Sasha Dolgy
I'd implement the concept of a bank account using counters in a counter column family. one row per account ... each column for transaction data and one column for the actual balance. just so long as you use whole numbers ... no one needs pennies anymore. -sd On Wed, Jun 22, 2011 at 4:18 PM, Trevo

Re: Storing Accounting Data

2011-06-22 Thread Sasha Dolgy
but you can store the -details- of a transaction as json data and do some sanity checks to validate that the data you currently have stored aligns with the recorded transactions. maybe a batch job run every 24 hours ... On Wed, Jun 22, 2011 at 4:19 PM, Oleg Anastastasyev wrote: >> >> Is C* suita
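
The sanity check described above could look roughly like the following Python sketch: replay the JSON transaction records for an account and compare the total with the stored balance. It runs against in-memory data only; the field names `id` and `amount` and the function name `reconcile` are assumptions for illustration, and amounts are whole numbers as suggested earlier in the thread.

```python
import json


def reconcile(stored_balance, transaction_records):
    """Replay JSON transaction records and compare against the stored balance.

    Returns (matches, recomputed_balance).
    """
    recomputed = sum(json.loads(record)["amount"] for record in transaction_records)
    return recomputed == stored_balance, recomputed


# Hypothetical records as they might be stored, one JSON document per column.
records = [
    json.dumps({"id": "t1", "amount": 100}),
    json.dumps({"id": "t2", "amount": -30}),
]
```

A periodic batch job would run this per account and flag any account where the first element of the result is False for manual or automated repair.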

Re: Storing Accounting Data

2011-06-22 Thread Edward Capriolo
On Wed, Jun 22, 2011 at 10:19 AM, Oleg Anastastasyev wrote: > > > > Is C* suitable for storing customer account (financial) data, as well as > > billing, payroll, etc? This is a new company so migration is not an > > issue... starting from scratch. > > If you need only store them - then yes, but

Re: No Transactions: An Example

2011-06-22 Thread Trevor Smith
Sasha, How would you deal with a transfer between accounts in which only one half of the operation was successfully completed? Thank you. Trevor On Wed, Jun 22, 2011 at 10:23 AM, Sasha Dolgy wrote: > I'd implement the concept of a bank account using counters in a > counter column family. one

Hector Object Mapper

2011-06-22 Thread Daniel Colchete
Good day everyone, a few days ago it was suggested here that I use the Hector Object Mapper, but it seems that the code hasn't been upgraded to support Cassandra 0.8 yet. There is a reference for it here: https://github.com/rantav/hector/wiki/Versioning. The project's history

Re: No Transactions: An Example

2011-06-22 Thread Sasha Dolgy
I would still maintain a record of the transaction ... so that I can do analysis post to determine if/when problems occurred ... On Wed, Jun 22, 2011 at 4:31 PM, Trevor Smith wrote: > Sasha, > How would you deal with a transfer between accounts in which only one half > of the operation was succes

Re: No Transactions: An Example

2011-06-22 Thread Trevor Smith
Right -- that's the part that I am more interested in fleshing out in this post. Must one have background jobs checking the integrity of all transactions at some time interval? This gets hairy pretty quick with bank transactions (one unrolled transaction could cause many others to become unrolled.

Re: rpm from 0.7.x -> 0.8?

2011-06-22 Thread William Oberman
I just did a remove then install, and it seems to work. For those of you out there with JMX issues, the default port moved from 8080 to 7199 (which includes the internal default to nodetool). I was confused why nodetool ring would fail on some boxes and not others. I had to add -p depending on t
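
For a cluster that is partway through a rolling upgrade, a small helper can pick the JMX port per node so that nodetool calls do not fail on a version mismatch. This is a sketch under the assumption that every node still uses the default port for its version (8080 before 0.8, 7199 from 0.8 on, as described above); the function names are hypothetical.

```python
def default_jmx_port(version):
    """Return the default JMX port for a Cassandra version string like '0.7.4'."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return 7199 if (major, minor) >= (0, 8) else 8080


def nodetool_args(host, version, command="ring"):
    """Build an argument list for nodetool with an explicit -p for this node."""
    return ["nodetool", "--host", host, "-p", str(default_jmx_port(version)), command]
```

The resulting list can be handed to `subprocess.run` once the versions of the individual nodes are known.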

Re: rpm from 0.7.x -> 0.8?

2011-06-22 Thread William Oberman
I have a question about auto_bootstrap. When I originally brought up the cluster, I did: -seed with auto_boot = false -1,2,3 with auto_boot = true Now that I'm doing a rolling upgrade, do I set them all to auto_boot = true? Or does the seed stay false? Or should I mark them all false? I have ma

Re: rpm from 0.7.x -> 0.8?

2011-06-22 Thread Jonathan Ellis
Doesn't matter. auto_bootstrap only applies to the first start ever. On Wed, Jun 22, 2011 at 10:48 AM, William Oberman wrote: > I have a question about auto_bootstrap. When I originally brought up the > cluster, I did: > -seed with auto_boot = false > -1,2,3 with auto_boot = true > > Now that I'm do

simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
The way compaction works, "x" same-sized files are merged into a new SSTable. This repeats itself and the SSTable get bigger and bigger. So what is the upper limit?? If you are not deleting stuff fast enough, wouldn't the SSTable sizes grow indefinitely? I ask because we have some rather
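
The growth pattern asked about here can be seen in a toy model. The Python sketch below merges SSTables whenever a given number of same-sized ones exist, ignoring deletes, overwrites, tombstones, and the maximum threshold, so it is a simplification and not Cassandra's actual compaction code. The threshold of 4 matches the default minimum mentioned elsewhere in these threads; the unit flush size is arbitrary.

```python
from collections import Counter


def simulate_compaction(flushes, threshold=4, flush_size=1):
    """Return the sorted SSTable sizes left after a number of memtable flushes."""
    tables = Counter()  # size -> number of SSTables of that size
    for _ in range(flushes):
        size = flush_size
        tables[size] += 1
        # Merge upward as long as a full bucket of same-sized tables exists.
        while tables[size] >= threshold:
            tables[size] -= threshold
            if tables[size] == 0:
                del tables[size]
            size *= threshold
            tables[size] += 1
    return sorted(tables.elements())
```

With no data ever removed, the largest table simply keeps growing by a factor of the threshold, which is the unbounded growth the question describes.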

Re: Strange Connection error of nodetool

2011-06-22 Thread Nick Bailey
This is almost certainly caused by the weird connection process JMX uses. JMX actually uses a two-connection process; the second connection is determined by the 'JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="' setting in your cassandra-env.sh configuration file. By default that setting is commente

Re: simple question about merged SSTable sizes

2011-06-22 Thread Eric tamme
On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby wrote: > > The way compaction works,  "x" same-sized files are merged into a new > SSTable.  This repeats itself and the SSTable get bigger and bigger. > > So what is the upper limit??     If you are not deleting stuff fast enough, > wouldn't the

Re: simple question about merged SSTable sizes

2011-06-22 Thread Edward Capriolo
Yes, if you are not deleting fast enough they will grow. This is not specifically a cassandra problem; /var/log/messages has the same issue. There is a JIRA ticket about having a maximum size for SSTables, so they always stay manageable. You fall into a small trap when you force major compaction in

Re: simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
Thanks for the explanation. I'm still a bit "skeptical". So if you really needed to control the maximum size of compacted SSTables, you need to delete data at such a rate that the new files created by compaction are less than or equal to the sum of the segments being merged. Is anyone else

Re: simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
So the take-away is try to avoid major compactions at all costs! Thanks Ed and Eric. On Jun 22, 2011, at 7:00 PM, Edward Capriolo wrote: > Yes, if you are not deleting fast enough they will grow. This is not > specifically a cassandra problem /var/log/messages has the same issue. > > There

Re: port 8080

2011-06-22 Thread Nate McCall
There are a couple of pre-canned scripts for local multi-node clusters, particularly: https://github.com/pcmanus/ccm On Tue, Jun 21, 2011 at 6:35 AM, Sasha Dolgy wrote: > Personally speaking, I do not run JMX on 8080, and never have. The > tools, like cassandra-cli and nodetool expect it to be on

Re: No Transactions: An Example

2011-06-22 Thread Dominic Williams
Hi Trevor, I hope to post on my practical experiences in this area soon - we rely heavily on complex serialized operations in FightMyMonster.com. Probably the most simple serialized operation we do is updating nugget balances when, for example, there has been a trade of monsters. Currently we use

Re: simple question about merged SSTable sizes

2011-06-22 Thread Ryan King
On Wed, Jun 22, 2011 at 10:00 AM, Jonathan Colby wrote: > Thanks for the explanation.  I'm still a bit "skeptical". > > So if you really needed to control the maximum size of compacted SSTables,   > you need to delete data at such a rate that the new files created by > compaction are less than or

Re: simple question about merged SSTable sizes

2011-06-22 Thread Eric tamme
>> Second, compacting such large files is an IO killer.    What can be tuned >> other than compaction_threshold to help optimize this and prevent the files >> from getting too big? >> >> Thanks! > > Just a personal implementation note - I make heavy use of column TTL, so I have very specifically t

Re: Storing files in blob into Cassandra

2011-06-22 Thread mcasandra
Speaking purely from my personal experience, I haven't found cassandra optimal for storing big fat rows. Even if it is only 100s of KB I didn't find cassandra suitable for it. In my case I am looking at 400 writes + 400 reads per sec, growing 20%-30% every year, with file sizes from 70k-300k. What I

Re: simple question about merged SSTable sizes

2011-06-22 Thread Edward Capriolo
I would not say avoid major compactions at all cost. In the old days < 0.6.5 IIRC the only way to clear tombstones was a major compaction. The nice thing about major compaction is if you have a situation with 4 SSTables at 2GB each (that is total 8GB). Under normal write conditions it could be mor

Re: Hector Object Mapper

2011-06-22 Thread Nate McCall
The current release of Hector Object Mapper works fine with the most recent Hector and Apache Cassandra 0.8.0 releases: http://repo2.maven.org/maven2/me/prettyprint/hector-object-mapper/1.1-01/ On Wed, Jun 22, 2011 at 10:10 AM, Daniel Colchete wrote: > Good day everyone, > > a few days ago I was

Re: Hector Object Mapper

2011-06-22 Thread Daniel Colchete
Great! Thanks! Best, Daniel On Wed, Jun 22, 2011 at 2:25 PM, Nate McCall wrote: > The current release of Hector Object Mapper works fine with the most > recent Hector and Apache Cassandra 0.8.0 releases: > http://repo2.maven.org/maven2/me/prettyprint/hector-object-mapper/1.1-01/ > > On Wed, Jun

Re: rpm from 0.7.x -> 0.8?

2011-06-22 Thread William Oberman
Thanks Jonathan. I'm sure it's been true for everyone else as well, but the rolling upgrade seems to have worked like a charm for me (other than the initial confusion over the JMX port # changing). One minor thing that is probably particular to my case: when I removed the old package, it unlinked my symlink

Re: simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
Thanks Ryan. Done that : ) 1 TB is the striped size. We might look into bigger disks for our blades. On Jun 22, 2011, at 7:09 PM, Ryan King wrote: > On Wed, Jun 22, 2011 at 10:00 AM, Jonathan Colby > wrote: >> Thanks for the explanation. I'm still a bit "skeptical". >> >> So if you rea

unsubscribe

2011-06-22 Thread Carey Hollenbeck
unsubscribe From: William Oberman [mailto:ober...@civicscience.com] Sent: Wednesday, June 22, 2011 1:46 PM To: user@cassandra.apache.org Subject: Re: rpm from 0.7.x -> 0.8? Thanks Jonathan. I'm sure it's been true for everyone else as well, but the rolling upgrade seems to have worked lik

Re: simple question about merged SSTable sizes

2011-06-22 Thread Jonathan Colby
Awesome tip on TTL. We can really use this as a catch-all to make sure all columns are purged based on time. Fits our use-case good. I forgot this feature existed. On Jun 22, 2011, at 7:11 PM, Eric tamme wrote: >>> Second, compacting such large files is an IO killer.What can be tuned >>

bulk load

2011-06-22 Thread Stephen Pope
According to the README.txt in examples/bmt BinaryMemtable is being deprecated. What's the recommended way to do bulk loading? Cheers, Steve

RE: bulk load

2011-06-22 Thread Stephen Pope
Awesome, thanks! -Original Message- From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] Sent: Wednesday, June 22, 2011 3:08 PM To: user@cassandra.apache.org Subject: Re: bulk load This ticket's outcome replaces what BMT was supposed to do: https://issues.apache.org/jira/browse/CASSAN

Re: bulk load

2011-06-22 Thread Jeremy Hanna
This ticket's outcome replaces what BMT was supposed to do: https://issues.apache.org/jira/browse/CASSANDRA-1278 0.8.1 is being voted on now and will hopefully be out in the next day or two. You can try it out with the 0.8-branch if you want - looking near the bottom of the comments on the ticke

Munin plugins stupid question

2011-06-22 Thread Janne Jalkanen
Heya! I know I should probably be able to figure this out on my own, but... The Cassandra Munin plugins (all of them) define in their storageproxy_latency.conf the following (this is from a 0.6 config): read_latency.jmxObjectName org.apache.cassandra.db:type=StorageProxy read_latency.jmxAttribu

99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
I'm planning on using Cassandra as a product's core data store, and it is imperative that it never goes down or loses data, even in the event of a data center failure. This uptime requirement ("five nines": 99.999% uptime) w/ WAN capabilities is largely what led me to choose Cassandra over other N
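
For a sense of scale, the downtime that a given number of nines allows can be worked out directly. The Python sketch below does that arithmetic; it assumes a 365-day year and treats availability as a simple fraction of wall-clock time, which is a simplification of how real SLAs are measured. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(availability_percent):
    """Return the minutes of downtime per year allowed at this availability."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR
```

At 99.999% the budget is about 5.26 minutes per year, while at 99% it is 5,256 minutes, or roughly 3.65 days.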

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Ryan King
On Wed, Jun 22, 2011 at 2:24 PM, Les Hazlewood wrote: > I'm planning on using Cassandra as a product's core data store, and it is > imperative that it never goes down or loses data, even in the event of a > data center failure.  This uptime requirement ("five nines": 99.999% uptime) > w/ WAN capab

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
Just to be clear: I understand that resources like [1] and [2] exist, and I've read them. I'm just wondering if there are any 'gotchas' that might be missing from that documentation that should be considered and if there are any recommendations in addition to these documents. Thanks, Les [1] h

Re: Atomicity Strategies

2011-06-22 Thread AJ
On 4/9/2011 7:52 PM, aaron morton wrote: My understanding of what they did with locking (based on the examples) was to achieve a level of transaction isolation http://en.wikipedia.org/wiki/Isolation_(database_systems) I think the

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
I understand that every environment is different and it always 'depends' :) But recommending settings and techniques based on an existing real production environment (like the user's suggestion to run nodetool repair as a regular cron job) is always a better starting point for a new Cassandra eval

Re: BloomFilterFalsePositives equals 1.0

2011-06-22 Thread Chris Burroughs
To be precise, you made n requests for non-existent keys, got n negative responses, and BloomFilterFalsePositives also went up by n? On 06/21/2011 11:06 PM, Preston Chang wrote: > Hi,all: > I have a problem with bloom filter. When made a test which tried to get > some nonexistent keys, it see

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Chris Burroughs
On 06/22/2011 08:53 AM, Sasha Dolgy wrote: > Yes ... this is because it was the OS that killed the process, and > wasn't related to Cassandra "crashing". Reviewing our monitoring, we > saw that memory utilization was pegged at 100% for days and days > before it was finally killed because 'apt' was

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Sasha Dolgy
Implement monitoring and be proactive...that will stop you waking up to a big surprise. I'm sure there were symptoms leading up to all 4 nodes going down. Willing to wager that each node went down at different times and not all went down at once... On Jun 22, 2011 11:50 PM, "Les Hazlewood" wrote

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Will Oberman
Sadly, they all went down within minutes of each other. Sent from my iPhone On Jun 22, 2011, at 6:16 PM, Sasha Dolgy wrote: Implement monitoring and be proactive...that will stop you waking up to a big surprise. I'm sure there were symptoms leading up to all 4 nodes going down. willing t

Re: No Transactions: An Example

2011-06-22 Thread AJ
I think Sasha's idea is worth studying more. Here is a supporting read referenced in the O'Reilly Cassandra book that talks about alternatives to 2-phase commit and synchronous transactions: http://www.eaipatterns.com/ramblings/18_starbucks.html If it can be done without locks and the busines

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Sasha Dolgy
http://www.twitpic.com/5fdabn http://www.twitpic.com/5fdbdg i do love a good graph. two of the weekly memory utilization graphs for 2 of the 4 servers from this ring... week 21 was a nice week ... the week before 0.8.0 went out proper. since then, bumped up to 0.8 and have seen a steady increase

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Chris Burroughs
On 06/22/2011 05:33 PM, Les Hazlewood wrote: > Just to be clear: > > I understand that resources like [1] and [2] exist, and I've read them. I'm > just wondering if there are any 'gotchas' that might be missing from that > documentation that should be considered and if there are any recommendatio

Re: Secondary indexes performance

2011-06-22 Thread aaron morton
> it will probably be better to denormalize and store > some precomputed data Yes, if you know there are queries you need to serve it is better to support those directly in the data model. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Chris Burroughs
Do all of the reductions in Used on that graph correspond to node restarts? My Zabbix for reference: http://img194.imageshack.us/img194/383/2weekmem.png On 06/22/2011 06:35 PM, Sasha Dolgy wrote: > http://www.twitpic.com/5fdabn > http://www.twitpic.com/5fdbdg > > i do love a good graph. two of

Re: OOM (or, what settings to use on AWS large?)

2011-06-22 Thread Sasha Dolgy
Yes. Each one corresponds with taking a node down for various reasons. I think more people should show their graphs. It's great. Hoping Oberman has some so we can see what his look like. On Thu, Jun 23, 2011 at 12:40 AM, Chris Burroughs wrote: > Do all of the reductions in Used on that g

Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-22 Thread Thoku Hansen
I have a couple of questions regarding the coordination of Cassandra nodetool snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore strategy. Background: I have a cluster running in EC2. Its nodes are configured like so: * Instance type: m1.xlarge * Cassandra commit log writ

Re: insufficient space to compact even the two smallest files, aborting

2011-06-22 Thread aaron morton
Setting them to 2 and 2 means compaction can only ever compact 2 files at a time, so it will be worse off. Let's try the following: - restore the compaction settings to the default 4 and 32 - run `ls -lah` in the data dir and grab the output - run `nodetool flush`; this will trigger minor compactio

NTS Replication Strategy - only replicating to a subset of data centers

2011-06-22 Thread AJ
I'm just double-checking, but when using NTS, is it required to specify ALL the data centers in the strategy_options attribute? IOW, I do NOT want replication to ALL data centers; only a two of the three. So, if my property file snitch describes all of the existing data centers and nodes as:

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Edward Capriolo
Committing to that many 9s is going to be impossible since, as far as I know, no internet service provider will give you an SLA of more than 2 9s. You cannot have more uptime than your ISP. On Wednesday, June 22, 2011, Chris Burroughs wrote: > On 06/22/2011 05:33 PM, Les Hazlewood wrote: >> Just to be clear:

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Peter Lin
You have to use multiple data centers to really deliver 4 or 5 9's of service. On Wed, Jun 22, 2011 at 7:09 PM, Edward Capriolo wrote: > Committing to that many 9s is going to be impossible since, as far as I > know, no internet service provider will give you an SLA of more than 2 9s. You > cannot have more u

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
> > > > > [1] http://www.datastax.com/docs/0.8/operations/index > > [2] http://wiki.apache.org/cassandra/Operations > > > > Well if they knew some secret gotcha the dutiful cassandra operators of > the world would update the wiki. > As I am new to the Cassandra community, I don't know how 'dutifull

Re: Strange Connection error of nodetool

2011-06-22 Thread aaron morton
Check the list here http://wiki.apache.org/cassandra/JmxGotchas I *think* the jmx server tells the client to connect back on another host/port. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jun 2011, at 21:02, 박상

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
On Wed, Jun 22, 2011 at 4:11 PM, Peter Lin wrote: > you have to use multiple data centers to really deliver 4 or 5 9's of > service > We do, hence my question, as well as my choice of Cassandra :) Best, Les

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread mcasandra
In my opinion 5 9s don't matter. What matters is the number of impacted customers. You might be down during peak for 5 minutes, causing 1000s of customer turn-aways, while you might be down during the night, causing only a few customer turn-aways. There is no magic bullet. It's all about learning and improving. You will

Re: unsubscribe

2011-06-22 Thread aaron morton
http://wiki.apache.org/cassandra/FAQ#unsubscribe - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 23 Jun 2011, at 06:02, Carey Hollenbeck wrote: > unsubscribe > > From: William Oberman [mailto:ober...@civicscience.com] > Sent: Wednesda

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Peter Lin
So having multiple data centers is step 1 of 4/5 9's. I've worked on some services that had 3-4 9's SLA. Getting there is really tough, as others have stated. You have to have auditing built into your service, capacity metrics, capacity planning, some kind of real-time monitoring, staff to respond to al

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
Forget the 5 9's - I apologize for even writing that. It was my shorthand way of saying 'this can never go down'. I'm not asking for philosophical advice - I've been doing large scale enterprise deployments for over 10 years. I 'get' the 'it depends' and 'do your homework' philosophy. All I'm a

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread mcasandra
Start with reading the comments in cassandra.yaml and http://wiki.apache.org/cassandra/Operations As far as I know there is no comprehensive list for performance tuning, or more specifically of common settings applicable to everyone. For the most part issues revolve

Re: Atomicity Strategies

2011-06-22 Thread aaron morton
Atomic on a single machine yes. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 23 Jun 2011, at 09:42, AJ wrote: > On 4/9/2011 7:52 PM, aaron morton wrote: >> My understanding of what they did with locking (based on the examples) was >>

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
I have architected, built and been responsible for systems that support 4-5 9s for years. This discussion is not about how to do that generally. It was intended to be about concrete techniques that have been found valuable when deploying Cassandra in HA environments beyond what is documented in [

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
Yep, that was [2] on my existing list. Thanks very much for actually addressing my question - it is greatly appreciated! If anyone else has examples they'd like to share (like their own cron techniques, or JVM settings and why, etc), I'd love to hear them! Best regards, Les On Wed, Jun 22, 201

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-22 Thread aaron morton
> 1. Is it feasible to run directly against a Cassandra data directory restored > from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS > snapshot). I don't have experience with the EBS snapshot, but I've never been a fan of OS level snapshots that are not coordinated with

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread mcasandra
Les Hazlewood wrote: > > I have architected, built and been responsible for systems that support > 4-5 > 9s for years. > So have most of us. But probably by now it should be clear that no technology can provide concrete recommendations. They can only provide what might be helpful which varies

Re: rpm from 0.7.x -> 0.8?

2011-06-22 Thread Nick Bailey
That looks like a packaging bug. The package manually creates the commitlog/data/saved_caches directories under the /var/lib/cassandra/ directory. It really doesn't need to though, since as long as it sets the permissions correctly on /var/lib/cassandra/ those directories will get created automatic

Re: Atomicity Strategies

2011-06-22 Thread AJ
Thanks Aaron! On 6/22/2011 5:25 PM, aaron morton wrote: Atomic on a single machine yes. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 23 Jun 2011, at 09:42, AJ wrote: On 4/9/2011 7:52 PM, aaron morton wrote: My understanding of wha

Is LOCAL_QUORUM as strong as QUORUM?

2011-06-22 Thread AJ
Quorum reads/writes guarantee consistency. But when a keyspace spans multiple data centers, do local quorum reads/writes also guarantee consistency? I'm thinking maybe not if two data centers get partitioned. Thanks!

Re: 99.999% uptime - Operations Best Practices?

2011-06-22 Thread Les Hazlewood
On Wed, Jun 22, 2011 at 4:35 PM, mcasandra wrote: > might be helpful which varies from env to env. That's why I suggest look at > the comments in cassandra.yaml and see which are applicable in your > scenario. I learn something new everytime I read it. > Yep, and this was awesome - thanks very m

Re: Is LOCAL_QUORUM as strong as QUORUM?

2011-06-22 Thread mcasandra
LOCAL_QUORUM guarantees consistency in the local data center only. Other replica nodes in the same DC and in other DCs that are not part of the quorum will be eventually consistent. If you want to ensure consistency across DCs you can use EACH_QUORUM, but keep in mind the latency involved assuming DCs are not lo
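
To make the difference concrete, here is a minimal sketch of the replica counts behind each consistency level. It assumes a hypothetical keyspace with a replication factor of 3 in each of two data centers (the names DC1 and DC2 are made up) and uses the usual majority rule, floor(n / 2) + 1. It only does the arithmetic and does not talk to a cluster.

```python
# Replica-count arithmetic for QUORUM, LOCAL_QUORUM and EACH_QUORUM.
# The topology below is hypothetical: RF = 3 in each of two data centers.


def quorum(replicas: int) -> int:
    """Smallest majority of `replicas`: floor(n / 2) + 1."""
    return replicas // 2 + 1


def overlaps(write_acks: int, read_acks: int, replicas: int) -> bool:
    """True if any read set must share at least one replica with any write set."""
    return write_acks + read_acks > replicas


rf_per_dc = {"DC1": 3, "DC2": 3}
total_rf = sum(rf_per_dc.values())

# Number of replicas that must acknowledge at each consistency level,
# with the coordinator assumed to sit in DC1 for LOCAL_QUORUM.
required = {
    "QUORUM": quorum(total_rf),
    "LOCAL_QUORUM": quorum(rf_per_dc["DC1"]),
    "EACH_QUORUM": sum(quorum(rf) for rf in rf_per_dc.values()),
}

if __name__ == "__main__":
    for level, acks in required.items():
        print(f"{level}: {acks} replica acknowledgements")

    local = required["LOCAL_QUORUM"]
    # Reads and writes in the same DC always share a replica.
    print("same DC overlap:", overlaps(local, local, rf_per_dc["DC1"]))
    # Seen across the whole cluster, two local quorums need not intersect.
    print("cluster-wide overlap:", overlaps(local, local, total_rf))
```

With these numbers a LOCAL_QUORUM write in DC1 and a LOCAL_QUORUM read in DC2 can touch disjoint replicas, which matches the point above: the guarantee holds inside one data center, and readers elsewhere see the write only once it has propagated.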
