Do you have a good reference for maintenance scripts for a Cassandra ring?

Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
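For what it's worth, ring maintenance usually comes down to scheduled anti-entropy repair (within gc_grace_seconds) plus cleanup after topology changes. A minimal sketch, assuming nodetool is on the PATH; the host names and the keyspace "myks" are placeholders, not anything from this thread:

  #!/bin/sh
  # Sketch of a scheduled maintenance pass for a small ring.
  # Host names and the keyspace "myks" are placeholders.
  KEYSPACE=myks
  for HOST in cass1 cass2 cass3; do
      # Anti-entropy repair; run on each node at least once per gc_grace_seconds
      # so deleted data is not resurrected.
      nodetool -h "$HOST" repair "$KEYSPACE"
      # Show whether compactions triggered by the repair are still running.
      nodetool -h "$HOST" compactionstats
  done
  # After adding or moving nodes, also reclaim ranges a node no longer owns:
  #   nodetool -h <host> cleanup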
On Tue, Apr 3, 2012 at 4:37 AM, aaron morton <aa...@thelastpickle.com> wrote:

> If you have a workload with overwrites you will end up with some data
> needing compaction. Running a nightly manual compaction would remove this,
> but it will also soak up some IO, so it may not be the best solution.
>
> I do not know if Leveled compaction would result in a smaller disk load
> for the same workload.
>
> I agree with the other people: turn on compression.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/04/2012, at 9:19 AM, Yiming Sun wrote:
>
> Yup Jeremiah, I learned a hard lesson on how Cassandra behaves when it
> runs out of disk space :-S. I didn't try the compression, but when it
> ran out of disk space, or was near running out, compaction would fail
> because it needs space to create some tmp data files.
>
> I shall get a tattoo that says keep it around 50% -- this is a valuable tip.
>
> -- Y.
>
> On Sun, Apr 1, 2012 at 11:25 PM, Jeremiah Jordan
> <jeremiah.jor...@morningstar.com> wrote:
>
>> Is that 80% with compression? If not, the first thing to do is turn on
>> compression. Cassandra doesn't behave well when it runs out of disk space.
>> You really want to try and stay around 50%; 60-70% works, but only if it
>> is spread across multiple column families, and even then you can run into
>> issues when doing repairs.
>>
>> -Jeremiah
>>
>> On Apr 1, 2012, at 9:44 PM, Yiming Sun wrote:
>>
>> Thanks Aaron. Well, I guess it is possible the data files from
>> supercolumns could have been reduced in size after compaction.
>>
>> This brings up yet another question. Say I am on a shoestring budget and
>> can only put together a cluster with very limited storage space. The first
>> iteration of pushing data into Cassandra would drive the disk usage up into
>> the 80% range. As time goes by, there will be updates to the data, and
>> many columns will be overwritten. If I just push the updates in, the disks
>> will run out of space on all of the cluster nodes. What would be the best
>> way to handle such a situation if I cannot buy larger disks? Do I need
>> to delete the rows/columns that are going to be updated, do a compaction,
>> and then insert the updates? Or is there a better way? Thanks.
>>
>> -- Y.
>>
>> On Sat, Mar 31, 2012 at 3:28 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> does cassandra 1.0 perform some default compression?
>>>
>>> No.
>>>
>>> The on-disk size depends to some degree on the workload.
>>>
>>> If there are a lot of overwrites or deletes you may have rows/columns
>>> that need to be compacted. You may have some big old SSTables that have
>>> not been compacted for a while.
>>>
>>> There is some overhead involved in the super columns: the super column
>>> name, the length of the name, and the number of columns.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 29/03/2012, at 9:47 AM, Yiming Sun wrote:
>>>
>>> Actually, after I read an article on Cassandra 1.0 compression just now
>>> (http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression),
>>> I am more puzzled. In our schema, we didn't specify any compression
>>> options -- does Cassandra 1.0 perform some default compression? Or is the
>>> data reduction purely because of the schema change? Thanks.
>>>
>>> -- Y.
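To make the "turn on compression" advice concrete -- 1.0 compresses nothing by default -- a rough sketch of enabling it per column family from cassandra-cli; the keyspace and column family names here are placeholders:

  # Placeholders: MyKeyspace / MyCF. Uses the cassandra-cli shipped with 1.0.
  echo "use MyKeyspace;" > /tmp/enable_compression.cli
  echo "update column family MyCF with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};" >> /tmp/enable_compression.cli
  cassandra-cli -h localhost -f /tmp/enable_compression.cli
  # Existing SSTables are only rewritten compressed as they get compacted;
  # newer 1.0.x releases can force the rewrite with "nodetool upgradesstables".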
>>> On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun <yiming....@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are trying to estimate the amount of storage we need for a
>>>> production Cassandra cluster. While I was doing the calculation, I noticed
>>>> a very dramatic difference in the storage space used by the Cassandra
>>>> data files.
>>>>
>>>> Our previous setup consisted of a single-node Cassandra 0.8.x with no
>>>> replication, with the data stored using supercolumns, and the data files
>>>> totaled about 534GB on disk.
>>>>
>>>> A few weeks ago, I put together a cluster consisting of 3 nodes
>>>> running Cassandra 1.0 with a replication factor of 2, and the data is
>>>> flattened out and stored using regular columns. The aggregated data
>>>> file size is only 488GB (which would be 244GB with no replication).
>>>>
>>>> This is a very dramatic reduction in storage needs, and is
>>>> certainly good news in terms of how much storage we need to provision.
>>>> However, because of the dramatic reduction, I also would like to make sure
>>>> it is absolutely correct before submitting it, and to get a sense of why
>>>> there was such a difference. I know Cassandra 1.0 does data compression,
>>>> but does the schema change from supercolumns to regular columns also help
>>>> reduce storage usage? Thanks.
>>>>
>>>> -- Y.
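As a sanity check on numbers like the 488GB above, the per-node figures are easy to pull with nodetool; "cass1" below is a placeholder host name:

  # "Load" per node; with RF=2 the loads across the ring should sum to roughly
  # twice the unique data set (e.g. ~488GB total for ~244GB of unique data).
  nodetool -h cass1 ring
  # Per column family: "Space used (live)" vs "Space used (total)"; a large gap
  # means obsolete SSTables that compaction has not yet reclaimed.
  nodetool -h cass1 cfstats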