Re: hazelcast
Hi, I'm using it as a complement to Cassandra, to avoid "duplicate" searches and duplicate content at a given moment in time. It works really nicely so far: no critical issues, at least in the functionality I'm using from it.

--
//GK
german.kond...@gmail.com
// sites
http://twitter.com/germanklf
http://ar.linkedin.com/in/germankondolf

On Fri, Dec 10, 2010 at 2:50 PM, B. Todd Burruss wrote:
> http://www.hazelcast.com/product.jsp
>
> has anyone tested hazelcast as a distributed locking mechanism for java
> clients? seems very attractive on the surface.
>
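As an illustration of that duplicate-work guard, here is a minimal sketch that uses a Hazelcast distributed map as a cluster-wide claim check. It assumes a Hazelcast 3.x-style API; the map name, the search key, and doSearch() are hypothetical placeholders, not anything from this thread.

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class SearchDedup {
    public static void main(String[] args) {
        // Joins (or starts) the cluster with default configuration.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Cluster-wide map shared by every node running this code.
        IMap<String, Boolean> inFlight = hz.getMap("searches-in-flight");

        String searchKey = "user:42/query:cassandra"; // hypothetical key
        // putIfAbsent is atomic across the cluster: only one node wins the claim.
        if (inFlight.putIfAbsent(searchKey, Boolean.TRUE) == null) {
            try {
                doSearch(searchKey); // hypothetical: run the search, write to Cassandra
            } finally {
                inFlight.remove(searchKey); // release the claim
            }
        } // else: another node is already handling this search
    }

    private static void doSearch(String key) { /* ... */ }
}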
Re: hazelcast
I don't know much about ZooKeeper, but as far as I've read it runs outside the JVM process. Hazelcast is just a framework: you can programmatically start and shut down the cluster, and it only takes an XML file to configure it. Hazelcast also provides good caching features to integrate with Hibernate, distributed executors, clustered queues, distributed events, and so on. I don't know whether ZooKeeper supports all of that; I think not, because that is not its main goal.

On Fri, Dec 10, 2010 at 4:49 PM, B. Todd Burruss wrote:
> thx for the feedback. regarding locking, has anyone done a comparison to
> zookeeper? does zookeeper provide functionality over hazelcast?
>
> On 12/10/2010 11:08 AM, Norman Maurer wrote:
>>
>> Hi there,
>>
>> I'm not using it atm but plan to in my next project. It really looks nice
>> :)
>>
>> Bye,
>> Norman
>>
>> 2010/12/10 Germán Kondolf:
>>>
>>> Hi, I'm using it as a complement of cassandra, to avoid "duplicate"
>>> searches and duplicate content in a given moment in time.
>>> It works really nice by now, no critical issues, at least the
>>> functionallity I'm using from it.
>>>
>>> --
>>> //GK
>>> german.kond...@gmail.com
>>> // sites
>>> http://twitter.com/germanklf
>>> http://ar.linkedin.com/in/germankondolf
>>>
>>> On Fri, Dec 10, 2010 at 2:50 PM, B. Todd Burruss
>>> wrote:
>>>>
>>>> http://www.hazelcast.com/product.jsp
>>>>
>>>> has anyone tested hazelcast as a distributed locking mechanism for java
>>>> clients? seems very attractive on the surface.
>>>>
>

--
//GK
german.kond...@gmail.com
// sites
http://twitter.com/germanklf
http://www.facebook.com/germanklf
http://ar.linkedin.com/in/germankondolf
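As a sketch of that programmatic lifecycle, and of the distributed locking the original question asks about, something along these lines should work. It assumes the Hazelcast 3.x-style API (the 2010-era static API differed slightly), and the lock name is a hypothetical placeholder.

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

public class EmbeddedLockExample {
    public static void main(String[] args) throws InterruptedException {
        // Start an in-process member; a hazelcast.xml on the classpath
        // (or a programmatic Config like this one) drives the clustering.
        Config config = new Config();
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        // A lock visible to every member of the cluster.
        Lock lock = hz.getLock("my-resource"); // hypothetical lock name
        if (lock.tryLock(5, TimeUnit.SECONDS)) {
            try {
                // critical section: only one JVM in the cluster gets here at a time
            } finally {
                lock.unlock();
            }
        }

        // Shut the member down programmatically when the application stops.
        hz.getLifecycleService().shutdown();
    }
}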
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
Be careful with the unlimited value on ulimit, you could end up with a unresponsive server... I mean, you could not even connect via ssh if you don't have enough handles. On Thu, Dec 16, 2010 at 9:59 AM, Amin Sakka, Novapost < amin.sa...@novapost.fr> wrote: > > I increased the amount of the allowed file descriptors to "unlimted". > Now, I get exactly the same exception after 3.50 rows : > > *CustomTThreadPoolServer.java (line 104) Transport error occurred during > acceptance of message.* > *org.apache.thrift.transport.TTransportException: > java.net.SocketException: Too many open files* > * > * > What worries me is this / by zero exception when I try to restart cassandra > ! At least, I want to backup the 3.50 rows to continue then my > insertion, is there a way to do this? > > * > Exception encountered during startup. > java.lang.ArithmeticException: / by zero > at > org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) > > * > > > Thanks. > * > * > > > > > > 2010/12/15 Jake Luciani > > >> http://www.riptano.com/docs/0.6/troubleshooting/index#java-reports-an-error-saying-there-are-too-many-open-files >> >> >> >> On Wed, Dec 15, 2010 at 11:13 AM, Amin Sakka, Novapost < >> amin.sa...@novapost.fr> wrote: >> >>> *Hello,* >>> *I'm using cassandra 0.7.0 rc1, a single node configuration, replication >>> factor 1, random partitioner, 2 GO heap size.* >>> *I ran my hector client to insert 5.000.000 rows but after a couple of >>> hours, the following Exception occurs : * >>> >>> >>> WARN [main] 2010-12-15 16:38:53,335 CustomTThreadPoolServer.java (line >>> 104) Transport error occurred during acceptance of message. >>> org.apache.thrift.transport.TTransportException: >>> java.net.SocketException: Too many open files >>> at >>> org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124) >>> at >>> org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:67) >>> at >>> org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:38) >>> at >>> org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31) >>> at >>> org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98) >>> at >>> org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:120) >>> at >>> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:229) >>> at >>> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) >>> Caused by: java.net.SocketException: Too many open files >>> at java.net.PlainSocketImpl.socketAccept(Native Method) >>> at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) >>> at java.net.ServerSocket.implAccept(ServerSocket.java:453) >>> at java.net.ServerSocket.accept(ServerSocket.java:421) >>> at >>> org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119) >>> >>> >>> *When I try to restart Cassandra, I have the following exception :* >>> >>> >>> ERROR 16:42:26,573 Exception encountered during startup. 
>>> java.lang.ArithmeticException: / by zero >>> at >>> org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) >>> at >>> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284) >>> at >>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200) >>> at >>> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:225) >>> at >>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) >>> at >>> org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:306) >>> at >>> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:246) >>> at >>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) >>> at >>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:437) >>> at org.apache.cassandra.db.Table.initCf(Table.java:341) >>> at org.apache.cassandra.db.Table.(Table.java:283) >>> at org.apache.cassandra.db.Table.open(Table.java:114) >>> at >>> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138) >>> at >>> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) >>> at >>> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216) >>> at >>> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) >>> >>> >>> I am looking for advice on how to debug this. >>> >>> Thanks, >>> >>> -- >>> >>> Amin >>> >>> >>> >>> >>> >> > > > -- > > Amin SAKKA > Research and Development Engineer > 32 rue de Paradis, 75010 Paris > *Tel:* +33 (0)6 34 14 19 25 > *Mail:* amin.sa...@novapost.fr > *Web:* www.novapost.fr / www.novapost-rh.fr > > > > > -- //GK german.kond...@gmail.com // sites http://twitter.com/germanklf http://www.facebook.com/germanklf http://ar.linkedin.com/in/germank
Re: Too many open files Exception + java.lang.ArithmeticException: / by zero
Indeed Hector has a connection pool behind it, I think it uses 50 connectios per node. But also uses a node to discover the others, I assume that, as I saw connections from my app to nodes that I didn't configure in Hector. So, you may check the fds in OS level to see if there is a bottleneck there. On Thu, Dec 16, 2010 at 2:39 PM, Amin Sakka, Novapost wrote: > > I'm using a unique client instance (using Hector) and a unique connection to > cassandra. > For each insertion I'm using a new mutator and then I release it. > I have 473 sstable "Data.db", the average size of each is 30Mo. > > > > 2010/12/16 Ryan King >> >> Are you creating a new connection for each row you insert (and if so >> are you closing it)? >> >> -ryan >> >> On Wed, Dec 15, 2010 at 8:13 AM, Amin Sakka, Novapost >> wrote: >> > Hello, >> > I'm using cassandra 0.7.0 rc1, a single node configuration, replication >> > factor 1, random partitioner, 2 GO heap size. >> > I ran my hector client to insert 5.000.000 rows but after a couple of >> > hours, >> > the following Exception occurs : >> > >> > WARN [main] 2010-12-15 16:38:53,335 CustomTThreadPoolServer.java (line >> > 104) >> > Transport error occurred during acceptance of message. >> > org.apache.thrift.transport.TTransportException: >> > java.net.SocketException: >> > Too many open files >> > at >> > >> > org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124) >> > at >> > >> > org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:67) >> > at >> > >> > org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:38) >> > at >> > >> > org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31) >> > at >> > >> > org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98) >> > at >> > >> > org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:120) >> > at >> > >> > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:229) >> > at >> > >> > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) >> > Caused by: java.net.SocketException: Too many open files >> > at java.net.PlainSocketImpl.socketAccept(Native Method) >> > at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) >> > at java.net.ServerSocket.implAccept(ServerSocket.java:453) >> > at java.net.ServerSocket.accept(ServerSocket.java:421) >> > at >> > >> > org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119) >> > >> > When I try to restart Cassandra, I have the following exception : >> > >> > ERROR 16:42:26,573 Exception encountered during startup. 
>> > java.lang.ArithmeticException: / by zero >> > at >> > >> > org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233) >> > at >> > >> > org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284) >> > at >> > >> > org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200) >> > at >> > >> > org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:225) >> > at >> > >> > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) >> > at >> > >> > org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:306) >> > at >> > >> > org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:246) >> > at >> > >> > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449) >> > at >> > >> > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:437) >> > at org.apache.cassandra.db.Table.initCf(Table.java:341) >> > at org.apache.cassandra.db.Table.(Table.java:283) >> > at org.apache.cassandra.db.Table.open(Table.java:114) >> > at >> > >> > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138) >> > at >> > >> > org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) >> > at >> > >> > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216) >> > at >> > >> > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) >> > >> > I am looking for advice on how to debug this. >> > >> > Thanks, >> > -- >> > >> > Amin >> > >> > >> > >> > >> > > > > > -- > Amin > > > > -- //GK german.kond...@gmail.com // sites http://twitter.com/germanklf http://www.facebook.com/germanklf http://ar.linkedin.com/in/germankondolf
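For the client side of this, here is a hedged sketch of how the Hector pool can be sized explicitly and reused for the whole run rather than growing as the insertion progresses. The method names are from the Hector 0.7-era API as best recalled, so treat them as assumptions, and the cluster, keyspace, and column family names are hypothetical placeholders. The setAutoDiscoverHosts(true) flag is what would explain connections to nodes that were never configured explicitly.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class BulkInsert {
    public static void main(String[] args) {
        // One pool for the whole application, with an explicit per-host cap.
        CassandraHostConfigurator hosts = new CassandraHostConfigurator("localhost:9160");
        hosts.setMaxActive(20);           // cap pooled connections per host
        hosts.setAutoDiscoverHosts(true); // connects to ring members you never listed

        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hosts);
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        StringSerializer ss = StringSerializer.get();
        Mutator<String> mutator = HFactory.createMutator(keyspace, ss);
        for (int i = 0; i < 1000; i++) {
            // Reuse the same keyspace and pool; batch columns instead of
            // creating and releasing a mutator per row.
            mutator.addInsertion("row-" + i, "MyColumnFamily",
                    HFactory.createStringColumn("col", "value-" + i));
            if (i % 100 == 0) {
                mutator.execute(); // flush a batch
            }
        }
        mutator.execute();
        // Shut the cluster/pool down on exit; the exact call depends on the Hector version.
    }
}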
Re: WELCOME to user@cassandra.apache.org
Hmm... what about just paying for it? It costs less than $20 on Amazon for the Kindle version... (http://www.amazon.com/Cassandra-Definitive-Guide-Eben-Hewitt/dp/1449390412).

// Germán Kondolf
http://twitter.com/germanklf
http://code.google.com/p/seide/
// @iPad

On 30/12/2010, at 01:26, asil klin wrote:
> Can anyone pass me a pdf copy of Cassandra The Definitive Guide ?
>
>
> Thanks.
> Asil
Re: Tombstone lifespan after multiple deletions
Maybe this could be taken into account when compaction is executed: if there is only a consecutive, uninterrupted run of tombstones, it could care only about the first one. It sounds like the way it should be, maybe as part of the "row-reduce" process. Is it feasible? Looking at CASSANDRA-1074, it sounds like it should be.

//GK
http://twitter.com/germanklf
http://code.google.com/p/seide/

On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne wrote:
> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn wrote:
>> Thanks, Aaron, but I'm not 100% clear.
>>
>> My situation is this: My use case spins off rows (not columns) that I no
>> longer need and want to delete. It is possible that these rows were never
>> created in the first place, or were already deleted. This is a very large
>> cleanup task that normally deletes a lot of rows, and the last thing that I
>> want to do is create tombstones for rows that didn't exist in the first
>> place, or lengthen the life on disk of tombstones of rows that are already
>> deleted.
>>
>> So the question is: before I delete, do I have to retrieve the row to see if
>> it exists in the first place?
>
> Yes, in your situation you do.
>
>>
>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton
>> wrote:
>>>
>>> AFAIK that's not necessary, there is no need to worry about previous
>>> deletes. You can delete stuff that does not even exist, neither batch_mutate
>>> or remove are going to throw an error.
>>> All the columns that were (roughly speaking) present at your first
>>> deletion will be available for GC at the end of the first tombstones life.
>>> Same for the second.
>>> Say you were to write a col between the two deletes with the same name as
>>> one present at the start. The first version of the col is avail for GC after
>>> tombstone 1, and the second after tombstone 2.
>>> Hope that helps
>>> Aaron
>>> On 18/01/2011, at 9:37 PM, David Boxenhorn wrote:
>>>
>>> Thanks. In other words, before I delete something, I should check to see
>>> whether it exists as a live row in the first place.
>>>
>>> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King wrote:
On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn wrote:
> If I delete a row, and later on delete it again, before GCGraceSeconds
> has
> elapsed, does the tombstone live longer?
Each delete is a new tombstone, which should answer your question.
-ryan
> In other words, if I have the following scenario:
>
> GCGraceSeconds = 10 days
> On day 1 I delete a row
> On day 5 I delete the row again
>
> Will the tombstone be removed on day 10 or day 15?
>
>>>
>>
>>
>
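Since Sylvain confirms that in this situation a read before the delete is needed, here is a hedged sketch of that client-side check with Hector. The column family, key, and column names are hypothetical, the method names assume the Hector 0.7-era API, and the read and delete are not atomic, which is usually acceptable for a cleanup job. The same pattern applies to whole rows.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;

public class DeleteIfPresent {
    private static final StringSerializer SS = StringSerializer.get();

    // Deletes the column only if it is actually live, so no new tombstone is
    // written (and no existing tombstone's life is extended) for absent data.
    static void deleteIfPresent(Keyspace ks, String cf, String key, String col) {
        ColumnQuery<String, String, String> query =
                HFactory.createColumnQuery(ks, SS, SS, SS);
        query.setColumnFamily(cf).setKey(key).setName(col);
        HColumn<String, String> existing = query.execute().get();

        if (existing != null) {
            Mutator<String> mutator = HFactory.createMutator(ks, SS);
            mutator.delete(key, cf, col, SS);
        }
        // If existing == null, skip the delete entirely.
    }
}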
Re: Tombstone lifespan after multiple deletions
Yes, that's what I meant, but correct me if I'm wrong, when a deletion comes after another deletion for the same row or column will the gc-before count against the last one, isn't it? Maybe knowing that all the subsequent versions of a deletion are deletions too, it could take the first timestamp against the gc-grace-seconds when is reducing & compacting. // Germán Kondolf http://twitter.com/germanklf http://code.google.com/p/seide/ // @i4 On 19/01/2011, at 00:16, Jonathan Ellis wrote: > If you mean that multiple tombstones for the same row or column should > be merged into a single one at compaction time, then yes, that is what > happens. > > On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf > wrote: >> Maybe it could be taken into account when the compaction is executed, >> if I only have a consecutive list of uninterrupted tombstones it could >> only care about the first. It sounds like the-way-it-should-be, maybe >> as a part of the "row-reduce" process. >> >> Is it feasible? Looking into the CASSANDRA-1074 sounds like it should. >> >> //GK >> http://twitter.com/germanklf >> http://code.google.com/p/seide/ >> >> On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne >> wrote: >>> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn wrote: >>>> Thanks, Aaron, but I'm not 100% clear. >>>> >>>> My situation is this: My use case spins off rows (not columns) that I no >>>> longer need and want to delete. It is possible that these rows were never >>>> created in the first place, or were already deleted. This is a very large >>>> cleanup task that normally deletes a lot of rows, and the last thing that I >>>> want to do is create tombstones for rows that didn't exist in the first >>>> place, or lengthen the life on disk of tombstones of rows that are already >>>> deleted. >>>> >>>> So the question is: before I delete, do I have to retrieve the row to see >>>> if >>>> it exists in the first place? >>> >>> Yes, in your situation you do. >>> >>>> >>>> >>>> >>>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton >>>> wrote: >>>>> >>>>> AFAIK that's not necessary, there is no need to worry about previous >>>>> deletes. You can delete stuff that does not even exist, neither >>>>> batch_mutate >>>>> or remove are going to throw an error. >>>>> All the columns that were (roughly speaking) present at your first >>>>> deletion will be available for GC at the end of the first tombstones life. >>>>> Same for the second. >>>>> Say you were to write a col between the two deletes with the same name as >>>>> one present at the start. The first version of the col is avail for GC >>>>> after >>>>> tombstone 1, and the second after tombstone 2. >>>>> Hope that helps >>>>> Aaron >>>>> On 18/01/2011, at 9:37 PM, David Boxenhorn wrote: >>>>> >>>>> Thanks. In other words, before I delete something, I should check to see >>>>> whether it exists as a live row in the first place. >>>>> >>>>> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King wrote: >>>>>> >>>>>> On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn >>>>>> wrote: >>>>>>> If I delete a row, and later on delete it again, before GCGraceSeconds >>>>>>> has >>>>>>> elapsed, does the tombstone live longer? >>>>>> >>>>>> Each delete is a new tombstone, which should answer your question. >>>>>> >>>>>> -ryan >>>>>> >>>>>>> In other words, if I have the following scenario: >>>>>>> >>>>>>> GCGraceSeconds = 10 days >>>>>>> On day 1 I delete a row >>>>>>> On day 5 I delete the row again >>>>>>> >>>>>>> Will the tombstone be removed on day 10 or day 15? 
>>>>>>> >>>>> >>>> >>>> >>> >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com
Re: Tombstone lifespan after multiple deletions
On Wed, Jan 19, 2011 at 12:59 AM, Zhu Han wrote: > > > On Wed, Jan 19, 2011 at 11:35 AM, Germán Kondolf > wrote: >> >> Yes, that's what I meant, but correct me if I'm wrong, when a deletion >> comes after another deletion for the same row or column will the gc-before >> count against the last one, isn't it? >> > IIRC, after compaction. even if the row key is not wiped, all the CF are > replaced by the youngest tombstone. I do not understand very clearly the > benefit of wiping out the whole row as early as possible. > I think it is not a "benefit", but a potencial issue, if you delete columns or rows without checking them before you could make them live as long as you keep issuing deletions, maybe it's a strange use-case, but certainly Cassandra provides new non-traditional ways of processing high-volume of information. As the original example depicted clearly: day 1 -> insert Row1.Col1 day 2 -> delete Row1.Col1 day 11 (before gc-grace-seconds) -> delete Row1.Col1 In the last command I've extended the life of a tombstone, maybe the check before the deletion could have a performance impact in the process, so I think it might be handled server-side instead of client-side. //GK http://twitter.com/germanklf http://code.google.com/p/seide/ >> >> Maybe knowing that all the subsequent versions of a deletion are deletions >> too, it could take the first timestamp against the gc-grace-seconds when is >> reducing & compacting. >> >> // Germán Kondolf >> http://twitter.com/germanklf >> http://code.google.com/p/seide/ >> // @i4 >> >> On 19/01/2011, at 00:16, Jonathan Ellis wrote: >> >> > If you mean that multiple tombstones for the same row or column should >> > be merged into a single one at compaction time, then yes, that is what >> > happens. >> > >> > On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf >> > wrote: >> >> Maybe it could be taken into account when the compaction is executed, >> >> if I only have a consecutive list of uninterrupted tombstones it could >> >> only care about the first. It sounds like the-way-it-should-be, maybe >> >> as a part of the "row-reduce" process. >> >> >> >> Is it feasible? Looking into the CASSANDRA-1074 sounds like it should. >> >> >> >> //GK >> >> http://twitter.com/germanklf >> >> http://code.google.com/p/seide/ >> >> >> >> On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne >> >> wrote: >> >>> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn >> >>> wrote: >> >>>> Thanks, Aaron, but I'm not 100% clear. >> >>>> >> >>>> My situation is this: My use case spins off rows (not columns) that I >> >>>> no >> >>>> longer need and want to delete. It is possible that these rows were >> >>>> never >> >>>> created in the first place, or were already deleted. This is a very >> >>>> large >> >>>> cleanup task that normally deletes a lot of rows, and the last thing >> >>>> that I >> >>>> want to do is create tombstones for rows that didn't exist in the >> >>>> first >> >>>> place, or lengthen the life on disk of tombstones of rows that are >> >>>> already >> >>>> deleted. >> >>>> >> >>>> So the question is: before I delete, do I have to retrieve the row to >> >>>> see if >> >>>> it exists in the first place? >> >>> >> >>> Yes, in your situation you do. >> >>> >> >>>> >> >>>> >> >>>> >> >>>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton >> >>>> >> >>>> wrote: >> >>>>> >> >>>>> AFAIK that's not necessary, there is no need to worry about previous >> >>>>> deletes. 
You can delete stuff that does not even exist, neither >> >>>>> batch_mutate >> >>>>> or remove are going to throw an error. >> >>>>> All the columns that were (roughly speaking) present at your first >> >>>>> deletion will be available for GC at the end of the first tombstones >> >>>>> life. >> >>>>> Same for the second. >> >>>>> Say you were to write a col between the two deletes with the same
Re: Tombstone lifespan after multiple deletions
On Wed, Jan 19, 2011 at 11:52 AM, Jonathan Ellis wrote:
> On Wed, Jan 19, 2011 at 6:41 AM, Germán Kondolf
> wrote:
>> As the original example depicted clearly:
>> day 1 -> insert Row1.Col1
>> day 2 -> delete Row1.Col1
>> day 11 (before gc-grace-seconds) -> delete Row1.Col1
>>
>> In the last command I've extended the life of a tombstone, maybe the
>> check before the deletion could have a performance impact in the
>> process, so I think it might be handled server-side instead of
>> client-side.
>
> It has performance implications no matter where you do it, which is
> why we're not going to do it on the server. :)
>
> "Writes [or deletes] don't cause reads" is a basic design decision.
> This is a much bigger win than the very narrow corner case of being
> able to remove a tombstone marker a little earlier.
>

I totally agree on that; I would never propose a read before a write server-side. My bad, I didn't make that clear.

The idea is that in the reduce step during a compaction we could change the logic to take the oldest expiration time instead of the youngest; I should take a look at the code to see whether it's feasible.

A workaround purely by configuration is to reduce gc-grace-seconds enough to avoid this undesired "tombstone-keep-alive".

//GK
http://twitter.com/germanklf
http://code.google.com/p/seide/

> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
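To make the proposal concrete, here is a purely hypothetical sketch, not actual Cassandra code, of what taking the oldest tombstone timestamp instead of the youngest would mean when compaction reduces an uninterrupted run of tombstones for the same row or column.

public class TombstoneReduceSketch {

    // Hypothetical purge check for a consecutive run of tombstones, deletion times
    // in seconds and oldest first. The behaviour discussed in the thread is
    // effectively the "youngest" policy: the newest delete restarts the gc-grace
    // clock. The proposal is the "oldest" policy: since every later version is
    // also a delete, counting gc-grace-seconds from the first tombstone is enough.
    static boolean purgeable(long[] deletionTimes, long now, long gcGraceSeconds,
                             boolean useOldest) {
        long reference = useOldest ? deletionTimes[0]
                                   : deletionTimes[deletionTimes.length - 1];
        return now - reference > gcGraceSeconds;
    }

    public static void main(String[] args) {
        long day = 86_400;
        // The example from this thread: delete on day 2, delete again on day 11,
        // gc-grace-seconds of 10 days, and the compaction runs on day 13.
        long[] run = { 2 * day, 11 * day };
        System.out.println(purgeable(run, 13 * day, 10 * day, true));  // true: 11 days since the first delete
        System.out.println(purgeable(run, 13 * day, 10 * day, false)); // false: only 2 days since the last delete
    }
}

Under the oldest-first policy, the day 2 / day 11 example becomes purgeable once gc-grace-seconds have passed since day 2, instead of counting from day 11.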