Re: hazelcast

2010-12-10 Thread Germán Kondolf
Hi, I'm using it as a complement to Cassandra, to avoid "duplicate"
searches and duplicate content at a given moment in time.
It has worked really nicely so far, no critical issues, at least in the
functionality I'm using from it.
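Roughly, the pattern looks like this (just a simplified sketch, not our
actual code; the key names and the timeout are made up):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

public class SearchGuard {
    private final HazelcastInstance hz = Hazelcast.newHazelcastInstance(new Config());

    // run the Cassandra search/write only if no other node in the cluster
    // is already doing it for this key
    public boolean runOnce(String key, Runnable work) throws InterruptedException {
        Lock lock = hz.getLock("search-" + key); // cluster-wide lock per key
        if (!lock.tryLock(100, TimeUnit.MILLISECONDS)) {
            return false; // somebody else holds it, skip the duplicate work
        }
        try {
            work.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}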

-- 
//GK
german.kond...@gmail.com
// sites
http://twitter.com/germanklf
http://ar.linkedin.com/in/germankondolf

On Fri, Dec 10, 2010 at 2:50 PM, B. Todd Burruss  wrote:
> http://www.hazelcast.com/product.jsp
>
> has anyone tested hazelcast as a distributed locking mechanism for java
> clients?  seems very attractive on the surface.
>


Re: hazelcast

2010-12-10 Thread Germán Kondolf
I don't know much about Zookeeper, but as far as I've read, it runs outside
the JVM process.
Hazelcast is just a framework: you can programmatically start and
shut down the cluster, and a single XML file is enough to configure it.

Hazelcast also provides good caching features to integrate with
Hibernate, distributed executors, clustered queues, distributed
events, and so on. I don't know whether Zookeeper supports all of that; I
think not, because that is not its main goal.
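For what it's worth, embedding it is roughly this (a sketch from memory,
method names may differ a bit depending on the Hazelcast version):

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import java.util.concurrent.BlockingQueue;

public class EmbeddedNode {
    public static void main(String[] args) throws InterruptedException {
        // starts a cluster node inside this JVM; the Config can also be
        // loaded from a hazelcast.xml on the classpath
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(new Config());

        // distributed structures are looked up by name on the instance
        BlockingQueue<String> tasks = hz.getQueue("tasks");
        tasks.put("some-work-item");

        // shut down this node (Hazelcast.shutdownAll() stops every
        // instance in the JVM)
        hz.getLifecycleService().shutdown();
    }
}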

On Fri, Dec 10, 2010 at 4:49 PM, B. Todd Burruss  wrote:
> thx for the feedback.  regarding locking, has anyone done a comparison to
> zookeeper?  does zookeeper provide functionality over hazelcast?
>
> On 12/10/2010 11:08 AM, Norman Maurer wrote:
>>
>> Hi there,
>>
>> I'm not using it atm but plan to in my next project. It really looks nice
>> :)
>>
>> Bye,
>> Norman
>>
>> 2010/12/10 Germán Kondolf:
>>>
>>> Hi, I'm using it as a complement of cassandra, to avoid "duplicate"
>>> searches and duplicate content in a given moment in time.
>>> It works really nice by now, no critical issues, at least the
>>> functionallity I'm using from it.
>>>
>>> --
>>> //GK
>>> german.kond...@gmail.com
>>> // sites
>>> http://twitter.com/germanklf
>>> http://ar.linkedin.com/in/germankondolf
>>>
>>> On Fri, Dec 10, 2010 at 2:50 PM, B. Todd Burruss
>>>  wrote:
>>>>
>>>> http://www.hazelcast.com/product.jsp
>>>>
>>>> has anyone tested hazelcast as a distributed locking mechanism for java
>>>> clients?  seems very attractive on the surface.
>>>>
>



-- 
//GK
german.kond...@gmail.com
// sites
http://twitter.com/germanklf
http://www.facebook.com/germanklf
http://ar.linkedin.com/in/germankondolf


Re: Too many open files Exception + java.lang.ArithmeticException: / by zero

2010-12-16 Thread Germán Kondolf
Be careful with the unlimited value for ulimit, you could end up with an
unresponsive server... I mean, you might not even be able to connect via ssh
if there are no handles left.
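If you want a quick way to see how many handles the JVM itself is using
(on Sun/Oracle JVMs on Unix the OS MXBean exposes the counters; just a
sketch, other JVMs may not support the cast):

import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {
    public static void main(String[] args) {
        UnixOperatingSystemMXBean os =
                (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        System.out.println("open fds: " + os.getOpenFileDescriptorCount()
                + " of max " + os.getMaxFileDescriptorCount());
    }
}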

On Thu, Dec 16, 2010 at 9:59 AM, Amin Sakka, Novapost <
amin.sa...@novapost.fr> wrote:

>
> I increased the amount of the allowed file descriptors to "unlimted".
> Now, I get exactly the same exception after 3.50 rows :
>
> *CustomTThreadPoolServer.java (line 104) Transport error occurred during
> acceptance of message.*
> *org.apache.thrift.transport.TTransportException:
> java.net.SocketException: Too many open files*
> *
> *
> What worries me is this / by zero exception when I try to restart cassandra
> ! At least, I want to backup the 3.50 rows to continue then my
> insertion, is there a way to do this?
>
> *
>  Exception encountered during startup.
> java.lang.ArithmeticException: / by zero
>  at
> org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233)
>
> *
>
>
> Thanks.
> *
> *
>
>
>
>
>
> 2010/12/15 Jake Luciani 
>
>
>> http://www.riptano.com/docs/0.6/troubleshooting/index#java-reports-an-error-saying-there-are-too-many-open-files
>>
>>
>>
>> On Wed, Dec 15, 2010 at 11:13 AM, Amin Sakka, Novapost <
>> amin.sa...@novapost.fr> wrote:
>>
>>> *Hello,*
>>> *I'm using cassandra 0.7.0 rc1, a single node configuration, replication
>>> factor 1, random partitioner, 2 GO heap size.*
>>> *I ran my hector client to insert 5.000.000 rows but after a couple of
>>> hours, the following Exception occurs : *
>>>
>>>
>>>  WARN [main] 2010-12-15 16:38:53,335 CustomTThreadPoolServer.java (line
>>> 104) Transport error occurred during acceptance of message.
>>> org.apache.thrift.transport.TTransportException:
>>> java.net.SocketException: Too many open files
>>>  at
>>> org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
>>> at
>>> org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:67)
>>>  at
>>> org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:38)
>>> at
>>> org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>>>  at
>>> org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98)
>>> at
>>> org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:120)
>>>  at
>>> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:229)
>>> at
>>> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
>>> Caused by: java.net.SocketException: Too many open files
>>> at java.net.PlainSocketImpl.socketAccept(Native Method)
>>> at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
>>>  at java.net.ServerSocket.implAccept(ServerSocket.java:453)
>>> at java.net.ServerSocket.accept(ServerSocket.java:421)
>>>  at
>>> org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
>>>
>>>
>>> *When I try to restart Cassandra, I have the following exception :*
>>>
>>>
>>> ERROR 16:42:26,573 Exception encountered during startup.
>>> java.lang.ArithmeticException: / by zero
>>> at
>>> org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233)
>>>  at
>>> org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284)
>>> at
>>> org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
>>>  at
>>> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:225)
>>> at
>>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449)
>>>  at
>>> org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:306)
>>> at
>>> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:246)
>>>  at
>>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449)
>>> at
>>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:437)
>>>  at org.apache.cassandra.db.Table.initCf(Table.java:341)
>>> at org.apache.cassandra.db.Table.(Table.java:283)
>>>  at org.apache.cassandra.db.Table.open(Table.java:114)
>>> at
>>> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138)
>>>  at
>>> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
>>> at
>>> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
>>>  at
>>> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
>>>
>>>
>>> I am looking for advice on how to debug this.
>>>
>>> Thanks,
>>>
>>> --
>>>
>>> Amin
>>>
>>>
>>>
>>>
>>>
>>
>
>
> --
>
> Amin SAKKA
> Research and Development Engineer
> 32 rue de Paradis, 75010 Paris
> *Tel:* +33 (0)6 34 14 19 25
> *Mail:* amin.sa...@novapost.fr
> *Web:* www.novapost.fr / www.novapost-rh.fr
>
>
>
>
>


-- 
//GK
german.kond...@gmail.com
// sites
http://twitter.com/germanklf
http://www.facebook.com/germanklf
http://ar.linkedin.com/in/germankondolf

Re: Too many open files Exception + java.lang.ArithmeticException: / by zero

2010-12-16 Thread Germán Kondolf
Indeed, Hector has a connection pool behind it; I think it uses 50
connections per node.
It also uses a node to discover the others; I assume so, because I saw
connections from my app to nodes that I didn't configure in Hector.

So you may want to check the fds at the OS level to see if there is a
bottleneck there.
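If you want to bound what Hector opens, something like this should do it
(from memory, against Hector 0.7.x; the property names may vary between
versions):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class BoundedPool {
    public static void main(String[] args) {
        CassandraHostConfigurator hosts = new CassandraHostConfigurator("localhost:9160");
        hosts.setMaxActive(20);            // cap the pool size per node
        hosts.setAutoDiscoverHosts(false); // don't open connections to discovered nodes
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", hosts);
        // ... create keyspaces / mutators from the cluster as usual ...
    }
}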

On Thu, Dec 16, 2010 at 2:39 PM, Amin Sakka, Novapost
 wrote:
>
> I'm using a unique client instance (using Hector) and a unique connection to
> cassandra.
> For each insertion I'm using a new mutator and then I release it.
> I have 473  sstable "Data.db", the average size of each is 30Mo.
>
>
>
> 2010/12/16 Ryan King 
>>
>> Are you creating a new connection for each row you insert (and if so
>> are you closing it)?
>>
>> -ryan
>>
>> On Wed, Dec 15, 2010 at 8:13 AM, Amin Sakka, Novapost
>>  wrote:
>> > Hello,
>> > I'm using cassandra 0.7.0 rc1, a single node configuration, replication
>> > factor 1, random partitioner, 2 GO heap size.
>> > I ran my hector client to insert 5.000.000 rows but after a couple of
>> > hours,
>> > the following Exception occurs :
>> >
>> >  WARN [main] 2010-12-15 16:38:53,335 CustomTThreadPoolServer.java (line
>> > 104)
>> > Transport error occurred during acceptance of message.
>> > org.apache.thrift.transport.TTransportException:
>> > java.net.SocketException:
>> > Too many open files
>> > at
>> >
>> > org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
>> > at
>> >
>> > org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:67)
>> > at
>> >
>> > org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:38)
>> > at
>> >
>> > org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
>> > at
>> >
>> > org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98)
>> > at
>> >
>> > org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:120)
>> > at
>> >
>> > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:229)
>> > at
>> >
>> > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
>> > Caused by: java.net.SocketException: Too many open files
>> > at java.net.PlainSocketImpl.socketAccept(Native Method)
>> > at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
>> > at java.net.ServerSocket.implAccept(ServerSocket.java:453)
>> > at java.net.ServerSocket.accept(ServerSocket.java:421)
>> > at
>> >
>> > org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
>> >
>> > When I try to restart Cassandra, I have the following exception :
>> >
>> > ERROR 16:42:26,573 Exception encountered during startup.
>> > java.lang.ArithmeticException: / by zero
>> > at
>> >
>> > org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233)
>> > at
>> >
>> > org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284)
>> > at
>> >
>> > org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
>> > at
>> >
>> > org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:225)
>> > at
>> >
>> > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449)
>> > at
>> >
>> > org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:306)
>> > at
>> >
>> > org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:246)
>> > at
>> >
>> > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:449)
>> > at
>> >
>> > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:437)
>> > at org.apache.cassandra.db.Table.initCf(Table.java:341)
>> > at org.apache.cassandra.db.Table.(Table.java:283)
>> > at org.apache.cassandra.db.Table.open(Table.java:114)
>> > at
>> >
>> > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138)
>> > at
>> >
>> > org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
>> > at
>> >
>> > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
>> > at
>> >
>> > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
>> >
>> > I am looking for advice on how to debug this.
>> >
>> > Thanks,
>> > --
>> >
>> > Amin
>> >
>> >
>> >
>> >
>> >
>
>
>
> --
> Amin
>
>
>
>



-- 
//GK
german.kond...@gmail.com
// sites
http://twitter.com/germanklf
http://www.facebook.com/germanklf
http://ar.linkedin.com/in/germankondolf


Re: WELCOME to user@cassandra.apache.org

2010-12-29 Thread Germán Kondolf
Hmm... what about just paying for it?

It costs less than $20 on Amazon for the Kindle version...
(http://www.amazon.com/Cassandra-Definitive-Guide-Eben-Hewitt/dp/1449390412).

// Germán Kondolf
http://twitter.com/germanklf
http://code.google.com/p/seide/
// @iPad

On 30/12/2010, at 01:26, asil klin  wrote:

> Can anyone pass me a pdf copy of Cassandra The Definitive Guide ?
>
>
> Thanks.
> Asil


Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread Germán Kondolf
Maybe it could be taken into account when compaction is executed: if
there is only a consecutive, uninterrupted run of tombstones, it could
care only about the first one. It sounds like the way it should be, maybe
as part of the "row-reduce" process.

Is it feasible? Looking at CASSANDRA-1074, it sounds like it should be.
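Something along these lines, purely as a sketch of the idea (this is not
the actual compaction code, all the names here are invented):

import java.util.List;

public class TombstoneRun {

    interface Version {
        boolean isTombstone();
        long deletionTime();
    }

    // given the merged versions of a column, newest first, the whole run of
    // deletions could be purged based on the EARLIEST tombstone, as long as
    // no live write interrupts the run
    static boolean purgeable(List<Version> newestFirst, long gcBefore) {
        long earliest = Long.MAX_VALUE;
        for (Version v : newestFirst) {
            if (!v.isTombstone()) {
                return false; // a live write interrupts the run
            }
            earliest = Math.min(earliest, v.deletionTime());
        }
        return earliest < gcBefore;
    }
}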

//GK
http://twitter.com/germanklf
http://code.google.com/p/seide/

On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne  wrote:
> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn  wrote:
>> Thanks, Aaron, but I'm not 100% clear.
>>
>> My situation is this: My use case spins off rows (not columns) that I no
>> longer need and want to delete. It is possible that these rows were never
>> created in the first place, or were already deleted. This is a very large
>> cleanup task that normally deletes a lot of rows, and the last thing that I
>> want to do is create tombstones for rows that didn't exist in the first
>> place, or lengthen the life on disk of tombstones of rows that are already
>> deleted.
>>
>> So the question is: before I delete, do I have to retrieve the row to see if
>> it exists in the first place?
>
> Yes, in your situation you do.
>
>>
>>
>>
>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton 
>> wrote:
>>>
>>> AFAIK that's not necessary, there is no need to worry about previous
>>> deletes. You can delete stuff that does not even exist, neither batch_mutate
>>> or remove are going to throw an error.
>>> All the columns that were (roughly speaking) present at your first
>>> deletion will be available for GC at the end of the first tombstones life.
>>> Same for the second.
>>> Say you were to write a col between the two deletes with the same name as
>>> one present at the start. The first version of the col is avail for GC after
>>> tombstone 1, and the second after tombstone 2.
>>> Hope that helps
>>> Aaron
>>> On 18/01/2011, at 9:37 PM, David Boxenhorn  wrote:
>>>
>>> Thanks. In other words, before I delete something, I should check to see
>>> whether it exists as a live row in the first place.
>>>
>>> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King  wrote:

 On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn 
 wrote:
 > If I delete a row, and later on delete it again, before GCGraceSeconds
 > has
 > elapsed, does the tombstone live longer?

 Each delete is a new tombstone, which should answer your question.

 -ryan

 > In other words, if I have the following scenario:
 >
 > GCGraceSeconds = 10 days
 > On day 1 I delete a row
 > On day 5 I delete the row again
 >
 > Will the tombstone be removed on day 10 or day 15?
 >
>>>
>>
>>
>


Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread Germán Kondolf
Yes, that's what I meant, but correct me if I'm wrong: when a deletion comes
after another deletion for the same row or column, the gc-before is counted
against the last one, isn't it?

Maybe, knowing that all the subsequent versions after a deletion are deletions
too, it could check the first timestamp against gc-grace-seconds when it is
reducing & compacting.

// Germán Kondolf
http://twitter.com/germanklf
http://code.google.com/p/seide/
// @i4

On 19/01/2011, at 00:16, Jonathan Ellis  wrote:

> If you mean that multiple tombstones for the same row or column should
> be merged into a single one at compaction time, then yes, that is what
> happens.
> 
> On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf
>  wrote:
>> Maybe it could be taken into account when the compaction is executed,
>> if I only have a consecutive list of uninterrupted tombstones it could
>> only care about the first. It sounds like the-way-it-should-be, maybe
>> as a part of the "row-reduce" process.
>> 
>> Is it feasible? Looking into the CASSANDRA-1074 sounds like it should.
>> 
>> //GK
>> http://twitter.com/germanklf
>> http://code.google.com/p/seide/
>> 
>> On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne  
>> wrote:
>>> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn  wrote:
>>>> Thanks, Aaron, but I'm not 100% clear.
>>>> 
>>>> My situation is this: My use case spins off rows (not columns) that I no
>>>> longer need and want to delete. It is possible that these rows were never
>>>> created in the first place, or were already deleted. This is a very large
>>>> cleanup task that normally deletes a lot of rows, and the last thing that I
>>>> want to do is create tombstones for rows that didn't exist in the first
>>>> place, or lengthen the life on disk of tombstones of rows that are already
>>>> deleted.
>>>> 
>>>> So the question is: before I delete, do I have to retrieve the row to see 
>>>> if
>>>> it exists in the first place?
>>> 
>>> Yes, in your situation you do.
>>> 
>>>> 
>>>> 
>>>> 
>>>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton 
>>>> wrote:
>>>>> 
>>>>> AFAIK that's not necessary, there is no need to worry about previous
>>>>> deletes. You can delete stuff that does not even exist, neither 
>>>>> batch_mutate
>>>>> or remove are going to throw an error.
>>>>> All the columns that were (roughly speaking) present at your first
>>>>> deletion will be available for GC at the end of the first tombstones life.
>>>>> Same for the second.
>>>>> Say you were to write a col between the two deletes with the same name as
>>>>> one present at the start. The first version of the col is avail for GC 
>>>>> after
>>>>> tombstone 1, and the second after tombstone 2.
>>>>> Hope that helps
>>>>> Aaron
>>>>> On 18/01/2011, at 9:37 PM, David Boxenhorn  wrote:
>>>>> 
>>>>> Thanks. In other words, before I delete something, I should check to see
>>>>> whether it exists as a live row in the first place.
>>>>> 
>>>>> On Tue, Jan 18, 2011 at 9:24 AM, Ryan King  wrote:
>>>>>> 
>>>>>> On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn 
>>>>>> wrote:
>>>>>>> If I delete a row, and later on delete it again, before GCGraceSeconds
>>>>>>> has
>>>>>>> elapsed, does the tombstone live longer?
>>>>>> 
>>>>>> Each delete is a new tombstone, which should answer your question.
>>>>>> 
>>>>>> -ryan
>>>>>> 
>>>>>>> In other words, if I have the following scenario:
>>>>>>> 
>>>>>>> GCGraceSeconds = 10 days
>>>>>>> On day 1 I delete a row
>>>>>>> On day 5 I delete the row again
>>>>>>> 
>>>>>>> Will the tombstone be removed on day 10 or day 15?
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com



Re: Tombstone lifespan after multiple deletions

2011-01-19 Thread Germán Kondolf
On Wed, Jan 19, 2011 at 12:59 AM, Zhu Han  wrote:
>
>
> On Wed, Jan 19, 2011 at 11:35 AM, Germán Kondolf 
> wrote:
>>
>> Yes, that's what I meant, but correct me if I'm wrong, when a deletion
>> comes after another deletion for the same row or column will the gc-before
>> count against the last one, isn't it?
>>
> IIRC, after compaction. even if the row key is not wiped, all the CF are
> replaced by the youngest tombstone.  I do not understand very clearly the
> benefit of wiping out the whole row as early as possible.
>

I think it is not a "benefit" but a potential issue: if you delete
columns or rows without checking them first, you can keep their tombstones
alive for as long as you keep issuing deletions. Maybe it's a strange use
case, but Cassandra certainly enables new, non-traditional ways of
processing high volumes of information.

As the original example depicted clearly:
day 1 -> insert Row1.Col1
day 2 -> delete Row1.Col1
day 11 (before gc-grace-seconds) -> delete Row1.Col1

With the last command I've extended the life of the tombstone. A check
before the deletion could have a performance impact in the process, so I
think it might be handled server-side instead of client-side.
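
To put numbers on it, assuming gc-grace-seconds of 10 days as above: the
day-2 tombstone would have been purgeable by a compaction from day 12
onwards, but the delete on day 11 leaves a tombstone that cannot be purged
before day 21.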

//GK
http://twitter.com/germanklf
http://code.google.com/p/seide/

>>
>> Maybe knowing that all the subsequent versions of a deletion are deletions
>> too, it could take the first timestamp against the gc-grace-seconds when is
>> reducing & compacting.
>>
>> // Germán Kondolf
>> http://twitter.com/germanklf
>> http://code.google.com/p/seide/
>> // @i4
>>
>> On 19/01/2011, at 00:16, Jonathan Ellis  wrote:
>>
>> > If you mean that multiple tombstones for the same row or column should
>> > be merged into a single one at compaction time, then yes, that is what
>> > happens.
>> >
>> > On Tue, Jan 18, 2011 at 7:53 PM, Germán Kondolf
>> >  wrote:
>> >> Maybe it could be taken into account when the compaction is executed,
>> >> if I only have a consecutive list of uninterrupted tombstones it could
>> >> only care about the first. It sounds like the-way-it-should-be, maybe
>> >> as a part of the "row-reduce" process.
>> >>
>> >> Is it feasible? Looking into the CASSANDRA-1074 sounds like it should.
>> >>
>> >> //GK
>> >> http://twitter.com/germanklf
>> >> http://code.google.com/p/seide/
>> >>
>> >> On Tue, Jan 18, 2011 at 10:55 AM, Sylvain Lebresne
>> >>  wrote:
>> >>> On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn 
>> >>> wrote:
>> >>>> Thanks, Aaron, but I'm not 100% clear.
>> >>>>
>> >>>> My situation is this: My use case spins off rows (not columns) that I
>> >>>> no
>> >>>> longer need and want to delete. It is possible that these rows were
>> >>>> never
>> >>>> created in the first place, or were already deleted. This is a very
>> >>>> large
>> >>>> cleanup task that normally deletes a lot of rows, and the last thing
>> >>>> that I
>> >>>> want to do is create tombstones for rows that didn't exist in the
>> >>>> first
>> >>>> place, or lengthen the life on disk of tombstones of rows that are
>> >>>> already
>> >>>> deleted.
>> >>>>
>> >>>> So the question is: before I delete, do I have to retrieve the row to
>> >>>> see if
>> >>>> it exists in the first place?
>> >>>
>> >>> Yes, in your situation you do.
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton
>> >>>> 
>> >>>> wrote:
>> >>>>>
>> >>>>> AFAIK that's not necessary, there is no need to worry about previous
>> >>>>> deletes. You can delete stuff that does not even exist, neither
>> >>>>> batch_mutate
>> >>>>> or remove are going to throw an error.
>> >>>>> All the columns that were (roughly speaking) present at your first
>> >>>>> deletion will be available for GC at the end of the first tombstones
>> >>>>> life.
>> >>>>> Same for the second.
>> >>>>> Say you were to write a col between the two deletes with the same

Re: Tombstone lifespan after multiple deletions

2011-01-19 Thread Germán Kondolf
On Wed, Jan 19, 2011 at 11:52 AM, Jonathan Ellis  wrote:
> On Wed, Jan 19, 2011 at 6:41 AM, Germán Kondolf
>  wrote:
>> As the original example depicted clearly:
>> day 1 -> insert Row1.Col1
>> day 2 -> delete Row1.Col1
>> day 11 (before gc-grace-seconds) -> delete Row1.Col1
>>
>> In the last command I've extended the life of a tombstone, maybe the
>> check before the deletion could have a performance impact in the
>> process, so I think it might be handled server-side instead of
>> client-side.
>
> It has performance implications no matter where you do it, which is
> why we're not going to do it on the server. :)
>
> "Writes [or deletes] don't cause reads" is a basic design decision.
> This is a much bigger win than the very narrow corner case of being
> able to remove a tombstone marker a little earlier.
>

I totally agree on that; I would never propose a read before a write
server-side. My bad, I didn't make that clear.

The idea is that in the reduce step during a compaction we could
change the logic to take the oldest expiration time instead of the
youngest. I should take a look at the code to see if it's feasible.

A configuration-only workaround is to reduce gc-grace-seconds enough
to avoid this undesired "tombstone keep-alive".

//GK
http://twitter.com/germanklf
http://code.google.com/p/seide/
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>