Re: re-execution of failed queries with rpc_timeout

Edward Capriolo Tue, 16 Apr 2013 14:16:06 -0700

The newer versions of Cassandra include extra information in the
exception. I **think** you can use that information to determine how many
machines the operation succeeded on. However, I do not think that
information means you can make counters that timed out "bulletproof".
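
To make the distinction concrete, here is a minimal sketch of a client-side
retry wrapper that re-executes ordinary writes on timeout but never replays a
counter increment. Everything in it is hypothetical and for illustration only:
WriteTimeout stands in for whatever timeout exception your client raises, and
its acknowledged_by attribute is an assumption modelled on the "extra
information" mentioned above.

```python
class WriteTimeout(Exception):
    """Hypothetical stand-in for the client's timeout exception."""

    def __init__(self, acknowledged_by=None):
        super().__init__("write timed out")
        # Assumed field: number of replicas known to have acknowledged
        # the write, or None if the client does not report it.
        self.acknowledged_by = acknowledged_by


def execute_with_retry(run, is_counter_update, max_attempts=3):
    """Call run(); retry on timeout only when a replay is safe.

    Counter increments are not idempotent, so a timed-out increment is
    never replayed here: it may already have been applied somewhere, and
    an acknowledgement count alone cannot rule that out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run()
        except WriteTimeout:
            if is_counter_update or attempt == max_attempts:
                # Surface the failure to the caller instead of guessing.
                raise
```

With a wrapper like this, a timed-out counter update is reported to the
caller, who can log it for later reconciliation rather than risk double
counting.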



On Tue, Apr 16, 2013 at 5:08 PM, aaron morton <[email protected]> wrote:

> If you are using Counters you need to do everything you can to avoid
> timeouts. In the worst case we do not know where it has been applied. The
> increment is applied on a leader and then replicated to the others; if the
> coordinator is not the leader it may not know whether the increment was
> applied at all.
>
> Start by reducing the size of the updates. Larger batches do not always
> mean better performance.
>
>> In all other cases, the rpc_timeout might be thrown from a remote node
>> (not the one I'm connected to), and hence some parts of the update will be
>> performed and other parts will not.
>>
> TimedOutException is always thrown from the coordinator you are connected
> to.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/04/2013, at 1:38 PM, Moty Kosharovsky <[email protected]> wrote:
>
> Sorry, not LOCAL QUORUM, I meant "ANY" quorum.
>
>
> On Mon, Apr 15, 2013 at 4:12 AM, Moty Kosharovsky <[email protected]> wrote:
>
>> Hello,
>>
>> I'm running a 12-node cluster with Cassandra 1.1.5 and Oracle JDK
>> 1.6.0_35. Our application constantly writes large updates with CQL. Once in
>> a while, an rpc_timeout will occur.
>>
>> Since a lot of the information is counters, it's impossible for me to
>> understand whether the updates complete partially on rpc_timeout, or Cassandra
>> somehow rolls back the change completely, and hence I can't tell if I
>> should re-execute the query on rpc_timeout (with double processing being a
>> bigger concern than missing updates).
>>
>> I am thinking, but unsure of this, that if I switch to LOCAL_QUORUM,
>> rpc_timeout will always mean that the update was not processed as a whole.
>> In all other cases, the rpc_timeout might be thrown from a remote node (not
>> the one I'm connected to), and hence some parts of the update will be
>> performed and other parts will not.
>>
>> Anyone solved this issue before?
>>
>> Kind Regards,
>> Kosha
>>
>
>
>
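
The advice in this thread to reduce the size of the updates can be sketched
as splitting one large list of pending updates into smaller batches before
sending them. This is only an illustration: chunked is a hypothetical helper,
and the batch size of 50 in the usage note is an arbitrary assumption, not a
recommended value.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each at most size elements long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A caller would then loop over chunked(pending_updates, 50) and send each
slice as its own batch, so that a single timeout puts fewer increments in
doubt.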
