Your experience, then, is expected (although a 20-minute delay seems
excessive and suggests you may be overloading your cluster, which is
plausible with an unthrottled bulk load like that).

When you insert at consistency ONE with RF > 1, your query returns as soon
as one node confirms the write. The coordinator still sends the write to
the other nodes responsible for that row, but it does not wait for their
responses. If your nodes are overloaded, they may not accept the write at
all; such failures may result in hinted handoff being used, or in the
write simply being dropped.
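
One way to keep a bulk load from overloading the nodes is to cap the rate
at which the loader issues inserts. The following is only a rough sketch of
that idea, not anything taken from your loader: Throttle and load_rows are
hypothetical helper names, insert stands in for whatever call your driver
uses to write one row, and a workable rate would have to be found by
experiment on your own cluster.

```python
import time


class Throttle:
    """Allow at most `rate` operations per second (hypothetical helper)."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate   # minimum spacing between operations
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()     # earliest time the next operation may run

    def wait(self):
        """Block until the next operation is allowed to start."""
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
        # Schedule the following slot one interval after this one.
        self.next_slot = max(now, self.next_slot) + self.interval


def load_rows(rows, insert, rate):
    """Call insert(row) for each row, at no more than `rate` rows per second.

    `insert` is an assumption: any callable that writes a single row.
    Returns the number of rows submitted.
    """
    throttle = Throttle(rate)
    count = 0
    for row in rows:
        throttle.wait()
        insert(row)
        count += 1
    return count
```

If dropped writes persist at a given rate, lowering it further (or adding
capacity) would be the next thing to try.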

At the end of your load, you likely have nodes that are missing writes.
Look for dropped MUTATION messages in the output of nodetool tpstats. For
operations that cannot tolerate this, you need to write and read at a
higher consistency level, such as QUORUM.
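
To make "higher consistency level" concrete: a read is guaranteed to touch
at least one replica that acknowledged the write when the number of
replicas acknowledging the write plus the number consulted on the read
exceeds the replication factor. Here is a small sketch of that arithmetic.
The function names are made up for illustration, and only the ONE, QUORUM,
and ALL levels are modelled.

```python
def required_acks(level, rf):
    """Replica responses needed at a consistency level for a given RF."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1   # a majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError("unsupported consistency level: " + level)


def read_overlaps_write(write_level, read_level, rf):
    """True if every read must include a replica that acknowledged the write."""
    return required_acks(write_level, rf) + required_acks(read_level, rf) > rf


# RF = 3, writing and reading at ONE: 1 + 1 is not greater than 3, so a
# read can land on a replica that has not received the write yet.
print(read_overlaps_write("ONE", "ONE", 3))        # False
# RF = 3, writing and reading at QUORUM: 2 + 2 > 3, so they always overlap.
print(read_overlaps_write("QUORUM", "QUORUM", 3))  # True
# RF = 1: the single replica is always the one that took the write.
print(read_overlaps_write("ONE", "ONE", 1))        # True
```

The last case matches what you saw in the keyspace with a replication
factor of 1, where all rows were visible immediately.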

Consistency is achieved over time via hinted handoff, read repair, and
other mechanisms (assuming you're not running a repair in between). Your
cluster will gradually return to consistency, *provided your nodes do not
suffer any downtime or stay unavailable for longer than the hint window*.



On Fri, Nov 6, 2015 at 10:58 AM, Greg Traub <randomciti...@gmail.com> wrote:

> Vidur,
>
> Forgive me if I'm getting this wrong as I'm exceptionally new to Cassandra.
>
> By consistency, if you mean the USING CONSISTENCY clause, then I'm not
> specifying it which, per the CQL documentation, means a default of ONE.
>
> On Fri, Nov 6, 2015 at 1:49 PM, Vidur Malik <vi...@shopkeep.com> wrote:
>
>> What is your query consistency?
>>
>> On Fri, Nov 6, 2015 at 1:47 PM, Greg Traub <randomciti...@gmail.com>
>> wrote:
>>
>>> Cassandra users,
>>>
>>> I have a 4-node Cassandra cluster set up.  All nodes are in a single
>>> rack and data center.  I have a loader program which loads 40
>>> million rows into a table in a keyspace with a replication factor of 3.
>>> Immediately after inserting the rows (after the loader program finishes),
>>> if I SELECT count(*) from the table, the result is less than 40 million.
>>> If I run our dumper program to retrieve all rows, it is less than 40
>>> million.  However, if I wait roughly 20 minutes, the count eventually
>>> reaches 40 million rows and the dumper program returns all 40 million.
>>>
>>> If I do the same thing in a keyspace where the replication factor is 1,
>>> I don't have any "stabilization" time and the 40 million rows are
>>> immediately available.
>>>
>>> I've modified the loading and dumping programs to use both the Thrift
>>> Java driver and the CQL Java driver and neither seems to make a difference.
>>>
>>> I'm very new to Cassandra and my questions are, what may be causing this
>>> delay in all rows being available and how might I lessen/eliminate this
>>> delay?
>>>
>>> Thanks,
>>> Greg
>>>
>>
>>
>>
>> --
>>
>> Vidur Malik
>>
>
>
