A fix for those who suffer from GC storms caused by tombstones

2014-10-07 Thread Takenori Sato
Hi,

I have filed a fix as CASSANDRA-8038, which should be good news for those
who have suffered from overwhelming GC or OOM caused by tombstones.

I'd appreciate your feedback!

Thanks,
Takenori


Re: Dynamic schema modification an anti-pattern?

2014-10-07 Thread Colin
Anti-pattern.  Dynamically altering the schema won't scale and is bad juju.

--
Colin Clark 
+1-320-221-9531
 

> On Oct 6, 2014, at 10:56 PM, Todd Fast  wrote:
> 
> There is a team at my work building an entity-attribute-value (EAV) store 
> using Cassandra. There is a column family, called Entity, where the partition 
> key is the UUID of the entity, and the columns are the attribute names with 
> their values. Each entity will contain hundreds to thousands of attributes, 
> out of a list of up to potentially ten thousand known attribute names.
> 
> However, instead of using wide rows with dynamic columns (and serializing 
> type info with the value), they are trying to use a static column family and 
> modifying the schema dynamically as new named attributes are created.
> 
> (I believe one of the main drivers of this approach is to use collection 
> columns for certain attributes, and perhaps to preserve type metadata for a 
> given attribute.)
> 
> This approach goes against everything I've seen and done in Cassandra, and is 
> generally an anti-pattern for most persistence stores, but I want to gather 
> feedback before taking the next step with the team.
> 
> Do others consider this approach an anti-pattern, and if so, what are the 
> practical downsides?
> 
> For one, this means that the Entity schema would contain the superset of all 
> columns for all rows. What is the impact of having thousands of column names 
> in the schema? And what are the implications of modifying the schema 
> dynamically on a decent sized cluster (5 nodes now, growing to 10s later) 
> under load?
> 
> Thanks,
> Todd


Re: Dynamic schema modification an anti-pattern?

2014-10-07 Thread DuyHai Doan
Furthermore, dynamically altering the schema can prevent adding a new node
to the cluster. I faced a similar issue recently. While a new node is
joining the cluster, data is streamed from the existing nodes to the new one. If the
application alters the schema on the fly (DROP TABLE, DROP COLUMN, ...), the
data stream arriving at the new node cannot be processed because the schema
has changed (table dropped, column dropped). The streaming then stalls
and the new node remains in the JOINING state forever.

 This can be a serious blocker for scaling the cluster.

On Tue, Oct 7, 2014 at 9:41 AM, Colin  wrote:

> Anti-pattern.  Dynamically altering the schema won't scale and is bad juju.
>
> --
> *Colin Clark*
> +1-320-221-9531
>
>
> On Oct 6, 2014, at 10:56 PM, Todd Fast  wrote:
>
> There is a team at my work building an entity-attribute-value (EAV) store
> using Cassandra. There is a column family, called Entity, where the
> partition key is the UUID of the entity, and the columns are the attribute
> names with their values. Each entity will contain hundreds to thousands of
> attributes, out of a list of up to potentially ten thousand known attribute
> names.
>
> However, instead of using wide rows with dynamic columns (and serializing
> type info with the value), they are trying to use a static column family
> and modifying the schema dynamically as new named attributes are created.
>
> (I believe one of the main drivers of this approach is to use collection
> columns for certain attributes, and perhaps to preserve type metadata for a
> given attribute.)
>
> This approach goes against everything I've seen and done in Cassandra, and
> is generally an anti-pattern for most persistence stores, but I want to
> gather feedback before taking the next step with the team.
>
> Do others consider this approach an anti-pattern, and if so, what are the
> practical downsides?
>
> For one, this means that the Entity schema would contain the superset of
> all columns for all rows. What is the impact of having thousands of column
> names in the schema? And what are the implications of modifying the schema
> dynamically on a decent sized cluster (5 nodes now, growing to 10s later)
> under load?
>
> Thanks,
> Todd
>
>


Re: Multi-DC Repairs and Token Questions

2014-10-07 Thread Alain RODRIGUEZ
Hi guys, sorry about digging this up, but is this bug also affecting 1.2.x
versions? I can't see it being backported to 1.2 on the Jira. Was this
bug introduced in 2.0?

Anyway, how does nodetool repair -pr behave in a multi-DC environment: does it
perform cross-DC repairs or not? Should we drop the "-pr" option in a multi-DC
context to reduce entropy between DCs? I mean, a repair -pr is supposed to
repair the primary range of the current node; does it also repair the
corresponding primary range in the other DCs?

Thanks for any insight on this.

2014-06-03 8:06 GMT+02:00 Nick Bailey :

> See https://issues.apache.org/jira/browse/CASSANDRA-7317
>
>
> On Mon, Jun 2, 2014 at 8:57 PM, Matthew Allen 
> wrote:
>
>> Hi Rameez, Chovatia, (sorry I initially replied to Dwight individually)
>>
>> SN_KEYSPACE and MY_KEYSPACE are just typos (I was trying to mask out
>> identifiable information); they are the same keyspace.
>>
>> Keyspace: SN_KEYSPACE:
>>   Replication Strategy:
>> org.apache.cassandra.locator.NetworkTopologyStrategy
>>   Durable Writes: true
>> Options: [DC_VIC:2, DC_NSW:2]
>>
>> In a nutshell, replication is working as expected, I'm just confused
>> about token range assignments in a Multi-DC environment and how repairs
>> should work
>>
>> From
>> http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configGenTokens_c.html,
>> it specifies
>>
>> *"Multiple data center deployments: calculate the tokens for each
>> data center so that the hash range is evenly divided for the nodes in each
>> data center"*
>>
>> Given that nodetool repair isn't multi-DC aware, in our production 18
>> node cluster (9 nodes in each DC), which of the following token ranges
>> should be used (Murmur3 Partitioner) ?
>>
>> Token range divided evenly over the 2 DC's/18 nodes as below ?
>>
>> Node   DC_NSW                    DC_VIC
>> 1      '-9223372036854775808'    '-8198552921648689608'
>> 2      '-7173733806442603408'    '-6148914691236517208'
>> 3      '-5124095576030431008'    '-4099276460824344808'
>> 4      '-3074457345618258608'    '-2049638230412172408'
>> 5      '-1024819115206086208'    '-8'
>> 6      '1024819115206086192'     '2049638230412172392'
>> 7      '3074457345618258592'     '4099276460824344792'
>> 8      '5124095576030430992'     '6148914691236517192'
>> 9      '7173733806442603392'     '8198552921648689592'
>>
>> Or an offset used for DC_VIC (i.e. DC_NSW + 100)?
>>
>> Node   DC_NSW                    DC_VIC
>> 1      '-9223372036854775808'    '-9223372036854775708'
>> 2      '-7173733806442603407'    '-7173733806442603307'
>> 3      '-5124095576030431006'    '-5124095576030430906'
>> 4      '-3074457345618258605'    '-3074457345618258505'
>> 5      '-1024819115206086204'    '-1024819115206086104'
>> 6      '1024819115206086197'     '1024819115206086297'
>> 7      '3074457345618258598'     '3074457345618258698'
>> 8      '5124095576030430999'     '5124095576030431099'
>> 9      '7173733806442603400'     '7173733806442603500'
>>
>> It's too late for me to switch to vnodes, hope that makes sense, thanks
>>
>> Matt
>>
>>
>>
>> On Thu, May 29, 2014 at 12:01 AM, Rameez Thonnakkal 
>> wrote:
>>
>>> As Chovatia mentioned, the keyspaces seem to be different.
>>> Try "describe keyspace SN_KEYSPACE" and "describe keyspace MY_KEYSPACE"
>>> from CQL.
>>> This will give you an idea of how many replicas there are for these
>>> keyspaces.
>>>
>>>
>>>
>>> On Wed, May 28, 2014 at 11:49 AM, chovatia jaydeep <
>>> chovatia_jayd...@yahoo.co.in> wrote:
>>>
 What is your partitioner type? Is
 it org.apache.cassandra.dht.Murmur3Partitioner?
 In your repair command I see two different keyspaces, "MY_KEYSPACE"
 and "SN_KEYSPACE". Are these two separate keyspaces, or is that a typo?

 -jaydeep


   On Tuesday, 27 May 2014 10:26 PM, Matthew Allen <
 matthew.j.al...@gmail.com> wrote:


 Hi,

 I'm a bit confused regarding data ownership in a multi-DC environment.

 I have the following setup in a test cluster with a keyspace with
 (placement_strategy = 'NetworkTopologyStrategy' and strategy_options =
 {'DC_NSW':2,'DC_VIC':2};)

 Datacenter: DC_NSW
 ==
 Replicas: 2
 Address  Rack   Status  State   Load        Owns      Token
 nsw1     rack1  Up      Normal  1007.43 MB  100.00%   -9223372036854775808
 nsw2     rack1  Up      Normal  1008.08 MB  100.00%   0


 Datacenter: DC_VIC
 ==
 Replicas: 2
 Address  Rack   Status  State   Load        Owns      Token
 vic1     rack1  Up      Normal  1015.1 MB   100.00%   -9223372036854775708
 vic2     rack1  Up      Normal  1015.13 MB  100.00%   100

 My understanding is that both Datacenters have a complete copy of the
 data, but when I run a repair -pr on each of the nodes, the vic hosts only
 take a couple of seconds, while the ns

Re: A fix for those who suffer from GC storms caused by tombstones

2014-10-07 Thread DuyHai Doan
Hello Takenori

 Read repair is one of the anti-entropy mechanisms that ensure data from
all replicas eventually converges. Tombstones are data (deletion markers),
so they need to be exchanged between replicas. By skipping tombstones you
prevent convergence with regard to deletions.

On Tue, Oct 7, 2014 at 9:13 AM, Takenori Sato  wrote:

> Hi,
>
> I have filed a fix as CASSANDRA-8038, which should be good news for those
> who have suffered from overwhelming GC or OOM caused by tombstones.
>
> I'd appreciate your feedback!
>
> Thanks,
> Takenori
>


Doubts with the values of the parameter broadcast_rpc_address

2014-10-07 Thread Ricard Mestre Subirats
Hi everyone,

We have been working with Cassandra clusters on version 2.0 and now we want to work 
with clusters on version 2.1. We configured cassandra.yaml as we had configured 
it in the previous version, but when starting the service there is a 
fatal error. The log tells us that if rpc_address is configured to 0.0.0.0, then 
broadcast_rpc_address has to be set too, but we don't know what values are possible for 
this parameter.

Can anyone explain the purpose of this new parameter and give a possible 
value?

Thank you very much!

Ricard





Re: Dynamic schema modification an anti-pattern?

2014-10-07 Thread Peter Lin
Statically defining columns using the EAV table approach is totally the wrong fit
for Cassandra.

Taking a step back, EAV tables generally don't scale, no matter the
database. I've done this on SQL Server, Oracle and DB2. Many products that
use the EAV approach, like master data management products, suffer from this
issue. On an RDBMS, people get around it by creating indexed (aka
materialized) views to make queries perform "mostly" acceptably when
there are millions of rows.

Ignoring all of that, the biggest issue with the EAV approach is long-term
maintenance and performance. If the data model or object model is
moderately complex, you end up with an explosion of queries to rebuild each
object instance. Something as simple as reading 100 objects could balloon
to 1,000 queries. Having seen this first hand over the years, don't go there
unless your object model is trivially simple without any lists/sets/maps,
has fewer than 10 fields, and the database will hold fewer than 100K records.

The proper way to use Cassandra for a dynamic object model is to use
dynamic columns. The downside is that CQL is not ideally suited to
this use case, which means using Thrift. All "SQL-inspired" languages
suffer from this design limitation. To do this type of work, you want a
strongly typed API that lets you control exactly which type of column/value
goes in and comes out. I've mentioned this in the past on the mailing list.
It's one of the biggest advantages of Cassandra, and something other NoSQL
databases can't do or do poorly. Don't take my word for it: go look at the MDM
products from IBM, Oracle and Tibco to see how slowly those systems run with
millions of records using an EAV design.
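
For illustration only, here is a rough CQL 3 approximation of the wide-row idea
(one row per attribute, clustered under the entity's partition key). It is not
the Thrift approach described above and gives up its fine-grained type control,
and the table and column names are invented, but it shows the general shape:

CREATE TABLE entity_attributes (
    entity_id  uuid,
    attr_name  text,
    attr_type  text,   -- optional type metadata carried next to the value
    attr_value text,   -- value serialized as text (or use blob)
    PRIMARY KEY (entity_id, attr_name)
);

-- all attributes of one entity live in a single wide partition
SELECT attr_name, attr_type, attr_value
FROM entity_attributes
WHERE entity_id = ?;

Adding a brand-new attribute is then just another insert, so the table schema
itself never changes.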

When you add bi-temporal support, the queries get even slower and
management gets uglier over time. It's one of the reasons I built a
temporal database on top of Cassandra instead of MySQL, PostgreSQL, SQL
Server, Oracle, Sybase or DB2. If you want to talk about the specifics of
exactly how the EAV approach fails to scale, email me directly, since it's
probably off topic: woolfel AT gmail DOT com





On Mon, Oct 6, 2014 at 11:56 PM, Todd Fast  wrote:

> There is a team at my work building an entity-attribute-value (EAV) store
> using Cassandra. There is a column family, called Entity, where the
> partition key is the UUID of the entity, and the columns are the attribute
> names with their values. Each entity will contain hundreds to thousands of
> attributes, out of a list of up to potentially ten thousand known attribute
> names.
>
> However, instead of using wide rows with dynamic columns (and serializing
> type info with the value), they are trying to use a static column family
> and modifying the schema dynamically as new named attributes are created.
>
> (I believe one of the main drivers of this approach is to use collection
> columns for certain attributes, and perhaps to preserve type metadata for a
> given attribute.)
>
> This approach goes against everything I've seen and done in Cassandra, and
> is generally an anti-pattern for most persistence stores, but I want to
> gather feedback before taking the next step with the team.
>
> Do others consider this approach an anti-pattern, and if so, what are the
> practical downsides?
>
> For one, this means that the Entity schema would contain the superset of
> all columns for all rows. What is the impact of having thousands of column
> names in the schema? And what are the implications of modifying the schema
> dynamically on a decent sized cluster (5 nodes now, growing to 10s later)
> under load?
>
> Thanks,
> Todd
>


Re: Bitmaps

2014-10-07 Thread Eduardo Cusa
The bitmap updates will be daily.

I'll watch the video..

Regards
Eduardo






On Mon, Oct 6, 2014 at 6:04 PM, DuyHai Doan  wrote:

> Yes this one, not Ooyala sorry. Very inventive usage of C* indeed. Thanks
> for the links
>
> On Mon, Oct 6, 2014 at 11:01 PM, Peter Sanford 
> wrote:
>
>> On Mon, Oct 6, 2014 at 1:56 PM, DuyHai Doan  wrote:
>>
>>> Isn't there a video of Ooyala at some past Cassandra Summit
>>> demonstrating usage of Cassandra for text search using trigrams? AFAIK they
>>> were storing a kind of bitmap to perform OR & AND operations on trigrams
>>>
>>
>> That sounds like the talk Matt Stump gave at the 2013 SF Summit.
>>
>> Video:  https://www.youtube.com/watch?v=E92u4FXGiAM
>> Slides: http://www.slideshare.net/planetcassandra/1-matt-stump
>>
>
>


Re: Multi-DC Repairs and Token Questions

2014-10-07 Thread Paulo Ricardo Motta Gomes
This related issue might be of interest:
https://issues.apache.org/jira/browse/CASSANDRA-7450

In 1.2 the "-pr" option does perform cross-DC repairs, but you must ensure that
all nodes in all datacenters run repair, otherwise some ranges will be
missed. This fix enables -pr and -local together, which was disabled in
2.0 because it didn't work (it also does not work in 1.2).
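
As a sketch of what that means in practice (bash brace expansion, hostnames
purely illustrative, and assuming nodetool can reach each node over JMX):

# -pr repairs only each node's primary range, so every node in every DC
# must be covered or some ranges are never repaired
for host in nsw{1..9} vic{1..9}; do
    nodetool -h "$host" repair -pr SN_KEYSPACE
done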

On Tue, Oct 7, 2014 at 5:46 AM, Alain RODRIGUEZ  wrote:

> Hi guys, sorry about digging this up, but is this bug also affecting
> 1.2.x versions? I can't see it being backported to 1.2 on the Jira. Was
> this bug introduced in 2.0?
>
> Anyway, how does nodetool repair -pr behave in a multi-DC environment: does it
> perform cross-DC repairs or not? Should we drop the "-pr" option in a multi-DC
> context to reduce entropy between DCs? I mean, a repair -pr is supposed
> to repair the primary range of the current node; does it also repair the
> corresponding primary range in the other DCs?
>
> Thanks for any insight on this.
>
> 2014-06-03 8:06 GMT+02:00 Nick Bailey :
>
>> See https://issues.apache.org/jira/browse/CASSANDRA-7317
>>
>>
>> On Mon, Jun 2, 2014 at 8:57 PM, Matthew Allen 
>> wrote:
>>
>>> Hi Rameez, Chovatia, (sorry I initially replied to Dwight individually)
>>>
>>> SN_KEYSPACE and MY_KEYSPACE are just typos (I was trying to mask out
>>> identifiable information); they are the same keyspace.
>>>
>>> Keyspace: SN_KEYSPACE:
>>>   Replication Strategy:
>>> org.apache.cassandra.locator.NetworkTopologyStrategy
>>>   Durable Writes: true
>>> Options: [DC_VIC:2, DC_NSW:2]
>>>
>>> In a nutshell, replication is working as expected, I'm just confused
>>> about token range assignments in a Multi-DC environment and how repairs
>>> should work
>>>
>>> From
>>> http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configGenTokens_c.html,
>>> it specifies
>>>
>>> *"Multiple data center deployments: calculate the tokens for
>>> each data center so that the hash range is evenly divided for the nodes in
>>> each data center"*
>>>
>>> Given that nodetool repair isn't multi-DC aware, in our production 18
>>> node cluster (9 nodes in each DC), which of the following token ranges
>>> should be used (Murmur3 Partitioner) ?
>>>
>>> Token range divided evenly over the 2 DC's/18 nodes as below ?
>>>
>>> Node   DC_NSW                    DC_VIC
>>> 1      '-9223372036854775808'    '-8198552921648689608'
>>> 2      '-7173733806442603408'    '-6148914691236517208'
>>> 3      '-5124095576030431008'    '-4099276460824344808'
>>> 4      '-3074457345618258608'    '-2049638230412172408'
>>> 5      '-1024819115206086208'    '-8'
>>> 6      '1024819115206086192'     '2049638230412172392'
>>> 7      '3074457345618258592'     '4099276460824344792'
>>> 8      '5124095576030430992'     '6148914691236517192'
>>> 9      '7173733806442603392'     '8198552921648689592'
>>>
>>> Or an offset used for DC_VIC (i.e. DC_NSW + 100)?
>>>
>>> Node   DC_NSW                    DC_VIC
>>> 1      '-9223372036854775808'    '-9223372036854775708'
>>> 2      '-7173733806442603407'    '-7173733806442603307'
>>> 3      '-5124095576030431006'    '-5124095576030430906'
>>> 4      '-3074457345618258605'    '-3074457345618258505'
>>> 5      '-1024819115206086204'    '-1024819115206086104'
>>> 6      '1024819115206086197'     '1024819115206086297'
>>> 7      '3074457345618258598'     '3074457345618258698'
>>> 8      '5124095576030430999'     '5124095576030431099'
>>> 9      '7173733806442603400'     '7173733806442603500'
>>>
>>> It's too late for me to switch to vnodes, hope that makes sense, thanks
>>>
>>> Matt
>>>
>>>
>>>
>>> On Thu, May 29, 2014 at 12:01 AM, Rameez Thonnakkal 
>>> wrote:
>>>
 As Chovatia mentioned, the keyspaces seem to be different.
 Try "describe keyspace SN_KEYSPACE" and "describe keyspace MY_KEYSPACE"
 from CQL.
 This will give you an idea of how many replicas there are for these
 keyspaces.



 On Wed, May 28, 2014 at 11:49 AM, chovatia jaydeep <
 chovatia_jayd...@yahoo.co.in> wrote:

> What is your partitioner type? Is
> it org.apache.cassandra.dht.Murmur3Partitioner?
> In your repair command I see two different keyspaces, "MY_KEYSPACE"
> and "SN_KEYSPACE". Are these two separate keyspaces, or is that a typo?
>
> -jaydeep
>
>
>   On Tuesday, 27 May 2014 10:26 PM, Matthew Allen <
> matthew.j.al...@gmail.com> wrote:
>
>
> Hi,
>
> I'm a bit confused regarding data ownership in a multi-DC environment.
>
> I have the following setup in a test cluster with a keyspace with
> (placement_strategy = 'NetworkTopologyStrategy' and strategy_options =
> {'DC_NSW':2,'DC_VIC':2};)
>
> Datacenter: DC_NSW
> ==
> Replicas: 2
> Address  Rack   Status  State   Load        Owns      Token
> nsw1     rack1  Up      Normal  1007.43 MB  100.00%   -9223372036854775808
> nsw2     rack1  Up      Normal 

Re: A fix for those who suffer from GC storms caused by tombstones

2014-10-07 Thread Robert Coli
On Tue, Oct 7, 2014 at 1:57 AM, DuyHai Doan  wrote:

>  Read repair is one of the anti-entropy mechanisms that ensure data from
> all replicas eventually converges. Tombstones are data (deletion markers),
> so they need to be exchanged between replicas. By skipping tombstones you
> prevent convergence with regard to deletions.
>

Read repair is an optimization. I would probably just disable it in the OP's
case and rely entirely on AES repair, because the 8038 approach makes read
repair not actually repair in some cases...
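
For reference, disabling read repair is a per-table setting; a sketch with
placeholder keyspace/table names (on a 1.2-era cluster the cassandra-cli
equivalent is an "update column family ... with read_repair_chance = 0"):

ALTER TABLE my_keyspace.my_table
    WITH read_repair_chance = 0
    AND dclocal_read_repair_chance = 0;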

=Rob


Re: Doubts with the values of the parameter broadcast_rpc_address

2014-10-07 Thread Tyler Hobbs
The broadcast_rpc_address should be an IP address that drivers/clients can
connect to.  This is what will show up in the system.peers table under
"rpc_address".  In most cases it should match the value of
broadcast_address (or listen_address, if broadcast_address isn't set).
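
For example, a minimal cassandra.yaml fragment for a node whose clients reach
it at 10.0.0.12 (the address here is just a placeholder):

# accept client connections on all interfaces...
rpc_address: 0.0.0.0
# ...but advertise this routable address to drivers (shows up in system.peers)
broadcast_rpc_address: 10.0.0.12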

On Tue, Oct 7, 2014 at 6:04 AM, Ricard Mestre Subirats <
ricard.mestre.subir...@everis.com> wrote:

>  Hi everyone,
>
>
>
> We have been working with Cassandra clusters on version 2.0 and now we want to
> work with clusters on version 2.1. We configured cassandra.yaml as we
> had configured it in the previous version, but when starting the
> service there is a fatal error. The log tells us that if rpc_address is
> configured to 0.0.0.0, then broadcast_rpc_address has to be set too, but we
> don't know what values are possible for this parameter.
>
>
>
> Can anyone explain the purpose of this new parameter and give a
> possible value?
>
>
>
> Thank you very much!
>
>
>
> Ricard
>
>



-- 
Tyler Hobbs
DataStax 


Re: IN versus multiple asynchronous queries

2014-10-07 Thread Tyler Hobbs
Also note that with an IN clause, if there is a failure fetching one of the
partitions, the entire request will fail and will need to be retried.  If
you use concurrent async queries, you'll only need to retry one small
request.
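
A minimal sketch of the concurrent-async approach with the DataStax Python
driver; the contact point, keyspace, table and column names are invented, but
it shows how each key becomes its own small request and only the failed ones
need a retry:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])            # assumed contact point
session = cluster.connect('my_keyspace')    # hypothetical keyspace

select = session.prepare("SELECT id, body FROM documents WHERE id = ?")

def fetch_group(doc_ids):
    # fire all requests concurrently; the driver multiplexes them
    futures = [(doc_id, session.execute_async(select, [doc_id]))
               for doc_id in doc_ids]
    docs, failed = [], []
    for doc_id, future in futures:
        try:
            docs.extend(future.result())    # blocks until this reply arrives
        except Exception:
            failed.append(doc_id)           # retry only this one key later
    return docs, failed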

On Mon, Oct 6, 2014 at 1:14 PM, DuyHai Doan  wrote:

> "Definitely better to not make the coordinator hold on to that memory
> while it waits for other requests to come back" --> You get it. When
> loading big documents, you risk starving the heap quickly, triggering long
> GC cycles on the coordinator, etc.
>
> On Mon, Oct 6, 2014 at 6:22 PM, Robert Wille  wrote:
>
>>  As far as latency is concerned, it seems like it wouldn't matter very
>> much if the coordinator has to wait for all the responses to come back, or
>> the client waits for all the responses to come back. I’ve got the same
>> latency either way.
>>
>>  I would assume that 50 coordinations is more expensive than one
>> coordination that does 50 times the work, but that’s probably insignificant
>> when compared to the actual fetching of the data from the SSTables.
>>
>>  I do see the point about putting stress on coordinator memory. In
>> general, the documents will be very small, but there will occasionally be
>> some rather large ones, potentially several megabytes in size. Definitely
>> better to not make the coordinator hold on to that memory while it waits
>> for other requests to come back.
>>
>>  Robert
>>
>>  On Oct 4, 2014, at 8:34 AM, DuyHai Doan  wrote:
>>
>>  Definitely 50 concurrent queries, possibly in async mode.
>>
>>  If you're using the IN clause with 50 values, the coordinator will
>> block, waiting for 50 partitions to be fetched from different nodes (worst
>> case = 50 nodes) before responding to the client. In addition to the very high
>> latency, you'll put stress on the coordinator's memory.
>>
>>
>>
>> On Sat, Oct 4, 2014 at 3:09 PM, Robert Wille  wrote:
>>
>>> I have a table of small documents (less than 1K) that are often accessed
>>> together as a group. The group size is always less than 50. Which produces
>>> less load on the server, one query using an IN clause to get all 50 back
>>> together, or 50 concurrent queries? Which one is fastest?
>>>
>>> Thanks
>>>
>>> Robert
>>>
>>>
>>
>>
>


-- 
Tyler Hobbs
DataStax 


Missing data in range query

2014-10-07 Thread Owen Kim
Hello,

I'm running Cassandra 1.2.16 with supercolumns and Hector.

create column family CFName
  with column_type = 'Super'
  and comparator = 'UTF8Type'
  and subcomparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.2
  and dclocal_read_repair_chance = 0.0
  and populate_io_cache_on_flush = false
  and gc_grace = 43200
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'KEYS_ONLY';


I'm adding a time-series supercolumn, then doing a slice query over
this supercolumn. I'm really just trying to see if any data is in the time
slice, so I'm doing a slice query with limit 1. The insert isn't at the data
bounds.

However, sometimes, nothing shows up in the time slice, even 8 seconds
after the insert. I'm doing quorum reads and writes so I'd expect
consistent results but the slice query comes up empty, even if there have
been multiple inserts.

I'm not sure what's happening here and trying to narrow down suspects. Can
key caching produce stale results? Do slice queries have different
consistency guarantees?


Re: Missing data in range query

2014-10-07 Thread Robert Coli
On Tue, Oct 7, 2014 at 1:38 PM, Owen Kim  wrote:

> I'm running Cassandra 1.2.16 with supercolumns and Hector.
>

Slightly non-responsive response :

In general supercolumn use is not recommended. It makes it more difficult
to get support when one uses a feature no one else uses.

=Rob


Re: Missing data in range query

2014-10-07 Thread Owen Kim
I'm aware. I've had the system up since pre-composite columns and haven't
had the cycles to do a major data and schema migration.

And that's not "slightly" non-responsive.

On Tue, Oct 7, 2014 at 1:49 PM, Robert Coli  wrote:

> On Tue, Oct 7, 2014 at 1:38 PM, Owen Kim  wrote:
>
>> I'm running Cassandra 1.2.16 with supercolumns and Hector.
>>
>
> Slightly non-responsive response :
>
> In general supercolumn use is not recommended. It makes it more difficult
> to get support when one uses a feature no one else uses.
>
> =Rob
>
>


Re: Missing data in range query

2014-10-07 Thread Robert Coli
On Tue, Oct 7, 2014 at 2:03 PM, Owen Kim  wrote:

> I'm aware. I've had the system up since pre-composite columns and haven't
> had the cycles to do a major data and schema migration.
>
> And that's not "slightly" non-responsive.
>

"There may be unknown bugs in the code you're using, especially because no
one else uses it" is in fact slightly responsive. While I'm sure it does
grate to be told that one should not be using a feature one cannot choose
to not-use, I consider "don't use them" responsive to every question about
supercolumns since 2010, unless the asker pre-emptively states they know
this fact. I assure you that my meta-response is infinitely more responsive
than the total non-response you were otherwise likely to receive...

... aaanyway ...

Probably you are just hitting an edge case in the 1.2 era rewrite of
supercolumns which no one else has ever encountered because no one uses
them. For the record, I do not believe either of your hypotheses (key cache
or slice queries having different guarantees) are likely to be implicated.
One of them is trivial to test : create a test CF with the key cache
disabled and try to repro there.
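
For example, something along these lines in cassandra-cli, mirroring the
definition earlier in the thread but with the key cache off (the name is
arbitrary):

create column family CFNameNoCache
  with column_type = 'Super'
  and comparator = 'UTF8Type'
  and subcomparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and caching = 'NONE';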

Instead of attempting to debug by yourself, or on the user list (which will
be full of people not-using supercolumns), I suggest filing a JIRA with
reproduction steps, and then mentioning the URL on this thread for future
googlers.

=Rob


Re: Missing data in range query

2014-10-07 Thread Owen Kim
Sigh, it is a bit grating. I (genuinely) appreciate your acknowledgement of
that. Though, I didn't intend for the question to be "about" supercolumns.

It is possible I'm hitting an odd edge case though I'm having trouble
reproducing the issue in a controlled environment since there seems to be a
timing element to it, or at least it's not consistently happening. I
haven't been able to reproduce it on a single node test cluster. I'm moving
on to test a larger one now.

On Tue, Oct 7, 2014 at 2:39 PM, Robert Coli  wrote:

> On Tue, Oct 7, 2014 at 2:03 PM, Owen Kim  wrote:
>
>> I'm aware. I've had the system up since pre-composite columns and haven't
>> had the cycles to do a major data and schema migration.
>>
>> And that's not "slightly" non-responsive.
>>
>
> "There may be unknown bugs in the code you're using, especially because no
> one else uses it" is in fact slightly responsive. While I'm sure it does
> grate to be told that one should not be using a feature one cannot choose
> to not-use, I consider "don't use them" responsive to every question about
> supercolumns since 2010, unless the asker pre-emptively states they know
> this fact. I assure you that my meta-response is infinitely more responsive
> than the total non-response you were otherwise likely to receive...
>
> ... aaanyway ...
>
> Probably you are just hitting an edge case in the 1.2 era rewrite of
> supercolumns which no one else has ever encountered because no one uses
> them. For the record, I do not believe either of your hypotheses (key cache
> or slice queries having different guarantees) are likely to be implicated.
> One of them is trivial to test : create a test CF with the key cache
> disabled and try to repro there.
>
> Instead of attempting to debug by yourself, or on the user list (which
> will be full of people not-using supercolumns), I suggest filing a JIRA
> with reproduction steps, and then mentioning the URL on this thread for
> future googlers.
>
> =Rob
>
>
>


Re: A fix for those who suffer from GC storms caused by tombstones

2014-10-07 Thread Takenori Sato
DuyHai and Rob, thanks for your feedback.

Yeah, that's exactly the point I found. Some may want to run read repair even
on tombstones as before, but others, like Rob and us, do not.

Personally, I regard read repair as a nice-to-have feature, especially for
tombstones, where a regular repair is enforced anyway.

So with this fix, I expect that a user can choose a better, more manageable risk as
needed. The good news is that the performance improvement is significant!

- Takenori

Sent from my iPhone

On 2014/10/08 at 3:18, Robert Coli wrote:

> 
>> On Tue, Oct 7, 2014 at 1:57 AM, DuyHai Doan  wrote:
>>  Read repair is one of the anti-entropy mechanisms that ensure data from
>> all replicas eventually converges. Tombstones are data (deletion markers),
>> so they need to be exchanged between replicas. By skipping tombstones you
>> prevent convergence with regard to deletions.
> 
> Read repair is an optimization. I would probably just disable it in the OP's case 
> and rely entirely on AES repair, because the 8038 approach makes read repair 
> not actually repair in some cases...
> 
> =Rob
>  


Re: Missing data in range query

2014-10-07 Thread Robert Coli
On Tue, Oct 7, 2014 at 3:11 PM, Owen Kim  wrote:

> Sigh, it is a bit grating. I (genuinely) appreciate your acknowledgement
> of that. Though, I didn't intend for the question to be "about"
> supercolumns.
>

(Yep, understand tho that if you hadn't been told that advice before, it
would grate a lot less. I will try to remember that "Owen Kim" has received
this piece of info, and will do my best to not repeat it to you... :D)


> It is possible I'm hitting an odd edge case though I'm having trouble
> reproducing the issue in a controlled environment since there seems to be a
> timing element to it, or at least it's not consistently happening. I
> haven't been able to reproduce it on a single node test cluster. I'm moving
> on to test a larger one now.
>

Right, my hypothesis is that there is something within the supercolumn
write path which differs from the non-supercolumn write path. In theory
this should be less possible since the 1.2 era supercolumn rewrite.

To be clear, are you reading back via PK? No secondary indexes involved,
right? The only bells your symptoms are ringing are secondary index bugs...

=Rob