Token function in CQL for composite partition key

2015-01-07 Thread Ajay
Hi,

I have a column family as below:

(Wide row design)
CREATE TABLE clicks (hour text, adId int, itemId int, time timeuuid,
PRIMARY KEY ((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);

Now, to query for a given ad id and a specific 3-hour window, say
2015-01-07 11 to 2015-01-07 14, how do I use the token function in CQL?

Thanks
Ajay


Re: deletedAt and localDeletion

2015-01-07 Thread Kais Ahmed
Thanks Ryan

2015-01-06 20:21 GMT+01:00 Ryan Svihla :

> If you look at the source, there are some useful comments regarding those
> specifics:
> https://github.com/apache/cassandra/blob/8d8fed52242c34b477d0384ba1d1ce3978efbbe8/src/java/org/apache/cassandra/db/DeletionTime.java
>
>
> /**
>  * A timestamp (typically in microseconds since the unix epoch, although
>  * this is not enforced) after which data should be considered deleted.
>  * If set to Long.MIN_VALUE, this implies that the data has not been
>  * marked for deletion at all.
>  */
> public final long markedForDeleteAt;
>
> /**
>  * The local server timestamp, in seconds since the unix epoch, at which
>  * this tombstone was created. This is only used for purposes of purging
>  * the tombstone after gc_grace_seconds have elapsed.
>  */
> public final int localDeletionTime;
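>
> // Note: in the log line quoted below, deletedAt=-9223372036854775808 is
> // Long.MIN_VALUE and localDeletion=2147483647 is Integer.MAX_VALUE, i.e.
> // the partition itself carries no deletion marker; the 2688 tombstones
> // counted are individual cell tombstones.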
>
> On Mon, Jan 5, 2015 at 6:13 AM, Kais Ahmed  wrote:
>
>> Hi all,
>>
>> Can anyone explain what deletedAt and localDeletion mean in the
>> SliceQueryFilter log?
>>
>> SliceQueryFilter.java (line 225) Read 6 live and 2688 tombstoned cells in
>> ks.mytable (see tombstone_warn_threshold). 10 columns was requested,
>> slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=
>> 2147483647}
>>
>> Thanks,
>>
>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>


Re: Token function in CQL for composite partition key

2015-01-07 Thread Sylvain Lebresne
On Wed, Jan 7, 2015 at 10:18 AM, Ajay  wrote:

> Hi,
>
> I have a column family as below:
>
> (Wide row design)
> CREATE TABLE clicks (hour text, adId int, itemId int, time timeuuid,
> PRIMARY KEY ((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);
>
> Now, to query for a given ad id and a specific 3-hour window, say
> 2015-01-07 11 to 2015-01-07 14, how do I use the token function in CQL?
>

From that description, it doesn't appear to me that you need the token
function. Just do 3 queries, one per hour, each query being something
along the lines of
  SELECT * FROM clicks WHERE adId=... AND hour='2015-01-07 11' AND ...

For completeness' sake, I should note that you could do this with a single
query by using an IN on the hour column, but it's actually not a better
solution in that case (provided you submit the 3 queries in an asynchronous
fashion, at least), for the reasons explained here:
https://medium.com/@foundev/cassandra-query-patterns-not-using-the-in-query-e8d23f9b17c7
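
To illustrate, a minimal sketch with the Datastax Java driver (keyspace
name, contact point and adId value are made up; error handling omitted):

    import java.util.ArrayList;
    import java.util.List;
    import com.datastax.driver.core.*;

    public class ClicksByHour {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("myks"); // assumed keyspace

            // Fire one async query per hour bucket; 11 to 14 spans hours 11, 12, 13.
            String[] hours = { "2015-01-07 11", "2015-01-07 12", "2015-01-07 13" };
            List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
            for (String hour : hours) {
                futures.add(session.executeAsync(
                        "SELECT * FROM clicks WHERE adId = ? AND hour = ?", 42, hour));
            }
            // Collect the results; the queries ran in parallel.
            for (ResultSetFuture f : futures) {
                for (Row row : f.getUninterruptibly()) {
                    System.out.println(row); // process each click here
                }
            }
            cluster.close();
        }
    }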

--
Sylvain

>


Re: Token function in CQL for composite partition key

2015-01-07 Thread Ajay
Thanks.

Basically there are two access patterns:
1) For the last 1 hour (or more, if the last batch failed for some reason),
get the clicks data for all ads. But it seems that is not possible, as the
ad id is part of the partition key.
2) For the last 1 hour (or more, if the last batch failed for some reason),
get the clicks data for a specific ad id (or a few of them).

How do we support both 1 and 2 with the same data model? (I thought to use
ad id + hour as the partition key to avoid hotspots.)

Thanks
Ajay


On Wed, Jan 7, 2015 at 6:34 PM, Sylvain Lebresne 
wrote:

> On Wed, Jan 7, 2015 at 10:18 AM, Ajay  wrote:
>
>> Hi,
>>
>> I have a column family as below:
>>
>> (Wide row design)
>> CREATE TABLE clicks (hour text, adId int, itemId int, time timeuuid,
>> PRIMARY KEY ((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);
>>
>> Now, to query for a given ad id and a specific 3-hour window, say
>> 2015-01-07 11 to 2015-01-07 14, how do I use the token function in CQL?
>>
>
> From that description, it doesn't appear to me that you need the token
> function. Just do 3 queries, one per hour, each query being something
> along the lines of
>   SELECT * FROM clicks WHERE adId=... AND hour='2015-01-07 11' AND ...
>
> For completeness' sake, I should note that you could do this with a single
> query by using an IN on the hour column, but it's actually not a better
> solution in that case (provided you submit the 3 queries in an asynchronous
> fashion, at least), for the reasons explained here:
> https://medium.com/@foundev/cassandra-query-patterns-not-using-the-in-query-e8d23f9b17c7
>
> --
> Sylvain
>
>>
>


Are Triggers in Cassandra 2.1.2 a Performance Hog??

2015-01-07 Thread Asit KAUSHIK
HI All,

We are trying to integrate Elasticsearch with Cassandra, and as the river
plugin uses select * from any table, it seems to be a bad performance
choice. So I was thinking of inserting into Elasticsearch using a Cassandra
trigger. I wanted your view: does a Cassandra trigger impact the read/write
performance of Cassandra?

Also, if you achieve this any other way, please guide me. I am stuck on
this.

Regards
Asit


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog??

2015-01-07 Thread DuyHai Doan
Be very, very careful not to perform blocking calls to ElasticSearch in your
trigger, otherwise you will kill C* performance. The biggest danger of the
triggers in their current state is that they are on the write path.

In your trigger, you can try to push the mutation asynchronously to ES, but
in this case it will mean managing a thread pool and all related issues.

Not even mentioning atomicity issues like: what happens if the update to ES
fails or the connection times out? etc.
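
To make the thread-pool point concrete, a rough, untested sketch of such a
trigger (assuming the ITrigger interface of Cassandra 2.1; the actual ES
call is left as a hypothetical comment):

    import java.nio.ByteBuffer;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    import org.apache.cassandra.db.ColumnFamily;
    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.triggers.ITrigger;

    public class EsForwardingTrigger implements ITrigger {
        // Bounded queue: if ES falls behind, offer() fails and that update is
        // silently lost, which is exactly the atomicity gap described above.
        private static final BlockingQueue<ColumnFamily> QUEUE =
                new ArrayBlockingQueue<ColumnFamily>(10000);

        static {
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            ColumnFamily update = QUEUE.take();
                            // push 'update' to ES here (hypothetical client call)
                        }
                    } catch (InterruptedException e) {
                        // shutting down
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        public Collection<Mutation> augment(ByteBuffer partitionKey, ColumnFamily update) {
            QUEUE.offer(update); // never block the C* write path
            return Collections.emptyList(); // no extra mutations to apply
        }
    }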

As an alternative, instead of implementing the integration with ES yourself,
you can have a look at the Datastax Enterprise integration of Cassandra
with Apache Solr (not free), or at open-source alternatives like the Stratio
or TupleJump forks of Cassandra with Lucene integration.

On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK 
wrote:

> HI All,
>
> We are trying to integrate Elasticsearch with Cassandra, and as the river
> plugin uses select * from any table, it seems to be a bad performance
> choice. So I was thinking of inserting into Elasticsearch using a Cassandra
> trigger. I wanted your view: does a Cassandra trigger impact the read/write
> performance of Cassandra?
>
> Also, if you achieve this any other way, please guide me. I am stuck on
> this.
>
> Regards
> Asit
>
>


Re: Re: Is it possible to implement an interface to replace a row in cassandra using cassandra.thrift?

2015-01-07 Thread Ryan Svihla
It really depends on your code for error handling, and since you're using
Thrift it also depends on the client. If you're doing client-side
timestamps, then it's not related to time issues.
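
To make the timestamp rule concrete, here is a minimal sketch using CQL and
the Datastax Java driver rather than Thrift (table ks.t and the values are
assumed); the same rule applies to Thrift batch_mutate: pick both timestamps
client-side and make the insert's strictly larger:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ReplaceRow {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Client-side timestamps in microseconds; the insert wins because
            // its timestamp is strictly greater than the delete's.
            long t = System.currentTimeMillis() * 1000;
            session.execute(
                    "DELETE FROM ks.t USING TIMESTAMP " + t + " WHERE k = 'row1'");
            session.execute(
                    "INSERT INTO ks.t (k, v) VALUES ('row1', 'new')"
                            + " USING TIMESTAMP " + (t + 1));
            cluster.close();
        }
    }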

On Tue, Jan 6, 2015 at 8:19 PM,  wrote:

> Hi,
>
> I found that in my function, both the delete and the update use the
> client-side timestamp.
>
> The update timestamp should always be bigger than the deletion timestamp.
>
>
> I wonder why the update fails in some cases?
>
>
> thank you.
>
>
> - Original Message -
> From: Ryan Svihla 
> To: user@cassandra.apache.org, yhq...@sina.com
> Subject: Re: Is it possible to implement an interface to replace a row in
> cassandra using cassandra.thrift?
> Date: 2015-01-06 23:34
>
> replies inline
>
> On Tue, Jan 6, 2015 at 2:28 AM,  wrote:
>
> Hi, all:
>
> I use cassandra.thrift to implement a replace-row interface in this
> way:
>
> First I use batch_mutate to delete the row, then I use batch_mutate to
> insert a new row.
>
> I always find that after calling this interface, the row does not exist.
>
>
> I suspect this is a problem caused by the deletion, because
> the deletion has a timestamp set by the client.
>
> Maybe the time is not well synchronized between the client and the
> cassandra server (1 or more seconds of difference).
>
>
> It's a distributed database, so time synchronization really, really
> matters: use NTP. However, if you're using client-side timestamps on both
> the insert and the delete, it's not going to matter for that use case.
>
>
>
> How do I solve this?? Is it possible to implement an interface to
> replace a row in cassandra???
>
>
> Yeah, all updates are this way. Inserts are actually "UPSERTS", and you
> can go ahead and do two updates instead of insert, delete, update.
>
>
>
> Thanks.
>
>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>


-- 

Thanks,
Ryan Svihla


TombstoneOverwhelmingException for few tombstones

2015-01-07 Thread Jens Rantil
Hi,


I have a single partition key that has been nagging me because I am receiving
org.apache.cassandra.db.filter.TombstoneOverwhelmingException. After filing
https://issues.apache.org/jira/browse/CASSANDRA-8561 I managed to find the
partition key in question and which machine it was located on (by looking in
system.log). Since I wanted to see how many tombstones the partition key
actually had, I did:


    nodetool flush mykeyspace mytable


to make sure all changes were written to sstables (not sure this was 
necessary), then


    nodetool getsstables mykeyspace mytable PARTITIONKEY


which listed two sstables. I then had a look at both sstables for my key in 
question using


    sstable2json MYSSTABLE1 -k PARTITIONKEY | jq . > MYSSTABLE1.json
    sstable2json MYSSTABLE2 -k PARTITIONKEY | jq . > MYSSTABLE2.json



(piping through jq to format the JSON). Both JSON files contain data (so I
have selected the right key). Only one of the files contains any tombstones:


$ cat MYSSTABLE1.json | grep '"t"'|wc -l
    4281
$ cat MYSSTABLE2.json | grep '"t"'|wc -l
       0



But to my surprise, the number of tombstones is nowhere near


tombstone_failure_threshold: 100000


Can anyone explain why Cassandra is overwhelmed when I'm nowhere near the
hard limit?


Thanks,
Jens


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog??

2015-01-07 Thread Ken Hancock
When I last looked at Datastax Enterprise (DSE 3.0ish), it exhibited the
same problem that you highlight, no different from your good idea of
asynchronously pushing to ES.

Each Cassandra write was indexed independently by each server in the
replication group.  If a node timed out or a mutation was dropped, that
Solr node would have an out-of-sync index.  Doing a Solr query such as
count(*) on users could return inconsistent results depending on which node
you hit, since Solr didn't support Cassandra consistency levels.

I haven't seen any blog posts or docs as to whether this intrinsic mismatch
between how Cassandra and Solr handle eventual consistency has ever been
resolved.

Ken


On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan  wrote:

> Be very, very careful not to perform blocking calls to ElasticSearch in
> your trigger, otherwise you will kill C* performance. The biggest danger of
> the triggers in their current state is that they are on the write path.
>
> In your trigger, you can try to push the mutation asynchronously to ES, but
> in this case it will mean managing a thread pool and all related issues.
>
> Not even mentioning atomicity issues like: what happens if the update to ES
> fails or the connection times out? etc.
>
> As an alternative, instead of implementing the integration with ES
> yourself, you can have a look at the Datastax Enterprise integration of
> Cassandra with Apache Solr (not free), or at open-source alternatives like
> the Stratio or TupleJump forks of Cassandra with Lucene integration.
>
> On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK 
> wrote:
>
>> HI All,
>>
>> We are trying to integrate Elasticsearch with Cassandra, and as the river
>> plugin uses select * from any table, it seems to be a bad performance
>> choice. So I was thinking of inserting into Elasticsearch using a Cassandra
>> trigger. I wanted your view: does a Cassandra trigger impact the read/write
>> performance of Cassandra?
>>
>> Also, if you achieve this any other way, please guide me. I am stuck on
>> this.
>>
>> Regards
>> Asit
>>
>>
>


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog??

2015-01-07 Thread Asit KAUSHIK
HI All,

What I intend to do is, on every write, push the data to Elasticsearch
using the trigger. I know it would impact the Cassandra write, but given
that writes are pretty performant in Cassandra, would that lag be a big
one?

Also, as per my information, Solr has limitations with nested JSON
documents, which Elasticsearch handles seamlessly, hence it was our
preference.

Please let me know your thoughts on this, as we are stuck on this and I
am looking into the streaming part of Cassandra in the hope that I can
find something.

Regards
Asit



On Wed, Jan 7, 2015 at 8:16 PM, Ken Hancock  wrote:

> When I last looked at Datastax Enterprise (DSE 3.0ish), it exhibited the
> same problem that you highlight, no different from your good idea of
> asynchronously pushing to ES.
>
> Each Cassandra write was indexed independently by each server in the
> replication group.  If a node timed out or a mutation was dropped, that
> Solr node would have an out-of-sync index.  Doing a Solr query such as
> count(*) on users could return inconsistent results depending on which node
> you hit, since Solr didn't support Cassandra consistency levels.
>
> I haven't seen any blog posts or docs as to whether this intrinsic
> mismatch between how Cassandra and Solr handle eventual consistency has
> ever been resolved.
>
> Ken
>
>
> On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan  wrote:
>
>> Be very, very careful not to perform blocking calls to ElasticSearch in
>> your trigger, otherwise you will kill C* performance. The biggest danger of
>> the triggers in their current state is that they are on the write path.
>>
>> In your trigger, you can try to push the mutation asynchronously to ES,
>> but in this case it will mean managing a thread pool and all related
>> issues.
>>
>> Not even mentioning atomicity issues like: what happens if the update to
>> ES fails or the connection times out? etc.
>>
>> As an alternative, instead of implementing the integration with ES
>> yourself, you can have a look at the Datastax Enterprise integration of
>> Cassandra with Apache Solr (not free), or at open-source alternatives like
>> the Stratio or TupleJump forks of Cassandra with Lucene integration.
>>
>> On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK 
>> wrote:
>>
>>> HI All,
>>>
>>> We are trying to integrate Elasticsearch with Cassandra, and as the river
>>> plugin uses select * from any table, it seems to be a bad performance
>>> choice. So I was thinking of inserting into Elasticsearch using a
>>> Cassandra trigger. I wanted your view: does a Cassandra trigger impact
>>> the read/write performance of Cassandra?
>>>
>>> Also, if you achieve this any other way, please guide me. I am stuck on
>>> this.
>>>
>>> Regards
>>> Asit
>>>
>>>
>>
>
>
>
>


Re:

2015-01-07 Thread Ryan Svihla
Something to start considering is that the partition key (the first part of
your primary key) drives your model more than anything. So if you're
querying for all of X, your partition key should probably be X, but there
are some constraints to be mindful of.

The rest of the replies are inline.

On Wed, Jan 7, 2015 at 1:37 AM, Nagesh  wrote:

> Thanks Ryan, Srinivas for your answers.
>
> Finally I have decided to create three column families
>
> 1. product_date_id (mm, dd, prodid) PRIMARY KEY ((mm), dd, prodid)
> - Record the arrival date on updates of a product
> - Get the list of products that were recently added/updated, Ex: [(mm, dd) >
> (2014, 06)]
>

This could just be product_date and include the entire product graph needed.
This is a tradeoff, and frequently it's optimal for performance reasons on
the read side; the downside is you're usually increasing your write payload.
My thought is to do a fully materialized view first and denormalize,
including the entire product, and if you find the write traffic is too much,
consider the index approach here (which is easier after the fact: you can
just drop the columns).


> 2. product_status (prodid int, status int) PRIMARY KEY (prodid), INDEX on
> (status)
> - Each time I add a product, I just insert a record (prodid, defstatus)
> with the condition IF NOT EXISTS, to avoid the status being updated. Here
> I couldn't avoid a read before write to protect the product status.
>
As for protecting product status, that's fine; however, you could just do
what most applications do and update regardless of previous status. This
leads into different locking theories and what the right behavior for an
application is, but this is something most people never think twice about
when using MySQL or Oracle, and in the end they update status in
unprotected ways. Something to ponder.

> - Update Enable/Disable prodid
> - Get the list of product ids with the given status
>
>

The "list of product ids with a given status" query will probably suck using
a 2i: think scanning ALL of the nodes to get potentially as few as 2
records (if that fits within your SLA, however, kudos; just be aware of the
behavior).

Assuming you have large status counts and a limited set of status values,
the data model gets trickier, as there are some rule-of-thumb-style
constraints (what you can tolerate varies with hardware and SLA). Say you
had a primary key of (status, prodid): this would in theory very quickly
return all of the ACTIVE prodids, as there may only be a few hundred. But
let's say you want to return all the archived prodids: there may be
billions, and this would likely take far, far too long to return in one
query, not to mention that compaction of such a large partition will be
fun, and it'll unbalance your cluster.

So frequently, for this particular corner, I end up having to do some form
of sharding to spread a status over the cluster and keep the partition
sizes reasonable (and query in an async fashion to get all of the queries
done in a reasonable time).

primary key((status, shardId), prodId)

The shardId can be anything up to the reasonable size limits of your
hardware and cluster (say 50k for a rule of thumb), and there are a number
of different approaches (a sketch of the fixed-count variant follows this
list):

- it can be a random uuid, but then you have to track with a separate table
which shardIds exist for that particular status (this is not uncommon)
- it can be a sequential counter, say starting at 1, where you just
increment the number by 1 (but make sure that as you're updating it you're
not introducing any fun state bugs that have two different shards writing
to the same number). When you query, you keep increasing the number until
you stop getting responses. This has the downside that optimization is a
bit hard to get right. Optionally you can have a static column in the table
called maxShardId so that once you've done your first query you know how
many parallel queries you have to send out.
- it can be based on some business logic or domain rule that includes some
fixed boundaries: say you add a productGroupId in there, and you know from
an application level how many productGroupIds there are. This has the
downside of not giving you absolute protection against fat partitions; on
the upside it fits your natural domain model and is easier to reason about.
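
To make that concrete, a rough sketch of the fan-out read, assuming the
fixed-shard-count variant and illustrative keyspace/table names:

    import java.util.ArrayList;
    import java.util.List;
    import com.datastax.driver.core.*;

    public class ActiveProductIds {
        // Assumed schema (names are illustrative):
        //   CREATE TABLE products_by_status (
        //       status text, shardid int, prodid int,
        //       PRIMARY KEY ((status, shardid), prodid));
        static final int SHARDS = 16; // fixed shard count known to the app

        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("myks");

            // One async query per shard; each (status, shardid) partition stays small.
            List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
            for (int shard = 0; shard < SHARDS; shard++) {
                futures.add(session.executeAsync(
                        "SELECT prodid FROM products_by_status"
                                + " WHERE status = ? AND shardid = ?",
                        "ACTIVE", shard));
            }
            // Merge the per-shard results; the queries ran in parallel.
            List<Integer> prodIds = new ArrayList<Integer>();
            for (ResultSetFuture f : futures) {
                for (Row row : f.getUninterruptibly()) {
                    prodIds.add(row.getInt("prodid"));
                }
            }
            System.out.println(prodIds.size() + " active products");
            cluster.close();
        }
    }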



> 3. product_details(prodgrp, prodid, ...)
> PRIMARY KEY (prodgrp, prodid)
> - Insert product details into the prodgrp blindly, to store recent updates
> of the product details
> - Get the list of products in the product group
> - Get details of products for the given ids
>
> "Get list of products for a given range of ids": my queries are answered
> with the above design.
>
> PS: I am still thinking about how to avoid a read-before-write on
> product_status, and would like to see if there is a better way to design
> this using supercolumn families or materialized views, which I am yet to
> explore.
>
>
Materialized views are your friend; use them freely, but as always be
mindful of real-world constraints and goals.


> Regards,
> Nageswara Rao
>
> On Tue, Jan 6, 2015 at 10:53 PM, Ryan Svihla  wrote:
>

Re: Are Triggers in Cassandra 2.1.2 a Performance Hog??

2015-01-07 Thread Ryan Svihla
@Ken So I actually support a lot of the DSE Search users and teach classes
on it. As long as you're not dropping mutations you're in sync, and if
you're dropping mutations you're probably sized way too small anyway; once
you run repair (which you should be doing anyway when dropping mutations)
you're back in sync. I actually think the models work well together because
of that.

FWIW the improvement since 3.0 is MASSIVE (it's been what I'd call stable
since 3.2.x, and we're on 4.6 now).

@Asit to answer the ES question, it's not really for me to say at all what
the lag will be, or to help in advising on sizing of ES, so that's probably
more of a question for them.


On Wed, Jan 7, 2015 at 8:56 AM, Asit KAUSHIK 
wrote:

> HI All,
>
> What I intend to do is, on every write, push the data to Elasticsearch
> using the trigger. I know it would impact the Cassandra write, but given
> that writes are pretty performant in Cassandra, would that lag be a big
> one?
>
> Also, as per my information, Solr has limitations with nested JSON
> documents, which Elasticsearch handles seamlessly, hence it was our
> preference.
>
> Please let me know your thoughts on this, as we are stuck on this and I
> am looking into the streaming part of Cassandra in the hope that I can
> find something.
>
> Regards
> Asit
>
>
>
> On Wed, Jan 7, 2015 at 8:16 PM, Ken Hancock 
> wrote:
>
>> When I last looked at Datastax Enterprise (DSE 3.0ish), it exhibited the
>> same problem that you highlight, no different from your good idea of
>> asynchronously pushing to ES.
>>
>> Each Cassandra write was indexed independently by each server in the
>> replication group.  If a node timed out or a mutation was dropped, that
>> Solr node would have an out-of-sync index.  Doing a Solr query such as
>> count(*) on users could return inconsistent results depending on which node
>> you hit, since Solr didn't support Cassandra consistency levels.
>>
>> I haven't seen any blog posts or docs as to whether this intrinsic
>> mismatch between how Cassandra and Solr handle eventual consistency has
>> ever been resolved.
>>
>> Ken
>>
>>
>> On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan  wrote:
>>
>>> Be very, very careful not to perform blocking calls to ElasticSearch in
>>> your trigger, otherwise you will kill C* performance. The biggest danger of
>>> the triggers in their current state is that they are on the write path.
>>>
>>> In your trigger, you can try to push the mutation asynchronously to ES,
>>> but in this case it will mean managing a thread pool and all related
>>> issues.
>>>
>>> Not even mentioning atomicity issues like: what happens if the update to
>>> ES fails or the connection times out? etc.
>>>
>>> As an alternative, instead of implementing the integration with ES
>>> yourself, you can have a look at the Datastax Enterprise integration of
>>> Cassandra with Apache Solr (not free), or at open-source alternatives like
>>> the Stratio or TupleJump forks of Cassandra with Lucene integration.
>>>
>>> On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK wrote:
>>>
>>>> HI All,
>>>>
>>>> We are trying to integrate Elasticsearch with Cassandra, and as the
>>>> river plugin uses select * from any table, it seems to be a bad
>>>> performance choice. So I was thinking of inserting into Elasticsearch
>>>> using a Cassandra trigger. I wanted your view: does a Cassandra trigger
>>>> impact the read/write performance of Cassandra?
>>>>
>>>> Also, if you achieve this any other way, please guide me. I am stuck
>>>> on this.
>>>>
>>>> Regards
>>>> Asit


>>>
>>
>>
>>
>>
>


-- 

Thanks,
Ryan Svihla


Keyspace uppercase name issues

2015-01-07 Thread Harel Gliksman
Hi,

We have a Cassandra cluster with keyspaces that were created using the
thrift api, and their names contain upper-case letters.
We are trying to use the new Datastax driver (version 2.1.4, maven's latest),
but we are encountering some problems due to upper-case handling.

Datastax provides this guidance on how to handle lower/upper case:
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/ucase-lcase_r.html

However, there seems to be something confusing in the API.

Attached is a small java program that reproduces the problem.

Many thanks,
Harel.


Test.java
Description: Binary data


Re: Keyspace uppercase name issues

2015-01-07 Thread Ajay
We noticed the same issue. The cassandra-cli allows upper-case or mixed-case
keyspace names, but cqlsh auto-converts them to lower case.

Thanks
Ajay

On Wed, Jan 7, 2015 at 9:44 PM, Harel Gliksman  wrote:

> Hi,
>
> We have a Cassandra cluster with keyspaces that were created using the
> thrift api, and their names contain upper-case letters.
> We are trying to use the new Datastax driver (version 2.1.4, maven's
> latest), but we are encountering some problems due to upper-case handling.
>
> Datastax provides this guidance on how to handle lower/upper case:
>
> http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/ucase-lcase_r.html
>
> However, there seems to be something confusing in the API.
>
> Attached is a small java program that reproduces the problem.
>
> Many thanks,
> Harel.
>


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog??

2015-01-07 Thread Robert Coli
On Wed, Jan 7, 2015 at 5:40 AM, Asit KAUSHIK 
wrote:

> We are trying to integrate Elasticsearch with Cassandra, and as the river
> plugin uses select * from any table, it seems to be a bad performance
> choice. So I was thinking of inserting into Elasticsearch using a Cassandra
> trigger. I wanted your view: does a Cassandra trigger impact the read/write
> performance of Cassandra?
>

I would not use triggers in production in their current form.

=Rob


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog??

2015-01-07 Thread Jonathan Haddad
+1.  Don't use triggers.

On Wed, Jan 7, 2015 at 10:49 AM, Robert Coli  wrote:
> On Wed, Jan 7, 2015 at 5:40 AM, Asit KAUSHIK 
> wrote:
>>
>> We are trying to integrate Elasticsearch with Cassandra, and as the river
>> plugin uses select * from any table, it seems to be a bad performance
>> choice. So I was thinking of inserting into Elasticsearch using a Cassandra
>> trigger. I wanted your view: does a Cassandra trigger impact the read/write
>> performance of Cassandra?
>
>
> I would not use triggers in production in their current form.
>
> =Rob



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Are Triggers in Cassandra 2.1.2 a Performance Hog??

2015-01-07 Thread Jack Krupansky
DSE does now have a queue to decouple Cassandra inserts from Solr indexing.
It will block only when/if the queue is filled; you can configure the size
of the queue. So, to be clear, DSE no longer has the highlighted problem
mentioned for ES.

-- Jack Krupansky

On Wed, Jan 7, 2015 at 9:46 AM, Ken Hancock  wrote:

> When I last looked at Datastax Enterprise (DSE 3.0ish), it exhibited the
> same problem that you highlight, no different from your good idea of
> asynchronously pushing to ES.
>
> Each Cassandra write was indexed independently by each server in the
> replication group.  If a node timed out or a mutation was dropped, that
> Solr node would have an out-of-sync index.  Doing a Solr query such as
> count(*) on users could return inconsistent results depending on which node
> you hit, since Solr didn't support Cassandra consistency levels.
>
> I haven't seen any blog posts or docs as to whether this intrinsic
> mismatch between how Cassandra and Solr handle eventual consistency has
> ever been resolved.
>
> Ken
>
>
> On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan  wrote:
>
>> Be very, very careful not to perform blocking calls to ElasticSearch in
>> your trigger, otherwise you will kill C* performance. The biggest danger of
>> the triggers in their current state is that they are on the write path.
>>
>> In your trigger, you can try to push the mutation asynchronously to ES,
>> but in this case it will mean managing a thread pool and all related
>> issues.
>>
>> Not even mentioning atomicity issues like: what happens if the update to
>> ES fails or the connection times out? etc.
>>
>> As an alternative, instead of implementing the integration with ES
>> yourself, you can have a look at the Datastax Enterprise integration of
>> Cassandra with Apache Solr (not free), or at open-source alternatives like
>> the Stratio or TupleJump forks of Cassandra with Lucene integration.
>>
>> On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK 
>> wrote:
>>
>>> HI All,
>>>
>>> We are trying to integrate Elasticsearch with Cassandra, and as the river
>>> plugin uses select * from any table, it seems to be a bad performance
>>> choice. So I was thinking of inserting into Elasticsearch using a
>>> Cassandra trigger. I wanted your view: does a Cassandra trigger impact
>>> the read/write performance of Cassandra?
>>>
>>> Also, if you achieve this any other way, please guide me. I am stuck on
>>> this.
>>>
>>> Regards
>>> Asit
>>>
>>>
>>
>
>
>
>


Is it possible to delete columns or row using CQLSSTableWriter?

2015-01-07 Thread Benyi Wang
CQLSSTableWriter only accepts an INSERT or UPDATE statement. I'm wondering
whether it could be made to accept DELETE statements too.

I need to update my cassandra table with a lot of data every day.

* I may need to delete a row (given the partition key)
* I may need to delete some columns. For example, there are 20 rows for a
primary key before loading, and the new load may have only 10 rows.

Because CQLSSTableWriter writes into a blank table, would a DELETE put a
tombstone in the sstable so that the row on the server is deleted after
bulk loading?
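
For reference, a minimal sketch of the INSERT form that CQLSSTableWriter
does accept today (keyspace, schema and path are illustrative, not my
actual tables):

    import java.io.File;
    import org.apache.cassandra.io.sstable.CQLSSTableWriter;

    public class OfflineWriter {
        public static void main(String[] args) throws Exception {
            String schema = "CREATE TABLE ks.t (k int, c int, v text,"
                    + " PRIMARY KEY (k, c))";
            CQLSSTableWriter writer = CQLSSTableWriter.builder()
                    .inDirectory(new File("/tmp/ks/t")) // pre-created output dir
                    .forTable(schema)
                    .using("INSERT INTO ks.t (k, c, v) VALUES (?, ?, ?)")
                    .build();
            writer.addRow(1, 1, "hello"); // values bound in statement order
            writer.close(); // sstables are now ready for sstableloader
        }
    }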

Thanks.


How to bulkload into a specific data center?

2015-01-07 Thread Benyi Wang
I set up two virtual data centers, one for analytics and one for a REST
service. The analytics data center sits on top of the Hadoop cluster. I want
to bulk load my ETL results into the analytics data center so that the REST
service won't take the heavy load. I'm using CQLTableInputFormat in my
Spark application, and I gave the nodes in the analytics data center as the
initial address.

However, I found my jobs were connecting to the REST-service data center.

How can I specify the data center?
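
At the plain Java driver level, I know a client can be pinned to one data
center with something like the sketch below (driver 2.1, with my data
center name assumed as "Analytics"); note this only controls which nodes
coordinate requests, since replication still happens server-side. Is there
an equivalent knob for the Hadoop/Spark input format?

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

    public class AnalyticsClient {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("analytics-node-1") // assumed host name
                    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("Analytics"))
                    .build();
            Session session = cluster.connect();
            // With a LOCAL_* consistency level, requests are coordinated only
            // by nodes in the Analytics data center.
            session.execute("SELECT release_version FROM system.local");
            cluster.close();
        }
    }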


Rebooted cassandra node timing out all requests but recovers after a while

2015-01-07 Thread Anand Somani
Hi,

We have a 3-node cluster (on VMs), e.g. host1, host2, host3. One of the VMs
(host1) rebooted, and when host1 came up it would see the others as down,
and the others (host2 and host3) saw it as down. So we restarted host2, and
now the ring seems fine (everybody sees everybody as up).

But now the clients time out talking to host1. We have not figured out what
is causing it; there is nothing in the logs that indicates a problem.
Looking for indicators/help on what debug/tracing to turn on to find out
what could be causing it.

Now, this happens only when a VM reboots (not otherwise). Also, it seems to
have recovered by itself after some hours!! (or restarts, not sure which
one).

This is 1.2.15; we are using ssl and cassandra authorizers.

Thanks
Anand


Re: Rebooted cassandra node timing out all requests but recovers after a while

2015-01-07 Thread Duncan Sands

Hi Anand,

On 08/01/15 02:02, Anand Somani wrote:
> Hi,
>
> We have a 3-node cluster (on VMs), e.g. host1, host2, host3. One of the
> VMs (host1) rebooted, and when host1 came up it would see the others as
> down, and the others (host2 and host3) saw it as down. So we restarted
> host2, and now the ring seems fine (everybody sees everybody as up).
>
> But now the clients time out talking to host1. We have not figured out
> what is causing it; there is nothing in the logs that indicates a
> problem. Looking for indicators/help on what debug/tracing to turn on to
> find out what could be causing it.
>
> Now, this happens only when a VM reboots (not otherwise). Also, it seems
> to have recovered by itself after some hours!! (or restarts, not sure
> which one).
>
> This is 1.2.15; we are using ssl and cassandra authorizers.

Perhaps time is not synchronized between the nodes to begin with, and
eventually becomes synchronized.


Ciao, Duncan.


Re: Keyspace uppercase name issues

2015-01-07 Thread Harel Gliksman
Thanks Ajay for your reply,

My problem is not with the cqlsh interface, but with the java Datastax
driver.
It seems that for cqlsh, one simply needs to quote names that contain
upper-case letters.
With the driver, I experience inconsistent handling of upper case. Either I
am doing something wrong, or there's some minor bug.
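
For reference, my understanding of the quoting rule in the driver (assuming
a keyspace created as "MyKeyspace"): unquoted identifiers are lower-cased
by CQL, quoted ones are taken verbatim, and the driver's Metadata.quote()
helper adds the quotes:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Metadata;
    import com.datastax.driver.core.Session;

    public class CaseSensitivity {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Keyspace assumed to have been created (via thrift) as "MyKeyspace".
            // session.execute("USE MyKeyspace");     // folded to mykeyspace: fails
            session.execute("USE \"MyKeyspace\"");    // quoted: case preserved
            session.execute("USE " + Metadata.quote("MyKeyspace")); // same, via helper
            cluster.close();
        }
    }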

On Wed, Jan 7, 2015 at 6:29 PM, Ajay  wrote:

> We noticed the same issue. The cassandra-cli allows upper-case or
> mixed-case keyspace names, but cqlsh auto-converts them to lower case.
>
> Thanks
> Ajay
>
> On Wed, Jan 7, 2015 at 9:44 PM, Harel Gliksman 
> wrote:
>
>> Hi,
>>
>> We have a Cassandra cluster with keyspaces that were created using the
>> thrift api, and their names contain upper-case letters.
>> We are trying to use the new Datastax driver (version 2.1.4, maven's
>> latest), but we are encountering some problems due to upper-case handling.
>>
>> Datastax provides this guidance on how to handle lower/upper case:
>>
>> http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/ucase-lcase_r.html
>>
>> However, there seems to be something confusing in the API.
>>
>> Attached is a small java program that reproduces the problem.
>>
>> Many thanks,
>> Harel.
>>
>
>