Token function in CQL for composite partition key
Hi,

I have a column family as below (wide-row design):

CREATE TABLE clicks (
  hour text,
  adId int,
  itemId int,
  time timeuuid,
  PRIMARY KEY ((adId, hour), time, itemId)
) WITH CLUSTERING ORDER BY (time DESC);

Now, to query for a given Ad Id and a specific 3-hour window, say 2015-01-07 11 to 2015-01-07 14, how do I use the token function in CQL?

Thanks
Ajay
Re: deletedAt and localDeletion
Thanks Ryan. So deletedAt=-9223372036854775808 is Long.MIN_VALUE and localDeletion=2147483647 is Integer.MAX_VALUE, i.e. the partition itself carries no top-level deletion.

2015-01-06 20:21 GMT+01:00 Ryan Svihla:

> If you look at the source there are some useful comments regarding those
> specifics:
> https://github.com/apache/cassandra/blob/8d8fed52242c34b477d0384ba1d1ce3978efbbe8/src/java/org/apache/cassandra/db/DeletionTime.java
>
>   /**
>    * A timestamp (typically in microseconds since the unix epoch, although this
>    * is not enforced) after which data should be considered deleted. If set to
>    * Long.MIN_VALUE, this implies that the data has not been marked for
>    * deletion at all.
>    */
>   public final long markedForDeleteAt;
>
>   /**
>    * The local server timestamp, in seconds since the unix epoch, at which this
>    * tombstone was created. This is only used for purposes of purging the
>    * tombstone after gc_grace_seconds have elapsed.
>    */
>   public final int localDeletionTime;
>
> On Mon, Jan 5, 2015 at 6:13 AM, Kais Ahmed wrote:
>
>> Hi all,
>>
>> Can anyone explain what deletedAt and localDeletion mean in the
>> SliceQueryFilter log:
>>
>> SliceQueryFilter.java (line 225) Read 6 live and 2688 tombstoned cells in
>> ks.mytable (see tombstone_warn_threshold). 10 columns was requested,
>> slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
>>
>> Thanks,
>
> --
> Thanks,
> Ryan Svihla
Re: Token function in CQL for composite partition key
On Wed, Jan 7, 2015 at 10:18 AM, Ajay wrote:

> Hi,
>
> I have a column family as below (wide-row design):
>
> CREATE TABLE clicks (hour text, adId int, itemId int, time timeuuid,
>   PRIMARY KEY ((adId, hour), time, itemId)) WITH CLUSTERING ORDER BY (time DESC);
>
> Now, to query for a given Ad Id and a specific 3-hour window, say
> 2015-01-07 11 to 2015-01-07 14, how do I use the token function in CQL?

From that description, it doesn't appear to me that you need the token function. Just do 3 queries, one per hour, each query being something along the lines of:

SELECT * FROM clicks WHERE adId=... AND hour='2015-01-07 11' AND ...

For completeness' sake, I should note that you could do this with a single query by using an IN on the hour column, but it's actually not a better solution in that case (provided you submit the 3 queries in an asynchronous fashion, at least), for the reasons explained here: https://medium.com/@foundev/cassandra-query-patterns-not-using-the-in-query-e8d23f9b17c7

--
Sylvain
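To make the asynchronous per-hour approach concrete, here is a minimal sketch using the DataStax Java driver of that era (2.x); the contact point, keyspace, ad id, and hour strings are illustrative assumptions:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import java.util.ArrayList;
    import java.util.List;

    public class HourlyClicks {
        public static void main(String[] args) {
            // Contact point and keyspace are placeholders.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks");
            PreparedStatement ps = session.prepare(
                    "SELECT * FROM clicks WHERE adId = ? AND hour = ?");

            // Fire one query per hour bucket without waiting in between.
            List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
            for (String hour : new String[]{"2015-01-07 11", "2015-01-07 12", "2015-01-07 13"}) {
                futures.add(session.executeAsync(ps.bind(42, hour)));
            }

            // Collect results as each query completes.
            for (ResultSetFuture f : futures) {
                for (Row row : f.getUninterruptibly()) {
                    System.out.println(row);
                }
            }
            cluster.close();
        }
    }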
Re: Token function in CQL for composite partition key
Thanks. Basically there are two access patterns:

1) For the last 1 hour (or more, if the last batch failed for some reason), get the clicks data for all Ads. This seems not possible with the table above, as Ad Id is part of the partition key.

2) For the last 1 hour (or more, if the last batch failed for some reason), get the clicks data for a specific Ad Id (possibly more than one).

How do we support both 1 and 2 with the same data model? (I chose Ad Id + hour as the partition key to avoid hotspots.)

Thanks
Ajay

On Wed, Jan 7, 2015 at 6:34 PM, Sylvain Lebresne wrote:

> From that description, it doesn't appear to me that you need the token
> function. Just do 3 queries, one per hour, each query being something
> along the lines of:
>
> SELECT * FROM clicks WHERE adId=... AND hour='2015-01-07 11' AND ...
>
> [...]
>
> --
> Sylvain
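For pattern 1 ("all ads for an hour"), one commonly suggested approach is a second, denormalized table partitioned by the hour bucket, with a small shard component so a single hour does not hammer one replica set. A hedged sketch only; the table name, columns, and shard count are assumptions, not a prescription:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ClicksByHourSchema {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks");

            // Writes land in both this table and the per-ad table; reads for
            // "all ads in an hour" fan out over the small, fixed shard range.
            session.execute(
                    "CREATE TABLE IF NOT EXISTS clicks_by_hour (" +
                    "  hour text, shard int, time timeuuid, adId int, itemId int," +
                    "  PRIMARY KEY ((hour, shard), time, adId, itemId)" +
                    ") WITH CLUSTERING ORDER BY (time DESC)");
            cluster.close();
        }
    }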
Are Triggers in Cassandra 2.1.2 a Performance Hog?
Hi All,

We are trying to integrate Elasticsearch with Cassandra, and as the river plugin uses SELECT * FROM on any table, it seems to be a bad performance choice. So I was thinking of inserting into Elasticsearch using a Cassandra trigger. I wanted your view: does a Cassandra trigger impact the read/write performance of Cassandra?

Also, if there is any other way you achieve this, please guide me. I am stuck on this.

Regards
Asit
Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?
Be very, very careful not to perform blocking calls to Elasticsearch in your trigger, otherwise you will kill C* performance. The biggest danger of triggers in their current state is that they are on the write path.

In your trigger you can try to push the mutation asynchronously to ES, but in that case it means managing a thread pool and all the related issues. That's not even mentioning atomicity issues, like: what happens if the update to ES fails or the connection times out? etc.

As an alternative, instead of implementing the integration with ES yourself, you can have a look at the Datastax Enterprise integration of Cassandra with Apache Solr (not free), or some open-source alternatives like the Stratio or TupleJump forks of Cassandra with Lucene integration.

On Wed, Jan 7, 2015 at 2:40 PM, Asit KAUSHIK wrote:

> Hi All,
>
> We are trying to integrate Elasticsearch with Cassandra, and as the river
> plugin uses SELECT * FROM on any table, it seems to be a bad performance
> choice. So I was thinking of inserting into Elasticsearch using a Cassandra
> trigger. I wanted your view: does a Cassandra trigger impact the read/write
> performance of Cassandra?
>
> Also, if there is any other way you achieve this, please guide me. I am
> stuck on this.
>
> Regards
> Asit
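For illustration only, a sketch of the non-blocking hand-off described above, written against the 2.1-era trigger interface (org.apache.cassandra.triggers.ITrigger); indexToEs() is a hypothetical placeholder rather than a real Elasticsearch client call, and the at-most-once semantics embody exactly the atomicity caveat raised above:

    import java.nio.ByteBuffer;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.cassandra.db.ColumnFamily;
    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.triggers.ITrigger;

    public class EsForwardingTrigger implements ITrigger {
        // Bounded pool so the write path is never blocked on ES.
        private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

        @Override
        public Collection<Mutation> augment(final ByteBuffer partitionKey,
                                            final ColumnFamily update) {
            POOL.submit(new Runnable() {
                public void run() {
                    try {
                        indexToEs(partitionKey, update);
                    } catch (Exception e) {
                        // At-most-once: a failed push is simply lost, so ES and
                        // Cassandra can drift apart -- the atomicity issue above.
                    }
                }
            });
            return Collections.emptyList(); // no extra mutations on the write path
        }

        private void indexToEs(ByteBuffer key, ColumnFamily update) {
            // Hypothetical placeholder for an ES index/bulk request.
        }
    }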
Re: Is it possible to implement an interface to replace a row in Cassandra using cassandra.thrift?
Really depends on your code for error handling, and since you're using Thrift it really depends on the client. If you're doing client-side timestamps then it's not related to time-sync issues.

On Tue, Jan 6, 2015 at 8:19 PM, yhq...@sina.com wrote:

> Hi,
>
> I found that in my function, both the delete and the update use the
> client-side timestamp. The update timestamp should always be bigger than
> the deletion timestamp. I wonder why the update failed in some cases?
>
> Thank you.
>
> ----- Original message -----
> From: Ryan Svihla
> To: user@cassandra.apache.org, yhq...@sina.com
> Subject: Re: Is it possible to implement an interface to replace a row in
> cassandra using cassandra.thrift?
> Date: 2015-01-06 23:34
>
> Replies inline.
>
> On Tue, Jan 6, 2015 at 2:28 AM, yhq...@sina.com wrote:
>
>> Hi, all:
>>
>> I use cassandra.thrift to implement a replace-row interface in this way:
>> first use batch_mutate to delete that row, then use batch_mutate to insert
>> a new row. I always find that after calling this interface, the row does
>> not exist.
>>
>> Then I suspect the problem is caused by the deletion, because the deletion
>> has a timestamp set by the client. Maybe the time is not well synchronized
>> between the client and the Cassandra server (1 or more seconds of drift).
>
> It's a distributed database, so time synchronization really, really
> matters -- use NTP. However, if you're using client-side timestamps on both
> the insert and the delete, it's not going to matter for that use case.
>
>> How do I solve this? Is it possible to implement an interface to replace a
>> row in Cassandra?
>
> Yeah, all updates are this way. Inserts are actually "UPSERTS" and you can
> go ahead and do two updates instead of insert, delete, update.
>
> --
> Thanks,
> Ryan Svihla

--
Thanks,
Ryan Svihla
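Since a tombstone beats a live cell when their timestamps are equal, one way to make a delete-then-insert replacement deterministic is to set both timestamps explicitly from the client and give the insert a strictly larger one. A CQL-level sketch; the table and column names are illustrative assumptions:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ReplaceRow {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks");

            // Microsecond timestamps chosen by the client; the insert gets a
            // strictly larger timestamp than the delete, because on a tie the
            // tombstone wins and the row would appear to vanish.
            long now = System.currentTimeMillis() * 1000;
            session.execute("DELETE FROM users USING TIMESTAMP " + now +
                            " WHERE id = 42");
            session.execute("INSERT INTO users (id, name) VALUES (42, 'bob')" +
                            " USING TIMESTAMP " + (now + 1));
            cluster.close();
        }
    }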
TombstoneOverwhelmingException for few tombstones
Hi,

I have a single partition key that has been nagging me because I am receiving org.apache.cassandra.db.filter.TombstoneOverwhelmingException. After filing https://issues.apache.org/jira/browse/CASSANDRA-8561 I managed to find the partition key in question and which machine it was located on (by looking in system.log).

Since I wanted to see how many tombstones the partition key actually had, I did:

nodetool flush mykeyspace mytable

to make sure all changes were written to sstables (not sure this was necessary), then

nodetool getsstables mykeyspace mytable PARTITIONKEY

which listed two sstables. I then had a look at both sstables for the key in question using

sstable2json MYSSTABLE1 -k PARTITIONKEY | jq . > MYSSTABLE1.json
sstable2json MYSSTABLE2 -k PARTITIONKEY | jq . > MYSSTABLE2.json

(piping through jq to format the JSON). Both JSON files contain data (so I have selected the right key). Only one of the files contains any tombstones:

$ cat MYSSTABLE1.json | grep '"t"' | wc -l
4281
$ cat MYSSTABLE2.json | grep '"t"' | wc -l
0

But to my surprise, the number of tombstones is nowhere near the tombstone_failure_threshold (100000 by default). Can anyone explain why Cassandra is overwhelmed when I'm nowhere near the hard limit?

Thanks,
Jens

———
Jens Rantil
Backend engineer
Tink AB
Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?
When last I looked at Datastax Enterprise (DSE 3.0-ish), it exhibited the same problem that you highlight, no different from your idea of asynchronously pushing to ES.

Each Cassandra write was indexed independently by each server in the replication group. If a node timed out or a mutation was dropped, that Solr node would have an out-of-sync index. Doing a Solr query such as count(*) on users could return inconsistent results depending on which node you hit, since Solr didn't support Cassandra consistency levels.

I haven't seen any blog posts or docs as to whether this intrinsic mismatch between how Cassandra handles eventual consistency and Solr has ever been resolved.

Ken

On Wed, Jan 7, 2015 at 9:05 AM, DuyHai Doan wrote:

> Be very, very careful not to perform blocking calls to Elasticsearch in
> your trigger, otherwise you will kill C* performance. The biggest danger of
> triggers in their current state is that they are on the write path.
>
> [...]
Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?
Hi All,

What I intend to do is, on every write, push the data to Elasticsearch using the trigger. I know it would impact the Cassandra write, but given that writes are pretty performant in Cassandra, would that lag be a big one?

Also, as per my information, Solr has limitations with nested JSON documents, which Elasticsearch handles seamlessly, and hence it was our preference.

Please let me know your thoughts on this, as we are stuck on it; I am also looking into the streaming part of Cassandra in the hope that I can find something.

Regards
Asit

On Wed, Jan 7, 2015 at 8:16 PM, Ken Hancock wrote:

> When last I looked at Datastax Enterprise (DSE 3.0-ish), it exhibited the
> same problem that you highlight, no different from your idea of
> asynchronously pushing to ES.
>
> Each Cassandra write was indexed independently by each server in the
> replication group. If a node timed out or a mutation was dropped, that
> Solr node would have an out-of-sync index. Doing a Solr query such as
> count(*) on users could return inconsistent results depending on which node
> you hit, since Solr didn't support Cassandra consistency levels.
>
> [...]
>
> Ken
Re:
Something to start considering: the partition key (the first part of your primary key) drives your model more than anything. So if you're querying for all of X, your partition key should probably be X, but there are some constraints to be mindful of. The rest of the replies are inline.

On Wed, Jan 7, 2015 at 1:37 AM, Nagesh wrote:

> Thanks Ryan, Srinivas for your answers.
>
> Finally I have decided to create three column families:
>
> 1. product_date_id (mm, dd, prodid) PRIMARY KEY ((mm), dd, prodid)
> - Record the arrival date on updates of a product
> - Get the list of products that were recently added/updated, e.g.
>   [(mm, dd) > (2014, 06)]

This could just be product_date and include the entire product graph needed. It's a tradeoff, and frequently it's optimal for performance reasons on the read side; the downside is you're usually increasing your write payload. My thought is to do a fully materialized view first and denormalize, including the entire product, and if you find the write traffic is too much, then consider the index approach here (it's easier after the fact to just drop the columns).

> 2. product_status (prodid int, status int) PRIMARY KEY (prodid), INDEX on
> (status)
> - Each time I add a product, just insert a record (prodid, defstatus) with
> the condition IF NOT EXISTS, to avoid the status being updated. Here I
> couldn't avoid read-before-write to protect the product status.

As for protecting product status, that's fine. However, you could just do what most applications do and update regardless of previous status. This leads into different locking theories and what the right behavior for an application is, but it's something most people never think twice about when using MySQL or Oracle, and in the end they update status in unprotected ways. Something to ponder.

> - Update Enable/Disable prodid
> - Get the list of product ids with the given status

The "list of product ids with a given status" query will probably suck using a 2i -- think scanning ALL of the nodes to get potentially as few as 2 records (if that fits within your SLA, however, kudos; just be aware of the behavior). Assuming you have large status counts and a limited number of statuses, the data model gets trickier, as there are some rule-of-thumb style constraints (what you can tolerate varies by hardware and SLA).

Say you had a primary key of (status, prodid). This would in theory very quickly return all of the ACTIVE prodids, as there may only be a few hundred; but say you want to return all the archived prodids -- there may be billions, and this would likely take far, far too long to return in one query. Not to mention compaction of such a large partition will be fun, and it'll unbalance your cluster. So frequently for this particular corner I end up having to do some form of sharding to spread status over the cluster and keep partition sizes reasonable (and query in an async fashion to get all of the results in a reasonable time):

PRIMARY KEY ((status, shardId), prodId)

The shardId can be anything up to the reasonable size limits of your hardware and cluster (say 50k rows per partition as a rule of thumb), and there are a number of different approaches:

- It can be a random uuid, but then you have to track with a separate table which shardIds there are for that particular status (this is not uncommon).
- It can start at a fixed size, say 1, and you just increment the number by 1 (but make sure as you're updating this that you're not introducing any fun state bugs that leave two different shards writing to the same number).
When you query, you keep increasing the number until you stop getting responses. This has the downside that optimization is a bit hard to get right; optionally, you can have a static column in the table called maxShardId so that, once you've done your first query, you know how many parallel queries you have to send out.

- It can be based on some business logic or domain rule that includes some fixed boundaries -- say, add a productGroupId in there, when you know from the application level how many productGroupIds there are. This has the downside of not giving you absolute protection against fat partitions; on the upside it fits your natural domain model and is easier to reason about.

(A sketch of the fixed-shard fan-out follows this message.)

> 3. product_details (prodgrp, prodid, . )
> PRIMARY KEY (prodgrp, prodid)
> - Insert product details in the prodgrp blindly to store recent updates of
> the product details
> - Get the list of products in the product group
> - Get details of products for the given ids
>
> "Get the list of products for a given range of ids": my queries are
> answered by the above design.
>
> PS: I am still thinking about how to avoid read-before-write on
> product_status, and would like to see if there is a better way to design
> this using supercolumn families or materialized views, which I am yet to
> explore.

Materialized views are your friend; use them freely, but as always be mindful of real-world constraints and goals.

> Regards,
> Nageswara Rao
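The fan-out over fixed shards described above might look like the following sketch; the table name, shard count, and status value are assumptions for illustration:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import java.util.ArrayList;
    import java.util.List;

    public class StatusFanOut {
        // Fixed shard count, per the "fixed size" variant above; 8 is an
        // arbitrary assumption.
        private static final int SHARDS = 8;

        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks");
            PreparedStatement ps = session.prepare(
                    "SELECT prodId FROM product_status_by_shard" +
                    " WHERE status = ? AND shardId = ?");

            // One async query per shard; results stream back in parallel.
            List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
            for (int shard = 0; shard < SHARDS; shard++) {
                futures.add(session.executeAsync(ps.bind("ACTIVE", shard)));
            }
            for (ResultSetFuture f : futures) {
                for (Row row : f.getUninterruptibly()) {
                    System.out.println(row.getInt("prodId"));
                }
            }
            cluster.close();
        }
    }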
Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?
@Ken: So I actually support a lot of the DSE Search users and teach classes on it. As long as you're not dropping mutations you're in sync; if you're dropping mutations you're probably sized way too small anyway, and once you run repair (which you should be doing anyway when dropping mutations) you're back in sync. I actually think that because of this the models work well together. FWIW the improvement since 3.0 is MASSIVE (it's been what I'd call stable since 3.2.x, and we're on 4.6 now).

@Asit: To answer the ES question, it's not really for me to say what the lag will be or to help in sizing ES, so that's probably more of a question for them.

On Wed, Jan 7, 2015 at 8:56 AM, Asit KAUSHIK wrote:

> Hi All,
>
> What I intend to do is, on every write, push the data to Elasticsearch
> using the trigger. I know it would impact the Cassandra write, but given
> that writes are pretty performant in Cassandra, would that lag be a big
> one?
>
> [...]
>
> Regards
> Asit

--
Thanks,
Ryan Svihla
Keyspace uppercase name issues
Hi,

We have a Cassandra cluster with keyspaces that were created using the Thrift API, and their names contain upper-case letters. We are trying to use the new Datastax driver (version 2.1.4, Maven's latest), but are encountering some problems due to upper-case handling.

Datastax provides this guidance on how to handle lower/upper case:

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/ucase-lcase_r.html

However, there seems to be something confusing in the API. Attached is a small Java program that reproduces the problem.

Many thanks,
Harel.

(Attachment: Test.java)
Re: Keyspace uppercase name issues
We noticed the same issue. From cassandra-cli, it allows upper-case or mixed-case keyspace names, but cqlsh auto-converts them to lower case.

Thanks
Ajay

On Wed, Jan 7, 2015 at 9:44 PM, Harel Gliksman wrote:

> Hi,
>
> We have a Cassandra cluster with keyspaces that were created using the
> Thrift API, and their names contain upper-case letters. We are trying to
> use the new Datastax driver (version 2.1.4, Maven's latest), but are
> encountering some problems due to upper-case handling.
>
> Datastax provides this guidance on how to handle lower/upper case:
> http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/ucase-lcase_r.html
>
> However, there seems to be something confusing in the API. Attached is a
> small Java program that reproduces the problem.
>
> Many thanks,
> Harel.
Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?
On Wed, Jan 7, 2015 at 5:40 AM, Asit KAUSHIK wrote:

> We are trying to integrate Elasticsearch with Cassandra, and as the river
> plugin uses SELECT * FROM on any table, it seems to be a bad performance
> choice. So I was thinking of inserting into Elasticsearch using a Cassandra
> trigger. I wanted your view: does a Cassandra trigger impact the read/write
> performance of Cassandra?

I would not use triggers in production in their current form.

=Rob
Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?
+1. Don't use triggers.

On Wed, Jan 7, 2015 at 10:49 AM, Robert Coli wrote:

> On Wed, Jan 7, 2015 at 5:40 AM, Asit KAUSHIK wrote:
>
>> We are trying to integrate Elasticsearch with Cassandra, and as the river
>> plugin uses SELECT * FROM on any table, it seems to be a bad performance
>> choice. So I was thinking of inserting into Elasticsearch using a
>> Cassandra trigger. I wanted your view: does a Cassandra trigger impact the
>> read/write performance of Cassandra?
>
> I would not use triggers in production in their current form.
>
> =Rob

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
Re: Are Triggers in Cassandra 2.1.2 a Performance Hog?
DSE does now have a queue to decouple the Cassandra insert from the Solr indexing. It will block only when/if the queue is filled -- you can configure the size of the queue. So, to be clear, DSE no longer has the highlighted problem mentioned for ES.

-- Jack Krupansky

On Wed, Jan 7, 2015 at 9:46 AM, Ken Hancock wrote:

> When last I looked at Datastax Enterprise (DSE 3.0-ish), it exhibited the
> same problem that you highlight, no different from your idea of
> asynchronously pushing to ES.
>
> Each Cassandra write was indexed independently by each server in the
> replication group. If a node timed out or a mutation was dropped, that
> Solr node would have an out-of-sync index. Doing a Solr query such as
> count(*) on users could return inconsistent results depending on which node
> you hit, since Solr didn't support Cassandra consistency levels.
>
> [...]
>
> Ken
Is it possible to delete columns or row using CQLSSTableWriter?
CQLSSTableWriter only accepts an INSERT or UPDATE statement, and I'm wondering whether it could be made to accept DELETE statements too. I need to update my Cassandra table with a lot of data every day:

* I may need to delete a row (given the partition key).
* I may need to delete some columns. For example, if there are 20 rows for a primary key before loading, the new load may have only 10 rows.

Because CQLSSTableWriter writes into a blank table, would a DELETE put a tombstone in the table so that the row on the server gets deleted after bulk loading?

Thanks.
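For reference, the accepted usage looks roughly like the sketch below (directory path, schema, and partitioner choice are assumptions). As described above, the builder's using() takes only an INSERT or UPDATE statement in this version, so row deletions would still have to be issued against the live cluster after the bulk load:

    import java.io.File;
    import org.apache.cassandra.dht.Murmur3Partitioner;
    import org.apache.cassandra.io.sstable.CQLSSTableWriter;

    public class BulkWrite {
        public static void main(String[] args) throws Exception {
            String schema = "CREATE TABLE ks.mytable (" +
                            "pk int, ck int, v text, PRIMARY KEY (pk, ck))";
            String insert = "INSERT INTO ks.mytable (pk, ck, v) VALUES (?, ?, ?)";

            // Output directory conventionally ends in <keyspace>/<table>
            // so sstableloader can pick the files up directly.
            CQLSSTableWriter writer = CQLSSTableWriter.builder()
                    .inDirectory(new File("/tmp/ks/mytable"))
                    .withPartitioner(new Murmur3Partitioner())
                    .forTable(schema)
                    .using(insert)
                    .build();

            writer.addRow(1, 1, "hello");
            writer.close();
        }
    }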
How to bulkload into a specific data center?
I set up two virtual data centers, one for analytics and one for the REST service. The analytics data center sits on top of the Hadoop cluster. I want to bulk load my ETL results into the analytics data center so that the REST service won't take the heavy load.

I'm using CQLTableInputFormat in my Spark application, and I gave the nodes in the analytics data center as the initial address. However, I found my jobs were connecting to the REST-service data center. How can I specify the data center?
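One angle worth checking, at least for clients that go through the DataStax Java driver rather than the Hadoop input format: a DC-aware load-balancing policy plus a LOCAL_* consistency level pins traffic to the named data center. A sketch, where the contact point and the DC name "Analytics" are assumptions (use the name shown by nodetool status):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class AnalyticsDcClient {
        public static void main(String[] args) {
            // Route requests only to nodes in the "Analytics" DC, and keep
            // reads/writes local with a LOCAL_* consistency level.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("analytics-node-1")
                    .withLoadBalancingPolicy(
                            new TokenAwarePolicy(new DCAwareRoundRobinPolicy("Analytics")))
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
                    .build();
            Session session = cluster.connect("ks");
            // ... run the ETL load statements here ...
            cluster.close();
        }
    }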
Rebooted cassandra node timing out all requests but recovers after a while
Hi,

We have a 3-node cluster (on VMs), e.g. host1, host2, host3. One of the VMs (host1) rebooted, and when host1 came up it would see the others as down, and the others (host2 and host3) saw it as down. So we restarted host2, and now the ring seems fine (everybody sees everybody as up). But now clients time out talking to host1. We have not figured out what is causing it, and there is nothing in the logs that indicates a problem. I'm looking for indicators/help on what debug/tracing to turn on to find out what could be causing this.

Note that this happens only when a VM reboots (not otherwise); also, it seems to have recovered by itself after some hours (or after restarts -- not sure which).

This is 1.2.15; we are using SSL and Cassandra authorizers.

Thanks
Anand
Re: Rebooted cassandra node timing out all requests but recovers after a while
Hi Anand,

On 08/01/15 02:02, Anand Somani wrote:

> We have a 3-node cluster (on VMs), e.g. host1, host2, host3. One of the VMs
> (host1) rebooted, and when host1 came up it would see the others as down,
> and the others (host2 and host3) saw it as down. [...] Note that this
> happens only when a VM reboots (not otherwise); also, it seems to have
> recovered by itself after some hours (or after restarts -- not sure which).

Perhaps time is not synchronized between the nodes to begin with, and eventually becomes synchronized.

Ciao, Duncan.
Re: Keyspace uppercase name issues
Thanks Ajay for your reply.

My problem is not with the cqlsh interface, but with the Java Datastax driver. It seems that for cqlsh, one simply needs to quote names that contain upper-case letters. With the driver, I experience inconsistent handling of upper case. Either I am doing something wrong, or there's some minor bug.

On Wed, Jan 7, 2015 at 6:29 PM, Ajay wrote:

> We noticed the same issue. From cassandra-cli, it allows upper-case or
> mixed-case keyspace names, but cqlsh auto-converts them to lower case.
>
> Thanks
> Ajay
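The usual trap here is that CQL lower-cases unquoted identifiers, so a keyspace created with mixed case has to be double-quoted everywhere it is referenced, including from the driver. A minimal sketch; the keyspace and table names are assumptions:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class MixedCaseKeyspace {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();

            // Unquoted identifiers are lower-cased by CQL, so a mixed-case
            // keyspace must be double-quoted wherever it is referenced.
            Session session = cluster.connect("\"MyKeyspace\"");
            session.execute("SELECT * FROM \"MyKeyspace\".\"MyTable\" LIMIT 1");
            cluster.close();
        }
    }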