Re: [HELP] Cassandra 4.1.1 Repeated Bootstrapping Failure

2023-09-11 Thread Bowen Song via user
Hi Scott, Thank you for pointing this out. I found it too, but I deemed it to be irrelevant because of the following reasons: * it was fixed in 4.1.1, as you have correctly pointed out; and * the error message is slightly different, "writevAddresses" vs "writeAddress"; and * it actually go

Re: [HELP] Cassandra 4.1.1 Repeated Bootstrapping Failure

2023-09-11 Thread C. Scott Andreas
Bowen, thanks for reaching out. My mind immediately jumped to a ticket which has very similar pathology: "CASSANDRA-18110: Streaming progress virtual table lock contention can trigger TCP_USER_TIMEOUT and fail streaming" -- but I see this was fixed in 4.1.1. On Sep 11, 2023, at 2:09 PM, Bowen Son

Re: Help determining pending compactions

2022-11-07 Thread Richard Hesse
Thanks for the tip Eric. We're actually on 3.2 and the issue isn't with the Reaper. The issue is with Cassandra. It will report that a table has pending compactions, but it will never actually start compacting. The pending number stays at that level until we run a manual compaction. -richard On

RE: Help determining pending compactions

2022-11-07 Thread Eric Ferrenbach
We had issues where Reaper would never actually start some repairs. The GUI would say RUNNING but the progress would be 0/. Datastax support said there is a bug and recommended upgrading to 3.2. Upgrading Reaper to 3.2 resolved our issue. Hope this helps. Eric From: Richard Hesse Sent: S

Re: Help determining pending compactions

2022-10-30 Thread Richard Hesse
Sorry about that. 4.0.6 On Sun, Oct 30, 2022, 11:19 AM Dinesh Joshi wrote: > It would be helpful if you could tell us what version of Cassandra you’re > using? > > Dinesh > > > On Oct 30, 2022, at 10:07 AM, Richard Hesse wrote: > > > >  > > Hi, I'm hoping to get some help with a vexing issue w

Re: Help determining pending compactions

2022-10-30 Thread Dinesh Joshi
It would be helpful if you could tell us what version of Cassandra you’re using? Dinesh > On Oct 30, 2022, at 10:07 AM, Richard Hesse wrote: > >  > Hi, I'm hoping to get some help with a vexing issue with one of our > keyspaces. During Reaper repair sessions, one keyspace will end up with >

RE: Help with sudden spike in read requests

2019-02-01 Thread Kenneth Brotman
monitoring and how did you find out it was happening? Is this a DSE cluster or OSS Cassandra cluster? Kenneth Brotman From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] Sent: Friday, February 01, 2019 10:48 AM To: user@cassandra.apache.org Subject: Re: Help with sudden spike in read

Re: Help with sudden spike in read requests

2019-02-01 Thread Subroto Barua
; happening? What changed since it started happening? > > Kenneth Brotman > > From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] > Sent: Friday, February 01, 2019 10:13 AM > To: user@cassandra.apache.org > Subject: Re: Help with sudden spike in read requests > >

RE: Help with sudden spike in read requests

2019-02-01 Thread Kenneth Brotman
, February 01, 2019 10:13 AM To: user@cassandra.apache.org Subject: Re: Help with sudden spike in read requests Vnode is 256 C*: 3.0.15 on m4.4xlarge gp2 vol There are 2 more DCs on bare metal (raid 10 and older machines) attached to this cluster and we have not seen this behavior on on-prem

Re: Help with sudden spike in read requests

2019-02-01 Thread Subroto Barua
Vnode is 256 C*: 3.0.15 on m4.4xlarge gp2 vol There are 2 more DCs on bare metal (raid 10 and older machines) attached to this cluster and we have not seen this behavior on on-prem servers If this event is triggered by some bad query/queries, what is the best way to trap it? Subroto > On Fe

RE: Help with sudden spike in read requests

2019-02-01 Thread Kenneth Brotman
If you had a query that went across the partitions and especially if you had vNodes set high, that would do it. Kenneth Brotman From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] Sent: Friday, February 01, 2019 8:45 AM To: User cassandra.apache.org Subject: Help with sudden spike in

Re: Help in understanding strange cassandra CPU usage

2018-12-09 Thread Michael Shuler
On 12/9/18 4:09 AM, Devaki, Srinivas wrote: > > Cassandra Version: 2.2.4 There have been over 300 bug fixes and improvements in the nearly 3 years between 2.2.4 and the latest 2.2.13 release. Somewhere in there was a GC logging addition as I scanned the changes, which could help with troubleshoot

Re: Help in understanding strange cassandra CPU usage

2018-12-09 Thread Jeff Jirsa
Sounds like over time you’re ending up doing something odd - maybe you’re leaking cql connections or something and it gets more and more intensive to manage them until you invoke the breaker, then it drops. Will probably take someone going through a heap dump to really understand what’s going on

Re: Help needed to enable Client-to-node encryption (SSL)

2018-02-19 Thread Alain RODRIGUEZ
> > (2.0 is getting pretty old and isn't supported, you may want to consider > upgrading; 2.1 would be the smallest change and least risk, but it, too, is > near end of life) I would upgrade as well. Yet I think moving from Cassandra 2.0 to Cassandra 2.2 directly is doable smoothly and preferabl

Re: Help needed to enable Client-to-node encryption (SSL)

2018-02-16 Thread Jeff Jirsa
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html https://www.youtube.com/watch?v=CKt0XVPogf4 (2.0 is getting pretty old and isn't supported, you may want to consider upgrading; 2.1 would be the smallest change and least risk, but it, too, is n

Re: Help in c* Data modelling

2017-07-23 Thread @Nandan@
Hi, The best way is to go with a table-per-query plan and duplicate the common columns into both tables. This will support your queries and keep both reads and writes fast. The only drawback is that you have to insert common data into both tables at the same time, which can be easily hand

Re: Help in c* Data modelling

2017-07-23 Thread Jonathan Haddad
Using a different table to answer each query is the correct answer here assuming there's a significant amount of data. If you don't have that much data, maybe you should consider using a database like Postgres which gives you query flexibility instead of horizontal scalability. On Sun, Jul 23, 201

Re: Help in c* Data modelling

2017-07-23 Thread techpyaasa .
Hi vladyu/varunbarala Instead of creating second table as you said can I just have one(first) table below and get all rows with status=0. CREATE TABLE IF NOT EXISTS test.user ( account_id bigint, pid bigint, disp_name text, status int, PRIMARY KEY (account_id, pid) ) WITH CLUSTERING ORDER BY (pid

Re: Help in c* Data modelling

2017-07-23 Thread Vladimir Yudovin
Hi, unfortunately ORDER BY is supported for clustering columns only... Winguzone - Cloud Cassandra Hosting On Sun, 23 Jul 2017 12:49:36 -0400 techpyaasa . wrote Hi Varun, Thanks a lot for your reply. In this case if I want to update status(st

Re: Help in c* Data modelling

2017-07-23 Thread techpyaasa .
Hi Varun, Thanks a lot for your reply. In this case if I want to update status(status can be updated for given account_id, pid) , I need to delete existing row in 2nd table & add new one... :( :( Its like hitting cassandra twice for 1 change.. :( On Sun, Jul 23, 2017 at 8:42 PM, Varun Barala

Re: Help in c* Data modelling

2017-07-23 Thread Varun Barala
Hi, You can create pseudo index table. IMO, structure can be:- CREATE TABLE IF NOT EXISTS test.user ( account_id bigint, pid bigint, disp_name text, status int, PRIMARY KEY (account_id, pid) ) WITH CLUSTERING ORDER BY (pid ASC); CREATE TABLE IF NOT EXISTS test.user_index ( account_id bigint, pi

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Zoltan Lorincz
Great suggestion! Thanks Avi! On Mon, Mar 27, 2017 at 3:47 PM, Avi Kivity wrote: > You can use static columns to and just one table: > > > CREATE TABLE documents ( > > doc_id uuid, > > element_id uuid, > > description text static, > > doc_title text static, > > element_title

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Avi Kivity
You can use static columns and just one table: CREATE TABLE documents ( doc_id uuid, element_id uuid, description text static, doc_title text static, element_title text, PRIMARY KEY (doc_id, element_id) ); The static columns are present once per unique doc_id.

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Zoltan Lorincz
Thank you Matija, because i am newbie, it was not clear for me that i am able to query by the partition key (not providing the clustering key), sorry about that! Zoltan. On Mon, Mar 27, 2017 at 1:54 PM, Matija Gobec wrote: > Thats exactly what I described. IN queries can be used sometimes but I

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Matija Gobec
Thats exactly what I described. IN queries can be used sometimes but I usually run parallel async as Alexander explained. On Mon, Mar 27, 2017 at 12:08 PM, Zoltan Lorincz wrote: > Hi Alexander, > > thank you for your help! I think we found the answer: > > CREATE TABLE documents ( > doc_id uu

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Zoltan Lorincz
Hi Alexander, thank you for your help! I think we found the answer: CREATE TABLE documents ( doc_id uuid, description text, title text, PRIMARY KEY (doc_id) ); CREATE TABLE nodes ( doc_id uuid, element_id uuid, title text, PRIMARY KEY (doc_id, element_id) ); We

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-26 Thread Alexander Dejanovski
Hi Zoltan, you must try to avoid multi partition queries as much as possible. Instead, use asynchronous queries to grab several partitions concurrently. Try to send no more than ~100 queries at the same time to avoid DDOS-ing your cluster. This would leave you roughly with 1000+ async queries gro
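The advice above (many single-partition queries issued concurrently, with a cap on how many are in flight) can be sketched as follows. This is a minimal illustration, not the poster's code: `fetch_partition` is a hypothetical stand-in for one single-partition query, and the cap of 100 is taken from the "~100 queries" figure in the message. A real client would replace the stand-in with its driver's asynchronous execute call.

```python
import asyncio

MAX_IN_FLIGHT = 100  # assumption: the "~100 queries at the same time" cap from the message


async def fetch_partition(doc_id):
    # Hypothetical stand-in for one single-partition query.
    # A real client would await its driver's async execute here.
    await asyncio.sleep(0)
    return {"doc_id": doc_id}


async def fetch_all(doc_ids, limit=MAX_IN_FLIGHT):
    # The semaphore keeps at most `limit` queries in flight at once.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(doc_id):
        async with semaphore:
            return await fetch_partition(doc_id)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(d) for d in doc_ids))


results = asyncio.run(fetch_all(range(1000)))
```

The semaphore only limits concurrency on the client side; whether 100 is the right ceiling depends on the cluster and would need measuring.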

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-26 Thread Zoltan Lorincz
Querying by (doc_id and element_id ) OR just by (element_id) is fine, but the real question is, will it be efficient to query 100k+ primary keys in the elements table? e.g. SELECT * FROM elements WHERE element_id IN (element_id1, element_id2, element_id3, element_id100K+) ? The elements_id

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-26 Thread Matija Gobec
Have one table hold document metadata (doc_id, title, description, ...) and have another table elements where partition key is doc_id and clustering key is element_id. Only problem here is if you need to query and/or update element just by element_id but I don't know your queries up front. On Sun,

Re: HELP with bulk loading

2017-03-14 Thread Artur R
Thank you all! It turns out that the fastest ways are: https://github.com/brianmhess/cassandra-loader and COPY FROM. So I decided to stick with COPY FROM as it is built-in and easy-to-use. On Fri, Mar 10, 2017 at 2:22 PM, Ahmed Eljami wrote: > Hi, > > >3. sstableloader is slow too. Assuming that

Re: HELP with bulk loading

2017-03-10 Thread Ahmed Eljami
Hi, >3. sstableloader is slow too. Assuming that I have new empty C* cluster, how can I improve the upload speed? Maybe disable replication or some other settings while streaming and then turn it back? Maybe you can accelerate your load with the option -cph (connection per host): https://issues.ap

Re: HELP with bulk loading

2017-03-09 Thread Stefania Alborghetti
When I tested cqlsh COPY FROM for CASSANDRA-11053 , I was able to import about 20 GB in under 4 minutes on a cluster with 8 nodes u

Re: HELP with bulk loading

2017-03-09 Thread Ryan Svihla
I suggest using cassandra loader https://github.com/brianmhess/cassandra-loader On Mar 9, 2017 5:30 PM, "Artur R" wrote: > Hello all! > > There are ~500gb of CSV files and I am trying to find the way how to > upload them to C* table (new empty C* cluster of 3 nodes, replication > factor 2) with

Re: Help with cassandra triggers

2017-01-17 Thread Jonathan Haddad
Trigger only gets executed on the coordinator. There's no remote DC trigger. What you need is Change Data Capture (CDC). https://issues.apache.org/jira/browse/CASSANDRA-8844 On Tue, Jan 17, 2017 at 9:40 AM suraj pasuparthy wrote: > Hello > We have a usecase where we need to support triggers wi

Re: Help

2017-01-15 Thread Jonathan Haddad
I've heard enough stories of firewall issues that I'm willing to bet it's the problem, if it's sitting between the nodes. On Sun, Jan 15, 2017 at 9:32 AM Anshu Vajpayee wrote: > Setup is not on cloud. We have few nodes in one DC(1) and same number > of nodes in other DC(2). We have dedicated f

Re: Help

2017-01-15 Thread Anshu Vajpayee
​Setup is not on cloud. We have few nodes in one DC(1) and same number of nodes in other DC(2). We have dedicated firewall in-front on nodes. Read and write happen with local quorum so those dont get affected but hints get accumulated from one DC to other DC for replications. Hints are also gett

Re: Help

2017-01-14 Thread Aleksandr Ivanov
Could you share a bit about your cluster setup? Do you use cloud for your deployment or dedicated firewalls in front of nodes? If gossip shows that everything is up it doesn't mean that all nodes can communicate with each other. I have noticed situations when TCP connection was killed by firewall and Ca

Re: Help

2017-01-09 Thread Chris Lohfink
Do you have any monitoring setup around garbage collections? A GC + network latency > write timeout will cause intermittent hints. On Sun, Jan 8, 2017 at 10:30 PM, Anshu Vajpayee wrote: > Gossip shows - all nodes are up. > > But when we perform writes , coordinator stores the hints. It means

Re: Help

2017-01-09 Thread Edward Capriolo
On Sun, Jan 8, 2017 at 11:30 PM, Anshu Vajpayee wrote: > Gossip shows - all nodes are up. > > But when we perform writes , coordinator stores the hints. It means - > coordinator was not able to deliver the writes to few nodes after meeting > consistency requirements. > > The nodes for which wr

Re: Help on temporal data modeling

2016-09-23 Thread Peter Lin
Yes, it would. Whether next_billing_date is timestamp or date wouldn't make any difference on scanning all partitions. If you want them to be on the same node, you can use composite key, but there's a trade off. The nodes may get unbalanced, so you have to do the math to figure out if your specif

Re: Help on temporal data modeling

2016-09-23 Thread Denis Mikhaylov
Thanks for your answer. Wouldn’t simple `select * from subscriptions where next_billing_date = '2016-10-25'` require full scan of all partitions? > On 23 Sep 2016, at 14:28, Peter Lin wrote: > > > Ignoring noSql for a minute, the standard way of modeling this in car and > health insurance is

Re: Help on temporal data modeling

2016-09-23 Thread Peter Lin
Ignoring noSql for a minute, the standard way of modeling this in car and health insurance is with effective/expiration day. Commonly called bi-temporal data modeling. How people model bi-temporal models varies quite a bit from first hand experience, but the common thing is to have transaction tim

Re: Help on temporal data modeling

2016-09-23 Thread Alain RODRIGUEZ
Hi Denis, You might want to have a look at - Materialized views http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views - Secondary index https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html My 2 cents: make sure to understand the implications before moving forwa

Re: Help regarding cassandra node networking..

2016-07-12 Thread Alain RODRIGUEZ
Hi, A few comments as I read: > I'm using node0 as seeder It is recommended to use at least 2 nodes per DC as seeds, for fault tolerance and good gossip propagation. Also I think you are a bit confused about what a seed node is and is not. See: https://docs.datastax.com/en/cassandra/2.1/cassa

Re: Help debugging a very slow query

2016-01-13 Thread Robert Coli
On Wed, Jan 13, 2016 at 12:40 PM, Bryan Cheng wrote: > 1) What's up with the megapartition? What's the best way to debug this? > Our data model is largely write once, we don't do any updates. We do > DELETE, but the partitions that are giving us issues haven't been removed. > We had some suspicio

Re: Help debugging a very slow query

2016-01-13 Thread Jeff Jirsa
Very large partitions create a lot of garbage during reads: https://issues.apache.org/jira/browse/CASSANDRA-9754 - you will see significant GC pauses trying to read from large enough partitions. I suspect GC, though it’s odd that you’re unable to see it. From: Bryan Cheng Reply-To: "user@

Re: Help diagnosing performance issue

2015-11-30 Thread Ryan Svihla
Honestly 20ms for spinning disks is really good, so I think you're just dealing with the reality of having a certain percentage of your reads off disk and not in memory. If you're reading data that is on older SSTables and you're out of buffer cache I'm not sure how else you could improve that. So

Re: Help diagnosing performance issue

2015-11-25 Thread Antoine Bonavita
Sebastian (and others, help is always appreciated), After 24h OK, read latencies started to degrade (up to 20ms) and I had to ramp down volumes again. The degradation is clearly linked to the number read IOPs which went up to 1.65k/s after 24h. If anybody can give me hints on what I should

Re: Help diagnosing performance issue

2015-11-23 Thread Antoine Bonavita
Sebastian, I tried to ramp up volume with this new setting and ran into the same problems. After that I restarted my nodes. This pretty much instantly got read latencies back to normal (< 5ms) on the 32G nodes. I am currently ramping up volumes again and here is what I am seeing on 32G nod

Re: Help diagnosing performance issue

2015-11-19 Thread rock zhang
unsubscribe. > On Nov 19, 2015, at 3:58 PM, Antoine Bonavita wrote: > > Sebastian, > > I took into account your suggestion and set max_sstable_age_days to 1. > > I left the TTL at 432000 and the gc_grace_seconds at 172800. So, I expect > SSTable older than 7 days to get deleted. Am I right ?

Re: Help diagnosing performance issue

2015-11-19 Thread Antoine Bonavita
Sebastian, I took into account your suggestion and set max_sstable_age_days to 1. I left the TTL at 432000 and the gc_grace_seconds at 172800. So, I expect SSTable older than 7 days to get deleted. Am I right ? I did not change dclocal_read_repair_chance because I have only one DC at this po
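The "7 days" figure in the message above follows from adding the TTL to gc_grace_seconds. A quick check of that arithmetic, using only the two values quoted in the message, is sketched below. It verifies the sum only: when an SSTable is actually dropped still depends on compaction behaviour, which this does not model.

```python
SECONDS_PER_DAY = 86_400

ttl_seconds = 432_000        # TTL quoted in the message
gc_grace_seconds = 172_800   # gc_grace_seconds quoted in the message

ttl_days = ttl_seconds / SECONDS_PER_DAY            # days until data expires
gc_grace_days = gc_grace_seconds / SECONDS_PER_DAY  # extra days before expired data is purgeable

# Earliest age at which fully expired data becomes eligible for removal.
earliest_purge_days = ttl_days + gc_grace_days

print(ttl_days, gc_grace_days, earliest_purge_days)
```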

Re: Help diagnosing performance issue

2015-11-18 Thread Sebastian Estevez
> > When you say drop you mean reduce the value (to 1 day for example), not > "don't set the value", right ? Yes. If I set max sstable age days to 1, my understanding is that SSTables with > expired data (5 days) are not going to be compacted ever. And therefore my > disk usage will keep growing

Re: Help diagnosing performance issue

2015-11-18 Thread Antoine Bonavita
Sebastian, Your help is very much appreciated. I re-read the blog post and also https://labs.spotify.com/2014/12/18/date-tiered-compaction/ but some things are still confusing me. Please see my questions inline below. On 11/18/2015 04:21 PM, Sebastian Estevez wrote: Yep, I think you've mixe

Re: Help diagnosing performance issue

2015-11-18 Thread Sebastian Estevez
Yep, I think you've mixed up your DTCS levers. I would read, or re-read Marcus's post http://www.datastax.com/dev/blog/datetieredcompactionstrategy *TL;DR:* - *base_time_seconds* is the size of your initial window - *max_sstable_age_days* is the time after which you stop compacting ssta

Re: Help diagnosing performance issue

2015-11-18 Thread Antoine Bonavita
Sebastian, Robert, First, a big thank you to both of you for your help. It looks like you were right. I used pcstat (awesome tool, thanks for that as well) and it appears some files I would not expect to be in cache actually are. Here is a sample of my output (edited for convenience, adding th

Re: Help diagnosing performance issue

2015-11-17 Thread Robert Coli
On Tue, Nov 17, 2015 at 11:08 AM, Sebastian Estevez < sebastian.este...@datastax.com> wrote: > You're sstables are probably falling out of page cache on the smaller > nodes and your slow disks are killing your latencies. > +1 most likely. Are the heaps the same size on both machines? =Rob

Re: Help diagnosing performance issue

2015-11-17 Thread Sebastian Estevez
Hi, Your sstables are probably falling out of page cache on the smaller nodes and your slow disks are killing your latencies. Check to see if this is the case with pcstat: https://github.com/tobert/pcstat All the best, Sebastián Estéve

Re: Help diagnosing performance issue

2015-11-17 Thread Antoine Bonavita
Hello, As I have not heard from anybody on the list, I guess I did not provide the right kind of information or I did not ask the right question. The things I forgot to mention in my previous email: * Checked the logs without noticing anything out of the ordinary. Memtables flushes occur ever

Re: Help with tombstones and compaction

2015-09-22 Thread Venkatesh Arivazhagan
katesh Arivazhagan > Reply-To: "user@cassandra.apache.org" > Date: Monday, September 21, 2015 at 1:41 PM > To: "user@cassandra.apache.org" > Subject: Re: Help with tombstones and compaction > > Thank you for your reply Jeff! > > I will switch to Cassandra

Re: Help with tombstones and compaction

2015-09-21 Thread Jeff Jirsa
unit (milliseconds vs microseconds) could certainly cause confusion. From: Venkatesh Arivazhagan Reply-To: "user@cassandra.apache.org" Date: Monday, September 21, 2015 at 1:41 PM To: "user@cassandra.apache.org" Subject: Re: Help with tombstones and compaction Thank you

Re: Help with tombstones and compaction

2015-09-21 Thread Venkatesh Arivazhagan
Thank you for your reply Jeff! I will switch to Cassandra 2.1.9. Quick follow up question: Does the schema, settings I have setup look alright? My timestamp column's type is blob - I was wondering if this could confuse DTCS? On Sun, Sep 20, 2015 at 3:37 PM, Jeff Jirsa wrote: > 2.1.4 is getting

Re: Help with tombstones and compaction

2015-09-20 Thread Jeff Jirsa
2.1.4 is getting pretty old. There’s a DTCS deletion tweak in 2.1.5 ( https://issues.apache.org/jira/browse/CASSANDRA-8359 ) that may help you. 2.1.5 and 2.1.6 have some memory leak issues in DTCS, so go to 2.1.7 or newer (probably 2.1.9 unless you have a compelling reason not to go to 2.1.9)

Re: Help understanding aftermath of death by GC

2015-03-31 Thread Robert Coli
On Tue, Mar 31, 2015 at 9:12 AM, Jens Rantil wrote: > One issue when you are running a JVM and start running out of memory is > that the JVM can start throwing `OutOfMemoryError` in any thread - not > necessarily in the thread which is taking all the memory. I've seen this > happen multiple times

Re: Help understanding aftermath of death by GC

2015-03-31 Thread Jens Rantil
Hi Robert, On Tue, Mar 31, 2015 at 2:22 PM, Robert Wille wrote: > Can anybody help me understand why Cassandra wouldn’t recover? One issue when you are running a JVM and start running out of memory is that the JVM can start throwing `OutOfMemoryError` in any thread - not necessarily in the thr

Re: Help understanding aftermath of death by GC

2015-03-31 Thread Jason Wee
Hey Robert, you might want to start by looking into the statistics of cassandra, either exposed via nodetool or, if you have a monitoring system, by monitoring the important metrics. I read this article a moment ago and I hope it helps you http://aryanet.com/blog/cassandra-garbage-collector-tuning to begin

Re: Help on modeling a table

2015-02-02 Thread Asit KAUSHIK
I'll try your recommendations and would update on the same Thanks so much Cheers Asit On Mon, Feb 2, 2015, 9:56 PM Eric Stevens wrote: > Just a minor observation: those field names are extremely long. You store > a copy of every field name with every value with only a couple of > exceptions: >

Re: Help on modeling a table

2015-02-02 Thread Eric Stevens
Just a minor observation: those field names are extremely long. You store a copy of every field name with every value with only a couple of exceptions: http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html Your partition key column name (lo

Re: Help on modeling a table

2015-02-02 Thread Jan
Hi Asit; The Partition key is only a part of the performance. Recommend reading this article: Advanced Time Series with Cassandra (DataStax)

Re: Help on modeling a table

2015-02-02 Thread Jack Krupansky
A leading wildcard is one of the slowest things you can do with Lucene, and not a recommended practice, so either accept that it is slow or don't do it. That said, there is a trick you can do with a reverse wildcard filter, but that's an expert-level feature and not recommended for average develop

Re: Help with approach to remove RDBMS schema from code to move to C*?

2014-09-22 Thread Les Hartzman
Thanks everyone for the responses. One thing I'd forgotten about was the need to model the CFs with regard to the kind of queries that are needed. Fortunately this is primarily a write-once/read-many type of application, so deletions are not currently a concern, but worth keeping in mind for the fu

Re: Help with approach to remove RDBMS schema from code to move to C*?

2014-09-20 Thread Brice Dutheil
I’m fairly new to cassandra, but here’s my input. Think of your column families as a projection of how the application needs them. Thinking with CQRS in mind helps. So with more CFs that may require more space, as data may be written differently in different column families for different usage. Fo

Re: Help with approach to remove RDBMS schema from code to move to C*?

2014-09-19 Thread Jack Krupansky
Start by asking how you intend to query the data. That should drive the data model. Is there existing app client code or an app layer that is already using the current schema, or are you intending to rewrite that as well. FWIW, you could place the numeric columns in a numeric map collection, an

Re: Help with approach to remove RDBMS schema from code to move to C*?

2014-09-19 Thread James Briggs
Most of the C* success stories are for greenfield applications. Migrating from one database to another database is a lot of work. C* offers no magical path. If you only have a few tables and minor RDBMS feature dependencies, it can be done. Make sure your users and QA people are cooperative fi

Re: Help with select IN query in cassandra

2014-09-01 Thread Subodh Nijsure
Thanks Michael I will certainly go with this approach for now. -Subodh On Mon, Sep 1, 2014 at 6:33 AM, Laing, Michael wrote: > This should work for your query requirements - 2 tables w same info because > disk is cheap and writes are fast so optimize for reads: > > CREATE TABLE sensor_asset ( >

Re: Help with select IN query in cassandra

2014-09-01 Thread Subodh Nijsure
Laing, Michael > Sent: Monday, September 1, 2014 11:34 AM > To: user@cassandra.apache.org > Subject: Re: Help with select IN query in cassandra > > Did the OP propose that? > > > On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky > wrote: >> >> One comment on del

Re: Help with select IN query in cassandra

2014-09-01 Thread Jack Krupansky
1, 2014 11:34 AM To: user@cassandra.apache.org Subject: Re: Help with select IN query in cassandra Did the OP propose that? On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky wrote: One comment on deletions – aren’t deletions kind of an anti-pattern for modern data processing, such as sensor

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
ging” rather than the exercise in > futility of doing a massive number of deletes and updates in place? > > -- Jack Krupansky > > *From:* Laing, Michael > *Sent:* Monday, September 1, 2014 9:33 AM > *To:* user@cassandra.apache.org > *Subject:* Re: Help with select IN quer

Re: Help with select IN query in cassandra

2014-09-01 Thread Jack Krupansky
Krupansky From: Laing, Michael Sent: Monday, September 1, 2014 9:33 AM To: user@cassandra.apache.org Subject: Re: Help with select IN query in cassandra This should work for your query requirements - 2 tables w same info because disk is cheap and writes are fast so optimize for reads: CREATE TABLE

Re: Help with select IN query in cassandra

2014-09-01 Thread Laing, Michael
This should work for your query requirements - 2 tables w same info because disk is cheap and writes are fast so optimize for reads: CREATE TABLE sensor_asset ( asset_id text, event_time timestamp, tuuid timeuuid, sensor_reading map, sensor_serial_number text, sensor_type int, PRIMAR

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-09-01 Thread Sylvain Lebresne
On Sun, Aug 31, 2014 at 2:59 AM, Todd Nine wrote: > Hi all, > I'm working on transferring our thrift DAOs over to CQL. It's going > well, except for 2 cases that both use multi get. The use case is very > simple. It is a narrow row, by design, with only a few columns. When I > perform a mul

Re: Help with select IN query in cassandra

2014-08-31 Thread Subodh Nijsure
Thanks for your help Michael. If specifying asset_id would help I can construct queries that can include asset_id So I have been "playing" around with PRIMARY KEY definition and following table definition CREATE TABLE sensor_info_table ( asset_id text, event_time timestamp, "timestamp" tim

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Oh it must be late - I missed the fact that you didn't want to specify asset_id. The above queries will still work but you have to use 'allow filtering' - generally not a good idea. I'll look again in the morning. On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael wrote: > Hmm. Because the cluster

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Hmm. Because the clustering key is (event_time, "timestamp"), event_time must be specified as well - hopefully that info is available to the ux. Unfortunately you will then hit another problem with your query: you are selecting a collection field... this will not work with IN on "timestamp". So y

Re: Help with select IN query in cassandra

2014-08-31 Thread Subodh Nijsure
Not really. The event time stamp is created by the sensor when it reads data, and timestamp is something the server creates when inserting data into cassandra db. At a later point in time my django ux allows users to browse this data and reference interesting data points via the timestamp field. The timestam

Re: Help with select IN query in cassandra

2014-08-31 Thread Laing, Michael
Are event_time and timestamp essentially representing the same datetime? On Sunday, August 31, 2014, Subodh Nijsure wrote: > I have following database schema > > CREATE TABLE sensor_info_table ( > asset_id text, > event_time timestamp, > "timestamp" timeuuid, > sensor_reading map, > se

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Todd Nine
Jack Krupansky wrote: > You might want to take a look at Titan, a graph database that can use > Cassandra as its storage engine, and see how it does these things. > > -- Jack Krupansky > > *From:* Todd Nine > *Sent:* Sunday, August 31, 2014 11:06 AM > *To:* user@cassandr

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Jack Krupansky
You might want to take a look at Titan, a graph database that can use Cassandra as its storage engine, and see how it does these things. -- Jack Krupansky From: Todd Nine Sent: Sunday, August 31, 2014 11:06 AM To: user@cassandra.apache.org Subject: Re: Help with migration from Thrift to CQL3

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Laing, Michael
You need to switch gears to new terminology as well - a "thrift row" is a partition now, etc... :) So yes - the *partition* key of the *table* would be scopeId, scopeType in my proposed scheme. But the partitions would be too big, given what you describe. You could shard the rows, but even then

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Todd Nine
Hey Michael, Thanks for the response. If I use the clustered columns in the way you described, won't that make the row key of the column family scopeId and scopeType? The scope fields represent a graph's owner. The graph itself can have several billion nodes in it. When a lot of deletes star

Re: Help with migration from Thrift to CQL3 on Cassandra 2.0.10

2014-08-31 Thread Laing, Michael
Actually I think you do want to use scopeId, scopeType as the partition key (and drop row caching until you upgrade to 2.1 where "rows" are in fact rows and not partitions): CREATE TABLE IF NOT EXISTS Graph_Marked_Nodes ( scopeId uuid, scopeType varchar, nodeId uuid, nodeType varchar, timestam

Re: Help with batch renaming legacy sstable files

2014-06-26 Thread Jens Rantil
Todd, 'rename' is a (perl?) command line utility that comes with many Linux distributions. It's not part of Cassandra. It's highly useful for renaming many files. The utility can also be installed using Homebrew on MacOSX. Cheers, Jens — Sent from Mailbox On Thu, Jun 26, 2014 at 6:51 PM,
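
Where the `rename` utility is not available, the same batch rename can be sketched in Python. This is only an illustration under stated assumptions: the legacy pattern `<cf>-<version>-<generation>-<Component>.db`, the target form with a keyspace prefix, and the helper names `new_name` and `rename_all` are hypothetical and are not taken from the thread. Check them against the actual files before running with `dry_run=False`.

```python
import re
from pathlib import Path

# Assumed legacy file name: <cf>-<version>-<generation>-<Component>.db
# (hypothetical pattern; adjust it to match the files actually on disk).
LEGACY = re.compile(r"^[A-Za-z0-9_]+-[a-z]+-\d+-[A-Za-z]+\.db$")


def new_name(old_name, keyspace):
    """Return the keyspace-prefixed name, or None if old_name is not legacy-style."""
    if LEGACY.match(old_name) is None:
        return None
    return f"{keyspace}-{old_name}"


def rename_all(directory, keyspace, dry_run=True):
    """List (old, new) name pairs; rename the files only when dry_run is False."""
    planned = []
    for path in sorted(Path(directory).iterdir()):
        target = new_name(path.name, keyspace)
        if target is None:
            continue  # already prefixed, or not an sstable component
        planned.append((path.name, target))
        if not dry_run:
            path.rename(path.with_name(target))
    return planned
```

Running `rename_all(some_dir, "my_keyspace")` with the default `dry_run=True` only returns the planned pairs, so the mapping can be reviewed on a copy of the data before anything is renamed.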

Re: Help with batch renaming legacy sstable files

2014-06-26 Thread Robert Coli
On Thu, Jun 26, 2014 at 9:50 AM, Todd Nine wrote: > Robert. The data has been upgraded to 1.1, but it was created in a > 0.8 version. It's been steadily upgraded, so I believe sstable data > is in 1.1 format. However, they're still retaining the old naming > convention. > There's an automatic

Re: Help with batch renaming legacy sstable files

2014-06-26 Thread Todd Nine
Jens, Where is the "rename" utility? I don't see it in the 1.2.16 distribution. Robert. The data has been upgraded to 1.1, but it was created in a 0.8 version. It's been steadily upgraded, so I believe sstable data is in 1.1 format. However, they're still retaining the old naming convention.

Re: Help with batch renaming legacy sstable files

2014-06-26 Thread Robert Coli
On Wed, Jun 25, 2014 at 9:49 PM, Todd Nine wrote: > I'm working on migrating some data from 1.0.x clusters to a 1.2.16 > cluster. Part of my testing is (locally) loading the old 1.0 sstables > into my environment in 1.2.16. Since the 1.0 days, the file format > has changed from this format. >

Re: Help with batch renaming legacy sstable files

2014-06-25 Thread Hannu Kröger
Also, did you get to upgrade first to 1.1.x and then to 1.2.x? That might smooth the process. Hannu > On 26.6.2014, at 9.04, "Jens Rantil" wrote: > > Hi Todd, > > Maybe the "rename" command line utility could help you? > > Cheers, > Jens > — > Sent from Mailbox > > >> On Thu, Jun 26

Re: Help with batch renaming legacy sstable files

2014-06-25 Thread Jens Rantil
Hi Todd, Maybe the "rename" command line utility could help you? Cheers, Jens — Sent from Mailbox On Thu, Jun 26, 2014 at 6:50 AM, Todd Nine wrote: > Hey guys, > I'm working on migrating some data from 1.0.x clusters to a 1.2.16 > cluster. Part of my testing is (locally) loading the old

Re: Help me on Cassandra Data Modelling

2014-01-28 Thread Thunder Stumpges
Hey Naresh, Unfortunately I don't have any further advice. I keep feeling like you're looking at a search problem instead of a lookup problem. Perhaps Cassandra is not the right tool for your need in this case. Perhaps something with a full-text index type feature would help. Or perhaps someone m

Re: Help me on Cassandra Data Modelling

2014-01-28 Thread Naresh Yadav
Please share any input on my last email. On Tue, Jan 28, 2014 at 7:18 AM, Naresh Yadav wrote: > yes thunder you are right, i had simplified that by moving *tags > *search(partial/exact) > in separate column family tagcombination which will act as index for all > search based on tags and in my o

Re: Help me on Cassandra Data Modelling

2014-01-27 Thread Naresh Yadav
Yes Thunder, you are right. I had simplified that by moving *tags* search (partial/exact) into a separate column family, tagcombination, which will act as an index for all searches based on tags, and my original metricresult table will store tagcombinationid and time in columns; otherwise it was getting com

Re: Help me on Cassandra Data Modelling

2014-01-27 Thread Thunder Stumpges
Hey Naresh, You asked a similar question a week or two ago. It looks like you have simplified your needs quite a bit. Were you able to adjust your requirements or separate the issue? You had a complicated time dimension before, as well as a single "query" for multiple AND cases on tags. > c)
