Re: AssertionError on PasswordAuthenticator
Hi, thanks, that was the issue. I got sidetracked by too much debugging and hunted down the Usergrid username/password instead of the Cassandra username/password. That way I overlooked the error I made when simply copying the config file from the link Nate pointed to. One closer look would have sufficed. Thanks for your support anyway.

Best, Andreas

From: Nate McCall [mailto:n...@thelastpickle.com]
Sent: Monday, 27 July 2015 23:36
To: Cassandra Users
Subject: Re: AssertionError on PasswordAuthenticator

> Any ideas what might be wrong or which prerequisites need to be met? This is the first request for a connection.

Sam makes a good point. Make sure you have the username and password properties set in the configuration file:
https://github.com/apache/incubator-usergrid/blob/master/stack/config/src/main/resources/usergrid-default.properties#L52-L53

See this page for details on configuration:
http://usergrid.readthedocs.org/en/latest/deploy-local.html#install-and-configure-cassandra

For Usergrid-specific questions, feel free to stop by our mailing list or IRC channel, both of which are listed here:
http://usergrid.incubator.apache.org/community/

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
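For reference, the two pieces of configuration involved look roughly like this. The cassandra.yaml line is the standard way to enable password authentication; the Usergrid property names and the cassandra/cassandra credentials are assumptions for illustration - check the file Nate linked for the exact keys, and use your own credentials rather than the defaults:

# cassandra.yaml on each Cassandra node
authenticator: PasswordAuthenticator

# usergrid-default.properties (the lines linked above; key names assumed)
cassandra.username=cassandra
cassandra.password=cassandra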
Re: question about bootstrapping sequence
I'm wondering how the Cassandra protocol brings a newly bootstrapped node "up to speed".

For ease of illustration, let's say we have just one key, K, and its value is continually updated: 1, 2, 3, 4... Originally we have one node, A. Now node B joins and needs to bootstrap and get its newly assigned range (just "K") from A.

Say A has seen updates 1, 2, 3 up to this point. According to StreamingRequestVerbHandler, A does a flush of its memtable and then streams out the new sstables. But what if, while the (newly flushed) sstable is being streamed out from A and before B has fully received it, A gets more updates: 4, 5, 6?

Now B gets the streamed range, happily declares itself ready, and joins the ring. But it's actually not "up to speed" with the "old members": A now has the value K=6 while B has K=3. Of course, when clients query now, A's and B's results are reconciled, so the client gets the latest result. But would B stay "not up to speed" forever? How can we make it up to speed?

Although the following is a very hypothetical scenario, it would lead to lost writes: say B is still in the "not up to date" state, then another node is removed and a new node is inserted; after more such cycles, all the "up to date" nodes are gone, and we essentially lose the latest writes.
Re: High CPU load
I'm still struggling to find the root cause of such CPU utilisation patterns: http://i58.tinypic.com/24pifcy.jpg

Three weeks after a C* restart, CPU utilisation goes through the roof; this does not happen shortly after the restart (which is visible in the graph). C* is running on machines with 16 cores, and HEAP_NEWSIZE was set to 1600MB. Looking at the gc logs I found:

2015-07-27T03:27:41.891+: 3354764.688: [ParNew: 1362069K->51348K(1474560K), 0.0942700 secs] 3268635K->1957914K(8224768K) After GC:
2015-07-27T03:27:41.999+: 3354764.797: [ParNew: 55327K->48503K(1474560K), 0.0935170 secs] 1961894K->1955069K(8224768K) After GC:
2015-07-27T03:27:43.464+: 3354766.261: [ParNew: 1359224K->54214K(1474560K), 0.0922640 secs] 3265790K->1960780K(8224768K) After GC:
2015-07-27T03:27:43.570+: 3354766.368: [ParNew: 56165K->45672K(1474560K), 0.0948220 secs] 1962732K->1952239K(8224768K) After GC:
2015-07-27T03:27:45.016+: 3354767.814: [ParNew: 1356393K->54245K(1474560K), 0.0922290 secs] 3262959K->1960811K(8224768K) After GC:
2015-07-27T03:27:45.121+: 3354767.919: [ParNew: 57866K->48553K(1474560K), 0.0928670 secs] 1964433K->1955119K(8224768K) After GC:
2015-07-27T03:27:46.590+: 3354769.387: [ParNew: 1359420K->48560K(1474560K), 0.0913750 secs] 3265986K->1955126K(8224768K) After GC:

It's clear to me that ParNew needs to run when 1359420K out of 1474560K is taken. But why is it also running when only 57866K/1474560K - roughly 4% of the young gen - is occupied? SurvivorRatio is set to 8 in our case.

What also bothers me is the duration of the CMS concurrent sweep:

gc.log.9:2015-07-27T07:54:36.561+: 3370772.285: [CMS-concurrent-sweep: 19.710/20.600 secs] [Times: user=0.00 sys=317.60, real=20.60 secs]

Isn't that quite high? Do you guys have any other thoughts? Which part of the gc logs requires more attention?

On Tue, Jul 21, 2015 at 4:22 PM, Marcin Pietraszek wrote:
> Yup...
> it seems like it's gc fault
>
> gc logs
>
> 2015-07-21T14:19:54.336+: 2876133.270: Total time for which
> application threads were stopped: 0.0832030 seconds
> 2015-07-21T14:19:55.739+: 2876134.673: Total time for which
> application threads were stopped: 0.0806960 seconds
> 2015-07-21T14:19:57.149+: 2876136.083: Total time for which
> application threads were stopped: 0.0806890 seconds
> 2015-07-21T14:19:58.550+: 2876137.484: Total time for which
> application threads were stopped: 0.0821070 seconds
> 2015-07-21T14:19:59.941+: 2876138.875: Total time for which
> application threads were stopped: 0.0802640 seconds
> 2015-07-21T14:20:01.340+: 2876140.274: Total time for which
> application threads were stopped: 0.0835670 seconds
> 2015-07-21T14:20:02.744+: 2876141.678: Total time for which
> application threads were stopped: 0.0842440 seconds
> 2015-07-21T14:20:04.143+: 2876143.077: Total time for which
> application threads were stopped: 0.0841630 seconds
> 2015-07-21T14:20:05.541+: 2876144.475: Total time for which
> application threads were stopped: 0.0839850 seconds
>
> Heap after GC invocations=2273737 (full 101):
>  par new generation total 1474560K, used 106131K
>  [0x0005fae0, 0x00065ee0, 0x00065ee0)
>   eden space 1310720K, 0% used [0x0005fae0,
>   0x0005fae0, 0x00064ae0)
>   from space 163840K, 64% used [0x00064ae0,
>   0x0006515a4ee0, 0x000654e0)
>   to space 163840K, 0% used [0x000654e0,
>   0x000654e0, 0x00065ee0)
>  concurrent mark-sweep generation total 6750208K, used 1316691K
>  [0x00065ee0, 0x0007fae0, 0x0007fae0)
>  concurrent-mark-sweep perm gen total 49336K, used 29520K
>  [0x0007fae0, 0x0007fde2e000, 0x0008)
> }
> 2015-07-21T14:12:05.683+: 2875664.617: Total time for which
> application threads were stopped: 0.0830280 seconds
> {Heap before GC invocations=2273737 (full 101):
>  par new generation total 1474560K, used 1416851K
>  [0x0005fae0, 0x00065ee0, 0x00065ee0)
>   eden space 1310720K, 100% used [0x0005fae0,
>   0x00064ae0, 0x00064ae0)
>   from space 163840K, 64% used [0x00064ae0,
>   0x0006515a4ee0, 0x000654e0)
>   to space 163840K, 0% used [0x000654e0,
>   0x000654e0, 0x00065ee0)
>  concurrent mark-sweep generation total 6750208K, used 1316691K
>  [0x00065ee0, 0x0007fae0, 0x0007fae0)
>  concurrent-mark-sweep perm gen total 49336K, used 29520K
>  [0x0007fae0, 0x0007fde2e000, 0x0008)
>
> It seems like eden heap space is being constantly occupied by
> something which is later removed by gc...
>
> On Mon, Jul 20, 2015 at 9:18 AM, Jason Wee wrote:
>> just a guess, gc?
>>
>> On Mon, Jul 20, 2015 at 3:15 PM, Marcin Pietraszek wrote:
>>>
>>> Hello!
>>>
>>> I've noticed a strange CPU utilisation patterns on machines in our
>>> cluster. After C* daemon restart it behaves in a normal way, a
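For reference, the heap and GC settings discussed above live in cassandra-env.sh. A minimal sketch with illustrative values that match the sizes implied by the log (roughly 8G total heap, 1600M new gen) - not necessarily the poster's exact file:

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="1600M"

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

PrintHeapAtGC and PrintGCApplicationStoppedTime are the flags that produce the "Heap before/after GC" dumps and the "Total time for which application threads were stopped" lines quoted above.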
cassandra-stress: Not enough replica available for query at consistency LOCAL_ONE (1 required but only 0 alive)
I'm running a benchmark on a 2-node C* 2.1.8 cluster using cassandra-stress, with the default of CL=1. Stress runs fine for some time, and then starts throwing:

java.io.IOException: Operation x10 on key(s) [36333635504d4b343130]: Error executing: (UnavailableException): Not enough replica available for query at consistency LOCAL_ONE (1 required but only 0 alive)
        at org.apache.cassandra.stress.Operation.error(Operation.java:216)
        at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:188)
        at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:99)
        at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:107)
        at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:259)
        at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:309)

The problem disappears when I decrease the number of client threads, but my goal is to test max performance, so lowering the bar defeats my purpose. Is this normal server push-back under too much pressure? Shouldn't the stress client slow down before this happens?

Thanks
Tzach
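A related knob: the stress tool's -rate options can cap the op rate instead of only varying the thread count, which helps probe the tipping point without the client overrunning the cluster. A sketch - option names as in the 2.1-era tool, so verify with `cassandra-stress help -rate` on your version; node addresses are placeholders:

cassandra-stress write n=10000000 cl=ONE -rate threads=200 limit=50000/s -node 10.0.0.1,10.0.0.2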
Re: question about bootstrapping sequence
On Tue, Jul 28, 2015 at 1:31 AM, Yang wrote:
> I'm wondering how the Cassandra protocol brings a newly bootstrapped node
> "up to speed".

Bootstrapping nodes get "extra" replicated copies of data for the range they are joining.

So if before the bootstrap the nodes responsible for key "X" are:

A B D

and you add node C "between" B and D which takes over a sub-set of their replicas, writes go to the set A, B, C, D for the duration.

=Rob
Re: question about bootstrapping sequence
Thanks, but I don't think having more nodes in the example changes the issue I outlined.

Say you have just key "X", RF=3, and nodes A, B, D are responsible for "X".

In steady state, the updates X=1, 2, 3 go to all 3 servers.

Then at this point node C joins, bootstraps, and gets the sstables from B. But on B, ***right after memtableswitch()***, updates X=4, 5, 6 arrive and update the new memtable (the same updates also go to A and D). Then B continues to stream to C, and C gets its state to X=3.

Now node C declares itself ready, and D gives up ownership of key "X". But now the state of C differs from that of A and B.

On Tue, Jul 28, 2015 at 12:40 PM, Robert Coli wrote:
> On Tue, Jul 28, 2015 at 1:31 AM, Yang wrote:
>
>> I'm wondering how the Cassandra protocol brings a newly bootstrapped node
>> "up to speed".
>
> Bootstrapping nodes get "extra" replicated copies of data for the range
> they are joining.
>
> So if before the bootstrap the nodes responsible for key "X" are:
>
> A B D
>
> and you add node C "between" B and D which takes over a sub-set of their
> replicas, writes go to the set A, B, C, D for the duration.
>
> =Rob
Re: question about bootstrapping sequence
On Tue, Jul 28, 2015 at 1:01 PM, Yang wrote:
> Thanks, but I don't think having more nodes in the example changes the
> issue I outlined.
>
> Say you have just key "X", RF=3, and nodes A, B, D are responsible for "X".
>
> In steady state, the updates X=1, 2, 3 go to all 3 servers.
>
> Then at this point node C joins, bootstraps, and gets the sstables from B.
> But on B, ***right after memtableswitch()***, updates X=4, 5, 6 arrive and
> update the new memtable (the same updates also go to A and D). Then B
> continues to stream to C, and C gets its state to X=3.

You appear to be missing the point in my original mail: the memtable switch is irrelevant, because C is receiving the same writes into memtables that B is.

They're not counted for the purposes of consistency, but they are otherwise received just as if C were an actual replica.

Bootstrapping is two parts:

1) streaming of sstables
2) "extra" replication

Your mental model appears to ignore 2), which is why you care about what was flushed. Perhaps I am still misunderstanding the scenario you are describing?

=Rob
Re: question about bootstrapping sequence
Thanks. Hmmm, somehow I had the impression that until B's streaming-in finished, it does not advertise itself to other servers for receiving fresh replications. Looks like I'm wrong here, let me check the code...

On Jul 28, 2015 2:07 PM, "Robert Coli" wrote:
> On Tue, Jul 28, 2015 at 1:01 PM, Yang wrote:
>
>> Thanks, but I don't think having more nodes in the example changes the
>> issue I outlined.
>>
>> Say you have just key "X", RF=3, and nodes A, B, D are responsible for "X".
>>
>> In steady state, the updates X=1, 2, 3 go to all 3 servers.
>>
>> Then at this point node C joins, bootstraps, and gets the sstables from B.
>> But on B, ***right after memtableswitch()***, updates X=4, 5, 6 arrive and
>> update the new memtable (the same updates also go to A and D). Then B
>> continues to stream to C, and C gets its state to X=3.
>
> You appear to be missing the point in my original mail: the memtable
> switch is irrelevant, because C is receiving the same writes into memtables
> that B is.
>
> They're not counted for the purposes of consistency, but they are
> otherwise received just as if C were an actual replica.
>
> Bootstrapping is two parts:
>
> 1) streaming of sstables
> 2) "extra" replication
>
> Your mental model appears to ignore 2), which is why you care about what
> was flushed. Perhaps I am still misunderstanding the scenario you are
> describing?
>
> =Rob
Re: Cassandra WriteTimeoutException
Are you using lightweight transactions anywhere?

On Wed, Jul 15, 2015 at 7:40 AM, Michael Shuler wrote:
> On 07/15/2015 02:28 AM, Amlan Roy wrote:
>
>> Hi,
>>
>> I get the following error intermittently while writing to Cassandra.
>> I am using version 2.1.7. Not sure how to fix the actual issue
>> without increasing the timeout in cassandra.yaml.
>
> Post your data model, query, and maybe some cluster config basics for
> better help. Increasing the timeout is never a great answer.
>
> --
> Kind regards,
> Michael
Re: Reduced write performance when reading
Increase memtable_flush_writers. In cassandra.yaml it is recommended to increase this setting when SSDs are used for storing data.

On Fri, Jul 24, 2015 at 1:55 PM, Soerian Lieve wrote:
> I was on CFQ so I changed it to noop. The problem still persisted however.
> Do you have any other ideas?
>
> On Thu, Jul 23, 2015 at 5:00 PM, Jeff Ferland wrote:
>
>> Imbalanced disk use is ok in itself. It's only saturated throughput
>> that's harmful. RAID 0 does give more consistent throughput and balancing,
>> but that's another story.
>>
>> As for your situation with SSD drives, you can probably tweak this by
>> changing the scheduler to noop, or read up on
>> https://www.kernel.org/doc/Documentation/block/deadline-iosched.txt for
>> the deadline scheduler (lower writes_starved value). If you're on CFQ,
>> definitely ditch it.
>>
>> -Jeff
>>
>> On Jul 23, 2015, at 4:17 PM, Soerian Lieve wrote:
>>
>> I set up RAID0 after experiencing highly imbalanced disk usage with a
>> JBOD setup, so my transaction logs are indeed on the same media as the
>> sstables.
>> Is there any alternative to setting up RAID0 that doesn't have this issue?
>>
>> On Thu, Jul 23, 2015 at 4:03 PM, Jeff Ferland wrote:
>>
>>> My immediate guess: your transaction logs are on the same media as your
>>> sstables and your OS prioritizes read requests.
>>>
>>> -Jeff
>>>
>>> > On Jul 23, 2015, at 2:51 PM, Soerian Lieve wrote:
>>> >
>>> > Hi,
>>> >
>>> > I am currently performing benchmarks on Cassandra. Independently from
>>> > each other I am seeing ~100k writes/sec and ~50k reads/sec. When I read and
>>> > write at the same time, writing drops down to ~1000 writes/sec and reading
>>> > stays roughly the same.
>>> >
>>> > The heap used is the same as when only reading, as is the disk
>>> > utilization. Replication factor is 3, consistency level on both reads and
>>> > writes is ONE. Using Cassandra 2.1.6. All cassandra.yaml settings set up
>>> > according to the Datastax guide. All nodes are running on SSDs.
>>> >
>>> > Any ideas what could cause this?
>>> >
>>> > Thanks,
>>> > Soerian
Re: Reduced write performance when reading
I did already set that to the number of cores of the machines (24), but it made no difference.

On Tue, Jul 28, 2015 at 4:44 PM, Bharatendra Boddu wrote:
> Increase memtable_flush_writers. In cassandra.yaml it is recommended to
> increase this setting when SSDs are used for storing data.
>
> On Fri, Jul 24, 2015 at 1:55 PM, Soerian Lieve wrote:
>
>> I was on CFQ so I changed it to noop. The problem still persisted
>> however. Do you have any other ideas?
>>
>> On Thu, Jul 23, 2015 at 5:00 PM, Jeff Ferland wrote:
>>
>>> Imbalanced disk use is ok in itself. It's only saturated throughput
>>> that's harmful. RAID 0 does give more consistent throughput and balancing,
>>> but that's another story.
>>>
>>> As for your situation with SSD drives, you can probably tweak this by
>>> changing the scheduler to noop, or read up on
>>> https://www.kernel.org/doc/Documentation/block/deadline-iosched.txt for
>>> the deadline scheduler (lower writes_starved value). If you're on CFQ,
>>> definitely ditch it.
>>>
>>> -Jeff
>>>
>>> On Jul 23, 2015, at 4:17 PM, Soerian Lieve wrote:
>>>
>>> I set up RAID0 after experiencing highly imbalanced disk usage with a
>>> JBOD setup, so my transaction logs are indeed on the same media as the
>>> sstables.
>>> Is there any alternative to setting up RAID0 that doesn't have this
>>> issue?
>>>
>>> On Thu, Jul 23, 2015 at 4:03 PM, Jeff Ferland wrote:
>>>
>>>> My immediate guess: your transaction logs are on the same media as your
>>>> sstables and your OS prioritizes read requests.
>>>>
>>>> -Jeff
>>>>
>>>> > On Jul 23, 2015, at 2:51 PM, Soerian Lieve wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I am currently performing benchmarks on Cassandra. Independently from
>>>> > each other I am seeing ~100k writes/sec and ~50k reads/sec. When I read and
>>>> > write at the same time, writing drops down to ~1000 writes/sec and reading
>>>> > stays roughly the same.
>>>> >
>>>> > The heap used is the same as when only reading, as is the disk
>>>> > utilization. Replication factor is 3, consistency level on both reads and
>>>> > writes is ONE. Using Cassandra 2.1.6. All cassandra.yaml settings set up
>>>> > according to the Datastax guide. All nodes are running on SSDs.
>>>> >
>>>> > Any ideas what could cause this?
>>>> >
>>>> > Thanks,
>>>> > Soerian
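For anyone following along, the knobs discussed in this thread look roughly like this (sda is a placeholder device and the values are illustrative; the deadline tuning follows the kernel doc linked above):

# check and switch the I/O scheduler for the data disk
cat /sys/block/sda/queue/scheduler
echo noop > /sys/block/sda/queue/scheduler

# or stay on deadline and lower writes_starved so writes are serviced sooner
echo deadline > /sys/block/sda/queue/scheduler
echo 1 > /sys/block/sda/queue/iosched/writes_starved

# cassandra.yaml - the comment there recommends raising this when data is on SSDs
memtable_flush_writers: 8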
Re: RE: Manual Indexing With Buckets
Any more thoughts? Anyone?

Thanks
Anuj

Sent from Yahoo Mail on Android

From: "Anuj Wadehra"
Date: Sat, 25 Jul, 2015 at 5:14 pm
Subject: Re: RE: Manual Indexing With Buckets

We are in product development, and batch size depends on the customer base of the customer buying our product. Huge customers may have huge batches while small customers may have much smaller ones. So we don't know upfront how many buckets per batch will be required, and we don't want to ask our customers for additional configuration to input an average batch size. So we are planning to use dynamic bucketing. Every row in the primary table is associated with only one batch.

Comments required on the following:

1. Any suggestions on the proposed design?

2. What's the best approach for updating/deleting from the index table? When a row is manually purged from the primary table, we don't know where that row key exists among the x buckets created for its batch id.

Thanks
Anuj

Sent from Yahoo Mail on Android

From: "sean_r_dur...@homedepot.com"
Date: Fri, 24 Jul, 2015 at 5:39 pm
Subject: RE: Manual Indexing With Buckets

It is a bit hard to follow. Perhaps you could include your proposed schema (annotated with your size predictions) to spur more discussion. To me, it sounds a bit convoluted. Why is a "batch" so big (up to 100 million rows)? Is a row in the primary only associated with one batch?

Sean Durity – Cassandra Admin, Big Data Team
To engage the team, create a request

From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: Friday, July 24, 2015 3:57 AM
To: user@cassandra.apache.org
Subject: Re: Manual Indexing With Buckets

Can anyone take this one?

Thanks
Anuj

Sent from Yahoo Mail on Android

From: "Anuj Wadehra"
Date: Thu, 23 Jul, 2015 at 10:57 pm
Subject: Manual Indexing With Buckets

We have a primary table and we need search capability by the batchid column, so we are creating a manual index for search by batch id. We are using buckets to restrict a row size in the batch id index table to 50mb. As batch size may vary drastically (one batch id may be associated with 100k row keys in the primary table while another may be associated with 100 million row keys), we are creating a metadata table to track the approximate data inserted for a batch in the primary table, so that the batch id index table has a dynamic number of buckets/rows. As more data is inserted for a batch in the primary table, a new set of 10 buckets is added. At any point in time, clients write to the latest 10 buckets created for a batch id index, in round robin, to avoid hotspots.

Comments required on the following:

1. Any suggestions on the above design?

2. What's the best approach for updating/deleting from the index table? When a row is manually purged from the primary table, we don't know where that row key exists among the x buckets created for its batch id.

Thanks
Anuj

Sent from Yahoo Mail on Android
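A rough CQL sketch of the kind of bucketed index table being described, just to make the discussion concrete (names, types, and bucket numbering are illustrative guesses, not the actual schema):

CREATE TABLE batch_id_index (
    batch_id text,
    bucket   int,     -- buckets are grown in sets of 10 as the batch grows
    row_key  text,    -- key of the indexed row in the primary table
    PRIMARY KEY ((batch_id, bucket), row_key)
);

-- writers pick one of the latest 10 buckets in round robin to avoid hotspots
INSERT INTO batch_id_index (batch_id, bucket, row_key)
VALUES ('batch-0042', 17, 'primary-row-key-123');

On question 2, one common option is a small reverse-lookup table (row_key -> batch_id, bucket) written alongside each index entry, so purging a primary row can find and delete its index entry with a single read instead of scanning all x buckets.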
Re: Is there a way to remove a node with Opscenter?
I know this is an old thread, but just FYI for others having the same problem (OpsCenter trying to connect to a node that has already been removed): the solution is to ssh into the OpsCenter node and run `sudo service opscenterd restart`.

On Thu, Jul 9, 2015 at 3:52 PM, Sid Tantia wrote:
> Found my mistake: I was typing the command on the node I was trying to
> remove from the cluster. After trying the command on another node in the
> cluster, it worked (`nodetool status` shows the node as removed), however
> OpsCenter still does not recognize the node as removed.
>
> Any way to fix OpsCenter so that it stops trying to connect to the node
> that is already removed?
>
> On Tue, Jul 7, 2015 at 11:38 PM, Jean Tremblay
> <jean.tremb...@zen-innovations.com> wrote:
>
>> When you do a nodetool command and you don't specify a hostname, it sends
>> the request via JMX to the localhost node. If that node is down then the
>> command will not succeed.
>> In your case you are probably running the command from a machine which has
>> no cassandra running; in that case you need to specify a node with the
>> switch -h.
>>
>> So for you that would be:
>>
>> nodetool -h <hostname> removenode <Host ID>
>>
>> where <hostname> is the address of a server which has the cassandra
>> daemon running.
>>
>> Cheers
>>
>> Jean
>>
>> On 08 Jul 2015, at 01:39, Sid Tantia wrote:
>>
>> I tried both `nodetool removenode <Host ID>` and `nodetool decommission`
>> and they both give the error:
>>
>> nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException:
>> 'Connection refused'.
>>
>> Here is what I have tried to fix this:
>>
>> 1) Uncommented JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"
>> 2) Changed rpc_address to 0.0.0.0
>> 3) Restarted cassandra
>> 4) Restarted datastax-agent
>>
>> (Note that I installed my cluster using opscenter so that may have
>> something to do with it?)
>>
>> On Tue, Jul 7, 2015 at 2:08 PM, Surbhi Gupta wrote:
>>
>>> If the node is down use:
>>>
>>> nodetool removenode <Host ID>
>>>
>>> We have to run the above command when the node is down, and if the cluster
>>> does not use vnodes, adjust the tokens before running the nodetool
>>> removenode command.
>>>
>>> If the node is up, then the command would be "nodetool decommission" to
>>> remove the node.
>>>
>>> Remove the node from the "seed list" within the configuration file
>>> cassandra.yaml.
>>>
>>> On 7 July 2015 at 12:56, Sid Tantia wrote:
>>>
>>>> Thanks for the response. I'm trying to remove a node that's already down
>>>> for some reason, so it's not allowing me to decommission it. Is there some
>>>> other way to do this?
>>>>
>>>> On Tue, Jul 7, 2015 at 12:45 PM, Kiran mk wrote:
>>>>
>>>>> Yes, if your intention is to decommission a node, you can do that
>>>>> by clicking on the node and decommission.
>>>>>
>>>>> Best Regards,
>>>>> Kiran.M.K.
>>>>>
>>>>> On Jul 8, 2015 1:00 AM, "Sid Tantia" wrote:
>>>>>
>>>>>> I know you can use `nodetool removenode` from the command line but
>>>>>> is there a way to remove a node from a cluster using OpsCenter?
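To summarise the thread, the working sequence looks like this (run the nodetool commands from a live node in the cluster, not the node being removed; the host ID and service-manager syntax will differ on your setup):

nodetool status                  # note the Host ID of the down node
nodetool removenode <Host ID>    # run from any live node

# then, on the OpsCenter machine, restart opscenterd so it stops polling the removed node
sudo service opscenterd restart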
Re: Manual Indexing With Buckets
On 07/28/2015 07:54 PM, Anuj Wadehra wrote:
> Any more thoughts? Anyone?

You could help others try to help you by including details, as previously asked:

> *From*: "sean_r_dur...@homedepot.com"
> *Date*: Fri, 24 Jul, 2015 at 5:39 pm
>
> It is a bit hard to follow. Perhaps you could include your proposed schema
> (annotated with your size predictions) to spur more discussion. To me, it
> sounds a bit convoluted. Why is a "batch" so big (up to 100 million rows)?
> Is a row in the primary only associated with one batch?
>
> Sean Durity – Cassandra Admin, Big Data Team

--
Kind regards,
Michael