How to determine RF on the fly?
Hello! Is there an easy way to determine the current RF, for instance, via mx4j? Cheers, Ilya Shipitsin
Re: High performance hardware with a lot of data per node - Global learning about configuration
This comment and some testing were enough for us: "Generally, a value between 128 and 512 here coupled with a large key cache size on CFs results in the best trade offs. This value is not often changed, however if you have many very small rows (many to an OS page), then increasing this will often lower memory usage without an impact on performance."

And indeed, I started using this config on only one node without seeing any performance degradation. Mean read latency was around 4 ms on all the servers, including this one. And I had no more full heap: the heap used now grows slowly from 2.5 GB to 5.5 GB instead of getting stuck between 5.0 GB and 6.5 GB (out of an 8 GB heap). All the graphs I could see while running both configurations (128/512) on different servers were almost the same, except for the heap. So 512 was a lot better in our case.

Hope it will help you, since that was also the purpose of this thread.

Alain

2013/7/9 Mike Heffner
> I'm curious because we are experimenting with a very similar configuration: what basis did you use for expanding the index_interval to that value? Do you have before and after numbers, or was it simply the reduction of heap pressure warnings that you looked for?
>
> thanks,
>
> Mike
>
> On Tue, Jul 9, 2013 at 10:11 AM, Alain RODRIGUEZ wrote:
>> Hi,
>>
>> Using C* 1.2.2.
>>
>> We recently dropped our 18 m1.xlarge (4 CPU, 15 GB RAM, 4 RAID-0 disks) servers to get 3 hi1.4xlarge (16 CPU, 60 GB RAM, 2 RAID-0 SSD) servers instead, for about the same price.
>>
>> We tried it after reading a benchmark published by Netflix.
>>
>> It is awesome and I recommend it to anyone who is using more than 18 xlarge servers or can afford these high-cost / high-performance EC2 instances. The SSDs give very good throughput with awesome latency.
>>
>> Yet, we had about 200 GB of data per server and now have about 1 TB.
>>
>> To alleviate memory pressure inside the heap I had to reduce the index sampling. I changed the index_interval value from 128 to 512, with no visible impact on latency, but a great improvement inside the heap, which doesn't complain about any pressure anymore.
>>
>> Is there some more tuning I could use, more tricks that could be useful while using big servers with a lot of data per node and relatively high throughput?
>>
>> The SSDs are at 20-40% of their throughput capacity (according to OpsCenter), the CPUs almost never reach a load higher than 5 or 6 (out of 16 CPUs), and 15 GB of RAM is used out of 60 GB.
>>
>> At this point I have kept my previous configuration, which is almost the default one from the DataStax community AMI. Here is part of it; you can consider any property not listed here to be at its default:
>>
>> cassandra.yaml
>>
>> key_cache_size_in_mb: (empty, so the default of 100 MB; hit rate between 88% and 92%, good enough?)
>> row_cache_size_in_mb: 0 (not usable in our use case: a lot of different and random reads)
>> flush_largest_memtables_at: 0.80
>> reduce_cache_sizes_at: 0.90
>>
>> concurrent_reads: 32 (I am thinking of increasing this to 64 or more, since I have just a few servers handling more concurrency)
>> concurrent_writes: 32 (I am thinking of increasing this to 64 or more too)
>> memtable_total_space_in_mb: 1024 (to avoid filling the heap; should I use a bigger value, and why?)
>>
>> rpc_server_type: sync (I tried hsha and got the "ERROR 12:02:18,971 Read an invalid frame size of 0. Are you using TFramedTransport on the client side?" error). No idea how to fix this, and I use 5 different clients for different purposes (Hector, Cassie, phpCassa, Astyanax, Helenus)...
>>
>> multithreaded_compaction: false (Should I try enabling this since I now use SSDs?)
>> compaction_throughput_mb_per_sec: 16 (I will definitely raise this to 32 or even more)
>>
>> cross_node_timeout: true
>> endpoint_snitch: Ec2MultiRegionSnitch
>>
>> index_interval: 512
>>
>> cassandra-env.sh
>>
>> I am not sure how to tune the heap, so I mainly use the defaults:
>>
>> MAX_HEAP_SIZE="8G"
>> HEAP_NEWSIZE="400M" (I tried higher values, and they produced longer GC pauses: 1600 ms instead of the < 200 ms I see now with 400M)
>>
>> -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> -XX:+CMSParallelRemarkEnabled
>> -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1
>> -XX:CMSInitiatingOccupancyFraction=70
>> -XX:+UseCMSInitiatingOccupancyOnly
>>
>> Does this configuration seem coherent? Right now performance is correct, with latency < 5 ms almost all the time. What can I do to handle more data per node while keeping this performance, or even improving it?
>>
>> I know this is a long message, but if you have any comment or insight, even on part of it, don't hesitate to share it. I guess this kind of comment on configuration is useful to the entire community.
>>
>> Alain

--
Mike Heffner
Librato, Inc.
Re: Purpose of BLOB datatype
fine, thanks.

On Tue, Jul 9, 2013 at 11:24 PM, Pavel Kirienko <pavel.kirienko.l...@gmail.com> wrote:
> > Do you know any direct ways in CQL to handle BLOB, just like the DataStax Java driver?
>
> Well, the CQL3 specification explicitly says that there is no way to encode a blob into a CQL request other than as a hex string:
> http://cassandra.apache.org/doc/cql3/CQL.html#constants
>
> On Tue, Jul 9, 2013 at 6:40 PM, Ollif Lee wrote:
>> Thank you for your patience. That is what I had expected.
>> PS. Do you know any direct ways in CQL to handle BLOB, just like the DataStax Java driver?
>>
>> On Tue, Jul 9, 2013 at 4:53 PM, Sylvain Lebresne wrote:
>>> > Pls explain why and how.
>>>
>>> Why and how what?
>>>
>>> Not encoding blobs into strings is the "preferred way" because that's obviously more efficient (in speed and space), since you don't do any encoding pass.
>>>
>>> As for how, "use prepared statements" was the "how". The exact lines of code needed to use prepared statements will depend on the client driver you use, and you should check your driver's documentation.
>>>
>>> But, to give you an example, if you use the DataStax Java driver (https://github.com/datastax/java-driver), this might look something like:
>>>
>>> PreparedStatement st = session.prepare("INSERT INTO foo(myKey, myBlob) VALUES (?, ?)");
>>> String myKey = ...;
>>> ByteBuffer myBlob = ...;
>>> session.execute(st.bind(myKey, myBlob));
>>>
>>> --
>>> Sylvain
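For completeness, the hex route works without prepared statements too: CQL3's blob constant is a 0x-prefixed hex literal, per the spec linked above. A minimal sketch with the DataStax Java driver, assuming a hypothetical keyspace and the same foo(myKey, myBlob) table; contact point and names are illustrative:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class BlobHexExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("mykeyspace"); // hypothetical keyspace
        // 0x... is the CQL3 blob constant; every byte must be hex-encoded into the
        // query string, which is why prepared statements are preferred for large blobs.
        session.execute("INSERT INTO foo (myKey, myBlob) VALUES ('k1', 0xcafebabe)");
        cluster.shutdown(); // driver 1.x; newer versions use close()
    }
}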
manually removing sstable
Hi,

I think I remember reading that if you have sstables that you know contain only data whose TTL has expired, it's safe to remove them manually by stopping c*, removing the *-Data.db files and then starting c* up again. Is this correct?

we have a cluster where everything is written with a TTL, and sometimes c* needs to compact over 100 GB of sstables where we know everything has expired, and we'd rather just manually get rid of those.

T#
Re: General doubts about bootstrap
> => Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this imply having a node (with a bigger range: 1/2 of the ring, while the other 2 nodes have 1/4 each, supposing 3 nodes) overloaded? I'm referring to http://wiki.apache.org/cassandra/Operations#Load_balancing

Yes, if you're using a single vnode per server, or are running an older version of Cassandra. For lowest impact, doubling the size of your cluster is recommended so that you can avoid doing moves. Or, if you're on Cassandra 1.2+, you can use vnodes, and you should not typically need to rebalance after bringing a new server online.

On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <rodrigofelixdealme...@gmail.com> wrote:
> Thank you very much for your response. My comments about your email follow.
>
> Att.
>
> *Rodrigo Felix de Almeida*
> LSBD - Universidade Federal do Ceará
> Project Manager
> MBA, CSM, CSPO, SCJP
>
> On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli wrote:
>> On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix wrote:
>>> - Is it normal to take about 9 minutes to add a new node? The log generated by a script to add a new node follows.
>> Sure. => OK
>>> - Is there a way to reduce the time to start cassandra?
>> Not usually. => OK
>>> - Sometimes the cleanup operation takes many minutes (about 10). Is this normal, since the amount of data is small (1.7 GB at maximum per seed)?
>> Compaction is throttled, and cleanup is a type of compaction. Bootstrap is also throttled via the streaming throttle. => OK
>>> - Considering that I have two seeds in the beginning, their tokens are 0 and 85070591730234615865843651857942052864. When I add a new machine, do I need to execute move and cleanup on both seeds? Nowadays, I'm running cleanup on seed 0, move + cleanup on the other seed, and neither move nor cleanup on the just-added node. Is this OK?
>> Only nodes which have "lost" ranges need to run cleanup. In general you should add new nodes "between" other nodes such that "move" is not required at all.
>
> => Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this imply having a node (with a bigger range: 1/2 of the ring, while the other 2 nodes have 1/4 each, supposing 3 nodes) overloaded? I'm referring to http://wiki.apache.org/cassandra/Operations#Load_balancing
>
>>> - What if I do not run cleanup on any existing node when adding or removing a node? Is the data that was not "cleaned up" still available if I send a scan, for instance, and the scan range is still in the node but wouldn't be there if I had run cleanup? Would data be gathered from another node, i.e. the one that properly owns the range specified in the scan query?
>> If data for range [x] is on node [a] but node [a] is no longer considered an endpoint for range [x], it will never receive a request to serve range [x]. => OK
>>> - After decommissioning a node, is it advisable to run cleanup on the remaining nodes? Are the consequences of not running it the same as when adding a node?
>> Cleanup is only for the node which lost a range. In the decommission case, no live nodes lost a range; only some nodes gained one. => OK
>>
>> =Rob
Re: manually removing sstable
Theo,

We have several CFs where we TTL all columns, set gc_grace=0, and never overwrite or delete records. We manually remove sstables from disk during a rolling C* restart process. You'll also want to remove all index/filter files associated with the sstable; so if foo-hf-123-Data.db is past its TTL, ensure you remove all foo-hf-123-*. I recommend taking a snapshot beforehand to be safe. ;-)

Mike

On Wed, Jul 10, 2013 at 8:09 AM, Theo Hultberg wrote:
> Hi,
>
> I think I remember reading that if you have sstables that you know contain only data whose TTL has expired, it's safe to remove them manually by stopping c*, removing the *-Data.db files and then starting c* up again. Is this correct?
>
> we have a cluster where everything is written with a TTL, and sometimes c* needs to compact over 100 GB of sstables where we know everything has expired, and we'd rather just manually get rid of those.
>
> T#

--
Mike Heffner
Librato, Inc.
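To make the "remove every component" step concrete, here is a minimal sketch (not anyone's actual tooling). It assumes the node is stopped, a snapshot was taken beforehand, the 1.x on-disk layout of <cf>-<version>-<generation>-<Component>.db, and a hypothetical data directory and generation known to be fully expired:

import java.io.File;

public class RemoveExpiredSSTable {
    public static void main(String[] args) {
        // Hypothetical CF data directory; the node must be stopped first.
        File cfDir = new File("/var/lib/cassandra/data/mykeyspace/foo");
        File[] files = cfDir.listFiles();
        if (files == null) {
            System.err.println("no such directory: " + cfDir);
            return;
        }
        for (File f : files) {
            // Remove every component of generation 123 (Data, Index, Filter,
            // Statistics, ...), not just the -Data.db file.
            if (f.getName().startsWith("foo-hf-123-") && !f.delete()) {
                System.err.println("failed to remove " + f);
            }
        }
    }
}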
Re: manually removing sstable
On Wed, Jul 10, 2013 at 5:09 AM, Theo Hultberg wrote:
> I think I remember reading that if you have sstables that you know contain only data whose TTL has expired, it's safe to remove them manually by stopping c*, removing the *-Data.db files and then starting c* up again. Is this correct?

Yes.

> we have a cluster where everything is written with a TTL, and sometimes c* needs to compact over 100 GB of sstables where we know everything has expired, and we'd rather just manually get rid of those.

Have you considered TRUNCATE-oriented approaches to this problem? I believe that TRUNCATE-based approaches (with proper handling/purging of snapshots) have potential for cases where 100% of the data in a given time window becomes worthless.

=Rob
Quorum reads and response time
I have a 3 node cluster with RF=3. All nodes are running. I have a table with 39 rows and ~44,000 columns evenly spread across those 39 rows.

When I do a range slice query on this table with a consistency level of ONE, it returns the data in about ~600 ms. I tried the same from all 3 nodes; no matter which node I ran it from, queries were answered in 600 ms at consistency level ONE.

But when I run the same query with consistency level QUORUM, it takes ~2.3 seconds. It feels as if the nodes are queried in sequence.

Is this normal?

--
Regards,
Baskar Duraikannu
Re: Quorum reads and response time
Just adding a few other details to my question.

- We are using RandomPartitioner
- 256 virtual nodes configured

On Wed, Jul 10, 2013 at 12:54 PM, Baskar Duraikannu <baskar.duraikannu...@gmail.com> wrote:
> I have a 3 node cluster with RF=3. All nodes are running. I have a table with 39 rows and ~44,000 columns evenly spread across those 39 rows.
>
> When I do a range slice query on this table with a consistency level of ONE, it returns the data in about ~600 ms. I tried the same from all 3 nodes; no matter which node I ran it from, queries were answered in 600 ms at consistency level ONE.
>
> But when I run the same query with consistency level QUORUM, it takes ~2.3 seconds. It feels as if the nodes are queried in sequence.
>
> Is this normal?
>
> --
> Regards,
> Baskar Duraikannu
Re: Node tokens / data move
I copied the sstables and then ran a repair. It worked. Looks like export and import may have been much faster given that we had very little data. Thanks everyone. On Tue, Jul 9, 2013 at 1:34 PM, sankalp kohli wrote: > Hi Aaron, > Can he not specify all 256 tokens in the YAML of the new > cluster and then copy sstables? > I know it is a bit ugly but should work. > > Sankalp > > > On Tue, Jul 9, 2013 at 3:19 AM, Baskar Duraikannu < > baskar.duraikannu...@gmail.com> wrote: > >> Thanks Aaron >> >> On 7/9/13, aaron morton wrote: >> >> Can I just copy data files for the required keyspaces, create schema >> >> manually and run repair? >> > If you have something like RF 3 and 3 nodes then yes, you can copy the >> data >> > from one node in the source cluster to all nodes in the dest cluster >> and use >> > cleanup to remove the unneeded data. Because each node in the source >> cluster >> > has a full copy of the data. >> > >> > If that's not the case you cannot copy the data files, even if they >> have the >> > same number of nodes, because the nodes in the dest cluster will have >> > different tokens. AFAIK you need to export the full data set from the >> source >> > DC and then import it into the dest system. >> > >> > The Bulk Load utility may be of help >> > http://www.datastax.com/docs/1.2/references/bulkloader . You could >> copy the >> > SSTables from every node in the source system and bulk load them into >> the >> > dest system. That process will ensure rows are sent to nodes that are >> > replicas. >> > >> > Cheers >> > >> > - >> > Aaron Morton >> > Freelance Cassandra Consultant >> > New Zealand >> > >> > @aaronmorton >> > http://www.thelastpickle.com >> > >> > On 9/07/2013, at 12:45 PM, Baskar Duraikannu >> > wrote: >> > >> >> We have two clusters used by two different groups with vnodes enabled. >> Now >> >> there is a need to move some of the keyspaces from cluster 1 to >> cluster 2. >> >> >> >> >> >> Can I just copy data files for the required keyspaces, create schema >> >> manually and run repair? >> >> >> >> Anything else required? Please help. >> >> -- >> >> Thanks, >> >> Baskar Duraikannu >> > >> > >> > >
JMX Latency stats
I was wondering if anyone knows the difference between the JMX latency stats and could enlighten me.

We've been looking at the column-family-specific stats and see really lovely < 3ms 99th percentile stats for all our families:
org.apache.cassandra.metrics:type=ColumnFamily,keyspace=mykeyspace,scope=mycolumnfamily,name=ReadLatency

Now, when we look at the overall client request read latency stats, we see a far more inconsistent, jagged 99th percentile flying between 5ms - 80ms:
org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency

Thanks

Chris
Re: manually removing sstable
Yep, that works; you need to remove all components of the sstable though, not just -Data.db.

And, in 2.0 there is this: https://issues.apache.org/jira/browse/CASSANDRA-5228

/Marcus

On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg wrote:
> Hi,
>
> I think I remember reading that if you have sstables that you know contain only data whose TTL has expired, it's safe to remove them manually by stopping c*, removing the *-Data.db files and then starting c* up again. Is this correct?
>
> we have a cluster where everything is written with a TTL, and sometimes c* needs to compact over 100 GB of sstables where we know everything has expired, and we'd rather just manually get rid of those.
>
> T#
Re: JMX Latency stats
The column-family-specific numbers report latencies local to the node: a write/read that has reached the correct replica and just needs to hit memory/disk.

The non-column-family-specific numbers report latencies from the coordinator: the latency from the time the coordinator receives a write/read request, contacts the right replica(s), receives an internal response, and responds to the client.

On Wed, Jul 10, 2013 at 12:27 PM, Christopher Wirt wrote:
> I was wondering if anyone knows the difference between the JMX latency stats and could enlighten me.
>
> We've been looking at the column-family-specific stats and see really lovely < 3ms 99th percentile stats for all our families:
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=mykeyspace,scope=mycolumnfamily,name=ReadLatency
>
> Now, when we look at the overall client request read latency stats, we see a far more inconsistent, jagged 99th percentile flying between 5ms - 80ms:
> org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
>
> Thanks
>
> Chris
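To compare the two side by side, here is a sketch that reads both 99th percentiles over JMX. It assumes the default JMX port (7199), no JMX authentication, and the percentile attribute names exposed by the 1.2-era metrics JMX reporter; keyspace and CF names are the illustrative ones from the question:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LatencyProbe {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // Local (replica-side) read latency for one CF.
            ObjectName cf = new ObjectName("org.apache.cassandra.metrics:"
                    + "type=ColumnFamily,keyspace=mykeyspace,scope=mycolumnfamily,name=ReadLatency");
            // Coordinator-side read latency for all reads this node coordinates.
            ObjectName client = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
            System.out.println("CF p99:          " + mbs.getAttribute(cf, "99thPercentile"));
            System.out.println("coordinator p99: " + mbs.getAttribute(client, "99thPercentile"));
        } finally {
            jmxc.close();
        }
    }
}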
Cassandra performance tuning...
Hi All,

I am trying to compare Cassandra to a relational database. I am getting around 2-3 ms response time using the DataStax driver on a Java 1.7.0_05 64-bit JRE, while the other database is under 500 microseconds for the JDBC SQL PreparedStatement execute.

One of the major differences is that Cassandra uses text for the default primary key in the column family, while in the SQL table I use int, which is faster. Can the primary column family key data type be changed to an int? I also know Cassandra uses varint for IntegerType, and I am not sure that is what I need, but I will try it if I can change the "key" column to that. If I try Int32Type for the primary key, I suspect I will need to reload the data after that change.

I have looked at the default Java options in the Cassandra bat file and they seem a good starting point, but I am just starting to tune now that I can get column family caching to work.

Regards,
-Tony
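A hedged sketch of declaring an int partition key up front in CQL3 with the DataStax Java driver; the keyspace, table, and contact point are all hypothetical, and, as suspected above, switching an existing text key to Int32Type would mean reloading the data rather than altering it in place:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class IntKeyExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // Run once; CREATE KEYSPACE/TABLE fail if the objects already exist.
        session.execute("CREATE KEYSPACE demo WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
        // An int partition key instead of the default text key.
        session.execute("CREATE TABLE demo.users (key int PRIMARY KEY, name text)");
        session.execute("INSERT INTO demo.users (key, name) VALUES (42, 'tony')");
        // Prepared statements bind the int natively; no text conversion involved.
        PreparedStatement ps = session.prepare("SELECT name FROM demo.users WHERE key = ?");
        Row row = session.execute(ps.bind(42)).one();
        System.out.println(row == null ? "no row" : row.getString("name"));
        cluster.shutdown(); // driver 1.x; newer versions use close()
    }
}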
Re: General doubts about bootstrap
Currently, I'm using Cassandra 1.1.5, but I'm considering updating to 1.2.x in order to make use of vnodes. Doubling the size is not possible for me because I want to measure the response while adding (or removing) single nodes. Thank you guys; it helped me a lot to better understand how Cassandra works.

Att.

*Rodrigo Felix de Almeida*
LSBD - Universidade Federal do Ceará
Project Manager
MBA, CSM, CSPO, SCJP

On Wed, Jul 10, 2013 at 11:11 AM, Eric Stevens wrote:
>> => Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this imply having a node (with a bigger range: 1/2 of the ring, while the other 2 nodes have 1/4 each, supposing 3 nodes) overloaded? I'm referring to http://wiki.apache.org/cassandra/Operations#Load_balancing
>
> Yes, if you're using a single vnode per server, or are running an older version of Cassandra. For lowest impact, doubling the size of your cluster is recommended so that you can avoid doing moves. Or, if you're on Cassandra 1.2+, you can use vnodes, and you should not typically need to rebalance after bringing a new server online.
>
> On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <rodrigofelixdealme...@gmail.com> wrote:
>> Thank you very much for your response. My comments about your email follow.
>>
>> Att.
>>
>> *Rodrigo Felix de Almeida*
>> LSBD - Universidade Federal do Ceará
>> Project Manager
>> MBA, CSM, CSPO, SCJP
>>
>> On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli wrote:
>>> On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix wrote:
>>>> - Is it normal to take about 9 minutes to add a new node? The log generated by a script to add a new node follows.
>>> Sure. => OK
>>>> - Is there a way to reduce the time to start cassandra?
>>> Not usually. => OK
>>>> - Sometimes the cleanup operation takes many minutes (about 10). Is this normal, since the amount of data is small (1.7 GB at maximum per seed)?
>>> Compaction is throttled, and cleanup is a type of compaction. Bootstrap is also throttled via the streaming throttle. => OK
>>>> - Considering that I have two seeds in the beginning, their tokens are 0 and 85070591730234615865843651857942052864. When I add a new machine, do I need to execute move and cleanup on both seeds? Nowadays, I'm running cleanup on seed 0, move + cleanup on the other seed, and neither move nor cleanup on the just-added node. Is this OK?
>>> Only nodes which have "lost" ranges need to run cleanup. In general you should add new nodes "between" other nodes such that "move" is not required at all.
>>
>> => Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this imply having a node (with a bigger range: 1/2 of the ring, while the other 2 nodes have 1/4 each, supposing 3 nodes) overloaded? I'm referring to http://wiki.apache.org/cassandra/Operations#Load_balancing
>>
>>>> - What if I do not run cleanup on any existing node when adding or removing a node? Is the data that was not "cleaned up" still available if I send a scan, for instance, and the scan range is still in the node but wouldn't be there if I had run cleanup? Would data be gathered from another node, i.e. the one that properly owns the range specified in the scan query?
>>> If data for range [x] is on node [a] but node [a] is no longer considered an endpoint for range [x], it will never receive a request to serve range [x]. => OK
>>>> - After decommissioning a node, is it advisable to run cleanup on the remaining nodes? Are the consequences of not running it the same as when adding a node?
>>> Cleanup is only for the node which lost a range. In the decommission case, no live nodes lost a range; only some nodes gained one. => OK
>>>
>>> =Rob
Re: How to determine RF on the fly?
On Wed, Jul 10, 2013 at 12:58 AM, Илья Шипицин wrote:
> is there an easy way to determine the current RF, for instance, via mx4j?

The methods which show the keyspace or schema (from the CLI or cqlsh) show the replication factor, as the replication factor is a keyspace property. I don't believe it's available via JMX, but there's no reason it couldn't be...

=Rob
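In the meantime, the schema tables can be queried programmatically. A sketch against the 1.2-era system tables (hedged: the column layout is version-specific), here using the DataStax Java driver; the keyspace name is illustrative:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ShowReplication {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // In 1.2, strategy_options is a text column holding a JSON-style map that
        // includes replication_factor (or per-DC factors for NetworkTopologyStrategy).
        Row row = session.execute("SELECT strategy_class, strategy_options "
                + "FROM system.schema_keyspaces WHERE keyspace_name = 'mykeyspace'").one();
        if (row != null) {
            System.out.println(row.getString("strategy_class") + " "
                    + row.getString("strategy_options"));
        }
        cluster.shutdown();
    }
}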
Re: Quorum reads and response time
The coordinator node has to merge the results from 2 nodes, and the requests are done in parallel. I have seen a lot of GC pressure with range queries because of tombstones. Can you check the logs to see whether there is a lot of GC going on? Also try enabling the GC log.

On Wed, Jul 10, 2013 at 9:57 AM, Baskar Duraikannu <baskar.duraikannu...@gmail.com> wrote:
> Just adding a few other details to my question.
>
> - We are using RandomPartitioner
> - 256 virtual nodes configured
>
> On Wed, Jul 10, 2013 at 12:54 PM, Baskar Duraikannu <baskar.duraikannu...@gmail.com> wrote:
>> I have a 3 node cluster with RF=3. All nodes are running. I have a table with 39 rows and ~44,000 columns evenly spread across those 39 rows.
>>
>> When I do a range slice query on this table with a consistency level of ONE, it returns the data in about ~600 ms. I tried the same from all 3 nodes; no matter which node I ran it from, queries were answered in 600 ms at consistency level ONE.
>>
>> But when I run the same query with consistency level QUORUM, it takes ~2.3 seconds. It feels as if the nodes are queried in sequence.
>>
>> Is this normal?
>>
>> --
>> Regards,
>> Baskar Duraikannu
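If GC logging is not already on, a sketch of the cassandra-env.sh additions (these mirror the commented-out GC logging block shipped with 1.2; the log path is an assumption):

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"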
nodetool ring displays 33.33% owns on 3 node cluster with replication
Hello,

I recently upgraded Cassandra from 1.1.9 to 1.2.6 on a three node cluster with {replication_factor : 3}. When I run nodetool ring, I see 'Owns' now reports 33.33%. Previously it reported 100.00% on each node. The following snapshots are from two different clusters, so please ignore the Load diffs. I did verify {replication_factor : 3} on both clusters.

1.1.9-xobni1 'nodetool -h 127.0.0.1 -p 8080 ring':

Address       DC  Rack  Status  State   Load       Effective-Ownership  Token
                                                                        170141183460469231731687303715884105728
Xxx.xx.xx.00  16  96    Up      Normal  225.03 GB  100.00%              56713727820156410577229101238628035242
Xxx.xx.xx.01  16  97    Up      Normal  226.43 GB  100.00%              113427455640312821154458202477256070484
Xxx.xx.xx.02  16  97    Up      Normal  231.76 GB  100.00%              170141183460469231731687303715884105728

1.2.6-xobni1 'nodetool -h 127.0.0.1 -p 8080 ring':

Address       Rack  Status  State   Load       Owns    Token
                                                       170141183460469231731687303715884105728
Xxx.xx.xx.00  97    Up      Normal  453.94 GB  33.33%  56713727820156410577229101238628035242
Xxx.xx.xx.01  97    Up      Normal  565.87 GB  33.33%  113427455640312821154458202477256070484
Xxx.xx.xx.02  96    Up      Normal  523.53 GB  33.33%  170141183460469231731687303715884105728

Is this simply a display issue, or have I lost replication? Thanks for any info.

Cheers,

~Jason
Re: nodetool ring displays 33.33% owns on 3 node cluster with replication
On Wed, Jul 10, 2013 at 4:04 PM, Jason Tyler wrote:
> Is this simply a display issue, or have I lost replication?

Almost certainly just a display issue. Do "nodetool -h localhost getendpoints <keyspace> <cf> 0", which will tell you the endpoints for the non-transformed key "0". It should give you 3 endpoints. You could also do this test with a known existing key and then go to those nodes and verify that they have that data on disk via sstable2json.

(FWIW, it is an odd display issue/bug if it is one, because it has reverted to pre-1.1 behavior...)

=Rob
Re: Working with libcql
The highlighted line will read all the rows from the system table that lists the keyspaces in the cluster.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/07/2013, at 9:46 PM, Shubham Mittal wrote:
> yeah I tried that and below is the output I get
>
> LOG: resolving remote host localhost:9160
> LOG: resolved remote host, attempting to connect
> LOG: connection successful to remote host
> LOG: sending message: 0x0105 {version: 0x01, flags: 0x00, stream: 0x00, opcode: 0x05, length: 0} OPTIONS
> LOG: wrote to socket 8 bytes
> LOG: error reading header End of file
>
> and I checked all the keyspaces in my cluster; it changes nothing in the cluster.
>
> I couldn't understand the code much. What is this code supposed to do anyway?
>
> On Tue, Jul 9, 2013 at 4:20 AM, aaron morton wrote:
>> Did you see the demo app? It seems to have a few examples of reading data.
>>
>> https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85
>>
>> Cheers
>>
>> On 9/07/2013, at 1:14 AM, Shubham Mittal wrote:
>>> Hi,
>>>
>>> I found out that there exists a C++ client, libcql, for Cassandra, but its GitHub repository just provides an example of how to connect to Cassandra. If anyone has written code using libcql to read and write data to a Cassandra DB, kindly share it.
>>>
>>> Thanks
temporarily running a cassandra side by side in production
We have a 12 node production cluster and a 4 node QA cluster. We are starting to think we are going to try to run a side-by-side Cassandra instance in production while we map/reduce from one Cassandra into the new instance.

We intend to do something like this: modify all the ports, i.e. 7000, 7001, 9160 and 9042 in cassandra.yaml, and the JMX port 7199 in cassandra-env.sh. Can I assume that a Cassandra instance will not only bind to the new ports when I change these values, but will also talk to the other Cassandra nodes on those same ports, such that this Cassandra instance is completely independent of my other Cassandra instance?

Are there other gotchas that I have to be aware of? (We are refactoring our model into a new, faster model that we tested in QA with live data, as well as moving from RandomPartitioner to Murmur3Partitioner.)

Thanks,
Dean
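On the port question: storage_port is assumed uniform across a cluster, so each instance makes its outbound inter-node connections on its own configured port; two instances with disjoint ports and disjoint seed lists should form independent clusters. A sketch of the overrides, with illustrative values; note that the data, commitlog, and saved_caches directories must be disjoint too, or the two instances will trample each other's files:

# cassandra.yaml overrides for the second instance (illustrative values)
storage_port: 7010            # inter-node traffic, default 7000
ssl_storage_port: 7011        # default 7001
rpc_port: 9170                # Thrift, default 9160
native_transport_port: 9052   # CQL binary protocol, default 9042
data_file_directories:
    - /var/lib/cassandra2/data
commitlog_directory: /var/lib/cassandra2/commitlog
saved_caches_directory: /var/lib/cassandra2/saved_caches

# cassandra-env.sh override (default 7199)
JMX_PORT="7299"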
Re: Decommissioned nodes not leaving and Hinted Handoff flood
Thanks for sharing; here is some more information...

> 1 - At first, one of my nodes went down for 5 min, and when it came back it got flooded by hinted handoff so hard that it could not handle the real-time queries properly. I haven't found a way to prioritize app queries over hinted handoff.

You can disable hint delivery with nodetool pausehandoff, or reduce the hint throughput: https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L50

> 2 - Nodes keep hints for a node that has been removed.

The hints are stored with a TTL that is the gc_grace_seconds for the CF at the time the hint is written, so they will eventually be purged by compaction. You can also delete the hints using the Hinted Handoff bean: https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/db/HintedHandOffManagerMBean.java#L30

> 3 - Nodes with 500 MB to 3 GB of hints stored for a removed node can't be decommissioned; they get stuck after streaming their data.

The hints KS is defined using LocalStrategy and so is not replicated. The hints should not be involved in streaming.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 12:47 AM, Alain RODRIGUEZ wrote:
> Hi,
>
> C* 1.2.2.
>
> I have removed 4 nodes with "nodetool decommission". 2 of them left with no issue, while the 2 other nodes remained "leaving" even after streaming their data.
>
> The only specific thing about these 2 nodes is that they had a lot of hints pending: hints for a node that couldn't come back and that I removed earlier (because of the heavy load induced by hinted handoff while coming back, which induced a lot of latency in our app; this node didn't manage to come back after 10 minutes, so I removed it).
>
> So there I faced 3 bugs (or problems):
>
> 1 - At first, one of my nodes went down for 5 min, and when it came back it got flooded by hinted handoff so hard that it could not handle the real-time queries properly. I haven't found a way to prioritize app queries over hinted handoff.
> 2 - Nodes keep hints for a node that has been removed.
> 3 - Nodes with 500 MB to 3 GB of hints stored for a removed node can't be decommissioned; they get stuck after streaming their data.
>
> As solutions for these 3 issues I did the following:
>
> Solution to 1 - I removed this down node (nodetool removenode)
> Solution to 2 - Stopped the node and removed the system hints
> Solution to 3 - Stopped the node and ran removenode instead of decommission
>
> Now I have no more issues, yet I felt I had to report this. Maybe my experience can help users get out of tricky situations, and help committers detect some issues, especially around hinted handoff.
>
> Alain
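For point 2, deleting the stored hints through that MBean looks roughly like this over JMX; a sketch assuming the default JMX port, no authentication, and a hypothetical endpoint address:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DropHints {
    public static void main(String[] args) throws Exception {
        JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi"));
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName hhm = new ObjectName("org.apache.cassandra.db:type=HintedHandoffManager");
            // Drop all hints stored for the removed node (address is hypothetical).
            mbs.invoke(hhm, "deleteHintsForEndpoint",
                    new Object[] { "10.0.0.12" }, new String[] { "java.lang.String" });
        } finally {
            jmxc.close();
        }
    }
}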
Re: Trying to write when at cassandra capacity
> It hits an OOM.

To add a little more colour, because I stepped through this with someone the other day:

When memtables are not removed from the memtable flush queue (because they have not been written), the queue will fill up. When this happens, the flush process will block trying to fill the queue, and will hold the internal switch lock used to synchronise around the commit log. This will prevent write threads from progressing. All the while, writes will continue to be delivered to the node and the Mutation thread pool queue will fill.

All of this results in extreme memory pressure, and the JVM will spend a lot of time running GC to try and free some space. While all the GC is going on, chances are the other nodes will see the failing node as flapping, as it fails to keep up with gossip. None of this will work, and eventually the JVM will raise an OOM error that is normally trapped and results in the node trying to shut down. During the shutdown process it will try to disable the rpc / native transports and gossip.

It's a simple thing to test, and a useful example to walk through (by looking at the logs) with an ops team if they are just starting out.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 5:36 AM, Robert Coli wrote:
> On Mon, Jul 8, 2013 at 5:58 PM, Faraaz Sareshwala wrote:
>> What does cassandra do when it is at its data capacity (disk drives and memtables are full) and writes continue to pour in? My intuition says that cassandra won't be able to handle the new writes (they will either get silently dropped or cassandra will hit an OOM; does anyone know which one?). The sstables on disk won't magically disappear, so cassandra won't be able to service the write requests.
>
> It hits an OOM.
>
> =Rob
Re: Logging Cassandra Reads/Writes
Some info on request tracing: http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

> 1) Is it possible to log which node provides the real data in a read operation?

It's available at the DEBUG level of logging. You probably just want to enable it on the org.apache.cassandra.service.StorageProxy class; see log4j-server.properties for info.

> 2) Also, is it possible to log the different delays involved in each operation, for example 0.1 seconds to get digests from all nodes, 1 second to transfer data, etc.?

Not applicable: as you've seen, we send the request to all replicas at the same time. There is more logging that will show when the responses are processed; try turning DEBUG logging on for a small 3 node cluster and send one request.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 8:58 AM, Mohit Anchlia wrote:
> There is a new tracing feature in Cassandra 1.2 that might help you with this.
>
> On Tue, Jul 9, 2013 at 1:31 PM, Blair Zajac wrote:
>> No idea on the logging, I'm pretty new to Cassandra.
>>
>> Regards,
>> Blair
>>
>> On Jul 9, 2013, at 12:50 PM, hajjat wrote:
>>> Blair, thanks for the clarification! My friend actually just told me the same.
>>>
>>> Any idea on how to do logging?
>>>
>>> Thanks!
>>>
>>> --
>>> View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Logging-Cassandra-Reads-Writes-tp7588893p7588896.html
>>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
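For reference, a sketch of the log4j-server.properties line that turns that on for just the one class (remember to revert it afterwards, since the read path is chatty at DEBUG):

log4j.logger.org.apache.cassandra.service.StorageProxy=DEBUG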
Re: Working with libcql
So, if I want to create a keyspace, what do I need to change in that file? On Thu, Jul 11, 2013 at 5:04 AM, aaron morton wrote: > The highlighted line will read all the rows from the system table that > lists the keyspaces in the cluster. > > Cheers > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 9/07/2013, at 9:46 PM, Shubham Mittal wrote: > > yeah I tried that and below is the output I get > > LOG: resolving remote host localhost:9160 > LOG: resolved remote host, attempting to connect > LOG: connection successful to remote host > LOG: sending message: 0x0105 {version: 0x01, flags: 0x00, > stream: 0x00, opcode: 0x05, length: 0} OPTIONS > LOG: wrote to socket 8 bytes > LOG: error reading header End of file > > and I checked all the keyspaces in my cluster, it changes nothing in the > cluster. > > I couldn't understand the code much. What is this code supposed to do > anyways? > > > On Tue, Jul 9, 2013 at 4:20 AM, aaron morton wrote: > >> Did you see the demo app ? >> Seems to have a few examples of reading data. >> >> https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85 >> >> Cheers >> >>- >> Aaron Morton >> Freelance Cassandra Consultant >> New Zealand >> >> @aaronmorton >> http://www.thelastpickle.com >> >> On 9/07/2013, at 1:14 AM, Shubham Mittal wrote: >> >> Hi, >> >> I found out that there exist a C++ client libcql for cassandra but its >> github repository just provides the example on how to connect to cassandra. >> Is there anyone who has written some code using libcql to read and write >> data to a cassandra DB, kindly share it. >> >> Thanks >> >> >> > >
unsubscribe
user@cassandra.apache.org Subject: unsubscribe
Re: Working with libcql
You can replace "USE" statement with create statement and then change use_callback with whatever you want to do next. -- Thanks, Baskar Duraikannu Shubham Mittal wrote: >So, if I want to create a keyspace, what do I need to change in that file? > > > >On Thu, Jul 11, 2013 at 5:04 AM, aaron morton wrote: > >The highlighted line will read all the rows from the system table that lists >the keyspaces in the cluster. > > >Cheers > > >- > >Aaron Morton > >Freelance Cassandra Consultant > >New Zealand > > >@aaronmorton > >http://www.thelastpickle.com > > >On 9/07/2013, at 9:46 PM, Shubham Mittal wrote: > > >yeah I tried that and below is the output I get > > >LOG: resolving remote host localhost:9160 > >LOG: resolved remote host, attempting to connect > >LOG: connection successful to remote host > >LOG: sending message: 0x0105 {version: 0x01, flags: 0x00, stream: >0x00, opcode: 0x05, length: 0} OPTIONS > >LOG: wrote to socket 8 bytes > >LOG: error reading header End of file > > >and I checked all the keyspaces in my cluster, it changes nothing in the >cluster. > > >I couldn't understand the code much. What is this code supposed to do anyways? > > > >On Tue, Jul 9, 2013 at 4:20 AM, aaron morton wrote: > >Did you see the demo app ? > >Seems to have a few examples of reading data. > > >https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85 > > >Cheers > > >- > >Aaron Morton > >Freelance Cassandra Consultant > >New Zealand > > >@aaronmorton > >http://www.thelastpickle.com > > >On 9/07/2013, at 1:14 AM, Shubham Mittal wrote: > > >Hi, > > >I found out that there exist a C++ client libcql for cassandra but its github >repository just provides the example on how to connect to cassandra. Is there >anyone who has written some code using libcql to read and write data to a >cassandra DB, kindly share it. > > >Thanks > > > > >
Re: Working with libcql
On that note, is anyone using this library in production? Can anyone speak to its stability and readiness for use? I only noticed it on the list of cassandra clients a few days ago and haven't heard much talk about it elsewhere. Faraaz On Wed, Jul 10, 2013 at 05:55:55PM -0700, Baskar Duraikannu wrote: > You can replace "USE" statement with create statement and then change > use_callback with whatever you want to do next. > > -- > Thanks, > Baskar Duraikannu > > Shubham Mittal wrote: > > So, if I want to create a keyspace, what do I need to change in that file? > > > On Thu, Jul 11, 2013 at 5:04 AM, aaron morton wrote: > > The highlighted line will read all the rows from the system table that > lists the keyspaces in the cluster. > > Cheers > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 9/07/2013, at 9:46 PM, Shubham Mittal wrote: > > > yeah I tried that and below is the output I get > > LOG: resolving remote host localhost:9160 > LOG: resolved remote host, attempting to connect > LOG: connection successful to remote host > LOG: sending message: 0x0105 {version: 0x01, flags: 0x00, > stream: 0x00, opcode: 0x05, length: 0} OPTIONS > LOG: wrote to socket 8 bytes > LOG: error reading header End of file > > and I checked all the keyspaces in my cluster, it changes nothing in > the cluster. > > I couldn't understand the code much. What is this code supposed to do > anyways? > > > On Tue, Jul 9, 2013 at 4:20 AM, aaron morton > wrote: > > Did you see the demo app ? > Seems to have a few examples of reading data. > > https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85 > > Cheers > > - > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 9/07/2013, at 1:14 AM, Shubham Mittal > wrote: > > > Hi, > > I found out that there exist a C++ client libcql for cassandra > but its github repository just provides the example on how to > connect to cassandra. Is there anyone who has written some > code > using libcql to read and write data to a cassandra DB, kindly > share it. > > Thanks > > > > > > > >
Re: manually removing sstable
Thanks a lot. I can confirm that it solved our problem too. Looks like the C* 2.0 feature is perfect for us.

T#

On Wed, Jul 10, 2013 at 7:28 PM, Marcus Eriksson wrote:
> Yep, that works; you need to remove all components of the sstable though, not just -Data.db.
>
> And, in 2.0 there is this: https://issues.apache.org/jira/browse/CASSANDRA-5228
>
> /Marcus
>
> On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg wrote:
>> Hi,
>>
>> I think I remember reading that if you have sstables that you know contain only data whose TTL has expired, it's safe to remove them manually by stopping c*, removing the *-Data.db files and then starting c* up again. Is this correct?
>>
>> we have a cluster where everything is written with a TTL, and sometimes c* needs to compact over 100 GB of sstables where we know everything has expired, and we'd rather just manually get rid of those.
>>
>> T#
RE: unsubscribe
http://wiki.apache.org/cassandra/FAQ#unsubscribe

wrote on 11/07/2013 02:25:28:
> From:
> To: user@cassandra.apache.org
> Date: 11/07/2013 02:26
> Subject: unsubscribe