Re: [BETA RELEASE] Apache Cassandra 1.0.0-beta1 released
On Thu, Sep 15, 2011 at 10:04 PM, mcasandra wrote:
> This is a great new! Is it possible to do a write-up of main changes like
> "Leveldb" and explain it a little bit. I get lost reading JIRA and sometimes
> is difficult to follow the thread. It looks like there are some major
> changes in this release.

The NEWS file lists the main changes, though it doesn't go into much detail. A more detailed write-up will come soon.

--
Sylvain

> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/BETA-RELEASE-Apache-Cassandra-1-0-0-beta1-released-tp6797930p6798330.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
Re: Exception in Hadoop Word Count sample
The example works against the 0.7 branch, not against trunk. JIRA created at
https://issues.apache.org/jira/browse/CASSANDRA-3215

On Thu, Sep 15, 2011 at 3:58 PM, Tharindu Mathew wrote:
> Now I get this. Any help would be greatly appreciated.
>
> ./bin/word_count
> 11/09/15 12:28:28 INFO WordCount: output reducer type: cassandra
> 11/09/15 12:28:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 11/09/15 12:28:30 INFO mapred.JobClient: Running job: job_local_0001
> 11/09/15 12:28:30 INFO mapred.MapTask: io.sort.mb = 100
> 11/09/15 12:28:30 INFO mapred.MapTask: data buffer = 79691776/99614720
> 11/09/15 12:28:30 INFO mapred.MapTask: record buffer = 262144/327680
> 11/09/15 12:28:30 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.RuntimeException: java.lang.UnsupportedOperationException: no local connection available
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:132)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.UnsupportedOperationException: no local connection available
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getLocation(ColumnFamilyRecordReader.java:176)
>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:113)
>     ... 4 more
> 11/09/15 12:28:31 INFO mapred.JobClient: map 0% reduce 0%
> 11/09/15 12:28:31 INFO mapred.JobClient: Job complete: job_local_0001
> 11/09/15 12:28:31 INFO mapred.JobClient: Counters: 0
> 11/09/15 12:28:31 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> [identical warning and stack trace repeated for job_local_0002, job_local_0003, and job_local_0004]
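For context on the failure above: judging from the stack trace, ColumnFamilyRecordReader.getLocation() looks for a Cassandra replica running on the same machine as the Hadoop task and gives up with "no local connection available" when none of the split's replica addresses match a local one. A rough model of that check (plain Python, not the actual Cassandra code; the addresses are made up):

```python
import socket

def get_location(split_replicas, local_addresses=None):
    """Return a replica address that lives on this machine, or fail the
    way the record reader does when the task has no co-located replica."""
    if local_addresses is None:
        # Addresses this host answers to (hostname lookup as a rough stand-in).
        local_addresses = {"127.0.0.1", socket.gethostbyname(socket.gethostname())}
    for replica in split_replicas:
        if replica in local_addresses:
            return replica
    raise RuntimeError("no local connection available")

# A split whose replicas are all on other machines triggers the error:
try:
    get_location(["10.0.0.5", "10.0.0.6"], local_addresses={"10.0.0.9"})
except RuntimeError as e:
    print(e)  # no local connection available
```

This is why running the example against a local single-node cluster on the right branch makes the error go away: the task then always finds a replica on its own host.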
"Ignorning message." showing in the log while upgrade to 0.8
I am running local tests of a Cassandra upgrade from 0.7.4 to 0.8.5. After upgrading one node (node1), two problems happened:

1. node2 keeps logging:

"Received connection from newer protocol version. Ignorning message."

Is that normal behaviour?

2. Running "describe cluster" on node1 shows node2 as unreachable:

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        UNREACHABLE: [node2]
        05f1ee3b-e063-11e0-97d5-63c2fb3f0ca8: [node1, node3]

node3 seems to act normally.

I saw that the JMX port changed in 0.8 -- is that the reason the node was unreachable?

thanks!
Re: "Ignorning message." showing in the log while upgrade to 0.8
After killing node1 and starting it again, node3 has the same problems as node2...

On Fri, Sep 16, 2011 at 10:42 PM, Yan Chunlu wrote:
> I am running local tests about upgrade cassandra. upgrade from 0.7.4 to 0.8.5
> after upgrade one node1, two problem happened:
>
> 1, node2 keep saying:
> "Received connection from newer protocol version. Ignorning message."
> is that normal behaviour?
>
> 2, while running "describe cluster" on node1, it shows node2 unreachable:
> [cluster information snipped]
>
> node3 seems act normal.
>
> I saw the JMXPORT has changed since 0.8, is that the reason node was
> unreachable?
Re: "Ignorning message." showing in the log while upgrade to 0.8
And the load is unusual (node1 had 80 MB of data before the upgrade):

bash-3.2$ bin/nodetool -h localhost ring
Address  DC           Rack   Status  State   Load       Owns     Token
                                                                 93798607613553124915572813490354413064
node2    datacenter1  rack1   Up      Normal  86.03 MB   46.81%   3303745385038694806791595159000401786
node3    datacenter1  rack1   Up      Normal  67.68 MB   26.65%   48642301133762927375044585593194981764
node1    datacenter1  rack1   Up      Normal  114.81 KB  26.54%   93798607613553124915572813490354413064

On Fri, Sep 16, 2011 at 10:48 PM, Yan Chunlu wrote:
> after kill node1 and start it again, node 3 has the same problems with
> node2...
>
> [earlier quoted messages snipped]
Re: "Ignorning message." showing in the log while upgrade to 0.8
You might need to run nodetool scrub on the nodes to rebuild the sstables for the different protocols.

On Fri, Sep 16, 2011 at 10:50 PM, Yan Chunlu wrote:
> and also the load is unusual (node1 has 80M data before the upgrade):
>
> [nodetool ring output and earlier quoted messages snipped]

--
Dikang Gu
0086 - 18611140205
Re: "Ignorning message." showing in the log while upgrade to 0.8
On Fri, Sep 16, 2011 at 9:42 AM, Yan Chunlu wrote:
> 1, node2 keep saying:
> "Received connection from newer protocol version. Ignorning message."
> is that normal behaviour?

Yes. It will take a few exchanges before the new node knows to use the older protocol with the 0.7 nodes.

> 2, while running "describe cluster" on node1, it shows node2 unreachable:
> Cluster Information:
>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>    Schema versions:
>         UNREACHABLE: [node2]
>         05f1ee3b-e063-11e0-97d5-63c2fb3f0ca8: [node1, node3]
>
> I saw the JMXPORT has changed since 0.8, is that the reason node was
> unreachable?

No.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
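The "few exchanges" behaviour can be sketched as follows (a toy model, not Cassandra's actual classes or wire format; the version numbers are placeholders): the old node drops messages framed with a newer protocol version, while the new node remembers the highest version each peer accepted and falls back to it on subsequent sends.

```python
# Toy model of cross-version messaging during a rolling upgrade.
OLD_VERSION, NEW_VERSION = 2, 3  # stand-ins for the 0.7.x / 0.8.x wire versions

class OldNode:
    def receive(self, version, payload):
        if version > OLD_VERSION:
            # This is the log line the original poster is seeing (typo and all):
            return "Received connection from newer protocol version. Ignorning message."
        return f"processed: {payload}"

class NewNode:
    def __init__(self):
        self.peer_version = {}  # learned max version per peer

    def send(self, peer, payload):
        version = self.peer_version.get(peer, NEW_VERSION)
        reply = peer.receive(version, payload)
        if "newer protocol version" in reply:
            # Learn the peer is older, downgrade, and retry.
            self.peer_version[peer] = OLD_VERSION
            reply = peer.receive(OLD_VERSION, payload)
        return reply

old, new = OldNode(), NewNode()
print(new.send(old, "gossip"))  # ignored once, retried: processed: gossip
print(new.send(old, "gossip"))  # later sends go straight through
```

So the log spam is transient: once every upgraded node has learned its peers' versions, the message stops.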
Re: Performance deterioration while building secondary index
Well, the problem is still there: I tried to add one more index, and the 3-node cluster just goes spastic and becomes unresponsive, etc. These boxes have plenty of CPU and memory.
Re: import data into cassandra
Hi,

Is there a tool that imports data from large CSV files into Cassandra using the Thrift API? (If it's in Java, that would be great.)

Thanks,
Nehal Mehta
Re: Performance deterioration while building secondary index
Did you create a ticket?

On Fri, Sep 16, 2011 at 12:50 PM, buddhasystem wrote:
> Well, the problem is still there, i.e. I tried to add one more index and the
> 3-node cluster is just going spastic, becomes unresponsive etc. These boxes
> have plenty of CPU and memory.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Queue suggestion in Cassandra
We are trying to implement an ordered queue system in Cassandra (version 0.8.5). In the initial design we use a row as the queue and a column for each item in it: inserting an item creates a new column, and popping the top item deletes its column. Since columns are sorted in Cassandra, we get an ordered queue.

It works fine until the queue size reaches 50K; then we get high CPU usage and constant GC, which makes the whole Cassandra server very slow and unresponsive, and we have to do a full compaction to fix it. Because of this performance issue the queue is not usable for us, and we are looking for other designs.

I want to know if anybody has implemented a large ordered queue successfully. Let me know if you have suggestions.

Thank you in advance,
Daning
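The design above, and the likely cause of the slowdown, can be modelled in plain Python (no Cassandra involved; `RowQueue` is an illustrative name). Each pop writes a deletion marker (tombstone) rather than physically removing the column, and every subsequent pop must scan past all accumulated tombstones until a compaction purges them — which fits the observation that a full compaction fixes the problem:

```python
import bisect

class RowQueue:
    """Model of the row-as-queue design: one sorted row of columns,
    deletes as tombstones (as Cassandra's deletes behave)."""

    def __init__(self):
        self.columns = []  # sorted list of (name, value, is_tombstone)

    def push(self, name, value):
        bisect.insort(self.columns, (name, value, False))

    def pop(self):
        # Scan past tombstones to find the first live column; this scan
        # grows without bound between compactions.
        for i, (name, value, dead) in enumerate(self.columns):
            if not dead:
                self.columns[i] = (name, value, True)  # tombstone, not removal
                return value
        return None

    def compact(self):
        self.columns = [c for c in self.columns if not c[2]]

q = RowQueue()
for i in range(5):
    q.push(i, f"item-{i}")
q.pop(); q.pop()
print(sum(1 for c in q.columns if c[2]))  # 2 tombstones linger in the row
q.compact()
print(len(q.columns))                     # 3 live columns remain
```

With 50K pops against one row, every subsequent read walks tens of thousands of tombstones, which matches the reported CPU and GC pressure.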
Re: Queue suggestion in Cassandra
Use ZooKeeper. Scott Fines has a great library on top of ZK.

On Fri, Sep 16, 2011 at 7:08 PM, Daning Wang wrote:
> We try to implement an ordered queue system in Cassandra(ver 0.8.5). In
> initial design we use a row as queue, a column for each item in queue.
> that means creating new column when inserting item and delete column when
> top item is popped. Since columns are sorted in Cassandra we got the ordered
> queue.
>
> It works fine until queue size reaches 50K, then we got high CPU usage and
> constant GC, that makes the whole Cassandra server very slow and not
> responsive, we have to do full compaction to fix this problem.
>
> I want to know if anybody has implemented a large
> ordered queue successfully.
ByteOrderedPartitioner
How is the performance of ByteOrderedPartitioner compared to RandomPartitioner when getting data with a single key? Does it use the same algorithm?

I have read that the downside of ByteOrderedPartitioner is creating hotspots. But if I have 4 nodes and I set RF to 4, that will replicate data to all 4 nodes -- could that avoid hotspots?

Thank you in advance,
Daning
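The core difference can be shown in a few lines of Python (a sketch of the placement idea, not Cassandra's token implementation): RandomPartitioner derives a token from an MD5 hash of the key, so adjacent keys scatter around the ring, while ByteOrderedPartitioner uses the raw key bytes as the token, so lexicographically close keys get close tokens and land on the same replicas — the hotspot risk.

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    # RandomPartitioner-style: token from an MD5 hash of the key,
    # spreading adjacent keys across the ring.
    return int.from_bytes(hashlib.md5(key).digest(), "big")

def byte_ordered_token(key: bytes) -> bytes:
    # ByteOrderedPartitioner-style: the raw key bytes are the token, so
    # nearby keys get nearby tokens (and the same replica set).
    return key

keys = [b"user:0001", b"user:0002", b"user:0003"]
print([byte_ordered_token(k) for k in keys])           # adjacent tokens
print(sorted(random_partitioner_token(k) for k in keys))  # scattered tokens
```

Single-key reads hash or compare the key the same cheap way under either partitioner, so per-key lookup cost is similar; the difference shows up in load distribution. Note that RF=4 on 4 nodes evens out storage, but with byte ordering a burst of writes to adjacent keys still funnels through the same coordinator-adjacent token range.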
Re: LevelDB type compaction
>> and updates could be scattered all over
>> before compaction?
>
> No, updates to a given row will still be in a single sstable.

Can you please explain a little more? You mean that if a level-1 file contains the range 1-100, all the updates would still go in that file? The link on leveldb says:

> The compaction picks a file from level L and all overlapping files from
> the next level L+1

If all updates go in the same sstables, then how do overlapping files get generated? By "overlapping" I am assuming it means a new or updated value for a given key exists in multiple files?

Thanks for the explanation
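For the "overlapping" question above: in LevelDB-style compaction each sstable covers a key range, and a file from level L overlaps a file in level L+1 when their key ranges intersect; the compaction merges the picked file with every such L+1 file. A minimal sketch of that selection (illustrative only, not Cassandra's implementation; the ranges are made up):

```python
def overlaps(a, b):
    """Two (min_key, max_key) ranges overlap if neither ends before the
    other begins."""
    return a[0] <= b[1] and b[0] <= a[1]

def pick_compaction_inputs(level_l_file, level_l1_files):
    # All files in level L+1 whose key range intersects the picked L file.
    return [f for f in level_l1_files if overlaps(level_l_file, f)]

l1_file = (40, 120)  # key range of the file picked from level L
l2_files = [(0, 30), (25, 60), (90, 150), (200, 300)]
print(pick_compaction_inputs(l1_file, l2_files))  # [(25, 60), (90, 150)]
```

So "overlap" is about key *ranges* intersecting across levels, not necessarily the same key appearing in multiple files, though an overlapping range is exactly where a newer version of a key in level L can shadow an older version in level L+1.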
on flush, is it possible to dump flushed memtable to row cache?
This way we wouldn't get a sudden rise in read latency after a flush. Or is something similar already in there?
Re: on flush, is it possible to dump flushed memtable to row cache?
Writes already update rows that are hot in the cache when they are sent. So in the best case this would be no better, and in the average case (where you push out rows in the cache to make room for cold ones from the memtable) a lot worse.

On Fri, Sep 16, 2011 at 8:40 PM, Yang wrote:
> this way we wouldn't need to get a sudden rise in read latency after flush.
> or is something similar is already i there?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
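The point above can be made concrete with a toy write-through row cache (an illustrative model, not Cassandra's row cache code): writes refresh rows that are already cached, so the flushed memtable contains nothing hot that the cache lacks, and bulk-loading it could only evict hot rows in favor of cold ones.

```python
from collections import OrderedDict

class RowCache:
    """Toy LRU row cache with write-through updates for already-hot rows."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()

    def write(self, key, value):
        # Write-through: only refresh rows that are already in the cache;
        # cold rows stay out.
        if key in self.rows:
            self.rows[key] = value
            self.rows.move_to_end(key)

    def read(self, key, storage):
        if key in self.rows:
            self.rows.move_to_end(key)
            return self.rows[key]
        value = storage[key]
        self.rows[key] = value
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict least-recently-used row
        return value

storage = {"a": 1, "b": 2, "c": 3}
cache = RowCache(capacity=2)
cache.read("a", storage)  # "a" becomes hot
cache.write("a", 10)      # write-through keeps the hot row fresh
cache.write("c", 30)      # cold row: the write does not pull it into the cache
print(list(cache.rows))   # ['a']
```

Dumping a flushed memtable into `rows` wholesale would push out "a" (hot) to admit "c" (cold), which is the average-case regression described above.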
Re: Get CL ONE / NTS
I will look at that to understand the behavior of ONE with NTS. Thanks.

- Pierre

-----Original Message-----
From: aaron morton
Date: Fri, 16 Sep 2011 10:11:30
Reply-To: user@cassandra.apache.org
Subject: Re: Get CL ONE / NTS

> What I'm missing is a clear behavior for CL.ONE. I'm unsure about what nodes
> are used by ONE and how the filtering of missing data/error is done. I've
> landed in ReadCallback.java but error handling is out of my reach for the
> moment.

Start with StorageProxy.fetch() to see which nodes are considered to be part of the request. ReadCallback.ctor() will decide which are actually involved, based on the CL and whether RR is enabled.

At CL ONE there is no checking of the replica responses for consistency, as there is only one response. If RR is enabled, it will start from ReadCallback.maybeResolveForRepair().

Cheers

-----
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 7:21 PM, Pierre Chalamet wrote:
> I do not agree here. I trade "consistency" (it's more data miss than
> consistency here) over performance in my case.
> I'm okay to handle the popping of the Spanish inquisition in the current DC
> by triggering a new read with a stronger CL somewhere else (for example in
> other DCs). If the data is nowhere to be found or nothing is reachable,
> well, it's sad but true but it will be the end of the game. Fine.
>
> What I'm missing is a clear behavior for CL.ONE. I'm unsure about what nodes
> are used by ONE and how the filtering of missing data/error is done. I've
> landed in ReadCallback.java but error handling is out of my reach for the
> moment.
>
> Thanks,
> - Pierre
>
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Thursday, September 15, 2011 12:27 AM
> To: user@cassandra.apache.org
> Subject: Re: Get CL ONE / NTS
>
> Are you advising CL.ONE does not worth the game when considering
> read performance ?
> Consistency is not performance, it's a whole new thing to tune in your
> application. If you have performance issues, deal with those as performance
> issues: better code / data model / hardware.
>
>> By the way, I do not have a consistency problem at all - data is only
>> written once
>
> Nobody expects a consistency problem. Its chief weapon is surprise. Surprise
> and fear. Its two weapons are fear and surprise. And so forth
> http://www.youtube.com/watch?v=Ixgc_FGam3s
>
> If you write at LOCAL_QUORUM in DC 1 and DC 2 is down at the start of the
> request, a hint will be stored in DC 1. Some time later, when DC 2 comes
> back, that hint will be sent to DC 2. If in the meantime you read from DC 2
> at CL ONE, you will not get that change. With Read Repair enabled it will
> repair in the background and you may get a different response on the next
> read (am guessing here, cannot remember exactly how RR works cross-DC).
>
> Cheers
>
> -----
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/09/2011, at 10:07 AM, Pierre Chalamet wrote:
>
> Thanks Aaron, didn't see your answer before mine.
>
> I do agree for 2/ I might have read errors. Good suggestion to use
> EACH_QUORUM - it could be a good trade-off to read at this level if ONE
> fails.
>
> Maybe using LOCAL_QUORUM might be a good answer and will avoid headaches
> after all. Are you advising that CL.ONE is not worth the game when
> considering read performance?
>
> By the way, I do not have a consistency problem at all - data is only
> written once (and if more, it is always the same data) and read several
> times across DCs. I only have replication problems. That's why I'm more
> inclined to use CL.ONE for reads if possible.
> Thanks,
> - Pierre
>
> -----Original Message-----
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Wednesday, September 14, 2011 11:48 PM
> To: user@cassandra.apache.org; pie...@chalamet.net
> Subject: Re: Get CL ONE / NTS
>
> Your current approach to Consistency opens the door to some inconsistent
> behavior.
>
>> 1/ Will I have an error because DC2 does not have any copy of the data?
> If you read from DC2 at CL ONE and the data is not replicated, it will not
> be returned.
>
>> 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2?
> Not at CL ONE. If you use CL EACH_QUORUM then the read will go to all the
> DCs. If DC2 is behind DC1 then you will get the data from DC1.
>
>> 3/ In case of partial replication to DC2, will I sometimes see errors
>> about servers not holding the data in DC2?
> Depending on the API call and the client, working at CL ONE, you will see
> either errors or missing data.
>
>> 4/ Does Get at CL ONE fail as soon as the fastest server to answer tells it
>> does not have the data, or does it wait until all servers say they do not
>> have the data?
> Yes.
>
> Consider using LOCAL_QUORUM for write and read, will make
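The consistency levels traded off in this thread reduce to simple arithmetic: a quorum is floor(RF/2) + 1 replicas, LOCAL_QUORUM applies that to the local DC's replication factor only, and EACH_QUORUM requires it in every DC. A quick check (the per-DC replication factors are illustrative):

```python
def quorum(rf: int) -> int:
    # Standard quorum definition: a strict majority of RF replicas.
    return rf // 2 + 1

# Per-DC replication factors for an NTS keyspace, e.g. {DC1: 3, DC2: 2}:
rf = {"DC1": 3, "DC2": 2}

local_quorum_dc1 = quorum(rf["DC1"])                   # replicas needed in DC1 only
each_quorum = {dc: quorum(n) for dc, n in rf.items()}  # a quorum in every DC

print(local_quorum_dc1)  # 2
print(each_quorum)       # {'DC1': 2, 'DC2': 2}
```

At CL.ONE, by contrast, a single replica's answer is final for that read: if the row has not yet replicated to the replica that answers, the read simply returns nothing rather than retrying in another DC, which matches the behavior described above.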