iteration does not yield all data with consistency ONE
Hello, We have a cluster of 4 nodes (0.6.6) and use the random partitioner with a replication factor of 2. When I insert a number of rows I can always retrieve them by their explicit id (get_range_slices("","", 1)). Playing with consistency levels and temporarily shutting down a Cassandra node all yield the expected result. However, when I use get_range_slices("","", n) to iterate over all rows, I sometimes don't get anything back (depending on the node). I then reduced the problem to inserting just a single row. Specifically, the 'iteration' only seems to succeed when I issue the request to the node that holds the first copy. I discovered that when I iterate using a consistency level of Quorum/All the iteration always succeeds and I properly get the one row. So a solution would be to always use consistency level Quorum/All, but that has a performance penalty. Can anyone explain why iterating using get_range_slices("","",n) does not always work with consistency level One on all nodes? Thanks, Eric P.S. To rule out any discussion on whether or not to use iteration in the first place, we only plan to use it for backup and periodic cleanup cycles.
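A minimal sketch of the iteration pattern being discussed, written against the 0.6 Thrift API and reading at QUORUM (the workaround Eric found). It assumes an already-connected Cassandra.Client; the keyspace and column family names ("Keyspace1", "Standard1") are placeholders. Each page starts from the last key returned by the previous page.

import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class RowIterator
{
    // Pages over every row of one column family, reading at QUORUM (0.6 Thrift API).
    public static void dumpAllRows(Cassandra.Client client) throws Exception
    {
        SliceRange sliceRange = new SliceRange();
        sliceRange.setStart(new byte[0]);          // all columns of each row
        sliceRange.setFinish(new byte[0]);
        sliceRange.setReversed(false);
        sliceRange.setCount(1000);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(sliceRange);

        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Standard1");      // placeholder CF name

        final int pageSize = 100;
        String start = "";
        while (true)
        {
            KeyRange range = new KeyRange();
            range.setStart_key(start);             // "" means "from the beginning"
            range.setEnd_key("");
            range.setCount(pageSize);

            List<KeySlice> page = client.get_range_slices(
                "Keyspace1", parent, predicate, range, ConsistencyLevel.QUORUM);

            for (KeySlice slice : page)
            {
                // Every page after the first repeats the previous page's last key; skip it.
                if (start.length() == 0 || !slice.getKey().equals(start))
                    System.out.println(slice.getKey());   // process the row here
            }
            if (page.size() < pageSize)
                break;                                    // last page reached
            start = page.get(page.size() - 1).getKey();   // next page starts at the last key seen
        }
    }
}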
Re: WordCount example problem
Hi, I'm trying the WordCount example and getting this error: [12:33]$ ./bin/word_count 10/11/10 12:34:35 INFO WordCount: output reducer type: filesystem 10/11/10 12:34:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 10/11/10 12:34:36 INFO WordCount: XXX:text0 10/11/10 12:34:36 INFO mapred.JobClient: Running job: job_local_0001 10/11/10 12:34:36 INFO mapred.MapTask: io.sort.mb = 100 10/11/10 12:34:36 INFO mapred.MapTask: data buffer = 79691776/99614720 10/11/10 12:34:36 INFO mapred.MapTask: record buffer = 262144/327680 10/11/10 12:34:36 WARN mapred.LocalJobRunner: job_local_0001 java.lang.ClassCastException: java.nio.HeapByteBuffer cannot be cast to [B at WordCount$TokenizerMapper.map(WordCount.java:73) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) 10/11/10 12:34:37 INFO mapred.JobClient: map 0% reduce 0% 10/11/10 12:34:37 INFO mapred.JobClient: Job complete: job_local_0001 10/11/10 12:34:37 INFO mapred.JobClient: Counters: 0 I'm using cassandra 0.7.0beta3 (from latest trunk) on just one machine. Is the example working for anybody? Thanks, P.
Re: Data management on a ring
If I understand you correctly, you just want to add 8 nodes to a ring that already has 2? You could add the nodes and manually assign them tokens following the guidelines here: http://wiki.apache.org/cassandra/Operations I'm not sure how to ensure the minimum amount of data transfer, though. Adding all 8 at once is probably a bad idea. How about you make a new cluster of 8 nodes, manually assign tokens, and then copy the data from the 2-node ring to the 8-node one. Then move the 2 original nodes into the new cluster? Hope that helps. Aaron On 10 Nov 2010, at 20:56, Jean-Yves LEBLEU wrote: > Hello all, > > We have an installation of 10 nodes, and we choose to deploy 5 rings of 2 > nodes. > > We would like to change to a ring of 10 nodes. > > Some data have to be replicated on the 10 nodes, some should stay on 2 nodes. > Do you have any idea or documentation pointer in order to have a ring of 10 > nodes with such data repartition ? > > Thanks for any answer. > > Jean-Yves
Re: Data management on a ring
Thanks for the answer. That was not exactly my point; I would like to know whether, in a 10-node ring, it is possible to restrict replication of some data to only 2 nodes, and other data to all nodes? Regards. Jean-Yves On Wed, Nov 10, 2010 at 11:17 AM, aaron morton wrote: > If I understand your correctly, you just want to add 8 nodes to a ring that > already has 2 ? > > You could add the nodes and manually assign them tokens following the > guidelines here http://wiki.apache.org/cassandra/Operations > > I'm not sure how to ensure the minimum amount of data transfer though. > Adding all 8 at once is probably a bad idea. > > How about you make a new cluster of 8 nodes, manually assign tokens and > then copy the data from the 2 node ring to the 8 node. Then move the 2 > original nodes into the new cluster? > > Hope that helps. > Aaron > > On 10 Nov 2010, at 20:56, Jean-Yves LEBLEU wrote: > > > Hello all, > > > > We have an installation of 10 nodes, and we choose to deploy 5 rings of 2 > nodes. > > > > We would like to change to a ring of 10 nodes. > > > > Some data have to be replicated on the 10 nodes, some should stay on 2 > nodes. Do you have any idea or documentation pointer in order to have a ring > of 10 nodes with such data repartition ? > > > > Thanks for any answer. > > > > Jean-Yves > >
about key sorting and token partitioning
Hi, I am using cassandra to store a message stream, and want to use timestamps (like mmddhhMIss or similar) as the keys. So if I use RandomPartitioner, I will lose the order when using get_range_slices(). If I use OrderPreservingPartitioner, how should I configure cassandra to balance load between the nodes? Thanks! 2010-11-10 zangds
Re: about key sorting and token partitioning
> I am using cassandra to store a message steam, and want to use timestamps > (like mmddhhMIss or something alike) as the keys. > So if I use RandomPartitioner, I will loose the order when using > get_range_slices(). > If I use OrderPreservingPartitioner, how should I configure cassandra to > make load balance between the nodes? AFAIK there's no silver bullet to making the order preserving partitioner easy to use w.r.t. node balancing in the situation you're describing. One thing to consider is to use the random partitioner (for its simplicity in managing the cluster) and use a granular subset of the timestamp as the row key. For example, you could have the row key be mmddhh to get an entire hour per row. A reasonable granularity would depend on your use-case; but the idea is to be able to take advantage of the simplicity of using the random partitioner, while having reasonable efficiency on range slices by making each row contain a pretty large range such that any additional overhead in jumping across nodes is negligible in comparison to the other work done. -- / Peter Schuller
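To make the bucketing suggestion concrete, here is a small self-contained sketch (plain Java, no Cassandra calls; the yyyyMMddHH format and the one-hour bucket size are illustrative choices, not a recipe). Row keys distribute randomly across the ring, while a time-range read only has to touch the hour-bucket rows it enumerates up front; inside each row, column names would carry the full timestamp so columns stay time-ordered.

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

public class HourBuckets
{
    private static final long HOUR_MS = 3600L * 1000L;

    // Row key for the hour containing the given timestamp, e.g. "2010111012".
    static String rowKeyFor(long timestampMillis)
    {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHH");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        return format.format(new Date(timestampMillis));
    }

    // All row keys needed to cover the interval [fromMillis, toMillis].
    static List<String> rowKeysForRange(long fromMillis, long toMillis)
    {
        List<String> keys = new ArrayList<String>();
        for (long t = fromMillis - (fromMillis % HOUR_MS); t <= toMillis; t += HOUR_MS)
            keys.add(rowKeyFor(t));
        return keys;
    }

    public static void main(String[] args)
    {
        long now = System.currentTimeMillis();
        // Messages from the last three hours live in at most four rows:
        System.out.println(rowKeysForRange(now - 3 * HOUR_MS, now));
    }
}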
Re: Data management on a ring
Yes, on a per-keyspace basis with NetworkTopologyStrategy (in 0.7). On Wed, Nov 10, 2010 at 4:40 AM, Jean-Yves LEBLEU wrote: > Thanks for the anwser. > > It was not exactly my point, I would like to know if in a 10 nodes rings if > it is possible to restrict replication of some data to only 2 nodes, and > other data to all nodes ? > Regards. > Jean-Yves > > On Wed, Nov 10, 2010 at 11:17 AM, aaron morton > wrote: >> >> If I understand your correctly, you just want to add 8 nodes to a ring >> that already has 2 ? >> >> You could add the nodes and manually assign them tokens following the >> guidelines here http://wiki.apache.org/cassandra/Operations >> >> I'm not sure how to ensure the minimum amount of data transfer though. >> Adding all 8 at once is probably a bad idea. >> >> How about you make a new cluster of 8 nodes, manually assign tokens and >> then copy the data from the 2 node ring to the 8 node. Then move the 2 >> original nodes into the new cluster? >> >> Hope that helps. >> Aaron >> >> On 10 Nov 2010, at 20:56, Jean-Yves LEBLEU wrote: >> >> > Hello all, >> > >> > We have an installation of 10 nodes, and we choose to deploy 5 rings of >> > 2 nodes. >> > >> > We would like to change to a ring of 10 nodes. >> > >> > Some data have to be replicated on the 10 nodes, some should stay on 2 >> > nodes. Do you have any idea or documentation pointer in order to have a >> > ring >> > of 10 nodes with such data repartition ? >> > >> > Thanks for any answer. >> > >> > Jean-Yves >> > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: WordCount example problem
http://www.mail-archive.com/user@cassandra.apache.org/msg07093.html On Wed, Nov 10, 2010 at 5:47 AM, Patrik Modesto wrote: > Hi, > > I'm trying the WordCount example and getting this error: > > [12:33]$ ./bin/word_count > 10/11/10 12:34:35 INFO WordCount: output reducer type: filesystem > 10/11/10 12:34:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with > processName=JobTracker, sessionId= > 10/11/10 12:34:36 INFO WordCount: XXX:text0 > 10/11/10 12:34:36 INFO mapred.JobClient: Running job: job_local_0001 > 10/11/10 12:34:36 INFO mapred.MapTask: io.sort.mb = 100 > 10/11/10 12:34:36 INFO mapred.MapTask: data buffer = 79691776/99614720 > 10/11/10 12:34:36 INFO mapred.MapTask: record buffer = 262144/327680 > 10/11/10 12:34:36 WARN mapred.LocalJobRunner: job_local_0001 > java.lang.ClassCastException: java.nio.HeapByteBuffer cannot be cast to [B > at WordCount$TokenizerMapper.map(WordCount.java:73) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) > 10/11/10 12:34:37 INFO mapred.JobClient: map 0% reduce 0% > 10/11/10 12:34:37 INFO mapred.JobClient: Job complete: job_local_0001 > 10/11/10 12:34:37 INFO mapred.JobClient: Counters: 0 > > I'm using cassandra 0.7.0beta3 (from latest trunk) on just one > machine. Is the example working for anybody? > > Thanks, > P. > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: iteration does not yield all data with consistency ONE
Was the node that should have the other replica of this row down when it was inserted? On Wed, Nov 10, 2010 at 6:08 AM, Eric van Orsouw wrote: > > Hello, > > > > We have a cluster of 4 nodes (0.6.6) and use the random partitioner and a > replication of 2. > > When I insert a number of rows I can always retrieve them by their explicit > id (get_range_slices(“”,””, 1). > > Playing with consistency levels and temporarily shutting down a Cassandra > node all yields the expected result. > > > > However when I use get_range_slices(“”,””, n) to iterate over all rows, I > sometimes don’t get anything (depending on the node). > > > > I then reduced the problem to inserting just a single row. > > Specifically, the ‘iteration’ only seems to succeed when I issue the request > to the node that contains the first copy. > > I Discovered that when I iterate using a consistency level of Quorum/All the > iteration always succeeds and I properly get the one row. > > > > So a solution would be to always use consistency level One/All but that has a > performance penalty. > > > > Can anyone explain why iterating using get_range_slices(“”,””,n) does not > always function with consistency level One on all nodes? > > > > Thanks, > > Eric > > > > P.S. To rule out any discussion on whether or not to use iteration in the > first place, we only plan to use it for backup and periodic cleanup cycles. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Question about consistency level & data propagation & eventually consistent
Hi, Assuming I'm reading and writing with consistency level 1 (one) and read repair turned off, I have a few questions about data propagation. Data is stored with a replication factor of 3. I'm not interested in the deletes; I can live with older data (or data that has been deleted and will reappear), but I need to know how long it will take until the data is available at the other nodes, since I have turned read repair off.
1) If all nodes are up:
- Will all writes eventually reach all nodes (of the 3 nodes)?
- What will be the maximal time until the last write reaches the last node (of the 3 nodes)? (e.g. assume one of the nodes is doing compaction at that time)
2) If one or two nodes are down:
- As I understood it, one node will buffer the writes for the remaining nodes.
- If the nodes come up again: when will these writes be propagated - at compaction? What will be the maximal time until the writes reach the 2 nodes? Will these writes be propagated at all?
In case of 2: the best way would then be to run nodetool repair after the two nodes are available again. Is there a way to make the node not accept any connections until it is finished repairing (e.g. throw an UnavailableException)? Thanks, Thibaut
RE: iteration does not yield all data with consistency ONE
No, all nodes were up and running while the single key was inserted. The insert however was with consistency One. I assume however that the replicas are still written in this case. It is btw also very reproducible. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: woensdag 10 november 2010 15:44 To: user Subject: Re: iteration does not yield all data with consistency ONE Was the node that should have the other replica of this row down when it was inserted? On Wed, Nov 10, 2010 at 6:08 AM, Eric van Orsouw wrote: > > Hello, > > > > We have a cluster of 4 nodes (0.6.6) and use the random partitioner and a > replication of 2. > > When I insert a number of rows I can always retrieve them by their explicit > id (get_range_slices("","", 1). > > Playing with consistency levels and temporarily shutting down a Cassandra > node all yields the expected result. > > > > However when I use get_range_slices("","", n) to iterate over all rows, I > sometimes don't get anything (depending on the node). > > > > I then reduced the problem to inserting just a single row. > > Specifically, the 'iteration' only seems to succeed when I issue the request > to the node that contains the first copy. > > I Discovered that when I iterate using a consistency level of Quorum/All the > iteration always succeeds and I properly get the one row. > > > > So a solution would be to always use consistency level One/All but that has a > performance penalty. > > > > Can anyone explain why iterating using get_range_slices("","",n) does not > always function with consistency level One on all nodes? > > > > Thanks, > > Eric > > > > P.S. To rule out any discussion on whether or not to use iteration in the > first place, we only plan to use it for backup and periodic cleanup cycles. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Range queries using token instead of key
Hi, I am trying to iterate over the entire dataset to calculate some information. The way I am trying to do this is by going directly to the node that owns a data range, so here is the route I am following:
- get the TokenRanges using describe_ring
- for each TokenRange, pick a node and get all data from that node (so talk directly to that node for local data) using get_range_slices() with a KeyRange with start and end token; I want to get about N tokens at a time
- I want to use a paging approach for this, but I cannot seem to find a way to get the token for my last KeySlice. The only thing I can find is the key; is there a way to get the token given a key? Per some suggestions I could take the md5 of the last key and use that as the starting token for the next query - would that work?
Also, is there a better way of doing this? The data per row is very small. This looks like a hadoop kind of job, but I am trying to avoid hadoop since I have no other use for it and this operation will be infrequent. I am using 0.6.6, RandomPartitioner. Thanks Anand
Re: Question about consistency level & data propagation & eventually consistent
> 1) If all nodes are up: > - Will all writes eventually reach all nodes (of the 3 nodes)? I believe that if read repair is completely off, then for data that was written that did *not* get saved by hinted hand-off, would not propagate until anti-entropy as part of a 'nodetool repair' or perhaps as part of node movement in the ring (as a side-effect). Also see http://wiki.apache.org/cassandra/Operations under "Consistency". > - What will be the maximal time until the last write reaches the last node > (of the 3 nodes)? (e.g. Assume one of the node is doing compactation at that > time) There is no particular time guarantee, unless you yourself take steps that would imply such a guarantee (such as by running repair with a certain frequency). > 2) If one or two nodes are down > - As I understood it, one node will buffer the writes for the remaining > nodes. AFAIK not all. I.e., only when a node is marked as down will hinted hand-off start eating writes for the node (right, anyone?). Hinted hand-off is not supposed to be a guarantee that all data will become up-to-date; it's rather a way to lessen the impact of nodes going down by decreasing the amount of data that remains out of synch. > - If the nodes go up again: When will these writes be propagated, at > compactation?, what will be the maximal time until the writes reach the 2 > nodes? Will these writes be propagated at all? Again there's no time guarantee as such. As for the writes, I believe hinted hand-off sends those along independently of compaction (but I'm not sure). -- / Peter Schuller
RE: WordCount example problem
Also, your Mapper class needs to look like this: MyMapper extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, SumWritable> ... with all the necessary fixes to the map method. AD -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, November 10, 2010 8:40 AM To: user Subject: Re: WordCount example problem http://www.mail-archive.com/user@cassandra.apache.org/msg07093.html On Wed, Nov 10, 2010 at 5:47 AM, Patrik Modesto wrote: > Hi, > > I'm trying the WordCount example and getting this error: > > [12:33]$ ./bin/word_count > 10/11/10 12:34:35 INFO WordCount: output reducer type: filesystem > 10/11/10 12:34:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with > processName=JobTracker, sessionId= > 10/11/10 12:34:36 INFO WordCount: XXX:text0 > 10/11/10 12:34:36 INFO mapred.JobClient: Running job: job_local_0001 > 10/11/10 12:34:36 INFO mapred.MapTask: io.sort.mb = 100 > 10/11/10 12:34:36 INFO mapred.MapTask: data buffer = 79691776/99614720 > 10/11/10 12:34:36 INFO mapred.MapTask: record buffer = 262144/327680 > 10/11/10 12:34:36 WARN mapred.LocalJobRunner: job_local_0001 > java.lang.ClassCastException: java.nio.HeapByteBuffer cannot be cast to [B > at WordCount$TokenizerMapper.map(WordCount.java:73) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) > 10/11/10 12:34:37 INFO mapred.JobClient: map 0% reduce 0% > 10/11/10 12:34:37 INFO mapred.JobClient: Job complete: job_local_0001 > 10/11/10 12:34:37 INFO mapred.JobClient: Counters: 0 > > I'm using cassandra 0.7.0beta3 (from latest trunk) on just one > machine. Is the example working for anybody? > > Thanks, > P. > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
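For readers hitting the same ClassCastException: in 0.7 the Hadoop integration hands the mapper ByteBuffer keys and (assuming IColumn.value() now returns a ByteBuffer) ByteBuffer column values rather than byte[], which is why the old cast to [B fails. Below is a hedged sketch of what a fixed tokenizing mapper could look like, not the example's actual source: the "text" column name comes from the example above, IntWritable merely stands in for whatever writable the job emits, and decoding goes through a plain JDK Charset so it does not depend on any particular Cassandra utility class.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.SortedMap;
import java.util.StringTokenizer;

import org.apache.cassandra.db.IColumn;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper
        extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, IntWritable>
{
    private static final Charset UTF8 = Charset.forName("UTF-8");
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    // Decode a duplicate so the original buffer's position is left untouched.
    private static String string(ByteBuffer bytes)
    {
        return UTF8.decode(bytes.duplicate()).toString();
    }

    @Override
    protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
            throws IOException, InterruptedException
    {
        IColumn column = columns.get(ByteBuffer.wrap("text".getBytes(UTF8)));
        if (column == null)
            return;
        StringTokenizer tokens = new StringTokenizer(string(column.value()));
        while (tokens.hasMoreTokens())
        {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}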
Re: iteration does not yield all data with consistency ONE
Interesting. Does it simplify further to RF=1 and 2 nodes? On Wed, Nov 10, 2010 at 8:58 AM, Eric van Orsouw wrote: > No, all nodes were up and running while the single key was inserted. > The insert however was with consistency One. I assume however that the > replicas are still written in this case. > It is btw also very reproducible. > > -Original Message- > From: Jonathan Ellis [mailto:jbel...@gmail.com] > Sent: woensdag 10 november 2010 15:44 > To: user > Subject: Re: iteration does not yield all data with consistency ONE > > Was the node that should have the other replica of this row down when > it was inserted? > > On Wed, Nov 10, 2010 at 6:08 AM, Eric van Orsouw > wrote: >> >> Hello, >> >> >> >> We have a cluster of 4 nodes (0.6.6) and use the random partitioner and a >> replication of 2. >> >> When I insert a number of rows I can always retrieve them by their explicit >> id (get_range_slices("","", 1). >> >> Playing with consistency levels and temporarily shutting down a Cassandra >> node all yields the expected result. >> >> >> >> However when I use get_range_slices("","", n) to iterate over all rows, I >> sometimes don't get anything (depending on the node). >> >> >> >> I then reduced the problem to inserting just a single row. >> >> Specifically, the 'iteration' only seems to succeed when I issue the request >> to the node that contains the first copy. >> >> I Discovered that when I iterate using a consistency level of Quorum/All the >> iteration always succeeds and I properly get the one row. >> >> >> >> So a solution would be to always use consistency level One/All but that has >> a performance penalty. >> >> >> >> Can anyone explain why iterating using get_range_slices("","",n) does not >> always function with consistency level One on all nodes? >> >> >> >> Thanks, >> >> Eric >> >> >> >> P.S. To rule out any discussion on whether or not to use iteration in the >> first place, we only plan to use it for backup and periodic cleanup cycles. > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Question about consistency level & data propagation & eventually consistent
On Wed, Nov 10, 2010 at 8:54 AM, Thibaut Britz wrote: > Assuming I'm reading and writing with consitency level 1 (one), read repair > turned off, I have a few questions about data propagation. > Data is being stored at consistency level 3. > 1) If all nodes are up: > - Will all writes eventually reach all nodes (of the 3 nodes)? Yes. > - What will be the maximal time until the last write reaches the last node Situation-dependent. The important thing is that if you are writing at CL.ALL, it will be before the write is acked to the client. > 2) If one or two nodes are down > - As I understood it, one node will buffer the writes for the remaining > nodes. Yes: _after_ the failure detector recognizes them as down. This will take several seconds. > - If the nodes go up again: When will these writes be propagated When FD recognizes them as back up. > The best way would then be to run nodetool repair after the two nodes will > be available again. Is there a way to make the node not accept any > connections during that time until it is finished repairing? (eg throw the > Unavailableexception) No. The way to prevent stale reads is to use an appropriate consistencylevel, not error-prone heuristics. (For instance: what if the replica with the most recent data were itself down when the first node recovered and initiated repair?) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: MapReduce/Hadoop in cassandra 0.7 beta3
Aditya, Can you reproduce the problem locally with "pig -x local myscript.pig"? Also, moving this message back to the cassandra user list. On Nov 10, 2010, at 10:47 AM, Aditya Muralidharan wrote: > Hi, > > I'm still getting the error associated with > https://issues.apache.org/jira/browse/CASSANDRA-1700 > I have 7 suse nodes running Cassandra0.7 branch (latest as of the morning of > Nov 9). I've loaded 10 rows with one column family(replication factor=4) and > 100 super columns. Using the ColumnFamilyInputFormat with mapreduce > (LocalJobRunner) to retrieve all the rows gives me the following exception: > > 10/11/10 10:33:15 WARN mapred.LocalJobRunner: job_local_0001 > java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal > error processing get_range_slices >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277) >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292) >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189) >at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) >at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148) >at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423) >at > org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) >at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) >at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) >at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) >at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) > Caused by: org.apache.thrift.TApplicationException: Internal error processing > get_range_slices >at > org.apache.thrift.TApplicationException.read(TApplicationException.java:108) >at > org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724) >at > org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704) >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255) >... 11 more > > The server has the following exception: > ERROR [pool-1-thread-11] 2010-11-10 10:35:58,839 Cassandra.java (line 2876) > Internal error processing get_range_slices > java.lang.AssertionError: > (150596448267070854052355226693835429313,18886431880788352792108545029372560769] >at > org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1200) >at > org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:429) >at > org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:513) >at > org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868) >at > org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555) >at > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167) >at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >at java.lang.Thread.run(Thread.java:619) > > Any help would be appreciated. > > Thanks. > > AD
Re: Cassandra 0.7 bootstrap exception on windows
moving this to the cassandra user list. On Nov 10, 2010, at 11:05 AM, Aditya Muralidharan wrote: > Hi, > > I'm building (on windows) a release tar from the HEAD of the Cassandra 0.7 > branch. Running a new single node instance of Cassandra gives me the > following bootstrap exception: > INFO 10:54:14,030 Enqueuing flush of memtable-locationi...@613975815(227 > bytes, 4 operations) > INFO 10:54:14,036 Writing memtable-locationi...@613975815(227 bytes, 4 > operations) > ERROR 10:54:14,278 Fatal exception in thread Thread[FlushWriter:1,5,main] > java.io.IOError: java.io.IOException: rename failed of > \var\lib\cassandra\data\system\LocationInfo-e-1-Data.db >at > org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:238) >at > org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:208) >at > org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:191) >at > org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:161) >at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49) >at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174) >at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) >at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) >at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >at java.util.concurrent.FutureTask.run(FutureTask.java:138) >at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >at java.lang.Thread.run(Thread.java:619) > Caused by: java.io.IOException: rename failed of > \var\lib\cassandra\data\system\LocationInfo-e-1-Data.db >at > org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.java:359) >at > org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:234) >... 12 more > > > This is not a problem on linux. Any thoughts? Anyone else seeing this > behavior? > > Thanks. > > AD
Re: MapReduce/Hadoop in cassandra 0.7 beta3
Hey Aditya, Would you mind attaching that last hundred few lines from before the exception from the server log to this ticket: https://issues.apache.org/jira/browse/CASSANDRA-1724 ? Thanks, Stu -Original Message- From: "Jeremy Hanna" Sent: Wednesday, November 10, 2010 11:40am To: user@cassandra.apache.org Subject: Re: MapReduce/Hadoop in cassandra 0.7 beta3 Aditya, Can you reproduce the problem locally with "pig -x local myscript.pig"? Also, moving this message back to the cassandra user list. On Nov 10, 2010, at 10:47 AM, Aditya Muralidharan wrote: > Hi, > > I'm still getting the error associated with > https://issues.apache.org/jira/browse/CASSANDRA-1700 > I have 7 suse nodes running Cassandra0.7 branch (latest as of the morning of > Nov 9). I've loaded 10 rows with one column family(replication factor=4) and > 100 super columns. Using the ColumnFamilyInputFormat with mapreduce > (LocalJobRunner) to retrieve all the rows gives me the following exception: > > 10/11/10 10:33:15 WARN mapred.LocalJobRunner: job_local_0001 > java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal > error processing get_range_slices >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277) >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292) >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189) >at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) >at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148) >at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423) >at > org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) >at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) >at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) >at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) >at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) > Caused by: org.apache.thrift.TApplicationException: Internal error processing > get_range_slices >at > org.apache.thrift.TApplicationException.read(TApplicationException.java:108) >at > org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724) >at > org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704) >at > org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255) >... 
11 more > > The server has the following exception: > ERROR [pool-1-thread-11] 2010-11-10 10:35:58,839 Cassandra.java (line 2876) > Internal error processing get_range_slices > java.lang.AssertionError: > (150596448267070854052355226693835429313,18886431880788352792108545029372560769] >at > org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1200) >at > org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:429) >at > org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:513) >at > org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868) >at > org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555) >at > org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167) >at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >at java.lang.Thread.run(Thread.java:619) > > Any help would be appreciated. > > Thanks. > > AD
encoding of values in cassandra
Cassandra keys and values are just bytes. My values range from simple doubles to complex objects, so I need to serialize them with something like avro, thrift or protobuf. Since I am working in a test environment and cassandra is moving to avro, I decided to use the avro protocol to communicate with cassandra (from python and java). So naturally I would also like to encode my values with avro (why have 2 serialization frameworks around?). However avro needs to store the schema with the serialized values. This is considerable overhead (even if I just store pointers to schemas or something like that with the serialized values). It also seems complicated compared to thrift or protobuf, where one can just store values. Did anyone find a neat solution to this? Or should I just use avro for communication and something like protobuf for value serialization? Best, Koert
RE: MapReduce/Hadoop in cassandra 0.7 beta3
My bad. Moved to Cassandra user list. -Original Message- From: Aditya Muralidharan [mailto:aditya.muralidha...@nisc.coop] Sent: Wednesday, November 10, 2010 10:48 AM To: u...@pig.apache.org Subject: RE: MapReduce/Hadoop in cassandra 0.7 beta3 Hi, I'm still getting the error associated with https://issues.apache.org/jira/browse/CASSANDRA-1700 I have 7 suse nodes running Cassandra0.7 branch (latest as of the morning of Nov 9). I've loaded 10 rows with one column family(replication factor=4) and 100 super columns. Using the ColumnFamilyInputFormat with mapreduce (LocalJobRunner) to retrieve all the rows gives me the following exception: 10/11/10 10:33:15 WARN mapred.LocalJobRunner: job_local_0001 java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing get_range_slices at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423) at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) Caused by: org.apache.thrift.TApplicationException: Internal error processing get_range_slices at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724) at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704) at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255) ... 11 more The server has the following exception: ERROR [pool-1-thread-11] 2010-11-10 10:35:58,839 Cassandra.java (line 2876) Internal error processing get_range_slices java.lang.AssertionError: (150596448267070854052355226693835429313,18886431880788352792108545029372560769] at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1200) at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:429) at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:513) at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868) at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Any help would be appreciated. Thanks. AD
Re: encoding of values in cassandra
We are moving towards treating Thrift more as a driver than as a format itself, and using libraries like Hector, pycassa, and phpcassa from the client. On Wed, Nov 10, 2010 at 1:03 PM, Koert Kuipers wrote: > Cassandra keys and values are just bytes. My values range from simple > doubles to complex objects so I need to serialize them with something like > avro, thrift or protobuf. > > > > Since I am working in a test environment and casssandra is moving to avro I > decided to use the avro protocol to communicate with cassandra (from python > and java). So naturally I would also like to encode my values with avro (why > have 2 serialization frameworks around?). However avro needs to safe the > schema with the serialized values. This is considerable overhead (even if I > just safe pointers to schemas or something like that with the serialized > values). It also seems complicated compared to thrift or protobuf where one > can just store values. > > > > Did anyone find a neat solution to this? Or should I just use avro for > communication and something like protobuf for value serialization? > > > > Best, Koert > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
multiple datacenter with low replication factor - idea for greater flexibility
Hello, We've had Cassandra running in a single production data center now for several months and have started detailed plans to add data center fault tolerance. Our requirements do not appear to be solved out-of-the-box with Cassandra. I'd like to share a solution we're planning and find others considering similar problems. We require the following:
1. Two data centers. One is primary, the other a hot standby to be used when the primary fails. Of course Cassandra has no such bias, but as will be seen below this becomes important when considering app latency.
2. No more than 3 copies of data total. We are storing blob-like objects. Cost per unit of usable storage is closely scrutinized vs other solutions, hence we want to keep the replication factor low. Two copies will be held in the primary DC, 1 in the secondary DC - with the corresponding ratio of machines in each DC.
3. Immediate consistency.
4. No waiting on the remote data center. The application front-end runs in the primary data center and expects that operations using a local coordinator node will not suffer a response time determined by the WAN. Hence we cannot require a response from the node in the secondary data center to achieve quorum.
5. Ability to operate with a single working node per key, if necessary. We wish to temporarily operate with even a single working node per token in desperate situations involving data center failures or combinations of node and data center failure.
Existing Cassandra solutions offer combinations of the above, but it is not at all clear how to achieve all of them without custom work. Normal quorum with N=3 can only tolerate a single down node regardless of topology; furthermore, if one node in the primary DC fails, quorum requires synchronous operations over the WAN. NetworkTopologyStrategy is nice, but requiring quorum in the primary DC with 2 nodes means no tolerance for a single node failure there. If we're overlooking something I'd love to know.
Hence the following proposal for a new replication strategy we're calling SubQuorum. In short, SubQuorum allows administratively marking some nodes as exempt from participating in quorum. As all nodes agree on the exemption status, consistency is still guaranteed because quorum is still achieved amongst the remaining nodes. We gain tremendous flexibility to deal with node and DC failures. Exempt nodes, if up, still receive mutation messages as usual.
For example: if a primary DC node fails, we can mark its remote counterpart exempt from quorum, hence allowing continued operation without a synchronous call over the WAN. Another example: if the primary DC fails, we mark all primary DC nodes exempt and move the entire application to the secondary DC, where it runs as usual but with just the one copy.
The implementation is trivial and consists of two pieces:
1. Exempt node management. The list of exempt nodes is broadcast out of band. In our case we're leveraging puppet and an admin server.
2. We've written an implementation of AbstractReplicationStrategy that returns custom QuorumResponseHandler and IWriteResponseHandler implementations. These simply wait for quorum amongst the non-exempt nodes. This requires a small change to the AbstractReplicationStrategy interface to pass the endpoints to getQuorumResponseHandler and getWriteResponseHandler, but otherwise the changes are contained in the plugin.
There is more analysis I can share if anyone is interested, but at this point I'd like to get feedback. Thanks, Wayne Lewis
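To make the counting rule concrete, here is a minimal standalone sketch of "quorum amongst the non-exempt nodes". All class and method names are hypothetical, and it deliberately avoids Cassandra's internal interfaces (which differ between versions); in the proposal above, this logic would live inside the response handlers returned by the custom replication strategy, and exempt nodes would still receive writes as described.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SubQuorumResponseHandler
{
    private final Set<String> nonExempt = new HashSet<String>();
    private final Set<String> responded = new HashSet<String>();
    private final int quorum;

    public SubQuorumResponseHandler(List<String> naturalEndpoints, Set<String> exemptEndpoints)
    {
        for (String endpoint : naturalEndpoints)
            if (!exemptEndpoints.contains(endpoint))
                nonExempt.add(endpoint);
        // Quorum is computed over the non-exempt replicas only; with all but
        // one replica exempt, a single response is enough to proceed.
        quorum = nonExempt.size() / 2 + 1;
    }

    // Record a response; returns true once quorum among non-exempt replicas is reached.
    public synchronized boolean onResponse(String endpoint)
    {
        if (nonExempt.contains(endpoint))
            responded.add(endpoint);
        return responded.size() >= quorum;
    }
}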
[RELEASE] 0.6.7
It's been about a month since our last stable update and we've accumulated a few changes[1] worth having, so I'm pleased to announce the release of 0.6.7. If you're coming from a version older than 0.6.6 then please be sure to read the release notes[2]; upgrades from 0.6.6 should be completely seamless. As usual, links to binary and source archives are available from the Downloads page[3], and packages for Debian-based systems are available from our repo[4]. Thanks, and enjoy! [1]: http://goo.gl/pGEx5 [CHANGES.txt] [2]: http://goo.gl/IQ3rR [NEWS.txt] [3]: http://cassandra.apache.org/download [4]: http://wiki.apache.org/cassandra/DebianPackaging -- Eric Evans eev...@rackspace.com
Re: Range queries using token instead of key
On Wed, Nov 10, 2010 at 10:05 AM, Anand Somani wrote: > Hi, > > I am trying to iterate over the entire dataset to calculate some > information. Now the way I am trying to do this is by going directly to the > node that has a data range, so here is the route I am following > > get TokenRange using - describe_ring > then for each tokenRange pick a node and get all data from that node (so > talk directly to that node for local data) - using get_range_slices () and > using KeyRange with start and end token. I want to get about N tokens at a > time. > I want to use paging approach for this, but I cannot seem to find a way to > get the token for my last keyslice? The only thing I can find is key, now is > there way to get token given a key? As per some suggestions I can do the md5 > on the last key and use that as the starting token for the next query, would > that work? > > Also is there a better way of doing this? The data per row is very small. > This looks like a hadoop kind of a job, but am trying to avoid hadoop since > have no other use for it and this operation will be infrequent. > > I am using 0.6.6, RandomPartitioner. > > Thanks > Anand > You should take the last key from your keyslice and pass it into FBUtilities.hash(key) to get its token. Edward
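As a sketch of that last step, turning the final key of one page into the start token of the next: this assumes the 0.6 RandomPartitioner token is the MD5 digest of the raw key bytes taken as a non-negative BigInteger, which should match what FBUtilities.hash computes; if the Cassandra jar is on the client classpath, calling FBUtilities.hash directly is simpler than recomputing it.

import java.math.BigInteger;
import java.security.MessageDigest;

public class KeyToToken
{
    // Token for a row key, assuming RandomPartitioner's MD5-based tokens.
    public static String tokenFor(String lastKey) throws Exception
    {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        BigInteger token = new BigInteger(md5.digest(lastKey.getBytes("UTF-8"))).abs();
        // KeyRange.start_token / end_token are plain strings in the Thrift API.
        return token.toString();
    }

    public static void main(String[] args) throws Exception
    {
        // Use this as start_token for the next get_range_slices call, keep the
        // end_token of the current TokenRange, and skip the first row returned
        // if it repeats the last key of the previous page.
        System.out.println(tokenFor("some-row-key"));
    }
}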
CF Stats in 0.7beta3
Afternoon all - I'm playing with 0.7beta3 on some boxes I have here at the office, and while checking out the stats from one of my tests I'm seeing Write Latency being reported as "0.009 ms". I haven't done any timing yet in my client, but is this really microsecond latency, or is there a mismatch between the number and the label? Granted, I'm not loading the cluster up at all, just writing with a single thread to play with pycassa, so the cluster doesn't have anything to do but handle my writes, but I'd like to make sure before I run off trying to talk my manager into something :-)
Column Family: NameServer2Domain
SSTable count: 0
Space used (live): 0
Space used (total): 0
Memtable Columns Count: 39718
Memtable Data Size: 2531109
Memtable Switch Count: 0
Read Count: 0
Read Latency: NaN ms.
Write Count: 39718
Write Latency: 0.009 ms.
Pending Tasks: 0
Key cache capacity: 20
Key cache size: 0
Key cache hit rate: NaN
Row cache: disabled
Compacted row minimum size: 0
Compacted row maximum size: 0
Compacted row mean size: 0
Re: CF Stats in 0.7beta3
Yeah, that's really microsecond latency. Note, though, that this isn't the full request timing; it's just from the storage proxy down, so it doesn't account for any latency added by thrift or the network. -ryan On Wed, Nov 10, 2010 at 1:43 PM, Rock, Paul wrote: > Afternoon all - I'm playing with 0.7beta3 on some boxes I have here at the > office and while checking out the stats from one of my tests I'm seeing Write > Latency being reported as "0.009 ms". I haven't done any timing yet in my > client, but is this really microsecond latency, or is there a mismatch > between the numeric and the label? Granted, I'm not loading the complex up at > all just writing with a single thread to play with pycassa so the cluster > doesn't have anything to do but handle my write, but I'd like to make sure > before I run off trying to talk my manager into something :-) > > Column Family: NameServer2Domain > SSTable count: 0 > Space used (live): 0 > Space used (total): 0 > Memtable Columns Count: 39718 > Memtable Data Size: 2531109 > Memtable Switch Count: 0 > Read Count: 0 > Read Latency: NaN ms. > Write Count: 39718 > Write Latency: 0.009 ms. > Pending Tasks: 0 > Key cache capacity: 20 > Key cache size: 0 > Key cache hit rate: NaN > Row cache: disabled > Compacted row minimum size: 0 > Compacted row maximum size: 0 > Compacted row mean size: 0 > >
Non-Unique Indexes, How ?
Hi, I'm trying to work out a way to support a non-unique index. For example, let's say I have a contact list, where it's possible to have names that are the same but belong to different people, and so should have different contact entries, but I'd want to be able to search on their full name and get a list of potential matches. In cassandra, as far as I know, column names and row keys need to be unique - so unless I somehow construct a unique form of the full name to use as a column name or key value, I'm left with using the column value (as opposed to the name) and the indexing facility in 0.7 - but it's not clear to me whether the 0.7 index facility would support non-unique column values this way. e.g.
CF: Contacts (with an index on 'fullname')
key : id1 { fullname : "John Brown", address : "London" }
key : id2 { fullname : "John Brown", address : "Paris" }
Would the 0.7 index on fullname allow me to look up the 2 entries if I searched on "John" or "John Brown"? Regards Jason
rename column family with cassandra-cli in 0.7.0-beta3
Re: Non-Unique Indexes, How ?
On Wed, Nov 10, 2010 at 5:55 PM, J T wrote: > CF: Contacts (with an index on 'fullname') > key : id1 { fullname : "John Brown", address : "London" } > key : id2 { fullname : "John Brown", address : "Paris" } > Would the 0.7 index on fullname allow me to lookup the 2 entries if I > searched on "John" or "John Brown" ? Yes, the latter. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
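For reference, a hedged sketch of what that equality lookup could look like over the 0.7 Thrift API, assuming an already-connected Cassandra.Client with set_keyspace already called; the column family "Contacts" and the indexed column "fullname" come from the example above. Searching for the exact value "John Brown" returns both id1 and id2, while a search on just "John" returns nothing, since the index match is an exact equality on the value.

import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;

public class IndexedLookup
{
    private static ByteBuffer utf8(String s) throws Exception
    {
        return ByteBuffer.wrap(s.getBytes("UTF-8"));
    }

    public static List<KeySlice> findByFullName(Cassandra.Client client, String name) throws Exception
    {
        // fullname == name : exact match on the indexed value
        IndexExpression expression = new IndexExpression();
        expression.setColumn_name(utf8("fullname"));
        expression.setOp(IndexOperator.EQ);
        expression.setValue(utf8(name));

        IndexClause clause = new IndexClause();
        clause.setExpressions(Arrays.asList(expression));
        clause.setStart_key(ByteBuffer.wrap(new byte[0]));
        clause.setCount(100);

        SliceRange sliceRange = new SliceRange();
        sliceRange.setStart(ByteBuffer.wrap(new byte[0]));   // return all columns of each match
        sliceRange.setFinish(ByteBuffer.wrap(new byte[0]));
        sliceRange.setReversed(false);
        sliceRange.setCount(100);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(sliceRange);

        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Contacts");

        return client.get_indexed_slices(parent, clause, predicate, ConsistencyLevel.ONE);
    }
}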
Re: Non-Unique Indexes, How ?
Ok, so non-unique indexes are supported, but only full equality matches on the values are supported right now. Will it in the future allow for partial/range matches ? e.g. Find all contacts with a J as the first letter ? Jason On Thu, Nov 11, 2010 at 12:13 AM, Jonathan Ellis wrote: > On Wed, Nov 10, 2010 at 5:55 PM, J T wrote: > > CF: Contacts (with an index on 'fullname') > > key : id1 { fullname : "John Brown", address : "London" } > > key : id2 { fullname : "John Brown", address : "Paris"} > > Would the 0.7 index on fullname allow me to lookup the 2 entries if I > > searched on "John" or "John Brown" ? > > Yes, the latter. > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com >
Re: rename column family with cassandra-cli in 0.7.0-beta3
https://issues.apache.org/jira/browse/CASSANDRA-1630 On Wed, Nov 10, 2010 at 6:09 PM, gbanks wrote: > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Non-Unique Indexes, How ?
Yes. On Wed, Nov 10, 2010 at 6:39 PM, J T wrote: > Ok, so non-unique indexes are supported, but only full equality matches on > the values are supported right now. > Will it in the future allow for partial/range matches ? > > e.g. Find all contacts with a J as the first letter ? > Jason > On Thu, Nov 11, 2010 at 12:13 AM, Jonathan Ellis wrote: >> >> On Wed, Nov 10, 2010 at 5:55 PM, J T wrote: >> > CF: Contacts (with an index on 'fullname') >> > key : id1 { fullname : "John Brown", address : "London" } >> > key : id2 { fullname : "John Brown", address : "Paris" } >> > Would the 0.7 index on fullname allow me to lookup the 2 entries if I >> > searched on "John" or "John Brown" ? >> >> Yes, the latter. >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of Riptano, the source for professional Cassandra support >> http://riptano.com > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Unsubscribe
Warm regards, Vibhaw Rajan Application Developer-Mainframes IBM India Pvt. Ltd. DLF IT Park, Chennai, India Office +91 44 22723552 Mobile +91 996 253 3029 Email vibra...@in.ibm.com "Success is not final, failure is not fatal: it is the courage to continue that counts"
Re: WordCount example problem
Thanks, I'll do. P. On Wed, Nov 10, 2010 at 16:28, Aditya Muralidharan wrote: > Also, your Mapper class needs to look like this: > MyMapper extends Mapper IColumn>,Text,SumWritable> ... with all the necessary fixes to the map method. > > AD > > -Original Message- > From: Jonathan Ellis [mailto:jbel...@gmail.com] > Sent: Wednesday, November 10, 2010 8:40 AM > To: user > Subject: Re: WordCount example problem > > http://www.mail-archive.com/user@cassandra.apache.org/msg07093.html > > On Wed, Nov 10, 2010 at 5:47 AM, Patrik Modesto > wrote: >> Hi, >> >> I'm trying the WordCount example and getting this error: >> >> [12:33]$ ./bin/word_count >> 10/11/10 12:34:35 INFO WordCount: output reducer type: filesystem >> 10/11/10 12:34:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with >> processName=JobTracker, sessionId= >> 10/11/10 12:34:36 INFO WordCount: XXX:text0 >> 10/11/10 12:34:36 INFO mapred.JobClient: Running job: job_local_0001 >> 10/11/10 12:34:36 INFO mapred.MapTask: io.sort.mb = 100 >> 10/11/10 12:34:36 INFO mapred.MapTask: data buffer = 79691776/99614720 >> 10/11/10 12:34:36 INFO mapred.MapTask: record buffer = 262144/327680 >> 10/11/10 12:34:36 WARN mapred.LocalJobRunner: job_local_0001 >> java.lang.ClassCastException: java.nio.HeapByteBuffer cannot be cast to [B >> at WordCount$TokenizerMapper.map(WordCount.java:73) >> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) >> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) >> at >> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) >> 10/11/10 12:34:37 INFO mapred.JobClient: map 0% reduce 0% >> 10/11/10 12:34:37 INFO mapred.JobClient: Job complete: job_local_0001 >> 10/11/10 12:34:37 INFO mapred.JobClient: Counters: 0 >> >> I'm using cassandra 0.7.0beta3 (from latest trunk) on just one >> machine. Is the example working for anybody? >> >> Thanks, >> P. >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com >
Re: WordCount example problem
That's exactly what's happening to me. I wonder why Google didn't find it. Thanks! P. On Wed, Nov 10, 2010 at 15:39, Jonathan Ellis wrote: > http://www.mail-archive.com/user@cassandra.apache.org/msg07093.html > > On Wed, Nov 10, 2010 at 5:47 AM, Patrik Modesto > wrote: >> Hi, >> >> I'm trying the WordCount example and getting this error: >> >> [12:33]$ ./bin/word_count >> 10/11/10 12:34:35 INFO WordCount: output reducer type: filesystem >> 10/11/10 12:34:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with >> processName=JobTracker, sessionId= >> 10/11/10 12:34:36 INFO WordCount: XXX:text0 >> 10/11/10 12:34:36 INFO mapred.JobClient: Running job: job_local_0001 >> 10/11/10 12:34:36 INFO mapred.MapTask: io.sort.mb = 100 >> 10/11/10 12:34:36 INFO mapred.MapTask: data buffer = 79691776/99614720 >> 10/11/10 12:34:36 INFO mapred.MapTask: record buffer = 262144/327680 >> 10/11/10 12:34:36 WARN mapred.LocalJobRunner: job_local_0001 >> java.lang.ClassCastException: java.nio.HeapByteBuffer cannot be cast to [B >> at WordCount$TokenizerMapper.map(WordCount.java:73) >> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) >> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621) >> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) >> at >> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) >> 10/11/10 12:34:37 INFO mapred.JobClient: map 0% reduce 0% >> 10/11/10 12:34:37 INFO mapred.JobClient: Job complete: job_local_0001 >> 10/11/10 12:34:37 INFO mapred.JobClient: Counters: 0 >> >> I'm using cassandra 0.7.0beta3 (from latest trunk) on just one >> machine. Is the example working for anybody? >> >> Thanks, >> P. >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com >