Re: Load balancing
Mubarak Seyed apple.com> writes:

> - How does client (application) connect to cassandra cluster? Is it always for one node (and thrift can get ring info) and send the request to connected node

This depends on the client library you use. Any Cassandra node can accept client connections and will forward the request to the node owning the requested data.

> - If we send 300k records from each node, it is overkill for a node which accepts the client connection; does the node get choked?

Of course, in your situation no single node can handle the whole load, so you have to connect to several nodes. The best way, I believe, is to connect directly to the node owning the data you need. Take a look at org/apache/cassandra/client/RingCache.java for an example of how to read the ring state and forward requests to the right node.

> - How do we design a cassandra cluster to make sure that inserts get distributed to more than one node?
> - If I prefer OrderPreservingPartitioner as a partitioner, how does a single node handle all the 200k records?

If you prefer OPP, you have two ways (manual and automatic):

1. If you know the distribution of keys in your data, you distribute token values between your nodes in a way which ensures uniform key distribution. Imagine you have single-byte keys ranging from 0 to 255 and 64 nodes (I assume data is distributed uniformly across all keys, for simplicity). For this you'll have to manually configure the InitialToken in storage-conf of the 1st node to 0, the 2nd to 4, the 3rd to 8, the 4th to 12, and so on.

2. The automatic way is to start the Cassandra cluster with a small node count, import data into it, and bootstrap the rest of the nodes, specifying bootstrap=true and an empty value for the token in storage-conf. This way Cassandra will try to balance the data by itself.

200k records are no big deal for Cassandra, IMHO, but of course this depends on your hardware and the size of the records. Anyway, it is a good idea to test your configuration with real data first.
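To make the RingCache pointer concrete, here is a minimal sketch of reading the ring state over Thrift. The keyspace name "Keyspace1" and port 9160 are stock-config assumptions, not details from this thread, and this is not the actual RingCache code:

    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.TokenRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class RingInspector {
        public static void main(String[] args) throws Exception {
            // Connect to any node; every node can describe the full ring.
            TTransport transport = new TSocket("localhost", 9160);
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // Each TokenRange maps a (start_token, end_token] span of the ring
            // to the endpoints (replica nodes) that own it.
            List<TokenRange> ranges = client.describe_ring("Keyspace1");
            for (TokenRange range : ranges) {
                System.out.println(range.start_token + " .. " + range.end_token
                        + " -> " + range.endpoints);
            }
            transport.close();
        }
    }

A client can hold such a map of token ranges to endpoints and route each key's request to one of the owning endpoints, which is essentially what RingCache automates.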
Cassandra Multiple DataCenter Suitability - why?
Hello, I keep reading everywhere that Cassandra has supported multiple datacenters from the beginning. I would like to know what Cassandra does to achieve that. Is it just that the developers have written some code that supports that scenario, or is there something inherent in Cassandra's design that makes it suitable for a multi-DC environment, like minimizing inter-DC traffic? I have read about RackAwareStrategy on the wiki, and have also browsed through some code (DataCenterShardStrategy), but I would like to see what people have to say about this. I also read about an implementation of Rack Awareness employing ZooKeeper, but I gather that wasn't released by Facebook and it was more geared towards single-DC rack awareness because ZooKeeper is a bit heavy on the bandwidth. Anyway, just to sum it up, my question is this: please explain in brief the reasons why Cassandra is well suited for multi-DC environments. Alexander Altanis
Java-Client returns empty data sets after Cassandra sits for a while
Hi, I have noticed the following behaviour (bug?) which I don't completely understand:

1. start Cassandra (I'm using 0.6.2, but it also appears in 0.6.1)
2. work with it (I'm using the Java thrift API)
3. let it sit for a long time (in my case: a day or more) without issuing any command
4. go back to (2) -- but now Cassandra always returns empty data sets to queries in Java. The command line interface works, no matter if left open or started newly.

Here's how I connect to Cassandra (leaving exception handling out for better readability):

    ...
    import java.util.List;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.KeySlice;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.protocol.TProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;
    ...
    TTransport transport = new TSocket(cassandraHost, cassandraPort);
    TProtocol protocol = new TBinaryProtocol(transport);
    Cassandra.Client client = new Cassandra.Client(protocol);
    transport.open();
    ...
    List<KeySlice> keySlices = client.get_range_slices(...);
    ...
    transport.flush();
    transport.close();
    ...

This code usually works, but after leaving Cassandra running unused for a couple of hours (days), this code connects fine to Cassandra, but client.get_range_slices returns an empty result set.

I am not very sure, but I believe it happens after compacting. Need to do more tests on this one.

Does anybody know what I'm doing wrong here? Is there any kind of "initialisation step" that I should have taken before running queries?

If you need more (debug) information on this matter, please let me know how I can provide you with it. The log files didn't show anything while running the query. The last log message was:

INFO [COMPACTION-POOL:1] 2010-06-18 14:07:45,882 CompactionManager.java (line 246) Compacting []

I ran the query at around 14:20, no other message after this one.

Thanks for your help in advance!

Cheers, Manfred

-- Dr. Manfred Muench, Nanjing Imperiosus Technology Co. Ltd., Wu Xing Nian Hua Da Sha, Room 1004, 134 Hanzhong Lu, Nanjing, P.R. China
Re: AVRO client API
On Fri, 2010-06-18 at 12:27 +0530, Atul Gosain wrote:
> Is the client API for cassandra available in AVRO.

Significant parts of it, but it is not yet finished.

> If so, any links to examples or some documentation?

There are no samples or documentation yet, sorry.

> and If so, any comparison between Thrift and Avro API's to determine the better of them?

The plan is to develop enough critical mass around the Avro API that Thrift can be deprecated. We don't want to maintain more than one of these long-term.

-- Eric Evans eev...@rackspace.com
Re: ec2 tests
Hi all, @Chris, did you get any bench you could share with us?

I am running the same kind of test on EC2 (m.large instances):
- one VM for stress.py (can be launched several times)
- another VM for a single cassandra node

I use the default conf settings (Xmx 1G, concurrent writes 32...) except for the commitlog and DataFileDirectory: I have one raid0 EBS for the commit log and another raid0 EBS for data. I can't get past 7500 writes/sec (when launching 4 stress.py at the same time). Moreover, I can see some pending tasks in the org.cassandra.db.ColumnFamilyStores.Keyspace1.Standard1 MBean. Any ideas on the bottleneck?

Thanks a lot. oliv/

On Fri, May 28, 2010 at 5:14 PM, gabriele renzi wrote:
> On Fri, May 28, 2010 at 3:48 PM, Mark Greene wrote:
> > First thing I would do is stripe your EBS volumes. I've seen blogs that say
> > this helps and blogs that say it's fairly marginal.
>
> just to point out: another option is to stripe the ephemeral drives
> (if using instances > small)

-- Olivier Mallassi OCTO Technology 50, Avenue des Champs-Elysées 75008 Paris Mobile: (33) 6 28 70 26 61 Tél: (33) 1 58 56 10 00 Fax: (33) 1 58 56 10 01 http://www.octo.com Octo Talks! http://blog.octo.com
Failover and slow nodes
Our cassandra client fails over if a node times out. Aside from actual failure, repair and major compactions can make a node so slow that it affects application performance. One problem we've run into is that a node in the midst of repair will still have requests routed to it internally, even if all clients have failed over. With a small number of nodes, this has a major impact on the performance of the overall system. I'm wondering whether people have any recommendations on tuning this behaviour. It would be really nice not to route requests to an insanely slow node.
Re: read operation is slow
Would it perhaps be worth denormalising your data so that you can retrieve all rows as a single row, using a key encoded with the query predicate? Until we get a stored proc feature (dunno if planned) it's hard to avoid round trips without denormalizing/replicating data to fit your query paths. Simon Reavely

On Jun 11, 2010, at 9:49 PM, "caribbean410" wrote:

Thanks for the suggestion. For the test case, it is 1 key and 1 column. I once changed 10 to 1; as I remember there was not much difference. I have 200k keys and each key is randomly generated. I will try the optimized query next week. But maybe you still have to face the case where each time a client just wants to query one key from the db.

From: Dop Sun [mailto:su...@dopsun.com] Sent: Friday, June 11, 2010 6:05 PM To: user@cassandra.apache.org Subject: RE: read operation is slow

And also, you are only selecting 1 key and 10 columns?

    criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);

Then, if you have 200k keys, you have 200k Thrift calls. If this is the case, you may need to optimize the way you do the query (to combine multiple keys into a single query), and to reduce the number of calls.

From: Dop Sun [mailto:su...@dopsun.com] Sent: Saturday, June 12, 2010 8:57 AM To: user@cassandra.apache.org Subject: RE: read operation is slow

You mean after you "I remove some unnecessary column family and change the size of rowcache and keycache, now the latency changes from 0.25ms to 0.09ms. In essence 0.09ms*200k=18s.", it still takes 400 seconds to return?

From: Caribbean410 [mailto:caribbean...@gmail.com] Sent: Saturday, June 12, 2010 8:48 AM To: user@cassandra.apache.org Subject: Re: read operation is slow

Hi, do you mean this one should not introduce much extra delay? To read a record, I need select here, not sure where the extra delay comes from.

On Fri, Jun 11, 2010 at 5:29 PM, Dop Sun wrote:

Jassandra is used here:

    Map map = criteria.select();

The select here is basically a call to the Thrift API get_range_slices.

From: Caribbean410 [mailto:caribbean...@gmail.com] Sent: Saturday, June 12, 2010 8:00 AM To: user@cassandra.apache.org Subject: Re: read operation is slow

I removed some unnecessary column families and changed the size of rowcache and keycache; now the latency changes from 0.25ms to 0.09ms. In essence 0.09ms*200k=18s. I don't know why it takes more than 400s total. Here is the client code and cfstats. There are not many operations here, why is the extra time so large?

    long start = System.currentTimeMillis();
    for (int j = 0; j < 1; j++) {
        for (int i = 0; i < numOfRecords; i++) {
            int n = random.nextInt(numOfRecords);
            ICriteria criteria = cf.createCriteria();
            userName = keySet[n];
            criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
            Map map = criteria.select();
            List list = map.get(userName);
            // ByteArray bloc = list.get(0).getValue();
            // byte[] byteArrayloc = bloc.toByteArray();
            // loc = new String(byteArrayloc);
            // readBytes = readBytes + loc.length();
            readBytes = readBytes + blobSize;
        }
    }
    long finish = System.currentTimeMillis();
    float totalTime = (finish - start) / 1000.0f; // divide by 1000.0f, not 1000: integer division would truncate

Keyspace: Keyspace1
    Read Count: 60
    Read Latency: 0.090530067 ms.
    Write Count: 20
    Write Latency: 0.01504989 ms.
    Pending Tasks: 0
        Column Family: Standard2
        SSTable count: 3
        Space used (live): 265990358
        Space used (total): 265990358
        Memtable Columns Count: 2615
        Memtable Data Size: 2667300
        Memtable Switch Count: 3
        Read Count: 60
        Read Latency: 0.091 ms.
        Write Count: 20
        Write Latency: 0.015 ms.
        Pending Tasks: 0
        Key cache capacity: 1000
        Key cache size: 187465
        Key cache hit rate: 0.0
        Row cache capacity: 1000
        Row cache size: 189990
        Row cache hit rate: 0.68335
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

Keyspace: system
    Read Count: 1
    Read Latency: 10.954 ms.
    Write Count: 4
    Write Latency: 0.28075 ms.
    Pending Tasks: 0
        Column Family: HintsColumnFamily
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1
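To make Dop Sun's batching suggestion concrete, here is a rough sketch against the raw 0.6 Thrift API; the keyspace, column family, and batch size are illustrative assumptions, not taken from the thread:

    import java.util.List;
    import java.util.Map;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;

    public class BatchedReads {
        // Fetch up to 100 keys per Thrift round trip instead of one call per key.
        static void readInBatches(Cassandra.Client client, List<String> keys) throws Exception {
            ColumnParent parent = new ColumnParent("Standard2");
            SlicePredicate predicate = new SlicePredicate();
            // Empty start/finish: up to 10 columns per row, from the beginning.
            predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));

            int batchSize = 100;
            for (int i = 0; i < keys.size(); i += batchSize) {
                List<String> batch = keys.subList(i, Math.min(i + batchSize, keys.size()));
                // One network round trip for the whole batch.
                Map<String, List<ColumnOrSuperColumn>> rows = client.multiget_slice(
                        "Keyspace1", batch, parent, predicate, ConsistencyLevel.ONE);
                // rows maps each key in the batch to its columns; process here.
            }
        }
    }

Each multiget_slice call here replaces up to 100 single-key calls, so a 200k-key test drops from 200k network round trips to roughly 2,000.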
Re: extending multiget
Assume a map/reduce program which needs to update some values during ingest, and needs to perform read operations on 100 keys, each of which has, say, 50 different columns. This happens many times for a given reduce task in the cluster. Shouldn't that be handled by the server as a single call?

On Thu, Jun 17, 2010 at 5:54 PM, Jonathan Ellis wrote:
> No. At that point you basically have no overhead advantage vs just
> doing multiple single-row requests.
>
> On Thu, Jun 17, 2010 at 2:39 PM, Sonny Heer wrote:
>> Any plans for this sort of call?
>>
>> Instead of:
>>
>> public Map<String, List<ColumnOrSuperColumn>> multiget_slice(String
>> keyspace, List<String> keys, ColumnParent column_parent,
>> SlicePredicate predicate, ConsistencyLevel consistency_level) throws
>> InvalidRequestException, UnavailableException, TimedOutException,
>> TException;
>>
>> ---
>>
>> public Map<String, List<ColumnOrSuperColumn>> multiget_slice(String
>> keyspace, Map<String, List<String>> keyColNames, ColumnParent
>> column_parent, ConsistencyLevel consistency_level) throws
>> InvalidRequestException, UnavailableException, TimedOutException,
>> TException;
>>
>> ---
>>
>> where the keyColNames explicitly maps which column names to retrieve
>> for a given key, instead of a column slice on all keys.
>
> -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: what is the best way to truncate a column family
In 0.6 your only option with those constraints is to iterate over the entire CF and delete row by row. This requires that you either use OPP or have an index that covers all keys in the CF. 0.7 adds the ability to truncate a CF (deleting all its rows) through the API.

On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang wrote:
> programmatically w/o bringing the servers down.
>
> thanks,
> claire
Re: Failover and slow nodes
Would be interesting to have a snitch that manipulated responses for read nodes based on historical response times. On Fri, Jun 18, 2010 at 8:21 AM, James Golick wrote: > Our cassandra client fails over if a node times out. Aside from actual > failure, repair and major compactions can make a node so slow that it > affects application performance. > One problem we've run in to is that a node in the midst of repair will still > have requests routed to it internally, even if all clients have failed over. > With a small number of nodes, this has a major impact on the performance of > the overall system. > I'm wondering whether people have any recommendations on tuning this > behaviour. It would be really nice not to route requests to an insanely slow > node.
Re: Failover and slow nodes
See https://issues.apache.org/jira/browse/CASSANDRA-981 -Original Message- From: "Benjamin Black" Sent: Friday, June 18, 2010 12:32pm To: user@cassandra.apache.org Subject: Re: Failover and slow nodes Would be interesting to have a snitch that manipulated responses for read nodes based on historical response times. On Fri, Jun 18, 2010 at 8:21 AM, James Golick wrote: > Our cassandra client fails over if a node times out. Aside from actual > failure, repair and major compactions can make a node so slow that it > affects application performance. > One problem we've run in to is that a node in the midst of repair will still > have requests routed to it internally, even if all clients have failed over. > With a small number of nodes, this has a major impact on the performance of > the overall system. > I'm wondering whether people have any recommendations on tuning this > behaviour. It would be really nice not to route requests to an insanely slow > node.
Re: Occasional 10s Timeouts on Read
To summarize:

If a request for a column comes in *after a period of several hours with no requests*, then the node servicing the request hangs while looking for its peer rather than servicing the request like it should. It then throws either a TimedOutException or a (wrong) NotFoundException.

And it doesn't appear to actually send the message it says it does to its peer. Or at least its peer doesn't report the request being received.

And then the situation magically clears up after approximately 2 minutes.

However, if the idle period never occurs, then the problem does not manifest. If I run a cron job with wget against my server every minute, I do not see the problem.

I'll be looking at some tcpdump logs to see if I can suss out what's really happening, and perhaps file this as a bug. The several hours between reproducible events makes this whole thing aggravating for detection, debugging and, I'll assume, fixing, if it is indeed a Cassandra problem.

It was suggested on IRC that it may be my network. But gossip is continually sending heartbeats, and nodetool and the logs show the nodes as up and available. If my network was flaking out I'd think it would be dropping heartbeats and I'd see that.

AJ

On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater wrote:
> These are physical machines.
>
> storage-conf.xml.fs03 is here:
>
> http://pastebin.com/weL41NB1
>
> Diffs from that for the other two storage-confs are inline here:
>
> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs01
> 185c185
>> 71603818521973537678586548668074777838
> 229c229
> < 10.33.2.70
> ---
>> 10.33.3.10
> 241c241
> < 10.33.2.70
> ---
>> 10.33.3.10
> 341c341
> < 16
> ---
>> 4
>
> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs02
> 185c185
> < 0
> ---
>> 120215585224964746744782921158327379306
> 206d205
> < 10.33.3.20
> 229c228
> < 10.33.2.70
> ---
>> 10.33.3.20
> 241c240
> < 10.33.2.70
> ---
>> 10.33.3.20
> 341c340
> < 16
> ---
>> 4
>
> Thank you for your attention,
>
> AJ
>
> On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black wrote:
>> Are these physical machines or virtuals? Did you post your
>> cassandra.in.sh and storage-conf.xml someplace?
>>
>> On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater wrote:
>>> Total data size in the entire cluster is about twenty 12k images. With
>>> no other load on the system. I just ask for one column and I get these
>>> timeouts. Performing multiple gets on the columns leads to multiple
>>> timeouts for a period of a few seconds or minutes and then the
>>> situation magically resolves itself and response times are down to
>>> single digit milliseconds for a column get.
>>>
>>> On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater wrote:
Cassandra 0.6.2 from the apache debian source.
Ubuntu Jaunty. Sun Java6 jvm.

All nodes in separate racks at 365 main.

On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater wrote:
> I'm seeing 10s timeouts on reads a few times a day. It's hard to reproduce
> consistently but seems to happen most often after it's been a long time
> between reads. After presenting itself for a couple minutes the
> problem then goes away.
>
> I've got a three node cluster with replication factor 2, reading at
> consistency level ONE. The columns being read are around 12k each. The
> nodes are 8GB multicore boxes with the JVM limits between 4GB and 6GB.
> Here's an application log from early this morning when a developer in Belgrade accessed the system:
>
> Jun 17 03:54:17 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: 5827067133c3d670071c17d9144f0b49
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:76 | TimedOutException for Image 5827067133c3d670071c17d9144f0b49
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 10005.388975 ms
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | Requested image_id: af8caf3b76ce97d13812ddf795104a5c
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Get took 3.658056 ms
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image Transform took 0.978947 ms
>
> That's a Timeout and then a successful get of another column.
>
> Here's the cassandra log for 10.33.2.70:
>
> DEBUG 03:54:17,070 get_slice
> DEBUG 03:54:17,071 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='5827067133c3d670071c17d9144f0b49', column_parent='QueryPath(columnFamilyName='Images', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100)
> DEBUG 03:54:17,071 weakreadremote reading SliceFromReadCommand(table='jolitics.com', key='5827067133c3d670071c17d9144f0b49', column_parent='QueryPath(columnFamilyName='Images', superCol
Re: Cassandra Multiple DataCenter Suitability - why?
On 06/18/2010 01:20 AM, alta...@ceid.upatras.gr wrote:
> I also read about an implementation of Rack Awareness employing ZooKeeper, but I gather that wasn't released by Facebook and it was more geared towards single-DC rack awareness because ZooKeeper is a bit heavy on the bandwidth.

Bandwidth is not the issue with a cross-colo ZooKeeper ensemble -- latency is the issue. ZK is a quorum-based service: a majority of the servers need to agree to every change (writes; reads are serviced locally by the server and don't face this issue). If the latency between servers is high, then write operations will take longer. Generally this is "4L", so if you have 10ms latency between colos it will take 40ms for a write to complete, if you have 100ms latency between colos it will take 400ms, etc. This is not an issue for "in colo" deployments, since latency there is typically very low. If you are using ZK for high-level coordination then 100ms latency might not be bad; if you are using ZK for fine-grained sharding it might be... Patrick
Re: ec2 tests
On Fri, Jun 18, 2010 at 8:00 AM, Olivier Mallassi wrote: > I use the default conf settings (Xmx 1G, concurrentwrite 32...) except for > commitlog and DataFileDirectory : I have a raid0 EBS for commit log and > another raid0 EBS for data. > I can't get through 7500 write/sec (when launching 4 stress.py in the same > time). > Moreover I can see some pending tasks in the > org.cassandra.db.ColumnFamilyStores.Keyspace1.Standard1 MBean > Any ideas on the bottleneck? Your instance has 7.5G of RAM, but you are limiting Cassandra to 1G. Increase -Xmx to 4G for a start. You are likely to get significantly better performance with the ephemeral drive, as well. I suggest testing with commitlog on the ephemeral drive for comparison. b
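For reference, the heap ceiling lives in the JVM_OPTS variable in cassandra.in.sh. One low-touch way to apply Benjamin's suggestion, assuming a stock 0.6 file and relying on the JVM honoring the last -Xmx flag it sees:

    # appended at the end of cassandra.in.sh: override the 1G default heap ceiling
    JVM_OPTS="$JVM_OPTS -Xmx4G"

Alternatively, edit the -Xmx value already present in JVM_OPTS in place; either way, leave the remaining flags as shipped.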
Re: Failover and slow nodes
Perfect, ship it. On Fri, Jun 18, 2010 at 10:37 AM, Stu Hood wrote: > See https://issues.apache.org/jira/browse/CASSANDRA-981 > > -Original Message- > From: "Benjamin Black" > Sent: Friday, June 18, 2010 12:32pm > To: user@cassandra.apache.org > Subject: Re: Failover and slow nodes > > Would be interesting to have a snitch that manipulated responses for > read nodes based on historical response times. > > On Fri, Jun 18, 2010 at 8:21 AM, James Golick wrote: >> Our cassandra client fails over if a node times out. Aside from actual >> failure, repair and major compactions can make a node so slow that it >> affects application performance. >> One problem we've run in to is that a node in the midst of repair will still >> have requests routed to it internally, even if all clients have failed over. >> With a small number of nodes, this has a major impact on the performance of >> the overall system. >> I'm wondering whether people have any recommendations on tuning this >> behaviour. It would be really nice not to route requests to an insanely slow >> node. > > >
Re: AVRO client API
On Jun 18, 2010, at 8:01 AM, Eric Evans wrote: > On Fri, 2010-06-18 at 12:27 +0530, Atul Gosain wrote: >> Is the client API for cassandra available in AVRO. > Significant parts of it, but it is not yet finished. >> If so, any links to examples or some documentation? > There is no samples or documentation yet, sorry. >> and If so, any comparison between Thrift and Avro API's to determine >> the better of them? > The Plan is to develop enough critical mass around the Avro API that > Thrift can be deprecated. We don't want to maintain more than one of > these long-term. At the risk of asking about religion (but with no interest in hearing about it), why Avro instead of something like plain-old-JSON over HTTP? -- Paul
Re: what is the best way to truncate a column family
I have been reminded that you can do a range query+pagination with RP in 0.6 to perform this operation. On Fri, Jun 18, 2010 at 10:29 AM, Benjamin Black wrote: > In 0.6 your only option with those constraints is to iterate over the > entire CF and deleting row by row. This requires you are either using > OPP or have an index that covers all keys in the CF. 0.7 adds the > ability to truncate a CF (deleting all its rows) through the API. > > On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang > wrote: >> programmatically w/o bringing the servers down. >> >> thanks, >> claire >> >
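A rough sketch of that iterate-and-delete approach against the 0.6 Thrift API; the keyspace, column family, page size, and consistency level are placeholder assumptions, and the flush/compaction caveats raised in the follow-ups still apply:

    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public class TruncateByIteration {
        static void deleteAllRows(Cassandra.Client client) throws Exception {
            SlicePredicate empty = new SlicePredicate();
            // Minimal slice: we only need the keys, not the column data.
            empty.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1));

            KeyRange range = new KeyRange();
            range.setCount(100);       // page size
            range.setStart_key("");
            range.setEnd_key("");

            while (true) {
                List<KeySlice> page = client.get_range_slices("Keyspace1",
                        new ColumnParent("Standard1"), empty, range, ConsistencyLevel.QUORUM);
                for (KeySlice row : page) {
                    // A ColumnPath naming only the CF deletes the whole row.
                    client.remove("Keyspace1", row.getKey(), new ColumnPath("Standard1"),
                            System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);
                }
                if (page.size() < 100) break;   // last page reached
                // Resume from the last key seen; that key reappears as the first
                // row of the next page, which is harmless since we re-delete it.
                range.setStart_key(page.get(page.size() - 1).getKey());
            }
        }
    }

Note that rows deleted earlier in the run may keep appearing as empty "range ghosts" until compaction, exactly as the flush/compact discussion below describes; re-removing them is harmless.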
Re: Failover and slow nodes
What's the current timeframe on 0.7?

On Fri, Jun 18, 2010 at 1:45 PM, Benjamin Black wrote:
> Perfect, ship it.
>
> On Fri, Jun 18, 2010 at 10:37 AM, Stu Hood wrote:
> > See https://issues.apache.org/jira/browse/CASSANDRA-981
> >
> > -Original Message-
> > From: "Benjamin Black"
> > Sent: Friday, June 18, 2010 12:32pm
> > To: user@cassandra.apache.org
> > Subject: Re: Failover and slow nodes
> >
> > Would be interesting to have a snitch that manipulated responses for
> > read nodes based on historical response times.
> >
> > On Fri, Jun 18, 2010 at 8:21 AM, James Golick wrote:
> >> Our cassandra client fails over if a node times out. Aside from actual
> >> failure, repair and major compactions can make a node so slow that it
> >> affects application performance.
> >> One problem we've run in to is that a node in the midst of repair will still
> >> have requests routed to it internally, even if all clients have failed over.
> >> With a small number of nodes, this has a major impact on the performance of
> >> the overall system.
> >> I'm wondering whether people have any recommendations on tuning this
> >> behaviour. It would be really nice not to route requests to an insanely slow
> >> node.
Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type
*Hopefully* fixed. I was never able to duplicate the problem on my workstation, but I had a pretty good idea what was causing the problem. Julie, if you're in a position to apply and test the fix, it would help us make sure we've got this one nailed down. Gary.

On Thu, Jun 17, 2010 at 00:33, Jonathan Ellis wrote:
> That is consistent with the
> https://issues.apache.org/jira/browse/CASSANDRA-1169 bug I mentioned,
> that is fixed in the 0.6 svn branch.
>
> On Wed, Jun 16, 2010 at 10:51 PM, Julie wrote:
>> The loop is in IncomingStreamReader.java, line 62, a 3-line while loop.
>> bytesRead is not changing. pendingFile.getExpectedBytes() returns
>> 7,161,538,639 but bytesRead is stuck at 2,147,483,647.
>>
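For context, 2,147,483,647 is Integer.MAX_VALUE, which is why the counter stops there. A self-contained illustration of this class of bug -- not the actual IncomingStreamReader code -- showing why a 32-bit running total cannot track a ~7GB stream:

    public class OverflowDemo {
        public static void main(String[] args) {
            long expected = 7161538639L; // ~7.1GB, the value from Julie's report
            int chunk = 1 << 20;         // pretend each read returns 1MB

            // Broken: a 32-bit running total cannot exceed 2,147,483,647,
            // so a loop guarded by it either spins forever or wraps negative.
            int bytesReadInt = 0;

            // Fixed: a 64-bit running total tracks the full stream length.
            long bytesRead = 0;
            while (bytesRead < expected) {
                bytesRead += chunk;
                bytesReadInt += chunk; // silently overflows past Integer.MAX_VALUE
            }
            System.out.println("long total: " + bytesRead);    // ~7.1GB, as expected
            System.out.println("int total:  " + bytesReadInt); // wrapped, negative garbage
        }
    }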
Re: what is the best way to truncate a column family
In 0.6.x the iterating approach works ... but you need to flush and compact (after GCGraceSeconds) in order to NOT see the keys in the CF. Will the behavior of the truncate method in 0.7 require flush/compact as well? Or will it be immediate? -phil On Jun 18, 2010, at 1:29 PM, Benjamin Black wrote: > In 0.6 your only option with those constraints is to iterate over the > entire CF and deleting row by row. This requires you are either using > OPP or have an index that covers all keys in the CF. 0.7 adds the > ability to truncate a CF (deleting all its rows) through the API. > > On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang > wrote: >> programmatically w/o bringing the servers down. >> >> thanks, >> claire >>
Re: what is the best way to truncate a column family
It will be immediate. But it will fail if not all hosts in the cluster are up; this is the tradeoff. We regard the truncate operation as an admin API, so I think it's a fair tradeoff.

On Fri, Jun 18, 2010 at 11:50 PM, Phil Stanhope wrote:
> In 0.6.x the iterating approach works ... but you need to flush and compact
> (after GCGraceSeconds) in order to NOT see the keys in the CF.
>
> Will the behavior of the truncate method in 0.7 require flush/compact as
> well? Or will it be immediate?
>
> -phil
>
> On Jun 18, 2010, at 1:29 PM, Benjamin Black wrote:
>
> > In 0.6 your only option with those constraints is to iterate over the
> > entire CF and deleting row by row. This requires you are either using
> > OPP or have an index that covers all keys in the CF. 0.7 adds the
> > ability to truncate a CF (deleting all its rows) through the API.
> >
> > On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang wrote:
> >> programmatically w/o bringing the servers down.
> >>
> >> thanks,
> >> claire
Re: what is the best way to truncate a column family
I am happy with this restriction on truncate operation for 0.7. Thanks for the quick response. -phil On Jun 18, 2010, at 4:57 PM, Ran Tavory wrote: > it will be immediate. > But it will fail if not all hosts in the cluster are up, this is the > tradeoff. We regard the truncate operation an admin api so I think it's a > fair tradeoff. > > On Fri, Jun 18, 2010 at 11:50 PM, Phil Stanhope wrote: > In 0.6.x the iterating approach works ... but you need to flush and compact > (after GCGraceSeconds) in order to NOT see the keys in the CF. > > Will the behavior of the truncate method in 0.7 require flush/compact as > well? Or will it be immediate? > > -phil > > On Jun 18, 2010, at 1:29 PM, Benjamin Black wrote: > > > In 0.6 your only option with those constraints is to iterate over the > > entire CF and deleting row by row. This requires you are either using > > OPP or have an index that covers all keys in the CF. 0.7 adds the > > ability to truncate a CF (deleting all its rows) through the API. > > > > On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang > > wrote: > >> programmatically w/o bringing the servers down. > >> > >> thanks, > >> claire > >> > >
Re: AVRO client API
On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote: > At the risk of asking about religion (but with no interest in hearing > about it), why Avro instead of something like plain-old-JSON over > HTTP? At the risk of having this thread veer off on a very long tangent... In a nutshell, we need a way of processing requests and responses over the network with typed data. You could of course put something together to do this using JSON and HTTP, but not without reimplementing another framework like Avro or Thrift (both of which can do JSON encoding, and both of which have an HTTP transport). -- Eric Evans eev...@rackspace.com
Cassandra dinner in Austin
As mentioned in the #cassandra IRC channel - there's going to be a dinner in Austin on July 15th for people interested in Cassandra. For those interested: http://cassandradinneraustin.eventbrite.com/ (Sorry if this doesn't apply to everyone, but everyone is welcome :)
Possible bug in Cassandra MapReduce
We are using MapReduce to periodically verify and rebuild our secondary indexes, along with counting total records. We started to notice double counting of unique keys in single-machine standalone tests. We were finally able to reproduce the problem using the apache-cassandra-0.6.2-src/contrib/word_count example and just re-running it multiple times. We are hoping someone can verify the bug.

Re-run the tests, and the word count for /tmp/word_count3/part-r-0 will be 1000 +~200 and will change if you blow the data away and re-run. Notice the setup script loops and only inserts 1000 records, so we expect the count to be 1000. Once the data is generated, re-running the setup script and/or mapreduce doesn't change the number (still off). The key is to blow all the data away and start over, which will cause it to change.

Can someone please verify this behavior? -Corey
Re: Possible bug in Cassandra MapReduce
"blow all the data away" ... how do you do that? What is the timestamp precision that you are using when creating key/col or key/supercol/col items? I have seen a fail to write a key when the timestamp is identical to the previous timestamp of a deleted key/col. While I didn't examine the source code, I'm certain that this is do to delete tombstones. I view this as a application error because I was attempting to do this within the GCGraceSeconds time period. If I, however, stopped cassandra, blew away data & commitlogs and restarted the write always succeeds (no surprise there). I turned this behavior into a feature (of sorts). When this happens I increment a formally non-zero portion of the timestamp (the last digit of precision which was always zero) and use this as a counter to track how many times a key/col was updated (max 9 for my purposes). -phil On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote: > > We are using MapReduce to periodical verify and rebuild our secondary indexes > along with counting total records. We started to noticed double counting of > unique keys on single machine standalone tests. We were finally able to > reproduce the problem using the apache-cassandra-0.6.2-src/contrib/word_count > example and just re-running it multiple times. We are hoping someone can > verify the bug. > > re-run the tests and the word count for /tmp/word_count3/part-r-0 will be > 1000 +~200 and will change if you blow the data away and re-run. Notice the > setup script loops and only inserts 1000 records so we expect count to be > 1000. Once the data is generated then re-running the setup script and/or > mapreduce doesn't change the number (still off). The key is to blow all the > data away and start over which will cause it to change. > > Can someone please verify this behavior? > > -Corey
Re: Possible bug in Cassandra MapReduce
I thought the same thing, but using the supplied contrib example I just delete the /var/lib/data dirs and commit log. -Corey On Fri, Jun 18, 2010 at 3:11 PM, Phil Stanhope wrote: > "blow all the data away" ... how do you do that? What is the timestamp > precision that you are using when creating key/col or key/supercol/col > items? > > I have seen a fail to write a key when the timestamp is identical to the > previous timestamp of a deleted key/col. While I didn't examine the source > code, I'm certain that this is do to delete tombstones. > > I view this as a application error because I was attempting to do this > within the GCGraceSeconds time period. If I, however, stopped cassandra, > blew away data & commitlogs and restarted the write always succeeds (no > surprise there). > > I turned this behavior into a feature (of sorts). When this happens I > increment a formally non-zero portion of the timestamp (the last digit of > precision which was always zero) and use this as a counter to track how many > times a key/col was updated (max 9 for my purposes). > > -phil > > On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote: > > > > > We are using MapReduce to periodical verify and rebuild our secondary > indexes along with counting total records. We started to noticed double > counting of unique keys on single machine standalone tests. We were finally > able to reproduce the problem using the > apache-cassandra-0.6.2-src/contrib/word_count example and just re-running it > multiple times. We are hoping someone can verify the bug. > > > > re-run the tests and the word count for /tmp/word_count3/part-r-0 > will be 1000 +~200 and will change if you blow the data away and re-run. > Notice the setup script loops and only inserts 1000 records so we expect > count to be 1000. Once the data is generated then re-running the setup > script and/or mapreduce doesn't change the number (still off). The key is > to blow all the data away and start over which will cause it to change. > > > > Can someone please verify this behavior? > > > > -Corey > >
Re: AVRO client API
On Jun 18, 2010, at 2:12 PM, Eric Evans wrote:
> On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote:
>> At the risk of asking about religion (but with no interest in hearing
>> about it), why Avro instead of something like plain-old-JSON over
>> HTTP?
> At the risk of having this thread veer off on a very long tangent...
> In a nutshell, we need a way of processing requests and responses over
> the network with typed data. You could of course put something together
> to do this using JSON and HTTP, but not without reimplementing another
> framework like Avro or Thrift (both of which can do JSON encoding, and
> both of which have an HTTP transport).

"Rich, natively-provided types" is a fair answer; I was more interested in motivation than making a value judgement. Cheers. -- Paul
Re: Possible bug in Cassandra MapReduce
OK...I just verified on a clean EC2 small single-instance box using apache-cassandra-0.6.2-src. I'm pretty sure the Cassandra MapReduce functionality is broken. If your MapReduce jobs are idempotent then you are OK, but if you are doing things like word count (as in the supplied example) or key count you will get double counts. -Corey

On Fri, Jun 18, 2010 at 3:15 PM, Corey Hulen wrote:
> I thought the same thing, but using the supplied contrib example I just delete the /var/lib/data dirs and commit log.
>
> -Corey
>
> On Fri, Jun 18, 2010 at 3:11 PM, Phil Stanhope wrote:
>> "blow all the data away" ... how do you do that? What is the timestamp precision that you are using when creating key/col or key/supercol/col items?
>>
>> I have seen a fail to write a key when the timestamp is identical to the previous timestamp of a deleted key/col. While I didn't examine the source code, I'm certain that this is do to delete tombstones.
>>
>> I view this as a application error because I was attempting to do this within the GCGraceSeconds time period. If I, however, stopped cassandra, blew away data & commitlogs and restarted the write always succeeds (no surprise there).
>>
>> I turned this behavior into a feature (of sorts). When this happens I increment a formally non-zero portion of the timestamp (the last digit of precision which was always zero) and use this as a counter to track how many times a key/col was updated (max 9 for my purposes).
>>
>> -phil
>>
>> On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote:
>> > We are using MapReduce to periodical verify and rebuild our secondary indexes along with counting total records. We started to noticed double counting of unique keys on single machine standalone tests. We were finally able to reproduce the problem using the apache-cassandra-0.6.2-src/contrib/word_count example and just re-running it multiple times. We are hoping someone can verify the bug.
>> >
>> > re-run the tests and the word count for /tmp/word_count3/part-r-0 will be 1000 +~200 and will change if you blow the data away and re-run. Notice the setup script loops and only inserts 1000 records so we expect count to be 1000. Once the data is generated then re-running the setup script and/or mapreduce doesn't change the number (still off). The key is to blow all the data away and start over which will cause it to change.
>> >
>> > Can someone please verify this behavior?
>> >
>> > -Corey
Re: AVRO client API
I'll jump in ... why Avro over Thrift? Can you guys point me at a comparison? (I know next to nothing about both of them)

On 06/18/2010 03:41 PM, Paul Brown wrote:
> On Jun 18, 2010, at 2:12 PM, Eric Evans wrote:
>> On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote:
>>> At the risk of asking about religion (but with no interest in hearing about it), why Avro instead of something like plain-old-JSON over HTTP?
>> At the risk of having this thread veer off on a very long tangent... In a nutshell, we need a way of processing requests and responses over the network with typed data. You could of course put something together to do this using JSON and HTTP, but not without reimplementing another framework like Avro or Thrift (both of which can do JSON encoding, and both of which have an HTTP transport).
> "Rich, natively-provided types" is a fair answer; I was more interested in motivation than making a value judgement. Cheers. -- Paul
Re: AVRO client API
On Fri, Jun 18, 2010 at 2:12 PM, Eric Evans wrote:
> On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote:
>> At the risk of asking about religion (but with no interest in hearing
>> about it), why Avro instead of something like plain-old-JSON over
>> HTTP?
>
> At the risk of having this thread veer off on a very long tangent...
>
> In a nutshell, we need a way of processing requests and responses over
> the network with typed data. You could of course put something together
> to do this using JSON and HTTP, but not without reimplementing another
> framework like Avro or Thrift (both of which can do JSON encoding, and
> both of which have an HTTP transport).

Not that I want to criticize the choices, but do they actually allow use of JSON as an encoding? Avro does use JSON for specifying schemas, but I wasn't aware of being able to use it for encoding data. Likewise with Thrift.

I think there's also an important question of whether the schema/formatting choice for the payload should follow that of the framing. Avro/Thrift/PB seem reasonable for framing, for use by the protocol itself; but for an open payload it might make sense to allow different pluggable formats, mostly because Avro/Thrift/PB are schema-bound formats, which is not an optimal choice for many use cases (but is fine for many others).

It is of course possible to just use byte[]/String as the payload, handle encoding and decoding on the client end, and maybe that's how it should be for cases where a strict schema doesn't work.

-+ Tatu +-
Re: AVRO client API
On Fri, Jun 18, 2010 at 6:23 PM, Tatu Saloranta wrote: > Not that I wanted to criticize choices, but do they actually allow use > of JSON as encoding? > Avro does use JSON for specifying schemas, but I wasn't aware of being > able to use it for encoding data. > Likewise with Thrift. > Yes, each supports a JSON data encoding. See http://avro.apache.org/docs/1.3.3/spec.html#json_encoding for Avro and the JSONProtocol in Thrift. One clear advantage of these two is that they support either stringified JSON or a compact binary encoding, and that they each support (or intend to support) a more efficient TCP-based protocol instead of only allowing HTTP. Re: Avro vs Thrift, Cassandra has historically had difficulty getting Thrift bugs fixed and Avro is more malleable at this point. Additionally, Avro has the potential for a more compact encoding and easier integration with dynamic languages.
Re: AVRO client API
On Fri, Jun 18, 2010 at 4:57 PM, Miguel Verde wrote:
> On Fri, Jun 18, 2010 at 6:23 PM, Tatu Saloranta wrote:
>>
>> Not that I wanted to criticize choices, but do they actually allow use
>> of JSON as encoding?
>> Avro does use JSON for specifying schemas, but I wasn't aware of being
>> able to use it for encoding data.
>> Likewise with Thrift.
>
> Yes, each supports a JSON data encoding. See
> http://avro.apache.org/docs/1.3.3/spec.html#json_encoding for Avro and the
> JSONProtocol in Thrift. One clear advantage of these two is that they

Ok, thanks. I learnt something new today. :-)

> support either stringified JSON or a compact binary encoding, and that they
> each support (or intend to support) a more efficient TCP-based protocol
> instead of only allowing HTTP.

Right. The latter is actually useful, then, as it suggests the possibility of using alternative binary encodings with the other pieces (schema definition, protocol handling), i.e. any encoding that supports their respective data sets.

> Re: Avro vs Thrift, Cassandra has historically had difficulty getting Thrift
> bugs fixed and Avro is more malleable at this point. Additionally, Avro has
> the potential for a more compact encoding and easier integration with
> dynamic languages.

Yes, that has been my impression as well, so I was not surprised to see plans for this change. Although I have been interested in learning more about progress, to know when new versions will be available.

-+ Tatu +-
Re: ec2 tests
I tried the following:
- always one Cassandra node on one EC2 m.large instance. On two other m.large instances, I run 4 stress.py (50 threads each, 2 stress.py per instance)
- RAID0 EBS for data and the ephemeral drive (/dev/sda1 partition) for the commit log
- -Xmx4G
and I did not see any improvements (Cassandra stays around 7000 writes/sec). CPU spikes up to 130%, but I have two 2.5GHz CPUs. The avgqu-sz goes up to 20 (sometimes more) for the device /dev/sda1 that stores the commitlog.

Do you think the ConcurrentWrites or MemtableThroughputInMB parameters must be increased (I'm using the default values right now)?

Any suggestions are welcomed. ;o)

On Fri, Jun 18, 2010 at 7:42 PM, Benjamin Black wrote:
> On Fri, Jun 18, 2010 at 8:00 AM, Olivier Mallassi wrote:
> > I use the default conf settings (Xmx 1G, concurrentwrite 32...) except for
> > commitlog and DataFileDirectory : I have a raid0 EBS for commit log and
> > another raid0 EBS for data.
> > I can't get through 7500 write/sec (when launching 4 stress.py in the same
> > time).
> > Moreover I can see some pending tasks in the
> > org.cassandra.db.ColumnFamilyStores.Keyspace1.Standard1 MBean
> > Any ideas on the bottleneck?
>
> Your instance has 7.5G of RAM, but you are limiting Cassandra to 1G.
> Increase -Xmx to 4G for a start. You are likely to get significantly
> better performance with the ephemeral drive, as well. I suggest
> testing with commitlog on the ephemeral drive for comparison.
>
> b

-- Olivier Mallassi OCTO Technology 50, Avenue des Champs-Elysées 75008 Paris Mobile: (33) 6 28 70 26 61 Tél: (33) 1 58 56 10 00 Fax: (33) 1 58 56 10 01 http://www.octo.com Octo Talks! http://blog.octo.com
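For reference, the two parameters in question live in storage-conf.xml; shown below with what I believe are the 0.6 shipped defaults (verify against your own file before changing anything):

    <!-- storage-conf.xml: concurrent writer threads and memtable flush threshold -->
    <ConcurrentWrites>32</ConcurrentWrites>
    <MemtableThroughputInMB>64</MemtableThroughputInMB>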
Re: ec2 tests
On Jun 18, 2010, at 6:39 PM, Olivier Mallassi wrote: > and I did not see any improvements (Cassandra stays around 7000 W/sec). It's a brave new world where N+1 scaling with 7,000 writes per second per node is considered suboptimal performance. --Joe
Learning-by-doing (also announcing a new Ruby Client Codename: "Greek Architect")
Howdy! So, last week I finally got around to playing with Cassandra. After a while I understood the basics. To test this assumption I started working on my own client implementation, since "learning by doing" is what I do and the existing Ruby clients (which are awesome) already abstracted too much for me to really grasp what was going on. Java is not really my thing (anymore), so I began with the Thrift API and Ruby.

Anyway, back to topic. This library is now available at: http://github.com/thheller/greek_architect

Since I have virtually no experience with Cassandra (but plenty with SQL), I started with the first use-case which I have programmed a bunch of times before: user management. I build websites which are used by other people, so I need to store them somewhere.

Step #1: Creating Users and persisting them in Cassandra. Example here: http://github.com/thheller/greek_architect/blob/master/spec/examples/user_create_spec.rb

I hope my rspec-style documentation doesn't confuse too many people, since I already have a gazillion questions for this simple, but also VERY common, use-case. Since a question is best asked with a concrete example to refer to, here goes my first one: would any of you veterans build what I built the way I did? (referring to the Cassandra design, not the Ruby client)

I insert Users with UUID keys into one ColumnFamily. I then index them by creating a row in another ColumnFamily, using the name as key and adding one column holding a reference to the User UUID. I also insert a reference into another ColumnFamily holding a list of Users partitioned by date.

I'm really unsure about the index design, since the indexes don't get updated when a User row is removed. I could hook into the remove call (like I did into mutations) and cascade the deletes where needed, but 10+ years of SQL always want to tell me I'm crazy for doing this stuff! I'd really appreciate some feedback.

Cheers, Thomas
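For readers working against the Thrift API directly rather than Ruby, here is a rough Java sketch of the same manual-index pattern Thomas describes; the keyspace and CF names are illustrative assumptions, not his schema:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    public class UserIndexes {
        static void insertColumn(Cassandra.Client client, String cf, String key,
                                 byte[] column, byte[] value) throws Exception {
            ColumnPath path = new ColumnPath(cf);
            path.setColumn(column);
            client.insert("Keyspace1", key, path, value,
                    System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);
        }

        // The pattern from the post: one primary row plus two index rows.
        static void createUser(Cassandra.Client client, String uuid, String name, String day)
                throws Exception {
            insertColumn(client, "Users", uuid, "name".getBytes(), name.getBytes()); // primary row
            insertColumn(client, "UsersByName", name, uuid.getBytes(), new byte[0]); // name -> uuid index
            insertColumn(client, "UsersByDay", day, uuid.getBytes(), new byte[0]);   // date-partitioned list
        }
    }

The stale-index concern raised above is real: a delete must cascade (via remove() on the matching columns in UsersByName and UsersByDay), or the index rows keep pointing at a dead UUID.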
Re: ec2 tests
> @Chris, Did you get any bench you could share with us?

We're still working on it. It's a lower priority task so it will take a while to finish. So far we've run on all the AWS data centers in the US and used several different setups. We also did a test on Rackspace with one setup and some whitebox servers we had in the office. (The whitebox servers are still running, I believe.) I don't have the numbers here, but the fastest by far is the non-virtualized whitebox servers. No real surprise. Rackspace was faster than AWS US-West; US-West faster than US-East. We always use 3 Cassandra servers and one or two machines to run stress.py. I don't think we're seeing the 7500 writes/sec, so maybe our config is wrong. You'll have to be patient until my colleague writes this all up. Cheers, Chris Dean
Re: Occasional 10s Timeouts on Read
set log level to TRACE and see if the OutboundTcpConnection is going bad. that would explain the message never arriving. On Fri, Jun 18, 2010 at 10:39 AM, AJ Slater wrote: > To summarize: > > If a request for a column comes in *after a period of several hours > with no requests*, then the node servicing the request hangs while > looking for its peer rather than servicing the request like it should. > It then throws either a TimedOutException or a (wrong) > NotFoundExeption. > > And it doen't appear to actually send the message it says it does to > its peer. Or at least its peer doesn't report the request being > received. > > And then the situation magically clears up after approximately 2 minutes. > > However, if the idle period never occurs, then the problem does not > manifest. If I run a cron job with wget against my server every > minute, I do not see the problem. > > I'll be looking at some tcpdump logs to see if i can suss out what's > really happening, and perhaps file this as a bug. The several hours > between reproducible events makes this whole thing aggravating for > detection, debugging and I'll assume, fixing, if it is indeed a > cassandra problem. > > It was suggested on IRC that it may be my network. But gossip is > continually sending heartbeats and nodetool and the logs show the > nodes as up and available. If my network was flaking out I'd think it > would be dropping heartbeats and I'd see that. > > AJ > > On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater wrote: >> These are physical machines. >> >> storage-conf.xml.fs03 is here: >> >> http://pastebin.com/weL41NB1 >> >> Diffs from that for the other two storage-confs are inline here: >> >> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 >> storage-conf.xml.fs01 >> 185c185 >> >>> 71603818521973537678586548668074777838 >> 229c229 >> < 10.33.2.70 >> --- >>> 10.33.3.10 >> 241c241 >> < 10.33.2.70 >> --- >>> 10.33.3.10 >> 341c341 >> < 16 >> --- >>> 4 >> >> >> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 >> storage-conf.xml.fs02 >> 185c185 >> < 0 >> --- >>> 120215585224964746744782921158327379306 >> 206d205 >> < 10.33.3.20 >> 229c228 >> < 10.33.2.70 >> --- >>> 10.33.3.20 >> 241c240 >> < 10.33.2.70 >> --- >>> 10.33.3.20 >> 341c340 >> < 16 >> --- >>> 4 >> >> >> Thank you for your attention, >> >> AJ >> >> >> On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black wrote: >>> Are these physical machines or virtuals? Did you post your >>> cassandra.in.sh and storage-conf.xml someplace? >>> >>> On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater wrote: Total data size in the entire cluster is about twenty 12k images. With no other load on the system. I just ask for one column and I get these timeouts. Performing multiple gets on the columns leads to multiple timeouts for a period of a few seconds or minutes and then the situation magically resolves itself and response times are down to single digit milliseconds for a column get. On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater wrote: > Cassandra 0.6.2 from the apache debian source. > Ubunutu Jaunty. Sun Java6 jvm. > > All nodes in separate racks at 365 main. > > On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater wrote: >> I'm seing 10s timeouts on reads few times a day. Its hard to reproduce >> consistently but seems to happen most often after its been a long time >> between reads. After presenting itself for a couple minutes the >> problem then goes away. >> >> I've got a three node cluster with replication factor 2, reading at >> consistency level ONE. 
The columns being read are around 12k each. The >> nodes are 8GB multicore boxes with the JVM limits between 4GB and 6GB. >> >> Here's an application log from early this morning when a developer in >> Belgrade accessed the system: >> >> Jun 17 03:54:17 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | >> Requested image_id: 5827067133c3d670071c17d9144f0b49 >> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:76 | >> TimedOutException for Image 5827067133c3d670071c17d9144f0b49 >> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image >> Get took 10005.388975 ms >> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:61 | >> Requested image_id: af8caf3b76ce97d13812ddf795104a5c >> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image >> Get took 3.658056 ms >> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image >> Transform took 0.978947 ms >> >> That's a Timeout and then a successful get of another column. >> >> Here's the cassandra log for 10.33.2.70: >> >> DEBUG 03:54:17,070 get_slice >> DEBUG 03:54:17,071 weakreadremote reading >> SliceFromReadCommand(table='jolitics.com', >> key='5827067133c3d670071c17d9144f0b49', >> column_parent='QueryPath(columnFami
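For reference, enabling that in 0.6 is a one-line log4j.properties change; the logger name follows the class Jonathan mentions, and the file path is an assumption about a typical install:

    # conf/log4j.properties -- trace only the inter-node connection handling
    log4j.logger.org.apache.cassandra.net.OutboundTcpConnection=TRACE

This keeps the rest of the log at its configured level while surfacing per-connection activity for the class in question.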
Re: Failover and slow nodes
My guess? 8-10 weeks. On Fri, Jun 18, 2010 at 1:31 PM, James Golick wrote: > What's the current timeframe on 0.7? -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Possible bug in Cassandra MapReduce
Fixed for 0.6.3: https://issues.apache.org/jira/browse/CASSANDRA-1042 On Fri, Jun 18, 2010 at 2:49 PM, Corey Hulen wrote: > > We are using MapReduce to periodical verify and rebuild our secondary > indexes along with counting total records. We started to noticed double > counting of unique keys on single machine standalone tests. We were finally > able to reproduce the problem using > the apache-cassandra-0.6.2-src/contrib/word_count example and just > re-running it multiple times. We are hoping someone can verify the bug. > re-run the tests and the word count for /tmp/word_count3/part-r-0 will > be 1000 +~200 and will change if you blow the data away and re-run. Notice > the setup script loops and only inserts 1000 records so we expect count to > be 1000. Once the data is generated then re-running the setup script and/or > mapreduce doesn't change the number (still off). The key is to blow all the > data away and start over which will cause it to change. > Can someone please verify this behavior? > -Corey -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Java-Client returns empty data sets after Cassandra sits for a while
I can't think of any scenario where leaving Cassandra idle would affect the results returned. I think something else is going on here. On Fri, Jun 18, 2010 at 2:05 AM, Manfred Muench wrote: > Hi, > > I have noticed the following behaviour (bug?) which I don't completely > understand: > 1. start Cassandra (I'm using 0.6.2, but it also appears in 0.6.1) > 2. work with it (I'm using Java thrift API) > 3. let it sit for a long time (in my case: a day or more) without > issuing any command > 4. go back to (2) -- but now Cassandra always returns empty data sets to > queries in Java. The command line interface works, no matter if left > open or started newly. > > Here's how I connect to Cassandra (leaving exception handling out for better > readability): > > - > ... > import org.apache.cassandra.thrift.Cassandra; > import org.apache.thrift.protocol.TBinaryProtocol; > import org.apache.thrift.protocol.TProtocol; > import org.apache.thrift.transport.TSocket; > ... > > TTransport transport = new TSocket(cassandraHost, cassandraPort); > TProtocol protocol = new TBinaryProtocol(transport); > Cassandra.Client client = new Cassandra.Client(protocol); > transport.open(); > ... > List keySlices = client.get_range_slices(...); > ... > transport.flush(); > transport.close(); > ... > - > > This code usually works, but after leaving Cassandra running unused for a > couple of hours (days), this code connects fine to Cassandra, but the > client.get_range_slices returns an empty result set. > > I am not very sure, but I believe it happens after compacting. Need to do > more tests on this one. > > Does anybody know what I'm doing wrong here? Is there any kind of > "initialisation step" that I should have taken before running queries? > > If you need more (debug) information on this matter, please let me know how > I can provide you with it. The log files didn't show anything while running > the query. The last log message was: > > INFO [COMPACTION-POOL:1] 2010-06-18 14:07:45,882 CompactionManager.java > (line 246) Compacting [] > > I ran the query at around 14:20, no other message after this one. > > Thanks for your help in advance! > > Cheers, > Manfred > > -- > Dr. Manfred Muench > Nanjing Imperiosus Technology Co. Ltd. > Wu Xing Nian Hua Da Sha, Room 1004 > 134 Hanzhong Lu, Nanjing, P.R. China > > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Possible bug in Cassandra MapReduce
Awesome...thanks. I just downloaded the patch, applied it, and verified it fixes our problems. What's the ETA on 0.6.3? (debating whether to tolerate it or maintain our own 0.6.2+patch) -Corey

On Fri, Jun 18, 2010 at 8:21 PM, Jonathan Ellis wrote:
> Fixed for 0.6.3: https://issues.apache.org/jira/browse/CASSANDRA-1042
>
> On Fri, Jun 18, 2010 at 2:49 PM, Corey Hulen wrote:
> > We are using MapReduce to periodical verify and rebuild our secondary indexes along with counting total records. We started to noticed double counting of unique keys on single machine standalone tests. We were finally able to reproduce the problem using the apache-cassandra-0.6.2-src/contrib/word_count example and just re-running it multiple times. We are hoping someone can verify the bug.
> > re-run the tests and the word count for /tmp/word_count3/part-r-0 will be 1000 +~200 and will change if you blow the data away and re-run. Notice the setup script loops and only inserts 1000 records so we expect count to be 1000. Once the data is generated then re-running the setup script and/or mapreduce doesn't change the number (still off). The key is to blow all the data away and start over which will cause it to change.
> > Can someone please verify this behavior?
> > -Corey
>
> -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Possible bug in Cassandra MapReduce
Looks like the end of June. On Fri, Jun 18, 2010 at 8:38 PM, Corey Hulen wrote: > Awesome...thanks. > I just downloaded the patch and applied it and verified it fixes our > problems. > what's the ETA on 0.6.3? (debating on weather to tolerate it or maintain > our own 0.6.2+patch). > -Corey > > On Fri, Jun 18, 2010 at 8:21 PM, Jonathan Ellis wrote: >> >> Fixed for 0.6.3: https://issues.apache.org/jira/browse/CASSANDRA-1042 >> >> On Fri, Jun 18, 2010 at 2:49 PM, Corey Hulen wrote: >> > >> > We are using MapReduce to periodical verify and rebuild our secondary >> > indexes along with counting total records. We started to noticed double >> > counting of unique keys on single machine standalone tests. We were >> > finally >> > able to reproduce the problem using >> > the apache-cassandra-0.6.2-src/contrib/word_count example and just >> > re-running it multiple times. We are hoping someone can verify the bug. >> > re-run the tests and the word count for /tmp/word_count3/part-r-0 >> > will >> > be 1000 +~200 and will change if you blow the data away and re-run. >> > Notice >> > the setup script loops and only inserts 1000 records so we expect count >> > to >> > be 1000. Once the data is generated then re-running the setup script >> > and/or >> > mapreduce doesn't change the number (still off). The key is to blow all >> > the >> > data away and start over which will cause it to change. >> > Can someone please verify this behavior? >> > -Corey >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of Riptano, the source for professional Cassandra support >> http://riptano.com > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: Lucandra issues
Hi Maxim, Lucandra doesn't support numeric queries quite yet. A workaround would be to load your numbers and convert them to strings. I'll eventually add support for this. Please feel free to help out if you can :) Jake

On Jun 17, 2010, at 1:16 PM, Maxim Kramarenko wrote:

Hello! I am trying to rework our current lucene-based application to lucandra. Note the following problem: when I try to use NumericRangeQuery like this one:

    query.add(NumericRangeQuery.newLongRange("deliveryTimestampMinute", 6, fromDate, toDate, true, true), BooleanClause.Occur.MUST);

I got the following exception:

    java.lang.NullPointerException
    org.apache.lucene.search.NumericRangeQuery$NumericRangeTermEnum.next(NumericRangeQuery.java:536)
    org.apache.lucene.search.MultiTermQuery$ConstantScoreAutoRewrite.rewrite(MultiTermQuery.java:248)
    org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:371)
    org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:386)
    org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:267)
    org.apache.lucene.search.Query.weight(Query.java:100)
    org.apache.lucene.search.Searcher.createWeight(Searcher.java:147)
    org.apache.lucene.search.Searcher.search(Searcher.java:98)
    org.apache.lucene.search.Searcher.search(Searcher.java:108)

===

Any workaround for this issue?

-- Best regards, Maxim  mailto:maxi...@trackstudio.com
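The convert-to-strings workaround usually means encoding numbers so that lexicographic order matches numeric order, then querying with TermRangeQuery instead of NumericRangeQuery. A hedged sketch -- the field name is reused from the question, the padding width is an assumption:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class PaddedLongField {
        // Zero-pad to a fixed width so "0000000042" < "0000000100" sorts the
        // same way 42 < 100 does; Long.MAX_VALUE needs 19 decimal digits.
        static String encode(long value) {
            if (value < 0) throw new IllegalArgumentException("handle negatives separately");
            return String.format("%019d", value);
        }

        static void addTimestamp(Document doc, long minute) {
            doc.add(new Field("deliveryTimestampMinute", encode(minute),
                    Field.Store.NO, Field.Index.NOT_ANALYZED));
        }
    }

Range queries then run over the padded strings (e.g. a TermRangeQuery from encode(fromDate) to encode(toDate)), avoiding the trie terms that NumericRangeQuery relies on.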