Re: Commitlog Disk Full

2011-05-19 Thread Mike Malone
Just noticed this thread and figured I'd chime in since we've had similar issues with the commit log growing too large on our clusters. Tuning down the flush timeout wasn't really an acceptable solution for us since we didn't want to be constantly flushing and generating extra SSTables for no reaso

Re: b-tree

2011-07-22 Thread Mike Malone
On Fri, Jul 22, 2011 at 12:05 AM, Eldad Yamin wrote: > In order to split the nodes. > SimpleGeo have max 1,000 records (i.e. places) on each node in the tree, if > the number is >1,000 they split the node. > In order to avoid that more than 1 process will edit/split the node - > transaction i

Re: Write everywhere, read anywhere

2011-08-04 Thread Mike Malone
2011/8/3 Patricio Echagüe > > > On Wed, Aug 3, 2011 at 4:00 PM, Philippe wrote: > >> Hello, >> I have a 3-node, RF=3, cluster configured to write at CL.ALL and read at >> CL.ONE. When I take one of the nodes down, writes fail which is what I >> expect. >> When I run a repair, I see data being st

Re: Write everywhere, read anywhere

2011-08-04 Thread Mike Malone
On Thu, Aug 4, 2011 at 10:25 AM, Jeremiah Jordan < jeremiah.jor...@morningstar.com> wrote: > If you have RF=3 quorum won’t fail with one node down. So R/W quorum > will be consistent in the case of one node down. If two nodes go down at > the same time, then you can get inconsistent data from q
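The arithmetic behind this exchange (RF=3, writes at CL.ALL with reads at CL.ONE, versus quorum reads and writes) can be sketched as follows. This is a minimal illustration, not Cassandra code: the function names are made up for the example, and it only models replica counting, ignoring hinted handoff, read repair, and timing.

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a quorum for replication factor rf."""
    return rf // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must overlap every write set (W + R > RF)."""
    return write_replicas + read_replicas > rf


def request_can_succeed(rf: int, required_replicas: int, nodes_down: int) -> bool:
    """True when enough replicas are still up to satisfy the consistency level."""
    return rf - nodes_down >= required_replicas


RF = 3

# Write at ALL (3 replicas), read at ONE (1 replica): consistent, but a single
# node failure makes writes fail, as described in the thread.
print(read_sees_latest_write(RF, RF, 1))       # overlap is guaranteed
print(request_can_succeed(RF, RF, 1))          # writes at ALL with one node down

# Quorum reads and writes: consistent, and one node down is tolerated.
print(read_sees_latest_write(RF, quorum(RF), quorum(RF)))
print(request_can_succeed(RF, quorum(RF), 1))
```

With RF=3 a quorum is 2 replicas, so quorum operations keep working with one node down and stop succeeding once two replicas are unavailable.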

Re: Geohash nearby query implementation in Cassandra.

2012-02-17 Thread Mike Malone
2012/2/17 Raúl Raja Martínez > Hello everyone, > > I'm working on an application that uses Cassandra and has a geolocation > component. > I was wondering besides the slides and video at > http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php that > simplegeo published regarding t

Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5

2011-01-13 Thread Mike Malone
Hey folks, We've discovered an issue on Ubuntu/Lenny with libc6 2.11.1-0ubuntu7.5 (it may also affect versions between 2.11.1-0ubuntu7.1 and 2.11.1-0ubuntu7.4). The bug affects systems when a large number of threads (or processes) are created rapidly. Once triggered, the system will become complet

Re: Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5

2011-01-13 Thread Mike Malone
g I'd > have a pretty good idea here, but such is life in the cloud. > > > > I also should say that I don't think any issues we had were at all related > specifically to Cassandra. We were running fine in the first AZ, no problems > other than needing to grow capacit

Re: cassandra row cache

2011-01-14 Thread Mike Malone
Digest reads could be being dropped..? On Thu, Jan 13, 2011 at 4:11 PM, Jonathan Ellis wrote: > On Thu, Jan 13, 2011 at 2:00 PM, Edward Capriolo > wrote: > > Is it possible that you are reading at READ.ONE and that READ.ONE > > only warms cache on 1 of your three nodes= 20. 2nd read warms anot

Re: Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5

2011-01-14 Thread Mike Malone
nehalem architecture made a lot of changes to >>> the way it manages TLBs for memory, largely as a virtualization optimization. >>> I doubt this is the case but assuming the guest isn't seeing a different >>> architecture, we did see this issue only on E5507 proc

Re: GeoIndexing in Cassandra, Open Sourced?

2011-01-21 Thread Mike Malone
A more recent preso I gave about the SimpleGeo architecture is up at http://strangeloop2010.com/system/talks/presentations/000/014/495/Malone-DimensionalDataDHT.pdf Mike On Fri, Jan 21, 2011 at 10:02 AM, Joseph Stein wrote: > I hear that a bunch of folks have GeoIndexing built on top of Cassand

Re: Do supercolumns have a purpose?

2011-02-03 Thread Mike Malone
On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote: > On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote: > >> The advantage would be to enable secondary indexes on supercolumn >> families. >> > > Then I suggest opening a ticket for adding secondary indexes to supercolumn > families and vo

Re: postgis > cassandra?

2011-02-07 Thread Mike Malone
It's not really the storage of spatial data that's tricky. We use geojson as a wire-line format at the higher levels of our system (e.g., the HTTP API). But the hard part is organizing the data for efficient retrieval and keeping those indices consistent with the data being indexed. Efficient multi

Re: Do supercolumns have a purpose?

2011-02-09 Thread Mike Malone
> >> My data model is full of supercolumns. I used them, even though I knew it >> didn't *have to*, "because they were there", which implied to me that I was >> supposed to use them for some good reason. Now I suspect that they will >> gradually become less

Re: Is SuperColumn necessary?

2010-05-05 Thread Mike Malone
>> > >> It might make sense to create a CompositeType subclass of AbstractType > for > >> the purpose of constructing and comparing these types of "composite" > column > >> names so that if you could more easily do that sort of thing rather than > >>
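To illustrate why comparing "composite" column names component by component differs from comparing concatenated strings, here is a small sketch. It uses plain Python tuples as a stand-in for a composite type; it is not Cassandra's AbstractType API, and the `:` separator and sample names are assumptions chosen for the example.

```python
SEPARATOR = ":"  # assumed separator for the string-concatenation approach


def concat_name(components):
    """Build a column name by joining the components into one string."""
    return SEPARATOR.join(components)


# Two composite column names, each with two components.
names = [("a", "z"), ("a!", "a")]

# Component-wise comparison: the first components decide, and "a" < "a!".
by_components = sorted(names)

# String comparison of the concatenated form: "a!:a" sorts before "a:z"
# because "!" sorts before the ":" separator.
by_concatenation = sorted(names, key=concat_name)

print(by_components)
print(by_concatenation)
```

The two orderings disagree, which is the kind of surprise a dedicated composite comparator avoids: the separator never takes part in the comparison, so component values cannot interfere with it.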

Re: pagination through slices with deleted keys

2010-05-06 Thread Mike Malone
Our solution at SimpleGeo has been to hack Cassandra to (optionally, at least) be sensible and drop Rows that don't have any Columns. The claim from the FAQ that "Cassandra would have to check if there are any other columns in the row" is inaccurate. The common case for us at least is that we're on
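For comparison, the client-side workaround that such a server-side change replaces can be sketched like this. It is a sketch under stated assumptions: `fetch_slice(start_key, count)` is a hypothetical callable standing in for a range-slice request that returns up to `count` `(key, columns)` pairs in key order starting at `start_key` inclusive, and rows whose columns were all deleted come back with an empty column collection.

```python
def iter_live_rows(fetch_slice, page_size=100):
    """Page through a key range, yielding only rows that still have columns."""
    if page_size < 2:
        # Each page after the first repeats the previous page's last key,
        # so a page size of 1 could never make progress.
        raise ValueError("page_size must be at least 2")
    start_key = ""
    first_page = True
    while True:
        batch = fetch_slice(start_key, page_size)
        # Drop the repeated start row on every page after the first.
        rows = batch if first_page else batch[1:]
        for key, columns in rows:
            if columns:  # skip rows that have no columns left
                yield key, columns
        if len(batch) < page_size:
            return  # a short page means the range is exhausted
        start_key = batch[-1][0]
        first_page = False
```

Note that a page can contain no live rows at all, so the caller cannot treat an empty page as the end of the data; only a short batch ends the iteration.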

Re: pagination through slices with deleted keys

2010-05-06 Thread Mike Malone
On Thu, May 6, 2010 at 3:27 PM, Ian Kallen wrote: > Cool, is this a patch you've applied on the server side? Are you running > 0.6.x? I'm wondering if this kind of thing can make it into future versions > of Cassandra. > Yea, server side. It's basically doing the same thing clients typically wan

Re: Is SuperColumn necessary?

2010-05-06 Thread Mike Malone
On Thu, May 6, 2010 at 5:38 PM, Vijay wrote: > I would rather be interested in Tree type structure where supercolumns have > supercolumns in it. you don't need to compare all the columns to find a > set of columns and will also reduce the bytes transferred for separator, at > least string conca

Re: pagination through slices with deleted keys

2010-05-07 Thread Mike Malone
On Fri, May 7, 2010 at 5:29 AM, Joost Ouwerkerk wrote: > +1. There is some disagreement on whether or not the API should > return empty columns or skip rows when no data is found. In all of > our use cases, we would prefer skipped rows. And based on how > frequently new cassandra users appear t

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
tuple tree: "Column family" replaced by top-level tuple, whose value >> is the set of keys, whose value is the set of supercolumns of the key, whose >> value is the set of columns for the supercolumn, etc. >> >> 4. Etc. >> >> On Thu, May 6, 2010 at

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
ing point. > If things are done properly, client libraries could expose simplified query interfaces without much effort. Most ORMs these days work by building a propositional directed acyclic graph that's serialized to SQL. This would work the same way, but it wouldn't be converted into a

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
more about this stuff sometime. > > > -Original Message- > From: "Mike Malone" > Sent: Monday, May 10, 2010 11:37am > To: user@cassandra.apache.org > Subject: Re: Is SuperColumn necessary? > > Maybe... but honestly, it doesn't affect the architecture or

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
t for the types of use cases for which people use SuperColumns. If there's a particular use case that you feel you can only implement with SuperColumns, please share! I honestly can't think of any. Mike > On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook wrote: > >> Agree

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
uperColumns, though, you need to look at the Cassandra source. Removing SuperColumns would make the code-base much cleaner and tighter, and would probably reduce SLOC by 20%. I think a replacement that assumed nested Columns (or Entries, or Thingies) would be much cleaner. That's what Stu is wo

Re: Is SuperColumn necessary?

2010-05-10 Thread Mike Malone
atenate multiple comment times >>> together as you suggested. >>> >>> requiring the user to concatenate data fields together is not only an extra >>> burden on the user but also a less clean design. there will be cases where the >>> list property of a profile

Re: How to write WHERE .. LIKE query ?

2010-05-10 Thread Mike Malone
On Mon, May 10, 2010 at 9:00 PM, Shuge Lee wrote: > Hi all: > > How to write WHERE ... LIKE query ? > For examples(described in Python): > > Schema: > > # columnfamily name > resources = [ ># key > 'foo': { > # columns and value > 'url': 'foo.com', > 'pushlier': 'f

Re: How to write WHERE .. LIKE query ?

2010-05-11 Thread Mike Malone
r specific columns in a row or rows (e.g., please give me the "first_name," "last_name" and "hashed_password" fields from my Users column family where the key equals "mmalone"). See the get_range_slices() method in the thrift service. Mike > > > >

Re: Is SuperColumn necessary?

2010-05-11 Thread Mike Malone
On Tue, May 11, 2010 at 7:46 AM, David Boxenhorn wrote: > I would like an API with a variable number of arguments. Using Java > varargs, something like > > value = keyspace.get("articles", "cars", "John Smith", "2010-05-01", > "comment-25"); > > or > > valueArray = keyspace.get("articles", predic

Re: How to write WHERE .. LIKE query ?

2010-05-11 Thread Mike Malone
e. There's been talk of adding coprocessors. It will probably happen one day. Unfortunately, that day is probably a ways off. Mike > > > On Tue, May 11, 2010 at 11:35 PM, Mike Malone wrote: > >> On Mon, May 10, 2010 at 11:36 PM, vd wrote: >> >>> Hi Mike >>

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Mike Malone
> > Yes, I know. And I might end up doing this in the end. I do though have > pretty hard upper limits of how many rows I will end up with for each key, > but anyways it might be a good idea nonetheless. Thanks for the advice on > that one. > > You set count to Integer.MAX. Did you try with say 3

Re: Cassandra Write Performance, CPU usage

2010-06-11 Thread Mike Malone
Jonathan, while I agree with you re: this being an unusual load for the system, it is interesting that he's found at least one use-case where Cassandra is CPU-bound, not IO-bound. I'd definitely be interested in learning what his critical path is and seeing if there's some low-hanging fruit that ma

Re: Cassandra and Thrift on the Server Side

2010-06-29 Thread Mike Malone
> > Still, to Clint's point, everyone knows how to make an HTTP request. If you > want a cassandra client running on, let's say, an iPhone for some reason, a > REST API is going to be a lot more straightforward to implement. There's no reason an HTTP service would have to live inside the Cassand

Re: Coke Products at Digg?

2010-07-07 Thread Mike Malone
On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans wrote: > > I heard a rumor that Digg was moving away from Coca-Cola products in all > of its vending machines and break rooms. Can anyone from Digg comment on > this? > > My near-term beverage consumption strategy is based largely on my > understanding o

Re: Coke Products at Digg?

2010-07-07 Thread Mike Malone
something other than the cola-blend that Angelo Mariani invented in 1863! Mike > On Wed, Jul 7, 2010 at 10:50 AM, Mike Malone wrote: > >> On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans wrote: >> >>> >>> I heard a rumor that Digg was moving away from Coca-Cola

Re: get_range_slices

2010-07-08 Thread Mike Malone
I think the answer to your question is no, you shouldn't. I'm feeling far too lazy to do even light research on the topic, but I remember there being a bug where replicas weren't consolidated and you'd get a result set that included data from each replica that was consulted for a query. That could

Re: Authentication

2010-07-13 Thread Mike Malone
Yep, as Ben said, we're not asking for anyone to write this for us. We've been playing with some ideas around encryption between EC2 data-centers/regions (intra-region is already secure enough for us -- it's all switches / dedicated lines) and the easiest solution seems to be to wrap the inter-Cass

Re: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-08-04 Thread Mike Malone
This may be your problem: https://issues.apache.org/jira/browse/CASSANDRA-1358 The message deserializer executor is being created with a core pool size of 1. Since it uses a queue with unbounded capacity new requests are always queued and the thread pool never grows. So the message deserializer be
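The behaviour described here follows from the submission rule that java.util.concurrent.ThreadPoolExecutor documents: start a thread while below the core size, otherwise try to queue, and only start extra threads when the queue refuses the task. The toy model below replays that rule; it is a simplified sketch, not the Java class or Cassandra's executor, and it assumes no task ever finishes so that every thread stays busy.

```python
from collections import deque


class PoolModel:
    """Toy model of the core-size / queue / max-size submission rule."""

    def __init__(self, core_size, max_size, queue_capacity=None):
        self.core_size = core_size
        self.max_size = max_size
        self.queue_capacity = queue_capacity  # None models an unbounded queue
        self.threads = 0
        self.queue = deque()

    def submit(self, task):
        # 1. Below the core size: always start a new thread.
        if self.threads < self.core_size:
            self.threads += 1
            return "new core thread"
        # 2. Otherwise prefer queueing; an unbounded queue never refuses.
        if self.queue_capacity is None or len(self.queue) < self.queue_capacity:
            self.queue.append(task)
            return "queued"
        # 3. Only a full queue lets the pool grow towards its maximum.
        if self.threads < self.max_size:
            self.threads += 1
            return "new extra thread"
        return "rejected"


unbounded = PoolModel(core_size=1, max_size=8)
for i in range(1000):
    unbounded.submit(i)
print(unbounded.threads, len(unbounded.queue))  # one thread, everything else queued
```

With a bounded queue the same model does grow to `max_size` once the queue fills, which is why the combination of a core size of 1 and an unbounded queue leaves a single thread doing all the work.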

Re: what causes MESSAGE-DESERIALIZER-POOL to spike

2010-08-04 Thread Mike Malone
e requests are coming in. > > On Wed, Aug 4, 2010 at 2:21 PM, Mike Malone wrote: > > This may be your > > problem: https://issues.apache.org/jira/browse/CASSANDRA-1358 > > The message deserializer executor is being created with a core pool size > of > > 1. Since

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-25 Thread Mike Malone
Hey Takayuki, I don't think you're going to find anyone willing to promise that Cassandra will fit your petabyte scale data analysis problem. That's a lot of data, and there's not a ton of operational experience at that scale within the community. And the people who do work on that sort of problem

Re: Ring management and load balance

2010-03-25 Thread Mike Malone
On Thu, Mar 25, 2010 at 9:56 AM, Jonathan Ellis wrote: > The advantage to doing it the way Cassandra does is that you can keep > keys sorted with OrderPreservingPartitioner for range scans. grabbing > one token of many from each node in the ring would prohibit that. > > So we rely on active load

Re: Ring management and load balance

2010-03-26 Thread Mike Malone
2010/3/26 Roland Hänel > Jonathan, > > I agree with your idea about a tool that could 'propose' good token choices > for optimal load-balancing. > > If I was going to write such a tool: do you think the thrift API provides > the necessary information? I think with the RandomPartitioner you cannot

Re: Range scan performance in 0.6.0 beta2

2010-03-29 Thread Mike Malone
On Mon, Mar 29, 2010 at 7:13 AM, Henrik Schröder wrote: > On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis wrote: > >> On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder >> wrote: >> > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis >> wrote: >> >> It's a unique index then? And you're trying to read

Re: Memcached protocol?

2010-04-05 Thread Mike Malone
> > Here are a couple of example projects for info. > > Django: > > http://docs.djangoproject.com/en/dev/topics/cache/ > > It says of "increment/decrement": "incr()/decr() methods are not > guaranteed to be atomic. On those backends that support atomic > increment/decrement (most notably, the memca

Re: Memcached protocol?

2010-04-05 Thread Mike Malone
> > That's useful information Mike. I am a bit curious about what the most > common use cases are for atomic increment/decrement. I'm familiar with > atomic add as a sort of locking mechanism. > They're useful for caching denormalized counts of things. Especially things that change rapidly. Instea
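As an illustration of the use case, caching denormalized counts that change rapidly, here is a minimal in-process sketch. The `CountCache` class and the key name are made up for the example; a lock stands in for the atomicity that a cache server's increment and decrement operations would provide, and this is not the memcached protocol or any client library's API.

```python
import threading


class CountCache:
    """In-memory stand-in for a cache with atomic incr/decr operations."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        # The read-modify-write happens under one lock, so concurrent
        # callers cannot overwrite each other's updates.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + delta
            return self._counts[key]

    def decr(self, key, delta=1):
        return self.incr(key, -delta)

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


cache = CountCache()


def record_votes(n):
    for _ in range(n):
        cache.incr("story:42:votes")  # hypothetical key for a cached count


workers = [threading.Thread(target=record_votes, args=(1000,)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(cache.get("story:42:votes"))
```

Because each update is applied atomically, the cached count stays in step with the number of events without recounting the underlying rows on every read.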

Re: Memcached protocol?

2010-04-05 Thread Mike Malone
On Mon, Apr 5, 2010 at 1:46 PM, Paul Prescod wrote: > On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone wrote: > >> That's useful information Mike. I am a bit curious about what the most > >> common use cases are for atomic increment/decrement. I'm familiar with >

Re: cascal - high level scala cassandra client (yes - another one)

2010-04-05 Thread Mike Malone
On Sat, Apr 3, 2010 at 12:12 PM, Matthew Chambers wrote: > Your git page looks great, I like your cassandra explanation and graphic. +1 on the docs - they're very nice. Off-topic, but what'd you use to create that graphic? Mike

Re: How do vector clocks and conflicts work?

2010-04-06 Thread Mike Malone
> > As long as the conflict resolver knows that two writers each tried to > increment, then it can increment twice. The conflict resolver must know > about the semantics of "increment" or "decrement" or "string append" or > "binary patch" or whatever other merge strategy you choose. You'll register
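A conflict resolver that knows the semantics of the operation can be sketched as below. This is an illustrative model only: the function names and the `RESOLVERS` table are hypothetical, each sibling is assumed to be the result of one writer applying its change to the same base value, and nothing here reflects how Cassandra or any vector-clock store implements resolution.

```python
def resolve_counter(base, siblings):
    """Merge concurrent counter writes by applying every writer's delta."""
    return base + sum(value - base for value in siblings)


def resolve_append(base, siblings):
    """Merge concurrent string appends by keeping every appended suffix."""
    return base + "".join(value[len(base):] for value in siblings)


# Hypothetical lookup of merge strategies by the kind of value being merged.
RESOLVERS = {
    "counter": resolve_counter,
    "append": resolve_append,
}

# Two writers both read 10 and each wrote 11. Keeping either sibling would
# lose an increment; the semantic merge applies both.
print(RESOLVERS["counter"](10, [11, 11]))

# Two writers each appended to "ab".
print(RESOLVERS["append"]("ab", ["abX", "abY"]))
```

The key design point is that the resolver needs the common ancestor as well as the conflicting values; given only the siblings 11 and 11 it could not tell one increment from two.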

Re: How do vector clocks and conflicts work?

2010-04-06 Thread Mike Malone
On Tue, Apr 6, 2010 at 11:03 AM, Tatu Saloranta wrote: > On Tue, Apr 6, 2010 at 8:45 AM, Mike Malone wrote: > >> As long as the conflict resolver knows that two writers each tried to > >> increment, then it can increment twice. The conflict resolver must know >

Re: Reading thousands of columns

2010-04-14 Thread Mike Malone
On Wed, Apr 14, 2010 at 7:45 AM, Jonathan Ellis wrote: > 35-50ms for how many rows of 1000 columns each? > > get_range_slices does not use the row cache, for the same reason that > oracle doesn't cache tuples from sequential scans -- blowing away > 1000s of rows worth of recently used rows querie

Re: timestamp not found

2010-04-15 Thread Mike Malone
Looks like the timestamp, in this case, is 0. Does Cassandra allow zero timestamps? Could be a bug in Cassandra doing an implicit boolean coercion in a conditional where it shouldn't. Mike On Thu, Apr 15, 2010 at 8:39 AM, Lee Parker wrote: > We are currently migrating about 70G of data from mys

Re: At what point does the cluster get faster than the individual nodes?

2010-04-22 Thread Mike Malone
On Wed, Apr 21, 2010 at 9:50 AM, Mark Greene wrote: > Right it's a similar concept to DB sharding where you spread the write load > around to different DB servers but won't necessarily increase the throughput > of any one DB server but rather collectively. Except with Cassandra, read-repair caus

Re: Is SuperColumn necessary?

2010-04-28 Thread Mike Malone
On Wed, Apr 28, 2010 at 5:24 AM, David Boxenhorn wrote: > If I understand correctly, the distinction between supercolumns and > subcolumns is critical to good database design if you want to use random > partitioning: you can do range queries on subcolumns but not on > supercolumns. > > Is this co