Just noticed this thread and figured I'd chime in since we've had similar
issues with the commit log growing too large on our clusters. Tuning down
the flush timeout wasn't really an acceptable solution for us since we
didn't want to be constantly flushing and generating extra SSTables for no
reaso
On Fri, Jul 22, 2011 at 12:05 AM, Eldad Yamin wrote:
> In order to split the nodes.
> SimpleGeo have max 1,000 records (i.e. places) on each node in the tree, if
> the number is >1,000 they split the node.
> In order to avoid that more than 1 process will edit/split the node -
> transaction i
2011/8/3 Patricio Echagüe
>
>
> On Wed, Aug 3, 2011 at 4:00 PM, Philippe wrote:
>
>> Hello,
>> I have a 3-node, RF=3, cluster configured to write at CL.ALL and read at
>> CL.ONE. When I take one of the nodes down, writes fail which is what I
>> expect.
>> When I run a repair, I see data being st
On Thu, Aug 4, 2011 at 10:25 AM, Jeremiah Jordan <
jeremiah.jor...@morningstar.com> wrote:
> If you have RF=3 quorum won’t fail with one node down. So R/W quorum
> will be consistent in the case of one node down. If two nodes go down at
> the same time, then you can get inconsistent data from q
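The RF=3 arithmetic in this exchange can be sketched in a few lines of Python. This is only an illustration of the usual quorum rule (a strict majority of replicas) and the overlap condition for reads and writes; the function names are made up for the example and are not part of any Cassandra client.

```python
def quorum(rf: int) -> int:
    """Replicas needed for a QUORUM operation: a strict majority of RF."""
    return rf // 2 + 1


def quorum_available(rf: int, nodes_down: int) -> bool:
    """True if enough replicas are still up to satisfy QUORUM."""
    return rf - nodes_down >= quorum(rf)


def replica_sets_overlap(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf


print(quorum(3))                      # 2
print(quorum_available(3, 1))         # True: one node down, quorum still reachable
print(quorum_available(3, 2))         # False: two nodes down, only one replica left
print(replica_sets_overlap(3, 1, 3))  # True: write at ALL, read at ONE
print(replica_sets_overlap(3, 2, 2))  # True: quorum reads and quorum writes
```

With RF=3 a write at CL.ALL needs all three replicas, which is why it fails as soon as one node is down, while QUORUM only needs two.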
2012/2/17 Raúl Raja Martínez
> Hello everyone,
>
> I'm working on an application that uses Cassandra and has a geolocation
> component.
> I was wondering beside the slides and video at
> http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php that
> simplegeo published regarding t
Hey folks,
We've discovered an issue on Ubuntu/Lenny with libc6 2.11.1-0ubuntu7.5 (it
may also affect versions between 2.11.1-0ubuntu7.1 and 2.11.1-0ubuntu7.4).
The bug affects systems when a large number of threads (or processes) are
created rapidly. Once triggered, the system will become complet
g I'd
> have a pretty good idea here, but such is life in the cloud.
>
>
>
> I also should say that I don't think any issues we had were at all related
> specifically to Cassandra. We were running fine in the first AZ, no problems
> other than needing to grow capacit
Digest reads could be being dropped..?
On Thu, Jan 13, 2011 at 4:11 PM, Jonathan Ellis wrote:
> On Thu, Jan 13, 2011 at 2:00 PM, Edward Capriolo
> wrote:
> > Is it possible that you are reading at READ.ONE and that READ.ONE
> > only warms cache on 1 of your three nodes. 2nd read warms anot
nehalem architecture made a lot of changes to
>>> the way it manages TLBs for memory, largely as a virtualization optimization.
>>> I doubt this is the case but assuming the guest isn't seeing a different
>>> architecture, we did see this issue only on E5507 proc
A more recent preso I gave about the SimpleGeo architecture is up at
http://strangeloop2010.com/system/talks/presentations/000/014/495/Malone-DimensionalDataDHT.pdf
Mike
On Fri, Jan 21, 2011 at 10:02 AM, Joseph Stein wrote:
> I hear that a bunch of folks have GeoIndexing built on top of Cassand
On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote:
> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote:
>
>> The advantage would be to enable secondary indexes on supercolumn
>> families.
>>
>
> Then I suggest opening a ticket for adding secondary indexes to supercolumn
> families and vo
It's not really the storage of spatial data that's tricky. We use geojson as
a wire-line format at the higher levels of our system (e.g., the HTTP
API). But the hard part is organizing the data for efficient retrieval and
keeping those indices consistent with the data being indexed. Efficient
multi
>
>> My data model is full of supercolumns. I used them, even though I knew it
>> didn't *have to*, "because they were there", which implied to me that I was
>> supposed to use them for some good reason. Now I suspect that they will
>> gradually become less
>>
> >> It might make sense to create a CompositeType subclass of AbstractType
> for
> >> the purpose of constructing and comparing these types of "composite"
> column
> >> names so that if you could more easily do that sort of thing rather than
> >>
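A rough illustration of why a component-wise comparator is easier to live with than comparing concatenated strings. Python tuples stand in for the proposed "composite" column names here; this is not Cassandra's CompositeType, only the comparison idea, and compare_composite is a hypothetical helper.

```python
from functools import cmp_to_key


def compare_composite(a, b):
    """Compare two composite names component by component."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared components are equal: the shorter name sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


# Concatenated names compare character by character, so numeric parts misorder.
concatenated = ["comment-3", "comment-25", "comment-100"]
print(sorted(concatenated))

# Composite names keep each component typed, so 3 < 25 < 100 as expected.
composite = [("comment", 25), ("comment", 100), ("comment", 3)]
print(sorted(composite, key=cmp_to_key(compare_composite)))
```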
Our solution at SimpleGeo has been to hack Cassandra to (optionally, at
least) be sensible and drop Rows that don't have any Columns. The claim from
the FAQ that "Cassandra would have to check if there are any other columns
in the row" is inaccurate. The common case for us at least is that we're
on
On Thu, May 6, 2010 at 3:27 PM, Ian Kallen wrote:
> Cool, is this a patch you've applied on the server side? Are you running
> 0.6.x? I'm wondering if this kind of thing can make it into future versions
> of Cassandra.
>
Yea, server side. It's basically doing the same thing clients typically wan
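For comparison, the client-side version of this filtering is usually a one-liner. A minimal sketch, assuming a range query result shaped as a list of (key, columns) pairs; that shape and the name drop_empty_rows are assumptions for illustration, not the Thrift API.

```python
def drop_empty_rows(rows):
    """Keep only rows that still have at least one column."""
    return [(key, columns) for key, columns in rows if columns]


rows = [
    ("alice", {"name": "Alice"}),
    ("bob", {}),                    # deleted row that still shows up in the range
    ("carol", {"name": "Carol"}),
]
print(drop_empty_rows(rows))
```

Doing this on the server avoids shipping the empty keys over the wire at all, which is the appeal of the patch described above.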
On Thu, May 6, 2010 at 5:38 PM, Vijay wrote:
> I would rather be interested in Tree type structure where supercolumns have
> supercolumns in it. You don't need to compare all the columns to find a
> set of columns and will also reduce the bytes transferred for separator, at
> least string conca
On Fri, May 7, 2010 at 5:29 AM, Joost Ouwerkerk wrote:
> +1. There is some disagreement on whether or not the API should
> return empty columns or skip rows when no data is found. In all of
> our use cases, we would prefer skipped rows. And based on how
> frequently new cassandra users appear t
tuple tree: "Column family" replaced by top-level tuple, whose value
>> is the set of keys, whose value is the set of supercolumns of the key, whose
>> value is the set of columns for the supercolumn, etc.
>>
>> 4. Etc.
>>
>> On Thu, May 6, 2010 at
ing point.
>
If things are done properly, client libraries could expose simplified query
interfaces without much effort. Most ORMs these days work by building a
propositional directed acyclic graph that's serialized to SQL. This would
work the same way, but it wouldn't be converted into a
more about this stuff sometime.
>
>
> -Original Message-
> From: "Mike Malone"
> Sent: Monday, May 10, 2010 11:37am
> To: user@cassandra.apache.org
> Subject: Re: Is SuperColumn necessary?
>
> Maybe... but honestly, it doesn't affect the architecture or
t for the types of use cases for which people use
SuperColumns.
If there's a particular use case that you feel you can only implement with
SuperColumns, please share! I honestly can't think of any.
Mike
> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook wrote:
>
>> Agree
uperColumns, though, you need to look
at the Cassandra source. Removing SuperColumns would make the code-base much
cleaner and tighter, and would probably reduce SLOC by 20%. I think a
replacement that assumed nested Columns (or Entries, or Thingies) would be
much cleaner. That's what Stu is wo
atenate multiple comment times
>>> together as you suggested.
>>>
>>> Requiring the user to concatenate data fields together is not only an extra
>>> burden on the user but also a less clean design. There will be cases where the
>>> list property of a profile
On Mon, May 10, 2010 at 9:00 PM, Shuge Lee wrote:
> Hi all:
>
> How to write WHERE ... LIKE query ?
> For examples(described in Python):
>
> Schema:
>
> # columnfamily name
> resources = [
> # key
> 'foo': {
> # columns and value
> 'url': 'foo.com',
> 'pushlier': 'f
r specific columns in a row or
rows (e.g., please give me the "first_name," "last_name" and
"hashed_password" fields from my Users column family where the key equals
"mmalone").
See the get_range_slices() method in the thrift service.
Mike
>
>
>
>
On Tue, May 11, 2010 at 7:46 AM, David Boxenhorn wrote:
> I would like an API with a variable number of arguments. Using Java
> varargs, something like
>
> value = keyspace.get("articles", "cars", "John Smith", "2010-05-01",
> "comment-25");
>
> or
>
> valueArray = keyspace.get("articles", predic
e.
There's been talk of adding coprocessors. It will probably happen one day.
Unfortunately, that day is probably a ways off.
Mike
>
>
> On Tue, May 11, 2010 at 11:35 PM, Mike Malone wrote:
>
>> On Mon, May 10, 2010 at 11:36 PM, vd wrote:
>>
>>> Hi Mike
>>
> > Yes, I know. And I might end up doing this in the end. I do though have
> pretty hard upper limits of how many rows I will end up with for each key,
> but anyways it might be a good idea none the less. Thanks for the advice on
> that one.
>
> You set count to Integer.MAX. Did you try with say 3
Jonathan, while I agree with you re: this being an unusual load for the
system, it is interesting that he's found at least one use-case where
Cassandra is CPU-bound, not IO-bound. I'd definitely be interested in
learning what his critical path is and seeing if there's some low-hanging
fruit that ma
>
> Still, to Clint's point, everyone knows how to make an HTTP request. If you
> want a cassandra client running on, let's say, an iPhone for some reason, a
> REST API is going to be a lot more straightforward to implement.
There's no reason an HTTP service would have to live inside the Cassand
On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans wrote:
>
> I heard a rumor that Digg was moving away from Coca-Cola products in all
> of its vending machines and break rooms. Can anyone from Digg comment on
> this?
>
> My near-term beverage consumption strategy is based largely on my
> understanding o
something other than the
cola-blend that Angelo Mariani invented in 1863!
Mike
> On Wed, Jul 7, 2010 at 10:50 AM, Mike Malone wrote:
>
>> On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans wrote:
>>
>>>
>>> I heard a rumor that Digg was moving away from Coca-Cola
I think the answer to your question is no, you shouldn't.
I'm feeling far too lazy to do even light research on the topic, but I
remember there being a bug where replicas weren't consolidated and you'd get
a result set that included data from each replica that was consulted for a
query. That could
Yep, as Ben said, we're not asking for anyone to write this for us.
We've been playing with some ideas around encryption between EC2
data-centers/regions (intra-region is already secure enough for us -- it's
all switches / dedicated lines) and the easiest solution seems to be to wrap
the inter-Cass
This may be your problem:
https://issues.apache.org/jira/browse/CASSANDRA-1358
The message deserializer executor is being created with a core pool size of
1. Since it uses a queue with unbounded capacity new requests are always
queued and the thread pool never grows. So the message deserializer be
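The behaviour described here follows from how java.util.concurrent.ThreadPoolExecutor chooses between starting a thread and queueing a task: below the core size it starts a thread, otherwise it queues, and only when the queue refuses the task does it grow toward the maximum size. The Python below is a simplified model of that decision, not Cassandra code. It assumes every thread stays busy forever, and PoolModel and its fields are hypothetical names.

```python
from collections import deque


class PoolModel:
    """Toy model of the ThreadPoolExecutor submit decision (no task ever finishes)."""

    def __init__(self, core_size, max_size, queue_capacity=None):
        self.core_size = core_size
        self.max_size = max_size
        self.queue_capacity = queue_capacity  # None models an unbounded queue
        self.threads = 0
        self.queue = deque()
        self.rejected = 0

    def submit(self, task):
        if self.threads < self.core_size:
            self.threads += 1            # below core size: start a thread
        elif self.queue_capacity is None or len(self.queue) < self.queue_capacity:
            self.queue.append(task)      # queue accepts: never grows past core
        elif self.threads < self.max_size:
            self.threads += 1            # queue full: grow toward max size
        else:
            self.rejected += 1           # queue full and pool at max: reject


unbounded = PoolModel(core_size=1, max_size=8)
bounded = PoolModel(core_size=1, max_size=8, queue_capacity=10)
for i in range(100):
    unbounded.submit(i)
    bounded.submit(i)

print(unbounded.threads, len(unbounded.queue))  # 1 99
print(bounded.threads, len(bounded.queue), bounded.rejected)  # 8 10 82
```

In the unbounded case the maximum pool size is never consulted, so a core size of 1 means one thread no matter how deep the queue gets.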
e requests are coming in.
>
> On Wed, Aug 4, 2010 at 2:21 PM, Mike Malone wrote:
> > This may be your
> > problem: https://issues.apache.org/jira/browse/CASSANDRA-1358
> > The message deserializer executor is being created with a core pool size
> of
> > 1. Since
Hey Takayuki,
I don't think you're going to find anyone willing to promise that Cassandra
will fit your petabyte scale data analysis problem. That's a lot of data,
and there's not a ton of operational experience at that scale within the
community. And the people who do work on that sort of problem
On Thu, Mar 25, 2010 at 9:56 AM, Jonathan Ellis wrote:
> The advantage to doing it the way Cassandra does is that you can keep
> keys sorted with OrderPreservingPartitioner for range scans. grabbing
> one token of many from each node in the ring would prohibit that.
>
> So we rely on active load
2010/3/26 Roland Hänel
> Jonathan,
>
> I agree with your idea about a tool that could 'propose' good token choices
> for optimal load-balancing.
>
> If I was going to write such a tool: do you think the thrift API provides
> the necessary information? I think with the RandomPartitioner you cannot
On Mon, Mar 29, 2010 at 7:13 AM, Henrik Schröder wrote:
> On Mon, Mar 29, 2010 at 14:15, Jonathan Ellis wrote:
>
>> On Mon, Mar 29, 2010 at 4:06 AM, Henrik Schröder
>> wrote:
>> > On Fri, Mar 26, 2010 at 14:47, Jonathan Ellis
>> wrote:
>> >> It's a unique index then? And you're trying to read
>
> Here are a couple of example projects for info.
>
> Django:
>
> http://docs.djangoproject.com/en/dev/topics/cache/
>
> It says of "increment/decrement": "incr()/decr() methods are not
> guaranteed to be atomic. On those backends that support atomic
> increment/decrement (most notably, the memca
>
> That's useful information Mike. I am a bit curious about what the most
> common use cases are for atomic increment/decrement. I'm familiar with
> atomic add as a sort of locking mechanism.
>
They're useful for caching denormalized counts of things. Especially things
that change rapidly. Instea
On Mon, Apr 5, 2010 at 1:46 PM, Paul Prescod wrote:
> On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone wrote:
> >> That's useful information Mike. I am a bit curious about what the most
> >> common use cases are for atomic increment/decrement. I'm familiar with
>
On Sat, Apr 3, 2010 at 12:12 PM, Matthew Chambers
wrote:
> Your git page looks great, I like your cassandra explanation and graphic.
+1 on the docs - they're very nice. Off-topic, but what'd you use to create
that graphic?
Mike
>
> As long as the conflict resolver knows that two writers each tried to
> increment, then it can increment twice. The conflict resolver must know
> about the semantics of "increment" or "decrement" or "string append" or
> "binary patch" or whatever other merge strategy you choose. You'll register
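A sketch of the idea in the quoted text: if the resolver is handed the operations rather than the resulting values, concurrent increments compose instead of overwriting each other. MERGE_STRATEGIES and resolve are hypothetical names; nothing like this registry is claimed to exist in Cassandra.

```python
# Each strategy knows the semantics of one kind of operation.
MERGE_STRATEGIES = {
    "increment": lambda value, amount: value + amount,
    "decrement": lambda value, amount: value - amount,
    "append": lambda value, suffix: value + suffix,  # result depends on order
}


def resolve(base, operations):
    """Apply every conflicting operation to the base value, one after another."""
    value = base
    for kind, argument in operations:
        value = MERGE_STRATEGIES[kind](value, argument)
    return value


# Two writers each read 10 and tried to increment by 1.
print(resolve(10, [("increment", 1), ("increment", 1)]))  # 12
# Picking one writer's final value (11) instead would lose an increment.
print(resolve(10, [("increment", 5), ("decrement", 2)]))  # 13
```

Increment and decrement commute, so the order in which the resolver sees them does not matter. Append does not commute, so a real resolver would also need some agreed ordering for it.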
On Tue, Apr 6, 2010 at 11:03 AM, Tatu Saloranta wrote:
> On Tue, Apr 6, 2010 at 8:45 AM, Mike Malone wrote:
> >> As long as the conflict resolver knows that two writers each tried to
> >> increment, then it can increment twice. The conflict resolver must know
> >>
On Wed, Apr 14, 2010 at 7:45 AM, Jonathan Ellis wrote:
> 35-50ms for how many rows of 1000 columns each?
>
> get_range_slices does not use the row cache, for the same reason that
> oracle doesn't cache tuples from sequential scans -- blowing away
> 1000s of rows worth of recently used rows querie
Looks like the timestamp, in this case, is 0. Does Cassandra allow zero
timestamps? Could be a bug in Cassandra doing an implicit boolean coercion
in a conditional where it shouldn't.
Mike
On Thu, Apr 15, 2010 at 8:39 AM, Lee Parker wrote:
> We are currently migrating about 70G of data from mys
On Wed, Apr 21, 2010 at 9:50 AM, Mark Greene wrote:
> Right it's a similar concept to DB sharding where you spread the write load
> around to different DB servers but won't necessarily increase the throughput
> of any one DB server but rather collectively.
Except with Cassandra, read-repair caus
On Wed, Apr 28, 2010 at 5:24 AM, David Boxenhorn wrote:
> If I understand correctly, the distinction between supercolumns and
> subcolumns is critical to good database design if you want to use random
> partitioning: you can do range queries on subcolumns but not on
> supercolumns.
>
> Is this co
51 matches