I wrote some thoughts about this on my blog. I think it's still mostly correct:
* http://www.ayogo.com/techblog/2010/04/sorting-in-cassandra/
On Fri, Oct 15, 2010 at 11:14 AM, Wicked J wrote:
> Hi,
> I'm using TimeUUID/Sort by column name mechanism. The column value can
> contain text data (in
There are a lot of variables that go into a proper benchmark. The bottleneck
could be in many different places.
How many client threads are you using? What kind of network?
On Mon, Jul 26, 2010 at 8:29 AM, SSam wrote:
>
> From Cassandra Website:
>
>- *Elastic*
>
>Read and write throughp
On Mon, Jul 12, 2010 at 11:44 PM, Benjamin Black wrote:
> We use Cassandra (multidimensional metrics) *and* redis (counters and
> alerts) *and* MySQL (supporting Rails). Right tool for each job. The
> idea that it is a good thing to cram everything into a single database
> (and data model), beat
re still
>>>>> testing for bugs and might go live in couple of weeks. You can ask any
>>>>> specific questions about vbulletin and cassandra and i will answer to the
>>>>> best of my knowledge.
>>>>> I our case a combination of cassandra and r
you are confident you will have trouble scaling
traditional technologies, it might not make business sense.
Paul Prescod
As Paul said, you need to re-build your data in a Cassandra-friendly
manner. Reading SQL files does not seem like a very efficient way to do
that though. Most databases can output in much simpler formats, like
CSV. But then, why export at all? If the MySQL instance and the
Cassandra instance are both ad
security
complexities. Which keys will a particular browser client be allowed
to overwrite? What prevents an end-user from deleting your database
through AJAX calls?
I think you'd need some form of ACL and access token system. That's a
lot of complexity.
Paul Prescod
Follow the thread links to learn more about Avro, which will replace
Thrift in Cassandra.
Paul Prescod
ek or so ago:
* https://issues.apache.org/jira/browse/CASSANDRA-1072
Paul Prescod
I hope Cassandra is competitive with other solutions well before 50TB
of data. There is a middle ground where you might choose one or the
other. Just as there are areas where you might choose Postgres or
Cassandra.
They claim it will scale all the way up. Right now the likely
dealbreaker will be i
ssue to think that an enterprise IT department would prefer one or
the other on the basis of it. Neither has foreign keys or
transactions. Both shift work from the datastore to the application.
If that's not what you want, neither is a good choice.
Paul Prescod
I'm curious what the relevance of CASSANDRA-1016 is.
On Thu, May 13, 2010 at 2:24 PM, Tobias Jungen wrote:
> I don't think this is currently possible. There is some work underway to add
> it in the future, however:
>
> https://issues.apache.org/jira/browse/CASSANDRA-721
> https://issues.apache.or
No, but there is ongoing work on it:
* https://issues.apache.org/jira/browse/CASSANDRA-580
* http://www.formspring.me/joestump/q/420668558
* http://permalink.gmane.org/gmane.comp.db.cassandra.user/3740
And in the meantime, an interim patch:
* https://issues.apache.org/jira/browse/CASSANDRA
is what you're talking about:
https://issues.apache.org/jira/browse/CASSANDRA-1016
Paul Prescod
degradation.
> So by your math, 100 nodes with each node getting 5k wps, I would assume the
> total capacity is 500k wps. But perhaps I've misunderstood some key
> concepts. Still a novice myself ;-)
If the replication factor is 2, then everything is written twice. So
your throughput is cut in half.
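To make the arithmetic concrete, here is a back-of-the-envelope sketch. It assumes the numbers from the quoted message (100 nodes at roughly 5k writes per second each) and ignores coordination overhead; the class and method names are made up for illustration.

```java
// Rough capacity model: every logical write is stored replicationFactor times,
// so the cluster's raw write capacity is divided by RF.
class WriteCapacity {
    static long effectiveWritesPerSecond(int nodes, long writesPerNode, int replicationFactor) {
        long rawCapacity = nodes * writesPerNode; // physical writes the cluster can absorb
        return rawCapacity / replicationFactor;   // logical writes, each stored RF times
    }

    public static void main(String[] args) {
        System.out.println(effectiveWritesPerSecond(100, 5_000, 1)); // 500000
        System.out.println(effectiveWritesPerSecond(100, 5_000, 2)); // 250000
    }
}
```

Under the same model, a replication factor of 3 would bring that cluster down to roughly 166k logical writes per second.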
Paul Prescod
hadoop
You can read criticisms of MapReduce in the first link there.
> On May 10, 2010, at 11:22 AM, Paul Prescod wrote:
>
> This is a very, very big topic. For the most part, the issues are
> covered in the various SQL versus NoSQL debates all over the Internet.
> For example:
Also:
* you should Google "eventual consistency" to learn about its
strengths and weaknesses.
On Mon, May 10, 2010 at 11:22 AM, Paul Prescod wrote:
> This is a very, very big topic. For the most part, the issues are
> covered in the various SQL versus NoSQL deba
This is a very, very big topic. For the most part, the issues are
covered in the various SQL versus NoSQL debates all over the Internet.
For example:
* Cassandra and its NoSQL siblings have no concept of an in-database "join"
* Cassandra and its NoSQL siblings do not allow you to update
multipl
Does the Cassandra performance start fast and slow down (indicating
some buffer being filled) or does it start slow and stay slow?
On Mon, May 10, 2010 at 2:05 AM, David Boxenhorn wrote:
> I read something like 80,000 rows from Oracle and write them to Cassandra in
> chunks of 1000 rows - so I'm
On Mon, Apr 26, 2010 at 2:15 PM, Anthony Molinaro
wrote:
> I think it might be worse case that you read all the disks. If your
> block size is large enough to hold an entire row, you should only have to
> read one disk to get that data.
And conversely, for a large enough row you might benefit fro
s will go to *all* hard drives.
RAID0 is designed specifically to improve performance (both latency
and bandwidth). I'm unclear about why you think it would decrease
performance. Perhaps you're thinking of another RAID type?
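A rough sketch of why a large read can touch every drive in a RAID0 array while a small one touches only one. The 64 KB stripe unit and four-disk array are hypothetical values, and real controllers may lay data out differently.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Which disks does a contiguous read touch in a RAID0 array?
// Stripe units are laid out round-robin: unit k lives on disk k % disks.
class Raid0Reads {
    static SortedSet<Integer> disksTouched(long offset, long length, long stripeUnit, int disks) {
        SortedSet<Integer> touched = new TreeSet<>();
        if (length <= 0) return touched;
        long firstUnit = offset / stripeUnit;
        long lastUnit = (offset + length - 1) / stripeUnit;
        // Stop early once every drive is already involved.
        for (long unit = firstUnit; unit <= lastUnit && touched.size() < disks; unit++) {
            touched.add((int) (unit % disks));
        }
        return touched;
    }

    public static void main(String[] args) {
        long kb = 1024;
        System.out.println(disksTouched(0, 16 * kb, 64 * kb, 4));   // [0]
        System.out.println(disksTouched(0, 1024 * kb, 64 * kb, 4)); // [0, 1, 2, 3]
    }
}
```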
Paul Prescod
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/
http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html
On Sat, Apr 24, 2010 at 10:20 AM, dir dir wrote:
> In general what is the difference between Cassandra and HBase??
>
> Thanks.
>
http://wiki.apache.org/cassandra/Operations
===
A Cassandra cluster always divides up the key space into ranges
delimited by Tokens as described above, but additional replica
placement is customizable via IReplicaPlacementStrategy in the
configuration file. The standard strategies are
RackUnawa
ample of why you need vector clocks.
The description for CASSANDRA-580 is "Allow a ColumnFamily to be
versioned via vector clocks, instead of long timestamps. Purpose:
enable incr/decr; flexible conflict resolution."
https://issues.apache.org/jira/browse/CASSANDRA-580
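A minimal vector clock sketch, to show the kind of conflict detection a single long timestamp cannot express. The class and its methods are hypothetical and are not the CASSANDRA-580 patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A minimal vector clock: one counter per node that has modified the value.
class VectorClock {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    final Map<String, Long> counters = new HashMap<>();

    // Returns a new clock recording one more local update made by `node`.
    VectorClock tick(String node) {
        VectorClock next = copy();
        next.counters.merge(node, 1L, Long::sum);
        return next;
    }

    VectorClock copy() {
        VectorClock c = new VectorClock();
        c.counters.putAll(counters);
        return c;
    }

    // CONCURRENT means neither update saw the other: the case that
    // "last timestamp wins" would resolve silently by dropping one write.
    Order compareTo(VectorClock other) {
        boolean less = false, greater = false;
        Set<String> nodes = new HashSet<>(counters.keySet());
        nodes.addAll(other.counters.keySet());
        for (String n : nodes) {
            long a = counters.getOrDefault(n, 0L);
            long b = other.counters.getOrDefault(n, 0L);
            if (a < b) less = true;
            if (a > b) greater = true;
        }
        if (less && greater) return Order.CONCURRENT;
        if (less) return Order.BEFORE;
        if (greater) return Order.AFTER;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        VectorClock base = new VectorClock().tick("A");
        System.out.println(base.tick("A").compareTo(base.tick("B"))); // CONCURRENT
    }
}
```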
Paul Prescod
I'm not an expert, so take what I say with a grain of salt.
2010/4/21 Даниел Симеонов :
> Hello,
> I am pretty new to Cassandra and I have some questions, they may seem
> trivial, but still I am pretty new to the subject. First is about the lack
> of a compareAndSet() operation, as I understood
you
> mean.
Do you have a pressing need to use Cassandra right now, before version
1.0 is even available?
That limitation will go away before 1.0, so you could simply wait and
not worry about it. Documentation will also be much more complete in
the future.
Paul Prescod
http://www.google.ca/search?hl=en&q=cassandra+terabyte
On Thu, Apr 15, 2010 at 11:28 PM, Linton N wrote:
> hi ,
> I am working for the past 1 year with hadoop, but quite new to
> cassandra, I would like to get clarified few things regarding the
> scalability of Cassandra. Can it scall up
There is a tutorial here:
* http://www.sodeso.nl/?p=80
This page includes data inserts:
* http://www.sodeso.nl/?p=251
Like:
// Wrap an "email" column (name, value, timestamp) and add it to the batch
c.setColumn(new Column("email".getBytes("utf-8"),
        "ronald (at) sodeso.nl".getBytes("utf-8"), timestamp));
columns.add(c);
The sample code is attached to that blog post.
a single query (perhaps entered
interactively) would replace the entire contents of the row cache, evicting
the data cached for the system's interactive users. For example, a summary
page of who was most active over the last month could replace the profile
information for the users who are actually using the system at that
moment.
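A small illustration of that scenario, assuming the row cache evicts in least-recently-used order (an assumption, not a statement about Cassandra's exact policy). The keys and the cache size are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache built on LinkedHashMap's access-order mode.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs least- to most-recently used
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used row once over capacity
    }

    public static void main(String[] args) {
        LruCache<String, String> rowCache = new LruCache<>(3);
        // Hot rows for the users who are online right now
        rowCache.put("user:alice", "profile");
        rowCache.put("user:bob", "profile");
        rowCache.put("user:carol", "profile");
        // One ad-hoc "most active last month" query touches three other rows...
        for (String key : new String[] {"user:x", "user:y", "user:z"}) {
            rowCache.put(key, "profile");
        }
        // ...and every hot entry has been evicted.
        System.out.println(rowCache.keySet()); // [user:x, user:y, user:z]
    }
}
```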
Paul Prescod
If you want to use Cassandra, you should probably store each
historical value as a new column in the row.
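A sketch of that idea, with an in-memory sorted map standing in for a row whose column names are timestamps. HistoryRow and its methods are hypothetical and are not a Cassandra client API.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for one row whose column names are timestamps.
// Each update adds a column instead of overwriting a single "value" column,
// so the full history is kept and the newest entry is the last column.
class HistoryRow {
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    void record(long timestamp, String value) {
        columns.put(timestamp, value);
    }

    String latest() {
        Map.Entry<Long, String> last = columns.lastEntry();
        return last == null ? null : last.getValue();
    }

    // Values recorded in [from, to), oldest first: analogous to a column slice.
    NavigableMap<Long, String> between(long from, long to) {
        return columns.subMap(from, true, to, false);
    }

    public static void main(String[] args) {
        HistoryRow price = new HistoryRow();
        price.record(1000L, "9.99");
        price.record(2000L, "10.49");
        price.record(3000L, "10.99");
        System.out.println(price.latest());              // 10.99
        System.out.println(price.between(1000L, 3000L)); // {1000=9.99, 2000=10.49}
    }
}
```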
On Wed, Apr 14, 2010 at 12:34 AM, Yésica Rey wrote:
> I am new to using cassandra. In the documentation I have read, understand,
> that as in other non-documentary databases, to update the va
ache is to avoid
looking through a bunch of SSTables' Bloom filters? (How big do the
Bloom filters grow? Too big to be cached themselves?)
I'd like to document the detail.
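For a sense of scale, the textbook Bloom filter sizing formula is sketched below. Whether Cassandra sizes its filters this way is an assumption, and the 100 million keys in the example is a made-up figure.

```java
// Textbook Bloom filter sizing: for n keys and a target false-positive rate p,
// the optimal size is m = -n * ln(p) / (ln 2)^2 bits, with k = (m / n) * ln 2 hashes.
class BloomSizing {
    static long bitsNeeded(long keys, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-keys * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    static int hashFunctions(long keys, long bits) {
        return Math.max(1, (int) Math.round((double) bits / keys * Math.log(2)));
    }

    public static void main(String[] args) {
        long keys = 100_000_000L; // hypothetical: 100M row keys in one SSTable
        long bits = bitsNeeded(keys, 0.01);
        System.out.printf("%d bits (about %d MB), %d hashes%n",
                bits, bits / 8 / (1024 * 1024), hashFunctions(keys, bits));
    }
}
```

At a 1% false-positive rate this works out to roughly 9.6 bits per key, so filter size grows linearly with the number of keys in each SSTable.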
Paul Prescod
On Tue, Apr 13, 2010 at 5:26 PM, Rob Coli wrote:
> On 4/13/10 5:04 PM, Paul Prescod wrote:
>>
>> Am I correct in my understanding that the unit of caching (and
>> fetching from disk?) is a full row?
>
> Cassandra has both a Key and a Row cache. Unfortunately there appe
(columnFamily_)), Integer.MIN_VALUE);
Paul Prescod
I notice that the documentation on the read path is quite compressed
on this page:
* http://wiki.apache.org/cassandra/ArchitectureOverview
What is the best documentation of the read path? I'm also curious
about the granularity and policies around caching.
Paul Prescod
>
>
> Why does RF enter this?
Take a simplistic model of a consistent read that asks all replicas
what their value is for the key. If the key is in the fourth SSTable
on every node, won't they all have to do 12 IOPs to find it?
Paul Prescod
On Tue, Apr 13, 2010 at 11:52 AM, Scott White wrote:
>
>...
>
> Agreed.
Kind of sorry to see Scott White and Benjamin Black being in
agreement, but I guess that's the way yin and yang works. Opposition
is illusory in any case.
Paul Prescod
ce"? The document above implies that it
is nearly impossible. It implies that you will have between 1 and 4
SSTables. Does the administrator have a choice in this matter?
I am probably being totally naive, but is the answer to the question
"worst IOPs on read" just:
3 reads per SSTable * 4 SSTables * ReplicationFactor
= 3 * 4 * 3 = 36?
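The naive arithmetic above, written out as code. It deliberately ignores Bloom filters, caches, and index sampling, all of which would reduce the real number; the method names are illustrative.

```java
// Naive worst-case read cost: a fixed number of disk reads per SSTable,
// every SSTable consulted, and every replica asked (the simplistic model).
class WorstCaseReadIops {
    static int perNode(int readsPerSSTable, int sstables) {
        return readsPerSSTable * sstables;
    }

    static int clusterWide(int readsPerSSTable, int sstables, int replicationFactor) {
        return perNode(readsPerSSTable, sstables) * replicationFactor;
    }

    public static void main(String[] args) {
        System.out.println(perNode(3, 4));        // 12 IOPs on each replica
        System.out.println(clusterWide(3, 4, 3)); // 36 IOPs in total
    }
}
```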
Paul Prescod
3 / 131k) * 3 = 150M / 131k = 11,450.
This line isn't internally consistent. Where did 150M come from? 500 M
* 9 = 4.5 Billion.
My calculation for the whole thing is 3433.
I am not claiming to be a Cassandra expert and therefore cannot vouch
for the model at all.
Paul Prescod
contrib/py_stress
Although that's still written in a scripting language, it at least
uses threading.
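A minimal multi-threaded load generator, for comparison with a single-threaded PHP loop. insertRow() is a hypothetical stub that only counts; it would have to be replaced with a real client call, and each thread would normally need its own connection.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a multi-threaded insert benchmark with a placeholder write.
class ThreadedInserter {
    static final AtomicLong inserted = new AtomicLong();

    static void insertRow(long key) {
        inserted.incrementAndGet(); // hypothetical: replace with a real client write
    }

    static long run(long rows, int threads) {
        inserted.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            // Each worker handles every threads-th key so the work is split evenly.
            pool.execute(() -> {
                for (long key = offset; key < rows; key += threads) {
                    insertRow(key);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d rows with %d threads in %.2f s%n", inserted.get(), threads, seconds);
        return inserted.get();
    }

    public static void main(String[] args) {
        run(100_000, 8);
    }
}
```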
Anyhow, what's your real goal? Inserting 100K or 1M rows in 30 seconds
from a single-threaded environment like PHP is pretty good. Do your
business goals require more?
Also: Is it 100K or 1M? In
allows you to implement your specific policies.
You might also want to investigate "Microsoft Sync Framework" and its
competitors.
Paul Prescod
How will they know whether the performance problem is caused by
Cassandra or Pandra if you do not have raw Cassandra performance
numbers for your setup?
On Mon, Apr 12, 2010 at 5:51 AM, vineet daniel wrote:
> I dont think it would be a good idea not to use pandra for benchmarks as we
> are going
ingle value (in a way which avoids race
> conditions, of course).
How do you avoid the race condition? Don't you need a lock?
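The race in question, sketched deterministically. An AtomicLong stands in for the stored value because it offers compare-and-set; Cassandra itself lacked such an operation at the time, which is why a lock (or something like vector clocks) would be needed.

```java
import java.util.concurrent.atomic.AtomicLong;

// Why "read, add one, write back" needs a lock or an atomic primitive.
class LostUpdate {
    // Two clients interleave a plain read-modify-write: both read, then both write.
    static long interleavedIncrements(AtomicLong stored) {
        long seenByA = stored.get();
        long seenByB = stored.get();
        stored.set(seenByA + 1);
        stored.set(seenByB + 1); // overwrites A's write: one increment is lost
        return stored.get();
    }

    // A compare-and-set loop retries whenever someone else wrote in between.
    static long casIncrement(AtomicLong stored) {
        while (true) {
            long current = stored.get();
            if (stored.compareAndSet(current, current + 1)) {
                return current + 1;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(interleavedIncrements(new AtomicLong(5))); // 6, not 7
        AtomicLong counter = new AtomicLong(5);
        casIncrement(counter);
        casIncrement(counter);
        System.out.println(counter.get()); // 7
    }
}
```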
Paul Prescod
Ayogo, Inc.
incremental numeric id as key and
>> >> >> > keeping
>> >> >> > the
>> >> >> > name
>> >> >> > and value same in the column family.
>> >> >> >
>> >> >> > Example :
>> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black wrote:
>> >> >>
>> >> >> You would have a Column Family, not a column for that; let's call it
>> >> >> the Users CF. You'd use username as the row key and have a column
This tutorial may help.
http://www.sodeso.nl/?p=251
Cassandra is very early software...not even version 1.0 yet. You'll
need to figure out a lot yourself by reading blog posts, examples,
comparing to API documentation, etc. Cassandra is an entirely
different model in almost every way, and not ent
p; lat < 80.0M) { f = 69.32M; }
>> else if (lat >= 80.0M) { f = 69.38M; }
>>
>> return f;
>> }
>>
>>
>> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude);
>> Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(Math.Cos((Double)
to research yourself starting here:
* http://en.wikipedia.org/wiki/MapReduce
* http://hadoop.apache.org/
* http://wiki.apache.org/cassandra/HadoopSupport
I don't think it is all documented in any one place yet...
Paul Prescod
DType.getUUID(o2).timestamp();
return t1 < t2 ? -1 : (t1 > t2 ? 1 :
FBUtilities.compareByteArrays(o1, o2));
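The ordering in that fragment (compare the timestamps embedded in time-based UUIDs, fall back to raw bytes on a tie) can be sketched with java.util.UUID. This is an illustration, not Cassandra's TimeUUIDType; fromTimestamp() is a hypothetical helper with fixed clock-sequence and node bits.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.UUID;

// Order version-1 UUIDs by embedded timestamp, then by unsigned bytes on a tie.
class TimeUuidOrder {
    static byte[] bytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff; // treat bytes as unsigned
            if (x != y) return x < y ? -1 : 1;
        }
        return Integer.compare(a.length, b.length);
    }

    static final Comparator<UUID> BY_TIME = (u1, u2) -> {
        long t1 = u1.timestamp(), t2 = u2.timestamp(); // throws unless version 1
        return t1 < t2 ? -1 : (t1 > t2 ? 1 : compareBytes(bytes(u1), bytes(u2)));
    };

    // Builds a version-1 UUID from a 60-bit timestamp.
    static UUID fromTimestamp(long ts, long lsb) {
        long msb = ((ts & 0xFFFFFFFFL) << 32)       // time_low
                | (((ts >>> 32) & 0xFFFFL) << 16)   // time_mid
                | 0x1000L                           // version 1
                | ((ts >>> 48) & 0x0FFFL);          // time_hi
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID a = fromTimestamp(0xFFFFFFFFL, 0x8000000000000000L);
        UUID b = fromTimestamp(1L << 32, 0x8000000000000000L);
        System.out.println(BY_TIME.compare(a, b)); // -1: a is older
    }
}
```

The example UUIDs show why the timestamp step matters: the low 32 bits of the timestamp sit at the front of a version-1 UUID, so raw byte order and time order can disagree.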
I'll add a bit to the document to clarify.
> Otherwise, great reading so far. Very helpful and wish I found this earlier.
Glad to help!
Paul Prescod
igure out why it is bailing.
Yes, I had the same problem. I didn't dig into it, but perhaps all
users have this problem now.
Paul Prescod
C mode for use on
fast LANs.
Paul Prescod
¹ http://jsensarma.com/blog/2009/11/dynamo-part-i-a-followup-and-re-rebuttals/
sn't it RAID-0 that's for pure speed?
Paul Prescod
"With OrderPreservingPartitioner the keys themselves are used to place
on the ring. One of the potential drawbacks of this approach is that
if rows are inserted with sequential keys, all the write load will go
to the same node."
http://wiki.apache.org/cassandra/StorageConfiguration
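A toy ring makes that drawback easy to see. The node tokens are invented, and the MD5 token is a simplification in the spirit of RandomPartitioner rather than its exact code.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Toy token ring: a node owns every token up to and including its own,
// wrapping around at the end of the ring.
class RingPlacement {
    static <T> String owner(TreeMap<T, String> ring, T token) {
        Map.Entry<T, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    static BigInteger md5Token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    static String sequentialKey(int i) {
        return String.format("user%06d", i);
    }

    // Order-preserving: the key itself is the token.
    static Map<String, Integer> orderedCounts(int keys) {
        TreeMap<String, String> ring = new TreeMap<>();
        ring.put("d", "node1");
        ring.put("m", "node2");
        ring.put("t", "node3");
        ring.put("z", "node4");
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < keys; i++) {
            counts.merge(owner(ring, sequentialKey(i)), 1, Integer::sum);
        }
        return counts;
    }

    // Hashed: the MD5 of the key is the token; four evenly spaced node tokens.
    static Map<String, Integer> hashedCounts(int keys) {
        BigInteger step = BigInteger.ONE.shiftLeft(128).divide(BigInteger.valueOf(4));
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        for (int n = 1; n <= 4; n++) {
            ring.put(step.multiply(BigInteger.valueOf(n)), "node" + n);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < keys; i++) {
            counts.merge(owner(ring, md5Token(sequentialKey(i))), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("order-preserving: " + orderedCounts(1000)); // {node4=1000}
        System.out.println("hashed:           " + hashedCounts(1000));
    }
}
```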
Wouldn't the "
queries.
>
>
> b
>
> On Wed, Apr 7, 2010 at 3:51 AM, Paul Prescod wrote:
>> I have one append-oriented workload and I would like to know if
>> Cassandra is appropriate for it.
>>
>> Given:
>>
>> * 100 nodes
>>
>> * an OrderPreserv
de from pulling mutations out of
MessagingService as fast as it can only to take up space in the
mutation queue and eventually fill up memory."
Or is it "MessagingService" itself which is OOMing?
On Wed, Apr 7, 2010 at 9:06 AM, Jonathan Ellis wrote:
> Great!
>
> On Wed, Apr 7
ers would only use this setting if they (think they)
> know what they are doing. :-)
I added this note to the API docs:
* ConsistencyLevel.ZERO: Ensure nothing. A write happens
asynchronously in background. If too many of these queue up, buffers
will explode and bad things will happen.
Apologies if I violated any community conventions. I'm happy to fix
the text if someone has a better suggestion.
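A generic way to keep fire-and-forget writes from exhausting memory is a bounded queue that refuses work when full. This sketches that technique only; it is not how Cassandra's MessagingService is implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Bounded buffer for outgoing writes: when full, submit() refuses the write
// instead of letting the backlog grow until memory runs out.
class BoundedMutationQueue {
    private final BlockingQueue<String> pending;
    private final AtomicLong rejected = new AtomicLong();

    BoundedMutationQueue(int capacity) {
        pending = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking: returns false (and counts the rejection) when the queue is full.
    boolean submit(String mutation) {
        boolean accepted = pending.offer(mutation);
        if (!accepted) rejected.incrementAndGet();
        return accepted;
    }

    // Consumer side: takes the next write, or null if none is pending.
    String next() {
        return pending.poll();
    }

    long rejectedCount() { return rejected.get(); }

    int queued() { return pending.size(); }

    public static void main(String[] args) {
        BoundedMutationQueue q = new BoundedMutationQueue(1_000);
        // Nothing drains the queue here, mimicking a consumer that cannot keep up.
        for (int i = 0; i < 5_000; i++) {
            q.submit("mutation-" + i);
        }
        System.out.println(q.queued() + " queued, " + q.rejectedCount() + " rejected"); // 1000 queued, 4000 rejected
    }
}
```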
Paul Prescod
I'm working on a blog post that combines all of the information and
ideas I can find relative to managing sorted lists in Cassandra.
http://jottit.com/s8c4a/#
Not only do I greatly appreciate comments, I actually don't think I
can publish it without some feedback because there are some embedded
q
share the load more fairly?
Paul Prescod
I *believe* that the key messages of those blog posts were:
1. Using distributed vector clocks is easy once they are implemented.
2. Implementing distributed vector clocks is hard on the datastore vendor.
3. If you have long-term network partitions you're kind of screwed (which
is probably tr
suppose
for the beginning of the discussion that some sort of interface will be
implemented to allow pluggable logic to be added to the server, personalized
scripts were an idea, I have heard. "
Kelvin Kakugawa replies that they'll just use Java class libraries as a first
pass.
Paul Prescod
olver will do the summation for you
properly.
If I'm wrong, I'd love to hear more, though.
Paul Prescod
ether a future cassandra "eventually
consistent" increment/decrement feature based on vector clocks would
have semantics that are incompatible with most deployed uses of
memcached increment/decrement.
Paul Prescod
atomic increment/decrement. I'm familiar with
atomic add as a sort of locking mechanism.
Paul Prescod
ed by a function called "write_byte" (which is
implemented in Ruby!). I would be happy to hear that I'm Doing
Something Wrong, but I think it's just a consequence of the thrift
protocol and the client implementation.
I have no idea whether Avro is better. I'm not sure if it works well
enough to be tested yet...
Paul Prescod
supported with
most cache stores.
* http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#M001029
I checked a few of my own apps. They use get/set/add/delete, but the
add is almost always used as an optimization.
Paul Prescod
ion you've deployed? I have always
imagined it as being primarily for simple counters.
Paul Prescod
On Mon, Apr 5, 2010 at 12:01 AM, David Strauss wrote:
> On 2010-04-05 03:42, Paul Prescod wrote:
>...
>
> There is a difference between Cassandra allowing inc/dec on values and
> actually *knowing* the resultant value at the time of the write. It's
> likely that inc/dec sup
articular client sees intermediate values, nor that they see
unique values.
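One way to get eventually consistent increments without a lock is for each replica to count its own increments and merge by per-replica maximum (a grow-only counter). The sketch shows two concurrent clients observing the same value; it illustrates the semantics and is not the CASSANDRA-580 design.

```java
import java.util.HashMap;
import java.util.Map;

// Grow-only counter: each replica counts its own increments, and replicas
// merge by taking the maximum count seen for each replica.
class ReplicaCounter {
    private final String replicaId;
    private final Map<String, Long> perReplica = new HashMap<>();

    ReplicaCounter(String replicaId) { this.replicaId = replicaId; }

    // Returns this replica's current view, which other clients may also see.
    long increment() {
        perReplica.merge(replicaId, 1L, Long::sum);
        return value();
    }

    long value() {
        return perReplica.values().stream().mapToLong(Long::longValue).sum();
    }

    // Anti-entropy / read repair: adopt the larger count seen for each replica.
    void mergeFrom(ReplicaCounter other) {
        other.perReplica.forEach((id, count) -> perReplica.merge(id, count, Math::max));
    }

    public static void main(String[] args) {
        ReplicaCounter r1 = new ReplicaCounter("r1");
        ReplicaCounter r2 = new ReplicaCounter("r2");
        long a = r1.increment(); // client A, routed to r1
        long b = r2.increment(); // client B, concurrently routed to r2
        System.out.println(a + " " + b); // 1 1 (not unique, unlike memcached incr)
        r1.mergeFrom(r2);
        r2.mergeFrom(r1);
        System.out.println(r1.value() + " " + r2.value()); // 2 2
    }
}
```

Decrements could be supported by keeping a second map of per-replica decrements and subtracting its sum.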
Paul Prescod
On Sun, Apr 4, 2010 at 5:06 PM, Benjamin Black wrote:
> ...
>
> Are you suggesting this would give you counter semantics?
Yes: My understanding of CASSANDRA-580 is that it gives you increment
and decrement which are the basis of counters.
Paul Prescod
They could either continue on that basis or
retry with exponential back-off.
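A minimal exponential back-off loop for the retry case. The delays, the cap, and the operation are hypothetical; real clients usually add random jitter so that competing clients do not retry in lockstep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Retry with exponential back-off: wait baseMs, 2*baseMs, 4*baseMs, ... between
// attempts, never more than maxDelayMs (no jitter in this sketch).
class Backoff {
    static long delayMs(int attempt, long baseMs, long maxDelayMs) {
        long delay = baseMs << Math.min(attempt, 30); // cap the shift to avoid overflow
        return Math.min(delay, maxDelayMs);
    }

    // Calls `operation` until it returns true or maxAttempts calls have been made.
    // Each wait is appended to `waits`; returns whether the operation eventually succeeded.
    static boolean retry(BooleanSupplier operation, int maxAttempts, long baseMs,
                         long maxDelayMs, List<Long> waits) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (operation.getAsBoolean()) return true;
            if (attempt == maxAttempts - 1) break; // no point sleeping after the final try
            long d = delayMs(attempt, baseMs, maxDelayMs);
            waits.add(d);
            try {
                Thread.sleep(d);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        List<Long> waits = new ArrayList<>();
        // Hypothetical operation that fails twice (e.g. a conflicting write) and then succeeds.
        boolean ok = retry(() -> ++calls[0] >= 3, 5, 100, 5_000, waits);
        System.out.println(ok + " after " + calls[0] + " calls, waits " + waits); // true after 3 calls, waits [100, 200]
    }
}
```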
> ...
>
> that said, if people see a use case for this, I would do it.
I personally think that it would hit a nice 80/20 point, and once
vector clocks are implemented it might be easy to get to 99% memcached
compatibility.
Paul Prescod
once delivery though...
> In other words, Cassandra is quickly becoming the hammer to everyone's
> cluster nails. :)
>
> --Joe
>
> On Apr 4, 2010, at 12:47 PM, Paul Prescod wrote:
>
> Many Cassandra implementations seem to be memcached+X migrations, and some
> m
y, or they could define a convention for
splitting their keys based on special namespace characters like ":" or "_".
The user could say how to interpret keys without enough parts (i.e. whether
to treat the missing part as the keyspace or the columnfamily).
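A sketch of that key-splitting convention. The separator, the default keyspace and column family, and the KeyMapper class are all hypothetical configuration choices, not an existing API.

```java
import java.util.regex.Pattern;

// Hypothetical mapping from a memcached-style key such as "Users:profile:alice"
// to a (keyspace, column family, row key) triple.
class KeyMapper {
    static final class Target {
        final String keyspace, columnFamily, rowKey;

        Target(String keyspace, String columnFamily, String rowKey) {
            this.keyspace = keyspace;
            this.columnFamily = columnFamily;
            this.rowKey = rowKey;
        }

        @Override
        public String toString() {
            return keyspace + "/" + columnFamily + "/" + rowKey;
        }
    }

    private final String separator;               // e.g. ":" or "_"
    private final String defaultKeyspace;
    private final String defaultColumnFamily;
    private final boolean missingPartIsKeyspace;  // how to read two-part keys

    KeyMapper(String separator, String defaultKeyspace, String defaultColumnFamily,
              boolean missingPartIsKeyspace) {
        this.separator = separator;
        this.defaultKeyspace = defaultKeyspace;
        this.defaultColumnFamily = defaultColumnFamily;
        this.missingPartIsKeyspace = missingPartIsKeyspace;
    }

    Target map(String key) {
        // Limit 3: any further separators stay inside the row key.
        String[] parts = key.split(Pattern.quote(separator), 3);
        switch (parts.length) {
            case 3:
                return new Target(parts[0], parts[1], parts[2]);
            case 2:
                return missingPartIsKeyspace
                        ? new Target(defaultKeyspace, parts[0], parts[1])
                        : new Target(parts[0], defaultColumnFamily, parts[1]);
            default:
                return new Target(defaultKeyspace, defaultColumnFamily, key);
        }
    }

    public static void main(String[] args) {
        KeyMapper m = new KeyMapper(":", "Cache", "Default", true);
        System.out.println(m.map("Users:profile:alice")); // Users/profile/alice
        System.out.println(m.map("profile:alice"));       // Cache/profile/alice
        System.out.println(m.map("alice"));               // Cache/Default/alice
    }
}
```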
Paul Prescod