Re: Recommended sort mechanism and partitioner

2010-10-15 Thread Paul Prescod
I wrote some thoughts about this on my blog. I think it's still mostly correct: * http://www.ayogo.com/techblog/2010/04/sorting-in-cassandra/ On Fri, Oct 15, 2010 at 11:14 AM, Wicked J wrote: > Hi, > I'm using TimeUUID/Sort by column name mechanism. The column value can > contain text data (in

Re: Cassandra Horizontal Scalability

2010-07-26 Thread Paul Prescod
There are a lot of variables that go into a proper benchmark. The bottleneck could be in many different places. How many client threads are you using? What kind of network? On Mon, Jul 26, 2010 at 8:29 AM, SSam wrote: > > From Cassandra Website: > >- *Elastic* > >Read and write throughp

Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-13 Thread Paul Prescod
On Mon, Jul 12, 2010 at 11:44 PM, Benjamin Black wrote: > We use Cassandra (multidimensional metrics) *and* redis (counters and > alerts) *and* MySQL (supporting Rails).  Right tool for each job.  The > idea that it is a good thing to cram everything into a single database > (and data model), beat

Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-12 Thread Paul Prescod
re still >>>>> testing for bugs and might go live in couple of weeks. You can ask any >>>>> specific questions about vbulletin and cassandra and i will answer to the >>>>> best of my knowledge. >>>>> I our case a combination of cassandra and r

Re: advice, is cassandra suitable for a multi-tanency vBulletin type application?

2010-07-11 Thread Paul Prescod
you are confident you will have trouble scaling traditional technologies, it might not make business sense. Paul Prescod

Re: How to import data from MYSQL to Cassandra

2010-07-01 Thread Paul Prescod
As Paul said, you need to re-build your data in a Cassandra-friendly manner. Reading SQL files does not seem a very efficient way to do that though. Most databases can output in much simpler formats, like CSV. But then, why export at all? If the MySQL instance and the Cassandra instance are both ad

Re: Cassandra and Thrift on the Server Side

2010-06-28 Thread Paul Prescod
security complexities. Which keys will a particular browser client be allowed to overwrite? What prevents an end-user from deleting your database through AJAX calls? I think you'd need some form of ACL and access token system. That's a lot of complexity. Paul Prescod

Re: Cassandra and Thrift on the Server Side

2010-06-28 Thread Paul Prescod
bus.lan%3e Follow the thread links to learn more about AVRO, which will replace Thrift in Cassandra. Paul Prescod

Re: Beginner Assumptions

2010-06-13 Thread Paul Prescod
ek or so ago: * https://issues.apache.org/jira/browse/CASSANDRA-1072 Paul Prescod

Re: http://voltdb.com/ ?

2010-06-09 Thread Paul Prescod
I hope Cassandra is competitive with other solutions well before 50TB of data. There is a middle ground where you might choose one or the other. Just as there are areas where you might choose Postgres or Cassandra. They claim it will scale all the way up. Right now the likely dealbreaker will be i

Re: how does cassandra compare with mongodb?

2010-05-14 Thread Paul Prescod
ssue to think that an enterprise IT department would prefer one or the other on the basis of it. Neither has foreign keys or transactions. Both shift work from the datastore to the application. If that's not what you want, neither is a good choice. Paul Prescod

Re: Increment and Decrement operation

2010-05-13 Thread Paul Prescod
I'm curious what the relevance of CASSANDRA-1016 is. On Thu, May 13, 2010 at 2:24 PM, Tobias Jungen wrote: > I don't think this is currently possible. There is some work underway to add > it in the future, however: > > https://issues.apache.org/jira/browse/CASSANDRA-721 > https://issues.apache.or

Re: Increment and Decrement operation

2010-05-13 Thread Paul Prescod
No, but there is ongoing work on it: * https://issues.apache.org/jira/browse/CASSANDRA-580 * http://www.formspring.me/joestump/q/420668558 * http://permalink.gmane.org/gmane.comp.db.cassandra.user/3740 And in the meantime, an interim patch: * https://issues.apache.org/jira/browse/CASSANDRA

Re: Human readable Cassandra limitations

2010-05-12 Thread Paul Prescod
is what you're talking about: https://issues.apache.org/jira/browse/CASSANDRA-1016 Paul Prescod

Re: replication impact on write throughput

2010-05-11 Thread Paul Prescod
degradation. > So by your math, 100 nodes with each node getting 5k wps, I would assume the > total capacity is 500k wps. But perhaps I've misunderstood some key > concepts. Still a novice myself ;-) If the replication factor is 2, then everything is written twice. So your throughput is cut in half. Paul Prescod
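The arithmetic in this reply can be written out as a small sketch. It assumes the simplified model used in the thread (every write is stored on `replication_factor` nodes and per-node capacity is fixed); the function name and parameters are hypothetical and not part of Cassandra.

```python
def cluster_write_capacity(nodes, per_node_wps, replication_factor):
    """Client-visible writes per second under the thread's simple model."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # Raw capacity: every node absorbs its own share of writes.
    raw_capacity = nodes * per_node_wps
    # Each client write is stored replication_factor times,
    # so only 1/replication_factor of the raw capacity is visible to clients.
    return raw_capacity / replication_factor


print(cluster_write_capacity(100, 5000, 1))  # 500000.0
print(cluster_write_capacity(100, 5000, 2))  # 250000.0
```

With a replication factor of 2, the 500k wps estimate from the quoted message drops to 250k wps, which is the "cut in half" described above.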

Re: Human readable Cassandra limitations

2010-05-10 Thread Paul Prescod
hadoop You can read criticisms of MapReduce in the first link there. > On May 10, 2010, at 11:22 AM, Paul Prescod wrote: > > This is a very, very big topic. For the most part, the issues are > covered in the various SQL versus NoSQL debates all over the Internet. > For example:

Re: Human readable Cassandra limitations

2010-05-10 Thread Paul Prescod
Also: * you should Google "eventual consistency" to learn about the strengths and weaknesses of that. On Mon, May 10, 2010 at 11:22 AM, Paul Prescod wrote: > This is a very, very big topic. For the most part, the issues are > covered in the various SQL versus NoSQL deba

Re: Human readable Cassandra limitations

2010-05-10 Thread Paul Prescod
This is a very, very big topic. For the most part, the issues are covered in the various SQL versus NoSQL debates all over the Internet. For example: * Cassandra and its NoSQL siblings have no concept of an in-database "join" * Cassandra and its NoSQL siblings do not allow you to update multipl

Re: Tuning Cassandra

2010-05-10 Thread Paul Prescod
Does the Cassandra performance start fast and slow down (indicating some buffer being filled) or does it start slow and stay slow? On Mon, May 10, 2010 at 2:05 AM, David Boxenhorn wrote: > I read something like 80,000 rows from Oracle and write them to Cassandra in > chunks of 1000 rows - so I'm

Re: Can Cassandra make real use of several DataFileDirectories?

2010-04-26 Thread Paul Prescod
On Mon, Apr 26, 2010 at 2:15 PM, Anthony Molinaro wrote: > I think it might be worse case that you read all the disks. If your > block size is large enough to hold an entire row, you should only have to > read one disk to get that data. And conversely, for a large enough row you might benefit fro

Re: Can Cassandra make real use of several DataFileDirectories?

2010-04-26 Thread Paul Prescod
s will go to *all* hard drives. RAID0 is designed specifically to improve performance (both latency and bandwidth). I'm unclear about why you think it would decrease performance. Perhaps you're thinking of another RAID type? Paul Prescod

Re: The Difference Between Cassandra and HBase

2010-04-24 Thread Paul Prescod
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/ http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html On Sat, Apr 24, 2010 at 10:20 AM, dir dir wrote: > In general what is the difference between Cassandra and HBase?? > > Thanks. >

Re: Question about a potential configuration scenario

2010-04-23 Thread Paul Prescod
http://wiki.apache.org/cassandra/Operations === A Cassandra cluster always divides up the key space into ranges delimited by Tokens as described above, but additional replica placement is customizable via !IReplicaPlacementStrategy in the configuration file. The standard strategies are RackUnawa

Re: questions about consistency

2010-04-22 Thread Paul Prescod
ample of why you need vector clocks. The description for CASSANDRA-580 is "Allow a ColumnFamily to be versioned via vector clocks, instead of long timestamps. Purpose: enable incr/decr; flexible conflict resolution." https://issues.apache.org/jira/browse/CASSANDRA-580 Paul Prescod

Re: questions about consistency

2010-04-21 Thread Paul Prescod
I'm not an expert, so take what I say with a grain of salt. 2010/4/21 Даниел Симеонов : > Hello, >    I am pretty new to Cassandra and I have some questions, they may seem > trivial, but still I am pretty new to the subject. First is about the lack > of a compareAndSet() operation, as I understood

Re: Regarding Cassandra Scalability

2010-04-18 Thread Paul Prescod
you > mean. Do you have a pressing need to use Cassandra right now, before version 1.0 is even available? That limitation will go away before 1.0, so you could simply wait and not worry about it. Documentation will also be much more complete in the future. Paul Prescod

Re: Regarding Cassandra Scalability

2010-04-16 Thread Paul Prescod
http://www.google.ca/search?hl=en&q=cassandra+terabyte On Thu, Apr 15, 2010 at 11:28 PM, Linton N wrote: > hi , > I am working for the past 1 year with hadoop, but quite new to > cassandra, I would like to get clarified few things regarding the > scalability of Cassandra. Can it scall up

Re: Starting Cassandra Fauna

2010-04-14 Thread Paul Prescod
There is a tutorial here: * http://www.sodeso.nl/?p=80 This page includes data inserts: * http://www.sodeso.nl/?p=251 Like: c.setColumn(new Column("email".getBytes("utf-8"), "ronald (at) sodeso.nl".getBytes("utf-8"), timestamp)); columns.add(c); The sample code is attached to that blog post.

Re: Reading thousands of columns

2010-04-14 Thread Paul Prescod
a single query (perhaps entered interactively) would replace the entire row caching all of the data for the systems' interactive users. For example, a summary page of who is most active over the last month could replace the profile information for the actual users who are using the system at that moment. Paul Prescod

Re: History values

2010-04-14 Thread Paul Prescod
If you want to use Cassandra, you should probably store each historical value as a new column in the row. On Wed, Apr 14, 2010 at 12:34 AM, Yésica Rey wrote: > I am new to using cassandra. In the documentation I have read, understand, > that as in other non-documentary databases, to update the va
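The idea of keeping each historical value as its own column can be pictured with a plain-Python sketch. This is only an in-memory model, assuming a row behaves like a mapping from column name to value and that the column name is a timestamp; the helper names are hypothetical and this is not a Cassandra client API.

```python
def record_value(row, timestamp, value):
    """Add a new column named after the timestamp instead of overwriting."""
    row[timestamp] = value


def history(row):
    """All values, oldest first, by sorting on the column name (the timestamp)."""
    return [row[ts] for ts in sorted(row)]


def latest(row):
    """The value stored under the highest timestamp."""
    return row[max(row)]


row = {}
record_value(row, 1271200000, "first")
record_value(row, 1271203600, "second")
record_value(row, 1271207200, "third")

print(history(row))  # ['first', 'second', 'third']
print(latest(row))   # third
```

Because nothing is overwritten, the current value and the full history both come from the same row.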

KeysCached and sstable

2010-04-14 Thread Paul Prescod
ache is to avoid looking through a bunch of SSTables' Bloom filters? (How big do the Bloom filters grow? Too big to be cached themselves?) I'd like to document the detail. Paul Prescod

Re: Caching is a full row?

2010-04-13 Thread Paul Prescod
On Tue, Apr 13, 2010 at 5:26 PM, Rob Coli wrote: > On 4/13/10 5:04 PM, Paul Prescod wrote: >> >> Am I correct in my understanding that the unit of caching (and >> fetching from disk?) is a full row? > > Cassandra has both a Key and a Row cache. Unfortunately there appe

Caching is a full row?

2010-04-13 Thread Paul Prescod
(columnFamily_)), Integer.MIN_VALUE); Paul Prescod

Re: Worst case #iops to read a row

2010-04-13 Thread Paul Prescod
I notice that the documentation on the read path is quite compressed on this page: * http://wiki.apache.org/cassandra/ArchitectureOverview What is the best documentation of the read path? I'm also curious about the granularity and policies around caching. Paul Prescod

Re: Worst case #iops to read a row

2010-04-13 Thread Paul Prescod
> > > Why does RF enter this? A simplistic model for a consistent read that is asking all replicas what their value is for the key. If the key is in the fourth SSTable of all nodes, won't they all have to do 12 IOPs to find it? Paul Prescod

Re: Worst case #iops to read a row

2010-04-13 Thread Paul Prescod
On Tue, Apr 13, 2010 at 11:52 AM, Scott White wrote: > >... > > Agreed. Kind of sorry to see Scott White and Benjamin Black being in agreement, but I guess that's the way yin and yang works. Opposition is illusory in any case. Paul Prescod

Re: Worst case #iops to read a row

2010-04-13 Thread Paul Prescod
ce"? The document above implies that it is nearly impossible. It implies that you will have between 1 and 4 SSTables. Does the administrator have a choice in this matter? I am probably being totally naive, but is the answer to the question "worst iops on read" just: 3 reads per SSTable * 4 SSTables * ReplicationFactor ? = 3 * 4 * 3 = 36? Paul Prescod
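The back-of-envelope model in this message can be written as a short sketch. It simply multiplies the three numbers from the question (3 reads per SSTable, 4 SSTables, replication factor 3) and makes no claim that the model itself is right; the function and parameter names are hypothetical.

```python
def worst_case_read_iops(reads_per_sstable=3, sstables=4, replication_factor=3):
    """Return (per-node IOPs, cluster-wide IOPs) for one consistent read."""
    # Worst case on one node: the key sits in the last SSTable checked.
    per_node = reads_per_sstable * sstables
    # A read that asks every replica pays that cost once per replica.
    total = per_node * replication_factor
    return per_node, total


print(worst_case_read_iops())  # (12, 36)
```

The per-node figure of 12 matches the "12 IOPs" mentioned in the follow-up message, and 36 is the total asked about here.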

Re: Worst case #iops to read a row

2010-04-13 Thread Paul Prescod
3 / 131k) * 3 = 150M / 131k = 11,450. This line isn't internally consistent. Where did 150M come from? 500 M * 9 = 4.5 Billion. My calculation for the whole thing is 3433. I am not claiming to be a Cassandra expert and therefore cannot vouch for the model at all. Paul Prescod

Re: compare cassandra read n write results

2010-04-12 Thread Paul Prescod
contrib/py_stress Although that's still written in a scripting language, it at least uses threading. Anyhow, what's your real goal? Inserting 100K or 1M rows in 30 seconds from a single-threaded environment like PHP is pretty good. Do your business goals require more? Also: Is it 100K or 1M? In

Re: Off line client nodes?

2010-04-12 Thread Paul Prescod
allows you to implement your specific policies. You might also want to investigate "Microsoft Sync Framework" and its competitors. Paul Prescod

Re: compare cassandra read n write results

2010-04-12 Thread Paul Prescod
How will they know whether the performance problem is caused by Cassandra or Pandra if you do not have raw Cassandra performance numbers for your setup? On Mon, Apr 12, 2010 at 5:51 AM, vineet daniel wrote: > I dont think it would be a good idea not to use pandra for benchmarks as we > are going

Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-11 Thread Paul Prescod
ingle value (in a way which avoids race > conditions, of course). How do you avoid the race condition? Don't you need a lock? Paul Prescod Ayogo, Inc.

Re: How to perform queries on Cassandra?

2010-04-11 Thread Paul Prescod
>> >> >> > incremental numeric id as key and keeping the name and value same in the column family. >> >> >> > >> >> >> > Example :

Re: How to perform queries on Cassandra?

2010-04-11 Thread Paul Prescod
>> >> > >> >> > On Sun, Apr 11, 2010 at 1:13 PM, Benjamin Black wrote: >> >> >> >> >> >> You would have a Column Family, not a column for that; let's call it the Users CF.  You'd use username as the row key and have a column

Re: How to perform queries on Cassandra?

2010-04-10 Thread Paul Prescod
This tutorial may help. http://www.sodeso.nl/?p=251 Cassandra is very early software...not even version 1.0 yet. You'll need to figure out a lot yourself by reading blog posts, examples, comparing to API documentation, etc. Cassandra is an entirely different model in almost every way, and not ent

Re: How to perform queries on Cassandra?

2010-04-09 Thread Paul Prescod
p; lat < 80.0M) { f = 69.32M; } >> else if (lat >= 80.0M) { f = 69.38M; } >> >> return f; >> } >> >> >> Decimal MilesPerDegreeLatitude = GetLatitudeMiles(zList[0].Latitude); >> Decimal MilesPerDegreeLongitude = ((Decimal) Math.Abs(Math.Cos((Double)

Re: How to perform queries on Cassandra?

2010-04-09 Thread Paul Prescod
to research yourself starting here: * http://en.wikipedia.org/wiki/MapReduce * http://hadoop.apache.org/ * http://wiki.apache.org/cassandra/HadoopSupport I don't think it is all documented in any one place yet... Paul Prescod

Re: Sorting and ordering in Cassandra

2010-04-08 Thread Paul Prescod
DType.getUUID(o2).timestamp(); return t1 < t2 ? -1 : (t1 > t2 ? 1 : FBUtilities.compareByteArrays(o1, o2)); I'll add a bit to the document to clarify. > Otherwise, great reading so far. Very helpful and wish I found this earlier. Glad to help! Paul Prescod
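The comparison quoted above (timestamp first, raw bytes as a tie-break) can be sketched in Python. This is an illustration of the ordering rule only, not the Cassandra source: `make_time_uuid` is a hypothetical helper for building test values, and the tie-break uses Python's unsigned byte comparison, which may differ from what `FBUtilities.compareByteArrays` does.

```python
import uuid
from functools import cmp_to_key


def make_time_uuid(timestamp, node=0):
    """Build a version-1 UUID carrying a chosen 60-bit timestamp (demo helper)."""
    return uuid.UUID(fields=(
        timestamp & 0xFFFFFFFF,                 # time_low
        (timestamp >> 32) & 0xFFFF,             # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,  # time_hi plus version 1
        0x80, 0x00,                             # clock sequence with RFC 4122 variant
        node,
    ))


def compare_time_uuids(b1, b2):
    """Order 16-byte UUIDs by embedded timestamp, then by raw bytes."""
    t1 = uuid.UUID(bytes=b1).time
    t2 = uuid.UUID(bytes=b2).time
    if t1 != t2:
        return -1 if t1 < t2 else 1
    return (b1 > b2) - (b1 < b2)


early = make_time_uuid(1).bytes
late = make_time_uuid(1 << 32).bytes

# Plain byte order puts the later UUID first, because time_low leads the layout.
print(sorted([late, early]) == [late, early])  # True
# The timestamp-aware comparison restores chronological order.
print(sorted([late, early], key=cmp_to_key(compare_time_uuids)) == [early, late])  # True
```

The example shows why a byte-wise sort is not enough for time-based UUIDs: the low 32 bits of the timestamp are stored first, so raw byte order does not follow time order.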

Re: Starting Cassandra Fauna

2010-04-08 Thread Paul Prescod
igure out why it is bailing. Yes, I had the same problem. I didn't dig into it, but perhaps all users have this problem now. Paul Prescod

Write consistency

2010-04-08 Thread Paul Prescod
C mode for use on fast LANs. Paul Prescod ¹ http://jsensarma.com/blog/2009/11/dynamo-part-i-a-followup-and-re-rebuttals/

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Paul Prescod
sn't it RAID-0 that's for pure speed? Paul Prescod

Is this sentence slightly inaccurate

2010-04-07 Thread Paul Prescod
"With OrderPreservingPartitioner the keys themselves are used to place on the ring. One of the potential drawbacks of this approach is that if rows are inserted with sequential keys, all the write load will go to the same node." http://wiki.apache.org/cassandra/StorageConfiguration Wouldn't the "

Re: OrderPreservingPartitioner limits and workarounds

2010-04-07 Thread Paul Prescod
queries. > > > b > > On Wed, Apr 7, 2010 at 3:51 AM, Paul Prescod wrote: >> I have one append-oriented workload and I would like to know if >> Cassandra is appropriate for it. >> >> Given: >> >>  * 100 nodes >> >>  * an OrderPreserv

Re: ConsistencyLevel.ZERO

2010-04-07 Thread Paul Prescod
de from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory." Or is it "MessagingService" itself which is OOMing? On Wed, Apr 7, 2010 at 9:06 AM, Jonathan Ellis wrote: > Great! > > On Wed, Apr 7

ConsistencyLevel.ZERO

2010-04-07 Thread Paul Prescod
ers would only use this setting if they (think they) > know what they are doing. :-) I added this note to the API docs: * ConsistencyLevel.ZERO: Ensure nothing. A write happens asynchronously in background. If too many of these queue up, buffers will explode and bad things will happen. Apologies if I violated any community conventions. I'm happy to fix the text if someone has a better suggestion. Paul Prescod

Sorting and ordering in Cassandra

2010-04-07 Thread Paul Prescod
I'm working on a blog post that combines all of the information and ideas I can find relative to managing sorted lists in Cassandra. http://jottit.com/s8c4a/# Not only do I greatly appreciate comments, I actually don't think I can publish it without some feedback because there are some embedded q

OrderPreservingPartitioner limits and workarounds

2010-04-07 Thread Paul Prescod
share the load more fairly? Paul Prescod

Consistent counters (was: Memcached protocol)

2010-04-06 Thread Paul Prescod
I *believe* that the key messages of those blog posts were: 1. Using distributed vector clocks is easy once they are implemented. 2. Implementing distributed vector clocks is hard on the datastore vendor. 3. If you have long-term network partitions you're kind of screwed (which is probably tr

How do vector clocks and conflicts work?

2010-04-06 Thread Paul Prescod
suppose for the beginning of the discussion that some sort of interface will be implemented to allow pluggable logic to be added to the server, personalized scripts were an idea, I have heard. " Kevin Kakugawa replies that they'll just use Java class libraries as a first pass. Paul Prescod

Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
olver will do the summation for you properly. If I'm wrong, I'd love to hear more, though. Paul Prescod

Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
ether a future cassandra "eventually consistent" increment/decrement feature based on vector clocks would have semantics that are incompatible with most deployed uses of memcached increment/decrement. Paul Prescod

Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
atomic increment/decrement. I'm familiar with atomic add as a sort of locking mechanism. Paul Prescod

Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
ed by a function called "write_byte" (which is implemented in Ruby!). I would be happy to hear that I'm Doing Something Wrong, but I think it's just a consequence of the thrift protocol and the client implementation. I have no idea whether Avro is better. I'm not sure if it works well enough to be tested yet... Paul Prescod

Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
supported with most cache stores. * http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#M001029 I checked a few of my own apps. They use get/set/add/delete, but the add is almost always used as an optimization. Paul Prescod

Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
ion you've deployed? I have always imagined it as being primarily for simple counters. Paul Prescod

Re: Memcached protocol?

2010-04-05 Thread Paul Prescod
On Mon, Apr 5, 2010 at 12:01 AM, David Strauss wrote: > On 2010-04-05 03:42, Paul Prescod wrote: >... > > There is a difference between Cassandra allowing inc/dec on values and > actually *knowing* the resultant value at the time of the write. It's > likely that inc/dec sup

Re: Memcached protocol?

2010-04-04 Thread Paul Prescod
articular client sees intermediate values, nor that they see unique values. Paul Prescod

Re: Memcached protocol?

2010-04-04 Thread Paul Prescod
On Sun, Apr 4, 2010 at 5:06 PM, Benjamin Black wrote: > ... > > Are you suggesting this would give you counter semantics? Yes: My understanding of cassandra-580 is that it gives you increment and decrement which are the basis of counters. Paul Prescod

Re: Memcached protocol?

2010-04-04 Thread Paul Prescod
They could either continue on that basis or retry with exponential back-off. > ... > > that said, if people see a use case for this, I would do it. I personally think that it would hit a nice 80/20 point, and once vector clocks are implemented it might be easy to get to 99% memcached compatibility. Paul Prescod
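For readers unfamiliar with the term, "retry with exponential back-off" can be sketched as follows. This is a generic client-side pattern, not anything specific to Cassandra or memcached; the function name, the default delays, and the injectable `sleep` argument are all assumptions made for illustration.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=0.1, factor=2.0,
                       sleep=time.sleep):
    """Call operation() until it succeeds, waiting longer after each failure."""
    delay = base_delay
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            sleep(delay)
            delay *= factor  # 0.1s, 0.2s, 0.4s, ...


# Usage: an operation that fails twice before succeeding.
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("conflict, try again")
    return "ok"

waits = []
print(retry_with_backoff(flaky, sleep=waits.append))  # ok
print(waits)  # [0.1, 0.2]
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real client would also usually add jitter and cap the maximum delay.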

Re: Memcached protocol?

2010-04-04 Thread Paul Prescod
once delivery though... > In other words, Cassandra is quickly becoming the hammer to everyone's > cluster nails. :) > > --Joe > > On Apr 4, 2010, at 12:47 PM, Paul Prescod wrote: > > Many Cassandra implementations seem to be memcached+X migrations, and some > m

Memcached protocol?

2010-04-04 Thread Paul Prescod
y, or they could define a convention for splitting their keys based on special namespace characters like ":" or "_". The user could say how to interpret keys without enough parts (i.e. whether to treat the missing part as the keyspace or the columnfamily). Paul Prescod
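The key-splitting convention suggested here could look roughly like the sketch below. Only the end of the original message is visible in this archive, so this is one possible reading: the function name, the `missing` option, and the default names `Keyspace1` and `Standard1` are placeholders chosen for illustration.

```python
def split_memcached_key(raw_key, separator=":",
                        default_keyspace="Keyspace1",
                        default_column_family="Standard1",
                        missing="keyspace"):
    """Map a flat memcached-style key to (keyspace, column_family, row_key)."""
    if missing not in ("keyspace", "columnfamily"):
        raise ValueError("missing must be 'keyspace' or 'columnfamily'")
    # Split at most twice so the row key itself may contain the separator.
    parts = raw_key.split(separator, 2)
    if len(parts) == 3:
        keyspace, column_family, row_key = parts
    elif len(parts) == 2:
        # One part is absent; the user-chosen policy says which one.
        if missing == "keyspace":
            keyspace = default_keyspace
            column_family, row_key = parts
        else:
            column_family = default_column_family
            keyspace, row_key = parts
    else:
        keyspace, column_family, row_key = (
            default_keyspace, default_column_family, parts[0])
    return keyspace, column_family, row_key


print(split_memcached_key("app:users:alice"))
# ('app', 'users', 'alice')
print(split_memcached_key("users:alice"))
# ('Keyspace1', 'users', 'alice')
print(split_memcached_key("app:alice", missing="columnfamily"))
# ('app', 'Standard1', 'alice')
```

The `missing` argument corresponds to the choice described in the message: whether a two-part key is read as columnfamily plus key or as keyspace plus key.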