Misc Performance Questions

2011-06-08 Thread AJ
Is there a performance hit when dropping a CF? What if it contains .5 TB of data? If not, is there a quick and painless way to drop a large amount of data w/minimal perf hit? Is there a performance hit running multiple keyspaces on a cluster versus only one keyspace given a constant total

Re: how to retrieve data from supercolumns by phpcassa ?

2011-06-08 Thread amrita
Hi, Can u please tell me how to create a supercolumn and retrieve data from it using phpcassa??? student_details{{,,{,}}}

Re: how to retrieve data from supercolumns by phpcassa ?

2011-06-08 Thread Sasha Dolgy
you'll find a response to this question on the phpcassa mailing list ... where you asked the same question. -sd On Wed, Jun 8, 2011 at 10:22 AM, amrita wrote: > Hi, > Can u please tell me how to create a supercolumn and retrieve data from it > using > phpcassa??? > > student_details{{,,{,}}} >

Re: Misc Performance Questions

2011-06-08 Thread Richard Low
Hi AJ, On Wed, Jun 8, 2011 at 9:29 AM, AJ wrote: > Is there a performance hit when dropping a CF?  What if it contains .5 TB of > data?  If not, is there a quick and painless way to drop a large amount of > data w/minimal perf hit? Dropping a CF is quick - it snapshots the files (which creates
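Richard's answer is cut off, but Cassandra snapshots are made with filesystem hard links, which is why they are cheap even for a very large CF. The sketch below shows only that filesystem mechanism in Python, not Cassandra's own code. The function and file names are hypothetical.

```python
import os
from typing import List


def snapshot_by_hard_link(data_dir: str, snapshot_dir: str) -> List[str]:
    """Hard-link every regular file in data_dir into snapshot_dir.

    No file contents are copied: each link is a second directory entry for
    the same inode, so the cost does not grow with file size. Hard links
    only work within a single filesystem.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked
```

Deleting the original afterwards removes only one link. The bytes stay on disk until the snapshot is cleared too (for example with nodetool clearsnapshot), so the space held by a 0.5 TB CF comes back only at that point.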

Re: how to know there are some columns in a row

2011-06-08 Thread Patrick de Torcy
There is no reason for ambiguities... We could add in the api another method call (similar to get_count) : get_columnNames - list get_columnNames(key, column_parent, predicate, consistency_level) Get the columns names present in column_parent within the predicate. The method is not O(
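get_columnNames is not part of the Thrift API; it is the method proposed in this message. Here is a minimal in-memory sketch of the requested semantics. It assumes a row is a Python dict from column name to value, and that names sort like Cassandra's UTF8Type (Python's code-point string order matches UTF-8 byte order). The function name and parameters are hypothetical.

```python
from typing import Dict, List


def get_column_names(row: Dict[str, bytes], start: str = "", finish: str = "",
                     count: int = 100) -> List[str]:
    """Return up to `count` column names in [start, finish], in sorted order.

    Models a slice predicate over one row; an empty bound means unbounded.
    Only names are returned, so large values never cross the network.
    """
    if count <= 0:
        return []
    names = []
    for name in sorted(row):
        if start and name < start:
            continue
        if finish and name > finish:
            break
        names.append(name)
        if len(names) == count:
            break
    return names
```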

Data directories

2011-06-08 Thread Héctor Izquierdo Seliva
Hi, Is there a way to control what sstables go to what data directory? I have a fast but space-limited SSD, and a much slower RAID, and I'd like to put latency-sensitive data on the SSD and leave the other data on the RAID. Is this possible? If not, how well does Cassandra play with symlinks?

Re: Misc Performance Questions

2011-06-08 Thread AJ
Thank you Richard! On 6/8/2011 2:57 AM, Richard Low wrote: There is however a difference in running multiple column families versus putting everything in the same column family and separating them with e.g. a key prefix. E.g. if you have a large data set and a small one, it will be quicker to

Re: Misc Performance Questions

2011-06-08 Thread Richard Low
On Wed, Jun 8, 2011 at 12:30 PM, AJ wrote: >> There is however a difference in running multiple column families >> versus putting everything in the same column family and separating >> them with e.g. a key prefix.  E.g. if you have a large data set and a >> small one, it will be quicker to query
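The reply is truncated before the reasoning. Below is a minimal sketch of the key-prefix alternative that Richard contrasts with separate column families. The separator and function names are hypothetical.

```python
from typing import Tuple

SEPARATOR = ":"  # hypothetical choice; must never appear in dataset names


def prefixed_key(dataset: str, row_id: str) -> str:
    """Build a row key that namespaces row_id under dataset inside one shared CF."""
    if SEPARATOR in dataset:
        raise ValueError("dataset name must not contain the separator")
    return dataset + SEPARATOR + row_id


def split_key(key: str) -> Tuple[str, str]:
    """Inverse of prefixed_key; row ids may themselves contain the separator."""
    dataset, _, row_id = key.partition(SEPARATOR)
    return dataset, row_id
```

One consequence to keep in mind: under RandomPartitioner (the 0.8 default), rows are placed by an MD5 hash of the whole key. A shared prefix therefore does not keep one dataset's rows contiguous, and both datasets share the same memtables and SSTables.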

Re: Data directories

2011-06-08 Thread Jonathan Ellis
No. https://issues.apache.org/jira/browse/CASSANDRA-2749 is open to track this but nobody is working on it to my knowledge. Cassandra is fine with symlinks at the data directory level but I don't think that helps you, since you really want to move the sstables themselves. (Cassandra is NOT fine wi
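Jonathan confirms that a symlinked data directory works, while per-SSTable placement does not. Here is a minimal Python sketch of the generic move-then-symlink step for a whole directory, assuming the node is stopped while it runs. The thread does not say whether relocating a per-keyspace subdirectory (as opposed to the top-level data directory) is safe, and the directory names are hypothetical.

```python
import os
import shutil


def relocate_with_symlink(path: str, new_parent: str) -> str:
    """Move directory `path` into `new_parent` and leave a symlink at `path`.

    Intended to run while the node is stopped, so no files are open.
    Returns the directory's new location.
    """
    path = os.path.normpath(path)
    # Absolute target so the symlink resolves regardless of the link's location.
    target = os.path.join(os.path.abspath(new_parent), os.path.basename(path))
    if os.path.exists(target):
        raise FileExistsError(target)
    shutil.move(path, target)
    os.symlink(target, path)
    return target
```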

Re: Data directories

2011-06-08 Thread Héctor Izquierdo Seliva
El mié, 08-06-2011 a las 08:42 -0500, Jonathan Ellis escribió: > No. https://issues.apache.org/jira/browse/CASSANDRA-2749 is open to > track this but nobody is working on it to my knowledge. > > Cassandra is fine with symlinks at the data directory level but I > don't think that helps you, since y

Re: Installing Thrift with Solandra

2011-06-08 Thread Jean-Nicolas Boulay Desjardins
Krish Pan THANKS! Also thank you for making build successful in uppercase :) But it seems it is still not working. This time when I go into solandra-app directory I get the start-solandra.sh and when I use the command: ./start-solandra.sh I get this: http://dl.dropbox.com/u/20599297/Screen%20sh

Retrieving a column from a fat row vs retrieving a single row

2011-06-08 Thread Héctor Izquierdo Seliva
Hi, I have an index I use to translate ids. I usually only read a column at a time, and it's becoming a bottleneck. I could rewrite the application to read a bunch at a time but it would make the application logic much harder, as it would involve buffering incoming data. As far as I know, to read

Re: Multiple large disks in server - setup considerations

2011-06-08 Thread Edward Capriolo
On Wed, Jun 8, 2011 at 12:19 AM, AJ wrote: > On 6/7/2011 9:32 PM, Edward Capriolo wrote: > > >> >> I do not like large disk set-ups. I think they end up not being >> economical. Most low latency use cases want high RAM to DISK ratio. Two >> machines with 32GB RAM is usually less expensive than

Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-08 Thread Peter Schuller
> As far as I know, to read a single column cassandra will deserialize a > bunch of them and then pick the correct one (64KB of data right?) Assuming the default setting of 64kb, the average amount deserialized given random column access should be 8 kb (not true with row cache, but with large rows
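Peter's point concerns the per-row column index. For a wide row, Cassandra records the first and last column name of each block of roughly column_index_size_in_kb (64 KB by default) of serialized columns, together with that block's position. A single-column read can then binary-search the index and deserialize one block instead of the whole row. The model below simplifies the entry fields and the block-closing rule; it is not Cassandra's on-disk format, and the names are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IndexEntry:
    first_name: str
    last_name: str
    offset: int  # byte offset of the block within the serialized row
    width: int   # serialized size of the block in bytes


def build_index(columns: List[Tuple[str, int]], block_size: int = 64 * 1024) -> List[IndexEntry]:
    """columns: (name, serialized_size) pairs already sorted by name."""
    index: List[IndexEntry] = []
    offset = block_start = 0
    first: Optional[str] = None
    last = ""
    for name, size in columns:
        if first is None:
            first, block_start = name, offset
        last = name
        offset += size
        if offset - block_start >= block_size:  # close the block once it reaches block_size
            index.append(IndexEntry(first, last, block_start, offset - block_start))
            first = None
    if first is not None:  # trailing partial block
        index.append(IndexEntry(first, last, block_start, offset - block_start))
    return index


def find_block(index: List[IndexEntry], name: str) -> Optional[IndexEntry]:
    """Binary-search for the only block whose name range could contain `name`."""
    i = bisect.bisect_right([e.first_name for e in index], name) - 1
    if i < 0 or name > index[i].last_name:
        return None
    return index[i]
```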

Re: Installing Thrift with Solandra

2011-06-08 Thread Krish Pan
Looks like it is running; you can verify by running jps, which will show you a process named "jar". Try this: cd ../reuters-demo, then ./1-download_data.sh, then ./2-import_data.sh. While data is loading, open the file ./website/index.html in your favorite browser. On Wed, Jun 8, 2011 at 8:04 AM, Jean-Nico

nosql yes but yescql, no?

2011-06-08 Thread SriSatish Ambati
Gotta love, Eric! http://www.slideshare.net/jericevans/nosql-yes-but-yescql-no -- SriSatish Ambati Director of Engineering, DataStax @srisatish

Re: nosql yes but yescql, no?

2011-06-08 Thread Marcos Ortiz
On 06/08/2011 01:23 PM, SriSatish Ambati wrote: Gotta love, Eric! http://www.slideshare.net/jericevans/nosql-yes-but-yescql-no -- SriSatish Ambati Director of Engineering, DataStax @srisatish Good resource. Thanks for sharing it with us, SriSatish. Regards -- Marcos Luís Ortíz Valmaseda Soft

Re: nosql yes but yescql, no?

2011-06-08 Thread Jeffrey Kesselman
While I agree the Thrift API sucks, I'd love to see that solved at the binary level, with CQL on top of that. JK On Wed, Jun 8, 2011 at 2:50 PM, Marcos Ortiz wrote: > On 06/08/2011 01:23 PM, SriSatish Ambati wrote: > > Gotta love, Eric! > http://www.slideshare.net/jericevans/nosql-yes-but-yescql-no

Re: nosql yes but yescql, no?

2011-06-08 Thread Jeremy Hanna
I think that's partly the idea of it. CQL could end up being a way forward and it currently builds on thrift. Then if it becomes the API/client of record to build on, then it could move to something else underneath that's more efficient and CQL itself wouldn't have to change at all. On Jun 8,

Re: Installing Thrift with Solandra

2011-06-08 Thread Jean-Nicolas Boulay Desjardins
Thanks again... Here it gets a bit more complex. I added Solandra to the /tmp folder like you told me, and the data too... Everything seems to work. The problem is that I am running Solandra in a VM on my Mac OS X; the VM is Ubuntu Server. On that VM I have a DNS server... And one of my domain names i

Re: Installing Thrift with Solandra

2011-06-08 Thread Jean-Nicolas Boulay Desjardins
Also, how can I back up the data that I loaded? On the next reboot I am going to lose all the data that I loaded, and as you know loading takes time... I tried to copy the Solandra folder to another folder outside /tmp... But I am not sure that is enough. Thanks! On Wed, Jun 8, 2011 at 2

RE: how to know there are some columns in a row

2011-06-08 Thread Jeremiah Jordan
I am pretty sure this would cut down on network traffic, but not on Disk IO or CPU use. I think Cassandra would still have to deserialize the whole column to get to the name. So if you really have a use case where you just want the name, it would be better to store a separate "name with no data"
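Jeremiah's suggestion, modeled in memory: every write also stores the column name, with an empty value, in a separate row. Listing names then reads only small columns. The class and the two "column families" are hypothetical stand-ins, not a real client API.

```python
from collections import defaultdict
from typing import Dict, List


class NameIndexedStore:
    """In-memory stand-in for two column families (hypothetical):
    'data' holds the real values; 'names' holds one empty-valued column per name.
    """

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, bytes]] = defaultdict(dict)
        self.names: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def insert(self, key: str, name: str, value: bytes) -> None:
        # A real client would send both writes together, e.g. in one batch_mutate call.
        self.data[key][name] = value
        self.names[key][name] = b""

    def column_names(self, key: str) -> List[str]:
        # Reads only the small 'names' row; large values are never touched.
        return sorted(self.names.get(key, {}))

    def name_row_bytes(self, key: str) -> int:
        """Rough payload size of the 'names' row (names plus empty values)."""
        return sum(len(n.encode()) + len(v) for n, v in self.names.get(key, {}).items())
```

The cost is a second write per column, and deletes must remove both entries to keep the name list accurate.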

Re: nosql yes but yescql, no?

2011-06-08 Thread Jeffrey Kesselman
That makes sense :) On Wed, Jun 8, 2011 at 2:37 PM, Jeremy Hanna wrote: > I think that's partly the idea of it.  CQL could end up being a way forward > and it currently builds on thrift.  Then if it becomes the API/client of > record to build on, then it could move to something else underneath

hadoop/pig notes

2011-06-08 Thread William Oberman
I decided to try out hadoop/pig + cassandra. I had my ups and downs to get the script I wanted to run to work. I'm sure everyone who tries will have their own experiences/problems, but mine were: -Everything I need to know was in http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html an

Re: CLI set command returns null, ver 0.8.0

2011-06-08 Thread aaron morton
Can you provide the cli script to create the schema and info on how many nodes you have. Thanks - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 8 Jun 2011, at 16:12, AJ wrote: > Can anyone help? The CLI seems to be having issues. The

Re: hadoop/pig notes

2011-06-08 Thread Jeremy Hanna
I need to update the wiki with better pig info. I did put some information in the getting started docs of pygmalion, but it would be good to transfer that to cassandra's wiki and add to it. fwiw - https://github.com/jeromatron/pygmalion/wiki/Getting-Started Thanks for the rundown William! On

Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-08 Thread aaron morton
Just to make things less clear, if you have one row that you are continually writing it may end up spread out over several SSTables. Compaction helps here to reduce the number of files that must be accessed so long as it can keep up. But if you want to read column X and the row is fragmented ove
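To illustrate Aaron's point, here is a simplified sketch of a read for one column of a fragmented row. Every SSTable that holds a fragment of the row must be consulted, and the version with the newest write timestamp wins. The data layout is a hypothetical in-memory stand-in. Tombstones are ignored, and on equal timestamps this sketch keeps the first version seen, which is a simplification.

```python
from typing import Dict, List, Optional, Tuple

Column = Tuple[bytes, int]              # (value, write timestamp)
SSTable = Dict[str, Dict[str, Column]]  # row key -> column name -> column


def read_column(sstables: List[SSTable], key: str, name: str) -> Tuple[Optional[bytes], int]:
    """Return (newest value of the column, number of SSTables holding a row fragment)."""
    best: Optional[Column] = None
    touched = 0
    for table in sstables:
        row = table.get(key)
        if row is None:
            # Cassandra checks a per-SSTable bloom filter on the row key,
            # which usually lets it skip files that lack the row.
            continue
        touched += 1
        col = row.get(name)
        if col is not None and (best is None or col[1] > best[1]):
            best = col
    return (best[0] if best is not None else None), touched
```

Compaction lowers `touched` by merging the fragments into fewer files, which is why falling behind on compaction shows up as slower reads.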

Re: how to know there are some columns in a row

2011-06-08 Thread Patrick de Torcy
| I am pretty sure this would cut down on network traffic, but not on Disk IO or CPU use. Well, that's the same for the get_count method! I think that would be OK, since the network traffic is the real problem (big values...). Storing the column names in a separate column could be a solution of

Re: how to know there are some columns in a row

2011-06-08 Thread aaron morton
> Forgive me if I am a little insistent, but it's important for us and I'm sure > we are not the only ones interested in this feature... Not an issue, it's how things get done on :) Create a jira ticket https://issues.apache.org/jira/browse/CASSANDRA with your ideas to start the process and ask

Re: [RELEASE] 0.8.0

2011-06-08 Thread Yi Yang
Is there anyone willing to upgrade the libcassandra for C++, to support new features in 0.8.0? Or has anyone started to work on it? Thanks On Jun 3, 2011, at 7:36 AM, Eric Evans wrote: > > I am very pleased to announce the official release of Cassandra 0.8.0. > > If you haven't been paying a

Running a cluster with 256mb RAM nodes

2011-06-08 Thread Donny Nadolny
I'd like to start using Cassandra for a certain part of my database that has high write volume. I'm setting up a 3-node cluster; however, my site doesn't make enough money yet to justify 3 nodes meeting the hardware recommendation of 4GB RAM. Instea

Re: Running a cluster with 256mb RAM nodes

2011-06-08 Thread Watanabe Maki
I once built a 4-node ring on my laptop, with a 64MB heap for each instance. I could write to and read from it, but nodetool repair caused an OOM. You should test essential operations with estimated data loaded, under expected traffic. Btw I'm using a 96MB x 4 node ring on my laptop now just for my private l

Re: CLI set command returns null, ver 0.8.0

2011-06-08 Thread AJ
Thanks Aaron, I created a script and everything went OK. I think that the problem is when you try to update a CF. Below, I try to change the column comparator and it complains that the 'comparators do not match'. Can you enlighten me on what that means? There is no data in the CF at this

Is there a way from a running Cassandra node to determine whether or not itself is "up"?

2011-06-08 Thread Suan Aik Yeo
Is there a way (preferably an exposed method accessible through Thrift), from a running Cassandra node to determine whether or not itself is "up"? (Per Cassandra standards, I'm assuming based on the gossip protocol). Another way to think of what I'm looking for is basically running "nodetool ring"
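This excerpt does not say whether Thrift exposes a node's own gossip status. As a weaker, swapped-in liveness probe, a client can at least check that the node's Thrift port accepts TCP connections. This does not reflect the gossip view that nodetool ring reports. Port 9160 is assumed to be the configured rpc_port (the default in cassandra.yaml of this era), and the function name is hypothetical.

```python
import socket


def thrift_port_open(host: str = "127.0.0.1", port: int = 9160, timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections on host:port within `timeout` seconds.

    Only proves a listener is up; it says nothing about how other nodes see this one.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```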