Re: performance degradation in cluster

2011-02-03 Thread Peter Schuller
> First time I ran a single instance of Cassandra and my application on a system > (16GB RAM and 8 cores), the time taken was 480 sec. > When I added one more system (meaning this time I was running 2 instances > of Cassandra in a cluster) and ran the application from a single client, I > found time taken i

Re: Tracking down read latency

2011-02-03 Thread Peter Schuller
> $ iostat As rcoli already mentioned, you don't seem to have an I/O problem, but as a point of general recommendation: when determining whether you are blocking on disk I/O, pretty much *always* use "iostat -x" rather than the much less useful default mode of iostat. The %util and queue wait/avera

Re: Secondary indexes on super columns

2011-02-03 Thread Sébastien Druon
Thanks a lot for the info Sebastien On 2 February 2011 16:53, Jonathan Ellis wrote: > On Wed, Feb 2, 2011 at 7:37 AM, Sébastien Druon > wrote: > > Hi! > > I would like to know if secondary indexes are foreseen for super columns > / > > columns inside of super columns? > > No. > > > If yes, wil

Re: Counters in 0.8 -- conditional?

2011-02-03 Thread Sylvain Lebresne
> > Thanks. Yes I know it's by no means trivial. I thought in case there was an > index on the column on which I want to place condition, the index machinery > itself can do the counting (i.e. when the index is updated, the counter is > incremented). It doesn't seem too orthogonal to the current im

Re: performance degradation in cluster

2011-02-03 Thread abhinav prakash rai
Hi Peter, Thanks for your reply. Our application is multi-threaded. We are using an 8-core machine. In our application we are using 4 column families, one of which contains rows that are huge relative to the rows in the other column families. In the ring the balance i

Re: Does Consistency QUORUM broken on cassandra 0.7.0 and 0.6.11

2011-02-03 Thread aaron morton
The affected versions are listed as 0.6.10 and 0.7.1; it affects get_range_slice at quorum. https://issues.apache.org/jira/browse/CASSANDRA-2094 impacts 0.7.1 and will break QUORUM reads where RF > 3 for get_slice(). AFAIK it's not in 0.7, and 0.7.1 is not released yet. Aaron On 3/02/201

Re: Slow network writes

2011-02-03 Thread aaron morton
It's in the src distro http://cassandra.apache.org/download/ Aaron On 3/02/2011, at 12:27 PM, buddhasystem wrote: > > Never mind, I found it in SVN... > (not in gz) > > Thanks. > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-net

Re: Schema Design Question : Supercolumn family or just a Standard column family with columns containing serialized aggregate data?

2011-02-03 Thread Aditya Narayan
Thanks Tyler! On Thu, Feb 3, 2011 at 12:06 PM, Tyler Hobbs wrote: > On Wed, Feb 2, 2011 at 3:27 PM, Aditya Narayan wrote: >> >> Can I have some more feedback about my schema, perhaps somewhat more >> critical/harsh? > > It sounds reasonable to me. > > Since you're writing/reading all of the

Re: performance degradation in cluster

2011-02-03 Thread aaron morton
This page has a guide to setting the initial tokens for the nodes http://wiki.apache.org/cassandra/Operations#Ring_management You can also use the bin/nodetool cfstats command or JConsole to check the maximum row size in each node, to see if you have a monster row. Aaron On 3/02/2011, at 10:22
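The Operations page linked above describes spacing initial tokens evenly around the ring. A minimal sketch, assuming the RandomPartitioner (token space 0 to 2**127); `initial_tokens` is a hypothetical helper name:

```python
from typing import List


def initial_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial tokens, assuming RandomPartitioner (range 0..2**127)."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    # Node i gets i * (2**127 / N); integer arithmetic avoids float rounding.
    return [i * (2**127) // node_count for i in range(node_count)]


if __name__ == "__main__":
    for i, token in enumerate(initial_tokens(2)):
        print(f"node {i}: initial_token = {token}")
```

Each value would go into the corresponding node's initial_token setting; on a ring that is already running unbalanced, nodetool move can shift a node to its target token.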

Sorting in time order without using TimeUUID type column names

2011-02-03 Thread Aditya Narayan
Hey all, I want to store some columns that are reminders to the users of my application, in time-sorted order in a row (the timeline row of the user). Would it be recommended to store these reminder columns in the timeline row with column names like: combination of timestamp (of time when the reminder
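One way to get time-ordered, unique column names without TimeUUIDType is to pack the timestamp as a fixed-width big-endian integer followed by the user id, and rely on byte-wise ordering (a BytesType comparator is assumed). A sketch; the helper names and the 8-byte millisecond layout are assumptions, not something settled in the thread:

```python
import struct


def reminder_column_name(timestamp_ms: int, user_id: int) -> bytes:
    """Column name that sorts by time under byte-wise (BytesType) comparison.

    Assumed layout: 8-byte big-endian unsigned timestamp in milliseconds,
    then an 8-byte big-endian user id so two reminders with the same
    timestamp still get distinct names.
    """
    return struct.pack(">QQ", timestamp_ms, user_id)


def decode_column_name(name: bytes) -> tuple:
    """Inverse of reminder_column_name: returns (timestamp_ms, user_id)."""
    return struct.unpack(">QQ", name)


if __name__ == "__main__":
    names = [
        reminder_column_name(1296732000000, 7),
        reminder_column_name(1296731000000, 42),
        reminder_column_name(1296732000000, 3),
    ]
    # Byte-wise sorting gives chronological order; the user id breaks ties.
    for n in sorted(names):
        print(decode_column_name(n))
```

Because the timestamp prefix is fixed-width, a variable-length (e.g. string) user id could be appended instead without disturbing the time ordering.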

Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Is there any advantage to using supercolumns (columnFamilyName[superColumnName[columnName[val]]]) instead of regular columns with concatenated keys (columnFamilyName[superColumnName@columnName[val]])? When I designed my data model, I used supercolumns wherever I needed two levels of key depth - j
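For comparison, the flat alternative from the question can be sketched as follows. With a UTF8 or bytes comparator, every name sharing the prefix `superColumnName@` sorts contiguously, so one former supercolumn maps to a single column slice. The `@` separator comes from the question; the helper names and the in-memory sorted list standing in for a row are hypothetical:

```python
import bisect

SEP = "@"


def composite_name(super_name: str, column_name: str) -> str:
    """Flatten superColumnName[columnName] into one column name."""
    if SEP in super_name:
        raise ValueError("super column name must not contain the separator")
    return f"{super_name}{SEP}{column_name}"


def slice_bounds(super_name: str):
    """Start/finish names covering every column of one former supercolumn.

    Names starting with 'super_name@' sort at or after 'super_name@' and
    before 'super_name' + the character following '@' (which is 'A').
    """
    return super_name + SEP, super_name + chr(ord(SEP) + 1)


def get_slice(sorted_names, super_name):
    """Emulate a column slice over an in-memory sorted row (illustration only)."""
    start, finish = slice_bounds(super_name)
    lo = bisect.bisect_left(sorted_names, start)
    hi = bisect.bisect_left(sorted_names, finish)
    return sorted_names[lo:hi]


if __name__ == "__main__":
    row = sorted(composite_name(s, c) for s, c in [
        ("alice", "email"), ("alice", "phone"), ("al", "x"), ("bob", "email"),
    ])
    print(get_slice(row, "alice"))  # ['alice@email', 'alice@phone']
```

The main cost of this design is that the separator must never appear in the first-level name; the check in `composite_name` enforces that.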

Re: Do supercolumns have a purpose?

2011-02-03 Thread Sylvain Lebresne
> Is there any advantage to using supercolumns > (columnFamilyName[superColumnName[columnName[val]]]) instead of regular > columns with concatenated keys > (columnFamilyName[superColumnName@columnName[val]])? > > When I designed my data model, I used supercolumns wherever I needed two > levels of k

Re: Sorting in time order without using TimeUUID type column names

2011-02-03 Thread Sylvain Lebresne
On Thu, Feb 3, 2011 at 11:27 AM, Aditya Narayan wrote: > Hey all, > > I want to store some columns that are reminders to the users on my > application, in time sorted order in a row(timeline row of the user). > > Would it be recommended to store these reminder columns in the > timeline row with c

Re: Sorting in time order without using TimeUUID type column names

2011-02-03 Thread Aditya Narayan
If I use : : : as key pattern for the rows of reminders, then I am storing the key, just as it is, as the column name and thus column values need not contain a link to the row containing the reminder details. I think UserId would be required along with timestamp in the key pattern to provide un

Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Thanks Sylvain! Can I vote for internally implementing supercolumn families as regular column families? (With a smooth upgrade process that doesn't require shutting down a live cluster.) What if supercolumn families were supported as regular column families + an index (on what used to be supercol

Using Cassandra to store files

2011-02-03 Thread Brendan Poole
Hi, Would anyone recommend using Cassandra for storing hundreds of thousands of documents in Word/PDF format? The manual says it can store documents under 64MB with no issue, but I was wondering if anyone is using it for this specific purpose. Would it be efficient/reliable and is there anything I

Re: Sorting in time order without using TimeUUID type column names

2011-02-03 Thread Aditya Narayan
If I use : | | as key pattern for the rows of reminders, then I am storing the key, just as it is, as the column name and thus column values need not contain a link to the row containing the reminder details. I think UserId would be required along with timestamp in the key pattern to provide un

Re: Do supercolumns have a purpose?

2011-02-03 Thread Sylvain Lebresne
On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn wrote: > Thanks Sylvain! > > Can I vote for internally implementing supercolumn families as regular > column families? (With a smooth upgrade process that doesn't require > shutting down a live cluster.) > I forgot to add that I don't know if this

Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
The advantage would be to enable secondary indexes on supercolumn families. I understand from this thread that indexes on supercolumn families are not going to be: http://www.mail-archive.com/user@cassandra.apache.org/msg09527.html Which, it seems to me, effectively deprecates supercolumn famil

cassandra 0.6.11 binary package problem

2011-02-03 Thread Jean-Yves LEBLEU
Hi all, Just for info, in apache-cassandra-0.6.11-bin.tar.gz there are both apache-cassandra-0.6.10.jar and apache-cassandra-0.6.11.jar in the lib directory. This causes trouble for my upgrade scripts, which use this file to get the installed version and check whether an upgrade is needed. :( Thanks for the g

Re: 0.7.0 mx4j, get attribute

2011-02-03 Thread Chris Burroughs
On 02/02/2011 01:41 PM, Ryan King wrote: > On Wed, Feb 2, 2011 at 10:40 AM, Chris Burroughs > wrote: >> I'm using 0.7.0 and experimenting with the new mx4j support. >> >> http://host:port/mbean?objectname=org.apache.cassandra.request%3Atype%3DReadStage >> >> Returns a nice pretty html page. For p

Re: cassandra 0.6.11 binary package problem

2011-02-03 Thread Jonathan Ellis
Well, that's odd. :) Do any of the other tar.gz balls contain multiple jars? On Thu, Feb 3, 2011 at 6:06 AM, Jean-Yves LEBLEU wrote: > Hi all, > > Just for info, in apache-cassandra-0.6.11-bin.tar.gz there are both > apache-cassandra-0.6.10.jar and apache-cassandra-0.6.11.jar in the > lib direc

Re: cassandra 0.6.11 binary package problem

2011-02-03 Thread Jean-Yves LEBLEU
Don't know; I only checked http://www.apache.org/dyn/closer.cgi?path=/cassandra/0.6.11/apache-cassandra-0.6.11-bin.tar.gz Rgds. JY On Thu, Feb 3, 2011 at 3:36 PM, Jonathan Ellis wrote: > Well, that's odd. :) > > Do any of the other tar.gz balls contain multiple jars? > > On Thu, Feb 3, 2011 at 6:0

Re: Do supercolumns have a purpose?

2011-02-03 Thread Sylvain Lebresne
On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote: > The advantage would be to enable secondary indexes on supercolumn families. > Then I suggest opening a ticket for adding secondary indexes to supercolumn families and voting on it. This will be 1 or 2 orders of magnitude less work than gett

Re: Do supercolumns have a purpose?

2011-02-03 Thread Jonathan Ellis
On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote: > On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote: >> >> The advantage would be to enable secondary indexes on supercolumn >> families. > > Then I suggest opening a ticket for adding secondary indexes to supercolumn > families and voti

Re: Using Cassandra to store files

2011-02-03 Thread buddhasystem
CouchDB

Re: 0.7.0 mx4j, get attribute

2011-02-03 Thread Jonathan Ellis
No idea. Is there an mx4j list you could try maybe? :) On Wed, Feb 2, 2011 at 10:40 AM, Chris Burroughs wrote: > I'm using 0.7.0 and experimenting with the new mx4j support. > > http://host:port/mbean?objectname=org.apache.cassandra.request%3Atype%3DReadStage > > Returns a nice pretty html page.

Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Well, I am an "actual active developer" and I have "managed to do pretty nice stuffs with Cassandra" - without secondary indexes so far. But I'm looking forward to having secondary indexes in my arsenal when new functional requirements come up, and I'm bummed out that my early design decision to us

Re: Using Cassandra to store files

2011-02-03 Thread Dan Kuebrich
> > > CouchDB > That's not what document-oriented means! (har har) I don't know all the details of your case, but with serving static files I suspect you could do ok with something that has a much smaller memory/cpu footprint as you won't have as great of write throughput / read latency concerns.

Re: Using Cassandra to store files

2011-02-03 Thread Victor Kabdebon
Dear Brendan, I would really be interested in your findings too. I need a system to store various documents; I am thinking of Cassandra (which I am already using), a second type of database, or some other system. Maybe, as Dan suggested, using mogilefs. Thank you, Victor Kabdebon http://www

Mitigating CASSANDRA-2059 -- leftover files

2011-02-03 Thread Omer van der Horst Jansen
Jonathan pointed out in another thread that it looks like I'm running into CASSANDRA-2059, where secondary files are not being properly deleted. My production data set at any given time is less than 100 MB in size, but the Cassandra data directories on each instance are using 30 to 40 times as much

Re: Mitigating CASSANDRA-2059 -- leftover files

2011-02-03 Thread Jonathan Ellis
On Thu, Feb 3, 2011 at 7:45 AM, Omer van der Horst Jansen wrote: > In the meantime, is it safe to manually delete stale files while > Cassandra is running? And how do I determine when a set of files is > stale? > > I'd assume that a given set of files is deletable if there is no > -Data.db file a
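The rule of thumb in the question (a set of files with no -Data.db file is stale) is Omer's own assumption, and the reply is cut off here, so treat it as unconfirmed. The sketch below only lists such groups for manual review and deletes nothing; the `<prefix>-<Component>.db` filename pattern (e.g. a hypothetical `Standard1-e-12-Index.db`) is an assumption about the on-disk layout:

```python
import os
from collections import defaultdict


def group_components(filenames):
    """Group SSTable component files by their shared prefix.

    Assumes names look like '<cf>-<version>-<generation>-<Component>.db'.
    """
    groups = defaultdict(set)
    for name in filenames:
        if not name.endswith(".db") or "-" not in name:
            continue  # ignore anything that does not match the assumed pattern
        prefix, component = name[: -len(".db")].rsplit("-", 1)
        groups[prefix].add(component)
    return groups


def groups_without_data(filenames):
    """Prefixes that have component files but no Data component."""
    return sorted(p for p, comps in group_components(filenames).items()
                  if "Data" not in comps)


def report(data_dir):
    """Print candidate stale groups in data_dir for manual review; deletes nothing."""
    for prefix in groups_without_data(os.listdir(data_dir)):
        print("no -Data.db for:", prefix)
```

On a running node, files belonging to an SSTable that is still being written could also show up, so review the list before removing anything.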

Re: 0.7.0 mx4j, get attribute

2011-02-03 Thread Ran Tavory
Try adding this to the end of the URL: ?template=identity On Thu, Feb 3, 2011 at 4:23 PM, Chris Burroughs wrote: > On 02/02/2011 01:41 PM, Ryan King wrote: > > On Wed, Feb 2, 2011 at 10:40 AM, Chris Burroughs > > wrote: > >> I'm using 0.7.0 and experimenting with the new mx4j support. > >> > >>

Re: 0.7.0 mx4j, get attribute

2011-02-03 Thread Chris Burroughs
On 02/03/2011 11:29 AM, Ran Tavory wrote: > Try adding this to the end of the URL: ?template=identity > That works, thanks!
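In standard query-string syntax, a second parameter on a URL that already carries `objectname` is joined with `&`. A small sketch that builds the request URL with the standard library; the host and port are placeholders, not values from the thread:

```python
from urllib.parse import urlencode


def mx4j_attribute_url(host: str, port: int, object_name: str) -> str:
    """mx4j MBean URL asking for raw XML (template=identity) instead of HTML."""
    query = urlencode({"objectname": object_name, "template": "identity"})
    return f"http://{host}:{port}/mbean?{query}"


if __name__ == "__main__":
    # Placeholder host/port: use whatever address your mx4j listener is bound to.
    print(mx4j_attribute_url("localhost", 8081,
                             "org.apache.cassandra.request:type=ReadStage"))
```

`urlencode` percent-encodes `:` and `=` in the object name, producing the same `%3Atype%3D` form as the original URL.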

Re: unsubscribe

2011-02-03 Thread Eric Evans
On Wed, 2011-02-02 at 21:04 +0200, Janne Jalkanen wrote: > How about adding an autosignature with unsubscription info? I might be overly cynical, but I'd wager that would serve no purpose other than the comical value of seeing it appended to these unsubscribe messages. > /Janne > > On Feb 2, 201

Re: rolling window of data

2011-02-03 Thread Peter Schuller
> The correct way to accomplish what you describe is the new (in 0.7) > per-column TTL.  Simply set this to 60 * 60 * 24 * 90 (90 days' worth of > seconds) and your columns will magically disappear after that length of > time. Although that assumes it's okay to lose data or that there is some oth
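The TTL arithmetic from the quoted suggestion, as a small sketch; the constant and helper names are hypothetical, and the TTL itself would be passed per column at write time by whichever client you use:

```python
from datetime import timedelta

# 90 days expressed in seconds, i.e. 60 * 60 * 24 * 90
RETENTION_TTL_SECONDS = int(timedelta(days=90).total_seconds())


def expires_at(write_time_s: int, ttl_s: int = RETENTION_TTL_SECONDS) -> int:
    """Epoch second after which a column written at write_time_s is no longer returned."""
    return write_time_s + ttl_s


print(RETENTION_TTL_SECONDS)  # 7776000
```

Expired columns stop being returned at that point, but the disk space is reclaimed only later, by compaction.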

Re: Tracking down read latency

2011-02-03 Thread sridhar basam
The data provided is also an average value since boot time. Run with -x as suggested below, but with an interval of around 5 seconds. You could very well be having an I/O issue; it is hard to tell from the overall average value you provided. Collect "iostat -x 5" during the times when you see slow r

Re: Using Cassandra to store files

2011-02-03 Thread Daniel Doubleday
Hundreds of thousands doesn't sound too bad. Good old NFS would do with an ok directory structure. We are doing this. Our documents are pretty small though (a few kb). We have around 40M right now with around 300GB total. Generally the problem is that much data usually means that cassandra beco
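If larger documents are stored in Cassandra, one common approach is to split each file into fixed-size chunks stored as separate columns under one row key, with zero-padded chunk numbers as column names so they sort in order. A sketch of the split/reassemble logic only; the 1 MB chunk size and the `chunk-NNNNNNNN` naming scheme are assumptions, and no client API is shown:

```python
CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MB); tune for your workload


def to_columns(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Split a file's bytes into {column_name: chunk} with sortable names."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        f"chunk-{i // chunk_size:08d}": data[i:i + chunk_size]
        for i in range(0, len(data), chunk_size)
    }


def from_columns(columns: dict) -> bytes:
    """Reassemble chunks; zero-padded names make lexical order match numeric order."""
    return b"".join(columns[name] for name in sorted(columns))
```

Chunking keeps each column well below the size limits mentioned in the question and lets a reader fetch a file piece by piece instead of in one large read.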

Re: Do supercolumns have a purpose?

2011-02-03 Thread Ryan King
On Thu, Feb 3, 2011 at 6:49 AM, Jonathan Ellis wrote: > On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote: >> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote: >>> >>> The advantage would be to enable secondary indexes on supercolumn >>> families. >> >> Then I suggest opening a ticket

Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-03 Thread Aklin_81
Hi all, To generate new keys/ UserIds for new users on my application, I am thinking of using a simple synchronized counter that can keep track of the no. of users registered on my application and when a new user signs up, he can be allotted the next available id. Since Cassandra is eventually co

Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-03 Thread Matthew E. Kennedy
Unless you need your user identifiers to be sequential for some reason, I would save yourself the headache of this kind of complexity and just use UUIDs if you have to generate an identifier. On Feb 3, 2011, at 2:03 PM, Aklin_81 wrote: > Hi all, > To generate new keys/ UserIds for new users on

Re: performance degradation in cluster

2011-02-03 Thread Nick Santini
Are you using virtual machines to run Cassandra? I've found that performance in VMs is crap. Nicolas Santini On Thu, Feb 3, 2011 at 11:17 PM, aaron morton wrote: > This page has a guide to setting the initial tokens for the nodes > http://wiki.apache.org/cassandra/Operations#Ring_management > > <

Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-03 Thread Ryan King
You could also consider snowflake: http://github.com/twitter/snowflake which gives you ids that roughly sort by time (but aren't sequential). -ryan On Thu, Feb 3, 2011 at 11:13 AM, Matthew E. Kennedy wrote: > Unless you need your user identifiers to be sequential for some reason, I > would sa
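A rough, single-process sketch of a Snowflake-style generator (not Twitter's code): a millisecond timestamp in the high bits, then a worker id, then a per-millisecond sequence, so ids roughly sort by time without a cluster-wide counter. The 41/10/12 bit split and the custom epoch are assumed here, following the layout described by the Snowflake project; the class name is hypothetical:

```python
import threading
import time

EPOCH_MS = 1288834974657  # custom epoch (assumed); any fixed past instant works
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class IdGenerator:
    """Time-ordered 63-bit ids laid out as timestamp | worker id | sequence."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = _now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate id")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = _now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)
```

Ids are unique across machines only if every generator process is given a distinct worker id.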

for counters: does read have to be ALL ?

2011-02-03 Thread Yang
The PDF design doc https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf does say so, on page 2: "- strongly consistent read: requires consistency level ALL. (QUORUM is insufficient.) " but the wiki http://wiki.apache.org/cassandra/Counters gave a code exa

Re: for counters: does read have to be ALL ?

2011-02-03 Thread Anthony John
From the architecture section of the wiki. And it makes sense! More specifically: R = read replica count, W = write replica count, N = replication factor, Q = QUORUM (Q = N / 2 + 1). If W + R > N, you will have consistency: W=1, R=N; W=N, R=1; W=Q, R=Q where Q = N / 2 + 1. On Thu, Feb 3,
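The overlap rule can be checked mechanically. A sketch that computes the quorum size with integer division and tests whether read and write replica sets must intersect. Note that the counters design document cited in the original question states a stricter requirement (ALL) for strongly consistent counter reads, which this generic rule does not capture; the function names are hypothetical:

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def overlapping(write_replicas: int, read_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (W + R > N)."""
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    for w, r in [(1, n), (n, 1), (q, q), (1, 1)]:
        print(f"N={n} W={w} R={r} -> overlap={overlapping(w, r, n)}")
```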

RE: rolling window of data

2011-02-03 Thread Jeffrey Wang
Thanks for the response, but unfortunately a TTL is not enough for us. We would like to be able to dynamically control the window in case there is an unusually large amount of data or something so we don't run out of disk space. One question I have in particular is: if I use the timestamp of my

Re: Do supercolumns have a purpose?

2011-02-03 Thread Mike Malone
On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote: > On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote: > >> The advantage would be to enable secondary indexes on supercolumn >> families. >> > > Then I suggest opening a ticket for adding secondary indexes to supercolumn > families and vo

RE: rolling window of data

2011-02-03 Thread Jeffrey Wang
To be a little more clear, a simplified version of what I'm asking is: Let's say you add 1K columns with timestamps 1 to 1000. Then, at an arbitrarily distant point in the future, if you call remove on that CF with timestamp 500 (so the timestamps are logically out of order), will it delete exac

Problems with Python Stress Test

2011-02-03 Thread Sameer Farooqui
Hi guys, I was playing around with the stress.py test this week and noticed a few things. 1) Progress-interval does not always work correctly. I set it to 5 in the example below, but am instead getting varying intervals: techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ pytho

Re: Problems with Python Stress Test

2011-02-03 Thread Brandon Williams
On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui wrote: > Hi guys, > > I was playing around with the stress.py test this week and noticed a few > things. > > 1) Progress-interval does not always work correctly. I set it to 5 in the > example below, but am instead getting varying intervals: > Gener

Re: Do supercolumns have a purpose?

2011-02-03 Thread Jonathan Ellis
On Thu, Feb 3, 2011 at 3:35 PM, Mike Malone wrote: > It seems to me that super columns are a historical artifact from Cassandra's > early life as Facebook's inbox storage system. They needed posting lists of > messages, sharded by user. So that's what they built. In my dealings with > the Cassandr

Re: rolling window of data

2011-02-03 Thread Jonathan Ellis
On Thu, Feb 3, 2011 at 3:59 PM, Jeffrey Wang wrote: > To be a little more clear, a simplified version of what I'm asking is: > > Let's say you add 1K columns with timestamps 1 to 1000. Then, at an > arbitrarily distant point in the future, if you call remove on that CF with > timestamp 500 (so t
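A simplified model of the scenario in the question, assuming the usual timestamp rule for a row-level delete: a tombstone written with timestamp T shadows every column whose timestamp is less than or equal to T, regardless of the order in which the writes arrived. It ignores gc_grace and compaction, and the function name is hypothetical:

```python
def visible_after_row_delete(columns: dict, delete_timestamp: int) -> dict:
    """Columns that survive a row-level delete issued with delete_timestamp.

    `columns` maps column name -> write timestamp. In this simplified model a
    column is shadowed when its timestamp <= delete_timestamp.
    """
    return {name: ts for name, ts in columns.items() if ts > delete_timestamp}


if __name__ == "__main__":
    row = {f"col{ts}": ts for ts in range(1, 1001)}  # 1K columns, timestamps 1..1000
    survivors = visible_after_row_delete(row, 500)
    print(len(survivors), min(survivors.values()))  # 500 501
```

Under the same model, a column written later but with a timestamp at or below 500 would also stay hidden until the tombstone is purged.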

Re: Slow network writes

2011-02-03 Thread Keith Tanaka
Unsubscribe, please. On Feb 2, 2011, at 4:27 PM, buddhasystem wrote: > > Never mind, I found it in SVN... > (not in gz) > > Thanks. > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-network-writes-tp5985757p5986949.html > Sent fro

Unavailable Exception

2011-02-03 Thread ruslan usifov
Hello, Why can I get an UnavailableException on a live cluster (all nodes are up and have never been shut down)? PS: v 0.7.0

Re: Slow network writes

2011-02-03 Thread buddhasystem
Dude, are you asking me to unsubscribe?

Re: Slow network writes

2011-02-03 Thread Roshan Dawrani
I think that was originally a voice command - for whoever happened to hear it first :-) On Fri, Feb 4, 2011 at 9:57 AM, buddhasystem wrote: > > Dude, are you asking me to unsubscribe? > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Slow-n

Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-03 Thread Aklin_81
Thanks Matthew & Ryan, The main inspiration behind me trying to generate Ids in sequential manner is to reduce the size of the userId, since I am using it for heavy denormalization. UUIDs are 16 bytes long, but I can also have a unique Id in just 4 bytes, and since this is just a one time process
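The size difference in concrete terms: a UUID serializes to 16 bytes, while an unsigned 32-bit id packs into 4 bytes and tops out at 2**32 - 1 (about 4.29 billion) users. A sketch using only the standard library; `pack_user_id` is a hypothetical helper, and the big-endian `>I` encoding is an assumption chosen so packed ids also sort numerically as bytes:

```python
import struct
import uuid

MAX_U32 = 2**32 - 1


def pack_user_id(user_id: int) -> bytes:
    """Encode a sequential user id as 4 big-endian bytes."""
    if not 0 <= user_id <= MAX_U32:
        raise ValueError("user id does not fit in 32 bits")
    return struct.pack(">I", user_id)


if __name__ == "__main__":
    print(len(uuid.uuid4().bytes))    # 16
    print(len(pack_user_id(123456)))  # 4
    print(MAX_U32)                    # 4294967295
```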

Re: performance degradation in cluster

2011-02-03 Thread Arijit Mukherjee
Hi, I'll explain a bit. I'm working with Abhinav. We have an application that was earlier based on Lucene; it would index a huge volume of data, and later use the indices to fetch data and perform a fuzzy matching operation. We wanted to use Cassandra primarily because of the sharding/availabilit