If a user has millions of followers, is there millions of iterate? (ref Twissandra)

2010-04-15 Thread Allen He
Hello folks, When Twissandra (Twitter clone example for Cassandra) posts a tweet, it iterates over all of the followers to insert a tweet_id into their timelines (see highlight): def save_tweet(tweet_id, user_id, tweet): """ Saves the tweet record. """ # Generate a

Re: Time-series data model

2010-04-15 Thread Ilya Maykov
Hi Jean-Pierre, I'm investigating using Cassandra for a very similar use case, maybe we can chat and compare notes sometime. But basically, I think you want to pull the metric name into the row key and use simple CF instead of SCF. So, your example: "my_server_1": { "cpu_usage": {
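A minimal sketch of the layout suggested here, with a plain dict standing in for a simple (non-super) column family. The ":" separator, the function names, and the sample values are assumptions for illustration and do not come from the thread.

```python
from collections import defaultdict

# In-memory stand-in for a simple column family:
# row key -> {column name (timestamp) -> value}
metrics_cf = defaultdict(dict)


def row_key(device, metric):
    """Combine device and metric name into one row key (separator is an assumption)."""
    return f"{device}:{metric}"


def record(device, metric, timestamp, value):
    """Store one sample as a column named by its timestamp."""
    metrics_cf[row_key(device, metric)][timestamp] = value


def last_value(device, metric):
    """Return the value with the highest timestamp for one metric of one device."""
    row = metrics_cf[row_key(device, metric)]
    return row[max(row)]


record("my_server_1", "cpu_usage", 1271300000, 0.25)
record("my_server_1", "cpu_usage", 1271300060, 0.50)
```

With the metric name in the row key, each metric of a device gets its own row, so "last value of a metric" only touches one row.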

Row key: string or binary (byte[])?

2010-04-15 Thread Roland Hänel
Is there any effort ongoing to make the row key a binary (byte[]) instead of a string? In the current cassandra.thrift file (0.6.0), I find: const string VERSION = "2.1.0" [...] struct KeySlice { 1: required *string* key, 2: required list columns, } while on the current (?) SVN https://sv

Re: If a user has millions of followers, is there millions of iterate? (ref Twissandra)

2010-04-15 Thread gabriele renzi
On Thu, Apr 15, 2010 at 9:56 AM, Allen He wrote: > Hello folks, > > When Twissandra (Twitter clone example for Cassandra) post a tweet, it > iterate all of the followers to insert a tweet_id to their time lines(see > for follower_id in follower_ids: > TIMELINE.insert(str(follower_id)
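The quoted loop is a fan-out on write: posting costs one timeline insert per follower. A minimal sketch of that pattern, with plain dicts standing in for the TIMELINE column family; the sample data and the returned insert count are assumptions, and this is not Twissandra's actual code.

```python
from collections import defaultdict

# Stand-ins for Cassandra data; plain Python containers only.
timelines = defaultdict(list)            # user_id -> list of (timestamp, tweet_id)
followers = {"alice": ["bob", "carol"]}  # user_id -> follower ids (sample data)


def save_tweet(tweet_id, user_id, timestamp):
    """Fan out on write: append the tweet id to every follower's timeline.

    Returns the number of inserts, which grows linearly with the follower count.
    """
    inserts = 0
    for follower_id in followers.get(user_id, []):
        timelines[str(follower_id)].append((timestamp, tweet_id))
        inserts += 1
    return inserts
```

A user with millions of followers would therefore trigger millions of inserts per tweet under this scheme, which is exactly the concern raised in the thread.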

AssertionError: DecoratedKey(...) != DecoratedKey(...)

2010-04-15 Thread Ran Tavory
When restarting one of the nodes in my cluster I found this error in the log. What does this mean? INFO [GC inspection] 2010-04-15 05:03:04,898 GCInspector.java (line 110) GC for ConcurrentMarkSweep: 712 ms, 11149016 reclaimed leaving 442336680 used; max is 4432068608 ERROR [HINTED-HANDOFF-POOL:1

Re: Time-series data model

2010-04-15 Thread Jean-Pierre Bergamin
On 14.04.2010 15:22, Ted Zlatanov wrote: On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin" wrote: JB> The metrics are stored together with a timestamp. The queries we want to JB> perform are: JB> * The last value of a specific metric of a device JB> * The values of a specific m

inserting rows in columns inside a supercolumn

2010-04-15 Thread Julio Carlos Barrera Juez
Hi all, I'm working with Cassandra 0.5 and Thrift API. I have a simple doubt: I want to insert a row in columns inside a supercolumn, like this (without timestamps): SuperColumnNameA ==> keyA valueA ==> columnB ==> key1 value1 ==> key2 value2 ==> key3 value3

Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)

2010-04-15 Thread Gary Dusbabek
Ran, It looks like you're seeing https://issues.apache.org/jira/browse/CASSANDRA-866. It's fixed in 0.6.1. Gary On Thu, Apr 15, 2010 at 04:06, Ran Tavory wrote: > When restarting one of the nodes in my cluster I found this error in the > log. What does this mean? > >  INFO [GC inspection] 2010

Re: Row key: string or binary (byte[])?

2010-04-15 Thread Gary Dusbabek
2010/4/15 Roland Hänel : > Is there any effort ongoing to make the row key a binary (byte[]) instead of > a string? Yes. It went into trunk last night. Please see https://issues.apache.org/jira/browse/CASSANDRA-767. Gary. > In the current cassandra.thrift file (0.6.0), I find: > > const string V

How to implement TOP TEN in Cassandra

2010-04-15 Thread Allen He
Hi all, how can I implement a *TOP TEN* in Cassandra? For example, *Top ten stories on Digg.com*. How should I model it? Thanks

Get super-columns using SimpleCassie

2010-04-15 Thread Yésica Rey
I'm using SimpleCassie as a Cassandra client. I have a question: can I get all the super-columns that are in one column family? If yes, how can I do it? Regards!

Re: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160

2010-04-15 Thread Jonathan Ellis
sounds like https://issues.apache.org/jira/browse/THRIFT-347 On Wed, Apr 14, 2010 at 11:58 PM, richard yao wrote: > I am having a try on cassandra, and I use php to access cassandra by thrift > API. > I got an error like this: >     TException:  Error: TSocket: timed out reading 1024 bytes from >

Re: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160

2010-04-15 Thread richard yao
Thank you!

Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)

2010-04-15 Thread Ran Tavory
yes, this looks like the same issue, thanks Gary. Other than seeing the errors in the log I haven't seen any other irregularities. (maybe there are, but they haven't surfaced). Does this assertion mean data corruption or something else that's worth waiting to 0.6.1 for? On Thu, Apr 15, 2010 at 2:

Re: AssertionError: DecoratedKey(...) != DecoratedKey(...)

2010-04-15 Thread Gary Dusbabek
No data corruption. There was a bug in the way that the index was scanned that was manifesting itself when the index got bigger than 2GB. Gary. On Thu, Apr 15, 2010 at 08:03, Ran Tavory wrote: > yes, this looks like the same issue, thanks Gary. > Other than seeing the errors in the log I

Re: Time-series data model

2010-04-15 Thread Ted Zlatanov
On Thu, 15 Apr 2010 11:27:47 +0200 Jean-Pierre Bergamin wrote: JB> On 14.04.2010 15:22, Ted Zlatanov wrote: >> On Wed, 14 Apr 2010 15:02:29 +0200 "Jean-Pierre Bergamin" >> wrote: >> JB> The metrics are stored together with a timestamp. The queries we want to JB> perform are: JB> * The last

Re: SuperColumns

2010-04-15 Thread Ted Zlatanov
On Wed, 14 Apr 2010 23:34:52 -0700 Vijay wrote: V> On Wed, Apr 14, 2010 at 10:28 PM, Christian Torres wrote: >> I'm defining a ColumnFamily (Table) type Super, It's posible to have a >> SuperColumn inside another SuperColumn or SuperColumns can only have normal >> columns? V> Yes a super colum

Re: SuperColumns

2010-04-15 Thread Christian Torres
Ok, thanks both 2010/4/15 Ted Zlatanov > On Wed, 14 Apr 2010 23:34:52 -0700 Vijay wrote: > > V> On Wed, Apr 14, 2010 at 10:28 PM, Christian Torres >wrote: > > >> I'm defining a ColumnFamily (Table) type Super, It's posible to have a > >> SuperColumn inside another SuperColumn or SuperColumns c

Re: How to implement TOP TEN in Cassandra

2010-04-15 Thread Pablo Viojo
Your question is too general to give you an appropriate answer. Can you elaborate a little more? Regards, Pablo Viojo Project lead on data storage et al. http://www.needish.com | pa...@needish.com http://twitter.com/tiopaul (@tiopaul) | LinkedIn profile: http://cl.linkedin.com/in/pviojo On T

Re: How to implement TOP TEN in Cassandra

2010-04-15 Thread Jesse McConnell
http://arin.me/code/wtf-is-a-supercolumn-cassandra-data-model if memory serves that article explains it cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Thu, Apr 15, 2010 at 09:36, Pablo Viojo wrote: > Your question is too general to give you an appropiate answer. Can you > elab

framed transport

2010-04-15 Thread Lee Parker
What is the benefit of moving to framed transport as opposed to buffered transport? Lee Parker l...@spredfast.com

RackAware and replication strategy

2010-04-15 Thread Ran Tavory
I'm reading this on this page http://wiki.apache.org/cassandra/ArchitectureInternals : AbstractReplicationStrategy controls what nodes get secondary, tertiary, > etc. replicas of each key range. Primary replica is always determined by the > token ring (in TokenMetadata) but you can do a lot of va

Re: framed transport

2010-04-15 Thread Eric Evans
On Thu, 2010-04-15 at 09:49 -0500, Lee Parker wrote: > What is the benefit of moving to framed transport as opposed to > buffered transport? The framed transport probably provides better (smarter) buffering, but it was added in Thrift to support asynchronous servers. In a perfect world, there wou
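For context on what "framed" means: to my knowledge, Thrift's framed transport prefixes each message with a 4-byte big-endian length, so a non-blocking server can tell when a complete message has arrived. A minimal sketch of that framing idea, not Thrift's own code; the function names are assumptions.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack("!i", len(payload)) + payload


def unframe(buffer: bytes):
    """Return (message, rest) if a whole frame is buffered, else (None, buffer)."""
    if len(buffer) < 4:
        return None, buffer
    (length,) = struct.unpack("!i", buffer[:4])
    if len(buffer) < 4 + length:
        return None, buffer  # frame not complete yet; keep buffering
    return buffer[4:4 + length], buffer[4 + length:]
```

Because the length travels with each message, client and server must agree on framing; a framed client talking to an unframed server (or the reverse) will misread the stream.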

Re: framed transport

2010-04-15 Thread Miguel Verde
On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans wrote: > But, if you've enabled framing on the server, you will not > be able to use C# clients (last I checked, there was no framed transport > for C#). There *are* many clients that don't have framed transports, but the C# client had it added in No

timestamp not found

2010-04-15 Thread Lee Parker
We are currently migrating about 70G of data from mysql to cassandra. I am occasionally getting the following error: Required field 'timestamp' was not found in serialized data! Struct: Column(name:74 65 78 74, value:44 61 73 20 6C 69 65 62 20 69 63 68 20 76 6F 6E 20 23 49 6E 61 3A 20 68 74 74 70

Re: inserting rows in columns inside a supercolumn

2010-04-15 Thread Miguel Verde
Just to nitpick your representation a little bit, columnB/etc... are supercolumnB/etc..., key1/etc... are column1/etc..., and you can probably omit valueA/valueD designations entirely, it would still be understood. Columns in Cassandra always have timestamps, you can't omit them. Can you post a s

Re: timestamp not found

2010-04-15 Thread Mike Malone
Looks like the timestamp, in this case, is 0. Does Cassandra allow zero timestamps? Could be a bug in Cassandra doing an implicit boolean coercion in a conditional where it shouldn't. Mike On Thu, Apr 15, 2010 at 8:39 AM, Lee Parker wrote: > We are currently migrating about 70G of data from mys

Re: timestamp not found

2010-04-15 Thread Lee Parker
When I am verifying the columns in the mutation map before sending it to cassandra, none of the timestamps are 0. I have had a difficult time recreating the error in a controlled environment so I can see the mutation map that was actually sent. Lee Parker l...@spredfast.com On

Re: timestamp not found

2010-04-15 Thread Jonathan Ellis
Looks like you are using C++ and not setting the "isset" flag on the timestamp field, so it's getting the default value for a Java long ("0"). If it works "most of the time" then possibly you are using a Thrift connection from multiple threads at the same time, which is not safe. On Thu, Apr 15,
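One common way to avoid sharing a connection between threads is to keep one connection per thread. A minimal sketch using thread-local storage; `FakeConnection` is a hypothetical stand-in for a real Thrift client, and the helper names are assumptions.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a Thrift client connection."""

    def __init__(self):
        self.owner = threading.get_ident()


_local = threading.local()


def get_connection():
    """Return a connection that belongs to the calling thread only."""
    if not hasattr(_local, "conn"):
        _local.conn = FakeConnection()
    return _local.conn


def collect_connections(n_threads):
    """Start n_threads workers and return the connection each one obtained."""
    seen = []
    lock = threading.Lock()

    def worker():
        conn = get_connection()
        with lock:
            seen.append(conn)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Repeated calls from the same thread reuse one connection, while different threads never see each other's.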

Re: timestamp not found

2010-04-15 Thread Lee Parker
I'm actually using PHP. I do have several PHP processes running, but each one should have its own Thrift connection. Lee Parker l...@spredfast.com On Thu, Apr 15, 2010 at 10:53 AM, Jonathan Ellis wrote: > Looks like you are using C++ and not setting the "isset" flag on the

Re: RackAware and replication strategy

2010-04-15 Thread Benjamin Black
Have a look at locator/DatacenterShardStrategy.java. On Thu, Apr 15, 2010 at 8:16 AM, Ran Tavory wrote: > I'm reading this on this > page http://wiki.apache.org/cassandra/ArchitectureInternals : >> >> AbstractReplicationStrategy controls what nodes get secondary, tertiary, >> etc. replicas of eac

busy thread on IncomingStreamReader ?

2010-04-15 Thread Ingram Chen
Hi all, We set up two nodes and simply set replication factor=2 for a test run. After both nodes, say, node A and node B, had been serving for several hours, we found that "node A" always stays at 300% CPU usage (the other node is under 100% CPU, which is normal). A thread dump on "node A" shows that there are 3 busy

Re: BMT flush on windows?

2010-04-15 Thread Sonny Heer
From jconsole, I go under ColumnFamilyStores->CF1->Column1->Operations and click force flush(). I'm getting an "Operation return value" null OK message box. What am I doing wrong? On Tue, Apr 13, 2010 at 3:12 PM, Jonathan Ellis wrote: > you have three options > > (a) connect with jconsol

Re: BMT flush on windows?

2010-04-15 Thread Jonathan Ellis
probably because there is nothing to flush. On Thu, Apr 15, 2010 at 11:53 AM, Sonny Heer wrote: > From the jconsole, I go under > ColumnFamilyStores->CF1->Column1->Operations and clicked force > flush(). > > I'm getting a "Operation return value" null OK message box.  what am I > doing wrong? > >

Re: Recovery from botched compaction

2010-04-15 Thread Jonathan Ellis
On Tue, Apr 13, 2010 at 3:59 PM, Anthony Molinaro wrote: > I actually got lucky and while it hovered in the 91-95% full, compaction > finished and its now at 60%.  However, I still have around a dozen or so > data files.  I thought 'nodeprobe compact' did a major compaction, and > that a major com

Re: batch_mutate silently failing

2010-04-15 Thread Jonathan Ellis
Could you create a ticket for us to return an error message in this situation? -Jonathan On Tue, Apr 13, 2010 at 4:24 PM, Lee Parker wrote: > nevermind. I figured out what the problem was. I was not putting the > column inside a ColumnOrSuperColumn container. > > > Lee Parker > l...@spredfast

Re: batch_mutate silently failing

2010-04-15 Thread Lee Parker
The entire thing was completely my own fault. I was making an invalid request and, somewhere in the code, I was catching the exception and not handling it at all. So it only appeared to be silent when in reality it was throwing a nice descriptive exception. Lee Parker l...@spredfast.com [image:

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-15 Thread Jonathan Ellis
You're right, to get those numbers on debian something is very wrong. Have you looked at http://spyced.blogspot.com/2010/01/linux-performance-basics.html ? What is the bottleneck on the linux machines? With the kind of speed you are seeing I wouldn't be surprised if it is swapping. -Jonathan On

Re: batch_mutate silently failing

2010-04-15 Thread Jonathan Ellis
Ah, I see. Glad you resolved that. :) On Thu, Apr 15, 2010 at 12:31 PM, Lee Parker wrote: > The entire thing was completely my own fault. I was making an invalid > request and, somewhere in the code, I was catching the exception and not > handling it at all. So it only appeared to be silent w

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-15 Thread Heath Oderman
Thanks Jonathan, I'll check this out right away. On Thu, Apr 15, 2010 at 1:32 PM, Jonathan Ellis wrote: > You're right, to get those numbers on debian something is very wrong. > > Have you looked at > http://spyced.blogspot.com/2010/01/linux-performance-basics.html ? > What is the bottleneck on

Re: server crash - how to investigate

2010-04-15 Thread Jonathan Ellis
There are a few things it could be: Out of memory: usually it can log the exception before dying, but not always. There will be a java_$pid.hprof file with the heap dumped. JVM crash: there will be an hs_err$pid.log file. OS bug or hardware problem: sometimes your OS will log something. -Jonathan On
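The file checks described here can be scripted. A minimal sketch that looks for heap dumps and JVM crash logs in one directory; the glob patterns are loose approximations of the file names mentioned in the message, and which directory to search is an assumption (typically the JVM's working directory).

```python
import glob
import os


def find_crash_artifacts(directory):
    """Return heap dumps and JVM crash logs found directly in `directory`.

    The patterns are deliberately loose so they match the names given above
    (java_$pid.hprof, hs_err$pid.log) and close variants.
    """
    patterns = {
        "heap_dump": "java_*.hprof",    # written on OutOfMemoryError
        "jvm_crash_log": "hs_err*.log",  # written when the JVM itself crashes
    }
    return {
        kind: sorted(glob.glob(os.path.join(directory, pattern)))
        for kind, pattern in patterns.items()
    }
```

If both lists come back empty, the remaining suspects from the message are the OS or hardware, so the system logs are the next place to look.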

Re: Reading thousands of columns

2010-04-15 Thread Jonathan Ellis
How long to read just 10 columns? On Wed, Apr 14, 2010 at 3:19 PM, James Golick wrote: > The values are empty. It's 3000 UUIDs. > > On Wed, Apr 14, 2010 at 12:40 PM, Avinash Lakshman > wrote: >> >> How large are the values? How much data on disk? >> >> On Wednesday, April 14, 2010, James Golick

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-15 Thread Heath Oderman
So checking it out quickly: vmstat - Never swaps. si and so stay at 0 during the load. iostat -x the %util never climbs above 0.00, but the avgrq-sz jumps between samples from 0 - 30 - 90 - 0 (5 second intervals) top shows the cpu barely working and mem utilization is below 20%. Still slow.

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-15 Thread Jonathan Ellis
What kind of numbers do you get from contrib/py_stress? (that's located somewhere else in 0.5, but you should really be using 0.6 anyway.) On Thu, Apr 15, 2010 at 12:53 PM, Heath Oderman wrote: > So checking it out quickly: > vmstat - > Never swaps.  si and so  stay at 0 during the load. > iosta

Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-15 Thread Heath Oderman
I upgraded to 0.6 yesterday and it's bang on the same. I'll go read up on py_stress and give it a try. On Thu, Apr 15, 2010 at 1:57 PM, Jonathan Ellis wrote: > What kind of numbers do you get from contrib/py_stress? > > (that's located somewhere else in 0.5, but you should really be using > 0.6

Re: Time-series data model

2010-04-15 Thread Dan Di Spaltro
This is actually fairly similar to how we store metrics at Cloudkick. The post below has a much more in-depth explanation of some of that: https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ So we store each natural point in the NumericArchive table. Our keys look like: . Anyway

Re: framed transport

2010-04-15 Thread Nathan McCall
FWIW, We just exposed this as an option in hector. -Nate On Thu, Apr 15, 2010 at 8:38 AM, Miguel Verde wrote: > On Thu, Apr 15, 2010 at 10:22 AM, Eric Evans wrote: >> >> But, if you've enabled framing on the server, you will not >> be able to use C# clients (last I checked, there was no framed

Re: BMT flush on windows?

2010-04-15 Thread Sonny Heer
Hmmm. Same code runs on ubuntu, and I'm able to flush using the nodetool. What is the difference between inserting data using : StorageProxy.mutateBlocking vs. sending oneway message using the MessagingService? On Thu, Apr 15, 2010 at 10:14 AM, Jonathan Ellis wrote: > probably because there is n

Re: BMT flush on windows?

2010-04-15 Thread Sonny Heer
If I use Storage.mutateBlocking, and hit force flush from jconsole, it flushes but with this error message: "Problem invoking forceFlush: java.rmi.UnmarshalException: error unmarshalling return; nested exception is: java.io.WriteAbortedException: writing aborted; java.io.NotSerializableException:

Re: timestamp not found

2010-04-15 Thread Lee Parker
I have done more error checking and I am relatively certain that I am sending a valid timestamp to the thrift library. I was testing a switch to the Framed Transport instead of Buffered Transport and I am getting fewer errors, but now the cassandra server dies when this happens. It is starting to

json2sstable

2010-04-15 Thread Lee Parker
Has anyone used json2sstable to migrate a large amount of data into cassandra? What was your methodology? I assume that this will be much faster than stepping through my data and doing writes via PHP/Thrift. Lee Parker

Re: framed transport

2010-04-15 Thread Lee Parker
It appears that after some testing, the buffered transport seems more stable. I am occasionally getting a missing timestamp error during batch_mutate calls. It happens both on framed and buffered transports, but when it happens on a framed transport, the server crashes. Is this typical? Lee Par

Re: framed transport

2010-04-15 Thread Jonathan Ellis
Have you tried other client machines? It sounds like your client is generating garbage, which is Bad. https://issues.apache.org/jira/browse/THRIFT-601 On Thu, Apr 15, 2010 at 4:20 PM, Lee Parker wrote: > It appears that after some testing, the buffered transport seems more > stable.  I am occas

Data model question - column names sort

2010-04-15 Thread Sonny Heer
Need a way to have two different types of indexes. Key: aTextKey ColumnName: aTextColumnName:55 Value: "" Key: aTextKey ColumnName: 55:aTextColumnName Value: "" All the valuable information is stored in the column name itself. Above two can be in different column families... Queries: Given a ke

Clarification on Ring operations in Cassandra 0.5.1

2010-04-15 Thread Anthony Molinaro
Hi, I have a cluster running on ec2, and would like to do some ring management. Specifically, I'd like to replace an existing node with another node (I want to change the instance type). I was looking over http://wiki.apache.org/cassandra/Operations and it seems like I could do something

Re: Lucandra or some way to query

2010-04-15 Thread malsmith
We are looking into migrating from a replicated solr infrastructure to some form of clustered approach. Lucandra looks fantastic -- but this statement is troubling: "No normalizations are stored (no scoring)" from http://github.com/tjake/Lucandra When I use the demo/samples I do get a relevance s

Re: Is that possible to write a file system over Cassandra?

2010-04-15 Thread Jeff Zhang
Jonathan, Previously we used cassandra-0.6, but we'd like to leverage the hector Java client since it has more advanced features, and hector currently only supports cassandra-0.5. Why do you think using cassandra-0.5 is a strange way to do it? Is cassandra-0.6 incompatible with cassandra-0.5? Th

Is it possible to get all records in a CF?

2010-04-15 Thread Jared Laprise
If you do not have the key for a SuperColumn in a ColumnFamily, is it not possible to browse all the data in the ColumnFamily? Thus far I've only been able to find a way to pull out data if I know the key.

Re: Is it possible to get all records in a CF?

2010-04-15 Thread Gary Dusbabek
You'll have to scan the CF. If you're using OrderPreservingPartitioner please see 'get_range_slices' (http://wiki.apache.org/cassandra/API). It would help if you had an idea of where the key might be, so you would know where to start scanning. Gary. On Thu, Apr 15, 2010 at 21:01, Jared Laprise
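Scanning a column family this way usually means paging: ask for a batch of keys starting at some key, then restart from the last key returned. A minimal sketch of that loop over an in-memory sorted key list; `range_slice` is a hypothetical stand-in for a range query with an inclusive start key, not the real `get_range_slices` signature, and the sample rows are invented.

```python
import bisect

# Sample data standing in for rows stored in key order.
ROWS = {"apple": 1, "banana": 2, "cherry": 3, "date": 4, "elder": 5}
SORTED_KEYS = sorted(ROWS)


def range_slice(start_key, count):
    """Hypothetical range query: up to `count` keys >= start_key, in key order."""
    i = bisect.bisect_left(SORTED_KEYS, start_key)
    return SORTED_KEYS[i:i + count]


def scan_all(page_size=2):
    """Yield every key by paging; the start key is inclusive, so skip the repeat."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start, first_page = "", True
    while True:
        page = range_slice(start, page_size)
        for key in (page if first_page else page[1:]):
            yield key
        if len(page) < page_size:
            return  # a short page means the end of the key range
        start, first_page = page[-1], False
```

The paging only visits keys in a meaningful order when keys are stored sorted, which is why the message ties this approach to OrderPreservingPartitioner.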

Re: frequent "unknown result" errors

2010-04-15 Thread Michael Pearson
Lee, I dropped (official) 0.5 support from Pandra yesterday and committed 0.6 Thrift files, if you're still considering that upgrade... worth a shot imo. -michael On Tue, Apr 13, 2010 at 7:19 AM, Lee Parker wrote: > So, it didn't get rid of the problem, i'm still getting the errors.  The > only

Re: json2sstable

2010-04-15 Thread 孔令华
I tried that and found that it cannot handle large files at present. But you can write a tool based on it, e.g.: first, sort your data file according to its hash key; second, write to an SSTable directly. On Fri, Apr 16, 2010 at 4:47 AM, Lee Parker wrote: > Has anyone used json2sstable to migr
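The first step (sorting rows by the hash of their key) can be sketched as below. This assumes the hash is MD5 read as an unsigned big-endian integer; the actual partitioner may derive its token differently, so treat the ordering as illustrative only. The second step, writing the SSTable, is not shown.

```python
import hashlib


def hash_token(key: str) -> int:
    """MD5 of the key as an unsigned integer (an assumption about the hash in use)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def sort_for_bulk_write(rows):
    """Return (key, value) pairs ordered by the hash of the key."""
    return sorted(rows.items(), key=lambda kv: hash_token(kv[0]))
```

For data that does not fit in memory, the same ordering could be produced with an external sort over (token, key, value) records before writing.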

Re: json2sstable

2010-04-15 Thread Brandon Williams
On Thu, Apr 15, 2010 at 3:47 PM, Lee Parker wrote: > Has anyone used json2sstable to migrate a large amount of data into > cassandra? What was your methodology? I assume that this will be much > faster than stepping through my data and doing writes via PHP/Thrift. If you're looking to do a bu

Re: Is that possible to write a file system over Cassandra?

2010-04-15 Thread Nathan McCall
In regards to hector, please check all the available branches on github. We have supported 0.6 for a little while now. http://github.com/rantav/hector/tree/0.6.0 The master is still based on 0.5, but that is changing in the next couple of days to match the 0.6 release. -Nate On Thu, Apr 15,

Re: Is that possible to write a file system over Cassandra?

2010-04-15 Thread Jeff Zhang
Thanks, Nathan. On Fri, Apr 16, 2010 at 12:04 PM, Nathan McCall wrote: > In regards to hector, please check all the available branches on > github. We have supported 0.6 for a little while now. > > http://github.com/rantav/hector/tree/0.6.0 > > The master is still based on 0.5, but that is chan

Regarding Cassandra Scalability

2010-04-15 Thread Linton N
Hi, I have been working with Hadoop for the past year, but I am quite new to Cassandra. I would like to get a few things clarified regarding the scalability of Cassandra. Can it scale up to TBs of data? Please provide me with some links regarding this. -- With Love Lin N