Re: Issue with TimeUUID

2010-03-19 Thread Sylvain Lebresne
Just looked at the code and it indeed just compares the timestamps. I also find it weird and I would be for changing it, but maybe there was a good reason to do it the way it is (even if I don't see one right now). I'll let people give their opinion on that. In the meantime, if you need a quick fix

geo coding, long/lats?

2010-03-19 Thread Joseph Stein
Hi All, has anyone ever done geo coding to find distance based results from storing long/lats with a starting long/lat and variable? If not, then any assistance in pointing me to the Cassandra source code where I could make this type of customized addition, as I only want to get the results for th

Startup issue when big data in.

2010-03-19 Thread Marcin
Hi guys, is there a way to avoid compacting, flushing and all of these things on startup and perform them while the node is running? It takes a lot of time on startup. cheers, /Marcin

Re: Startup issue when big data in.

2010-03-19 Thread Jonathan Ellis
Flush before you kill the process and restart will be much faster. On Fri, Mar 19, 2010 at 9:40 AM, Marcin wrote: > Hi guys, > > is there a way to avoid compacting, flushing and all of this thing on > startup and perform it while node is running ? > > It takes a lot of on startup. > > > cheers, >

Re: Issue with TimeUUID

2010-03-19 Thread Jesse McConnell
imo it is a terrible bug.. the usage of a TimeUUIDType implies that you actually care about the unique bits outside of a timestamp... currently it's nothing more than a LongType ColumnFamily backed by System.currentTimeMillis() as a source for column names. jesse -- jesse mcconnell jesse.mcc

Re: Issue with TimeUUID

2010-03-19 Thread Sylvain Lebresne
As said, I agree with that. I've thus created a jira issue (https://issues.apache.org/jira/browse/CASSANDRA-907). The discussion could continue there. On Fri, Mar 19, 2010 at 4:30 PM, Jesse McConnell wrote: > imo it is a terrible bug.. > > the usage of a TimeUUIDType implies that your actually ca

Re: Issue with TimeUUID

2010-03-19 Thread Jesse McConnell
alternately try using LexicalUUIDType, that seems to work jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Fri, Mar 19, 2010 at 11:00, Sylvain Lebresne wrote: > As said, I agree with that. > I've thus created a jira issue > (https://issues.apache.org/jira/browse/CASSANDRA-907). > The dis

Re: Issue with TimeUUID

2010-03-19 Thread John Alessi
Yes, I tried that but then the date does not sort correctly. -- John Alessi SocketLabs, Inc. 484-418-1282 On Mar 19, 2010, at 12:12 PM, Jesse McConnell wrote: > alternately try using LexicalUUIDType, that seems to work > > jesse > > -- > jesse mcconnell > jesse.mcconn...@gmail.com > > > > O
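The two complaints in this thread (timestamp-only comparison treats distinct UUIDs as equal, while a purely lexical comparison does not sort by date) can be illustrated with a small Python sketch. This is not Cassandra's code: `make_v1` and `time_uuid_sort_key` are hypothetical helpers, and raw byte order is used only as a stand-in for a lexical comparison, since the thread does not show how LexicalUUIDType actually compares. The idea sketched is to order by timestamp first and break ties on the remaining bits.

```python
import uuid


def make_v1(timestamp, clock_seq, node):
    """Build a version 1 UUID from explicit fields (helper for this example only)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def time_uuid_sort_key(u):
    """Order by the embedded timestamp, then by the raw bytes as a tie-breaker."""
    if u.version != 1:
        raise ValueError("expected a time-based (version 1) UUID")
    return (u.time, u.bytes)


# Same timestamp, different clock sequence: distinct UUIDs that a
# timestamp-only comparison would treat as equal.
a = make_v1(1000, 1, 0xAAAAAAAAAAAA)
b = make_v1(1000, 2, 0xAAAAAAAAAAAA)
c = make_v1(999, 5, 0xBBBBBBBBBBBB)
ordered = sorted([b, a, c], key=time_uuid_sort_key)

# Raw byte order starts with the low 32 bits of the timestamp, so it does
# not follow time order once the timestamp crosses a 32-bit boundary.
early = make_v1(0xFFFFFFFF, 0, 1)
late = make_v1(0x100000000, 0, 1)
```

With this key, `ordered` is `[c, a, b]`: `c` comes first because of its earlier timestamp, and `a` and `b` stay distinct and deterministically ordered even though their timestamps match.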

compact and cleanup

2010-03-19 Thread B. Todd Burruss
Does "nodetool compact" do a "nodetool cleanup"? I just bootstrapped a new node and I want to get the data per node as small as possible. Do I need to run both?

Re: compact and cleanup

2010-03-19 Thread Jonathan Ellis
You just need to run cleanup. On Fri, Mar 19, 2010 at 11:39 AM, B. Todd Burruss wrote: > does "nodetool compact" do a "nodetool cleanup"? i just bootstrapped a new > node and i want to get the data per node as small as possible. do i need > to run both? >

Re: Startup issue when big data in.

2010-03-19 Thread Marcin
I have done that before plus compact and it didn't help? (I have been using nodeprobe) cheers, /Marcin Flush before you kill the process and restart will be much faster. On Fri, Mar 19, 2010 at 9:40 AM, Marcin wrote: Hi guys, is there a way to avoid compacting, flushing and all of thi

Re: Startup issue when big data in.

2010-03-19 Thread Jonathan Ellis
You have to wait for the flush to finish, of course. On Fri, Mar 19, 2010 at 11:44 AM, Marcin wrote: > I have done that before plus compact and it didn't help? (I have been using > nodeprobe) > > > cheers, > /Marcin > >> Flush before you kill the process and restart will be much faster. >> >> On

Re: Startup issue when big data in.

2010-03-19 Thread Marcin
Probably that was the reason of course ;-) thanks for pointing it out. cheers, /Marcin You have to wait for the flush to finish, of course. On Fri, Mar 19, 2010 at 11:44 AM, Marcin wrote: I have done that before plus compact and it didn't help? (I have been using nodeprobe) cheers,

Re: geo coding, long/lats?

2010-03-19 Thread Peter Chang
I'd be curious too. My first instinct is to use some sort of bucketizing algorithm by location which would encapsulate entries near each other (similar coordinates). On Fri, Mar 19, 2010 at 7:06 AM, Joseph Stein wrote: > Hi All, has anyone ever done geo coding to find distance based results > fr

Re: geo coding, long/lats?

2010-03-19 Thread Brandon Williams
On Fri, Mar 19, 2010 at 9:06 AM, Joseph Stein wrote: > Hi All, has anyone ever done geo coding to find distance based results > from storing long/lats with a starting long/lat and variable? > > This thread might be helpful: http://n2.nabble.com/Help-Wrap-My-Head-Around-Cassandra-td4657302.html

Re: question about deleting from cassandra

2010-03-19 Thread Tatu Saloranta
On Thu, Mar 18, 2010 at 7:31 AM, Vick Khera wrote: > On Thu, Mar 18, 2010 at 9:15 AM, Bill Au wrote: >> In theory there is a breaking point somewhere, right? > > I don't think google has hit it yet, so I'd have to say nobody has > reached "the breaking point" yet > > What do the big places do

Re: Startup issue when big data in.

2010-03-19 Thread Tatu Saloranta
On Fri, Mar 19, 2010 at 7:40 AM, Marcin wrote: > Hi guys, > > is there a way to avoid compacting, flushing and all of this thing on > startup and perform it while node is running ? > > It takes a lot of on startup. One sort of related question: given that order of insertions has huge effects on s

Re: Startup issue when big data in.

2010-03-19 Thread Jonathan Ellis
On Fri, Mar 19, 2010 at 12:52 PM, Tatu Saloranta wrote: > One sort of related question: given that order of insertions has huge > effects on some stores, like BDB (where inserting in key order is 10x > faster than arbitrary order), would insertion order possibly have > significant effect on Cassan

Re: Startup issue when big data in.

2010-03-19 Thread Tatu Saloranta
On Fri, Mar 19, 2010 at 10:56 AM, Jonathan Ellis wrote: > On Fri, Mar 19, 2010 at 12:52 PM, Tatu Saloranta wrote: >> One sort of related question: given that order of insertions has huge >> effects on some stores, like BDB (where inserting in key order is 10x >> faster than arbitrary order), woul

Re: Startup issue when big data in.

2010-03-19 Thread Stu Hood
All write patterns should provide the same performance with Cassandra, since all writes to disk occur sequentially. The only variance might be in the data structure used for the Memtable (a concurrent skip list), but I expect that it is quite stable. See http://www.mikeperham.com/2010/03/13/cas

Re: geo coding, long/lats?

2010-03-19 Thread Aanand Prasad
I've implemented a basic geospatial search against a Cassandra dataset by keeping a column family of items indexed by geohash ( http://en.wikipedia.org/wiki/Geohash). Essentially, to search for items within a given area, you calculate a geohash that covers the entire area (but is still as specific
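As a rough illustration of the geohash approach described above, here is a minimal Python sketch of the standard geohash encoding plus one possible way to pick a prefix that covers a bounding box. The message is truncated, so how the covering geohash was actually computed is not known; `geohash_encode` and `covering_prefix` are hypothetical names, and taking the longest common prefix of the two opposite corners is an assumption made for this example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def covering_prefix(min_lat, min_lon, max_lat, max_lon, precision=12):
    """Longest geohash prefix shared by the two opposite corners of a box.

    Each geohash cell is a lat/lon rectangle, so a cell containing both
    corners contains the whole box.
    """
    sw = geohash_encode(min_lat, min_lon, precision)
    ne = geohash_encode(max_lat, max_lon, precision)
    prefix = []
    for x, y in zip(sw, ne):
        if x != y:
            break
        prefix.append(x)
    return "".join(prefix)


box_prefix = covering_prefix(57.64, 10.40, 57.65, 10.41)
```

Items stored under their geohash can then be fetched with a prefix (range) scan on `box_prefix` and filtered by exact distance afterwards. Note that a box straddling a cell boundary can yield a very short or even empty prefix, so a real implementation would likely query several neighbouring cells instead.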

Re: Startup issue when big data in.

2010-03-19 Thread Tatu Saloranta
On Fri, Mar 19, 2010 at 11:25 AM, Stu Hood wrote: > All write patterns should provide the same performance with Cassandra, since > all writes to disk occur sequentially. Ok that makes sense. > The only variance might be in the data structure used for the Memtable (a > concurrent skip list), bu

Digg's data model

2010-03-19 Thread Gary
I am a newbie to the Bigtable-like model and have a question as follows. Take Digg as an example: I want to find a list of users who dug a URL and also want to find a list of URLs a user dug. What should the data model look like for the queries to be efficient? If I use the username and the URL for two row

Re: Digg's data model

2010-03-19 Thread Joe Stump
On Mar 19, 2010, at 1:16 PM, Gary wrote: > I am a newbie to bigtable like model and have a question as follows. Take > Digg as an example, I want to find a list users who dug a URL and also want > to find a list of URLs a user dug. How should the data model look like for > the queries to be ef

Re: Digg's data model

2010-03-19 Thread David Strauss
On 2010-03-19 19:16, Gary wrote: > I am a newbie to bigtable like model and have a question as follows. > Take Digg as an example, I want to find a list users who dug a URL and > also want to find a list of URLs a user dug. How should the data model > look like for the queries to be efficient? If I

Re: Digg's data model

2010-03-19 Thread Nathan McCall
Gary, Did you see this article linked from the Cassandra wiki? http://about.digg.com/node/564 See http://wiki.apache.org/cassandra/ArticlesAndPresentations for more examples like the above. In general, you structure your data according to how it will be queried. This can lead to duplication, but
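To make "structure your data according to how it will be queried" concrete for Gary's two queries, here is a small in-memory Python sketch. It uses plain dictionaries as a stand-in for two column families and does not use any Cassandra client API; the names `diggs_by_url`, `diggs_by_user` and `record_digg` are hypothetical. Each digg is written twice, once per query path, which is the duplication mentioned above.

```python
from collections import defaultdict

# Stand-ins for two column families: row key -> {column name: value}.
diggs_by_url = defaultdict(dict)   # row key: URL, columns: usernames
diggs_by_user = defaultdict(dict)  # row key: username, columns: URLs


def record_digg(user, url, timestamp):
    """Write the same fact under both row keys so each query reads one row."""
    diggs_by_url[url][user] = timestamp
    diggs_by_user[user][url] = timestamp


def users_who_dug(url):
    """All users who dug a URL, read from a single row."""
    return sorted(diggs_by_url.get(url, {}))


def urls_dug_by(user):
    """All URLs a user dug, read from a single row."""
    return sorted(diggs_by_user.get(user, {}))


record_digg("alice", "http://example.com/a", 1)
record_digg("bob", "http://example.com/a", 2)
record_digg("alice", "http://example.com/b", 3)
```

The cost of this layout is that every write touches two rows, in exchange for each of the two lookups being a single-row read.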

Re: Digg's data model

2010-03-19 Thread Jonathan Ellis
Jeff Hodsdon edited the new link in: http://about.digg.com/blog/looking-future-cassandra On Fri, Mar 19, 2010 at 2:49 PM, Nathan McCall wrote: > Gary, > Did you see this larticle linked from the Cassandra wiki? > http://about.digg.com/node/564 > > See http://wiki.apache.org/cassandra/ArticlesAndP

updates on hector, a java cassandra client

2010-03-19 Thread Ran Tavory
Hector is a Java client for Cassandra, see http://github.com/rantav/hector , http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/ , http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/ Over the past few weeks several contributors and I added features a