Re: Is SuperColumn necessary?

2010-05-06 Thread Torsten Curdt
+1 on all of that On Thu, May 6, 2010 at 09:04, David Boxenhorn wrote: > That would be a good time to get rid of the confusing "column" term, which > incorrectly suggests a two-dimensional tabular structure. > > Suggestions: > > 1. A hypercube (or hypocube, if only two dimensions): replace "key"

Re: Is SuperColumn necessary?

2010-05-11 Thread Torsten Curdt
Exactly. On Tue, May 11, 2010 at 10:20, David Boxenhorn wrote: > Don't think of it as getting rid of supercolum. Think of it as adding > superdupercolums, supertriplecolums, etc. Or, in sparse array terminology: > array[dim1][dim2][dim3].[dimN] = value > > Or, as said above: > >    Type="UTF8
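
The sparse-array view quoted above (array[dim1][dim2]...[dimN] = value) can be sketched as a single sorted map keyed by the whole path. This is only an illustration of the idea, not how Cassandra stores data; the class `SparseArray` and its methods are hypothetical names introduced here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class SparseArray {
    // Compare paths element by element; a path sorts before its own extensions.
    private static final Comparator<List<String>> PATH_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    // One entry per populated cell: [dim1, dim2, ..., dimN] -> value.
    private final TreeMap<List<String>, String> cells = new TreeMap<>(PATH_ORDER);

    public void put(String value, String... path) {
        cells.put(Arrays.asList(path), value);
    }

    public String get(String... path) {
        return cells.get(Arrays.asList(path));
    }

    // All cells whose path starts with the given prefix, in path order.
    public SortedMap<List<String>, String> slice(String... prefix) {
        List<String> p = Arrays.asList(prefix);
        SortedMap<List<String>, String> out = new TreeMap<>(PATH_ORDER);
        for (Map.Entry<List<String>, String> e : cells.tailMap(p, true).entrySet()) {
            List<String> path = e.getKey();
            // Extensions of the prefix are contiguous, so stop at the first miss.
            if (path.size() < p.size() || !path.subList(0, p.size()).equals(p)) break;
            out.put(path, e.getValue());
        }
        return out;
    }
}
```

In this picture a "super column" is just a path of length two below the row key, and deeper nesting only means longer paths.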

key path vs super column

2010-05-19 Thread Torsten Curdt
We are currently working on a prototype that uses Cassandra for a realtime-ish statistics system. This seems to be quite a common use case. If people are interested, maybe it would be worth collaborating on this beyond design discussions on the list. But first let me explain our approach and where w

sorting by column value

2010-05-31 Thread Torsten Curdt
Is it possible to have columns in a super column sorted by value rather than name? I assume not, but I thought I'd ask anyway. What I would love to do is something along the lines of /user//country/DE += 1 and then get the sorted result of "/user//country" cheers -- Torsten
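
Columns come back ordered by name, so one workaround is to read the whole set of counters and sort by value on the client. The following is a minimal sketch under that assumption; it uses a plain `Map` in place of a real client call, and `SortByValue` is a hypothetical helper, not part of any Cassandra API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortByValue {
    // Returns the counters ordered by value (highest first), ties broken by name.
    public static LinkedHashMap<String, Long> sortedByValueDesc(Map<String, Long> counters) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counters.entrySet());
        entries.sort((a, b) -> {
            int c = Long.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        // LinkedHashMap preserves the sorted insertion order.
        LinkedHashMap<String, Long> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries) {
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}
```

This only works while the number of columns under the path is small enough to fetch in one go.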

Re: Range search on keys not working?

2010-06-02 Thread Torsten Curdt
Sounds like you are not using an order preserving partitioner? On Wed, Jun 2, 2010 at 13:48, David Boxenhorn wrote: > Range search on keys is not working for me. I was assured in earlier threads > that range search would work, but the results would not be ordered. > > I'm trying to get all the ro
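
Why the partitioner matters here: a hash-based partitioner places rows by a hash of the key, so the order in which rows are stored has nothing to do with the order of the keys themselves. The sketch below illustrates this with an MD5-derived token; the exact token computation (taking the absolute value of the digest) is an assumption for illustration, and `TokenOrder` is a hypothetical class.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TokenOrder {
    // Assumed token function: MD5 of the key, read as a non-negative integer.
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // The same keys, ordered the way a hash-based partitioner would lay them out.
    static List<String> inTokenOrder(List<String> keys) {
        List<String> out = new ArrayList<>(keys);
        out.sort(Comparator.comparing(TokenOrder::md5Token));
        return out;
    }
}
```

Feeding a list of lexically sorted keys through `inTokenOrder` generally returns them shuffled, which is why a key range only selects a meaningful set of rows under an order preserving partitioner.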

Re: Continuously increasing RAM usage

2010-06-02 Thread Torsten Curdt
We've also seen something like this. Will soon investigate and try again with 0.6.2 On Wed, Jun 2, 2010 at 20:27, Paul Brown wrote: > > FWIW, I'm seeing similar issues on a cluster.  Three nodes, Cassandra 0.6.1, > SUN JDK 1.6.0_b20.  I will try to get some heap dumps to see what's building > u

ColumnFamilyInputFormat with super columns

2010-06-02 Thread Torsten Curdt
I have a super column along the lines of => { => { att: value }} Now I would like to process a set of rows [from_time..until_time] with Hadoop. I've set up the Hadoop job like this: job.setInputFormatClass(ColumnFamilyInputFormat.class); ConfigHelper.setColumnFamil

Re: Are 6..8 seconds to read 23.000 small rows - as it should be?

2010-06-04 Thread Torsten Curdt
> Yes, I know. And I might end up doing this in the end. I do though have > pretty hard upper limits of how many rows I will end up with for each key, > but anyways it might be a good idea none the less. Thanks for the advice on > that one. You set count to Integer.MAX. Did you try with say 300

Re: Too many ParNew's

2010-06-09 Thread Torsten Curdt
As promised on IRC, we have also collected some information, as we are seeing (probably) the same problem. https://issues.apache.org/jira/browse/CASSANDRA-1177 On Wed, Jun 9, 2010 at 14:11, aaron morton wrote: > May be related to CASSANDRA-1014 > https://issues.apache.org/jira/browse/CASSANDRA-10

Re: Beginner Assumptions

2010-06-13 Thread Torsten Curdt
> Anyways, I want to store some (alot of) Time Series data in Cassandra > and would like to check if my assumptions are correct so far. So if > someone with operational experience could confirm these I'd really > appreciate it. > > Basically the structure I'm going for right now looks like this: >

Re: GC Storm

2010-06-13 Thread Torsten Curdt
> If you were just inserting a lot of data fast, it may be that > background compaction was unable to keep up with the insertion rate. > Simply leaving the node(s) for a while after the insert storm will let > it catch up with compaction. > > (At least this was the behavior for me on a recent trunk

Re: Pelops - a new Java client library paradigm

2010-06-14 Thread Torsten Curdt
Also think this looks really promising. The fact that there are so many API wrappers now (3?) doesn't reflect well on the native API though :) /me ducks and runs On Mon, Jun 14, 2010 at 11:55, Dominic Williams wrote: > Hi Ran, thanks for the compliment. It is true that we benefited enormously >

Re: Beginner Assumptions

2010-06-14 Thread Torsten Curdt
>> >> TBH while we are using super columns, they somehow feel wrong to me. I >> would be happier if we could move what we do with super columns into >> the row key space. But in our case that does not seem to be so easy. >> >> > > I'd be quite interested to learn what you are doing with super colu

bulk loading

2010-06-20 Thread Torsten Curdt
I am trying to get the bulk loading example to work for a simple CF. List columnFamilies = new LinkedList(); while(...) { String[] fields = ... ColumnFamily columnFamily = ColumnFamily.create(keyspace, family); long now = System.currentTimeMillis(); for (i

Re: bulk loading

2010-06-21 Thread Torsten Curdt
> You should be using the thrift API, or a wrapper around the thrift API. It > looks like you're using internal cassandra classes. The goal is to avoid the overhead of the Thrift API for a bulk import. > There is a Java wrapper called Hector, and there was another talked about on > t

[OT] Re: unsubscribe

2010-06-22 Thread Torsten Curdt
Hey Dean ...and everyone else not managing to unsubscribe (and sending mails to the list instead): If you don't know how to unsubscribe you can always look at the List-Unsubscribe: header of any of the list emails. These days most of the time you will find that an "-unsubscribe" suffix is used
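
As a small illustration of the header lookup described above, the following sketch pulls the unsubscribe targets out of a raw header block. It uses only plain string handling; `ListUnsubscribe` is a hypothetical helper, and it assumes the usual form of the header, a comma-separated list of targets in angle brackets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListUnsubscribe {
    private static final String HEADER = "List-Unsubscribe:";
    private static final Pattern TARGET = Pattern.compile("<([^>]+)>");

    // Returns every <...> target found in the List-Unsubscribe header, if any.
    public static List<String> targets(String rawHeaders) {
        // Unfold continuation lines (a line break followed by whitespace).
        String unfolded = rawHeaders.replaceAll("\\r?\\n[ \\t]+", " ");
        List<String> out = new ArrayList<>();
        for (String line : unfolded.split("\\r?\\n")) {
            // Header names are case-insensitive.
            if (line.regionMatches(true, 0, HEADER, 0, HEADER.length())) {
                Matcher m = TARGET.matcher(line);
                while (m.find()) {
                    out.add(m.group(1).trim());
                }
            }
        }
        return out;
    }
}
```

A `mailto:` target is the address to send the unsubscribe request to; a web target is a page to open instead.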

Re: bulk loading

2010-06-22 Thread Torsten Curdt
I looked at the thrift service implementation and got it working. (Much faster import!) Thanks! On Mon, Jun 21, 2010 at 13:09, Oleg Anastasjev wrote: > Torsten Curdt vafer.org> writes: > >> >> First I tried with my one "cassandra -f" instance then I saw this

Re: CassandraBulkLoader

2010-07-13 Thread Torsten Curdt
On Tue, Jul 13, 2010 at 04:35, Mubarak Seyed wrote: > Where can I find the documentation for BinaryMemTable (bmt_example in contrib) > to use CassandraBulkLoader? What is the input to be supplied to > CassandraBulkLoader? > How to form the input data and what is the format of an input data? The

Re: CassandraBulkLoader

2010-07-13 Thread Torsten Curdt
> look at contrib/bmt_example, with the caveat that it's usually > premature optimization I wish that was true for us :) >> Fact: It has always been straightforward to send the output of Hadoop jobs >> to Cassandra, and Facebook, Digg, and others have been using Hadoop like >> this as a Cassandra

Re: CassandraBulkLoader

2010-07-15 Thread Torsten Curdt
> If you could can you please share the command line function (to load TSV)? There is no command line function ... you have to write code for this. > and Can you please help me on storing storage-conf.xml on HDFS part? As I said. Maybe you better start with a simpler scenario and leave out HDFS

Re: CassandraBulkLoader

2010-07-19 Thread Torsten Curdt
> When i run bmt_example, M/R job gets executed, cassandra server  gets the > data but it goes as HintedHandoff to 127.0.0.2 and it is trying to send data > to 127.0.0.2 as if 127.0.0.2 is an actual node. Well, it kind of becomes an actual node. > Any idea, why does StorageService > returns 127.0