Re: Cassandra 1.0

2011-06-16 Thread Terje Marthinussen
There is already so much stuff on the 1.0 branch that I don't think 4 months to feature freeze is a problem. Assuming big stuff like the new sstable format will go into 1.0, I am more concerned about the 1 month from freeze to release. Regards, Terje On 17 Jun 2011, at 01:39, Eric Evans wrote: >

Re: [VOTE] Apache Cassandra 0.8.0-beta1

2011-04-19 Thread Terje Marthinussen
Unfortunate as it means beta1 is useless for testing for us, but I guess it does not make much difference if we try 0.8 trunk instead. Terje On Tue, Apr 19, 2011 at 9:46 PM, Jonathan Ellis wrote: > We typically only block betas for major regressions, not bugs that are > already present in a rel

Re: 2GB rows and errors

2011-03-04 Thread Terje Marthinussen
we should probably do additional sanity > checks in the callers, which will have the necessary context to > provide better error messages > > On Fri, Mar 4, 2011 at 1:36 PM, Terje Marthinussen > wrote: > > Hi, > > > > Any good reason this guy > > publ

2GB rows and errors

2011-03-04 Thread Terje Marthinussen
Hi, Any good reason this guy

    public int bytesPastMark(FileMark mark)
    {
        assert mark instanceof BufferedRandomAccessFileMark;
        long bytes = getFilePointer() - ((BufferedRandomAccessFileMark) mark).pointer;
        assert bytes >= 0;
        if (bytes > Integer.MAX_VALUE)
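The quoted method computes a `long` difference between two file offsets but returns an `int`, so any distance above `Integer.MAX_VALUE` (about 2GB) cannot be represented. The snippet is cut off before showing how that case is handled, so the following is only a minimal sketch of the underlying issue, not the Cassandra code. The class `OffsetNarrowing` and the helper `bytesBetween` are hypothetical names introduced for illustration.

```java
public class OffsetNarrowing {
    // Hypothetical helper (not from Cassandra): distance between a mark and
    // the current file pointer, returned as an int and failing loudly when
    // the distance does not fit in 32 bits.
    static int bytesBetween(long markPointer, long filePointer) {
        long bytes = filePointer - markPointer;
        if (bytes < 0) {
            throw new IllegalArgumentException("mark is ahead of the file pointer");
        }
        // Math.toIntExact throws ArithmeticException instead of wrapping.
        return Math.toIntExact(bytes);
    }

    public static void main(String[] args) {
        long threeGb = 3L * 1024 * 1024 * 1024;

        // A small distance fits in an int without trouble.
        System.out.println(bytesBetween(0L, 1024L));

        // A plain cast silently wraps a 3GB distance to a negative value.
        System.out.println((int) threeGb);

        // The checked conversion reports the problem instead.
        try {
            bytesBetween(0L, threeGb);
        } catch (ArithmeticException e) {
            System.out.println("distance does not fit in an int");
        }
    }
}
```

Returning a `long` from such a method would avoid the limit entirely; the checked conversion only makes the failure explicit for callers that still need an `int`.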

Re: Simple Compression Idea

2011-01-31 Thread Terje Marthinussen
There is a lot of overhead in the serialized data itself (just have a look at a sstable file). It would be great to be able to compress at the byte array level rather than string. Regards, Terje On 1 Feb 2011, at 03:15, "David G. Boney" wrote: > In Cassandra, strings are stored as UTF-8. In
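To illustrate what compressing at the byte array level could look like, here is a minimal sketch that uses the JDK's `java.util.zip.Deflater` and `Inflater`. The choice of DEFLATE is an assumption made for the example; the thread does not say which algorithm would be used, and the class `ByteLevelCompression` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ByteLevelCompression {
    // Compress an arbitrary byte array, independent of what the bytes encode.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restore the original bytes; malformed or truncated input is reported
    // as an unchecked exception to keep the sketch short.
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("malformed input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Repetitive serialized-looking data, made up for the example.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("column-name:value;");
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(raw, decompress(packed)));
    }
}
```

Because the input is just bytes, the same routine covers repeated column names and other serialization overhead as well as string values.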

cassandra disk usage

2010-08-30 Thread Terje Marthinussen
Hi, Was just looking at an SSTable file after loading a dataset. The data load has no updates of data, but: - Columns can in some rare cases be added to existing super columns - SuperColumns will be added to the same key (but not overwriting existing data). I batch these, but it is quite likely tha

column family names

2010-08-30 Thread Terje Marthinussen
Hi, Now that we can make column families on the fly, it gets interesting to use column families more as part of the data model (can reduce disk space quite a bit vs. super columns in some cases). However, currently, the column family name validator is pretty strict allowing only word characters a
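As a rough illustration of how restrictive a "word characters only" rule is, the sketch below checks names against the Java regex class `\w` (letters, digits and underscore). This is an assumption about what the validator does, since the message is cut off before the details; `ColumnFamilyNameCheck` and `isValidName` are hypothetical names, not Cassandra's API.

```java
import java.util.regex.Pattern;

public class ColumnFamilyNameCheck {
    // Assumed rule: one or more word characters, i.e. [a-zA-Z_0-9] in Java.
    private static final Pattern WORD_ONLY = Pattern.compile("\\w+");

    static boolean isValidName(String name) {
        return name != null && WORD_ONLY.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Names that embed data-model information are easy to reject.
        String[] candidates = {"Users_2010", "user-events", "logs.2010-08-30", ""};
        for (String name : candidates) {
            System.out.println("\"" + name + "\" valid: " + isValidName(name));
        }
    }
}
```

Under this assumed rule, names containing hyphens or dots are rejected, which limits how much structure can be encoded in a column family name.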

Re: Minimizing the impact of compaction on latency and throughput

2010-07-13 Thread Terje Marthinussen
On Tue, Jul 13, 2010 at 10:26 PM, Jonathan Ellis wrote: > > I'm totally fine with saying "Here's a JNI library for Linux [or even > Linux version >= 2.6.X]" since that makes up 99% of our production > deployments, and leaving the remaining 1% with the status quo. > You really need to say Linux >

Re: Minimizing the impact of compaction on latency and throughput

2010-07-13 Thread Terje Marthinussen
> (2) posix_fadvise() feels more obscure and less portable than > O_DIRECT, the latter being well-understood and used by e.g. databases > for a long time. > Due to the need for doing data alignment in the application itself (you are bypassing all the OS magic here), there is really nothing portabl

Re: performance with a "large" number of supercolumns/columns

2010-07-07 Thread Terje Marthinussen
2010 at 8:07 AM, Jonathan Ellis wrote: > Hi Terje, > > Sorry to not get to this sooner. Are you still looking into this? > > On Tue, Jun 22, 2010 at 12:47 PM, Terje Marthinussen > wrote: > > Hi, > > > > I was looking a bit on a case we have with columnfamily which

performance with a "large" number of supercolumns/columns

2010-06-22 Thread Terje Marthinussen
Hi, I was looking a bit at a case we have with a column family that has 439k supercolumns, each with ~3 columns, for a total of some 1.3 million objects. This takes about 9 seconds to read with Thrift on first access, 4-5 seconds on second access. I took a little closer look