Roger, if you include the last read key as the start key for the next API
call, will that retrieve the same key/row twice?
The documentation says that both keys (start, finish) are included.
Thanks
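Since both bounds are inclusive, the usual client-side pattern is to page with the last key read as the next start key and drop that duplicated first row on every page after the first. A minimal sketch in Python, where fetch_range(start, end, count) is a hypothetical wrapper around get_range_slices returning (key, columns) pairs in key order — not part of the Thrift API itself:

def paged_scan(fetch_range, first_key, last_key, page_size=100):
    # Yield every row in [first_key, last_key] exactly once, even though
    # the start key of each follow-up call is included in its result.
    start, skip_first = first_key, False
    while True:
        rows = fetch_range(start, last_key, page_size)
        full_page = len(rows) == page_size
        if skip_first and rows and rows[0][0] == start:
            rows = rows[1:]  # drop the row already yielded on the last page
        for key, columns in rows:
            yield key, columns
        if not full_page or not rows:
            return  # short page: the range is exhausted
        start, skip_first = rows[-1][0], True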
On Thu, Apr 29, 2010 at 1:31 PM, Brandon Williams wrote:
> On Thu, Apr 29, 2010 at 10:19 AM, Davi
Can you create a ticket?
On Fri, Apr 30, 2010 at 4:55 PM, Joost Ouwerkerk wrote:
> There's a bug in ColumnFamilyRecordReader that appears when processing
> a single split. When the start and end tokens of the split are equal,
> duplicate rows can be returned.
>
> Example with 5 rows:
> token (st
Try specifying the InitialToken.
In your cluster, set the token to i*(2**127/6), i = [1,6]. It will help.
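For a 6-node ring under the RandomPartitioner, those tokens are just even divisions of the 2**127 space; a quick way to print them:

nodes = 6
for i in range(1, nodes + 1):
    # i * (2**127 / 6), as suggested above, using integer division
    print("node %d token: %d" % (i, i * (2 ** 127 // nodes)))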
On Sat, May 1, 2010 at 8:03 AM, elsif wrote:
> I upgraded to 0.6.1 and was able to bring up all the nodes and make
> queries.
>
> After adding some new data, the java vm ran out of memory on t
I upgraded to 0.6.1 and was able to bring up all the nodes and make
queries.
After adding some new data, the Java VM ran out of memory on three of
the nodes. Cassandra continued to run for about 20 minutes before
exiting completely:
DEBUG [ROW-MUTATION-STAGE:2] 2010-04-30 16:02:27,298
RowMutati
There's a bug in ColumnFamilyRecordReader that appears when processing
a single split. When the start and end tokens of the split are equal,
duplicate rows can be returned.
Example with 5 rows:
token (start and end) = 53193025635115934196771903670925341736
Tokens returned by first get_range_slic
On 4/30/10 6:36 AM, Jonathan Ellis wrote:
each row has a [column] index and bloom filter of column names, and then there
is the overhead of the java objects.
In addition to the aforementioned row column index, there's also the row
key index, which is an int and a key-length-(string now/byte[]
On 4/30/10 4:47 AM, Douglas Santos wrote:
Hi all,
We are writing an article for a magazine and would like to write about
monitoring, more precisely about nodetool, but did not find much
written about the tool.
I would appreciate some help, or a brief explanation of the nodetool
commands ...
Available command
On 4/30/10 5:21 AM, Bingbing Liu wrote:
> hi,
> thanks for your help.
> i run the nodetool -h compact
> but the load keeps the same; can anyone tell me why?
"compact" and "cleanup" are two different operations. "compact" does a
major compaction. "cleanup" is a superset of "compact" w
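For reference, nodetool takes a hostname after -h, as in the "nodetool -h localhost ring" invocation seen later in this thread; the two operations discussed here would be run as, e.g.:

nodetool -h localhost compact
nodetool -h localhost cleanup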
I modified the code to set INDEX_INTERVAL = 512 to decrease the
memory usage, and it seems to be working fine.
Is that right?
2010/4/30 casablinca126.com
> hi,
>It seems changing the INDEX_INTERVAL will conflict with
> AntiEntropyService, right?
>I will reconstruct my sstables.
On Fri, Apr 30, 2010 at 03:58:09PM +0200, Zubair Quraishi wrote:
> %
> % set second property ( fails! - why? )
> %
> MutationMap =
> {
>   Key,
>   {
>     <<"KeyValue">>,
>     [
>       #mutation{
>         column_or_supercolumn = #column{ name = "property" , value =
> "value" , times
Hi,
I've checked two similar scenarios and one of them seems to be more
performant. Timestamped data is being appended; the first use case is
with an OPP and new rows being created, each with only one column (there
are about 7-8 CFs). The second case is to have rows with more columns and
Rand
> [r...@calculus apache-cassandra-0.6.1]# bin/cassandra -f
> Can't start up: not enough memory
My guess is that you don't have enough memory
Great, thanks for testing that.
On Fri, Apr 30, 2010 at 11:45 AM, Daniel Gimenez wrote:
>
> The code is working now without memory leaks using your patch in the 0.6.2. I
> have done more than 100M without problems until now...
>
> Thanks!
> Daniel Gimenez.
like I told you on the other list, erlang or the erlang thrift
compiler is not exposing the error the cassandra server is sending
you. "bad_return_value" is not it.
Unless someone with actual erlang experience chimes in here, I would
suggest trying with Python first, at least that will show you t
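For comparison, here is roughly what that insert looks like through the raw Python Thrift bindings of the 0.6 era, which should at least surface the real server-side exception. The keyspace name 'Keyspace1' is an assumption, and the generated module layout can differ between Thrift versions:

import time
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import ColumnPath, ConsistencyLevel

transport = TTransport.TBufferedTransport(TSocket.TSocket('127.0.0.1', 9160))
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()
# 0.6 signature: insert(keyspace, key, column_path, value, timestamp, consistency)
client.insert('Keyspace1', 'Key1',
              ColumnPath(column_family='KeyValue', column='property'),
              'value', int(time.time() * 1e6), ConsistencyLevel.ONE)
transport.close()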
The code is working now without memory leaks using your patch in the 0.6.2. I
have done more than 100M without problems until now...
Thanks!
Daniel Gimenez.
I meant in the first sentence "running the get_range_slices from a single
point"
On Fri, Apr 30, 2010 at 4:08 PM, Utku Can Topçu wrote:
> Do you mean, running the get_range_slices from a single? Yes, it would be
> reasonable for a relatively small key range, when it comes to analyze a
> really b
Do you mean, running the get_range_slices from a single? Yes, it would be
reasonable for a relatively small key range; when it comes to analyzing a
really big range in a really big data collection (i.e. like the one we
currently populate), having a way of distributing the reads among the
cluster seems
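One way to get that distribution with an OPP and timestamp-prefixed keys (the "2009.01.01.00.00.00.RAND..." shape mentioned in this thread) is to cut the time range into sub-ranges and let each worker scan one of them with get_range_slices. A sketch, assuming that key format:

from datetime import datetime

def time_splits(start, end, parts):
    # Split [start, end] into equal sub-ranges and format the boundaries
    # the same way the row keys are prefixed, so each pair can be scanned
    # independently (e.g. one pair per worker) with get_range_slices.
    step = (end - start) / parts
    fmt = '%Y.%m.%d.%H.%M.%S'
    bounds = [start + i * step for i in range(parts)] + [end]
    return [(bounds[i].strftime(fmt), bounds[i + 1].strftime(fmt))
            for i in range(parts)]

for lo, hi in time_splits(datetime(2009, 1, 1), datetime(2009, 1, 2), 4):
    print(lo, '->', hi)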
Great, thank you. Do you have a hypothesis about where things might
be going wrong? Let me know what I can do to help.
On Fri, Apr 30, 2010 at 9:33 AM, Jonathan Ellis wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-1040
>
> On Thu, Apr 29, 2010 at 6:55 PM, Joost Ouwerkerk wrote:
>> Ok
I have the following code in Erlang to set a value and then add a
property. The first set works but the mutate fails. Can anyone
enlighten me?
Thanks
{ok, C} = thrift_client:start_link("127.0.0.1",9160, cassandra_thrift),
Key = "Key1",
%
% set first property
%
thrift_client:call( C,
Thanks.
According to your explanation, the result sounds reasonable.
Thanks again~~~
2010-04-30
Bingbing Liu
From: Sylvain Lebresne
Sent: 2010-04-30 20:54:04
To: user
Cc:
Subject: Re: why the sum of all the nodes' loads is much bigger than the size of the
inserted data?
I believe on
[r...@calculus apache-cassandra-0.6.1]# bin/cassandra -f
Can't start up: not enough memory
I am a beginner with Cassandra.
My installation is failing due to the above error on Linux.
Could anyone suggest a solution?
Thanks in advance.
On Fri, Apr 30, 2010 at 7:14 AM, Utku Can Topçu wrote:
> Hey All,
>
> I've been looking at the documentation and related articles about Cassandra
> and Hadoop integration, I'm only seeing ColumnFamilyInputFormat for now.
> What if I want to write directly to cassandra after a reduce?
Then you jus
compaction starts but never finishes.
are you inserting all these files into the same row? don't do that.
On Fri, Apr 30, 2010 at 3:04 AM, Spacejatsi wrote:
> I ran the test again, inserting 64 files (15-25MB per file) with 2 threads
> inserting one file at a time.
> First 30 files goes rel
each row has an index and bloom filter of column names, and then there
is the overhead of the java objects.
On Thu, Apr 29, 2010 at 11:05 PM, Andrew Nguyen
wrote:
> When making rough calculations regarding the potential size of a single row,
> what sort of overhead is there to consider? In other
Nope. You could write one using bin/sstablekeys though.
On Thu, Apr 29, 2010 at 8:58 PM, Carlos Sanchez
wrote:
> All,
>
> Does anyone know of a program (series of classes) that can capture the key
> distribution of the rows in a ColumnFamily, sort of a [sub] string-histogram.
>
> Thanks,
>
> Ca
https://issues.apache.org/jira/browse/CASSANDRA-1040
On Thu, Apr 29, 2010 at 6:55 PM, Joost Ouwerkerk wrote:
> Ok, I reproduced without mapred. Here is my recipe:
>
> On a single-node cassandra cluster with basic config (-Xmx:1G)
> loop {
> * insert 5,000 records in a single columnfamily with
Sounds like doing this w/o m/r with get_range_slices is a reasonable way to go.
On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu wrote:
> I'm currently writing collected data continuously to Cassandra, having keys
> starting with a timestamp and a unique identifier (like
> 2009.01.01.00.00.00.RAND
I believe one of the reasons is all the metadata. As far as I understand
what you said, you have 500 million rows, each having only one column.
The problem is that a row has a bunch of metadata: a bloom filter, a
column index, plus a few other bytes to store the number of columns, if
the row is
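Running the thread's own numbers gives a feel for how much that metadata (plus timestamps and any not-yet-compacted copies, whatever the exact breakdown) adds up to:

rows, rf = 500_000_000, 3
raw = rows * (20 + 110) * rf           # the expected 195 GB of key+column data
observed = 443 * 10 ** 9               # load reported by "nodetool ring"
print((observed - raw) / (rows * rf))  # ~165 extra bytes per stored row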
Don't forget to count the timestamps for each column.
2010/4/30 Bingbing Liu
> hi,
>
> thanks for your help.
>
> i run the nodetool -h compact
>
> but the load keeps the same; can anyone tell me why?
>
>
> 2010-04-30
> --
> Bingbing Liu
> --
hi,
thanks for your help.
i run the nodetool -h compact
but the load keeps the same; can anyone tell me why?
2010-04-30
Bingbing Liu
From: casablinca126.com
Sent: 2010-04-30 15:52:09
To: user@cassandra.apache.org
Cc:
Subject: Re: why the sum of all the nodes' loads is muc
Hey All,
I've been looking at the documentation and related articles about Cassandra
and Hadoop integration, I'm only seeing ColumnFamilyInputFormat for now.
What if I want to write directly to cassandra after a reduce?
What comes to my mind is, in the Reducer's setup I'd initialize a Cassandra
c
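A sketch of that idea using Hadoop Streaming, since only ColumnFamilyInputFormat exists at this point: the reducer reads sorted key/value lines from stdin and writes each aggregate straight to Cassandra over Thrift. The keyspace/CF names ('Keyspace1', 'Counts') and the tab-separated word-count shape of the input are assumptions:

import sys, time
from itertools import groupby
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra
from cassandra.ttypes import ColumnPath, ConsistencyLevel

transport = TTransport.TBufferedTransport(TSocket.TSocket('127.0.0.1', 9160))
client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
transport.open()

pairs = (line.rstrip('\n').split('\t', 1) for line in sys.stdin)
for key, group in groupby(pairs, key=lambda kv: kv[0]):
    total = sum(int(v) for _, v in group)  # streaming input arrives sorted by key
    client.insert('Keyspace1', key,
                  ColumnPath(column_family='Counts', column='total'),
                  str(total), int(time.time() * 1e6), ConsistencyLevel.ONE)
transport.close()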
Hi all,
We are writing an article for a magazine and would like to write about
monitoring, more precisely about nodetool, but did not find much written
about the tool.
I would appreciate some help, or a brief explanation of the nodetool
commands ...
Available commands: ring, info, cleanup, compact, cfstats, sna
hi,
It seems changing the INDEX_INTERVAL will conflict with
AntiEntropyService, right?
I will reconstruct my sstables.
Thank you, Jonathan!
cheers,
Cao Jiguang
--
casablinca126.com
2010-04-30
---
I ran the test again, inserting 64 files (15-25MB per file) with 2 threads
inserting one file at a time.
The first 30 files go in relatively fast, but then it jams, and finally
times out. This tpstats output was taken when the first timeout came.
I also tested splitting the files to a max of 5 MB per file. T
Two rows are never compared by the MD5 of their keys. The MD5 of a row key is
just used to choose which nodes of the cluster are responsible for the row.
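A sketch of that placement step (the RandomPartitioner derives its token from the MD5 digest of the key; the exact byte handling here is an approximation):

import hashlib

def placement_token(key):
    # Only node placement uses this value; rows are always identified by
    # their full key, so a token collision would merely put two distinct
    # rows on the same replica nodes.
    digest = hashlib.md5(key.encode('utf-8')).digest()
    return abs(int.from_bytes(digest, byteorder='big', signed=True))

print(placement_token('Key1'))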
On Fri, Apr 30, 2010 at 5:37 AM, Mark Jones wrote:
> MD5 is not a perfect hash, it can produce collisions, how are these dealt
> with?
>
> Is t
hi,
Have you ever run anti-compaction (more than once, maybe), but never run
cleanup on the anti-compaction node?
cheers,
Cao Jiguang
2010-04-30
casablinca126.com
From: Bingbing Liu
Sent: 2010-04-30 15:24:45
To: user
Cc:
Subject: why the sum of all the nodes' loads is much bigger
I inserted 500,000,000 rows, each of which has a key of 20 bytes and a
column of 110 bytes,
and the replication factor is set to 3, so I expect the load of the cluster
to be 0.5 billion * 130 * 3 = 195 GB.
But in fact the load I get through "nodetool -h localhost ring" is about
443 GB.
Here is the ticket: https://issues.apache.org/jira/browse/CASSANDRA-1039
Thanks, Roland
2010/4/29 Jonathan Ellis
> 2010/4/29 Roland Hänel :
> > Imagine the following rule: if we are in doubt whether to repair a column
> > with timestamp T (because two values X and Y are present within the
> clu
Thanks Ellis,
so the common scenario is to store data in one CF and any index (inverted?)
in another CF?
2010/4/30 Jonathan Ellis
> the correct data model is one where you can pull the data you want out
> as a slice of a row, or (sometimes) as a slice of sequential rows.
> usually this involv
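A common shape for that pattern (illustrative names, not from this thread):

Items CF:       row key = item id        -> columns = the item's fields
ItemsByTag CF:  row key = indexed value  -> column names = item ids, values empty

A query then slices one ItemsByTag row to collect the matching ids and
fetches those rows from Items.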