Does Cassandra make any guarantees on the outcome of a scenario like this:
Two clients insert the same key/column with different values at the same
time:
client A does insert(keyspace, key_1,
column_name_1, value_A, timestamp_1, consistency_level.QUORUM)
client B does insert(keyspace, key_1,
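(For illustration only, here is a minimal sketch of that scenario against the 0.6
Thrift API; the keyspace and column family names are placeholders, not taken from
the original mail, and in the real case the two inserts would come from two
separate clients rather than one process:)

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SameTimestampInsert {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));

        ColumnPath path = new ColumnPath("Standard1");      // placeholder column family
        path.setColumn("column_name_1".getBytes("UTF-8"));
        long timestamp = 1L;                                 // deliberately identical

        // "client A" and "client B": different values, same timestamp, QUORUM writes.
        client.insert("Keyspace1", "key_1", path, "value_A".getBytes("UTF-8"),
                      timestamp, ConsistencyLevel.QUORUM);
        client.insert("Keyspace1", "key_1", path, "value_B".getBytes("UTF-8"),
                      timestamp, ConsistencyLevel.QUORUM);

        transport.close();
    }
}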
Thanks Jonathan, Brandon and Peter for your quick response. I'm going to test
the issue's workaround. Also I will test "batch mode" instead of "periodic
mode" for the commit log and I'll keep you informed.
Thanks!
Daniel Gimenez.
There are other threads linked to this issue. Most notably, I think we're
hitting
https://issues.apache.org/jira/browse/CASSANDRA-1014
here.
2010/4/27 Schubert Zhang
> Seems:
>
> ROW-MUTATION-STAGE 32 3349 63897493
> is the clue, too many mutation requests are pending.
>
>
> Yes,
Thanks all!
The reason I was thinking of having two keyspaces is that I expect them to
evolve at different rates. Our normal column families will change rarely
(hopefully never) but our index column families will change whenever we want
to query the data in a new way that isn't supported by the c
When I change the cluster name in storage-conf.xml, the CLI complains that
the cluster name doesn't equal "Test Cluster".
How do I change the cluster name that the CLI looks for?
> From what you've all said, it doesn't seem like it's worth it.
No. But you will want to follow
https://issues.apache.org/jira/browse/CASSANDRA-1007
>
> On Wed, Apr 28, 2010 at 1:13 AM, Mark Robson wrote:
>>
>> I can't see any advantage in using multiple keyspaces. It is highly
>> unlikel
I think that even though the real deletion happens during compaction,
get/get_range_slices should not return the deletion-marked keys (or
columns).
Schubert
On Wed, Apr 28, 2010 at 1:39 PM, Jeff Zhang wrote:
> Thanks Lu, it's helpful.
>
>
> On Wed, Apr 28, 2010 at 11:42 AM, Greg Lu wrote:
> > He
I don't think a secondary index is necessary for the Cassandra core; at least
it is not urgent.
I think the most urgent improvements for Cassandra currently are:
1. re-clarify the data model.
2. re-implement the storage and index; in particular, the current SSTable
implementation is not good.
In fact, the curre
I think, at least currently, we should leave the logic of the current
SuperColumn and additional indexing features to the application layer rather
than the Cassandra core.
On Wed, Apr 28, 2010 at 6:44 PM, Schubert Zhang wrote:
> I don't think secondary index is necessary for cassandra core, at least it
> is not urge
Hi,
The compaction process is very slow when the size of the newly generated
sstable file grows beyond 25GB;
meanwhile, the garbage collector is running frequently.
First, a question: is there a limit on the sstable
size? If not, is a 2GB heap size not
enough f
OK, I have solved my problems with the Cassandra data model. Now I am using
Column Families of type Super and SuperColumns with many columns inside.
Thanks!
2010/4/16 Julio Carlos Barrera Juez
> Hi again,
>
> First of all, obviously, I have omitted the timestamps to make easy the
> representation,
Hi all!
I am using org.apache.cassandra.auth.SimpleAuthenticator to enable
authentication in my cluster with one node (with Cassandra 0.6.1). I have
put:
<Authenticator>org.apache.cassandra.auth.SimpleAuthenticator</Authenticator>
in the storage-conf.xml file, and:
keyspace=username
in the access.properties file, and:
username=password
in
If I understand correctly, the distinction between supercolumns and
subcolumns is critical to good database design if you want to use random
partitioning: you can do range queries on subcolumns but not on
supercolumns.
Is this correct?
On Mon, Apr 26, 2010 at 7:11 PM, Jonathan Ellis wrote:
> I
> OK, I have solved my problems with Cassandra data model. Now I am using
> Column Families of type Super and SuperColumns with many columns inside.
You need to be aware of the third point of
http://wiki.apache.org/cassandra/CassandraLimitations.
That is, super columns are not indexed. Which means
Hi,
I have a question: if a row in a Column Family has only (standard) columns,
are all of the columns deserialized into memory when you need any of
them? As I understood it, that is the case; and if the Column Family is a super
Column Family, then only the (entire) Super Column is brought into memory?
Wh
Hey folks! I found out about Cassandra just yesterday. I took the dive. I
have Cassandra and Java installed on my Ubuntu box on my Rackspace Cloud
server.
I am having a hard time getting things going. I am so used to Relational
Databases such as MySQL and MSSQL that I really do not know where to s
did you check the log for exceptions?
On Wed, Apr 28, 2010 at 12:08 AM, Bingbing Liu wrote:
> but the situation is that, at the beginning everything goes well, then when
> the get_range_slices gets about 13,000,000 rows (set the key range to 2000)
>
> the exception happens.
>
> and when i do the
2010/4/28 Даниел Симеонов :
> Hi,
> I have a question about if a row in a Column Family has only columns
> whether all of the columns are deserialized in memory if you need any of
> them? As I understood it is the case,
No, it's not. Only the columns you request are deserialized in memory. The o
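(As a hedged illustration of that point on the client side, a 0.6 Thrift sketch
that asks for two named columns only; the keyspace, column family and column names
are placeholders:)

import java.util.Arrays;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class NamedColumnsRead {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));

        // Name the columns we care about; only these are returned to the client.
        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(Arrays.asList("name".getBytes("UTF-8"),
                                                "email".getBytes("UTF-8")));

        List<ColumnOrSuperColumn> result = client.get_slice(
                "Keyspace1", "key_1", new ColumnParent("Standard1"),
                predicate, ConsistencyLevel.QUORUM);
        for (ColumnOrSuperColumn cosc : result)
            System.out.println(new String(cosc.getColumn().getName(), "UTF-8"));

        transport.close();
    }
}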
Hi,
Yesterday I saw a lot of discussion about how to store a (big) file. It
looks like the suggestion is to store it in multiple rows (rather than in
multiple columns in a single row).
My question is:
Is there a best maximum column size that can help with the decision on
the segment size? Is
Hi,
Here are some links I collected:
1. http://wiki.apache.org/cassandra/CassandraCli: this is how to bring it up
and run it
2. http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model is very
good to start to understand the schema
3. http://codeqm.blogspot.com/2010/03/jav
Hi Sylvain,
Thank you very much! I still have some further questions: I didn't find
how the row cache is configured. Regarding the splitting of rows, I
understand that it is not so necessary; still, I am curious whether it is
implementable in the client code.
Best regards, Daniel.
2010/4/28 Syl
Hi all,
I'm trying to run a scenario of adding files from a specific folder to Cassandra.
Right now I have 64 files (about 15-20 MB per file) and about 1GB of data overall.
I'm able to insert around 40 files, but after that Cassandra goes into some
GC loop and I finally get a timeout on the client.
It
2010/4/28 Даниел Симеонов :
> Hi Sylvain,
> Thank you very much! I still have some further questions, I didn't find
> how row cache is being configured?
Provided you don't use trunk but something stable like 0.6.1 (which
you should),
it is in storage-conf.xml. It's one option of the definition o
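(For what it's worth, in the 0.6 storage-conf.xml it shows up as attributes on the
ColumnFamily element, roughly like the excerpt below; the names and numbers are
placeholders, and the exact attribute names may differ slightly between releases:)

<Keyspace Name="Keyspace1">
  <!-- RowsCached / KeysCached take an absolute count or a percentage such as "20%" -->
  <ColumnFamily Name="Standard1"
                CompareWith="BytesType"
                RowsCached="10000"
                KeysCached="200000"/>
</Keyspace>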
I also want to know
2010/4/28 David Boxenhorn
> When I change the cluster name in storage-conf.xml, the CLI complains that
> the cluster name doesn't equal "Test Cluster".
>
> How do I change the cluster name that the CLI looks for?
>
On Tue, Apr 27, 2010 at 4:54 PM, S Ahmed wrote:
> Just trying to get my head wrapped around everything here, so bear with me
> :)
> So Thrift can spit out generated code for any language, be it C#, Java or
> python etc.
> Hector is a higher level wrapper around the java generated code by Thrift.
>
If you're running so close to the edge of running out of memory that
creating an ln process pushes you over the edge, you should fix the
broader cause instead of the specific symptom. :)
On Tue, Apr 27, 2010 at 10:09 PM, Lee Parker wrote:
> So, after reading the thread which Eric posted earlier, I
new try, previous went to wrong place...
Hi all,
I'm trying to run a scenario of adding files from a specific folder to Cassandra.
Right now I have 64 files (about 15-20 MB per file) and about 1GB of data overall.
I'm able to insert around 40 files, but after that Cassandra goes into some
GC loop and I
Is there a Cassandra Navigator, or some way that I can see the data in
Cassandra if I don't know what the keys are?
Hello,
our company has a huge table in a relational database which keeps statistics
on some financial operations.
It looks like the following:
SERVER_ID - the server which served the transaction
ACCOUNT_FROM - account1
ACCOUNT_TO - account2
HOUR - time range for this statistics row (from 0 minutes
2010/4/28 Roland Hänel :
> Two clients insert the same key/column with different values at the same
> time:
>
> client A does insert(keyspace, key_1,
> column_name_1, value_A, timestamp_1, consistency_level.QUORUM)
> client B does insert(keyspace, key_1,
> column_name_1, value_B, timestamp_1,
I don't think you are missing anything. You'll have to pick your poison.
FWIW, if each BAR has relatively few fields then supercolumns aren't
bad. It's when a BAR has dynamically growing numbers of fields
(subcolumns) that you get in trouble with that model.
On Tue, Apr 27, 2010 at 4:24 PM, Jon
It sounds like either there is a fairly obvious bug, or you're doing
something wrong. :)
Can you reproduce against a single node?
On Tue, Apr 27, 2010 at 5:14 PM, Joost Ouwerkerk wrote:
> Update: I ran a test whereby I deleted ALL the rows in a column
> family, using a consistency level of ALL.
Yes, incremental mode is definitely contraindicated for Cassandra.
On Wed, Apr 28, 2010 at 1:07 AM, Peter Schuller
wrote:
>> -XX:+CMSIncrementalMode \
>> -XX:+CMSIncrementalPacing \
>
> This may not be an issue given your other VM opts, but just FYI I
> have had some difficulty m
The thing is, that I'm not running close to being out of memory. The data
from nodetool info is showing that only about half of the available heap
space is being used and running free from the command line shows that I have
plenty of RAM available and some usage of the 1G swap space which is alway
Compaction time is proportional to the size of the sstable, yes. Not
sure how it could be otherwise. And it does generate a lot of
garbage. So unless you are seeing concurrent failures in the GC and
corresponding large pause times, your heap should be fine, as long as
the rows you are compacting
Thanks Jonathan, that gets exactly to the heart of my question. Unfortunately
it kills my original idea to implement a "unique transaction identifier
creation algorithm" - for this, even eventual consistency would be
sufficient, but I would need to know if I am consistent at the time of a
read request
Hi,
What if the upper bound on the number of columns in a row is loosely defined, i.e.
it is OK to have a maximum of around 100, for example, but not exactly
(maybe 105, 110)?
What if I make a slice query to return, say, 1/5th of the columns in a row? I
believe that such a query again will not deserialize
On Tue, Apr 27, 2010 at 10:49 PM, Jeff Zhang wrote:
> Mark,
>
> Thanks for your suggestion, It's really not a good idea to store one
> file in multiple columns in one row. The heap space problem will still
> exist. And I take your advice to store it in multiple rows, it works,
> I can even store
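(To make the "multiple rows" approach concrete, here is a hedged sketch against the
0.6 Thrift API that writes one fixed-size chunk per row, with the row key
"<file name>:<chunk index>"; the "FileChunks" column family, the "data" column and
the 1 MB chunk size are all placeholder choices, not from the original thread:)

import java.io.FileInputStream;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ChunkedFileStore {
    private static final int CHUNK_SIZE = 1024 * 1024;   // 1 MB per row; an arbitrary choice

    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));

        ColumnPath path = new ColumnPath("FileChunks");   // placeholder column family
        path.setColumn("data".getBytes("UTF-8"));

        FileInputStream in = new FileInputStream(args[0]);
        byte[] buf = new byte[CHUNK_SIZE];
        int read, chunk = 0;
        while ((read = in.read(buf)) > 0) {
            byte[] value = new byte[read];
            System.arraycopy(buf, 0, value, 0, read);
            // One row per chunk keeps any single column (and row) small.
            client.insert("Keyspace1", args[0] + ":" + chunk++, path, value,
                          System.currentTimeMillis(), ConsistencyLevel.QUORUM);
        }
        in.close();
        transport.close();
    }
}

Reassembling the file is just reading the rows back in chunk order; a small
"manifest" row holding the chunk count makes that part easier.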
> One last question (sorry to bother you): isn't the behavior of read repair
> strictly deterministic in this case? You say both read requests could try to
> read repair the result (each time in the opposite direction). Inside the
> read repair algorithm, when we have exactly the same timestamps, w
> Hi,
> What about if the upper bound of columns in a row is loosely defined, i.e.
> it is ok that we have maximum of around 100 for example, but not exactly
> (maybe 105, 110)?
> What if I make a slice query to return say 1/5th of the columns in a row, I
> believe that such query again will not
-XX:CMSInitiatingOccupancyFraction=75
It's at 90 by default on most systems. Turning this down to the above
would trigger the CMS collection when the old generation is 75% full as opposed
to 90%.
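(If it helps, in cassandra.in.sh that looks roughly like the lines below; the
UseCMSInitiatingOccupancyOnly flag is my addition here, since without it the JVM
only treats the fraction as an initial hint:)

JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"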
-Nate
On Wed, Apr 28, 2010 at 7:44 AM, Jonathan Ellis wrote:
> Yes, incremental mode is definitely contraind
I think your file (as a Cassandra column value) is too large.
And I also think Cassandra is not good at storing files.
On Wed, Apr 28, 2010 at 10:24 PM, Jussi P?öri
wrote:
> new try, previous went to wrong place...
>
> Hi all,
>
> i'm trying to run a scenario of adding files from specific folder to
>
Your schema design is an RDBMS schema, not a Cassandra schema.
On Thu, Apr 15, 2010 at 11:44 PM, Miguel Verde wrote:
> Just to nitpick your representation a little bit, columnB/etc... are
> supercolumnB/etc..., key1/etc... are column1/etc..., and you can probably
> omit valueA/valueD designations
On Wed, Apr 28, 2010 at 5:24 AM, David Boxenhorn wrote:
> If I understand correctly, the distinction between supercolumns and
> subcolumns is critical to good database design if you want to use random
> partitioning: you can do range queries on subcolumns but not on
> supercolumns.
>
> Is this co
I was thinking this too, but I think that the overall insert amount is
not that big.
Data is basically map data, and the files are map tiles, which I can
easily make smaller.
We are currently using this data from multiple nodes (GRID), but we want
to get rid of the file system hassle (basically sam
On 4/26/10 2:44 AM, dir dir wrote:
Suppose I have a 15 MB MPEG video file. To save this video file into the
Cassandra database I will store
this file as an array of bytes. One day, I feel this video is not
necessary anymore,
therefore I delete it from the database. My question is, after I
delete this
There is no column size limitation. As for performance due to the size of a
column, with the speeds that Cassandra is running at, I don't believe it
would make a bit of difference whether it was 1 byte or a million bytes.
Can anyone here prove me right or wrong?
Regards,
Michael
On Wed, Apr 28,
When I tried to build Cassandra with the patch applied I got the error copied
below. I am trying to build the 0.6 version. Is that patch for this version?
build-project:
[echo] apache-cassandra: /opt/cassandra/build.xml
[javac] Compiling 315 source files to /opt/cassandra/build/classes
Hello. I am using Cassandra 0.6.1 on ubuntu 8.04. 3 node cluster.
I notice that when I start making lots of read requests (serially), memory
usage of jsvc keeps climbing until it uses up all memory on the server (happens
for all 3 servers in the cluster). At that point, the box starts swappin
On Wed, Apr 28, 2010 at 12:12 PM, Kyusik Chung wrote:
> Hello. I am using Cassandra 0.6.1 on ubuntu 8.04. 3 node cluster.
>
> I notice that when I start making lots of read requests (serially), memory
> usage of jsvc keeps climbing until it uses up all memory on the server
> (happens for all 3
Hi Ryan,
Do you mean these settings, or other settings?
64
32
8
64
64
256
0.3
60
Thanks!
Kyusik Chung
On Apr 28, 2010, at 12:28 PM, Ryan King wrote:
> On Wed, Apr 28, 2010 at 12:12 PM, Kyusik Chung
> wrote:
>> Hello. I am using Cassandra 0.6.1 on ubuntu 8.04. 3 node cluster.
>>
>> I noti
It might make sense to create a CompositeType subclass of AbstractType for
the purpose of constructing and comparing these kinds of "composite" column
names, so that you could more easily do that sort of thing rather than
having to concatenate everything into one big string.
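(A rough sketch of the idea, not an actual Cassandra class: a comparator that
treats a column name as two UTF-8 parts separated by a 0x00 byte and compares them
part by part. The exact AbstractType methods differ between versions, so take the
signatures below as illustrative:)

import org.apache.cassandra.db.marshal.AbstractType;

public class CompositeType extends AbstractType
{
    // Split "part1\0part2" into its two parts; a name without a separator
    // is treated as having an empty second part.
    private static String[] split(byte[] name)
    {
        try
        {
            String s = new String(name, "UTF-8");
            int i = s.indexOf('\u0000');
            return i < 0 ? new String[]{ s, "" }
                         : new String[]{ s.substring(0, i), s.substring(i + 1) };
        }
        catch (java.io.UnsupportedEncodingException e)
        {
            throw new RuntimeException(e);
        }
    }

    public int compare(byte[] o1, byte[] o2)
    {
        String[] a = split(o1);
        String[] b = split(o2);
        int c = a[0].compareTo(b[0]);
        return c != 0 ? c : a[1].compareTo(b[1]);
    }

    public String getString(byte[] bytes)
    {
        String[] parts = split(bytes);
        return parts[0] + ":" + parts[1];
    }
}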
On Wed, Apr 28, 2010 at 10:25 A
On Wed, Apr 28, 2010 at 3:17 AM, David Boxenhorn wrote:
> When I change the cluster name in storage-conf.xml, the CLI complains that
> the cluster name doesn't equal "Test Cluster".
What do you mean? I don't see any checks for cluster name equality in
the CLI code.
--
Jonathan Ellis
Project Ch
On Wed, Apr 28, 2010 at 3:17 AM, David Boxenhorn wrote:
> When I change the cluster name in storage-conf.xml, the CLI complains that
> the cluster name doesn't equal "Test Cluster".
>
> How do I change the cluster name that the CLI looks for?
>
I don't think you mean the CLI, but the Cassandra d
On 4/28/10 2:47 AM, Schubert Zhang wrote:
I think that even though the real deletion happens during compaction,
get/get_range_slices should not return the deletion-marked keys (or
columns).
http://wiki.apache.org/cassandra/FAQ#range_ghosts
=Rob
Yes! Reproduced on single-node cluster:
10/04/28 16:30:24 INFO mapred.JobClient: ROWS=274884
10/04/28 16:30:24 INFO mapred.JobClient: TOMBSTONES=951083
10/04/28 16:42:49 INFO mapred.JobClient: ROWS=166580
10/04/28 16:42:49 INFO mapred.JobClient: TOMBSTONES=1059387
On Wed, Apr 28,
Ah, now I understand. Supercolumns it is.
On Wed, Apr 28, 2010 at 9:40 AM, Jonathan Ellis wrote:
> I don't think you are missing anything. You'll have to pick your poison.
>
> FWIW, if each BAR has relatively few fields then supercolumns aren't
> bad. It's when a BAR has dynamically growing nu
This sounds similar to a /proc/sys/vm/swappiness misconfiguration. Is it zero
or close to zero? If setting it to 0 solves your problem, make sure all your
nodes get this:
/etc/sysctl.conf:
vm.swappiness=0
On Wed, Apr 28, 2010 at 12:12 PM, Kyusik Chung wrote:
> Hello. I am using Cassandra 0.6.1 on
Hi, I currently can't build Cassandra from the source repository.
I get a bunch of checksum issues like:
[ivy:retrieve] problem while downloading module descriptor:
http://repo1.maven.org/maven2/org/apache/apache/5/apache-5.pom:
invalid sha1: expected=��
[ivy:retrieve] @��/:n+�?���p_��/
[iv
OK, so the issue seems to be that the Maven repo's web server (nginx)
sends files through gzipped regardless of whether or not the client
requested that.
Unfortunately I can't work out how to share this information with Ivy.
Switching to the Ibiblio repository leads to another set of problems.
On Thu,
Isn't setting swappiness to a lower value a good idea only if you know you have
the physical RAM to support it? What I'm observing on my box is that jsvc uses
up all the physical RAM. Its VM size is 4-5GB right now (not sure if it will
continue to grow).
Apologies if I'm misunderstanding how the
http://www.reddit.com/r/programming/comments/bcqhi/reddits_now_running_on_cassandra/
It seems to me that they are still using Cassandra in the persistent storage
layer as a replacement for memcachedb, not in the cache layer.
I'm new here with Cassandra actually, but now I'm also curious about the
possibil
Facebook did a lot of work to keep their huge memcache
cluster consistent and fault-tolerant.
I think a cache infrastructure like Cassandra would make that a lot easier.
On Thu, Apr 29, 2010 at 11:54 AM, Lisen Mu wrote:
>
> http://www.reddit.com/r/programming/comments/bcqhi/reddits_now_running_o
On Wed, Apr 21, 2010 at 10:08 PM, Oleg Anastasjev wrote:
> Hello,
>
> I am testing how cassandra behaves on single node disk failures to know
> what to
> expect when things go bad.
> I had a cluster of 4 cassandra nodes, stress loaded it with client and made
> 2
> tests:
> 1. emulated disk failure
use get_range_slices, with a start key of '', and page through it
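(A hedged sketch of that paging loop against the 0.6 Thrift API; the keyspace,
column family and page size are placeholders:)

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class KeyBrowser {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));

        // Return at most the first 10 columns of each row, just to peek at the data.
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));

        String start = "";                                 // '' means "from the beginning"
        int pageSize = 1000;
        while (true) {
            KeyRange range = new KeyRange();
            range.setStart_key(start);
            range.setEnd_key("");
            range.setCount(pageSize);
            List<KeySlice> page = client.get_range_slices(
                    "Keyspace1", new ColumnParent("Standard1"),
                    predicate, range, ConsistencyLevel.ONE);
            for (KeySlice ks : page)
                System.out.println(ks.getKey());
            if (page.size() < pageSize)
                break;                                     // no more pages
            // The next page starts at (and repeats) the last key we saw.
            start = page.get(page.size() - 1).getKey();
        }
        transport.close();
    }
}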
On Wed, Apr 28, 2010 at 9:26 AM, David Boxenhorn wrote:
> Is there a Cassandra Navigator, or some way that I can see the data in
> Cassandra if I don't know what the keys are?
>
--
Jonathan Ellis
Project Chair, Apache Cassandra
Interesting. Googling your error turns up
http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error12-cannot-allocate-memory-calling-runt
Why not just leave the swap on? It's usually a Good Thing to be able
to page out unused memory, and use the ram for buffer cache inste
2010/4/28 Roland Hänel :
> Thanks Jonathan, that hits exactly the heart of my question. Unfortunately
> it kills my original idea to implement a "unique transaction identifier
> creation algorithm" - for this, even eventual consistency would be
> sufficient, but I would need to know if I am consist
Good! :)
Can you reproduce w/o map/reduce, with raw get_range_slices?
On Wed, Apr 28, 2010 at 3:56 PM, Joost Ouwerkerk wrote:
> Yes! Reproduced on single-node cluster:
>
> 10/04/28 16:30:24 INFO mapred.JobClient: ROWS=274884
> 10/04/28 16:30:24 INFO mapred.JobClient: TOMBSTONES=951083
>
key: stock ID, e.g. AAPL+year
column family: closing price and volume, two CFs.
column name: timestamp, LongType
AAPL+2010-> CF:closingPrice -> {'04-13' : 242, '04-14': 245}
AAPL+2010-> CF:volume -> {'04-13' : 242, '04-14': 245}
On Thu, Apr 22, 2010 at 2:00 AM, Miguel Verde wrote:
> On Wed, Ap
I found Hector is not a good design.
1. We cannot create multiple threads (each thread with its own connection to
the Cassandra server) against one Cassandra server.
As we know, a Cassandra client should usually be multi-threaded to
achieve good throughput.
2. The implementation is too fat.
3. Introduce
Hi Schubert, I'm sorry Hector isn't a good fit for you, so let's see what's
missing for you.
On Thu, Apr 29, 2010 at 8:22 AM, Schubert Zhang wrote:
> I found hector is not a good design.
>
> 1. We cannot create multiple threads (each thread have a connection to
> cassandra server) to one cassan