If it cannot protect against lost updates, isn't that an issue? How is the client
supposed to protect against concurrency? I see a lot of users mentioning the
use of Cages (i.e. ZooKeeper), but taking locks on every write at the
application level is certainly not acceptable. And again, the applica
Hi,
How would you use rsync instead of repair in case of a node failure?
Rsync all files from the data directories of the adjacent nodes
(which are part of the quorum group) and then run a compaction, which
will remove all the unneeded keys?
Thanks,
Thibaut
On Thu, Feb 24, 2011 at 4:22 AM,
Himanshi,
you could try adding your public IP address to an internal interface and
DNAT the packets to it. This shouldn't give you any problems with your
normal traffic. Tell Cassandra to listen on the public IPs and it should
work.
Linux commands would be:
# Create an internal interface using b
I don't think I got the point of your question. But if you are thinking
about key indexes (like PKs), keep in mind that Cassandra manages
keys using the partitioning strategy. By doing so, it is able to
determine which node should hold the row with a given key.
So, in other words, inside
My 2 cents ..
1. Focus should be on the core problem Cassandra is solving, i.e.
Availability, Partitioning and a form of consistency that works (in spite of
all the questions). All this with high performance is a huge step forward,
architecturally!
2. Any enhancement should shore up the core valu
Thanks Daniel.
But the SNAT command is not working, and when I try tcpdump it gives:
[root@ip-10-136-75-201 ~]# tcpdump -i 50.18.60.117 -n port 7000
tcpdump: Invalid adapter index
I'm not able to figure out what this is.
Thanks,
Himanshi
From: Daniel van Ham Colchete
To: user@cassandra.apache.org
Da
First of all, in your example, is W=CL?
If so, then the success of any read / write operation will be
determined by whether the required CL can be satisfied at that moment.
If you write with CL ONE to a CF with RF 3 when 1 of the
replica nodes is down, then the operation will succeed and HintedHandOff
To the list owners - the error text that gmail comes back with is below
Now I understand that much of what I write is spam quality, so the mail
filter might actually be smart ;).
New posts work, as this one hopefully will. It is on reply that I have a
problem. Any pointers to avoid this situatio
>>c. Read with CL = QUORUM. If read hits node1 and node2/node3, new data
>>that was written to node1 will be returned.
>>In this case - N1 will be identified as a discrepancy and the change will
>>be discarded via read repair
>>[Naren] How will Cassandra know this is a discrepancy?
Because at Q - on
On Thu, Feb 24, 2011 at 3:22 AM, Anthony John wrote:
> Apologies : For some reason my response on the original mail keeps bouncing
> back, thus this new one!
> > On the other hand, the same article says:
> > "For conditional writes to work, the condition must be evaluated at all
> > update
> > si
Himanshi,
my bad, try this for iptables:
# SNAT outgoing connections
iptables -t nat -A POSTROUTING -p tcp --dport 7000 -d 175.41.143.192 -j SNAT --to-source INTERNALIP
As for tcpdump, the argument to the -i option is the interface name (eth0,
cassth0, etc.), not the IP. So, it should be
t
Hi
I'm using a 3-node cluster of Cassandra 0.6.1 together with Hector as the
Java client API.
Every few days I get into a situation where I cannot connect to Cassandra;
in addition, the data dir fills up the whole disk and the
synchronization stops at these times. The exceptions I get are
On Thu, Feb 24, 2011 at 4:08 AM, Thibaut Britz
wrote:
> Hi,
>
> How would you use rsync instead of repair in case of a node failure?
>
> Rsync all files from the data directories of the adjacent nodes
> (which are part of the quorum group) and then run a compaction, which
> will remove all the
have you tried replying without copying in the entire conversation
thread to the message?
On Thu, Feb 24, 2011 at 1:40 PM, Anthony John wrote:
> To the list owners - the error text that gmail comes back with is below
> Now I understand that much of what I write is spam quality, so the mail
> filt
Do not copy the entire thread, only hit reply!
It seems that as the thread grows in responses, the spam word count somehow kicks
in.
Thx,
-JA
On Thu, Feb 24, 2011 at 9:44 AM, Sasha Dolgy wrote:
> have you tried replying without copying in the entire conversation
> thread to the message?
>
> On Thu
Hello,
Have there been Cassandra implementations in non-Latin languages? In
particular: Mandarin (China), Devanagari (India), Korean (Korea).
I am interested in finding out whether there are storage, sorting or other
types of issues one should be aware of with these languages.
Thanks.
>
> >>Time stamps are not used for conflict resolution - unless it is part of
> >>the application logic!!!
>
>>What is your definition of conflict resolution? Because if you update
>>the same column twice (which
>>I'll call a conflict), then the timestamps are used to decide which update
>>wins (which I
On Thu, Feb 24, 2011 at 5:34 PM, Anthony John wrote:
> >>Time stamps are not used for conflict resolution - unless it is part of
> >>the application logic!!!
>>
>
> >>What is your definition of conflict resolution? Because if you update
> >>the same column twice (which
> >>I'll call a conflict), the
>Time stamps are not used for conflict resolution - unless it is part of the
>application logic!!!
This is false. In fact, the main reason Cassandra keeps timestamps is to do
conflict resolution. If there is a conflict between two replicas, when doing
a read or a repair, then the highest timestamp
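A minimal sketch of the timestamp rule described above, assuming each replica's copy of a column can be modelled as a (value, timestamp) pair. `Version` and `reconcile` are hypothetical names for illustration, not Cassandra's internal classes, and the sketch deliberately ignores timestamp ties and deletions.

```python
from typing import List, NamedTuple


class Version(NamedTuple):
    """One replica's copy of a column (hypothetical model, not Cassandra's classes)."""
    value: str
    timestamp: int  # client-supplied, e.g. microseconds since the epoch


def reconcile(versions: List[Version]) -> Version:
    """Last-write-wins: return the version carrying the highest timestamp.

    Ties and deletions (tombstones) are ignored in this sketch.
    """
    return max(versions, key=lambda v: v.timestamp)


# Two clients updated the same column; the replicas hold different copies.
replica_copies = [
    Version("balance=100", 1298563200000000),
    Version("balance=50", 1298563200000500),  # written slightly later
    Version("balance=100", 1298563200000000),
]
print(reconcile(replica_copies).value)  # balance=50
```

Note that the rule only picks a winner: if two clients both read a balance of 100 and each wrote back an updated value, the write with the lower timestamp is silently discarded, which is the lost-update concern raised earlier.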
If you are correct (and you are probably closer to the code), then a CL of
QUORUM does not guarantee consistency.
On Thu, Feb 24, 2011 at 10:54 AM, Sylvain Lebresne wrote:
> On Thu, Feb 24, 2011 at 5:34 PM, Anthony John wrote:
>
>> >>Time stamps are not used for conflict resolution - unless it is
On Thu, Feb 24, 2011 at 6:01 PM, Anthony John wrote:
> If you are correct and you are probably closer to the code - then CL of
> Quorum does not guarantee consistency.
If the operation succeeds, it does (for some definition of consistency, which
is: subsequent reads at QUORUM will be guaranteed
Completely understand!
All that I am quibbling over is whether a CL of quorum guarantees
consistency or not. That is what the documentation says, right? If, for a CL
of Q read, it depends on which node returns the read first to determine the
actual returned result, or on other more convoluted conditions
Generally no. But yes if retrieving the key through index is faster than
going through the hash buckets.
Currently I am thinking there could be hundreds of millions or billions of rows,
and in that case, if we have to retrieve a row, which one will be faster: going
through the hash buckets or the index? I am thinkin
On Thu, Feb 24, 2011 at 6:33 PM, Anthony John wrote:
> Completely understand!
>
> All that I am quibbling over is whether a CL of quorum guarantees
> consistency or not. That is what the documentation says - right. IF for a CL
> of Q read - it depends on which node returns read first to determine
Another possibility is this:
why not set up 2 nodes in 1 region in 1 AZ, and get that to work.
Then, open a third node in the same region, but different AZ, and get that
to work.
Then, once you have that working, open a fourth node in a different region
and get that to work.
Seems like taking a pi
If you mean does it make sense to have a CF where each row contains a set of
keys to other rows in another CF, then yes, that's a common design pattern,
although usually it's because you're creating collections of those rows
(i.e. a Groups CF where each row consists of a set of keys to rows in the
Gotcha, I had forgotten about the gossip piece, that makes sense.
-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, February 23, 2011 5:00 PM
To: Truelove, Jeremy: IT (NYK)
Cc: user@cassandra.apache.org
Subject: Re: Multiple Seeds
On Wed, Feb 23, 201
What I am trying to ask is: what if there are billions of row keys (e.g.
abc, def, xyz in the example below) and a client does a lookup/query on a row, say
xyz (get all columns for row xyz)? Since there are billions of rows, is a lookup
using the hash mechanism going to be slow? What algorithm will be
Not sure if there is a particular reason for you using different regions,
but Amazon states that each zone is a different physical location completely
separate from others, e.g. us-east-1a and us-east-1b. Using the Amazon
internal IPs (10.x, etc.) reduces latency greatly by not going outbound
throu
I see the point - apologies for putting everyone through this!
It was just militating against my mental model.
In summary, here is my take away - simple stuff but - IMO - important to
conclude this thread (I hope):-
1. I was splitting hairs over a failed (partial) Q write. Such an event
should b
Javier Canillas wrote:
>
> Instead, when you execute the same OP using CL QUORUM, then it means
> RF/2+1, it will try to write on the coordinator node and replica.
> Considering only 1 replica is down, the OP will succeed too.
>
I am assuming even a read will succeed with CL QUORUM and RF=3 and
I really don't see the point. Again, suppose a cluster with 3 nodes, where
there is a ColumnFamily holding data whose keys each consist of
a 2-letter word (pretty simple). That makes a total of 729 possible
keys.
RandomPartitioner will then tokenize each key and assign them to
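The idea can be sketched as follows, assuming MD5 as the token function (an approximation of RandomPartitioner, not its exact code) and three nodes with evenly spaced initial tokens. The node names, the token values and the 27-letter alphabet (chosen here so the key count matches the 729 above) are assumptions for illustration. Each key is owned by the first node whose token is greater than or equal to the key's token, wrapping around the ring.

```python
import hashlib
from bisect import bisect_left
from collections import Counter
from itertools import product

RING_SIZE = 2 ** 127  # RandomPartitioner's token space is roughly [0, 2**127]
ALPHABET = "abcdefghijklmnñopqrstuvwxyz"  # 27 letters -> 27 * 27 = 729 keys


def token(key: str) -> int:
    """Approximate a RandomPartitioner token: the MD5 digest of the key as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % RING_SIZE


# Hypothetical 3-node ring with evenly spaced initial tokens (sorted ascending).
NODES = ["node1", "node2", "node3"]
NODE_TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]


def owner(key: str) -> str:
    """First node whose token >= the key's token; past the last node, wrap to the first."""
    idx = bisect_left(NODE_TOKENS, token(key))
    return NODES[idx % len(NODES)]


keys = ["".join(pair) for pair in product(ALPHABET, repeat=2)]
distribution = Counter(owner(k) for k in keys)
print(len(keys), dict(distribution))
```

Because MD5 output looks random, each node should end up with roughly a third of the keys; the exact counts depend on the keys and tokens, so none are shown here.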
Thanks, all, for the good detail and clarification. I just wanted to get things
clear and understand correctly what is the expected behavior when working
with Cassandra against various failure conditions so that application can be
designed accordingly and provide proper locking/synchronization if require
Well, it needs all nodes that are required by the operation to be up
and to respond in a timely fashion; even an RPC timeout from 1 replica will
get you a failure response.
CL is calculated based on the RF configured for the ColumnFamily.
"The ConsistencyLevel is an enum that controls both read
You are missing the point. The coordinator node that is handling the request
won't wait for all the nodes to return their copy/digest of data. It just
waits for Q (RF/2+1) nodes to return. This is the reason I explained two
possible scenarios.
Further, on what basis will Cassandra know that the dat
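A small check of why waiting for only Q responses can be enough, assuming QUORUM = RF/2 + 1 (integer division) and that no replica loses data between the write and the read. It enumerates every set of replicas that could acknowledge a write and every set that could answer a read, and reports whether they always overlap. The function names are hypothetical.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """QUORUM as described in this thread: RF/2 + 1 with integer division."""
    return rf // 2 + 1


def every_read_sees_write(n: int, w: int, r: int) -> bool:
    """True if every set of r responding replicas overlaps every set of w acking replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


rf = 3
q = quorum(rf)                           # 2
print(every_read_sees_write(rf, q, q))   # True: 2 + 2 > 3
print(every_read_sees_write(rf, 1, 1))   # False: ONE/ONE can miss the write
print(every_read_sees_write(rf, 1, rf))  # True, but only while that one replica stays up
```

When the sets always overlap, at least one responder holds the newest version, and the timestamp comparison returns it no matter which replicas answer first. The last case is the W=1, R=N scenario discussed below: the overlap argument only holds while the single replica that took the write stays up.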
Thanks Narendra. I re-read the wiki quote you pasted below and now it
does make sense. Cassandra's design behavior is to propagate the failed
write if it was ever written successfully to at least one server. I was
having a hard time trying to work around this, but I guess I am starting to
think the
>>but could be broken in case of a failed write<<
You can think of a scenario where R + W > N still leads to
inconsistency even for successful writes. Say you keep W=1 and R=N.
Let's say the one node where the write succeeded goes down
before the write made it to the other N-1 nodes. Let's say it goe
Thanks! I am thinking more in terms of having millions of keys (rows),
e.g. a UUID as the row key, or there could be millions of users.
So are we saying that we should NOT create column families with this many
keys? What are the other options in such cases?
UserProfile = { // this is a Column
Does HH count towards QUORUM? Say RF=1 and CL of W=QUORUM and one node that
owns the key dies. Would subsequent write operations for that key be
successful? I am guessing it will not succeed.
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Under
On Thu, Feb 24, 2011 at 1:26 PM, mcasandra wrote:
>
> Does HH count towards QUORUM? Say RF=1 and CL of W=QUORUM and one node
> that
> owns the key dies. Would subsequent write operations for that key be
> successful? I am guessing it will not succeed.
>
No, it would not succeed. It would only s
I'm not saying you shouldn't. If you feel there is a problem, you may
think of splitting the column family into N. But I don't think you will hit that
problem. You can read about RowCacheSize and KeyCache support in Cassandra 0.7.x;
if your rows are small, you may cache a lot of them and avoid a
HH is a kind of write repair, so it has nothing to do with CL, which is a
requirement of the operation; and it isn't used for reads.
In your example QUORUM is the same as ALL, since you only have RF 1 (only
the data holder - coordinator). If that node fails, all reads/writes will
fail.
Now,
The leap of faith here is that an error does not mean a clean rollback to
the prior state, as we are used to with databases. It means that the failed
operation could have gone through partially.
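A toy simulation of this behaviour, assuming three in-memory replicas, a QUORUM write that reaches only one of them, and a simplified read repair that copies the newest (value, timestamp) pair to every responder. It models a write that was attempted and then reported as failed, not one rejected up front; all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, int]      # (value, timestamp)
Replica = Dict[str, Cell]   # row key -> cell, one dict per replica


def write(replicas: List[Replica], reachable: List[int], key: str,
          value: str, ts: int, required_acks: int) -> bool:
    """Apply the write on every reachable replica; report success only if enough acked.

    Nothing is rolled back on failure: replicas that applied the write keep it.
    """
    for i in reachable:
        replicas[i][key] = (value, ts)
    return len(reachable) >= required_acks


def read_with_repair(replicas: List[Replica], responders: List[int],
                     key: str) -> Optional[str]:
    """Return the newest value among the responders and copy it to all of them."""
    present = [replicas[i][key] for i in responders if key in replicas[i]]
    if not present:
        return None
    newest = max(present, key=lambda cell: cell[1])
    for i in responders:  # simplified read repair
        replicas[i][key] = newest
    return newest[0]


# RF = 3; every replica starts with the same balance.
replicas: List[Replica] = [{"balance": ("100", 1)} for _ in range(3)]

# QUORUM write (2 acks needed) that only reaches replica 0 before timing out.
ok = write(replicas, reachable=[0], key="balance", value="0", ts=2, required_acks=2)
print(ok)                                             # False: the client sees an error
print(read_with_repair(replicas, [0, 1], "balance"))  # 0: the "failed" write is returned
print(replicas[1]["balance"])                         # ('0', 2): repaired onto replica 1
```

The client saw an error, yet a later QUORUM read returns the new value and spreads it to another replica.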
Again, this is not entirely unfamiliar territory and can be dealt with.
-JA
On Thu, Feb 24, 201
Hi everyone
I am new to Java and Cassandra.
I have just started installing Cassandra.
My machine is Debian 5.0.6.
I installed jdk1.6.0_24 to /usr/local.
The output of java -version is as follows:
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Server VM (build 19.1-
Yes, that is difficult to digest, and one has to be sure the use
case can afford it.
Some other NoSQL databases deal with it differently (though I don't
think any of them use atomic 2-phase commit). MongoDB, for example, will
ask you to read from the node you wrote to first (the primary node) unless
you
Javier Canillas wrote:
>
> HH is some kind of write repair, so it has nothing to do with CL that is a
> requirement of the operation; and it won't be used over reads.
>
> In your example QUORUM is the same as ALL, since you only have 1 RF (only
> the data holder - coordinator). If that node fai
It all depends on what you're trying to do. What you're proposing is, by
definition, creating a secondary index. The primary index is your row
key. Depending on the partitioner, it might or might not be a conveniently
iterable index or sorted index. If you need your keys sorted in a differ
No, since you are explicitly asking that at least a QUORUM of the replicas be
written. In your scenario only 1 of 3 nodes is up, and the QUORUM value is 2,
so the operation will fail and no HH is made.
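A simplified sketch of the write-side arithmetic above, assuming the acknowledgement counts commonly described for each level: ANY accepts a hint stored by the coordinator, ONE needs one replica, QUORUM needs RF/2 + 1, and ALL needs every replica. It models only the up-front availability check for writes; timeouts after a write has started are ignored, and the function names are hypothetical.

```python
def required_replicas(cl: str, rf: int) -> int:
    """Replica acknowledgements a write needs at consistency level `cl` (simplified model)."""
    if cl == "ANY":
        return 0  # a hinted handoff stored by the coordinator is enough
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def write_can_start(cl: str, rf: int, live_replicas: int) -> bool:
    """Up-front check: with too few live replicas the write is refused and no hint is made."""
    return live_replicas >= required_replicas(cl, rf)


for cl in ("ANY", "ONE", "QUORUM", "ALL"):
    print(cl, required_replicas(cl, rf=3), write_can_start(cl, rf=3, live_replicas=1))
# ANY 0 True
# ONE 1 True
# QUORUM 2 False
# ALL 3 False
```

With RF=1, `required_replicas("QUORUM", 1)` and `required_replicas("ALL", 1)` are both 1, which is why QUORUM and ALL behave identically in that case.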
A read won't succeed either, since you are asking that the data to be
returned be validated at
I'm doing inserts with a pycassa client. It seems to work in most cases,
but sometimes, when I go to cassandra-cli and query with a key and column
that I inserted, I get "null" when I shouldn't. What could be the cause of
that?
Thanks. This helps a lot!
You're welcome!
On Thu, Feb 24, 2011 at 5:30 PM, mcasandra wrote:
>
> Thanks. This helps a lot!
On Thu, Feb 24, 2011 at 2:27 PM, buddhasystem wrote:
>
> I'm doing insertion with a pycassa client. It seems to work in most cases,
> but sometimes, when I go to Cassandra-cli, and query with key and column
> that I inserted, I get "null" whereas I shouldn't. What could be causes for
> that?
>
C
On Thu, Feb 24, 2011 at 3:03 PM, A J wrote:
> yes, that is difficult to digest and one has to be sure if the use
> case can afford it.
>
> Some other NoSQL databases deal with it differently (though I don't
> think any of them use atomic 2-phase commit). MongoDB for example will
> ask you to read
I wasn't aware that there is an index on the primary key (that is, row keys). So
from what I understand there is by default an index on for eg: , in
the example below? Where can I read more about it?
UserProfile = { // this is a ColumnFamily
{ // this is the key to this Row inside the C
All:
So "ANY" CL seems to mean: write (and read) on any node, even if it is a
hinted handoff, and return success. Correct?
I'm guessing this accommodates node failure, right?
Does "ALL" succeed even if there is only a single surviving replica for the
given piece of data?
Again, tolerates node fa
On Thu, Feb 24, 2011 at 2:36 PM, Anthony John wrote:
>
> Does "ALL" succeed even if there is a single surviving replica for the
> given piece of data ?
> Again, tolerates node failure. Does it really mean - from ALL surviving
> nodes ?
>
All replicas (RF) for that row must respond before an ope
Thanks Tyler,
ColumnFamily: index1
Columns sorted by: org.apache.cassandra.db.marshal.AsciiType
Row cache size / save period: 0.0/0
Key cache size / save period: 1.0/3600
Memtable thresholds: 0.8765625/50/60
GC grace seconds: 864000
Compaction min/max t
On Thu, Feb 24, 2011 at 3:34 PM, mcasandra wrote:
>
> I wasn't aware that there is an index on primary key (that is row keys). So
> from what I understand there is by default an index on for eg: , in
> below example? Where can I read more about it?
>
> UserProfile = { // this is a ColumnFa
When I've gotten "null" as a result in cassandra-cli, it turned out to mean
that there were exceptions being thrown on the server side. Have you checked
your Cassandra logs?
On Thu, Feb 24, 2011 at 3:44 PM, buddhasystem wrote:
>
> Thanks Tyler,
>
>ColumnFamily: index1
> Columns sorted b
Either I am not explaining properly or I don't understand the data model just
yet. Please check again:
In the example below, this is what I understand:
1) UserProfile is a CF
2) is a row key
3) username is a column. Each row (e.g. ) has a username column
My understanding is that secondary indexe
While we are at it, there's more to consider than just CAP in distributed systems :)
http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors
On Thu, Feb 24, 2011 at 3:31 PM, Edward Capriolo wrote:
> On Thu, Feb 24, 2011 at 3:03 PM, A J wrote:
>> yes, that is difficult to digest and one
Thanks! You are right. I see an exception but have no idea what went wrong.
ERROR [ReadStage:14] 2011-02-24 21:51:29,374 AbstractCassandraDaemon.java
(line 113) Fatal exception in thread Thread[ReadStage:14,5,main]
java.io.IOError: java.io.EOFException
at
org.apache.cassandra.db.columnitera
On Thu, Feb 24, 2011 at 3:55 PM, mcasandra wrote:
>
> Either I am not explaining properly or I don't understand the data model just
> yet. Please check again:
>
> In below example this is what I understand:
>
> 1) UserProfile is a CF
> 2) is a row key
> 3) username is a column. Each row (eg 11
Thanks! I just started reading about Bloom filters. Is this something that is
built in by default, or is it something that needs to be explicitly configured?
I should mention that it took me a while to figure this out too. Might be a
candidate for an improvement in the cli?
On Thu, Feb 24, 2011 at 4:01 PM, buddhasystem wrote:
>
> Thanks! You are right. I see exception but have no idea what went wrong.
>
>
> ERROR [ReadStage:14] 2011-02-24 21:51:29,3
On Thu, Feb 24, 2011 at 3:07 PM, mcasandra wrote:
>
> Thanks! I just started reading about Bloom Filter. Is this something that is
> inbuilt by default or is it something that need to be explicitly
> configured?
>
It's built in, no configuration needed.
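For readers new to the structure, here is a minimal Bloom filter in Python. Cassandra keeps one per SSTable so that a read can skip files that cannot contain the requested row key; this sketch uses salted SHA-256 hashes for simplicity, which is an assumption and not Cassandra's hashing scheme, and the class name is hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """A minimal Bloom filter: no false negatives, a tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive k bit positions by salting a single hash function (a common simplification).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for row_key in ("abc", "def", "xyz"):
    bf.add(row_key)
print(bf.might_contain("xyz"))  # True: keys that were added are always reported
```

A `False` answer is always correct; a `True` answer may be a false positive, which only costs an unnecessary disk lookup.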
--
Tyler Hobbs
Software Engineer, Data
Hey all,
Our setup is 5 machines running Cassandra 0.7.0, each with 24GB of heap and 1.5TB
of disk, collocated in one DC. We're doing bulk imports from each of the nodes
with RF = 2 and write consistency ANY (write perf is very important). The
behavior we're seeing is this:
- Nodes often se
On Thu, Feb 24, 2011 at 3:56 PM, A J wrote:
> While we are at it, there's more to consider than just CAP in distributed :)
> http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors
>
> On Thu, Feb 24, 2011 at 3:31 PM, Edward Capriolo
> wrote:
>> On Thu, Feb 24, 2011 at 3:03 PM,
Retrieving data by row key is the primary way to get data from
Cassandra, so it's highly optimized.
First, the node responsible for the row is computed using the partitioner. You can
use RandomPartitioner (distributes the MD5 of keys) or
OrderPreservingPartitioner (keys must be UTF-8 strings).
Then the r
I am doing some experimenting with indexing. My data CF has about 25000 rows
around 1KB each. I set up a special column of boolean value to use as the
secondary index. I also created my own index in a separate CF where each index
is one row and the column names are the data keys.
The implem
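A minimal in-memory sketch of the homebrew index layout described above: one index row per indexed value, whose column names are the data row keys. Plain dicts stand in for the two column families, and `insert` and `lookup` are hypothetical helpers, not a client API. The sketch does not remove stale index entries when an indexed value changes, which a real implementation would need to handle.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical in-memory stand-ins for two column families (row key -> columns).
data_cf: Dict[str, Dict[str, str]] = {}
index_cf: Dict[str, Dict[str, str]] = defaultdict(dict)


def insert(row_key: str, columns: Dict[str, str], indexed_column: str) -> None:
    """Write the data row, then add its key as a column name in the matching index row."""
    data_cf[row_key] = columns
    index_row = f"{indexed_column}={columns[indexed_column]}"
    index_cf[index_row][row_key] = ""  # the column name carries the data key; value unused


def lookup(indexed_column: str, value: str) -> List[Dict[str, str]]:
    """Read one index row, then fetch each data row it names (in column-name order)."""
    index_row = index_cf.get(f"{indexed_column}={value}", {})
    return [data_cf[k] for k in sorted(index_row)]


insert("row1", {"flag": "true", "payload": "a"}, "flag")
insert("row2", {"flag": "false", "payload": "b"}, "flag")
insert("row3", {"flag": "true", "payload": "c"}, "flag")
print([r["payload"] for r in lookup("flag", "true")])  # ['a', 'c']
```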
FWIW, for me the advantage of homebrew indexes is that they can be a lot more
sophisticated than the standard -- I can hash combinations of column values
to whatever I want. I also put counters on column values in the index, so
there is lots of functionality. Of course, I can do it because my data
I failed to mention: this is just doing repeated data retrievals using the
index.
> ...
>
> Sample run: Secondary index.
>
> DEBUG Retrieved THS / 7293 rows, in 2012 ms
> DEBUG Retrieved THS / 7293 rows, in 1956 ms
> DEBUG Retrieved THS / 7293 rows, in 1843 ms
...
Right, so I'm interpreting silence as a confirmation on all points. I
opened:
https://issues.apache.org/jira/browse/CASSANDRA-2245
https://issues.apache.org/jira/browse/CASSANDRA-2246
to work on these.
On Wed, Feb 23, 2011 at 5:31 PM, Matt Kennedy wrote:
> Let me start out by saying that I thin
1. Why 24GB of heap? Do you need such a big heap? A bigger heap can lead to
longer GC cycles, but 15 min looks too long.
2. Do you have ROW cache enabled?
3. How many column families do you have?
4. Enable GC logs and monitor what GC is doing to get an idea of why it is
taking so long. You can add the following
--
Junyoung Kim (juneng...@gmail.com)
2011/2/25 Jun Young Kim
>
> --
> Junyoung Kim (juneng...@gmail.com)
>
>
http://goo.gl/3sjE5
On Fri, 2011-02-25 at 10:33 +0800, Ardi Chen wrote:
> 2011/2/25 Jun Young Kim
>
> >
> > --
> > Junyoung Kim (juneng...@gmail.com)
--
Eric Evans
eev...@rackspace.com
This is where things start getting subtle.
If Cassandra's failure detector knows ahead of time that not enough
writes are available, that is the only time we truly fail a write, and
nothing will be written anywhere. But if a write starts during the
window where a node is failed but we don't know
Even though the client did not get a success message, it is possible
that the write succeeded on one of the replicas. Let us say that the
client retried and the write succeeded.
Let us also assume that I was trying to withdraw $100. Initially $100
was withdrawn as per one of the replica