"The bad design part (just my opinion, no intention to offend) is not allowing
the possibility of sending batches directly to the data nodes, without
using a coordinator."
Well, it's normal that it's not possible.
What is a batch? It's a bunch of insert/update/delete statements put
together. Now e
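For reference, CQL writes a batch as `BEGIN BATCH ... APPLY BATCH;` wrapped around the individual statements (`BEGIN UNLOGGED BATCH` skips the batch log). The minimal sketch below only assembles that text in Python. The `users` table and its keys are hypothetical. The three statements touch three different partition keys, which in a multi-node cluster may belong to three different nodes, so some node has to handle the batch as a whole. The DataStax Python driver also offers `cassandra.query.BatchStatement` for building batches programmatically.

```python
def build_batch(statements, logged=True):
    """Wrap individual CQL mutation statements in one BATCH statement."""
    # LOGGED (the default) goes through the batch log; UNLOGGED does not.
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    # Normalize trailing semicolons so each statement ends with exactly one.
    body = "\n".join(f"  {stmt.rstrip(';')};" for stmt in statements)
    return f"{header}\n{body}\nAPPLY BATCH;"


# Hypothetical table and keys, for illustration only: three partition keys.
stmts = [
    "INSERT INTO users (id, name) VALUES (1, 'a')",
    "UPDATE users SET name = 'b' WHERE id = 2",
    "DELETE FROM users WHERE id = 3",
]
print(build_batch(stmts))
```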
I forgot to add that each connection can handle multiple simultaneous
queries. This was part of the original protocol as of C* 1.2:
http://www.datastax.com/dev/blog/binary-protocol
Asynchronous: each connection can handle more than one active request
at the same time. In practice, this means that
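A toy model of what this multiplexing means: every request on a connection carries a stream id, so several requests can be outstanding at once and each reply is matched back to its caller by id, in whatever order replies arrive. This is an `asyncio` simulation with no real sockets or protocol frames; `MultiplexedConnection`, the fake server and the delays are all hypothetical.

```python
import asyncio
import itertools


class MultiplexedConnection:
    """Toy model of one connection carrying several in-flight requests.

    Every request is tagged with a stream id; replies can come back in any
    order and are matched to the waiting caller by that id.
    """

    def __init__(self, server_delay):
        self._ids = itertools.count(1)
        self._pending = {}                 # stream id -> future awaiting its reply
        self._tasks = set()                # keep references to simulated replies
        self._server_delay = server_delay  # hypothetical per-query latency (s)
        self.completed = []                # queries in the order replies arrived

    async def _fake_server(self, stream_id, query):
        # Stand-in for the remote node: answers after a query-specific delay.
        await asyncio.sleep(self._server_delay(query))
        self.completed.append(query)
        self._pending.pop(stream_id).set_result(f"result of {query}")

    async def request(self, query):
        stream_id = next(self._ids)
        fut = asyncio.get_running_loop().create_future()
        self._pending[stream_id] = fut
        task = asyncio.create_task(self._fake_server(stream_id, query))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return await fut


# Hypothetical delays: the first query sent is the slowest to answer.
DELAYS = {"q1": 0.2, "q2": 0.1, "q3": 0.05}


async def main():
    conn = MultiplexedConnection(server_delay=DELAYS.get)
    # All three requests share the single connection at the same time.
    results = await asyncio.gather(*(conn.request(q) for q in DELAYS))
    return results, conn.completed


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The replies arrive as q3, q2, q1, yet each caller still receives its own result, and the total wait is roughly the slowest query rather than the sum of all three.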
There is nothing preventing that in Cassandra; it's just a matter of how
intelligent the driver API is. Submit a feature request to the Astyanax or
DataStax driver projects.
On Fri, Jun 20, 2014 at 2:27 PM, Marcelo Elias Del Valle <
marc...@s1mbi0se.com.br> wrote:
> The bad design part (just my opin
The bad design part (just my opinion, no intention to offend) is not allowing
the possibility of sending batches directly to the data nodes, without
using a coordinator.
I would choose that option.
[]s
2014-06-20 16:05 GMT-03:00 DuyHai Doan :
> Well it's kind of a trade-off.
>
> Either you send da
Well, it's kind of a trade-off.
Either you send data directly to the primary replica nodes, taking
advantage of data locality with a token-aware strategy, and the price to pay
is a high number of open connections on the client side.
Or you just batch data to a random node playing the coordinator ro
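A sketch of the token-aware side of this trade-off, assuming a toy ring: each node owns the token range ending at its own token, and the client looks up the primary replica of a key before sending. `toy_token` uses MD5 purely for illustration; it is not Cassandra's Murmur3 partitioner, and the node names and tokens are hypothetical. In the DataStax Python driver this routing is done by `cassandra.policies.TokenAwarePolicy`, using the cluster's real token metadata.

```python
import bisect
import hashlib
from collections import defaultdict


def toy_token(key):
    # Illustrative partitioner only (NOT Murmur3): first 4 bytes of MD5,
    # giving a token in [0, 2**32).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class ToyRing:
    """Each node owns the token range (previous node's token, its own token]."""

    def __init__(self, node_tokens):
        self._tokens = sorted(node_tokens.values())
        self._owner = {tok: node for node, tok in node_tokens.items()}

    def owner_of_token(self, token):
        # First node token >= token; past the last one, wrap to the first.
        i = bisect.bisect_left(self._tokens, token)
        return self._owner[self._tokens[i % len(self._tokens)]]

    def primary_replica(self, key):
        return self.owner_of_token(toy_token(key))


def group_by_node(ring, keys):
    # What a token-aware client does before sending: bucket keys per owner.
    groups = defaultdict(list)
    for key in keys:
        groups[ring.primary_replica(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical 4-node ring with evenly spaced tokens.
    ring = ToyRing({"node1": 2**30, "node2": 2**31,
                    "node3": 3 * 2**30, "node4": 2**32 - 1})
    groups = group_by_node(ring, [f"user:{i}" for i in range(20)])
    print({node: len(keys) for node, keys in groups.items()})
```

The number of distinct nodes returned by `group_by_node` is roughly the number of nodes the client must keep connections to, which is the connection cost described in the reply.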
I am using Python + the CQL driver.
I wonder how they do it...
These things seem of little importance, but they are fundamental to getting good
performance in Cassandra...
I wish there were a simpler way to query in batches. Opening a large number
of connections and sending one message at a time seems bad to me,
That depends on the connection pooling implementation in your driver.
Astyanax will keep N connections open to each node (configurable) and route
each query in a separate message over an existing connection, waiting until
one becomes available if all are in use.
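A toy version of the pooling behaviour described here: a fixed number of connections per node, each query borrowing one and handing it back, and callers waiting when every connection is busy. Astyanax itself is a Java library; this is only a Python `asyncio` model of the description, and `NodePool`, the pool size and the latency are hypothetical.

```python
import asyncio


class NodePool:
    """Toy per-node pool: a fixed set of connections, borrowed and returned.

    When every connection is checked out, the next caller waits until one
    is returned instead of opening a new connection.
    """

    def __init__(self, node, size):
        self._idle = asyncio.Queue()
        for i in range(size):
            self._idle.put_nowait(f"{node}-conn{i}")
        self._in_use = 0
        self.max_in_use = 0  # peak number of connections busy at once

    async def run(self, query, latency=0.01):
        conn = await self._idle.get()      # waits while all connections are busy
        self._in_use += 1
        self.max_in_use = max(self.max_in_use, self._in_use)
        try:
            await asyncio.sleep(latency)   # simulated round trip on this connection
            return f"{query} via {conn}"
        finally:
            self._in_use -= 1
            self._idle.put_nowait(conn)    # hand the connection back to the pool


async def main():
    # Created inside the running loop so the Queue binds to the right loop.
    pool = NodePool("node1", size=3)
    results = await asyncio.gather(*(pool.run(f"q{i}") for i in range(10)))
    return results, pool.max_in_use


if __name__ == "__main__":
    results, peak = asyncio.run(main())
    print(peak)  # never more than 3 connections busy at once
```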
On Fri, Jun 20, 2014 at 12:32 PM,
A question, not sure if you guys know the answer:
Suppose I async query 1000 rows using token aware, and suppose I have 10
nodes. Suppose also that each node would receive 100 row queries.
How does async work in this case? Would it send each row query to each node
in a different connection? Different
I've found that if you have any amount of latency between your client and
nodes, and you are executing a large batch of queries, you'll usually want
to send them together to one node unless execution time is of no concern.
The tradeoff is resource usage on the connected node vs. time to complete
al
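A back-of-envelope model of this trade-off, with hypothetical numbers: 1000 queries, a 40 ms client-to-cluster round trip, 1 ms round trips inside the cluster, 1 ms of server work per query and 100 requests in flight. It idealizes heavily (no queueing, no bandwidth limits, and the extra load on the coordinator is not modeled), so it only illustrates why client latency dominates when each query pays its own round trip.

```python
import math


def sequential_ms(n_queries, client_rtt_ms, server_ms):
    # One query at a time: every query pays the full client round trip.
    return n_queries * (client_rtt_ms + server_ms)


def pipelined_ms(n_queries, client_rtt_ms, server_ms, in_flight):
    # Up to `in_flight` queries outstanding at once (async / multiplexed).
    # Idealized: ignores queueing and bandwidth on the nodes.
    return math.ceil(n_queries / in_flight) * (client_rtt_ms + server_ms)


def one_coordinator_ms(n_queries, client_rtt_ms, intra_cluster_rtt_ms,
                       server_ms, in_flight):
    # One client round trip; the coordinator fans out over the faster
    # intra-cluster network with the same idealized concurrency.
    return client_rtt_ms + pipelined_ms(n_queries, intra_cluster_rtt_ms,
                                        server_ms, in_flight)


# Hypothetical figures, not measurements.
print(sequential_ms(1000, client_rtt_ms=40, server_ms=1))                # 41000
print(pipelined_ms(1000, client_rtt_ms=40, server_ms=1, in_flight=100))  # 410
print(one_coordinator_ms(1000, client_rtt_ms=40, intra_cluster_rtt_ms=1,
                         server_ms=1, in_flight=100))                    # 60
```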
However, my extensive benchmarking this week of the Python driver from
master shows a performance *decrease* when using 'token_aware'.
This is on a 12-node, 2-datacenter, RF-3 cluster in AWS.
Also, why do the work the coordinator will do for you: send all the queries,
wait for everything to come back
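The work being described can be sketched as a scatter-gather: group the keys per node, send one request per node concurrently, and wait for every reply before merging. Whether the client or a coordinator runs this loop, the combined answer is only ready when the slowest node replies. `scatter_gather`, `owner`, `fetch` and `fake_fetch` are hypothetical names, not driver APIs.

```python
import asyncio
from collections import defaultdict


async def scatter_gather(keys, owner, fetch):
    """Send one request per node, wait for all of them, merge the answers.

    `owner(key) -> node` and `async fetch(node, keys) -> dict` are
    hypothetical hooks standing in for token lookup and the network call.
    """
    by_node = defaultdict(list)
    for key in keys:
        by_node[owner(key)].append(key)
    # Scatter: one concurrent request per node.
    partials = await asyncio.gather(
        *(fetch(node, node_keys) for node, node_keys in by_node.items())
    )
    # Gather: the merged result exists only once every node has answered.
    merged = {}
    for part in partials:
        merged.update(part)
    return merged


async def fake_fetch(node, keys):
    # Hypothetical stand-in for a per-node query.
    await asyncio.sleep(0.01)
    return {k: f"{k}@{node}" for k in keys}


if __name__ == "__main__":
    # Toy ownership rule: the first character of the key names its node.
    print(asyncio.run(scatter_gather(["a1", "b1", "a2"],
                                     owner=lambda k: k[0], fetch=fake_fetch)))
```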
Yes, I am using the CQL DataStax drivers.
It was good advice, thanks a lot Jonathan.
[]s
2014-06-20 0:28 GMT-03:00 Jonathan Haddad :
> The only case in which it might be better to use an IN clause is if
> the entire query can be satisfied from that machine. Otherwise, go
> async.
>
> The nati
The only case in which it might be better to use an IN clause is if
the entire query can be satisfied from that machine. Otherwise, go
async.
The native driver reuses connections and intelligently manages the
pool for you. It can also multiplex queries over a single connection.
I am assuming yo
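The rule of thumb above can be written as a small check, assuming a hypothetical `replicas_for(key)` hook that returns the set of nodes holding a key (a real driver would take this from its token metadata): an IN query only pays off when at least one node is a replica for every key; otherwise, fire one async query per key.

```python
def choose_strategy(partition_keys, replicas_for):
    """Return "IN" if a single node can answer every key, else "async".

    `replicas_for(key) -> set of node names` is a hypothetical hook.
    """
    common = None
    for key in partition_keys:
        nodes = set(replicas_for(key))
        # Keep only nodes that hold every key seen so far.
        common = nodes if common is None else common & nodes
        if not common:
            return "async"  # no single node holds all the keys
    return "IN" if common else "async"


# Hypothetical replica sets (RF=2 for k1 and k2, RF=1 for k3).
replicas = {"k1": {"n1", "n2"}, "k2": {"n2", "n3"}, "k3": {"n4"}}
print(choose_strategy(["k1", "k2"], replicas.get))  # IN (n2 holds both)
print(choose_strategy(["k1", "k3"], replicas.get))  # async
```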
This is interesting, I didn't know that!
It might make sense then to use SELECT with = + async + token aware; I will try
to change my code.
But would it be a "recommended solution" for these cases? Any other options?
I still wonder if this is the right use case for Cassandra, to look for
random keys in
If you use async and your driver is token aware, it will go to the
proper node, rather than requiring the coordinator to do so.
Realistically, you're going to have a connection open to every server
anyways. It's the difference between you querying for the data
directly and using a coordinator as a
But wouldn't using async queries be even worse than using SELECT IN?
The justification in the docs is that I could end up querying many nodes, but
with async queries I would still be doing that.
Today, I use both async queries AND SELECT IN:
SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE
name=%s and value in(%s
Your other option is to fire off async queries. It's pretty
straightforward with the Java or Python drivers.
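In the DataStax Python driver, `Session.execute_async()` returns a `ResponseFuture`, and `cassandra.concurrent.execute_concurrent_with_args()` runs one statement over many parameter sets while capping the number of in-flight requests through its `concurrency` argument. The sketch below simulates that pattern with `asyncio` so it runs without a cluster; `run_concurrently` and `fake_select` are hypothetical stand-ins, not driver functions.

```python
import asyncio


async def run_concurrently(params, query_fn, concurrency=50):
    """Fire one async query per parameter, keeping at most `concurrency` in flight.

    `query_fn` is a hypothetical coroutine standing in for a driver call.
    Results come back in the same order as `params`, plus the peak in-flight count.
    """
    limit = asyncio.Semaphore(concurrency)  # created inside the running loop
    in_flight = 0
    peak = 0

    async def one(p):
        nonlocal in_flight, peak
        async with limit:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await query_fn(p)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(one(p) for p in params))
    return results, peak


async def fake_select(key):
    # Hypothetical single-key SELECT: a simulated network round trip.
    await asyncio.sleep(0.01)
    return (key, f"row-{key}")


if __name__ == "__main__":
    rows, peak = asyncio.run(run_concurrently(range(200), fake_select, concurrency=20))
    print(len(rows), peak)
```

Capping concurrency keeps the client from flooding the cluster with every request at once while still overlapping most of the round trips.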
On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
wrote:
> I was taking a look at Cassandra anti-patterns list:
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/arch