Thank you. That is very good news. I can sort the results myself - what is
important is that I get them!
On Thu, May 13, 2010 at 2:42 AM, Vijay wrote:
> If you use the Random partitioner, you will *NOT* get row keys sorted.
> (Columns are sorted always).
>
> Answer: If used Random partitioner
> True
You can choose to have keys ordered by using an
OrderPreservingPartitioner with the trade-off that key ranges can get
denser on certain nodes than others.
On Wed, May 12, 2010 at 7:48 PM, philip andrew wrote:
>
> Hi,
> From my understanding, Cassandra entities are indexed on only one key, so
> this
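The trade-off between the two partitioners can be illustrated with a small sketch. It assumes an MD5-derived token as a stand-in for hash-based placement and plain byte order as a stand-in for order-preserving placement; the function names are hypothetical and this is not Cassandra's own code.

```python
import hashlib


def random_partitioner_order(keys):
    """Order keys by an MD5-derived token (assumed stand-in for hash placement)."""
    return sorted(
        keys,
        key=lambda k: int(hashlib.md5(k.encode("utf-8")).hexdigest(), 16),
    )


def order_preserving_order(keys):
    """Order keys by their own bytes (assumed stand-in for order-preserving placement)."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


def prefix_is_contiguous(ordered_keys, prefix):
    """True if all keys starting with `prefix` sit next to each other."""
    positions = [i for i, k in enumerate(ordered_keys) if k.startswith(prefix)]
    if not positions:
        return True
    return positions[-1] - positions[0] + 1 == len(positions)
```

With byte ordering, every key that starts with "abc" lands in one contiguous run, which is what makes prefix range scans possible and also what lets popular key ranges pile up on a few nodes. With hashed tokens the same keys are scattered.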
Well, ain't that a kick in the tush. :-/
I grabbed the svn trunk, but build failed, probably because Fedora 11 is too
old for this bleeding edge stuff. Server is upgrading itself now, but I
wondered: is anyone using an rpm-based distro for Net::Cassandra::Easy and the
svn Cassandra?
Thanks.
Hi,
From my understanding, Cassandra entities are indexed on only one key, so
this can be a problem if you are searching by two values, for example if
you are storing an entity with an x,y and then wish to search for entities in
a box, i.e. x>5 and x<10 and y>5 and y<10. MongoDB can do this, Cassa
Although, if replication factor spans all nodes, then the disparity in
row allocation should be a non-issue when using
OrderPreservingPartitioner.
On Wed, May 12, 2010 at 6:42 PM, Vijay wrote:
> If you use the Random partitioner, you will NOT get row keys sorted. (Columns
> are sorted always).
> Answ
Oh, thanks to Andrey Panov for providing that example, btw. We are
always looking for good usage examples to post on the Hector wiki, if
anyone else has them.
-Nate
On Wed, May 12, 2010 at 5:01 PM, Nathan McCall wrote:
> Here is a basic example using get_range_slices to retrieve 500 rows via
> he
Here is a basic example using get_range_slices to retrieve 500 rows via hector:
http://github.com/bosyak/hector/blob/master/src/main/java/me/prettyprint/cassandra/examples/ExampleReadAllKeys.java
To page, use the last key you got back as the start key.
-Nate
On Wed, May 12, 2010 at 3:37 PM, Core
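The paging advice above (use the last key you got back as the next start key) can be sketched as follows. FakeColumnFamily and iter_all_keys are hypothetical names, and the in-memory get_range_slices here is only a stand-in that assumes an inclusive start key; it is not the Hector or Thrift call itself.

```python
class FakeColumnFamily:
    """Hypothetical in-memory stand-in for a column family (not a real client API)."""

    def __init__(self, keys):
        # The order in which the cluster would hand rows back.
        self._keys = list(keys)

    def get_range_slices(self, start_key, count):
        # Assumption: the start key is inclusive and "" means "from the beginning".
        start = 0 if start_key == "" else self._keys.index(start_key)
        return self._keys[start:start + count]


def iter_all_keys(cf, page_size):
    """Yield every row key once, fetching `page_size` rows per request."""
    if page_size < 2:
        # With an inclusive start key, a page of one row would never advance.
        raise ValueError("page_size must be at least 2")
    start_key = ""
    first_page = True
    while True:
        page = cf.get_range_slices(start_key, page_size)
        # Later pages begin with the row already seen, so skip it.
        rows = page if first_page else page[1:]
        for key in rows:
            yield key
        if len(page) < page_size:
            return
        start_key = page[-1]
        first_page = False
```

Because the start key is assumed to be inclusive, each page after the first repeats one row, so a page size of N yields N - 1 new rows per request.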
If you use the Random partitioner, you will *NOT* get row keys sorted. (Columns
are sorted always).
Answer: If used Random partitioner
True True
Regards,
On Wed, May 12, 2010 at 1:25 AM, David Boxenhorn wrote:
> You do any kind of range slice, e.g. keys beginning with "abc"? But the
> results w
Can someone point me to a Thrift sample (preferably Java) to list all the
rows in a ColumnFamily for my Cassandra server. I noticed some examples
using SlicePredicate and SliceRange to perform a similar query against the
columns with paging, but I was looking for something similar for rows with
pa
I tried searching mail-archive, but the search feature is a bit wacky (or
more probably I don't know how to use it).
What are the key differences between Cassandra and MongoDB?
Is there a particular use case where each solution shines?
On Thu, May 13, 2010 at 12:34 AM, Moses Dinakaran
wrote:
> I wanted to remove the records based upon the value of the column ses_tstamp
> ie (delete from sessions where ses_tstamp between XXX & YYY OR delete from
> session where ses_tstamp < XXX )
>
> Is it possible to achieve this in Cassandra If
On Wed, May 12, 2010 at 2:02 AM, David Vanderfeesten wrote:
>...
>
> My concern with the denormalization approach is that it shouldn't be managed
> by the client side because this has big impact on your throughput. Is the
> map-reduce in that respect any better?
> Wouldn't it be nice to support a
Hi, I read your post and noticed you are running Cassandra on win 2008.
Do you run it in a production environment?
I'm contacting you because there aren't that many windows installations. I need
to provide a live Cassandra environment on win 2008 and was stumbling into some
problems with node to
Picked up out of config, I mean.
On Wed, May 12, 2010 at 11:10 AM, James Golick wrote:
> Hmmm that's definitely what we're seeing. Although, we aren't seeing
> cache settings being picked up properly on a restart either.
>
>
> On Wed, May 12, 2010 at 8:42 AM, Ryan King wrote:
>
>> It's a bug
Hmmm that's definitely what we're seeing. Although, we aren't seeing
cache settings being picked up properly on a restart either.
On Wed, May 12, 2010 at 8:42 AM, Ryan King wrote:
> It's a bug:
>
> https://issues.apache.org/jira/browse/CASSANDRA-1079
>
> -ryan
>
> On Wed, May 12, 2010 at 8:1
I've been trying to improve the time it takes to map 30 million rows using a
hadoop / cassandra cluster with 30 nodes. I discovered that since
CassandraInputFormat returns an ordered list of splits, when there are many
splits (e.g. hundreds or more) the load on cassandra is horribly unbalanced.
e
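To see why an ordered list of splits can concentrate load, here is a small sketch. It is not the poster's fix (the message is cut off before any fix is described); interleaving splits round-robin by node is an assumed mitigation, and all names are hypothetical.

```python
from collections import deque


def interleave_by_node(splits):
    """Reorder (node, token_range) splits so consecutive splits hit different nodes."""
    queues = {}
    for node, token_range in splits:
        queues.setdefault(node, deque()).append((node, token_range))
    ordered = []
    while queues:
        # Take one split from each node in turn until every queue is empty.
        for node in list(queues):
            ordered.append(queues[node].popleft())
            if not queues[node]:
                del queues[node]
    return ordered


def max_same_node_in_first_window(splits, window):
    """How many of the first `window` splits target the single busiest node."""
    counts = {}
    for node, _ in splits[:window]:
        counts[node] = counts.get(node, 0) + 1
    return max(counts.values()) if counts else 0
```

If the first batch of concurrent map tasks takes splits in list order, an ordered list sends the whole batch to one node, while the interleaved list spreads it across nodes.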
The functionality of a WHERE clause usually means maintaining an
inverted index, usually another CF, on the information of interest
(ses_tstamp in your example). You then retrieve index rows from that
CF to find the data rows.
b
On Wed, May 12, 2010 at 5:34 AM, Moses Dinakaran
wrote:
> Hi All,
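The inverted-index idea above can be sketched with two in-memory maps: one playing the data CF and one playing the index CF keyed by time bucket. SessionStore, its methods, and the bucket size are assumptions made for illustration; only ses_tstamp comes from the thread.

```python
from collections import defaultdict


class SessionStore:
    """Hypothetical model of a data CF plus an index CF keyed by time bucket."""

    def __init__(self, bucket_seconds=60):
        self.bucket_seconds = bucket_seconds
        self.sessions = {}             # data CF: session id -> columns
        self.index = defaultdict(set)  # index CF: bucket start -> session ids

    def _bucket(self, ts):
        return ts - ts % self.bucket_seconds

    def _unindex(self, session_id, ts):
        bucket = self._bucket(ts)
        self.index[bucket].discard(session_id)
        if not self.index[bucket]:
            del self.index[bucket]

    def put(self, session_id, ses_tstamp, data):
        old = self.sessions.get(session_id)
        if old is not None:
            # Keep the index consistent when a session's timestamp changes.
            self._unindex(session_id, old["ses_tstamp"])
        self.sessions[session_id] = {"ses_tstamp": ses_tstamp, "data": data}
        self.index[self._bucket(ses_tstamp)].add(session_id)

    def delete_older_than(self, cutoff):
        """Emulate: delete from sessions where ses_tstamp < cutoff."""
        removed = []
        for bucket in sorted(b for b in self.index if b < cutoff):
            for session_id in sorted(self.index[bucket]):
                ts = self.sessions[session_id]["ses_tstamp"]
                if ts < cutoff:  # the cutoff may fall inside a bucket
                    del self.sessions[session_id]
                    self._unindex(session_id, ts)
                    removed.append(session_id)
        return removed
```

The client reads only the index rows for buckets before the cutoff, then deletes the matching data rows, so the whole sessions CF is never scanned. The cost is that every write has to update both structures.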
On Wed, May 12, 2010 at 5:46 PM, Johan Oskarsson wrote:
> Looking over the code this is in fact an issue in 0.6.
> It's fixed in trunk/0.7. Connections will be reused and closed properly, see
> https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details.
>
> We can either backport that
D'oh, forgot to search the JIRA on this one. Thanks Jonathan!
On Wed, May 12, 2010 at 9:37 AM, Jonathan Ellis wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-856
>
> On Tue, May 11, 2010 at 3:44 PM, Tobias Jungen
> wrote:
> > Yet another BMT question, thought this may apply for regular
What makes Cassandra a poor choice is the fact that you can't use a
keyrange as input for the map phase for Hadoop.
On Wed, May 12, 2010 at 4:37 PM, Jonathan Ellis wrote:
> On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati
> wrote:
> > - First of all, my first thoughts is to have two CF o
Looking over the code this is in fact an issue in 0.6.
It's fixed in trunk/0.7. Connections will be reused and closed properly, see
https://issues.apache.org/jira/browse/CASSANDRA-1017 for more details.
We can either backport that patch or at least close the connections
properly in 0.6. Ca
It's a bug:
https://issues.apache.org/jira/browse/CASSANDRA-1079
-ryan
On Wed, May 12, 2010 at 8:16 AM, James Golick wrote:
> When I first brought this cluster online, the storage-conf.xml file had a
> few cache capacities set. Since then, we've completely changed how we use
> cassandra's cachi
Have you checked your open file handle limit? You can do that by using
"ulimit" in the shell. If it's too low, you will encounter the "too many
open files" error. You can also see how many open handles an
application has with "lsof".
Héctor Izquierdo
On 12/05/10 17:00, gabriele renzi wrote:
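The same limit can also be read from inside a process. This is a minimal sketch using Python's standard resource module (Unix only); the helper names are hypothetical and the minimum passed in by the caller is an arbitrary example, not a recommended setting.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_at_least(minimum):
    """True if the soft limit is unlimited or at least `minimum` descriptors."""
    soft, _hard = open_file_limits()
    return soft == resource.RLIM_INFINITY or soft >= minimum
```

The soft value is what "ulimit -n" reports in the shell that launched the process.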
When I first brought this cluster online, the storage-conf.xml file had a
few cache capacities set. Since then, we've completely changed how we use
cassandra's caching, and no longer use any of the caches I set up in the
original configuration.
I'm finding that cassandra doesn't want to keep my new
On Wed, May 12, 2010 at 4:43 PM, Jonathan Ellis wrote:
> On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote:
>> - is it possible that such errors show up on the client side as
>> timeoutErrors when they could be reported better?
>
> No, if the node the client is talking to doesn't get a reply
On Wed, May 12, 2010 at 5:11 AM, gabriele renzi wrote:
> - is it possible that such errors show up on the client side as
> timeoutErrors when they could be reported better?
No, if the node the client is talking to doesn't get a reply from the
data node, there is no way for it to magically find ou
Sounds like the sort of error you'd see if you were using
thread-unsafe Thrift clients on multiple threads.
On Tue, May 11, 2010 at 11:23 PM, Waqas Badar
wrote:
> Dear all,
>
> We are using Cassandra on our website. Whenever website traffic increases, we
> get the following error (Python):
>
> File "
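One common way to avoid sharing a thread-unsafe client is to keep one client per thread. The sketch below assumes that approach; FakeClient is a hypothetical stand-in rather than a real Thrift class, and in real code its constructor would open a connection.

```python
import threading


class FakeClient:
    """Hypothetical stand-in for a client that must not be shared across threads."""

    def __init__(self):
        self.owner = threading.get_ident()

    def call(self):
        if threading.get_ident() != self.owner:
            raise RuntimeError("client used from a thread that does not own it")
        return self.owner


_local = threading.local()


def get_client():
    """Return this thread's own client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = FakeClient()
        _local.client = client
    return client
```

Each request handler calls get_client() instead of reaching for a module-level client, so two threads never interleave writes on the same connection.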
This is a slightly different way of describing
https://issues.apache.org/jira/browse/CASSANDRA-685
On Tue, May 11, 2010 at 9:01 PM, Jeremy Dunck wrote:
> Reddit posted a blog entry about some recent downtime, partially due
> to issues with Cassandra.
> http://blog.reddit.com/2010/05/reddits-may-2
On Tue, May 11, 2010 at 4:18 PM, Anthony Molinaro
wrote:
> Hi,
>
> I thought that 'nodetool drain' was supposed to flush the commit logs
> through the system, which it appears to do (verified by running ls in
> the commit log directory and seeing no files).
>
> However, it also appears to disable
https://issues.apache.org/jira/browse/CASSANDRA-856
On Tue, May 11, 2010 at 3:44 PM, Tobias Jungen wrote:
> Yet another BMT question, though this may apply for regular memtables as
> well...
>
> After doing a batch insert, I accidentally submitted the flush command
> twice. To my surprise, the t
On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati
wrote:
> - First of all, my first thought is to have two CFs: one for raw client
> requests (~10 million+ per day) and another for aggregated metrics over some
> defined time interval like 1min, 5min, 15min... Is this a good approach?
Sure.
> - I
Thanks Eben
On Wed, May 12, 2010 at 7:33 PM, Eben Hewitt wrote:
> QUORUM is a high consistency level. It refers to the number of nodes that
> have to acknowledge read or write operations in order to be assured that
> Cassandra is in a consistent state. It uses / 2 + 1.
>
> DCQUORUM means "Data
QUORUM is a high consistency level. It refers to the number of nodes that
have to acknowledge read or write operations in order to be assured that
Cassandra is in a consistent state. It uses / 2 + 1.
DCQUORUM means "Data Center Quorum", and balances consistency with
performance. It puts multiple
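The formula in the message above appears to have lost its first operand in the archive. Assuming it is the replication factor, a quorum is a simple majority of replicas, which can be sketched as below; the function names are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: replication_factor / 2 (rounded down) + 1."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, read_acks, write_acks):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > replication_factor
```

With a replication factor of 3, a quorum is 2, so quorum reads and quorum writes always share at least one replica.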
Hi
I have read about QUORUM but lately came across DCQUORUM. What is it and
what's the difference between the two?
I don't yet understand all the problems you guys are facing. Just wanted to let
you know that I'm getting my feet wet with Cassandra. In a few days/weeks I'll
be re-reading all your notes again :)
I'm bound to provide a production Cassandra environment based on win 2008 (I
know, I know...It's b
>>If the replication factor is 2, then everything is written twice. So
>>your throughput is cut in half.
Throughput of new inserts is cut in half, right? I think I was thinking about
capacity in more general terms from the node's perspective. The node has the
ability to write so many operations per
Hi All,
In Cassandra, is it possible to remove records based upon a where condition?
We are planning to move the session and cache tables from MySQL to Cassandra
and were doing the feasibility study. Everything seems to be OK other than
garbage collection of the session table.
Was not able to remove super
Hi!
I am having trouble understanding the "column" terminology Cassandra
uses. I am developing in Ruby. I need to store data for vehicles which
will come in at different times and retrieve data for a specific
vehicle for specific slices of time. So each record could look like:
vehicle_id, { time
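One way to picture the terminology for this use case: the row key is the vehicle id, each column name is a timestamp, and a time slice is a range over the sorted column names. The sketch below models that in memory; VehicleRow, record, and readings are hypothetical names, and the inclusive bounds on both ends are an assumption of this sketch.

```python
import bisect


class VehicleRow:
    """One row: column names are timestamps kept in sorted order."""

    def __init__(self):
        self._times = []   # sorted column names
        self._values = {}  # column name -> column value

    def insert(self, timestamp, value):
        if timestamp not in self._values:
            bisect.insort(self._times, timestamp)
        self._values[timestamp] = value

    def slice(self, start, finish):
        """Columns with start <= name <= finish, in column order."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, finish)
        return [(t, self._values[t]) for t in self._times[lo:hi]]


column_family = {}  # row key (vehicle_id) -> VehicleRow


def record(vehicle_id, timestamp, value):
    column_family.setdefault(vehicle_id, VehicleRow()).insert(timestamp, value)


def readings(vehicle_id, start, finish):
    row = column_family.get(vehicle_id)
    return row.slice(start, finish) if row is not None else []
```

Because columns within a row are always kept sorted, data can arrive in any order and a time-range read is still a single contiguous slice.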
There is a per cf read and write latency jmx.
On May 12, 2010 12:55 AM, "Jordan Pittier - Rezel" wrote:
For sure you have to pay particular attention to memory allocation on each
node, especially be sure your servers don't swap. Then you can monitor how
load is balanced among your nodes (nodetoo
A follow-up for anyone that may end up on this conversation again:
I kept trying and neither changing the number of concurrent map tasks,
nor the slice size helped.
Finally, I found out a screw up in our logging system, which had
forbidden us from noticing a couple of recurring errors in the logs
About this linear scaling of throughput (with keys perfectly distributed +
requests balanced over all nodes):
I would assume that this is not the case for a small number of nodes because
starting from 2 nodes onwards a part of the requests have to be handled by a
proxy node + the actual node responsib
On the scalability and performance side, I found Yahoo's paper about the
YCSB project interesting (benchmarking some NoSQL solutions with MySQL). See
research.yahoo.com/files/ycsb.pdf.
My concern with the denormalization approach is that it shouldn't be
managed by the client side because this
You do any kind of range slice, e.g. keys beginning with "abc"? But the
results will not be ordered?
Please answer one of the following:
True True
True False
False False
Explain?
Thanks!
On Sun, May 9, 2010 at 8:27 PM, Vijay wrote:
> True, The Range slice support was enabled in Random Partit