Hello list,
I encountered a problem with streaming in my test cluster but
found nothing like this in JIRA or on the list.
I'm running a test cluster of three nodes, RF=3, Cassandra 0.6.1.
I started the first node and inserted some data, then bootstrapped
the other machines one after the other.
I wrote this CassandraOutputFormat last year. It most likely does not work
against newer/current versions of Cassandra, but if you want something to work
with, it can be used as a starting point.
http://github.com/johanoskarsson/cassandraoutputformat
/Johan
On 30 Apr 2010, at 14:14, Utku Can Topçu wrote:
Hi all... I am trying to feed a specific list of Cassandra column names in
as input to a Hadoop process, but for some reason it only feeds in some of
the columns I specify, not all.
This is a short description of the problem - I'll see if anyone might have
some insight before I dump a big load of
Hi again.
My system log says:
ERROR [pool-1-thread-1] 2010-05-03 12:54:03,801 Cassandra.java (line 1153)
Internal error processing login
java.lang.RuntimeException: Unexpected authentication problem
at org.apache.cassandra.auth.SimpleAuthenticator.login(SimpleAuthenticator.java:113)
at org.apache
You need to define two more properties files: passwd.properties and
access.properties (hint:
-Dpasswd.properties=/user/schildmeijer/cassandra/conf/passwd.properties, and
analogously for access.properties).
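For example, a hedged sketch of the two files (entries are illustrative, and
the exact access.properties format may vary by version -- check the samples
shipped in conf/):

    # passwd.properties: username=password entries
    jsmith=havebadpass

    # access.properties: which users may use which keyspace
    Keyspace1=jsmith

Then point the JVM at both files with the -D flags above.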
// Roger Schildmeijer
On Mon, May 3, 2010 at 1:06 PM, Julio Carlos Barrera Juez <
juliocar...@gmail
Hey All,
I have a simple sample use case.
The aim is to export the columns in a column family into flat files, with the
keys in the range from k1 to k2.
Since every node in the cluster is supposed to hold some portion of the
data, is it possible to make each node dump its own local
data v
Yes, we have already figured that out :)
Thanks!
-Original Message-
From: Carlos Alvarez [mailto:cbalva...@gmail.com]
Sent: Thursday, April 29, 2010 4:03 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra on Windows network latency
Are you using TSocket in the client? If yes, use
Hello,
Our system (not Cassandra) has a backup cluster in a different datacenter, for
use when the primary cluster is unavailable or during software upgrades.
100% of traffic goes to the primary cluster. In the cases above we switch 100%
of traffic to the backup cluster for a short time; then, when the issues are resolved, traff
I've seen this too (your second case) - it seems like the entire row
contents (or some big subset of the row) are loaded to memory on the
server before any column value is returned. The partitioner selection
did not make any difference to performance in my case. I did not find a
way around this e
Make sure you have disabled the row cache. If the row cache is enabled, the
entire row does get loaded into memory; otherwise it does not.
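For reference, a hedged sketch of where that lives in storage-conf.xml
(0.6-era attribute, from memory; RowsCached="0" disables the row cache for
that CF):

    <ColumnFamily Name="Standard1"
                  CompareWith="BytesType"
                  RowsCached="0"/>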
On Mon, May 3, 2010 at 3:06 PM, malsmith wrote:
> I've seen this too (your second case) - it seems like the entire row
> contents (or some big subset of the row) are lo
I am only speaking to your second question.
It may be helpful to think of modeling your storage layout in terms of
* lists
* sets
* hash maps
... and certain combinations of these.
Since there are no schema-defined relations, your relations may appear
implicit between different views or "copies"
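To make that concrete, one illustrative reading of the mapping (my sketch,
not from the original post):

    row key -> { column name : value }     behaves like a hash map
    row key -> { member : "" }             behaves like a set (membership = column exists)
    row key -> { TimeUUID : value }        behaves like a list (the comparator gives ordering)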
That's fine, although GPLv3 software cannot be included in Apache
projects. http://www.apache.org/licenses/GPL-compatibility.html
On Sat, May 1, 2010 at 8:25 PM, Shuge Lee wrote:
> Thanks for reply.
> Add GPLv3 license, github.com/shuge/shuge-cassandra/downloads.
>
> 2010/5/1 Jonathan Ellis
>>
On Sat, May 1, 2010 at 6:34 AM, Rakesh Rajan wrote:
> I am evaluating cassandra to implement activity streams. We currently have
> over 100 feeds with total entries exceeding 32000 implemented using
> redis (~320 entries / feed). Would like to hear from the community on how to
> use cassandr
Util.range returns a Range object which is end-exclusive. (You want
"Bounds" for end-inclusive.)
On Sun, May 2, 2010 at 7:19 AM, aaron morton wrote:
> Hey there, I'm still getting odd behavior with get_range_slices. I've created
> a JUNIT test that illustrates the case.
> Could someone take a loo
On Sun, May 2, 2010 at 1:00 PM, James Golick wrote:
> the ConcurrentSkipListMap (ColumnFamily.columns_).
> SliceQueryFilter.getMemColumnIterator @ ~30% - Virtually all the time in
> here is spent in ConcurrentSkipListMap$Values.toArray()
Besides the UUID optimization you posted, we should do an
Can you reproduce outside the Hadoop environment, i.e. w/ Thrift code?
On Mon, May 3, 2010 at 5:49 AM, Mark Schnitzius
wrote:
> Hi all... I am trying to feed a specific list of Cassandra column names in
> as input to a Hadoop process, but for some reason it only feeds in some of
> the columns I
sstable2json does this. (you'd want to perform nodetool compact
first, so there is only one sstable for the CF you want.)
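For example (a hedged sketch: paths, host, and flag spellings are
illustrative and may vary by version):

    bin/nodetool -host 127.0.0.1 compact
    bin/sstable2json /var/lib/cassandra/data/Keyspace1/MyCF-1-Data.db > MyCF.json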
On Mon, May 3, 2010 at 6:17 AM, Utku Can Topçu wrote:
> Hey All,
>
> I have a simple sample use case,
> The aim is to export the columns in a column family into flat files wi
I suppose you could use rsync (sstable files are immutable, so you
don't need to worry about not getting a "consistent" version of the
data files), but compared to letting Cassandra handle the replication
the way it's designed to,
# you'll generate a lot of disk i/o doing that vs
# your backup clu
Martin,
Please create a ticket and include the relevant parts of your storage-conf.
To summarize, the output gives you the impression that bootstrap has
completed normally, but when you check, it appears to be hung on the
receiving nodes?
Do you mind turning debug on and seeing if you can re
Hey!
We are currently using Cassandra 0.5.1 and I'm getting a StackOverflowError
when
comparing two ColumnOrSuperColumn objects. It turns out that the compareTo
function for byte[] has an infinite loop in libthrift-r820831.jar.
We are planning to upgrade to 0.6.1 but are not ready to do it today, so j
Trying to add a node to an existing cluster and getting the following
error (using 0.6.1):
INFO [main] 2010-05-03 08:36:58,960 CommitLog.java (line 169) Log
replay complete
INFO [main] 2010-05-03 08:36:58,993 SystemTable.java (line 164) Saved
Token found: 113225717064305079230489016527619806
We made a patch against the 0.6 branch and 0.6.1 for this feature.
https://issues.apache.org/jira/browse/CASSANDRA-1041
It seems the node you are adding is not a "new" node.
INFO [main] 2010-05-03 08:36:58,993 SystemTable.java (line 164) Saved Token
found: 113225717064305079230489016527619806663
INFO [main] 2010-05-03 08:36:58,994 SystemTable.java (line 179) Saved
ClusterName found: Image Cluster
The log above says this node
It started out new. I didn't cut and paste the "original" startup, but
here it is...
INFO [main] 2010-05-03 08:34:43,305 DatabaseDescriptor.java (line 229)
Auto DiskAccessMode determined to be mmap
INFO [main] 2010-05-03 08:34:43,637 SystemTable.java (line 139) Saved
Token not found. Using 113
Is there a reason why the JVM options are so different in the Debian version
from the standard cassandra.in.sh? The following lines are completely
missing from the init script:
-XX:TargetSurvivorRatio=90 \
-XX:+AggressiveOpts \
-XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC
On Mon, 2010-05-03 at 13:30 -0500, Lee Parker wrote:
> I see these in the JVM_EXTRA_OPS line of /etc/defaults/cassandra, but
> I don't see where this data is actually passed into the jvm in the
> init script. Am I missing something?
JVM_EXTRA_OPS should be getting passed (they used to be), so th
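For what it's worth, a hedged sketch of what that would look like in
/etc/defaults/cassandra (variable name as quoted in this thread; check your
init script for the exact spelling):

    JVM_EXTRA_OPS="-XX:TargetSurvivorRatio=90 -XX:+AggressiveOpts -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"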
I have a CF on our cluster which has several rows with 200k+ columns of
TimeUUID data. I have noticed recently that this CF is reaching my memtable
thresholds (128M or 1.5 mill obj) far more frequently than I would expect
(every 10 minutes or so). This CF is used as an index of items in another
C
Thanks. I'll apply the patch. I'm not real familiar with the JVM options,
but I assume that on a production machine I should remove -Xdebug and the
-Xrunjdwp options.
Lee Parker
On Mon, May 3, 2010 at 2:29 PM, Eric Evans wrote:
> On Mon, 2010-05-03 at 13:30 -0500, Lee Parker wrote:
> > I see t
On Mon, 2010-05-03 at 14:39 -0500, Lee Parker wrote:
> Thanks. I'll apply the patch. I'm not real familiar with the JVM
> options, but I assume that on a production machine I should remove
> -Xdebug and the -Xrunjdwp options.
Yes.
--
Eric Evans
eev...@rackspace.com
I have a cluster which contains two CFs. One is a bunch of rows with 10-15
columns per row. The other is an index of those items with only a few rows,
but thousands of columns per row. I am noticing that the replication of
data between the nodes in the cluster is causing a lot of memtable flushi
Hello Everyone,
Is there a practical formula for determining Cassandra system requirements
using OrderPreservingPartitioner?
We have hundreds of millions of rows in a single column family with a
potential target of maybe a billion rows.
How can we estimate the Cassandra system requirements give
You'd need to check out thrift r820831, fix the compareTo code, then
build a new jar.
You can't just use the jar from 0.6 or from current thrift trunk
because Thrift breaks backwards compatibility frequently, and there
were such changes between our 0.5 and 0.6.
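For reference, a minimal sketch of a correct lexicographic byte[] comparison
-- the shape of fix the broken compareTo needs, not the actual Thrift patch:

    // Compare two byte arrays lexicographically, treating bytes as
    // unsigned; terminates because i strictly increases.
    public static int compareByteArrays(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }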
So, yes, you could do it, but it's
Short answer: no, there is no formula into which you can plug numbers.
Longer answer: benchmark with a subset of your data and extrapolate.
The closer the test data is to real data, the more accurate it will
be. Yes, compaction is O(N) wrt the amount of data in the system, so
don't do it more tha
On Mon, May 03, 2010 at 05:46:51PM -0500, Jonathan Ellis wrote:
> You'd need to check out thrift r820831, fix the compareTo code, then
> build a new jar.
>
> You can't just use the jar from 0.6 or from current thrift trunk
> because Thrift breaks backwards compatibility frequently, and there
> we
If I take the exact same SlicePredicate that fails in the Hadoop example,
and pass it in to a multiget_slice, the data is returned successfully. So
it appears the problem does lie somewhere in the tie-in to Hadoop.
I will try to create a maximally-trimmed-down example that's complete enough
to ru
Let me rephrase my question.
How does Cassandra deal with bloom filter's false positives on deleted records?
Bloom filters can return false positives, especially for deleted
records. How does Cassandra detect them?
And how does Cassandra remove those *detected* false positives from
the blo
We serialize the SlicePredicate as part of the Hadoop Configuration
string. It's quite possible that either
- one of your column names is exposing a bug in the Thrift json serializer
- Hadoop is silently truncating large predicates
You should test that getSlicePredicate(conf).equals(originalPr
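Something like this round-trip check (a hedged sketch;
setSlicePredicate/getSlicePredicate are hypothetical stand-ins for whatever
helpers your job uses, and the 0.6-era Thrift signatures are from memory):

    import java.util.Arrays;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.hadoop.conf.Configuration;

    public class PredicateRoundTrip {
        public static void main(String[] args) {
            // Build the predicate exactly as the job does.
            SlicePredicate original = new SlicePredicate();
            original.setColumn_names(Arrays.asList(
                "col1".getBytes(), "col2".getBytes()));

            Configuration conf = new Configuration();
            // Hypothetical helpers: serialize into / parse out of the
            // Configuration string, however your job actually does it.
            setSlicePredicate(conf, original);
            SlicePredicate roundTripped = getSlicePredicate(conf);

            // "true" means the serialization is faithful; "false"
            // means the predicate is being mangled in transit.
            System.out.println(original.equals(roundTripped));
        }
    }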
On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami wrote:
> Let me rephrase my question.
>
> How does Cassandra deal with bloom filter's false positives on deleted
> records?
The same way it deals with tombstones that it encounters otherwise
(part of a row slice, or in a memtable).
All the bloom fi
Thrift is good about wire compatibility. We're talking about running
Java code built against one API, against another version of the thrift
jar. Different ball game.
On Mon, May 3, 2010 at 6:00 PM, Anthony Molinaro
wrote:
>
> On Mon, May 03, 2010 at 05:46:51PM -0500, Jonathan Ellis wrote:
>> Yo
replication in Cassandra is per-operation, not per-row
On Mon, May 3, 2010 at 2:40 PM, Lee Parker wrote:
> I have a CF on our cluster which has several rows with 200k+ columns of
> TimeUUID data. I have noticed recently that this CF is reaching my memtable
> thresholds (128M or 1.5 mill obj) far
Hi all,
I can't figure out how to deal with request routing...
In fact I have two nodes in the "Test Cluster" and I wrote the client as
specified here http://wiki.apache.org/cassandra/ThriftExamples#Java. The
Keyspace is the default one (Keyspace1, ReplicationFactor 1...)
The Seeds are well configu
>
> You should test that getSlicePredicate(conf).equals(originalPredicate)
>
>
That's it! The byte arrays are slightly different after setting it on the
Hadoop config. Below is a simple test which demonstrates the bug -- it
should print "true" but instead prints "false". Please let me know if a
I'm looking into performance issues on a 0.6.1 cluster. I see two symptoms:
1. Reads and writes are slow
2. One of the hosts is doing a lot of GC.
1 is slow in the sense that in its normal state the cluster used to make around
3-5k reads and writes per second (6-10k operations per second), but now it's
I think you may have found the "eventually" in eventually consistent. With a
replication factor of 1, you are allowing the client thread to continue to
the read on node #2 before the write has been replicated to node #2. Try
setting your replication factor higher for different results.
Jonathan
On Tue, May 4, 2010 a