Hi there!
1. I'm new to Cassandra.
2. I created a multinode cluster, which worked well between two computers
on a LAN (192.168.1.31, 192.168.1.33), following the official
wiki.
3. When I created a multinode cluster between my computer (192.168.1.31)
and a remote server ( y.y.y.y ), it didn't work.
I
Don't think of it as getting rid of supercolumns. Think of it as adding
superdupercolumns, supertriplecolumns, etc. Or, in sparse array terminology:
array[dim1][dim2][dim3]...[dimN] = value
Or, as said above:
On Mon, May 10, 2010 at 11:
Hi
Can we count the total number of columns in a ColumnFamily, and if yes, how?
Exactly.
On Tue, May 11, 2010 at 10:20, David Boxenhorn wrote:
> Don't think of it as getting rid of supercolumns. Think of it as adding
> superdupercolumns, supertriplecolumns, etc. Or, in sparse array terminology:
> array[dim1][dim2][dim3]...[dimN] = value
>
> Or, as said above:
>
> Type="UTF8
> Can we count the total number of columns in a ColumnFamily, and if yes, how?
You should have a look at the map reduce support. There is an example
in contrib/word_count.
Now, if the question is whether there is a simple API call that returns the
total number of columns in a ColumnFamily, then the answer is no.
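There is no single call, but a client can count by paging over column slices. A minimal sketch of that idea, where `fetch_page(start, count)` is a hypothetical stand-in for a Thrift `get_slice` call returning up to `count` column names starting at `start` (inclusive):

```python
# Count all columns in a row by repeatedly fetching slices, starting each
# page at the last column seen and dropping the overlapping first entry.

def count_columns(fetch_page, page_size=1000):
    total = 0
    start = ""
    while True:
        page = fetch_page(start, page_size)
        if not page:
            break
        if start:
            page = page[1:]  # drop the column we already counted
            if not page:
                break
        total += len(page)
        start = page[-1]
    return total

# Demo against an in-memory "row" of 2500 columns:
columns = ["col%05d" % i for i in range(2500)]

def fetch_page(start, count):
    idx = 0 if not start else columns.index(start)
    return columns[idx:idx + count]

print(count_columns(fetch_page, page_size=1000))  # 2500
```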
Turns out the problem was with batch mutate. When I mutate chunks 100 times
bigger, it goes 100 times faster.
Now I have a problem with running out of memory sometimes.
On Mon, May 10, 2010 at 8:17 PM, B. Todd Burruss wrote:
> have you put your commit log on a disk by itself? not a logical parti
Hi
I have a column named "colom". Can we update the column name "colom" to
"column" at runtime or via the API?
Hi
Can we make a range search on an ID:ID format, or would this be treated as a
single ID by the API? Can it bifurcate on ':'? If not, then how can we
avoid using supercolumns where we need to associate 'n' number
of rows with a single ID?
Like
CatID1-> articleID1
CatID1-> articleID
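A hedged sketch of the concatenated-key idea: Cassandra itself sees "CatID1:articleID1" as one opaque key, so building and splitting on ':' happens client-side, and range-scanning all articles for a category assumes an order-preserving partitioner. All names here are illustrative:

```python
# Client-side handling of "CatID:ArticleID" composite keys. The range trick
# relies on ASCII ordering: ';' is the character after ':', so the half-open
# interval ["Cat:", "Cat;") covers every article key for that category.

def make_key(cat_id, article_id):
    return "%s:%s" % (cat_id, article_id)

def split_key(key):
    cat_id, _, article_id = key.partition(":")
    return cat_id, article_id

def range_for_category(cat_id):
    return cat_id + ":", cat_id + ";"

keys = sorted(make_key("CatID1", a) for a in ["articleID1", "articleID2"])
start, end = range_for_category("CatID1")
print([k for k in keys if start <= k < end])
```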
I am saving a large amount of data to Cassandra using batch mutate. I have
found that my speed is proportional to the size of the batch. It was very
slow when I was inserting one row at a time, but when I created batches of
100 rows and mutated them together, it went 100 times faster. (OK, I didn't
I like to base my batch sizes off of the total number of columns
instead of the number of rows. This effectively means counting the
number of Mutation objects in your mutation map and submitting the
batch once it reaches a certain size. For my data, batch sizes of
about 25,000 columns work best. Yo
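The column-counting batch strategy above can be sketched as follows; `send_batch` is a stand-in for a real `batch_mutate` call, and the row and column counts are illustrative:

```python
# Flush a batch once the mutation map holds a target number of columns,
# rather than a target number of rows, so bushy rows don't blow up batches.

def batched_insert(rows, send_batch, max_columns=25000):
    mutation_map, pending = {}, 0
    for row_key, columns in rows:
        mutation_map[row_key] = columns
        pending += len(columns)
        if pending >= max_columns:
            send_batch(mutation_map)
            mutation_map, pending = {}, 0
    if mutation_map:
        send_batch(mutation_map)

batches = []
rows = [("row%d" % i, {"c%d" % j: "v" for j in range(100)}) for i in range(600)]
batched_insert(rows, batches.append)
print(len(batches))  # 600 rows * 100 cols = 60000 cols -> 3 batches of <= 25000
```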
Thanks a lot! 25,000 is a number I can work with.
Any other suggestions?
On Tue, May 11, 2010 at 3:21 PM, Ben Browning wrote:
> I like to base my batch sizes off of the total number of columns
> instead of the number of rows. This effectively means counting the
> number of Mutation objects in y
We had this problem initially, but it disappeared after several days of
operation, so we had no chance to investigate further.
2010/5/10 Даниел Симеонов
> Hi,
> I've experienced the same problem, two nodes got stuck with CPU at 99%
> and the following source code from IncomingStreamRead
The main thing is to test on your data - 25k works great for me but if
your values are substantially smaller or larger it might not for you.
Not specific to batches, but if you have a decent size cluster and
have to do lots of inserts make sure your client is multi-threaded so
that it's not the bo
On Mon, May 10, 2010 at 17:01, Tatsuya Kawano wrote:
> Hi,
>
> Does Cassandra support rolling restart recipe between minor version
> upgrade? I mean rolling restart is a way to upgrade Cassandra version
> or change configuration **without** bringing down the whole cluster.
> The recipe will be som
On Tue, May 11, 2010 at 04:45, vd wrote:
> Hi
>
> I have a column named colom. Can we update column name "colom" to
> "column" during runtime or via API ?
>
This will require two operations: remove and insert.
Gary.
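A minimal sketch of the remove-and-insert rename, using a plain dict in place of a real row; a real client would issue the insert and the remove against Cassandra, ideally together in one batch:

```python
# "Rename" a column by inserting the value under the new name and removing
# the old one -- Cassandra has no rename operation.

def rename_column(row, old_name, new_name):
    if old_name not in row:
        raise KeyError(old_name)
    row[new_name] = row.pop(old_name)  # insert new name, remove old name

row = {"colom": "some value"}
rename_column(row, "colom", "column")
print(row)  # {'column': 'some value'}
```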
On Tue, May 11, 2010 at 06:54, David Boxenhorn wrote:
> My problem is that my rows are of very varying degrees of bushiness (i.e.
> number of supercolums and columns per row). I inserted 592,500 rows
> successfully, in a few minutes, and then I hit a batch of exceptionally
> bushy rows and ran out
multiget performs in O(N) with the number of rows requested. so will
range scanning.
if you want to query millions of records of one type i would create a
CF per type and use hadoop to parallelize the computation.
On Fri, May 7, 2010 at 6:16 PM, James wrote:
> Hi all,
> Apologies if I'm still s
You didn't give a lot of details about the remoteness of the remote
server. Remote hosts will not be able to contact any host on the
192.168.*.* network over the internet without routing support. If the remote
host is on the same network as the 192.168.*.* host, it should work
unless one of those hosts is run
s/keyspace/token/ and you've got it.
On Mon, May 10, 2010 at 10:34 AM, David Koblas wrote:
> Sounds great, will give it a go. However, just to make sure I understand
> getting the keyspace correct.
>
> Lets say I've got:
> A -- Node before overfull node in keyspace order
> O -- Overfull no
Sounds like https://issues.apache.org/jira/browse/THRIFT-638
On Tue, May 11, 2010 at 1:50 AM, Jared Laprise wrote:
> Hello all, I’m really stumped on this issue.
>
>
>
> I’m using the PHP Thrift client along with Pandra.
>
>
>
> I have a ColumnFamily `Groups` and once I set the `description` colu
Using multiple client threads (w/ pooled thrift connections) will be
even better than mutating really large chunks at a time.
On Tue, May 11, 2010 at 4:16 AM, David Boxenhorn wrote:
> Turns out the problem is with batch mutate. I mutate chunks 100 times
> bigger, it goes 100 times faster.
>
> Now
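A sketch of the multi-threaded client idea: worker threads pull batches from a shared queue, each notionally holding its own pooled connection. The `send` stub here stands in for a real `batch_mutate` over a pooled Thrift connection:

```python
# Drain a queue of pre-built batches with several worker threads so the
# client, not the cluster, stops being the bottleneck.
import queue
import threading

def run_workers(batches, send, num_threads=4):
    q = queue.Queue()
    for b in batches:
        q.put(b)

    def worker():
        while True:
            try:
                batch = q.get_nowait()
            except queue.Empty:
                return
            send(batch)  # a real worker would call batch_mutate here

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

sent = []
lock = threading.Lock()

def send(batch):
    with lock:
        sent.append(batch)

run_workers([{"row%d" % i: {}} for i in range(20)], send)
print(len(sent))  # 20
```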
I have a similar issue, but I can't create a CF per type, because types are
an open-ended set in my case (they are geographical locations). So I wanted
to have one CF for types, and a supercolumn for each type, with the keys as
columns per supercolumn.
Is it a problem for me to have millions of co
This is one of the sticking points with the key concatenation
argument. You can't simply access subpartitions of data along an
aggregate name using a concatenated key unless you can efficiently
address a range of the keys according to a property of a subset. I'm
hoping this will bear out with more
I would like an API with a variable number of arguments. Using Java varargs,
something like
value = keyspace.get("articles", "cars", "John Smith", "2010-05-01",
"comment-25");
or
valueArray = keyspace.get("articles", predicate1, predicate2, predicate3,
predicate4);
The storage layout would be
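A client-side sketch of the proposed varargs `get`, using a nested dict in place of the real storage layout (the layout description itself is truncated above); a real client would translate the path into key and column-path lookups:

```python
# A variadic get() that walks an arbitrarily deep path, in the spirit of the
# proposed keyspace.get("articles", "cars", ...) API.

def get(store, *path):
    node = store
    for part in path:
        node = node[part]
    return node

store = {"articles": {"cars": {"John Smith": {"2010-05-01":
            {"comment-25": "Nice car!"}}}}}
print(get(store, "articles", "cars", "John Smith", "2010-05-01", "comment-25"))
```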
I am evaluating Cassandra, and Read latency is the biggest concern in terms
of performance. As I test various scenarios and configurations I am getting
surprising results. I have a 2 node cluster with both nodes connected to
direct attached storage. The read latency pulling data off the raid 10
sto
RAID may be less valuable to you here. More useful to you would be to
split the storage according to
http://wiki.apache.org/cassandra/CassandraHardware
When Cassandra is accessing effectively random parts of a large data
store, expect it to be constantly hitting certain "always hot" parts
of files
> isolated requests, obviously in scale the RAID should perform better... I
> have not started testing concurrent reads in scale as the single reads are
> too slow to begin with. I am getting 20-30ms response time off of internal
Concurrent reads is what you need to do in order to see the benefit
Hi Stu,
Thanks for your hard work. That's not easy work.
With my partners, after days of reading the code, we really believe that the
current implementation of the storage layer should be rewritten for a
cleaner implementation.
On Tue, May 11, 2010 at 12:44 AM, Stu Hood wrote:
> I think that it
I would appreciate keeping the Cassandra core data model clear and pure.
On Tue, May 11, 2010 at 5:20 AM, Mike Malone wrote:
> On Mon, May 10, 2010 at 1:38 PM, AJ Chen wrote:
>
>> Could someone confirm this discussion is not about abandoning supercolumn
>> family? I have found modeling data with supercol
On Mon, May 10, 2010 at 11:36 PM, vd wrote:
> Hi Mike
>
> AFAIK cassandra queries only on keys and not on column names, please
> verify.
>
Incorrect. You can slice a row or rows (identified by a key) on a column
name range (e.g., "a" through "m") or ask for specific columns in a row or
rows (e.g
"(I originally saw 3-5 ms read latency with a small amount of data and 1
Keyspace/CF)? "
The 3~5ms latency comes from the filesystem page cache.
Because your dataset is small, it can be cached entirely by the filesystem.
2010/5/11 Peter Schüller
> > isolated requests, obviously in scale the RAID
In the current 0.6.1, after a long compaction, the old SSTable files are
still there, with the marker of a zero-sized
"CFName-id-Compacted" file.
Why not delete them immediately? What is the policy in 0.6.1?
See following examples.
-rw-rw-r-- 1 cassandra cassandra 0 May 11 23:35 LZO-12
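For reference, a sketch of reading the marker convention described above: a zero-byte `CFName-id-Compacted` file flags a superseded generation, which Cassandra removes later (at GC or restart) rather than immediately. The directory handling here is illustrative, run against a temp directory rather than a real data directory:

```python
# List SSTable generations flagged as compacted by zero-byte marker files.
import os
import tempfile

def compacted_generations(data_dir):
    gens = []
    for name in os.listdir(data_dir):
        if name.endswith("-Compacted"):
            gens.append(name[:-len("-Compacted")])  # e.g. "LZO-12"
    return sorted(gens)

# Demo with a temp directory standing in for a Cassandra data directory:
d = tempfile.mkdtemp()
for f in ["LZO-12-Compacted", "LZO-12-Data.db", "LZO-13-Data.db"]:
    open(os.path.join(d, f), "w").close()
print(compacted_generations(d))  # ['LZO-12']
```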
Hi.
I understood I can use this structure with Cassandra:
KeySpace_3nodeslevels = {
Key_Family_alfa : {
Key_Super_Column_A : { Key_Column_1: "Value_alfa_A_1",
Key_Column_2: "Value_alfa_A_2"},
Key_Super_Column_B : { Key_Column_1: "Value_alfa_B
In the future, maybe Cassandra can provide some "Filter" or "Coprocessor"
interfaces, just like Bigtable does.
But for now, Cassandra is too young; there are many things to do for a clean
core.
On Tue, May 11, 2010 at 11:35 PM, Mike Malone wrote:
> On Mon, May 10, 2010 at 11:36 PM, vd wrote:
Is it a problem for me to have millions of columns in a supercolumn?
You will have problems, because there is no index in a supercolumn for its
subcolumns.
On Tue, May 11, 2010 at 10:03 PM, David Boxenhorn wrote:
> I have a similar issue, but I can't create a CF per type, because types are
> an open-en
On Tue, May 11, 2010 at 7:46 AM, David Boxenhorn wrote:
> I would like an API with a variable number of arguments. Using Java
> varargs, something like
>
> value = keyspace.get("articles", "cars", "John Smith", "2010-05-01",
> "comment-25");
>
> or
>
> valueArray = keyspace.get("articles", predic
On Tue, May 11, 2010 at 8:54 AM, Schubert Zhang wrote:
> In the future, maybe Cassandra can provide some "Filter" or "Coprocessor"
> interfaces, just like Bigtable does.
> But for now, Cassandra is too young; there are many things to do for a clean
> core.
There's been talk of adding coproces
If I had 10 Cassandra nodes each with a write capacity of 5K per second
and a replication factor of 2, would that mean the expected write
capacity of the system would be ~25K writes per second because the nodes
are also serving other nodes and not just clients?
I know this is highly simplified
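The simplified arithmetic can be sketched directly; as the question itself notes, this ignores coordination overhead, reads, and consistency level:

```python
# Each write must land on `rf` replicas, so the raw per-node write capacity
# is shared across replication.

def cluster_write_capacity(nodes, per_node_wps, rf):
    return nodes * per_node_wps // rf

print(cluster_write_capacity(10, 5000, 2))   # 25000
print(cluster_write_capacity(100, 5000, 2))  # 250000
```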
you can try this benchmarking tool to compare your drive(s)
http://freshmeat.net/projects/fio/
... you can simulate various loads, etc. My RAID0 outperforms a single
drive (as mentioned below) under heavy concurrent reads.
On 05/11/2010 08:15 AM, Peter Schüller wrote:
isolated requests, obvio
i was thinking about doing some testing with 0.6.2 ... do the devs
consider the tip of 0.6 branch ok to test with?
from http://wiki.apache.org/cassandra/ArchitectureInternals:
Making this concurrency-safe without blocking writes or reads while we
remove the old SSTables from the list and add the new one is tricky,
because naive approaches require waiting for all readers of the old
sstables to finish before del
Yes
On Tue, May 11, 2010 at 11:19 AM, B. Todd Burruss wrote:
> i was thinking about doing some testing with 0.6.2 ... do the devs consider
> the tip of 0.6 branch ok to test with?
>
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra
Hi folks,
I'm trying to wrap my head around Net::Cassandra::Easy, and it's making
me cross-eyed.
My prototype app can be seen here:
http://bito.ponzo.net/Hatchet/
The idea is to index logfiles by various keys, using Cassandra's extreme
write speed to keep up with the millions of lines of logfil
On Fri, May 7, 2010 at 6:56 AM, Matt Revelle wrote:
> Reston, VA is a good spot in the DC metro area for tech events.
+1
Thanks jonathan, clear!
On Wed, May 12, 2010 at 12:22 AM, Jonathan Ellis wrote:
> from http://wiki.apache.org/cassandra/ArchitectureInternals:
>
> Making this concurrency-safe without blocking writes or reads while we
> remove the old SSTables from the list and add the new one is tricky,
> becau
another note on this ... since all my nodes are very well balanced and
were started at the same time, i notice that they all do garbage
collection at about the same time. this of course causes a performance
issue.
i also have noticed that with the default JVM options and heavy load,
ConcMark
On Tue, 11 May 2010 09:40:02 -0700 Scott Doty wrote:
SD> I'm trying to wrap my head around Net::Cassandra::Easy, and it's making
SD> me cross-eyed.
SD> My prototype app can be seen here:
SD> http://bito.ponzo.net/Hatchet/
SD> The idea is to index logfiles by various keys, using Cassandra's ex
If you have, for example, your replication factor equal to the total number
of nodes in the ring, I suspect you will hit a brick wall pretty soon.
The biggest impact on your write performance will most likely be the
consistency level of your writes. In other words, how many nodes you want to
wait f
> The biggest impact on your write performance will most likely be the
> consistency level of your writes. In other words, how many nodes you want to
> wait for before you acknowledge the write back to the client.
I believe the consistency level is only expected to have a significant
impact on lat
Hello All,
I guess the subject talks for itself.
I'm currently developing a document analysis engine using cassandra as the
scalable storage.
I just want to briefly make an overview of the data model I'm using for this
purpose.
"the key" is formed in the format of timestamp.random(), so that it'
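A sketch of building such a key; the exact "timestamp.random()" format is an assumption based on the description above:

```python
# Build a row key of the form "<unix-timestamp>.<random>", so keys sort by
# time first and collisions at the same instant are unlikely.
import random
import time

def make_key(now=None):
    ts = int(now if now is not None else time.time())
    return "%d.%06d" % (ts, random.randrange(10**6))

key = make_key(1273590000)
print(key.split(".")[0])  # 1273590000
```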
On 05/11/2010 10:45 AM, Ted Zlatanov wrote:
> SD> My prototype app can be seen here:
>
> SD> http://bito.ponzo.net/Hatchet/
>
>
> The latest N::C::Easy will not work with Cassandra 0.6.x, the only
> target is SVN trunk. I can't discover the API version on the server so
> there's no way to anticipa
Hi all.
I'm thinking about implementing a real-time WA tool using Cassandra as my
storage, but I have some questions first.
I'm considering Cassandra because of its excellent write performance,
horizontal scalability and its tunable consistency level.
- First of all, my first thought is to have t
Mark Greene wrote:
If you have for example, your replication factor equal to the total
amount of nodes in the ring, I suspect you will hit a brick wall pretty
soon.
Right :) So if we said there were 100 nodes at 5K wps with R=2, then
would that suggest the cluster can support 250K wps?
Again
We're working on better GC defaults for 0.6.2. Thanks!
On Tue, May 11, 2010 at 12:00 PM, B. Todd Burruss wrote:
> another note on this ... since all my nodes are very well balanced and were
> started at the same time, i notice that they all do garbage collection at
> about the same time. this o
2010/5/11 Ted Zlatanov :
> The latest N::C::Easy will not work with Cassandra 0.6.x, the only
> target is SVN trunk. I can't discover the API version on the server so
> there's no way to anticipate such breakage as you see (I suspect it's
> due to API mismatch). The Cassandra developers haven't a
On Tue, May 11, 2010 at 11:10 AM, Bill de hOra wrote:
> I know this is highly simplified take on things (ie no consideration for
> reads or quorum), I'm just trying to understand what the implication of
> replication is on write scalability. Intuitively it would seem actual write
> capacity is tot
On Tue, 11 May 2010 14:29:13 -0500 Jonathan Ellis wrote:
JE> 2010/5/11 Ted Zlatanov :
>> The latest N::C::Easy will not work with Cassandra 0.6.x, the only
>> target is SVN trunk. I can't discover the API version on the server so
>> there's no way to anticipate such breakage as you see (I suspe
Yet another BMT question, thought this may apply for regular memtables as
well...
After doing a batch insert, I accidentally submitted the flush command
twice. To my surprise, the target node's log indicates that it wrote a new
*-Data.db file, and the disk usage went up accordingly. I tested and i
Hi,
I thought that 'nodetool drain' was supposed to flush the commit logs
through the system, which it appears to do (verified by running ls in
the commit log directory and seeing no files).
However, it also appears to disable writes completely (ie, scripts attempting
to write data were frozen,
If you have 3-4 nodes, how do you monitor the performance of each node?
For sure you have to pay particular attention to memory allocation on each
node; especially be sure your servers don't swap. Then you can monitor how
load is balanced among your nodes (nodetool -h XX ring).
On Tue, May 11, 2010 at 11:46 PM, S Ahmed wrote:
> If you have 3-4 nodes, how do you mon
I was under the impression from what I've seen talked about on this list
(perhaps I'm wrong here) that given the write throughput of one node in a
cluster (again assuming each node has a given throughput and the same
config) that you would simply multiply that throughput by the number of
nodes you
On Tue, May 11, 2010 at 9:30 PM, Gary Dusbabek wrote:
> If the remote host is on the same network as the 192.168.*.* host, it should
> work
> unless one of those hosts is running a local firewall.
I don't quite understand what you meant by "on the same network", but
connecting them
with a VPN works.
T
Reddit posted a blog entry about some recent downtime, partially due
to issues with Cassandra.
http://blog.reddit.com/2010/05/reddits-may-2010-state-of-servers.html
This part surprised me:
"
First, Cassandra has an internal queue of work to do. When it times
out a client (10s by default), it still
On Tue, May 11, 2010 at 5:56 PM, Mark Greene wrote:
> I was under the impression from what I've seen talked about on this list
> (perhaps I'm wrong here) that given the write throughput of one node in a
> cluster (again assuming each node has a given throughput and the same
> config) that you woul
Dear all,
We are using Cassandra on a website. Whenever website traffic increases,
we get the following error (Python):
File "/usr/local/lib/python2.6/dist-packages/pycassa/columnfamily.py",
line 199, in multiget
self._rcl(read_consistency_level))
File "/usr/local/lib/p