(sp) Lucandra http://github.com/tjake/Lucandra
On Mon, Apr 26, 2010 at 11:08 PM, Joseph Stein wrote:
> great talk tonight in NYC I attended in regards to using Cassandra as
> a Lucene Index store (really great idea nicely implemented)
> http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/
great talk tonight in NYC I attended in regards to using Cassandra as
a Lucene Index store (really great idea nicely implemented)
http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/
so Lucandra uses Cassandra as a distributed cache of indexes =8^)
On Mon, Apr 26, 2010 a
I was attempting to get a snapshot on our cassandra nodes. I get the
following error every time I run nodetool ... snapshot.
Exception in thread "main" java.io.IOException: Cannot run program "ln":
java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuil
I'll work on doing more tests around this. In 0.5 we used a different data
structure that required polling. But this does seem problematic.
-Chris
On Apr 26, 2010, at 7:04 PM, Eric Yu wrote:
> I have the same problem here, and I analysed the hprof file with MAT, as
> you said, LinkedBlockingQu
I know I can get thrift API version.
However, I am writing a CLI for Cassandra in Python with readline support,
and it will support one-key deploy/upgrade of cassandra+thrift on remote hosts,
so I need to get the Apache Cassandra version to make sure it has deployed
successfully.
2010/4/27 Jonathan Ellis
> You can't
Hi Ahmed,
Cassandra has a limit on the size of a value stored in the database: the
maximum size is 2^31-1 bytes.
If you have more than 2^31-1 bytes, I suggest you split the data into
several chunks.
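The chunking workaround suggested above can be sketched in a few lines. This is a purely illustrative sketch, not a real Cassandra API: the chunk-key naming scheme and the 64 MB chunk size are assumptions, chosen only to stay well under the 2^31-1 byte limit.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk, well under 2^31-1 bytes

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Return a list of (chunk_key, chunk_bytes) pairs for one large blob.

    Each pair could then be stored as a separate column or row."""
    return [("chunk-%06d" % i, data[off:off + chunk_size])
            for i, off in enumerate(range(0, len(data), chunk_size))]

def reassemble(chunks):
    """Concatenate chunks back into the original blob, in key order."""
    return b"".join(part for _, part in sorted(chunks))
```

Because the chunk keys sort lexicographically, reassembly only needs a sorted read of the chunk columns.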
On Mon, Apr 26, 2010 at 3:19 AM, S Ahmed wrote:
> Is there a suggested sized maximum that you can set the value of
I have the same problem here, and I analysed the hprof file with MAT; as
you said, LinkedBlockingQueue used 2.6GB.
I think Cassandra's thread pools should limit the queue size.
cassandra 0.6.1
java version
$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b
You can't get the Cassandra release version, but you can get the
Thrift api version, which is more useful. It's compiled as a constant
VERSION string in your client library. See the comments in
interface/cassandra.thrift.
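For the deployment check described earlier in the thread, the reported Thrift VERSION string can be compared against an expected one. A hedged sketch: the Thrift compiler emits a constants module for each IDL file, but the exact module path (`cassandra.constants` below) depends on how the bindings were generated, and the helper functions are hypothetical.

```python
# In bindings generated from interface/cassandra.thrift, the API version is
# typically exposed as a constant, e.g.:
#   from cassandra.constants import VERSION   # module name is an assumption

def parse_version(v):
    """Split a dotted version string like '2.1.0' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_compatible(reported, expected):
    """Deploy check: same major version, minor/patch at least as new."""
    r, e = parse_version(reported), parse_version(expected)
    return r[0] == e[0] and r[1:] >= e[1:]
```

A deploy script could fetch VERSION from the freshly started node and refuse to proceed when `is_compatible` returns False.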
On Mon, Apr 26, 2010 at 8:14 PM, Shuge Lee wrote:
> Hi all:
> How to get
0.5 has a bug that allows it to OOM itself from replaying the log too
fast. You should upgrade to 0.6.1.
On Mon, Apr 26, 2010 at 12:11 PM, elsif wrote:
>
> Hello. I have a six node cassandra cluster running on modest hardware
> with 1G of heap assigned to cassandra. After inserting about 245
>
On Mon, Apr 26, 2010 at 9:04 AM, Dominique De Vito
wrote:
> (1) has anyone already used Cassandra as an in-memory data grid ?
> If no, does anyone know how far such a database is from, let's say, Oracle
> Coherence ?
> Does Cassandra provide, for example, a (synchronized) cache on the client
> sid
How are you checking that the rows are gone?
Are you experiencing node outages during this?
DC_QUORUM is unfinished code right now, you should avoid using it.
Can you reproduce with normal QUORUM?
On Sat, Apr 24, 2010 at 12:23 PM, Joost Ouwerkerk wrote:
> I'm having trouble deleting rows in Cas
On Fri, Apr 23, 2010 at 3:32 PM, Robert wrote:
> I am starting out with Cassandra and I had a couple of questions, I read a
> lot of the documentation including:
> http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
> First I wanted to make sure I understand this
> bug: http://issues.apa
Hi all:
How do I get the Apache Cassandra version with a Thrift client?
Thanks for reply.
--
Shuge Lee | Lee Li | 李蠡
Live nodes that have tokens indicating they should receive a copy of
data count towards write quorum. This means if a node is down (not
decommissioned) the copy sent to the node acting as the hinted handoff
replica will not count towards achieving quorum. If a token is moved,
it is moved. It is
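The counting rule described above can be expressed as a toy model. This is illustrative only, not Cassandra's actual implementation; the function and set names are invented for the example.

```python
def quorum_met(acks, replica_nodes, replication_factor):
    """Toy model: only acks from nodes whose tokens make them natural
    replicas for the key count toward quorum; a node merely holding a
    hinted-handoff copy for a down replica does not."""
    quorum = replication_factor // 2 + 1
    counted = [a for a in acks if a in replica_nodes]
    return len(counted) >= quorum

replicas = {"A", "B", "C"}   # natural replicas for some key, RF = 3
# B is down; D accepts a hinted write on B's behalf, but its ack
# does not count toward quorum.
assert quorum_met({"A", "C"}, replicas, 3)
assert not quorum_met({"A", "D"}, replicas, 3)
```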
I call it a 'Cassandra Object Abstraction' (COA), because I am trying to
write a reusable implementation of the patterns that are commonly used for
Cassandra data modeling. E.g. using TimeUUID columns for storing an
index is one pattern. The various strategies for partitioning those indexes
are another pattern.
> Increasing the replication level is known to break it.
Thanks! Yes, of that I am aware. When I said ring changes I meant
nodes being added and removed, or just re-balanced, implying tokens
moving around the ring.
--
/ Peter Schuller aka scode
Increasing the replication level is known to break it.
--Original Message--
From: Peter Schuller
Sender: sc...@scode.org
To: user@cassandra.apache.org
ReplyTo: user@cassandra.apache.org
Subject: Quorum consistency in a changing ring
Sent: Apr 26, 2010 21:55
Hello,
Is my interpretation co
Hello,
Is my interpretation correct that Cassandra is intended to guarantee
quorum consistency (overlapping read/write sets) at all times,
including on a ring that is actively changing? I.e., there are no
(intended) cases where quorum consistency is defeated due to writes or
reads going to nodes that
I've broken this case down further to some Python code that works against
the Thrift-generated client and am still getting the same odd results. With
keys object1, object2 and object3, an open-ended get_range_slice starting
with "object1" only returns object1 and 2.
I'm guessing that I've got so
On Mon, Apr 26, 2010 at 2:15 PM, Anthony Molinaro
wrote:
> I think in the worst case you read all the disks. If your
> block size is large enough to hold an entire row, you should only have to
> read one disk to get that data.
And conversely, for a large enough row you might benefit fro
I think in the worst case you read all the disks. If your
block size is large enough to hold an entire row, you should only have to
read one disk to get that data.
I, for instance, stopped using multiple data directories and instead use
a RAID0. The number of blocks read is not the same
http://wiki.apache.org/cassandra/UUID if you don't need transactional
ordering, ZooKeeper or something comparable if you do.
2010/4/26 Roland Hänel
> Typically, in the SQL world we use things like AUTO_INCREMENT columns that
> let us create a unique key automatically if a row is inserted into a
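The UUID suggestion above maps directly onto Python's standard library: `uuid1()` yields time-based identifiers (the kind TimeUUIDType columns sort by), while `uuid4()` yields random ones with no ordering. A minimal sketch:

```python
import uuid

# uuid1() embeds a timestamp and node ID, so keys created with it sort
# roughly by creation time under a time-aware comparator such as
# Cassandra's TimeUUIDType; uuid4() is fully random.
time_key = uuid.uuid1()    # time-based (RFC 4122 version 1)
random_key = uuid.uuid4()  # random (RFC 4122 version 4)

assert time_key.version == 1
assert random_key.version == 4
```

As the thread notes, neither gives transactional ordering; for that, something like ZooKeeper is needed.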
Typically, in the SQL world we use things like AUTO_INCREMENT columns that
let us create a unique key automatically when a row is inserted into a table.
What do you guys usually do to create identifiers for use in Cassandra?
Do we only rely on "currentTimeMillis() + random()" to create something tha
RAID0 decreases the performance of multiple, concurrent random reads because
for each read request (I assume that at least a couple of stripe sizes are
read), all hard disks are involved in that read.
Consider the following example: you want to read 1MB out of each of two
files
a) both files are o
Short version: Matt Pfeil and I have founded http://riptano.com to
provide production Cassandra support, training, and professional
services. Yes, we're hiring.
Long version:
http://spyced.blogspot.com/2010/04/and-now-for-something-completely.html
We're happy to answer questions on- or off-list
On Sat, Apr 24, 2010 at 10:20 AM, dir dir wrote:
> In general what is the difference between Cassandra and HBase??
>
> Thanks.
>
Others have already said it ...
Cassandra has a peer architecture, with all peers being essentially
equivalent (minus the concept of a "seed," as far as I can tell).
Thanks Chris
2010/4/26 Chris Goffinet
> Upgrade to b20 of Sun's version of the JVM. This OOM might be related to
> LinkedBlockingQueue issues that were fixed.
>
> -Chris
>
>
> 2010/4/26 Roland Hänel
>
>> Cassandra Version 0.6.1
>> OpenJDK Server VM (build 14.0-b16, mixed mode)
>> Import speed is about
2010/4/26 Roland Hänel :
> Ryan, I agree with you on the hot spots, however for the physical disk
> performance, even the worst case hot spot is not worse than RAID0: in a hot
> spot scenario, it might be that 90% of your reads go to one hard drive. But
> with RAID0, 100% of your reads will go to *
Upgrade to b20 of Sun's version of the JVM. This OOM might be related to
LinkedBlockingQueue issues that were fixed.
-Chris
2010/4/26 Roland Hänel
> Cassandra Version 0.6.1
> OpenJDK Server VM (build 14.0-b16, mixed mode)
> Import speed is about 10MB/s for the full cluster; if a compaction is going
>
Cassandra Version 0.6.1
OpenJDK Server VM (build 14.0-b16, mixed mode)
Import speed is about 10MB/s for the full cluster; if a compaction is going
on the individual node is I/O limited
tpstats: caught me, didn't know this. I will set up a test and try to catch
a node during the critical time.
Than
On 04/26/2010 03:11 PM, Tatu Saloranta wrote:
On Mon, Apr 26, 2010 at 10:35 AM, Ethan Rowe wrote:
On 04/26/2010 01:26 PM, Isaac Arias wrote:
On Apr 26, 2010, at 12:13 PM, Geoffry Roberts wrote:
...
In my opinion, a mapping solution for Cassandra should be more like a
Template. Something
On Sun, Apr 25, 2010 at 5:43 PM, Jonathan Ellis wrote:
> On Sun, Apr 25, 2010 at 5:40 PM, Tatu Saloranta wrote:
>>> Now with TimeUUIDType, if two UUID have the same timestamps, they are
>>> ordered
>>> by bytes order.
>>
>> Naively for the whole UUID? That would not be good, given that
>> timest
Ryan, I agree with you on the hot spots, however for the physical disk
performance, even the worst case hot spot is not worse than RAID0: in a hot
spot scenario, it might be that 90% of your reads go to one hard drive. But
with RAID0, 100% of your reads will go to *all* hard drives.
But you're rig
Which version of Cassandra?
Which version of the Java JVM are you using?
What do your I/O stats look like when bulk importing?
When you run `nodeprobe -host <host> tpstats`, is any thread pool backing up
during the import?
-Chris
2010/4/26 Roland Hänel
> I have a cluster of 5 machines building a Cass
I have a cluster of 5 machines building a Cassandra datastore, and I load
bulk data into it using the Java Thrift API. The first ~250GB runs fine;
then one of the nodes starts to throw OutOfMemory exceptions. I'm not using
any row or index caches, and since I only have 5 CFs and some 2.5 GB of
On Mon, Apr 26, 2010 at 10:35 AM, Ethan Rowe wrote:
> On 04/26/2010 01:26 PM, Isaac Arias wrote:
>>
>> On Apr 26, 2010, at 12:13 PM, Geoffry Roberts wrote:
>>
...
>> In my opinion, a mapping solution for Cassandra should be more like a
>> Template. Something that helps map (back and forth) rows to
2010/4/26 Roland Hänel :
> Hm... I understand that RAID0 would help to create a bigger pool for
> compactions. However, it might impact read performance: if I have several
> CF's (with their SSTables), random read requests for the CF files that are
> on separate disks will behave nicely - however i
Hm... I understand that RAID0 would help to create a bigger pool for
compactions. However, it might impact read performance: if I have several
CF's (with their SSTables), random read requests for the CF files that are
on separate disks will behave nicely - however if it's RAID0 then a random
read o
Just found the way...
The KeyRange start and end key will be the same, and instead of specifying
the count and start on KeyRange, they have to be specified on SliceRange;
keySlices will then come back with a single key and a list of columns...
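This column-paging pattern can be modeled without a live cluster. The sketch below is purely illustrative: `page_columns` and the dict-based row are stand-ins for the real Thrift KeyRange/SliceRange calls, invented for the example.

```python
def page_columns(row, start_column, count):
    """Model of Thrift-style column paging within one row: return up to
    `count` (name, value) pairs whose name is >= start_column, in order."""
    names = sorted(row)
    selected = [n for n in names if n >= start_column][:count]
    return [(n, row[n]) for n in selected]

row = {"a": 1, "b": 2, "c": 3, "d": 4}
first = page_columns(row, "", 2)
# the next page starts at the last name seen; note it is re-included,
# mirroring how a slice restarted at that column behaves
second = page_columns(row, first[-1][0], 3)
```

Paging forward therefore means skipping the first element of each page after the first, since the start column comes back again.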
2010/4/25 Rafael Ribeiro
> Hi all!
>
> I am trying to do a p
http://wiki.apache.org/cassandra/CassandraHardware
On Mon, Apr 26, 2010 at 1:06 PM, Edmond Lau wrote:
> Ryan -
>
> You (or maybe someone else) mentioned using RAID-0 instead of multiple
> data directories at the Cassandra hackathon as well. Could you
> explain the motivation behind that?
>
> Tha
Ryan -
You (or maybe someone else) mentioned using RAID-0 instead of multiple
data directories at the Cassandra hackathon as well. Could you
explain the motivation behind that?
Thanks,
Edmond
On Mon, Apr 26, 2010 at 9:53 AM, Ryan King wrote:
> I would recommend using RAID-0 rather than multipl
The real tragedy is that we have not created a new acronym for this yet...
OKVM... it makes more sense...
On Mon, Apr 26, 2010 at 10:35 AM, Ethan Rowe wrote:
> On 04/26/2010 01:26 PM, Isaac Arias wrote:
>
>> On Apr 26, 2010, at 12:13 PM, Geoffry Roberts wrote:
>>
>>
>>
>>> Clearly Cassandra is
On 04/26/2010 01:26 PM, Isaac Arias wrote:
On Apr 26, 2010, at 12:13 PM, Geoffry Roberts wrote:
Clearly Cassandra is not an RDBMS. The intent of my Hibernate
reference was to be more lyrical. Sorry if that didn't come through.
Nonetheless, the need remains to relieve ourselves
There is, of course, also cassandra_object on the ruby side. I assume
this thread has the implicit requirement of Java, though.
--
Jeff
On Mon, Apr 26, 2010 at 10:26 AM, Isaac Arias wrote:
> On Apr 26, 2010, at 12:13 PM, Geoffry Roberts wrote:
>
>> Clearly Cassandra is not an RDBMS. The intent o
On Apr 26, 2010, at 12:13 PM, Geoffry Roberts wrote:
> Clearly Cassandra is not an RDBMS. The intent of my Hibernate
> reference was to be more lyrical. Sorry if that didn't come through.
> Nonetheless, the need remains to relieve ourselves from excessive
> boilerplate coding.
I agree with eli
Hi,
OpenX is looking for someone to work fulltime on Cassandra, we're
located in Pasadena, CA. Here's a link to the job description
http://www.openx.org/jobs/position/software-engineer-infrastructure
We've been running cassandra in production since 0.3.0, and currently
have 3 cassandra cluste
Hello. I have a six node Cassandra cluster running on modest hardware
with 1G of heap assigned to Cassandra. After inserting about 245
million rows of data, Cassandra failed with a
java.lang.OutOfMemoryError: Java heap space error. I raised the Java
heap to 2G, but still get the same error when
On Sun, Apr 25, 2010 at 11:14 AM, Bob Hutchison
wrote:
>
> Hi,
>
> I'm new to Cassandra and trying to work out how to do something that I've
> implemented any number of times (e.g. TokyoCabinet, Perst, even the
> filesystem using grep :-) I've managed to get some of this working in
> Cassandra
I would recommend using RAID-0 rather than multiple data directories.
-ryan
2010/4/26 Roland Hänel :
> I have a configuration like this:
>
>
> /storage01/cassandra/data
> /storage02/cassandra/data
> /storage03/cassandra/data
>
>
> After loading a big chunk of data into cas
Clearly Cassandra is not an RDBMS. The intent of my Hibernate reference was
to be more lyrical. Sorry if that didn't come through.
Nonetheless, the need remains to relieve ourselves from excessive
boilerplate coding.
On Mon, Apr 26, 2010 at 9:00 AM, Ned Wolpert wrote:
> I don't think you are t
I think that once we have built-in indexing (CASSANDRA-749) you can
make a good case for dropping supercolumns (at least, dropping them
from the public API and reserving them for internal use).
On Mon, Apr 26, 2010 at 11:05 AM, Schubert Zhang wrote:
> I don't think the SuperColumn is so necessary
Better fault tolerance? Scalability to large data volumes? A combination of
ZooKeeper based transactions and Cassandra may have better characteristics than
RDBMS on these criteria. There's no question that trade-offs are involved, but
as far as these issues are concerned, you'd be starting from
I don't think the SuperColumn is so necessary.
I think this level of logic can be left to the application.
Do you think so?
If SuperColumn is needed, as in
https://issues.apache.org/jira/browse/CASSANDRA-598, we should build indexes
at both the SuperColumn level and the SubColumn level.
Thus, the levels of index
I don't think you are trying to convert Cassandra into an RDBMS with what you
want. The issue is that finding a way to map these objects to Cassandra in a
meaningful way is hard. It's not as easy as saying 'do what Hibernate does',
simply because it's not an RDBMS... but it is a reasonable and useful goa
RandomPartitioner is for row keys.
#1 no
#2 yes
#3 yes
On Sat, Apr 24, 2010 at 4:33 AM, Larry Root wrote:
> I am trying to better understand how using the RandomPartitioner will affect
> my ability to select ranges of keys. Consider my simple example where we
> have many online games across differ
I am going to agree with aXqd. Having something that does for Cassandra what,
say, Hibernate does for an RDBMS seems an effort well worth pursuing. I have
some complex object graphs written in Java. If I could annotate them and
get persistence with a well laid out schema, that would be good.
On Mon, A
Hi all,
Have you tried Tanuki's Java Service Wrapper? It's so easy to deploy on Windows...
-aah
2010/4/23, Miguel Verde :
> https://issues.apache.org/jira/browse/CASSANDRA-292 points to
> http://commons.apache.org/daemon/procrun.html which is used by other Apache
> software to implement Windows servic
this is what IPartitioner does
On Mon, Apr 26, 2010 at 10:16 AM, Schubert Zhang wrote:
> Hi Jonathan Ellis and Stu Hood,
>
> I think, finally, we should provide a user customizable key abstract class.
> User can define what types of key and its class, which define how to compare
> keys.
>
> Schub
I think you should forget these RDBMS techniques.
On Sat, Apr 24, 2010 at 11:00 AM, aXqd wrote:
> On Sat, Apr 24, 2010 at 1:36 AM, Ned Wolpert
> wrote:
> > There is nothing wrong with what you are asking. Some work has been done
> to
> > get an ORM layer ontop of cassandra, for example, with a RubyO
Hi Jonathan Ellis and Stu Hood,
I think, finally, we should provide a user-customizable key abstract class.
Users can define the key types and their classes, which determine how keys
are compared.
Schubert
On Sat, Apr 24, 2010 at 1:16 PM, Stu Hood wrote:
> Your keys cannot be an encoded as binary fo
Orthogonal in this case means "at cross purposes". Transactions can't really be
done with eventual consistency because not all nodes have all the info at the
time the transaction is done. I think they recommend ZooKeeper for this kind
of stuff, but I don't know why you want to use Cassandra v
The column index in a row is a sorted, blocked index (like a B-tree), just like
Bigtable's.
On Mon, Apr 26, 2010 at 2:43 AM, Stu Hood wrote:
> The indexes within rows are _not_ implemented with Lucene: there is a
> custom index structure that allows for random access within a row. But, you
> should p
OPP will be marginally faster. Maybe 10%? I don't think anyone has
benchmarked it.
On Fri, Apr 23, 2010 at 10:30 AM, Joost Ouwerkerk wrote:
> In that case I should probably wait for 0.7. Is there any fundamental
> performance difference in get_range_slices between Random and
> Order-Preserving
Hi,
Cassandra comes closer and closer to a data grid like Oracle Coherence:
Cassandra includes distributed "hash maps", partitioning, high
availability, map/reduce processing, (some) request capability, etc.
So, I am wondering about the 2 following (and possible ?) Cassandra's
use cases :
I think that is not what Cassandra is good at.
On Mon, Apr 26, 2010 at 4:22 AM, Mark Greene wrote:
> http://wiki.apache.org/cassandra/CassandraLimitations
>
>
> On Sun, Apr 25, 2010 at 4:19 PM, S Ahmed wrote:
>
>> Is there a suggested sized maximum that you can set the value of a given
>> key?
>>
Hello Mark,
El 26/04/2010, a las 07:17, Mark Robson escribió:
> I think the solution to this would be to choose your nodes' tokens wisely
> before you start inserting data, and if possible, modify the keys to split
> them better between the nodes.
>
> For example, if your key has two parts, on
Thanks very much. Precisely answers my questions. :-)
2010/4/26 Schubert Zhang
> Please refer the code:
>
> org.apache.cassandra.db.ColumnFamilyStore
>
> public String getFlushPath()
> {
> long guessedSize = 2 * DatabaseDescriptor.getMemtableThroughput() *
> 1024*1024; // 2* adds
Please refer the code:
org.apache.cassandra.db.ColumnFamilyStore
public String getFlushPath()
{
long guessedSize = 2 * DatabaseDescriptor.getMemtableThroughput() *
1024*1024; // 2* adds room for keys, column indexes
String location =
DatabaseDescriptor.getDataFileLocationF
thank you so much for your help!
2010-04-26
Bingbing Liu
From: Mark Robson
Sent: 2010-04-26 18:17:53
To: user
Cc:
Subject: Re: when i use the OrderPreservingPartition, the load is very imbalanced
On 26 April 2010 01:18, 刘兵兵 wrote:
i do some INSERT, because i will do some scan operations
When starting your Cassandra cluster, please configure the InitialToken for
each node, which keeps the key ranges balanced.
On Mon, Apr 26, 2010 at 6:17 PM, Mark Robson wrote:
> On 26 April 2010 01:18, 刘兵兵 wrote:
>
>> i do some INSERT, because i will do some scan operations, i use the
>> OrderPres
On 26 April 2010 01:18, 刘兵兵 wrote:
> i do some INSERT, because i will do some scan operations, i use the
> OrderPreservingPartition method.
>
> the state of the cluster is showed below.
>
> as I predicted, the load is very imbalanced
I think the solution to this would be to choose your nodes' t
sorry, if specifying the token manually, use:
bin/nodetool -h <host> move <new token>
2010/4/26 Roland Hänel
> 1) you can re-balance a node with
>
> bin/nodetool -h token []
>
> specify a new token manually or let the system guess one.
>
> 2) take a look into your system.log to find out why your nodes
On 26 April 2010 00:57, Shuge Lee wrote:
> In Python:
>
> keyspace.columnfamily[key][column] = value
>
> files.video[uuid.uuid4()]['name'] = 'foo.flv'
> files.video[uuid.uuid4()]['path'] = '/var/files/foo.flv'
>
Hi.
Storing the filename in the database will not solve the file storage
problem. C
1) you can re-balance a node with
bin/nodetool -h token []
specify a new token manually or let the system guess one.
2) take a look into your system.log to find out why your nodes are dying.
2010/4/26 刘兵兵
> i do some INSERT, because i will do some scan operations, i use the
> OrderPres
Hi Jonathan,
Cassandra does not seem to have a Blob data type. To handle binary large
object data, we have to use an array of bytes. I have a question for you.
Suppose I have an MPEG video file of 15 MB. To save this video file into the
Cassandra database, I will store it as an array of bytes. One day, I feel thi
I have a configuration like this:
/storage01/cassandra/data
/storage02/cassandra/data
/storage03/cassandra/data
After loading a big chunk of data into Cassandra, I end up with some 70GB in
the first directory, and only about 10GB in the second and third ones. All
rows are q