Thanks! Can I do that in the CLI? (If so, I'm not getting it to work)
On Thu, Apr 29, 2010 at 7:44 AM, Jonathan Ellis wrote:
> use get_range_slices, with a start key of '', and page through it
>
> On Wed, Apr 28, 2010 at 9:26 AM, David Boxenhorn
> wrote:
> > Is there a Cassandra Navigator,
Hi,
Now I start to understand what's really happening. INDEX_INTERVAL (in
IndexSummary.java) was set to 4, so at least 1/4
of the indices are in the heap. For a node with 20M columns, most of the heap
is occupied by indices, and of course poor performance
with processing large fi
Thanks, Brandon!
When I started the Cassandra daemon it did seem to work, but now that I
did what you said (actually, I deleted all the contents of data/), the CLI
works too!
On Wed, Apr 28, 2010 at 11:41 PM, Brandon Williams wrote:
> On Wed, Apr 28, 2010 at 3:17 AM, David Boxenhorn wrote:
>
>
Yes, it is true.
Current Cassandra has many limitations or bad implementations, especially at
the storage level.
In my opinion, these limitations or bad implementations are just
implementation issues, not the original intention of the design.
And I also want to give a suggestion/advice to the project leaders, w
Thanks!
I want to have a detailed study of Hector.
On Thu, Apr 29, 2010 at 1:39 PM, Ran Tavory wrote:
> Hi Schubert, I'm sorry Hector isn't a good fit for you, so let's see what's
> missing for you.
>
> On Thu, Apr 29, 2010 at 8:22 AM, Schubert Zhang wrote:
>
>> I found hector is not a good desig
Hi
Can anyone please tell me if we can have duplicate keys in a Super Column
Family? If not, how can we represent this:
Article and Category Mapping
clientOne.insert(:ArticleCategory, "12", {"ArticleID" => "123"})
clientOne.insert(:ArticleCategory, "12", {"ArticleID" => "124"})
Thanks Nate!
I tested this parameter before and the result was almost the same. I got an
OutOfMemory error.
Jonathan, I saw that everything has been put together in trunk since
yesterday.
But in that version I'm trying to connect to Keyspace1 with cassandra-cli
and I'm getting that error:
Hi all,
We have installed Cassandra on Windows and found that with any number of
Cassandra nodes (single, or a 3-node cluster) on Windows Vista or Windows
Server 2008, 32 or 64 bit, with any load or number of requests, we have:
When client and server are on the same machine, connect/read/write latencie
I learned the hard way that running py_stress in the src/contrib directory
is a great way to test what kind of speeds you are really getting.
What tools / client are you using to test to get the 200ms number?
stu
On Thu, Apr 29, 2010 at 7:12 AM, Viktor Jevdokimov <
viktor.jevdoki...@adform.com>
Thrift C# sources, thrift generated Cassandra sources, test app built with C#.
Simple connect/write/read operations. No pooling or anything else.
From: Heath Oderman [mailto:he...@526valley.com]
Sent: Thursday, April 29, 2010 2:17 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra on Windows
Is there any practical number we can refer to?
Like what's the largest size used for a single column in your application?
From: uncle mantis [mailto:uncleman...@gmail.com]
Sent: Thursday, April 29, 2010 1:57 AM
To: user@cassandra.apache.org
Subject: Re: What's the best maximum size for a sin
If you're getting an internalerror, you need to check the server logs
for the exception that caused it
On Wed, Apr 28, 2010 at 6:20 AM, Julio Carlos Barrera Juez
wrote:
> Hi all!
> I am using org.apache.cassandra.auth.SimpleAuthenticator to use
> authentication in my cluster with one node (with c
are you seeing memtable flushes and compactions in the log?
what does tpstats look like when it's timing out?
spending 2000ms on GC every 50s indicates that it's not GC causing
your problem. (especially when all of them are ParNew, which are
completely non-blocking to other threads)
On Wed, Apr
you can't do range queries from the CLI, no.
On Thu, Apr 29, 2010 at 2:07 AM, David Boxenhorn wrote:
> Thanks! Can I do that in the CLI? (If so, I'm not getting it to work)
>
> On Thu, Apr 29, 2010 at 7:44 AM, Jonathan Ellis wrote:
>>
>> use get_range_slices, with a start key of '', and page
2010/4/29 casablinca126.com :
> Hi,
> Now I start to understand what's really happening. INDEX_INTERVAL (in
> IndexSummary.java) was set to 4, so at least 1/4
> of the indices are in the heap. For a node with 20M columns, most of the heap
> is occupied by indices, and of course a poor per
you really shouldn't be using trunk yet, but this is why you are
having problems: http://wiki.apache.org/cassandra/FAQ#no_keyspaces
On Thu, Apr 29, 2010 at 5:47 AM, Daniel Gimenez wrote:
>
> Thanks Nate!
> I tested this parameter before and the result was almost the same. I got an
> OutOfMemory e
Are you using TSocket in the client? If so, use TBufferedTransport instead.
Carlos
On 4/29/10, Viktor Jevdokimov wrote:
> Thrift C# sources, thrift generated Cassandra sources, test app built with
> C#. Simple connect/write/read operations. No pooling or anything else.
>
> From: Heath Oderman
So, the first time I ran into the issue, I added a 1G swap file and then I
was able to snapshot just fine. Then after a few hours, I wasn't able to do
snapshots again. So, I added a second swap file of 2G and was now able to
snapshot just fine. My reason for adding and removing the 2G as part of
We want to store objects in Cassandra. In general, the mapping is quite
easy. But for some kinds of objects, we want to be able to read all of them
into memory.
We want to use random partitioning, which means that we can't do a range
query over keys (is this right?). Is there any way to get ALL th
apparently there is now range query support for getting all keys using the RP...
cheers,
jesse
--
jesse mcconnell
jesse.mcconn...@gmail.com
On Thu, Apr 29, 2010 at 08:16, David Boxenhorn wrote:
> We want to store objects in Cassandra. In general, the mapping is quite
> easy. But for some kind
Are you sure that your keyspace is named "keyspace", and not "Keyspace1"
(default)?
/ Roger Schildmeijer
On Thu, Apr 29, 2010 at 2:47 PM, Jonathan Ellis wrote:
> If you're getting an internalerror, you need to check the server logs
> for the exception that caused it
>
> On Wed, Apr 28, 2010
Jonathan, thanks for this pointer. I've now had a look at contrib/mutex.
Coming back to my point: using ZooKeeper within Cassandra just to be able to
deliver a "unique key generation function" out of Cassandra seems like
overkill; in this case the application could use
Zooke
Hey All,
I'm trying to run some tests on Cassandra and Hadoop integration. I'm
basically following the word count example at
https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java
using the ColumnFamilyInputFormat.
Currently I have one-node cassandra and hadoop setup
How do I do that???
On Thu, Apr 29, 2010 at 4:31 PM, Jesse McConnell
wrote:
> apparently there is now range query support for getting all keys using the
> RP...
>
> cheers,
> jesse
>
> --
> jesse mcconnell
> jesse.mcconn...@gmail.com
>
>
>
> On Thu, Apr 29, 2010 at 08:16, David Boxenhorn wrote:
The default batch size is 4096, which means that each call to
get_range_slices retrieves 4,096 rows. I have found that this causes
timeouts when Cassandra is under load. Try reducing the batch size
with a call to ConfigHelper.setRangeBatchSize(). This has eliminated
the TimedOutExceptions for us.
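The trade-off can be sketched with a toy pager (a plain dict stands in for the column family; nothing here calls the real Thrift API, and the numbers are purely illustrative):

```python
# Sketch: how a smaller range-batch size trades one heavy call for several
# lighter ones. A sorted dict stands in for a column family; each "call"
# stands in for one get_range_slices round trip.

def scan(store, batch_size):
    """Page through all keys; return (rows_seen, number_of_calls).

    Smaller batches mean more round trips, but each one does less work
    and is less likely to hit the RPC timeout under load.
    """
    keys = sorted(store)
    rows, calls, start = [], 0, 0
    while start < len(keys):
        batch = keys[start:start + batch_size]
        rows.extend(batch)
        calls += 1
        start += len(batch)
    return rows, calls

store = {"key%05d" % i: "value" for i in range(10000)}
rows_big, calls_big = scan(store, 4096)     # the default batch size
rows_small, calls_small = scan(store, 256)  # a reduced batch size
assert rows_big == rows_small               # same data either way
print(calls_big, calls_small)               # 3 vs. 40 round trips
```

The real knob on the Hadoop side is the ConfigHelper.setRangeBatchSize() call mentioned above; the sketch only shows why shrinking it helps.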
take a look at get_range_slices and start with "".
then invoke get_range_slices again, but this time use the last key as the start
key
// Roger Schildmeijer
On 29 apr 2010, at 16:28, David Boxenhorn wrote:
> How do I do that???
>
> On Thu, Apr 29, 2010 at 4:31 PM, Jesse McConnell
> wrote
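Roger's two steps can be sketched like this (a toy in-memory client stands in for the Thrift get_range_slices call; class and function names are illustrative):

```python
# Sketch of the resume-from-last-key paging pattern. ToyClient mimics the
# inclusive-start behavior of the real API: a page starting at key K
# includes K itself, so each subsequent page must drop its first element.

class ToyClient:
    def __init__(self, rows):
        self._keys = sorted(rows)

    def get_range_slices(self, start_key, count):
        """Return up to `count` keys >= start_key ('' means from the beginning)."""
        keys = self._keys if start_key == "" else [k for k in self._keys if k >= start_key]
        return keys[:count]

def all_keys(client, page_size=3):
    result = []
    start = ""
    while True:
        page = client.get_range_slices(start, page_size)
        if start != "":
            page = page[1:]   # first key repeats the previous page's last key
        if not page:
            break
        result.extend(page)
        start = result[-1]    # resume from the last key seen
    return result

client = ToyClient(["a", "b", "c", "d", "e", "f", "g"])
assert all_keys(client) == ["a", "b", "c", "d", "e", "f", "g"]
```

Note that under the RandomPartitioner the order being paged is token (MD5) order rather than lexical key order, so this enumerates all keys but the "range" itself is not meaningful.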
So now we can do any kind of range queries, not just "for getting all keys"
as Jesse said?
On Thu, Apr 29, 2010 at 6:04 PM, Roger Schildmeijer
wrote:
> take a look at get_range_slices and start with "".
> then invoke get_range_slices again, but this time use the last key as the
> start key
>
> //
What is the upper limit on the number of super columns? Is it pretty much the
same as for columns in general?
On Apr 28, 2010, at 10:09 PM, Schubert Zhang wrote:
> key: stock ID, e.g. AAPL+year
> column family: closing price and volume, two CFs.
> column name: timestamp, LongType
>
> AAPL+201
The max size would probably be best determined by looking at the size of your
MemTable
64
Read repair is on a per column basis, every column gets a timestamp, and the
overhead of a name. So, balance those 3 out and you have a pretty good idea of
what to do.
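A back-of-envelope estimate along those lines might look like this (the fixed per-column byte counts below are illustrative assumptions, not the exact Cassandra on-disk layout):

```python
# Rough row-size estimate: each column carries its name, an 8-byte
# timestamp, and a few bytes of length/flag framing in addition to the
# value itself. FRAMING_BYTES is an assumed figure for illustration.

TIMESTAMP_BYTES = 8
FRAMING_BYTES = 7   # assumed length prefixes + flags per column

def estimate_row_bytes(columns):
    """columns: iterable of (name, value) pairs, both byte strings."""
    total = 0
    for name, value in columns:
        total += len(name) + len(value) + TIMESTAMP_BYTES + FRAMING_BYTES
    return total

cols = [(b"price", b"250.10"), (b"volume", b"1000000")]
print(estimate_row_bytes(cols))   # 54 bytes for ~13 bytes of actual values
```

The point is the one made above: for small values, the name and timestamp overhead can dominate, so short column names matter.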
From: Dop Sun [mailto:su...@d
One of your problems here is the connect uses a daft connection string
convention
You would think node:port but it's actually node/port
Your connection only succeeded because 9160 is the default for port not
specified.
And the keyspace thing that jbellis mentioned.
At the moment they all have to fit in memory during compaction. Columns OR
SuperColumns (for one Key).
From: Andrew Nguyen [mailto:andrew-lists-cassan...@ucsfcti.org]
Sent: Thursday, April 29, 2010 10:30 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra data model for financial data
What
Hello Jeff,
Thank you for your comments, but the problem is not about the RangeBatchSize.
In the case of the configuration parameter,
mapred.tasktracker.map.tasks.maximum > 1
all the map tasks time out; they don't even run a single line of code in the
Mapper.map() function.
In the case of the con
Hi,
I've been trying to use Cassandra for some kind of a supplementary input
source for Hadoop MapReduce jobs.
The default usage of the ColumnFamilyInputFormat does a full column family
scan to provide map input within the MapReduce framework.
However I believe that, it should be possible to gi
Sounds like you want something like http://oss.oetiker.ch/rrdtool/
Assuming you are trying to store computer log data.
Do you have any other data that can spread the data load? Like a machine name?
If so, you can use a hash of that value to place that "machine" randomly on
the net, then appe
Thanks Mark!
you're right :-)
Jonathan,
I tested everything with the patch and I had the same OutOfMemoryError after
some "Concurrent Mode Failure". Now, I'm trying to distribute the load of
Cassandra among 4 servers, maybe if the JVM is more "relaxed" it has enough
time to do the GC without pro
Is there a preference as to which JRE is used for cassandra?
Lee Parker
On Thu, Apr 29, 2010 at 10:19 AM, David Boxenhorn wrote:
> So now we can do any kind of range queries, not just "for getting all keys"
> as Jesse said?
>
With RP, the key ranges are based on the MD5 sum of the key, so it's really
only useful for getting all keys, or obtaining a semi-random row.
use dynamic column names.
make a CF called Articles, have row key = 12, first column name 123,
next column name 124, etc.
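The layout Jonathan describes can be sketched with a plain mapping (a dict standing in for the Articles CF; no client library involved):

```python
# Sketch of the dynamic-column-name layout: one row per category key,
# one column per article ID. The dict stands in for a column family, so
# there is no duplicate-key problem: repeats just add columns to the row.

articles_cf = {}

def add_article(cf, category_key, article_id):
    row = cf.setdefault(category_key, {})
    row[article_id] = ""   # value can be empty; the column *name* carries the data

add_article(articles_cf, "12", "123")
add_article(articles_cf, "12", "124")   # same row, new column
assert sorted(articles_cf["12"]) == ["123", "124"]
```

In the Ruby client shown earlier this corresponds to two inserts against the same row key with different column names.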
On Thu, Apr 29, 2010 at 4:40 AM, vineet daniel wrote:
> Hi
>
> Can anyone please tell me if we can have duplicate keys in Super Column
> Family, if now how can we represent t
the correct data model is one where you can pull the data you want out
as a slice of a row, or (sometimes) as a slice of sequential rows.
usually this involves writing the same data to multiple columnfamilies
at insertion time, so when you do queries you don't need to do joins.
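A minimal sketch of that write-time denormalization, with dicts standing in for the two column families (all names are made up for illustration):

```python
# The same article is written to two "column families" (plain dicts here)
# so each query pattern is a single row slice, with no join at read time.

articles_by_id = {}
articles_by_category = {}

def insert_article(article_id, category, title):
    # One write fans out to every column family a query will need.
    articles_by_id[article_id] = {"category": category, "title": title}
    row = articles_by_category.setdefault(category, {})
    row[article_id] = title

insert_article("123", "sports", "Match report")
insert_article("124", "sports", "Transfer news")

# Query pattern 1: one article by ID -- a single row lookup.
assert articles_by_id["123"]["title"] == "Match report"
# Query pattern 2: all articles in a category -- a single row slice.
assert sorted(articles_by_category["sports"]) == ["123", "124"]
```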
On Wed, Apr 28, 201
2010/4/29 Roland Hänel :
> Imagine the following rule: if we are in doubt whether to repair a column
> with timestamp T (because two values X and Y are present within the cluster,
> both at timestamp T), then we always repair towards X if md5(X) this case, even after an inconsistency on the first i
It's technically possible but 0.6 does not support this, no.
What is the use case you are thinking of?
On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu wrote:
> Hi,
>
> I've been trying to use Cassandra for some kind of a supplementary input
> source for Hadoop MapReduce jobs.
>
> The default us
most people use sun jdk or openjdk. for those you want u19 or u20.
On Thu, Apr 29, 2010 at 2:09 PM, Lee Parker wrote:
> Is there a preference as to which JRE is used for cassandra?
>
> Lee Parker
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professi
I'm currently writing collected data continuously to Cassandra, having keys
starting with a timestamp and a unique identifier (like
2009.01.01.00.00.00.RANDOM) for being able to query in time ranges.
I'm thinking of running periodical mapreduce jobs which will go through a
designated time period.
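The key scheme described above can be sketched as follows (UTC timestamps and a hex UUID suffix are my guesses at the "RANDOM" part):

```python
# Keys of the form 2009.01.01.00.00.00.RANDOM: a timestamp prefix makes rows
# sort by time under an order-preserving partitioner, and the random suffix
# keeps keys unique, so a periodic job can select a window by key prefix.

import time
import uuid

def make_key(epoch_seconds):
    stamp = time.strftime("%Y.%m.%d.%H.%M.%S", time.gmtime(epoch_seconds))
    return "%s.%s" % (stamp, uuid.uuid4().hex)

key = make_key(1230768000)   # 2009-01-01 00:00:00 UTC
assert key.startswith("2009.01.01.00.00.00.")
```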
Ok, I reproduced without mapred. Here is my recipe:
On a single-node cassandra cluster with basic config (-Xmx:1G)
loop {
* insert 5,000 records in a single columnfamily with UUID keys and
random string values (between 1 and 1000 chars) in 5 different columns
spanning two different supercolumn
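The data-generation half of that loop can be sketched as follows (the actual Thrift insert call is omitted; record counts and value sizes follow the recipe above):

```python
# Generate one pass of the recipe's workload: records with UUID keys and
# random string values between 1 and 1000 chars in 5 columns each.

import random
import string
import uuid

def make_batch(n=5000, columns=5):
    batch = {}
    for _ in range(n):
        key = str(uuid.uuid4())
        batch[key] = {
            "col%d" % i: "".join(random.choice(string.ascii_letters)
                                 for _ in range(random.randint(1, 1000)))
            for i in range(columns)
        }
    return batch

batch = make_batch(n=10)   # small n for illustration
assert len(batch) == 10
assert all(len(cols) == 5 for cols in batch.values())
```

Looping this against a 1G-heap node is what drives the OutOfMemory behavior being discussed.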
On Thu, 2010-04-29 at 14:09 -0500, Lee Parker wrote:
> Is there a preference as to which JRE is used for cassandra?
There are people using both. To the best of my knowledge, there's never
been any evidence that one is a better choice for Cassandra than
another.
--
Eric Evans
eev...@rackspace.com
All,
Does anyone know of a program (series of classes) that can capture the key
distribution of the rows in a ColumnFamily, sort of a [sub] string-histogram.
Thanks,
Carlos
MD5 is not a perfect hash; it can produce collisions. How are these dealt with?
Is there a size appended to them?
If 2 keys collide, would that result in a merging of data (if the column names
aren't the same) or an overwrite if they were?
When making rough calculations regarding the potential size of a single row,
what sort of overhead is there to consider? In other words, for a particular
column, what else is there to consider in terms of memory consumption besides
the value itself?
On Apr 29, 2010, at 8:49 AM, Mark Jones wrot