Re: How can I view all data??

2010-04-29 Thread David Boxenhorn
Thanks! Can I do that in the CLI? (If so, I'm not getting it to work) On Thu, Apr 29, 2010 at 7:44 AM, Jonathan Ellis wrote: > use get_range_slices, with a start key of '', and page through it > > On Wed, Apr 28, 2010 at 9:26 AM, David Boxenhorn > wrote: > > Is there a Cassandra Navigator,

Re: Re: compaction slow while sstable>25GB,limitation of the sstablesize?

2010-04-29 Thread casablinca126.com
Hi, Now I start to know what's really happening. The INDEX_INTERVAL (in IndexSummary.java) was set to be 4; so at least 1/4 of the indices are in the heap. For a node with 20M columns, most of the heap is occupied by indices, and of course a poor performance with processing large fi

Re: How do I change the Cluster Name in the CLI?

2010-04-29 Thread David Boxenhorn
Thanks, Brandon! When I started the Cassandra daemon it did (seem) to work, but now that I did what you said (actually, I deleted all the contents of data/) the CLI works too! On Wed, Apr 28, 2010 at 11:41 PM, Brandon Williams wrote: > On Wed, Apr 28, 2010 at 3:17 AM, David Boxenhorn wrote: > >

Re: Regarding Cassandra Scalability

2010-04-29 Thread Schubert Zhang
Yes, it is true. Current Cassandra has many limitations or bad implementations, especially at the storage level. In my opinion, these limitations or bad implementations are just implementation, not the original intention of design. And I also want to give a suggestion/advice to the project leaders, w

Re: Cassandra Java Client

2010-04-29 Thread Schubert Zhang
Thanks! I want to have a detailed study of Hector. On Thu, Apr 29, 2010 at 1:39 PM, Ran Tavory wrote: > Hi Schubert, I'm sorry Hector isn't a good fit for you, so let's see what's > missing for you. > > On Thu, Apr 29, 2010 at 8:22 AM, Schubert Zhang wrote: > >> I found hector is not a good desig

can we have duplicate keys ?

2010-04-29 Thread vineet daniel
Hi, Can anyone please tell me if we can have duplicate keys in a Super Column Family? If not, how can we represent this: Article and Category Mapping clientOne.insert(:ArticleCategory, "12", {"ArticleID" => "123"}) clientOne.insert(:ArticleCategory, "12", {"ArticleID" => "124"})

Re: Problem with JVM? concurrent mode failure

2010-04-29 Thread Daniel Gimenez
Thanks Nate! I tested this parameter before and the result was almost the same. I got an OutOfMemory error. Jonathan, I saw that everything is put together in the trunk version since yesterday. But in that version I'm trying to connect to Keyspace1 with cassandra-cli and I'm getting that error:

Cassandra on Windows network latency

2010-04-29 Thread Viktor Jevdokimov
Hi all, We have installed Cassandra on Windows and found that with any number of Cassandra nodes (single, or 3 node cluster) on Windows Vista or Windows Server 2008, 32 or 64 bit, with any load or number of requests, we have: When client and server are on the same machine, connect/read/write latencie

Re: Cassandra on Windows network latency

2010-04-29 Thread Heath Oderman
I learned the hard way that running py_stress in the src/contrib directory is a great way to test what kind of speeds you are really getting. What tools / client are you using to test to get the 200ms number? stu On Thu, Apr 29, 2010 at 7:12 AM, Viktor Jevdokimov < viktor.jevdoki...@adform.com>

RE: Cassandra on Windows network latency

2010-04-29 Thread Viktor Jevdokimov
Thrift C# sources, thrift generated Cassandra sources, test app built with C#. Simple connect/write/read operations. No pooling or anything else. From: Heath Oderman [mailto:he...@526valley.com] Sent: Thursday, April 29, 2010 2:17 PM To: user@cassandra.apache.org Subject: Re: Cassandra on Windows

RE: What's the best maximum size for a single column?

2010-04-29 Thread Dop Sun
Is there any practical number we can refer to? Like what’s the size (big one) used in single columns in your application? From: uncle mantis [mailto:uncleman...@gmail.com] Sent: Thursday, April 29, 2010 1:57 AM To: user@cassandra.apache.org Subject: Re: What's the best maximum size for a sin

Re: Login failure with SimpleAuthenticator

2010-04-29 Thread Jonathan Ellis
If you're getting an internalerror, you need to check the server logs for the exception that caused it On Wed, Apr 28, 2010 at 6:20 AM, Julio Carlos Barrera Juez wrote: > Hi all! > I am using org.apache.cassandra.auth.SimpleAuthenticator to use > authentication in my cluster with one node (with c

Re: Inserting files to Cassandra timeouts

2010-04-29 Thread Jonathan Ellis
are you seeing memtable flushes and compactions in the log? what does tpstats look like when it's timing out? spending 2000ms on GC every 50s indicates that it's not GC causing your problem. (especially when all of them are ParNew, which are completely non-blocking to other threads) On Wed, Apr

Re: How can I view all data??

2010-04-29 Thread Jonathan Ellis
you can't do range queries from the CLI, no. On Thu, Apr 29, 2010 at 2:07 AM, David Boxenhorn wrote: > Thanks! Can I do that in the CLI? (If so, I'm not getting it to work) > > On Thu, Apr 29, 2010 at 7:44 AM, Jonathan Ellis wrote: >> >> use get_range_slices, with a start key of '', and page

Re: Re: compaction slow while sstable>25GB,limitation of the sstablesize?

2010-04-29 Thread Jonathan Ellis
2010/4/29 casablinca126.com : > Hi, > Now I start to know what's really happening. The INDEX_INTERVAL (in > IndexSummary.java) was set to be 4; so at least 1/4 > of the indices are in the heap. For a node with 20M columns, most of the heap > is occupied by indices, and of course a poor per

Re: Problem with JVM? concurrent mode failure

2010-04-29 Thread Jonathan Ellis
you really shouldn't be using trunk yet, but this is why you are having problems: http://wiki.apache.org/cassandra/FAQ#no_keyspaces On Thu, Apr 29, 2010 at 5:47 AM, Daniel Gimenez wrote: > > Thanks Nate! > I tested this parameter before and the result was almost the same. I got an > OutOfMemory e

Re: Cassandra on Windows network latency

2010-04-29 Thread Carlos Alvarez
Are you using TSocket in the client? If yes, use TBufferedTransport instead. Carlos On 4/29/10, Viktor Jevdokimov wrote: > Thrift C# sources, thrift generated Cassandra sources, test app built with > C#. Simple connect/write/read operations. No pooling or anything else. > > From: Heath Oderman

Re: error during snapshot

2010-04-29 Thread Lee Parker
So, the first time I ran into the issue, I added a 1G swap file and then I was able to snapshot just fine. Then after a few hours, I wasn't able to do snapshots again. So, I added a second swap file of 2G and was now able to snapshot just fine. My reason for adding and removing the 2G as part of

Basic Architecture Question

2010-04-29 Thread David Boxenhorn
We want to store objects in Cassandra. In general, the mapping is quite easy. But for some kinds of objects, we want to be able to read all of them into memory. We want to use random partitioning, which means that we can't do a range query over keys (is this right?). Is there any way to get ALL th

Re: Basic Architecture Question

2010-04-29 Thread Jesse McConnell
apparently there is now range query support for getting all keys using the RP... cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Thu, Apr 29, 2010 at 08:16, David Boxenhorn wrote: > We want to store objects in Cassandra. In general, the mapping is quite > easy. But for some kind

Re: Login failure with SimpleAuthenticator

2010-04-29 Thread roger schildmeijer
Are you sure that your keyspace is named "keyspace", and not "Keyspace1" (default)? / Roger Schildmeijer On Thu, Apr 29, 2010 at 2:47 PM, Jonathan Ellis wrote: > If you're getting an internalerror, you need to check the server logs > for the exception that caused it > > On Wed, Apr 28, 2010

Re: Detailed behavior of insert() operation?

2010-04-29 Thread Roland Hänel
Jonathan, thanks for this pointer. I've now had a look at contrib/mutex. Coming back to my point, the use of Zookeeper within Cassandra for the purpose of then being able to deliver a "unique key generation function" out of Cassandra seems like overkill, in this case the application could use Zooke

TimedOutException when using the ColumnFamilyInputFormat

2010-04-29 Thread Utku Can Topçu
Hey All, I'm trying to run some tests on Cassandra and Hadoop integration. I'm basically following the word count example at https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java using the ColumnFamilyInputFormat. Currently I have one-node cassandra and hadoop setup

Re: Basic Architecture Question

2010-04-29 Thread David Boxenhorn
How do I do that??? On Thu, Apr 29, 2010 at 4:31 PM, Jesse McConnell wrote: > apparently there is now range query support for getting all keys using the > RP... > > cheers, > jesse > > -- > jesse mcconnell > jesse.mcconn...@gmail.com > > > > On Thu, Apr 29, 2010 at 08:16, David Boxenhorn wrote:

Re: TimedOutException when using the ColumnFamilyInputFormat

2010-04-29 Thread Joost Ouwerkerk
The default batch size is 4096, which means that each call to get_range_slices retrieves 4,096 rows. I have found that this causes timeouts when cassandra is under load. Try reducing the batchsize with a call to ConfigHelper.setRangeBatchSize(). This has eliminated the TimedOutExceptions for us.

Re: Basic Architecture Question

2010-04-29 Thread Roger Schildmeijer
take a look at get_range_slices and start with "". then invoke get_range_slices again, but this time use the last key as the start key // Roger Schildmeijer On 29 apr 2010, at 16.28em, David Boxenhorn wrote: > How do I do that??? > > On Thu, Apr 29, 2010 at 4:31 PM, Jesse McConnell > wrote
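
The paging pattern described in this reply can be sketched as follows. This is a minimal illustration, not the Thrift API: `FakeStore` and its `get_range` method are hypothetical in-memory stand-ins for a column family and a `get_range_slices` call, and the sketch assumes the start key is inclusive, so every page after the first begins with a row that was already seen and is skipped.

```python
from typing import Dict, Iterator, List, Tuple

Row = Tuple[str, Dict[str, str]]


class FakeStore:
    """Hypothetical in-memory stand-in for a column family (not a Cassandra client)."""

    def __init__(self, rows: Dict[str, Dict[str, str]]) -> None:
        self._rows = rows
        self._keys = sorted(rows)  # any fixed iteration order works for the sketch

    def get_range(self, start_key: str, count: int) -> List[Row]:
        # '' means "start from the beginning"; the start key is treated as inclusive.
        start = 0 if start_key == "" else self._keys.index(start_key)
        return [(k, self._rows[k]) for k in self._keys[start:start + count]]


def iter_all_rows(store: FakeStore, page_size: int = 3) -> Iterator[Row]:
    """Yield every row by paging: start at '', then reuse the last key seen."""
    if page_size < 2:
        raise ValueError("page_size must be >= 2, or paging would never advance")
    start_key = ""
    first_page = True
    while True:
        page = store.get_range(start_key, page_size)
        # After the first page, the first row repeats the previous page's last row.
        rows = page if first_page else page[1:]
        for row in rows:
            yield row
        if len(page) < page_size:
            return  # a short page means the end of the data was reached
        start_key = page[-1][0]
        first_page = False


if __name__ == "__main__":
    store = FakeStore({f"k{i}": {"col": str(i)} for i in range(7)})
    for key, columns in iter_all_rows(store):
        print(key, columns)
```

The page size must be at least 2 here because a one-row page would return only the already-seen start key and the loop would not make progress.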

Re: Basic Architecture Question

2010-04-29 Thread David Boxenhorn
So now we can do any kind of range queries, not just "for getting all keys" as Jesse said? On Thu, Apr 29, 2010 at 6:04 PM, Roger Schildmeijer wrote: > take a look at get_range_slices and start with "". > then invoke get_range_slices again, but this time use the last key as the > start key > > //

Re: Cassandra data model for financial data

2010-04-29 Thread Andrew Nguyen
What is the upper limit on the number of super columns? Is it pretty much the same as for columns in general? On Apr 28, 2010, at 10:09 PM, Schubert Zhang wrote: > key : stock ID, e.g. AAPL+year > column family: closing price and volume, two CFs. > column name: timestamp LongType > > AAPL+201

RE: What's the best maximum size for a single column?

2010-04-29 Thread Mark Jones
The max size would probably be best determined by looking at the size of your MemTable 64 Read repair is on a per column basis, every column gets a timestamp, and the overhead of a name. So, balance those 3 out and you have a pretty good idea of what to do. From: Dop Sun [mailto:su...@d

RE: Problem with JVM? concurrent mode failure

2010-04-29 Thread Mark Jones
One of your problems here is that the connect uses a daft connection string convention. You would think node:port, but it's actually node/port. Your connection only succeeded because 9160 is the default for port not specified. And the keyspace thing that jbellis mentioned. -Original Message-

RE: Cassandra data model for financial data

2010-04-29 Thread Mark Jones
At the moment they all have to fit in memory during compaction. Columns OR SuperColumns (for one Key). From: Andrew Nguyen [mailto:andrew-lists-cassan...@ucsfcti.org] Sent: Thursday, April 29, 2010 10:30 AM To: user@cassandra.apache.org Subject: Re: Cassandra data model for financial data What

Re: TimedOutException when using the ColumnFamilyInputFormat

2010-04-29 Thread Utku Can Topçu
Hello Jeff, Thank you for your comments, but the problem is not about the RangeBatchSize. In the case of the configuration parameter, mapred.tasktracker.map.tasks.maximum > 1 all the map tasks time out, they don't even run a single line of code in the Mapper.map() function. In the case of the con

ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Utku Can Topçu
Hi, I've been trying to use Cassandra for some kind of a supplementary input source for Hadoop MapReduce jobs. The default usage of the ColumnFamilyInputFormat does a full columnfamily scan for using within the MapReduce framework as map input. However, I believe that it should be possible to gi

RE: OrderPreservingPartitioner limits and workarounds

2010-04-29 Thread Mark Jones
Sounds like you want something like http://oss.oetiker.ch/rrdtool/ Assuming you are trying to store computer log data. Do you have any other data that can spread the data load? Like a machine name? If so, you can use a hash of that value to place that "machine" randomly on the net, then appe

RE: Problem with JVM? concurrent mode failure

2010-04-29 Thread Daniel Gimenez
Thanks Mark! you're right :-) Jonathan, I tested everything with the patch and I had the same OutOfMemoryError after some "Concurrent Mode Failure". Now, I'm trying to distribute the load of Cassandra among 4 servers, maybe if the JVM is more "relaxed" it has enough time to do the GC without pro

openjdk or sun

2010-04-29 Thread Lee Parker
Is there a preference as to which JRE is used for cassandra? Lee Parker

Re: Basic Architecture Question

2010-04-29 Thread Brandon Williams
On Thu, Apr 29, 2010 at 10:19 AM, David Boxenhorn wrote: > So now we can do any kind of range queries, not just "for getting all keys" > as Jesse said? > With RP, the key ranges are based on the MD5 sum of the key, so it's really only useful for getting all keys, or obtaining a semi-random row.
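
A rough sketch of why this is so: if rows are ordered by a hash of the key rather than by the key itself, then a range over that ordering has no relation to the keys' natural order. The `md5_token` helper below is hypothetical and only illustrates the idea with Python's `hashlib`; how Cassandra actually derives a token from the digest is not shown here and may differ.

```python
import hashlib
from typing import List


def md5_token(key: str) -> int:
    """Hypothetical token: the MD5 digest of the key read as one big integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def order_by_token(keys: List[str]) -> List[str]:
    """Return keys in hash order, the order a range scan would walk them."""
    return sorted(keys, key=md5_token)


if __name__ == "__main__":
    # Timestamp-like keys that are adjacent in key order...
    keys = [f"2009.01.01.00.00.{s:02d}" for s in range(5)]
    # ...are generally scattered once ordered by their hash.
    for key in order_by_token(keys):
        print(f"{md5_token(key):039d}  {key}")
```

Walking the whole hash range still visits every key exactly once, which is why a full scan works even though a meaningful key-range query does not.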

Re: can we have duplicate keys ?

2010-04-29 Thread Jonathan Ellis
use dynamic column names. make a CF called Articles, have row key = 12, first column name 123, next column name 124, etc. On Thu, Apr 29, 2010 at 4:40 AM, vineet daniel wrote: > Hi > > Can anyone please tell me if we can have duplicate keys in Super Column > Family, if now how can we represent t
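
The layout suggested in this reply can be modelled with plain dictionaries. This is only a sketch of the data shape, not client code: the names `articles_by_category`, `add_article`, and `articles_in` are hypothetical, and the empty string stands in for whatever column value an application would store.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

# One row per category ID; each column NAME is an article ID.
# The column value is an empty placeholder in this sketch.
articles_by_category: DefaultDict[str, Dict[str, str]] = defaultdict(dict)


def add_article(category_id: str, article_id: str) -> None:
    """Record that an article belongs to a category by adding a column."""
    articles_by_category[category_id][article_id] = ""


def articles_in(category_id: str) -> List[str]:
    """Return the article IDs (column names) stored under one row key."""
    return sorted(articles_by_category.get(category_id, {}))


if __name__ == "__main__":
    add_article("12", "123")
    add_article("12", "124")
    print(articles_in("12"))
```

Because the article ID is the column name, the row key "12" appears only once and no duplicate keys are needed; inserting the same article twice simply rewrites the same column.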

Re: Correct data model for Cassandra

2010-04-29 Thread Jonathan Ellis
the correct data model is one where you can pull the data you want out as a slice of a row, or (sometimes) as a slice of sequential rows. usually this involves writing the same data to multiple columnfamilies at insertion time, so when you do queries you don't need to do joins. On Wed, Apr 28, 201

Re: Detailed behavior of insert() operation?

2010-04-29 Thread Jonathan Ellis
2010/4/29 Roland Hänel : > Imagine the following rule: if we are in doubt whether to repair a column > with timestamp T (because two values X and Y are present within the cluster, > both at timestamp T), then we always repair towards X if md5(X) < md5(Y). In > this case, even after an inconsistency on the first i

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Jonathan Ellis
It's technically possible but 0.6 does not support this, no. What is the use case you are thinking of? On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu wrote: > Hi, > > I've been trying to use Cassandra for some kind of a supplementary input > source for Hadoop MapReduce jobs. > > The default us

Re: openjdk or sun

2010-04-29 Thread Jonathan Ellis
most people use sun jdk or openjdk. for those you want u19 or u20. On Thu, Apr 29, 2010 at 2:09 PM, Lee Parker wrote: > Is there a preference as to which JRE is used for cassandra? > > Lee Parker -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professi

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Utku Can Topçu
I'm currently writing collected data continuously to Cassandra, having keys starting with a timestamp and a unique identifier (like 2009.01.01.00.00.00.RANDOM) for being able to query in time ranges. I'm thinking of running periodical mapreduce jobs which will go through a designated time period.

Re: Cassandra reverting deletes?

2010-04-29 Thread Joost Ouwerkerk
Ok, I reproduced without mapred. Here is my recipe: On a single-node cassandra cluster with basic config (-Xmx:1G) loop { * insert 5,000 records in a single columnfamily with UUID keys and random string values (between 1 and 1000 chars) in 5 different columns spanning two different supercolumn

Re: openjdk or sun

2010-04-29 Thread Eric Evans
On Thu, 2010-04-29 at 14:09 -0500, Lee Parker wrote: > Is there a preference as to which JRE is used for cassandra? There are people using both. To the best of my knowledge, there's never been any evidence that one is a better choice for Cassandra than another. -- Eric Evans eev...@rackspace.com

Key distribution

2010-04-29 Thread Carlos Sanchez
All, Does anyone know of a program (series of classes) that can capture the key distribution of the rows in a ColumnFamily, sort of a [sub] string-histogram. Thanks, Carlos

How does cassandra deal with collisions?

2010-04-29 Thread Mark Jones
MD5 is not a perfect hash; it can produce collisions. How are these dealt with? Is there a size appended to them? If 2 keys collide, would that result in a merging of data (if the column names aren't the same) or an overwrite if they were?

Re: Cassandra data model for financial data

2010-04-29 Thread Andrew Nguyen
When making rough calculations regarding the potential size of a single row, what sort of overhead is there to consider? In other words, for a particular column, what else is there to consider in terms of memory consumption besides the value itself? On Apr 29, 2010, at 8:49 AM, Mark Jones wrot