Re: Direct control over where data is stored?

2011-06-05 Thread Watanabe Maki
You can find out which endpoints Cassandra will store your key on with getNaturalEndpoints, but you can't specify the endpoint you want to use with this API. The partitioner decides which key goes to which node. With OPP, you may be able to predict which key range will be stored on a node, so you can c
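To illustrate the point that the partitioner, not the client, decides placement, here is a minimal sketch of a token ring. It is a toy model and not Cassandra's code: the MD5-based `token_for`, the `natural_endpoints` helper, the node names, and the evenly spaced tokens are all assumptions made for the example.

```python
import bisect
import hashlib


def token_for(key: bytes) -> int:
    """Hash a key to a ring position (toy stand-in for a hashing partitioner)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


def natural_endpoints(key: bytes, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """Return the rf nodes that would hold the key: the first node whose token
    is >= the key's token, then the following nodes clockwise around the ring."""
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    # Wrap around to the first node when the key's token is past the last token.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical three-node ring with evenly spaced tokens.
RING_SIZE = 2 ** 128
ring = [(i * RING_SIZE // 3, f"node{i + 1}") for i in range(3)]
print(natural_endpoints(b"LastNameColumn-A", ring, rf=2))
```

The caller can ask where a key lands, but the only way to change the answer in this model is to change the key or the ring, which mirrors the limitation described above.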

Re: [RELEASE] 0.8.0

2011-06-05 Thread Terje Marthinussen
0.8 under load may turn out to be more stable and well behaved than any release so far. Been doing a few test runs stuffing more than 1 billion records into a 12 node cluster and things look better than ever. VMs stable and nice at 11GB. No data corruptions, dead nodes, full GCs or any of the ot

Re: Direct control over where data is stored?

2011-06-05 Thread Khanh Nguyen
On Sun, Jun 5, 2011 at 11:26 PM, Maki Watanabe wrote: > getNaturalEndpoints tells you which key will be stored on which nodes, > but we can't force cassandra to store given key to specific nodes. > > maki I'm confused. Didn't you mention previously that I can use OrderPreservingPartitioner to sto

Re: slow insertion rate with secondary index

2011-06-05 Thread Jonathan Ellis
Index updates require read-before-write (to find out what the prior version was, if any, and update the index accordingly). This is random i/o. Index creation on the other hand is a lot of sequential i/o, hence more efficient. So, the classic bulk load advice to ingest data prior to creating ind
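A minimal in-memory sketch of the difference described above, assuming a toy table with a single indexed column. The `IndexedTable` class and `build_index` function are hypothetical names for illustration and do not reflect Cassandra's internals.

```python
class IndexedTable:
    """Toy table with one secondary index (column value -> set of row keys)."""

    def __init__(self):
        self.rows = {}    # row key -> indexed column value
        self.index = {}   # indexed column value -> set of row keys
        self.reads = 0    # number of read-before-write lookups performed

    def insert(self, key, value):
        # Read-before-write: fetch the prior value so its stale index entry can be removed.
        self.reads += 1
        old = self.rows.get(key)
        if old is not None:
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.rows[key] = value
        self.index.setdefault(value, set()).add(key)

    def lookup(self, value):
        return sorted(self.index.get(value, set()))


def build_index(rows):
    """Build the index in one pass over existing rows, with no per-row lookups."""
    index = {}
    for key, value in rows.items():
        index.setdefault(value, set()).add(key)
    return index


table = IndexedTable()
table.insert("r1", "red")
table.insert("r1", "blue")   # the prior value "red" must be read and un-indexed
print(table.lookup("red"), table.lookup("blue"), table.reads)
print(build_index({"a": 1, "b": 1, "c": 2}))
```

In the toy model every `insert` pays for one lookup, while `build_index` only scans; on disk the lookup is the random read and the scan is the sequential one.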

Re: Direct control over where data is stored?

2011-06-05 Thread Maki Watanabe
getNaturalEndpoints tells you which key will be stored on which nodes, but we can't force cassandra to store given key to specific nodes. maki 2011/6/6 mcasandra : > > Khanh Nguyen wrote: >> >> Is there a way to tell where a piece of data is stored in a cluster? >> For example, can I tell if Last

Re: Direct control over where data is stored?

2011-06-05 Thread Watanabe Maki
It may not be what you want, but please read about Network Topology Strategy and DC_QUORUM. http://www.datastax.com/dev/blog/deploying-cassandra-across-multiple-data-centers You can configure your Cassandra cluster to be "data center aware". Your reads and writes will be resolved within the local DC, but will be replica

Re: CQL How to do

2011-06-05 Thread Jeffrey Kesselman
Fair enough. I do have to keep reminding myself that a REST interface requires text. And it does make more sense, at least, when coming from a human as opposed to when you make a computer spend cycles converting binary to text just so another computer can spend cycles turning it back again. On Su

Re: Direct control over where data is stored?

2011-06-05 Thread mcasandra
Khanh Nguyen wrote: > > Is there a way to tell where a piece of data is stored in a cluster? > For example, can I tell if LastNameColumn['A'] is stored at node 1 in > the ring? > I have not used it, but you can see getNaturalEndpoints in JMX. It will tell you which nodes are responsible for a gi

Re: how to know there are some columns in a row

2011-06-05 Thread aaron morton
You can create columns without values. Are you talking about reading them back through the API ? I would suggest looking at your data model to see if there is a better way to support your read patterns. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://

Re: CQL How to do

2011-06-05 Thread aaron morton
From what I've seen of CQL there is no comparison between the potential complexity of a CQL statement and that of a SQL statement. IMHO CQL is more or less a human readable form of the current API, it does not add features. SQL statements are arbitrarily complex and may generate many possible qu

Re: problems with many columns on a row

2011-06-05 Thread aaron morton
Oops, I misread "150 GB" in one of your earlier emails as "150 MB", so forget what I said before. You have loads of free space :) How many files do you have in your data directory? If it's 1 then that log message was a small bug that has been fixed. Cheers - Aaron Morton Freel

slow insertion rate with secondary index

2011-06-05 Thread Donal Zang
I did an insertion test with and without secondary indexes, and found that: Without secondary index: ~10864 rows inserted per second. With secondary index on one column (BytesType): ~1515 rows inserted per second. Is this normal? Why would a secondary index have so much effect? I noticed that if I bu

Re: how to know there are some columns in a row

2011-06-05 Thread Patrick de Torcy
It would definitely be useful to be able to get column (or super column) names WITHOUT their values. If the values are pretty big or if there are a lot of columns, that would generate traffic that is not necessarily needed (if in the end you are just interested in some of the columns). Moreover it doesn't seem

Re: Paging Columns from a Row

2011-06-05 Thread Joseph Stein
So I can have one PagedIndex CF that holds a row for each data file I am processing. That row (in my example) would have X columns, and I can make those columns' values be 100 strings that represent keys in another PagedData CF. This other PagedData CF for each row would have 10,000

Re: Paging Columns from a Row

2011-06-05 Thread Jonathan Ellis
If you need to parallelize (and scale) you need to distribute across multiple rows. One Big Row means all your 100 workers are hammering the same 3 (for instance) replicas at the same time. On Sun, Jun 5, 2011 at 1:43 PM, Joseph Stein wrote: > What is the best practices here to page and slice col
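One way to spread a single wide row across multiple rows is to derive the row key from a bucket number. The sketch below assumes a hypothetical `base:bucket` key convention; the helper names and the key format are illustrative and not part of any Cassandra API.

```python
def bucket_row_key(base_key: str, column_index: int, columns_per_row: int) -> str:
    """Row key for the bucket that holds the column at column_index."""
    return f"{base_key}:{column_index // columns_per_row}"


def all_bucket_keys(base_key: str, total_columns: int, columns_per_row: int) -> list[str]:
    """Every row key needed to cover total_columns, one per worker."""
    bucket_count = -(-total_columns // columns_per_row)  # ceiling division
    return [f"{base_key}:{bucket}" for bucket in range(bucket_count)]


# 1,000,000 columns split into rows of 10,000 columns gives 100 row keys,
# so each of 100 workers can read a different row.
keys = all_bucket_keys("datafile-1", 1_000_000, 10_000)
print(len(keys), keys[0], keys[-1])
print(bucket_row_key("datafile-1", 25_000, 10_000))
```

Because each bucket is its own row key, the partitioner is free to place the buckets on different replicas instead of sending every worker to the same few nodes.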

Re: How to delete UUIDs from the CLI?

2011-06-05 Thread Jonathan Ellis
You're going to need to get a lot more specific. On Sun, Jun 5, 2011 at 12:12 PM, Kevin wrote: > Jonathan, I've upgraded to 0.8.0 and the problem got worse. Now, I can't > delete any rows from the CLI, regardless of the type they're stored as. > > > > -Original Message- > From: Jonathan E

Re: Direct control over where data is stored?

2011-06-05 Thread Khanh Nguyen
On Sun, Jun 5, 2011 at 2:17 PM, mcasandra wrote: > Please give more detailed info about what exactly you are worried about or > trying to solve. In general, we are trying to devise a partitioning and replication scheme that takes into account social relations between data. > Please take a step b

Re: Direct control over where data is stored?

2011-06-05 Thread Khanh Nguyen
Great. Thank you, Eric. -k On Sun, Jun 5, 2011 at 2:13 PM, Eric tamme wrote: > On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen > wrote: >> Hi Maki and Adrian, >> >> Thank you very much for the promptness. It's weekend after all :). >> >> I realized I forgot a part of my question until Adrian men

Paging Columns from a Row

2011-06-05 Thread Joseph Stein
What are the best practices here for paging and slicing columns from a row? So let's say I have 1,000,000 columns in a row. I read the row but want to have 1 thread read columns 0 - , second thread (actor in my case) 1 - 1 ... and so on so I can have 100 workers processing 10,000 columns for

Re: Direct control over where data is stored?

2011-06-05 Thread mcasandra
Please give more detailed info about what exactly you are worried about or trying to solve. Please take a step back and look at cassandra's architecture again and what it's trying to solve. It's a distributed database so if you do what you are describing there is a potential of getting hotspots. W

Re: Direct control over where data is stored?

2011-06-05 Thread Eric tamme
On Sun, Jun 5, 2011 at 12:18 PM, Khanh Nguyen wrote: > Hi Maki and Adrian, > > Thank you very much for the promptness. It's weekend after all :). > > I realized I forgot a part of my question until Adrian mentioned the > replication factor. Is it also possible to set where the replicas are > store

RE: How to delete UUIDs from the CLI?

2011-06-05 Thread Kevin
Jonathan, I've upgraded to 0.8.0 and the problem got worse. Now, I can't delete any rows from the CLI, regardless of the type they're stored as. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Sunday, June 05, 2011 10:56 AM To: user@cassandra.apache.org Subject:

Re: Direct control over where data is stored?

2011-06-05 Thread Khanh Nguyen
Hi Maki and Adrian, Thank you very much for the promptness. It's weekend after all :). I realized I forgot a part of my question until Adrian mentioned the replication factor. Is it also possible to set where the replicas are stored as well? Thanks. This is a research experiment we're exploring

Re: CQL How to do

2011-06-05 Thread Eric Evans
On Sun, 2011-06-05 at 00:51 -0400, Jeffrey Kesselman wrote: > Is CQL really the path for the future for Cassandra? CQL is no more or less "official" than the Thrift interface, and TTBMK, there is no secret cabal that met to decide it would be The Way. People will use what works best for them, and

Re: CQL/JDBC: Cannot locate cassandra.yaml

2011-06-05 Thread Jonathan Ellis
On Sun, Jun 5, 2011 at 9:38 AM, Timo Nentwig wrote: > Hmm, worked-around that by setting -Dcassandra.config (hmm, the client needs > the server's config...?). Yes, this is fixed for 0.8.1. > Not very verbose :-\ May have something to do with my l/p being just "/" for > AllowAll. Correct, that's

Re: How to delete UUIDs from the CLI?

2011-06-05 Thread Jonathan Ellis
If you're not using 0.8.0 the cli deals poorly with non-string row keys. On Sat, Jun 4, 2011 at 7:48 PM, Kevin wrote: > Currently I'm using a client (Pelops) to insert UUIDs (both lexical and > time) in to Cassandra. I haven't yet implemented a facility to remove them > with Pelops; i'm testing a

Re: Troubleshooting IO performance ?

2011-06-05 Thread Jonathan Ellis
You may be swapping. http://spyced.blogspot.com/2010/01/linux-performance-basics.html explains how to check this as well as how to see what threads are busy in the Java process. On Sat, Jun 4, 2011 at 5:34 PM, Philippe wrote: > Hello, > I am evaluating using cassandra and I'm running into some s

Re: CQL/JDBC: Cannot locate cassandra.yaml

2011-06-05 Thread Timo Nentwig
On 6/5/11 16:26, Timo Nentwig wrote: $ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks 2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection - Connected to localhost:9160 2011-06-05

CQL/JDBC: Cannot locate cassandra.yaml

2011-06-05 Thread Timo Nentwig
$ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks 2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection - Connected to localhost:9160 2011-06-05 16:21:54,517 ERROR [main] org.apache

Re: When should I use Solandra?

2011-06-05 Thread Jean-Nicolas Boulay Desjardins
Perfect thanks! On Sun, Jun 5, 2011 at 4:43 AM, Victor Kabdebon wrote: > Again I don't really know the specifics of Solandra but in Solr (so > Solandra being a cousin of Solr it should be true too) you have XML fields > like this : > > Just turn indexed to false and it's not going to be indexed.

Re: problems with many columns on a row

2011-06-05 Thread Mario Micklisch
I found a patch for the php extension here: https://issues.apache.org/jira/browse/THRIFT-1067 … this seemed to fix the issue. Thank you Jonathan and Aaron for taking time to provide me with some help! Regarding the compaction I would still love to hear your feedback on how to configure Cassandra

Re: problems with many columns on a row

2011-06-05 Thread Mario Micklisch
I tracked down the timestamp submission and everything was fine within the PHP libraries. The Thrift PHP extension, however, seems to have an overflow, because it was now setting timestamps with negative values (-1242277493). I disabled the php extension and as a result I now got correct
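A minimal sketch of how a large timestamp can turn negative, assuming the value is truncated to a signed 32-bit integer somewhere along the way. The actual cause inside the extension is not confirmed here; the `truncate_to_int32` helper and the sample microsecond timestamp are hypothetical.

```python
import struct


def truncate_to_int32(value: int) -> int:
    """Keep only the low 32 bits and reinterpret them as a signed integer."""
    (signed,) = struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))
    return signed


# Hypothetical microsecond timestamp for roughly June 2011 (assumed unit).
timestamp_us = 1307280000 * 1_000_000
truncated = truncate_to_int32(timestamp_us)
print(timestamp_us, truncated)
print(truncated == timestamp_us)  # False: the value does not fit in 32 bits
```

A microsecond timestamp needs well over 32 bits, so any 32-bit step in the path loses the high bits, and the result can land anywhere in the signed 32-bit range, including below zero.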

Re: problems with many columns on a row

2011-06-05 Thread Mario Micklisch
Thanks for the feedback Aaron! The schema of the CF is default, I just defined the name and the rest is default, have a look: Keyspace: TestKS Read Count: 65 Read Latency: 657.8047076923076 ms. Write Count: 10756 Write Latency: 0.03237039791744143 ms. Pending Tasks: 0 Column Family: CFTest SSTa

Re: When should I use Solandra?

2011-06-05 Thread Victor Kabdebon
Again I don't really know the specifics of Solandra but in Solr (so Solandra being a cousin of Solr it should be true too) you have XML fields like this : Just turn indexed to false and it's not going to be indexed... Thrift won't affect Solandra at all. 2011/6/4 Jean-Nicolas Boulay Desjardins