Hello,
Just wondering if I can get a quick clarification on some simple CQL. We
use Thrift CQL queries to access our Cassandra setup. As clarified in a
previous question I had, when using CQL over Thrift, timestamps on the Cassandra
column data are assigned by the server, not the client, un
Hello,
We've done some additional monitoring, and I think we have more information.
We've been collecting vmstat information every minute, attempting to catch a
node with issues.
So, it appears, that the cassandra node runs fine. Then suddenly, without any
correlation to any event that I c
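A hedged sketch of the kind of once-a-minute check described above: pull the "wa" (I/O wait) column out of a `vmstat` line. The header and sample line below are illustrative, not real output from the node in question.

```python
# Parse the "wa" (I/O wait %) column from captured vmstat output.
# Header and sample line are illustrative, not real node output.
header = "r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa"
sample = "2  0      0  81234  4096  65536    0    0   120   300  500  900  5  3  2 90"

# Locate the "wa" column by name so the parse survives column reordering.
wa_index = header.split().index("wa")
io_wait_pct = int(sample.split()[wa_index])
# io_wait_pct == 90 here: a node in trouble.
```

Logging that one number per minute, with a timestamp, is usually enough to correlate an I/O-wait spike with compaction or GC activity after the fact.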
> What are we doing wrong? Can it be that Cassandra is actually trying to read
> all the CF data rather than just the keys! (actually, it doesn't need to go
> to the users CF at all - all the data it needs is in the index CF)
>
Data is not stored as a B-tree; that's the RDBMS approach. We hit th
> INFO 11:10:56,273 GC for ParNew: 1039 ms for 1 collections, 6631277912 used;
> max is 10630070272
It depends on the settings. It looks like you are using non-default JVM
settings.
I'd recommend restoring the default JVM settings as a start.
Cheers
-
Aaron Morton
Freelance
This discussion belongs on the user list; also, please only email one list at a
time.
The article discusses improvements in secondary indexes in 1.2
http://www.datastax.com/dev/blog/improving-secondary-index-write-performance-in-1-2
If you have some more specific questions let us know.
Cheers
> At first many CF are being created in parallel (about 1000 CF).
>
>
Can you explain this in a bit more detail? By "in parallel" do you mean multiple
threads creating CFs at the same time?
I would also recommend taking a second look at your data model, you probably do
not want to create so m
I forgot to mention,
When things go really bad, I'm seeing I/O waits in the 80-95% range. I
restarted cassandra once when a node is in this situation, and it took 45
minutes to start (primarily reading SSTables). Typically, a node would start
in about 5 minutes.
Thanks,
-Mike
On Apr 28, 2
Try the request tracing in 1.2
(http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2); it may point to the
difference.
> In our model the secondary index is also unique, as the primary key is. Is it
> better, in this case, to create another CF mapping the secondary index to the
> key?
IMHO i
What's your table definition ?
>> select '1228#16857','1228#16866','1228#16875','1237#16544','1237#16553'
>> from myCF where key = 'all';
The output looks correct to me. CQL tables return values, including null, for
all of the selected columns.
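A minimal sketch (with hypothetical column names and data) of the projection behaviour described above: every selected column comes back in the result, with null standing in for the ones the row doesn't actually store.

```python
# Hypothetical sparse wide row: only two of the five selected columns exist.
stored_row = {"1228#16857": "a", "1237#16544": "b"}
selected = ["1228#16857", "1228#16866", "1228#16875",
            "1237#16544", "1237#16553"]

# The result has one value per selected column; None models CQL's null
# for columns the row does not store.
result = {col: stored_row.get(col) for col in selected}
```

So a row of nulls in the output doesn't mean the query failed; it means those particular columns were never written for that key.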
Cheers
-
Aaron Morton
Freelance C
Sounds like something C* would be good at.
I would do some searching on Time Series data in cassandra, such as
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra And
definitely consider storing data at the smallest level of granularity.
On the analytics side there is good ne
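A hedged sketch of the bucketing idea behind the linked time-series article: store events at the finest granularity, but partition them by a coarse time bucket so individual rows stay bounded. The function and key format here are illustrative, not an API from the article.

```python
import datetime

def partition_key(sensor_id: str, ts: datetime.datetime) -> str:
    # One row per sensor per day; columns inside the row carry the
    # raw, finest-granularity timestamps. Day-sized buckets are an
    # assumption; pick a bucket that keeps rows to a sane width.
    return f"{sensor_id}:{ts.strftime('%Y%m%d')}"

ts = datetime.datetime(2013, 4, 28, 11, 10, 56)
key = partition_key("sensor-42", ts)  # "sensor-42:20130428"
```

Coarser rollups (hourly/daily aggregates) can then be derived from the raw rows rather than stored in place of them.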
When internode_compression is enabled, will the compression algorithm used
be the same as whatever I am using for sstable_compression?
- John
> Does anyone know enough of the inner working of Cassandra to tell me how much
> work is needed to patch Cassandra to enable such communication
> vectorization/batch ?
>
Assuming you mean "have the coordinator send multiple row read/write requests
in a single message to replicas"
Pretty sure
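An illustrative sketch of the "vectorization" being asked about: instead of one network message per row, the coordinator groups row requests by replica and sends one batched message per replica. The replica selection below is a toy CRC-based hash, not Cassandra's partitioner.

```python
import zlib
from collections import defaultdict

def group_by_replica(row_keys, replicas):
    # Toy placement: stable CRC32 of the key modulo replica count.
    # Real Cassandra uses its partitioner + replication strategy.
    batches = defaultdict(list)
    for key in row_keys:
        replica = replicas[zlib.crc32(key.encode()) % len(replicas)]
        batches[replica].append(key)
    return batches  # one outbound message per replica, many rows each

batches = group_by_replica(["a", "b", "c", "d"], ["node1", "node2"])
```

The win is fewer round trips and less per-message overhead; the cost is that one slow replica now delays every row in its batch.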
> We're going to try running a shuffle before adding a new node again... maybe
> that will help
I don't think it will hurt, but I doubt it will help.
>> It seems when new nodes join, they are streamed *all* sstables in the
>> cluster.
>
How many nodes did you join, and what was num_tokens?
Did yo
The amount of time/space cassandra-shuffle requires when upgrading to using
vnodes should really be made apparent in the documentation (when some is written).
The only semi-noticeable remark about the exorbitant amount of time is a bullet
point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance
"Shuffling
Hi Mike,
We had issues with the ephemeral drives when we first got started, although
we never got to the bottom of it so I can't help much with troubleshooting
unfortunately. Contrary to a lot of the comments on the mailing list we've
actually had a lot more success with EBS drives (PIOPs!). I'd d
Running these two commands is a no-op IO-wise:
nodetool setcompactionthroughput 0
nodetool setstreamthroughput 0
If trying to recover or rebuild nodes, it would be super helpful to get
more than ~120mbit/s of streaming throughput (per session or ~500mbit
total) and ~5% IO utilization in (8) 15k di
I think there is some confusion about the two different usages of timestamp.
The timestamp stored with the column value (not a column of timestamp type) is
stored at microsecond scale; it's just a 64-bit int, and we do not use it as a
time value. Each mutation in a single request will have a diffe
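A small sketch of the point above: the write timestamp is just a 64-bit integer at microsecond scale, conventionally derived from the wall clock but never interpreted as a time value by the server, only compared for "last write wins".

```python
import time

def write_timestamp() -> int:
    # Microseconds since the epoch as a plain 64-bit int; the server
    # only ever compares these values, it never treats them as times.
    return int(time.time() * 1_000_000)

t1 = write_timestamp()
assert t1 < 2**63  # comfortably fits a signed 64-bit int
```

This is why two columns written in the same request need distinct timestamps if you care which one "wins" on conflict.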
Out of curiosity, why did you decide to set it to 0 rather than 9? Does
any documentation anywhere say that setting it to 0 disables the feature? I
have set streamthroughput higher and seen node join improvements. The
features do work however they are probably not your limiting factor.
Remember fo
It uses Snappy Compression with the default block size.
There may be a case for allowing configuration, for example so the
LZ4Compressor can be used. Feel free to raise a ticket at
https://issues.apache.org/jira/browse/CASSANDRA
Cheers
-
Aaron Morton
Freelance Cassandra Consul
The help command says 0 to disable:
setcompactionthroughput - Set the MB/s throughput cap for
compaction in the system, or 0 to disable throttling.
setstreamthroughput - Set the MB/s throughput cap for
streaming in the system, or 0 to disable throttling.
I also set both to 1000 and it also
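A toy model of the throttle semantics in the help text above: the setting is a cap in MB/s, where 0 means "no throttling" rather than "zero throughput". The function name is illustrative, not a Cassandra API.

```python
def effective_limit_mb_s(cap: int) -> float:
    # Per the nodetool help text: 0 disables throttling entirely;
    # any positive value is an MB/s ceiling.
    return float("inf") if cap == 0 else float(cap)

assert effective_limit_mb_s(0) == float("inf")  # throttling disabled
assert effective_limit_mb_s(16) == 16.0         # capped at 16 MB/s
```

Which also explains why 0 and a very large value like 1000 can behave identically in practice: once the cap exceeds what disks and network can deliver, it is no longer the limiting factor.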
Can you provide some info on the number of nodes, node load, cluster load etc ?
AFAIK shuffle was not an easy thing to test and does not get much real-world
use, as only some people will run it and they (normally) use it once.
Any info you can provide may help improve the process.
Cheers
-
On Sun, Apr 28, 2013 at 2:19 PM, aaron morton wrote:
> We're going to try running a shuffle before adding a new node again...
>> maybe that will help
>>
> I don't think it will hurt, but I doubt it will help.
>
We had to bail on shuffle since we need to add capacity ASAP and not in 20
days.
>
>It
Yes, that does help.
So, in the link I provided:
http://www.datastax.com/docs/1.0/references/cql/UPDATE
It states:
You can specify these options:
Consistency level
Time-to-live (TTL)
Timestamp for the written columns.
Where timestamp is a link to "Working with dates and times" and mentions th
11 nodes
1 keyspace
256 vnodes per node
upgraded 1.1.9 to 1.2.3 a week ago
These are taken just before starting shuffle (ran repair/cleanup the day
before).
During shuffle disabled all reads/writes to the cluster.
nodetool status keyspace:
Load Tokens Owns (effective) Host ID
80.95 GB