Hello,
Just wondering if I can get a quick clarification on some simple CQL. We
use Thrift CQL queries to access our Cassandra setup. As clarified in a
previous question I had, when using CQL over Thrift, timestamps on the Cassandra
column data are assigned by the server, not the client, un
Hello,
We've done some additional monitoring, and I think we have more information.
We've been collecting vmstat information every minute, attempting to catch a
node with issues.
So, it appears, that the cassandra node runs fine. Then suddenly, without any
correlation to any event that I c
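A hedged sketch of the kind of once-a-minute check described above: pull the "wa" (I/O wait) column out of a `vmstat` line. The header and sample line below are illustrative, not real output from the node in question.

```python
# Parse the "wa" (I/O wait %) column from captured vmstat output.
# Header and sample line are illustrative, not real node output.
header = "r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa"
sample = "2  0      0  81234  4096  65536    0    0   120   300  500  900  5  3  2 90"

# Locate the "wa" column by name so the parse survives column reordering.
wa_index = header.split().index("wa")
io_wait_pct = int(sample.split()[wa_index])
# io_wait_pct == 90 here: a node in trouble.
```

Logging that one number per minute, with a timestamp, is usually enough to correlate an I/O-wait spike with compaction or GC activity after the fact.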
> What are we doing wrong? Can it be that Cassandra is actually trying to read
> all the CF data rather than just the keys! (actually, it doesn't need to go
> to the users CF at all - all the data it needs is in the index CF)
>
Data is not stored as a B-tree; that's the RDBMS approach. We hit th
> INFO 11:10:56,273 GC for ParNew: 1039 ms for 1 collections, 6631277912 used;
> max is 10630070272
It depends on the settings. It looks like you are using non-default JVM
settings.
I'd recommend restoring the default JVM settings as a start.
Cheers
-
Aaron Morton
Freelance
This discussion belongs on the user list; also, please only email one list at a
time.
The article discusses improvements in secondary indexes in 1.2
http://www.datastax.com/dev/blog/improving-secondary-index-write-performance-in-1-2
If you have some more specific questions let us know.
Cheers
> At first many CF are being created in parallel (about 1000 CF).
>
>
Can you explain this in a bit more detail? By "in parallel" do you mean multiple
threads creating CFs at the same time?
I would also recommend taking a second look at your data model, you probably do
not want to create so m
I forgot to mention,
When things go really bad, I'm seeing I/O waits in the 80-95% range. I
restarted cassandra once when a node is in this situation, and it took 45
minutes to start (primarily reading SSTables). Typically, a node would start
in about 5 minutes.
Thanks,
-Mike
On Apr 28, 2
Try the request tracing in 1.2
(http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2); it may point to the
difference.
> In our model the secondary index is also unique, as the primary key is. Is it
> better, in this case, to create another CF mapping the secondary index to the
> key?
IMHO i
What's your table definition ?
>> select '1228#16857','1228#16866','1228#16875','1237#16544','1237#16553'
>> from myCF where key = 'all';
The output looks correct to me. CQL tables return values, including null, for
all of the selected columns.
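A minimal sketch (with hypothetical column names and data) of the projection behaviour described above: every selected column comes back in the result, with null standing in for the ones the row doesn't actually store.

```python
# Hypothetical sparse wide row: only two of the five selected columns exist.
stored_row = {"1228#16857": "a", "1237#16544": "b"}
selected = ["1228#16857", "1228#16866", "1228#16875",
            "1237#16544", "1237#16553"]

# The result has one value per selected column; None models CQL's null
# for columns the row does not store.
result = {col: stored_row.get(col) for col in selected}
```

So a row of nulls in the output doesn't mean the query failed; it means those particular columns were never written for that key.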
Cheers
-
Aaron Morton
Freelance C
Sounds like something C* would be good at.
I would do some searching on Time Series data in cassandra, such as
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra And
definitely consider storing data at the smallest level of granularity.
On the analytics side there is good ne
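A hedged sketch of the bucketing idea behind the linked time-series article: store events at the finest granularity, but partition them by a coarse time bucket so individual rows stay bounded. The function and key format here are illustrative, not an API from the article.

```python
import datetime

def partition_key(sensor_id: str, ts: datetime.datetime) -> str:
    # One row per sensor per day; columns inside the row carry the
    # raw, finest-granularity timestamps. Day-sized buckets are an
    # assumption; pick a bucket that keeps rows to a sane width.
    return f"{sensor_id}:{ts.strftime('%Y%m%d')}"

ts = datetime.datetime(2013, 4, 28, 11, 10, 56)
key = partition_key("sensor-42", ts)  # "sensor-42:20130428"
```

Coarser rollups (hourly/daily aggregates) can then be derived from the raw rows rather than stored in place of them.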
When internode_compression is enabled, will the compression algorithm used
be the same as whatever I am using for sstable_compression?
- John
> Does anyone know enough of the inner working of Cassandra to tell me how much
> work is needed to patch Cassandra to enable such communication
> vectorization/batch ?
>
Assuming you mean "have the coordinator send multiple row read/write requests
in a single message to replicas"
Pretty sure
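An illustrative sketch of the "vectorization" being asked about: instead of one network message per row, the coordinator groups row requests by replica and sends one batched message per replica. The replica selection below is a toy CRC-based hash, not Cassandra's partitioner.

```python
import zlib
from collections import defaultdict

def group_by_replica(row_keys, replicas):
    # Toy placement: stable CRC32 of the key modulo replica count.
    # Real Cassandra uses its partitioner + replication strategy.
    batches = defaultdict(list)
    for key in row_keys:
        replica = replicas[zlib.crc32(key.encode()) % len(replicas)]
        batches[replica].append(key)
    return batches  # one outbound message per replica, many rows each

batches = group_by_replica(["a", "b", "c", "d"], ["node1", "node2"])
```

The win is fewer round trips and less per-message overhead; the cost is that one slow replica now delays every row in its batch.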
> We're going to try running a shuffle before adding a new node again... maybe
> that will help
I don't think it will hurt, but I doubt it will help.
>> It seems when new nodes join, they are streamed *all* sstables in the
>> cluster.
>
How many nodes did you join, and what was num_tokens?
Did yo
The amount of time/space cassandra-shuffle requires when upgrading to using
vnodes should really be made apparent in the documentation (when some is written).
The only semi-noticeable remark about the exorbitant amount of time is a bullet
point in: http://wiki.apache.org/cassandra/VirtualNodes/Balance
"Shuffling
Hi Mike,
We had issues with the ephemeral drives when we first got started, although
we never got to the bottom of it so I can't help much with troubleshooting
unfortunately. Contrary to a lot of the comments on the mailing list we've
actually had a lot more success with EBS drives (PIOPs!). I'd d
Running these two commands is a no-op IO-wise:
nodetool setcompactionthroughput 0
nodetool setstreamthroughput 0
If trying to recover or rebuild nodes, it would be super helpful to get
more than ~120mbit/s of streaming throughput (per session or ~500mbit
total) and ~5% IO utilization in (8) 15k di
I think there is some confusion about the two different usages of timestamp.
The timestamp stored with the column value (not a column of timestamp type) is
stored at microsecond scale; it's just a 64-bit int, and we do not use it as a
time value. Each mutation in a single request will have a diffe
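A small sketch of the point above: the write timestamp is just a 64-bit integer at microsecond scale, conventionally derived from the wall clock but never interpreted as a time value by the server, only compared for "last write wins".

```python
import time

def write_timestamp() -> int:
    # Microseconds since the epoch as a plain 64-bit int; the server
    # only ever compares these values, it never treats them as times.
    return int(time.time() * 1_000_000)

t1 = write_timestamp()
assert t1 < 2**63  # comfortably fits a signed 64-bit int
```

This is why two columns written in the same request need distinct timestamps if you care which one "wins" on conflict.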
Out of curiosity, why did you decide to set it to 0 rather than 9? Does
any documentation anywhere say that setting it to 0 disables the feature? I
have set streamthroughput higher and seen node join improvements. The
features do work however they are probably not your limiting factor.
Remember fo
It uses Snappy Compression with the default block size.
There may be a case for allowing configuration, for example so the
LZ4Compressor can be used. Feel free to raise a ticket at
https://issues.apache.org/jira/browse/CASSANDRA
Cheers
-
Aaron Morton
Freelance Cassandra Consul
The help command says 0 to disable:
setcompactionthroughput - Set the MB/s throughput cap for
compaction in the system, or 0 to disable throttling.
setstreamthroughput - Set the MB/s throughput cap for
streaming in the system, or 0 to disable throttling.
I also set both to 1000 and it also
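A toy model of the throttle semantics in the help text above: the setting is a cap in MB/s, where 0 means "no throttling" rather than "zero throughput". The function name is illustrative, not a Cassandra API.

```python
def effective_limit_mb_s(cap: int) -> float:
    # Per the nodetool help text: 0 disables throttling entirely;
    # any positive value is an MB/s ceiling.
    return float("inf") if cap == 0 else float(cap)

assert effective_limit_mb_s(0) == float("inf")  # throttling disabled
assert effective_limit_mb_s(16) == 16.0         # capped at 16 MB/s
```

Which also explains why 0 and a very large value like 1000 can behave identically in practice: once the cap exceeds what disks and network can deliver, it is no longer the limiting factor.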
Can you provide some info on the number of nodes, node load, cluster load etc ?
AFAIK shuffle was not an easy thing to test and does not get much real-world
use, as only some people will run it and they (normally) use it once.
Any info you can provide may help improve the process.
Cheers
-
On Sun, Apr 28, 2013 at 2:19 PM, aaron morton wrote:
> We're going to try running a shuffle before adding a new node again...
>> maybe that will help
>>
> I don't think it will hurt, but I doubt it will help.
>
We had to bail on shuffle since we need to add capacity ASAP and not in 20
days.
>
>It
Yes, that does help.
So, in the link I provided:
http://www.datastax.com/docs/1.0/references/cql/UPDATE
It states:
You can specify these options:
Consistency level
Time-to-live (TTL)
Timestamp for the written columns.
Where timestamp is a link to "Working with dates and times" and mentions th
11 nodes
1 keyspace
256 vnodes per node
upgraded 1.1.9 to 1.2.3 a week ago
These are taken just before starting shuffle (ran repair/cleanup the day
before).
During shuffle disabled all reads/writes to the cluster.
nodetool status keyspace:
Load Tokens Owns (effective) Host ID
80.95 GB