Re: Read efficiency question
In the first case, the partition key is the composite (key1, key2, key3): all three values are hashed together to locate the row, so every query must supply all three with equality. In the second case, the partition key is (key1, key2) and key3 is a clustering column, which means rows are sorted by key3 within each partition and you can do efficient range queries on key3. That is the difference. For a pure lookup supplying all three keys with equality, read performance is essentially the same either way.

regards

On Tue, Dec 27, 2016 at 7:42 AM, Voytek Jarnot wrote:

> Wondering if there's a difference when querying by primary key between the two definitions below:
>
> primary key ((key1, key2, key3))
> primary key ((key1, key2), key3)
>
> In terms of read speed/efficiency... I don't have much of a reason otherwise to prefer one setup over the other, so would prefer the most efficient for querying.
>
> Thanks.

--
http://khangaonkar.blogspot.com/
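To make the distinction concrete, a minimal sketch; the table names and column types are made up for illustration:

    -- All three columns form the composite partition key: every
    -- SELECT must supply key1, key2 AND key3 with equality.
    CREATE TABLE example_a (
        key1 text,
        key2 text,
        key3 timestamp,
        val  text,
        PRIMARY KEY ((key1, key2, key3))
    );

    -- (key1, key2) is the partition key, key3 is a clustering column:
    -- rows are stored sorted by key3 inside each partition.
    CREATE TABLE example_b (
        key1 text,
        key2 text,
        key3 timestamp,
        val  text,
        PRIMARY KEY ((key1, key2), key3)
    );

    -- Valid against example_b only; example_a would reject the range:
    SELECT val FROM example_b
    WHERE key1 = 'a' AND key2 = 'b'
      AND key3 >= '2016-01-01' AND key3 < '2017-01-01';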
Bulk Import Question
I'm following the example here for doing a bulk import into Cassandra: https://github.com/yukim/cassandra-bulkload-example

Is there a way to get the number of rows written to an sstable set created via CQLSSTableWriter, without importing the sstable set into Cassandra? I'd like to do some QA on the converted sstables before importing them.
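Not a definitive answer, but one possible approach: Cassandra 3.x ships an sstabledump tool that renders an sstable as JSON without loading it into a cluster, so you could count rows offline. A rough sketch, assuming a 3.x tools directory, that jq is installed, and an example data file name:

    # Dump the sstable as JSON; the output is an array of partitions,
    # each with a "rows" array.
    tools/bin/sstabledump mc-1-big-Data.db > dump.json

    # Sum the row counts across all partitions.
    jq '[.[].rows | length] | add' dump.json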
Insert with both TTL and timestamp behavior
It appears that, when inserting with "using ttl [foo] and timestamp [bar]", the TTL does not take the provided timestamp into account. In other words, the TTL starts at insert time, not at the time specified by the timestamp.

Similarly, if inserting with just "using timestamp [bar]" and relying on the table's default_time_to_live property, the timestamp is again ignored in terms of TTL expiration.

Seems like a bug to me, but I'm guessing this is intended behavior?

Use-case is importing data (some of it historical) and setting the timestamp manually (based on a timestamp within the data itself). Anyone familiar with any work-arounds that don't rely on calculating a TTL client-side for each record?
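A quick way to reproduce the behavior described above; the table and values are hypothetical, for illustration only:

    -- Write a row dated 2016-01-01 (microseconds since epoch),
    -- with a 30-day TTL.
    INSERT INTO events (id, val)
    VALUES (1, 'historical')
    USING TTL 2592000 AND TIMESTAMP 1451606400000000;

    -- writetime() reflects the supplied timestamp, but ttl() counts
    -- down from ~2592000 starting at the moment of the insert: the
    -- row does not expire early despite the old writetime.
    SELECT writetime(val), ttl(val) FROM events WHERE id = 1;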
Re: Openstack and Cassandra
Kilo is a bit old, but the good news is that CPU pinning is available, which IMHO is a must to run C* in production. Of course your bottleneck will be shared HDDs.

Best,

Romain

On Tuesday, 27 December 2016 at 10:21, Shalom Sagges wrote:

Hi Romain,

Thanks for the input! We currently use the Kilo release of OpenStack. Are you aware of any known bugs/issues with this release? We definitely defined anti-affinity rules to spread C* across different hosts. (I surely don't want to be woken up at night due to a failed host ;-) ) Regarding Trove, I doubt we'll use it in production any time soon.

Thanks again!

Shalom Sagges | DBA

On Mon, Dec 26, 2016 at 7:37 PM, Romain Hardouin wrote:

Hi Shalom,

I assume you'll use KVM virtualization, so pay attention to your stack at every level:
- Nova: e.g. CPU pinning, NUMA awareness if relevant, etc. Have a look at flavor extra specs.
- libvirt
- KVM
- QEMU

You may also want to set resource quotas on the other OpenStack VMs that will be colocated with the C* VMs. Don't forget to define anti-affinity rules in order to spread your C* VMs across different hosts. Finally, watch the versions of libvirt/KVM/QEMU; some optimizations and bugs are worth knowing about.

Out of curiosity, which OpenStack release are you using? Trove may also be of interest, but its C* support is for testing only.

Best,

Romain
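On the CPU pinning point: in Kilo-era Nova this is typically enabled through flavor extra specs. A minimal sketch, assuming a dedicated flavor for the Cassandra nodes (the flavor name is hypothetical):

    # Pin the guest's vCPUs to dedicated host cores for this flavor.
    nova flavor-key m1.cassandra set hw:cpu_policy=dedicated

The compute hosts themselves also need to be prepared for pinning (e.g. vcpu_pin_set in nova.conf), and pinned instances are usually confined to their own host aggregate so they don't mix with floating-CPU workloads.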
Re: Insert with both TTL and timestamp behavior
The purpose of timestamps is to guarantee that out-of-order conflicting writes are resolved as last-write-wins. Cassandra doesn't really expect you to be writing timestamps with wide variations from record to record. Indeed, if you're doing this, it'll violate some of the assumptions in places such as time windowed / date tiered compaction. It's possible to dodge those landmines, but it would be hard to know if you got it wrong.

I think in general timestamp manipulation is *caveat utilitor*. It's not clear to me why for your use case you would want to manipulate the timestamps as you're loading the records unless you're concerned about conflicting writes getting applied in the correct order.

Probably worth a footnote in the documentation indicating that if you're doing both USING TTL and USING TIMESTAMP, those don't relate to each other. At rest, TTL'd records get written with an expiration timestamp, not a delta from the writetime.

On Wed, Dec 28, 2016 at 9:38 AM Voytek Jarnot wrote:

> It appears as though, when inserting with "using ttl [foo] and timestamp [bar]" that the TTL does not take the provided timestamp into account.
>
> In other words, the TTL starts at insert time, not at the time specified by the timestamp.
>
> Similarly, if inserting with just "using timestamp [bar]" and relying on the table's default_time_to_live property, the timestamp is again ignored in terms of TTL expiration.
>
> Seems like a bug to me, but I'm guessing this is intended behavior?
>
> Use-case is importing data (some of it historical) and setting the timestamp manually (based on a timestamp within the data itself). Anyone familiar with any work-arounds that don't rely on calculating a TTL client-side for each record?
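To illustrate the last-write-wins point (hypothetical table, trivial timestamps chosen for clarity):

    -- Applied first, but carries the HIGHER timestamp...
    INSERT INTO kv (id, val) VALUES (1, 'new') USING TIMESTAMP 2000;
    -- ...so this later-applied write with the LOWER timestamp loses.
    INSERT INTO kv (id, val) VALUES (1, 'old') USING TIMESTAMP 1000;

    -- Returns 'new': the higher timestamp wins, not the last write applied.
    SELECT val FROM kv WHERE id = 1;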
Re: Insert with both TTL and timestamp behavior
>It's not clear to me why for your use case you would want to manipulate the timestamps as you're loading the records unless you're concerned about conflicting writes getting applied in the correct order.

Simple use-case: want to load historical data, want to use TWCS, want to use TTL.

Scenario:
- Importing data using the standard write path (inserts)
- Using timestamp to give TWCS something to work with (import records contain a created-on timestamp from which I populate "using timestamp")
- Need records to expire according to TTL
- Don't want to calculate TTL for every insert individually (obviously what I want and what I get differ)
- I'm importing in chrono order, so TWCS should be able to keep things from getting out of hand

>I think in general timestamp manipulation is *caveat utilitor*.

Yeah; although I'd probably choose stronger words. TWCS (and perhaps DTCS?) appears to treat writetimes as timestamps; the rest of Cassandra appears to treat them as integers.

On Wed, Dec 28, 2016 at 2:50 PM, Eric Stevens wrote:

> The purpose of timestamps is to guarantee out-of-order conflicting writes are resolved as last-write-wins. Cassandra doesn't really expect you to be writing timestamps with wide variations from record to record. Indeed, if you're doing this, it'll violate some of the assumptions in places such as time windowed / date tiered compaction. It's possible to dodge those landmines but it would be hard to know if you got it wrong.
>
> I think in general timestamp manipulation is *caveat utilitor*. It's not clear to me why for your use case you would want to manipulate the timestamps as you're loading the records unless you're concerned about conflicting writes getting applied in the correct order.
>
> Probably worth a footnote in the documentation indicating that if you're doing both USING TTL and USING TIMESTAMP that those don't relate to each other. At rest TTL'd records get written with an expiration timestamp, not a delta from the writetime.
>
> On Wed, Dec 28, 2016 at 9:38 AM Voytek Jarnot wrote:
>
>> It appears as though, when inserting with "using ttl [foo] and timestamp [bar]" that the TTL does not take the provided timestamp into account.
>>
>> In other words, the TTL starts at insert time, not at the time specified by the timestamp.
>>
>> Similarly, if inserting with just "using timestamp [bar]" and relying on the table's default_time_to_live property, the timestamp is again ignored in terms of TTL expiration.
>>
>> Seems like a bug to me, but I'm guessing this is intended behavior?
>>
>> Use-case is importing data (some of it historical) and setting the timestamp manually (based on a timestamp within the data itself). Anyone familiar with any work-arounds that don't rely on calculating a TTL client-side for each record?
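For reference, a sketch of the client-side workaround being avoided above; the events table and all numbers are hypothetical, and the effective TTL would have to be computed per record by the importer:

    -- Target retention: 1 year (31536000 s) from each record's
    -- created-on time. For a record created 2016-07-01 and imported
    -- 2016-12-28 (180 days later), the importer would compute
    -- 31536000 - 15552000 = 15984000 s remaining, then issue:
    INSERT INTO events (id, val)
    VALUES (42, 'historical')
    USING TTL 15984000 AND TIMESTAMP 1467331200000000;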
weird jvm metrics
Hi there - I recently upgraded from Cassandra 3.5 to 3.9 (DDC), and I noticed that the "new" jvm metrics are reporting with an extra '.' character in them. Here is a snippet of what I see from one of my nodes:

ubuntu@ip-10-0-2-163:~$ sudo tcpdump -i eth0 -v dst port 2003 -A | grep 'jvm'
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.capacity 762371494 1482960946
pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.count 3054 1482960946
pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.used 762371496 1482960946
pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.capacity 515226631134 1482960946
pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.count 45572 1482960946
pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.used 515319762610 1482960946
pi.cassandra.us-east-1.cassy-node1.jvm.fd.usage 0.00 1482960946

My metrics.yaml looks like this:

graphite:
  - period: 60
    timeunit: 'SECONDS'
    prefix: 'pi.cassandra.us-east-1.cassy-node1'
    hosts:
      - host: '#RELAY_HOST#'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.+"
        - "^jvm.+"
        - "^java.lang.+"

All the org.* metrics come through fine, and the jvm.fd.usage metric strangely comes through fine, too. The rest of the jvm.* metrics have this extra '.' character that causes them to not show up in graphite. Am I missing something silly here? Appreciate any help or suggestions.

- Mike
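The tcpdump output shows the doubled dot is in the metric name as emitted on the wire, so it isn't a graphite-side parsing problem. Not a fix for the reporter itself, but one possible stopgap, assuming #RELAY_HOST# is (or can be) a carbon-aggregator: carbon's rewrite-rules.conf applies regex rewrites to metric names before storage, so a rule collapsing consecutive dots could clean these up. A sketch (untested; format per carbon's rewrite-rules documentation):

    [pre]
    \.\. = .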
Re: Insert with both TTL and timestamp behavior
Indeed, the TTL is computed based on the LOCAL timestamp of the server and not based on the timestamp PROVIDED by the client (according to Mastering Apache Cassandra, 2nd edition, Nishant Neeraj, Packt Publishing).

On Wed, Dec 28, 2016 at 10:15 PM, Voytek Jarnot wrote:

> >It's not clear to me why for your use case you would want to manipulate the timestamps as you're loading the records unless you're concerned about conflicting writes getting applied in the correct order.
>
> Simple use-case: want to load historical data, want to use TWCS, want to use TTL.
>
> Scenario:
> - Importing data using the standard write path (inserts)
> - Using timestamp to give TWCS something to work with (import records contain a created-on timestamp from which I populate "using timestamp")
> - Need records to expire according to TTL
> - Don't want to calculate TTL for every insert individually (obviously what I want and what I get differ)
> - I'm importing in chrono order, so TWCS should be able to keep things from getting out of hand
>
> >I think in general timestamp manipulation is *caveat utilitor*.
>
> Yeah; although I'd probably choose stronger words. TWCS (and perhaps DTCS?) appears to treat writetimes as timestamps; the rest of Cassandra appears to treat them as integers.
>
> On Wed, Dec 28, 2016 at 2:50 PM, Eric Stevens wrote:
>
>> The purpose of timestamps is to guarantee out-of-order conflicting writes are resolved as last-write-wins. Cassandra doesn't really expect you to be writing timestamps with wide variations from record to record. Indeed, if you're doing this, it'll violate some of the assumptions in places such as time windowed / date tiered compaction. It's possible to dodge those landmines but it would be hard to know if you got it wrong.
>>
>> I think in general timestamp manipulation is *caveat utilitor*. It's not clear to me why for your use case you would want to manipulate the timestamps as you're loading the records unless you're concerned about conflicting writes getting applied in the correct order.
>>
>> Probably worth a footnote in the documentation indicating that if you're doing both USING TTL and USING TIMESTAMP that those don't relate to each other. At rest TTL'd records get written with an expiration timestamp, not a delta from the writetime.
>>
>> On Wed, Dec 28, 2016 at 9:38 AM Voytek Jarnot wrote:
>>
>>> It appears as though, when inserting with "using ttl [foo] and timestamp [bar]" that the TTL does not take the provided timestamp into account.
>>>
>>> In other words, the TTL starts at insert time, not at the time specified by the timestamp.
>>>
>>> Similarly, if inserting with just "using timestamp [bar]" and relying on the table's default_time_to_live property, the timestamp is again ignored in terms of TTL expiration.
>>>
>>> Seems like a bug to me, but I'm guessing this is intended behavior?
>>>
>>> Use-case is importing data (some of it historical) and setting the timestamp manually (based on a timestamp within the data itself). Anyone familiar with any work-arounds that don't rely on calculating a TTL client-side for each record?
Growing Hints
Hello All,

We have an unusual issue on our cluster: the hints table keeps growing on one node even though all nodes are up and showing as online in nodetool status. I know Cassandra stores hints when a write to another node times out, but in our case all nodes are up and functional, gossip is flowing well, and the write timeout value in our cluster is quite high. Can anyone suggest other possible reasons for these growing hints?
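Not an answer, but a couple of hedged starting points for narrowing it down, assuming a pre-3.0 cluster where hints live in the system.hints table (in 3.0+ they are flat files under the hints directory):

    -- Which replicas are hints piling up for? target_id is the host ID
    -- of the destination node; match it against the Host ID column of
    -- 'nodetool status'.
    SELECT DISTINCT target_id FROM system.hints;

Hints are written whenever a replica fails to acknowledge within write_request_timeout_in_ms, so brief GC pauses or dropped mutations on an apparently-up node can still generate them; the dropped-message counters in 'nodetool tpstats' on the other nodes may be worth checking.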