Re: Read efficiency question

2016-12-28 Thread Manoj Khangaonkar
In the first case, partitioning is based on (key1, key2, key3): all three
columns together form the partition key.

In the second case, partitioning is based on (key1, key2), and key3 is a
clustering key. This means that within a partition you can do range
queries on key3 efficiently. That is the difference.
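
For example (a sketch; the table layout, names and types are illustrative):

CREATE TABLE t1 (key1 text, key2 text, key3 int, val text,
  PRIMARY KEY ((key1, key2, key3)));  -- all three columns form the partition key

CREATE TABLE t2 (key1 text, key2 text, key3 int, val text,
  PRIMARY KEY ((key1, key2), key3));  -- key3 is a clustering column

-- Efficient on t2 only: a range slice within one (key1, key2) partition
SELECT * FROM t2 WHERE key1 = 'a' AND key2 = 'b' AND key3 >= 10 AND key3 < 20;

-- On t1, an efficient read must supply all three keys exactly
SELECT * FROM t1 WHERE key1 = 'a' AND key2 = 'b' AND key3 = 10;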

regards

On Tue, Dec 27, 2016 at 7:42 AM, Voytek Jarnot wrote:

> Wondering if there's a difference when querying by primary key between the
> two definitions below:
>
> primary key ((key1, key2, key3))
> primary key ((key1, key2), key3)
>
> In terms of read speed/efficiency... I don't have much of a reason
> otherwise to prefer one setup over the other, so would prefer the most
> efficient for querying.
>
> Thanks.
>



-- 
http://khangaonkar.blogspot.com/


Bulk Import Question

2016-12-28 Thread Joe Olson
I'm following the example here for doing a bulk import into Cassandra: 
https://github.com/yukim/cassandra-bulkload-example 

Is there a way to get the number of rows written to an sstable set created
via CQLSSTableWriter, without importing the sstable set into Cassandra?

I'd like to do some QA on the converted sstables I have before importing them 
into Cassandra. 


Insert with both TTL and timestamp behavior

2016-12-28 Thread Voytek Jarnot
It appears that, when inserting with "using ttl [foo] and timestamp
[bar]", the TTL does not take the provided timestamp into account.

In other words, the TTL starts at insert time, not at the time specified by
the timestamp.

Similarly, if inserting with just "using timestamp [bar]" and relying on
the table's default_time_to_live property, the timestamp is again ignored
in terms of TTL expiration.
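
For example (a minimal sketch; the table and values are illustrative):

CREATE TABLE events (id int PRIMARY KEY, val text);

-- A record dated roughly two weeks in the past (timestamp in
-- microseconds), with a 7-day TTL:
INSERT INTO events (id, val) VALUES (1, 'historical')
USING TTL 604800 AND TIMESTAMP 1481500000000000;

-- If the TTL were anchored to the supplied timestamp, this row would
-- already be gone; instead TTL(val) reports nearly the full 604800:
SELECT val, TTL(val), WRITETIME(val) FROM events WHERE id = 1;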

Seems like a bug to me, but I'm guessing this is intended behavior?

Use-case is importing data (some of it historical) and setting the
timestamp manually (based on a timestamp within the data itself). Anyone
familiar with any work-arounds that don't rely on calculating a TTL
client-side for each record?


Re: Openstack and Cassandra

2016-12-28 Thread Romain Hardouin
Kilo is a bit old, but the good news is that CPU pinning is available,
which IMHO is a must to run C* in production. Of course, your bottleneck
will be the shared HDDs.
Best,
Romain 

On Tuesday, 27 December 2016 at 10:21, Shalom Sagges wrote:

Hi Romain,

Thanks for the input!

We currently use the Kilo release of OpenStack. Are you aware of any known
bugs/issues with this release? We definitely defined anti-affinity rules
to spread C* across different hosts. (I surely don't want to be woken up
at night due to a failed host ;-) )

Regarding Trove, I doubt we'll use it in Production any time soon.

Thanks again!



 
On Mon, Dec 26, 2016 at 7:37 PM, Romain Hardouin wrote:

Hi Shalom,

I assume you'll use KVM virtualization, so pay attention to your stack at
every level:
- Nova: e.g. CPU pinning, NUMA awareness if relevant, etc. Have a look at
  the extra specs.
- libvirt
- KVM
- QEMU

You may also be interested in resource quotas for the other OpenStack VMs
that will be colocated with the C* VMs. Don't forget to define
anti-affinity rules in order to spread your C* VMs across different hosts.
Finally, watch the versions of libvirt/KVM/QEMU; some optimizations and
bugs are worth knowing about.

Out of curiosity, which OpenStack release are you using? You may also be
interested in Trove, but its C* support is for testing only.

Best,
Romain



   



Re: Insert with both TTL and timestamp behavior

2016-12-28 Thread Eric Stevens
The purpose of timestamps is to guarantee out-of-order conflicting writes
are resolved as last-write-wins.  Cassandra doesn't really expect you to be
writing timestamps with wide variations from record to record.  Indeed, if
you're doing this, it'll violate some of the assumptions in places such as
time windowed / date tiered compaction.  It's possible to dodge those
landmines but it would be hard to know if you got it wrong.

I think in general timestamp manipulation is *caveat utilitor*.  It's not
clear to me why for your use case you would want to manipulate the
timestamps as you're loading the records unless you're concerned about
conflicting writes getting applied in the correct order.

Probably worth a footnote in the documentation indicating that if you're
doing both USING TTL and USING TIMESTAMP, the two don't relate to each
other. At rest, TTL'd records are written with an expiration timestamp,
not a delta from the writetime.
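
Roughly, in storage-engine terms (a sketch; the numbers are illustrative
epoch seconds):

-- An insert running at local server time t = 1482960000 with
-- USING TTL 604800 AND TIMESTAMP <anything> yields:
--   local expiration time = t + TTL = 1482960000 + 604800 = 1483564800
-- The supplied write timestamp never enters that calculation, so the
-- cell expires 7 days after the insert, whatever timestamp was set.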

On Wed, Dec 28, 2016 at 9:38 AM Voytek Jarnot wrote:

> It appears that, when inserting with "using ttl [foo] and timestamp
> [bar]", the TTL does not take the provided timestamp into account.
>
> In other words, the TTL starts at insert time, not at the time specified
> by the timestamp.
>
> Similarly, if inserting with just "using timestamp [bar]" and relying on
> the table's default_time_to_live property, the timestamp is again ignored
> in terms of TTL expiration.
>
> Seems like a bug to me, but I'm guessing this is intended behavior?
>
> Use-case is importing data (some of it historical) and setting the
> timestamp manually (based on a timestamp within the data itself). Anyone
> familiar with any work-arounds that don't rely on calculating a TTL
> client-side for each record?
>


Re: Insert with both TTL and timestamp behavior

2016-12-28 Thread Voytek Jarnot
> It's not clear to me why for your use case you would want to manipulate
> the timestamps as you're loading the records unless you're concerned about
> conflicting writes getting applied in the correct order.

Simple use-case: want to load historical data, want to use TWCS, want to
use TTL.

Scenario:
Importing data using standard write path (inserts)
Using timestamp to give TWCS something to work with (import records contain
a created-on timestamp from which I populate "using timestamp")
Need records to expire according to TTL
Don't want to calculate TTL for every insert individually (obviously what I
want and what I get differ)
I'm importing in chrono order, so TWCS should be able to keep things from
getting out of hand.

> I think in general timestamp manipulation is *caveat utilitor*.

Yeah; although I'd probably choose stronger words. TWCS (and perhaps DTCS?)
appears to treat writetimes as timestamps; the rest of Cassandra appears to
treat them as integers.


On Wed, Dec 28, 2016 at 2:50 PM, Eric Stevens wrote:

> The purpose of timestamps is to guarantee out-of-order conflicting writes
> are resolved as last-write-wins.  Cassandra doesn't really expect you to be
> writing timestamps with wide variations from record to record.  Indeed, if
> you're doing this, it'll violate some of the assumptions in places such as
> time windowed / date tiered compaction.  It's possible to dodge those
> landmines but it would be hard to know if you got it wrong.
>
> I think in general timestamp manipulation is *caveat utilitor*.  It's not
> clear to me why for your use case you would want to manipulate the
> timestamps as you're loading the records unless you're concerned about
> conflicting writes getting applied in the correct order.
>
> Probably worth a footnote in the documentation indicating that if you're
> doing both USING TTL and USING TIMESTAMP, the two don't relate to each
> other. At rest, TTL'd records are written with an expiration timestamp,
> not a delta from the writetime.


weird jvm metrics

2016-12-28 Thread Mike Torra
Hi There -

I recently upgraded from cassandra 3.5 to 3.9 (DDC), and I noticed that the 
"new" jvm metrics are reporting with an extra '.' character in them. Here is a 
snippet of what I see from one of my nodes:


ubuntu@ip-10-0-2-163:~$ sudo tcpdump -i eth0 -v dst port 2003 -A | grep 'jvm'

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 
bytes

.Je..l>.pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.capacity 
762371494 1482960946

pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.count 3054 1482960946

pi.cassandra.us-east-1.cassy-node1.jvm.buffers..direct.used 762371496 1482960946

pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.capacity 515226631134 
1482960946

pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.count 45572 1482960946

pi.cassandra.us-east-1.cassy-node1.jvm.buffers..mapped.used 515319762610 
1482960946

pi.cassandra.us-east-1.cassy-node1.jvm.fd.usage 0.00 1482960946

My metrics.yaml looks like this:

graphite:
  -
    period: 60
    timeunit: 'SECONDS'
    prefix: 'pi.cassandra.us-east-1.cassy-node1'
    hosts:
      - host: '#RELAY_HOST#'
        port: 2003
    predicate:
      color: "white"
      useQualifiedName: true
      patterns:
        - "^org.+"
        - "^jvm.+"
        - "^java.lang.+"

All the org.* metrics come through fine, and the jvm.fd.usage metric
strangely comes through fine, too. The rest of the jvm.* metrics have this
extra '.' character that causes them not to show up in Graphite.

Am I missing something silly here? Appreciate any help or suggestions.

- Mike





Re: Insert with both TTL and timestamp behavior

2016-12-28 Thread DuyHai Doan
Indeed, the TTL is computed based on the LOCAL timestamp of the server,
not on the timestamp PROVIDED by the client (according to Mastering
Apache Cassandra, 2nd Edition, Nishant Neeraj, Packt Publishing).
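
So the only work-around seems to be the client-side one already mentioned:
shrink each record's TTL by its age. A minimal sketch, assuming an
illustrative "events" table and a loader that binds the values:

-- Per record, the loading client computes the remaining TTL itself:
--   remaining_ttl = desired_ttl - (now - record_created_on)  [seconds]
-- and skips records where remaining_ttl <= 0 (already expired).
INSERT INTO events (id, val) VALUES (?, ?)
USING TTL ? AND TIMESTAMP ?;  -- bind remaining_ttl and created_on (microseconds)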

On Wed, Dec 28, 2016 at 10:15 PM, Voytek Jarnot wrote:

> > It's not clear to me why for your use case you would want to manipulate
> > the timestamps as you're loading the records unless you're concerned about
> > conflicting writes getting applied in the correct order.
>
> Simple use-case: want to load historical data, want to use TWCS, want to
> use TTL.
>
> Scenario:
> Importing data using standard write path (inserts)
> Using timestamp to give TWCS something to work with (import records
> contain a created-on timestamp from which I populate "using timestamp")
> Need records to expire according to TTL
> Don't want to calculate TTL for every insert individually (obviously what
> I want and what I get differ)
> I'm importing in chrono order, so TWCS should be able to keep things from
> getting out of hand.
>
> > I think in general timestamp manipulation is *caveat utilitor*.
>
> Yeah; although I'd probably choose stronger words. TWCS (and perhaps
> DTCS?) appears to treat writetimes as timestamps; the rest of Cassandra
> appears to treat them as integers.


Growing Hints

2016-12-28 Thread Anshu Vajpayee
Hello All
We have an unusual issue on our cluster: we are seeing a growing hints
table on one node, although all the nodes are up and show as online in
nodetool status.

I know Cassandra stores hints when writes to other nodes time out. In our
case all nodes are up and functional, gossip is flowing well, and the
write timeout value in our cluster is quite high.

Can anyone suggest other possible reasons for these growing hints?
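
For reference, on a pre-3.0 release where hints still live in the
system.hints table, you can at least see which replicas the hints are
aimed at (a sketch):

SELECT DISTINCT target_id FROM system.hints;

-- Map each target_id to an address via the peers table:
SELECT peer, host_id FROM system.peers;

That narrows down which replica is failing to acknowledge writes in time,
even though it shows as up.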