Join_ring=false Use Cases

2016-12-13 Thread Anuj Wadehra
Hi,
I need to understand the use case for join_ring=false after a node outage.
As per https://issues.apache.org/jira/browse/CASSANDRA-6961, you would want
join_ring=false when you have to repair a node before bringing it back
after a considerable outage. The problem I see with join_ring=false is that,
unlike autobootstrap, the node will NOT accept writes while you are running
repair on it. If a node was down for 5 hours and you bring it back with
join_ring=false, repair it for 7 hours and then make it join the ring, it
will STILL have missed writes, because during the 7 hours that repair was
running, writes went only to the other nodes. So, if you want to make sure
that reads served by the restored node at CL ONE return consistent data after
the node has joined, you won't get that, because writes were missed while the
node was being repaired. And if you work with Read/Write CL=QUORUM, you would
get the desired consistency anyway, even if you bring the node back without
join_ring=false. So how would join_ring=false provide any additional
consistency in this case?
I can see join_ring=false being useful only when I am restoring from a
snapshot or bootstrapping and there are dropped mutations in my cluster that
were not fixed by hinted handoff.
For example: 3 nodes A, B, C working at Read/Write CL QUORUM. Hinted handoff
window = 3 hrs.
10 AM: Snapshot taken on all 3 nodes.
11 AM: Node B goes down for 4 hours.
3 PM: Node B comes up but its data is not repaired. So, 1 hr of dropped
mutations (2-3 PM) is not fixed via hinted handoff.
5 PM: Node A crashes.
6 PM: Node A is restored from the 10 AM snapshot, started with
join_ring=false, repaired, and then joined to the cluster.
In the snapshot-restore example above, the 2-3 PM updates fell outside the
3-hour hinted handoff window, so node B won't get them. Node A's data for
2-3 PM is already lost. So the 2-3 PM updates exist on only one replica,
node C, while the minimum consistency needed is QUORUM, so join_ring=false
would help. But this is a very specific use case.
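
The timeline above can be checked with a little arithmetic. What follows is only a minimal Python sketch under stated assumptions: the helper names (unhinted_gap, quorum) are hypothetical, hours are on a 24-hour clock, and the replica map simply encodes the state described in the example (node A restored from the 10 AM snapshot and not yet repaired). It is not Cassandra code.

```python
def unhinted_gap(down_start, down_end, hint_window_hours):
    """Return the (start, end) hours of an outage that hints do not cover.

    Hours are on a 24-hour clock. Returns None if hints cover the whole outage.
    """
    hints_stop = down_start + hint_window_hours
    if hints_stop >= down_end:
        return None
    return (hints_stop, down_end)


def quorum(replication_factor):
    """Number of replicas that must respond at CL QUORUM."""
    return replication_factor // 2 + 1


# Node B: down from 11:00 to 15:00 with a 3-hour hint window.
gap = unhinted_gap(11, 15, 3)

# Which replicas hold the writes made during the gap, once node A has been
# restored from the 10 AM snapshot but not yet repaired (assumed state).
has_gap_writes = {"A": False, "B": False, "C": True}

replicas_with_data = sum(has_gap_writes.values())
# A QUORUM read is only guaranteed to see the gap writes if every possible
# quorum includes at least one replica that holds them.
read_guaranteed = replicas_with_data > len(has_gap_writes) - quorum(len(has_gap_writes))

print(gap)
print(read_guaranteed)
```

In this sketch the gap is the 14:00-15:00 hour, and a QUORUM read that happens to land on A and B would miss those writes, which is the situation where repairing A before it joins the ring helps.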
Thanks
Anuj


Re: Configure NTP for Cassandra

2016-12-13 Thread Anuj Wadehra
Any NTP experts willing to take up these questions?

Thanks
Anuj 
 
  On Sun, 27 Nov, 2016 at 12:52 AM, Anuj Wadehra wrote:
  Hi,
One popular NTP setup recommended for Cassandra users is described at:
https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/

Summary of the article:
The setup recommends a dedicated pool of internal NTP servers that are
associated as peers to provide an HA NTP service. Cassandra nodes sync to
this dedicated pool but define one internal NTP server as the preferred
server to ensure relative clock synchronization. The internal NTP servers
sync to external NTP servers.
My questions:
1. If my ISP is providing me a pool of reliable NTP servers, should I set up
my own internal servers anyway, or can I sync Cassandra nodes directly to the
ISP-provided servers and select one of them as preferred for relative clock
synchronization?

I agree. If you have to rely on the public NTP pool, which selects random
servers for sync, having an internal NTP server pool is justified for getting
tight relative sync, as described in the blog.
2. As per my understanding, peer association is ONLY for the backup scenario.
If a peer loses its time synchronization source, the other peers can be used
for time synchronization, thus providing an HA service. But when everything
is OK (happy path), does defining NTP servers synced from different sources
as peers lead them to converge their time, as mentioned in some forums?
E.g. if A and B are peers and their times are 9:00:00 and 9:00:10 after
syncing with their respective time sources, will they converge their clocks
to 9:00:05?
I doubt the above claim regarding time convergence. Also, no formal doc says
that. Comments?

Thanks
Anuj
  


Re: Configure NTP for Cassandra

2016-12-13 Thread Jim Witschey
You might find more NTP experts on the NTP questions mailing list:
http://lists.ntp.org/listinfo/questions



Re: Configure NTP for Cassandra

2016-12-13 Thread Anuj Wadehra
Thanks for the NTP link. Most of us are Cassandra users and must be using NTP 
(or other time synchronization methods) for ensuring relative time 
synchronization in our Cassandra clusters. I hope there are people on the 
mailing list who can answer these questions with respect to Cassandra. 
There is just one detailed blog on NTP best practices for Cassandra and I think 
answering these questions is important rather than just creating an internal 
NTP pool with recommended settings.
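
As background on why relative clock synchronization matters here: Cassandra resolves conflicting writes to the same cell by write timestamp, and the highest timestamp wins. A minimal Python sketch of that rule follows. The Write class, the reconcile function, and the timestamps are hypothetical and only illustrate the idea; tie-breaking is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_us: int  # microseconds since epoch, taken from the writer's clock


def reconcile(a: Write, b: Write) -> Write:
    """Last write wins: keep the write with the higher timestamp.

    Simplification: ties are resolved in favor of `a` here.
    """
    return a if a.timestamp_us >= b.timestamp_us else b


REAL_TIME_US = 1_481_650_000_000_000  # arbitrary instant, for illustration only

# First write, stamped by a machine whose clock runs 10 seconds fast.
first = Write("old", REAL_TIME_US + 10_000_000)
# Second write, 2 seconds later in real time, stamped by an accurate clock.
second = Write("new", REAL_TIME_US + 2_000_000)

winner = reconcile(first, second)
print(winner.value)
```

In this sketch the earlier write wins because its timestamp is higher, which is why clocks that agree with each other matter more than clocks that agree with true time.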

Thanks
Anuj 
 


Are Materialized views persisted on disk?

2016-12-13 Thread Kant Kodali
Are materialized views persisted on disk? Sorry for the naive question.


Re: Are Materialized views persisted on disk?

2016-12-13 Thread Carl Yeksigian
Yes, they are stored on disk like a normal table.

On Tue, Dec 13, 2016 at 2:31 PM, Kant Kodali  wrote:

> Are Materialized views persisted on disk? sorry for the naive question.
>


Re: Are Materialized views persisted on disk?

2016-12-13 Thread Benjamin Roth
The word "materialized" implies that.

2016-12-13 20:34 GMT+01:00 Carl Yeksigian :

> Yes, they are stored on disk like a normal table.


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: Are Materialized views persisted on disk?

2016-12-13 Thread Jonathan Haddad
People should be able to ask legit questions here without getting snarky
answers, please don't do that.  Not everyone has the same background or
knowledge that you do.

On Tue, Dec 13, 2016 at 11:49 AM Benjamin Roth 
wrote:

> The word "materialized" implies that.


Re: Are Materialized views persisted on disk?

2016-12-13 Thread Benjamin Roth
It wasn't meant in a snarky way; it was a (too short) explanation. I'll try
to sum it up:

Materialized View:
The data that is represented by the view is stored persistently and updated
as soon as the underlying base data changes.
On RDBMS: Pro: Fast reads, Con: Slow(er) updates
On Cassandra (CS): Used to do filtering or sorting of the base table. Much
slower write path.

"Regular" View:
The base data is queried on demand. More or less a rewrite or alias of
another query.
On RDBMS: Pro: No updates required, Con: Probably slow reads, depending on
indexes.
On CS: Does not exist.

The term "materialized view" was established by well-known RDBMSs like
Oracle, and it behaves very similarly in CS. In most RDBMSs a view can have
many base tables. In CS an MV can have only one base table and has many more
restrictions compared to an RDBMS.
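
The difference between the two kinds of view can be sketched in a few lines of Python. This is a toy in-memory model, not how Cassandra or any RDBMS implements views: the class and function names are hypothetical, and it assumes each email belongs to at most one user.

```python
class BaseTable:
    """Toy base table mapping user_id -> email."""

    def __init__(self):
        self.rows = {}
        self.views = []

    def upsert(self, user_id, email):
        old_email = self.rows.get(user_id)
        self.rows[user_id] = email
        # Every base write also pays for updating each materialized view.
        for view in self.views:
            view.on_base_write(user_id, old_email, email)


class MaterializedView:
    """Stores its own copy of the data, keyed by email, kept up to date on write."""

    def __init__(self, base):
        self.by_email = {}
        base.views.append(self)

    def on_base_write(self, user_id, old_email, new_email):
        if old_email is not None:
            self.by_email.pop(old_email, None)
        self.by_email[new_email] = user_id

    def lookup(self, email):
        # Fast read: a direct lookup in the stored copy.
        return self.by_email.get(email)


def regular_view_lookup(base, email):
    """A 'regular' view stores nothing and queries the base data on demand."""
    for user_id, row_email in base.rows.items():
        if row_email == email:
            return user_id
    return None


base = BaseTable()
view = MaterializedView(base)
base.upsert(1, "a@example.com")
base.upsert(2, "b@example.com")
base.upsert(1, "c@example.com")  # user 1 changes email; the view follows

print(view.lookup("c@example.com"))
print(regular_view_lookup(base, "b@example.com"))
```

The materialized view trades a slower write path (every upsert touches the view) for a direct read, while the regular view costs nothing on write and scans on read.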

2016-12-13 21:06 GMT+01:00 Jonathan Haddad :

> People should be able to ask legit questions here without getting snarky
> answers, please don't do that.  Not everyone has the same background or
> knowledge that you do.


-- 
Benjamin Roth


Re: Configure NTP for Cassandra

2016-12-13 Thread Martin Schröder
2016-11-26 20:20 GMT+01:00 Anuj Wadehra :
> 1. If my ISP provider is providing me a pool of reliable NTP servers, should
> I setup my own internal servers anyway or can I sync Cassandra nodes
> directly to the ISP provided servers and select one of the servers as
> preferred for relative clock synchronization?

Set up three NTP servers that use the provider's servers _and_ pool servers,
and sync your other machines from these servers (and maybe get GPS receivers
for your NTP servers). This reduces NTP traffic at your firewall (your servers
act as proxies) and reduces load on public servers.

> 2. As per my understanding, peer association is ONLY for backup scenario .
> If a peer loses time synchronization source, then other peers can be used
> for time synchronization. Thus providing a HA service. But when everything
> is ok (happy path), does defining NTP servers synced from different sources
> as peers lead them to converge time as mentioned in some forums?

Maybe, but the difference will be negligible (sub-millisecond).
I wouldn't worry about that.
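
For a sense of how such small differences are measured, the standard NTP on-wire calculation (RFC 5905) derives a clock offset and round-trip delay from four timestamps. The Python sketch below applies that formula. The function name and the sample timestamps are made up for illustration, and the sketch says nothing about whether peers converge.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Compute clock offset and round-trip delay from one NTP exchange.

    t1: client clock when the request is sent
    t2: server clock when the request is received
    t3: server clock when the reply is sent
    t4: client clock when the reply is received
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Illustrative numbers only: server clock 5 ms ahead of the client,
# 2 ms network delay each way, 1 ms of processing on the server.
offset, delay = ntp_offset_and_delay(100.000, 100.007, 100.008, 100.005)
print(round(offset, 6), round(delay, 6))
```

The offset estimate is exact only when the network delay is the same in both directions; asymmetric paths introduce an error of up to half the round-trip delay.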

Best
   Martin