Thanks a lot! Very helpful.

Sent from my iPhone
> On Aug 5, 2019, at 1:50 PM, Alban Hertroys <haram...@gmail.com> wrote:
>
>> On 5 Aug 2019, at 17:27, Julie Nishimura <juliez...@hotmail.com> wrote:
>>
>> Thanks for your reply Alban. Currently we only have A->B replication. Is
>> adding B->C replication difficult? I remember in the past I tried to seed
>> pg_basebackup from a hot standby, and it was erroring out after a while, so
>> I needed to switch to running it from the master.
>
> I’ve never used multi-level replication (or cascading replication, as it’s
> called in the PG docs).
>
> I expect that replication slots (with streaming replication) would be
> resilient to streaming across multiple levels, provided you have enough disk
> space left to keep the WAL back to wherever your farthest-behind replica is,
> but I do not know that for certain. I’m pretty sure it’s being done in
> practice, though, and not just theoretically possible.
>
> The basic problem you have is a bit like Towers of Hanoi. The details of
> setting up each replica may be a little complicated, but if you look at it
> from a high-level design phase, you’re just shuffling clusters around. The
> details come later, when you’re actually designing how to apply those
> replicas/clusters.
>
> One of the main problems is that creating a new replica takes a lot of time,
> so you want to minimise the total time that takes. Apparently, you have
> neither C nor D ready yet, so you need to create two replicas - being able
> to do those in parallel instead of sequentially would save you time.
>
> The other problem is that, to be on the safe side, you want to have a full
> replica of A at any point in time. If you disconnect B from A before either
> C or D is complete, you run a risk: if A fails, you don’t have the data that
> accumulated while B was ‘offline’. So that’s not the best scenario.
>
> That is why I think your initial set of replicas should look like:
>
> A — B — C
>  \
>   D
>
> IIRC, streaming replication uses pg_basebackup to create the initial replica
> and then streams whatever is needed to catch up. With replication slots, the
> master knows what the slaves still need, so it won’t clean up too early.
> Apparently, the slave (B) likewise knows that it needs to retain data for C.
> It looks perfectly safe on paper, except for the replication lag between
> A — B.
>
> You can, according to the docs, cascade replicas from each other, and that
> even allows replication (among the slaves) to go on after the master gets
> disconnected - quite what you need for the B — C chain, I would think.
>
> Take a look at:
> https://www.postgresql.org/docs/11/warm-standby.html#STREAMING-REPLICATION
> and note the sections on Replication Slots and on Cascading Replication.
>
> The replication lag between A — B can be solved by pausing the clients
> connecting to A (half of them need to be changed to B anyway) while B is
> catching up on its lag. You probably have at least that much down-time to
> change the connections anyway.
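>
> To make that a bit more concrete: a rough, untested sketch of hanging C off
> B with a slot (host, user and slot names are placeholders, B needs to allow
> replication connections, and given your earlier trouble this assumes
> pg_basebackup off a hot standby behaves this time):
>
>   -- on B (the cascading standby), reserve WAL for C:
>   SELECT pg_create_physical_replication_slot('slot_c');
>
>   # on C, seed the data directory from B; -R writes a recovery.conf stub:
>   pg_basebackup -h hostB -p 5432 -U replicator -D /pgdata/c \
>       -X stream -S slot_c -R -P -v
>
>   # C's recovery.conf should end up along these lines (add primary_slot_name
>   # yourself if -R didn't put it there):
>   #   standby_mode = 'on'
>   #   primary_conninfo = 'host=hostB port=5432 user=replicator'
>   #   primary_slot_name = 'slot_c'
>
>   -- afterwards, on B, check that the slot is active and retaining WAL:
>   SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
>
> The same recipe, with different names, should do for seeding D from A.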
>
> Regards,
>
> Alban.
>
>
>> From: Alban Hertroys <haram...@gmail.com>
>> Sent: Monday, August 5, 2019 5:01 AM
>> To: Julie Nishimura <juliez...@hotmail.com>
>> Cc: Adrian Klaver <adrian.kla...@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-gene...@postgresql.org>
>> Subject: Re: adding more space to the existing server
>>
>>
>>> On 5 Aug 2019, at 0:39, Julie Nishimura <juliez...@hotmail.com> wrote:
>>>
>>> Alban, thank you for your reply. Your suggestion makes sense, and I will
>>> be talking to our engineers about it.
>>>
>>> Currently we need to understand:
>>>
>>> a) How do we break A -> B replication such that both can become
>>> independent primaries
>>
>> That is pretty much like normal failover from A to B, except that you don’t
>> reverse replication.
>>
>> You will need to stop your clients from sending data for a bit (if it’s
>> continuous data, having a buffer in between is a big help - at our company
>> we’re looking into Apache Kafka for that), so that you can switch half of
>> them to connect to B instead of A.
>>
>> Next, you promote B to master. I used the docs for that last time, and they
>> were pretty clear on the subject.
>>
>>> b) How do we reassign C from B->C replication to A->C replication
>>
>> I don’t think you need to. If you indeed already have A->B->C, after
>> promoting B to master, you end up with B->C, which is alright.
>> You just need to add A->D for the other set.
>>
>>> c) Thoughts on why this isn’t a good plan
>>
>> That depends on your clients and how you decide which database in the
>> current cluster they connect to. If you connect specific clients to
>> specific databases, then all you need to do is configure half your clients
>> to connect to B instead.
>>
>> Another option is to put a virtual database layer in front, such that both
>> clusters still look like a single database to the outside world. We have
>> some experience with Dremio for similar purposes (although for read-only
>> reporting). Mind that the community edition doesn’t do authorization.
>>
>>> Current:
>>> A replicates to B
>>> all requests go to A
>>>
>>> Soon:
>>> A replicates to B -> cascading to C and D
>>>
>>> Transition:
>>> break A replication to B such that both can become primary
>>
>> Correct.
>>
>>> stop B replication to C then setup A to replicate to C
>>
>> I would change this to:
>> setup A to replicate to D
>>
>>> End state:
>>> A replicates to C
>>> B replicates to D
>>
>> End state:
>> A replicates to D
>> B replicates to C
>>
>>> we remove some of the dbs from A and B, then reassign the traffic based on
>>> db selections
>>>
>>> I hope it all makes sense...
>>>
>>> Thank you
>>
>> It does to me. Now would be a good time for people to chime in if they
>> don't agree ;)
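>>
>> Coming back to (a): once the clients are stopped and B has caught up, the
>> promotion itself is a single step in 9.6. A minimal sketch (the data
>> directory path is just an example; do check the docs for your setup):
>>
>>   # on B:
>>   pg_ctl promote -D /pgdata/b
>>
>>   -- then verify that B has left recovery and accepts writes:
>>   SELECT pg_is_in_recovery();   -- should now return false
>>
>> Touching the trigger_file named in B's recovery.conf (if you configured
>> one) does the same thing.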
>>
>>> From: Alban Hertroys <haram...@gmail.com>
>>> Sent: Saturday, August 3, 2019 3:15 AM
>>> To: Julie Nishimura <juliez...@hotmail.com>
>>> Cc: Adrian Klaver <adrian.kla...@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-gene...@postgresql.org>
>>> Subject: Re: adding more space to the existing server
>>>
>>>
>>>> On 2 Aug 2019, at 21:45, Julie Nishimura <juliez...@hotmail.com> wrote:
>>>>
>>>> 1) We use streaming replication, and due to hardware limitations, we
>>>> cannot add more drives to the existing host. That is why we thought that,
>>>> by breaking the existing streaming replication (from a->b), instead of
>>>> the currently identical standby (b) we can introduce a twice-larger host
>>>> and start replication to that new, larger host. When it is caught up, we
>>>> break the replication again, make modifications to the 'a' host, making
>>>> it larger, then replicate b->a. After it is caught up, break the
>>>> replication again and switch master->standby (if necessary).
>>>
>>> Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only
>>> done (streaming) replication once, and I managed to mess that up in a
>>> minor way (disabled the wrong service during failover, so data still went
>>> to the database I was attempting to replace for, like, 15 minutes).
>>>
>>>> 2) I am not sure about the time, but it is understood that it requires 2
>>>> full replication cycles, and might be up to 2 weeks with no standby
>>>> situation
>>>
>>> No standby situation? Murphy is probably just waiting for that to strike…
>>> I recall a fairly recent story on the FreeBSD ML about someone on Malta
>>> doing a migration of a couple dozen terabytes from her main server
>>> (because of some failing disks in her RAID set) using her backup server to
>>> move data around (with backups removed to make room), when, due to an
>>> accident outside the building, an aerial 10 kV power line hit another
>>> power line in the ground, causing a fire in one UPS and frying the other
>>> one. Losing power at that point meant that the file systems (ZFS) on both
>>> servers ended up in an unrecoverable state, with no backups. It didn’t
>>> help that the UPSes were at the bottom of the rack, with the heat and
>>> smoke going up into the servers. What are the chances, right? (And then it
>>> turned out that it is really hard to recover data from a ZFS file system
>>> in such a state, which is what her actual inquiry was about.)
>>>
>>> I would definitely prefer to add a 3rd machine into the mix, even if it
>>> were just a temporary machine - a rental, perhaps?
>>>
>>> From there, I’m certain Adrian knows more about replication than I do. I’d
>>> go with the approach he suggested.
>>>
>>>> 4) by pg_basebackup and restore
>>>>
>>>> As of now, we are thinking about possible other solutions, such as
>>>> splitting the existing 37 databases on the cluster into 2 hosts with
>>>> their own standbys. This solution requires breaking up the existing
>>>> replication as well. Can you please point me to some document which lists
>>>> all the steps for breaking up the existing replication properly? We are
>>>> using 9.6 postgres.
>>>
>>> I’m going to assume that you will have data coming in while this split is
>>> taking place and that you therefore cannot take the entire set of
>>> databases offline for as long as this takes. If not, that would probably
>>> allow for a simpler (faster) scenario.
>>>
>>> I think the easiest approach for this scenario would be to add two more
>>> machines (c and d) and replicate them off the current setup. You want that
>>> to happen as much in parallel as possible, so perhaps replicate c off a
>>> and d off b.
>>>
>>> If you aren’t already using “replication slots”, I found those make things
>>> both easier to understand and more reliable. You can query their status,
>>> for one thing.
>>>
>>> Those replicas will take extra time of course (about double), because
>>> you’re replicating twice what you need, but I don’t think you can
>>> replicate parts of a cluster with your setup unless you go for a different
>>> replication approach (I think per-database replication requires
>>> statement-level replication?).
>>>
>>> After that, decouple both sets into:
>>> a —> b (your current machines)
>>> c —> d (the new ones)
>>>
>>> (Although any order should be fine, really, as long as they have caught
>>> up.)
>>>
>>> At that point I would probably (temporarily) pause replication in at least
>>> one set and create a backup of that.
>>>
>>> This is the point to start removing superfluous databases from a and c (so
>>> that a+c make up the complete set again).
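>>>
>>> One way to do that pausing is to hold back WAL replay on the standby of
>>> the set you're cleaning up. Roughly, with 9.6 function names (the database
>>> name is just an example):
>>>
>>>   -- on the standby (b or d): stop applying WAL, so the drops below don't
>>>   -- propagate until you've verified them
>>>   SELECT pg_xlog_replay_pause();
>>>
>>>   -- on the corresponding primary (a or c): drop what moved to the other set
>>>   DROP DATABASE some_moved_db;
>>>
>>>   -- the counterpart for un-pausing later is:
>>>   SELECT pg_xlog_replay_resume();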
>>>
>>> After verifying that no databases are missing, unpause replication.
>>>
>>> If instead you find that you accidentally removed a database from both a
>>> and c, you still have the replicas to recover it from. And the backups, of
>>> course, but those will not contain the data that came in after replication
>>> was paused.
>>>
>>> I do hope the remaining 3% of disk space is enough to cover all that,
>>> though...
>>>
>>> Regards,
>>>
>>> Alban Hertroys
>>> --
>>> If you can't see the forest for the trees,
>>> cut the trees and you'll find there is no forest.
>>
>> Alban Hertroys
>> --
>> There is always an exception to always.
>
> Alban Hertroys
> --
> There is always an exception to always.