Thanks a lot! Very helpful.

Sent from my iPhone
> On Aug 5, 2019, at 1:50 PM, Alban Hertroys <haram...@gmail.com> wrote:
>
>> On 5 Aug 2019, at 17:27, Julie Nishimura <juliez...@hotmail.com> wrote:
>>
>> Thanks for your reply Alban. Currently we only have A->B replication. Is
>> adding B->C replication difficult? I remember in the past I tried to seed
>> pg_basebackup from a hot standby, and it was erroring out after a while, so
>> I needed to switch to running it from the master.
>
> I’ve never used multi-level replication (or cascading replication, as it’s
> called in the PG docs).
>
> I expect that replication slots (with streaming replication) would be
> resilient to streaming across multiple levels, provided you have enough disk
> space left to keep the WAL back to wherever your farthest-behind replica is,
> but I do not know that for certain. I’m pretty sure it’s being done in
> practice, though, and not just theoretically possible.
>
> The basic problem you have is a bit like Towers of Hanoi. The details of
> setting up each replica may be a little complicated, but if you look at it
> from a high-level design phase, you’re just shuffling clusters around. The
> details come later, when you’re actually designing how to apply those
> replicas/clusters.
>
> One of the main problems is that creating a new replica takes a lot of time,
> so you want to minimise the total time that takes. Apparently, you have
> neither C nor D ready yet, so you need to create two replicas - being able
> to do those in parallel instead of sequentially would save you time.
>
> The other problem is that, to be on the safe side, you want to have a full
> replica of A at any point in time. If you disconnect B from A before either
> C or D is complete, you run a risk: if A fails, you don’t have the data that
> accumulated while B was ‘offline’. So that’s not the best scenario.
>
> That is why I think your initial set of replicas should look like:
>
> A — B — C
>  \
>   D
>
> IIRC, streaming replication uses pg_basebackup to create the initial replica
> and then streams whatever is needed to catch up. With replication slots, the
> master knows what the slaves still need, so it won’t clean up too early.
> Apparently, the slave (B) likewise knows that it needs to retain data for C.
> It looks perfectly safe on paper, except for the replication lag between
> A — B.
>
> You can, according to the docs, cascade replicas from each other, and that
> even allows replication (among the slaves) to go on after the master gets
> disconnected - quite what you need for the B — C chain, I would think.
>
> Take a look at:
> https://www.postgresql.org/docs/11/warm-standby.html#STREAMING-REPLICATION
> and note the sections on Replication Slots and on Cascading Replication.
>
> The replication lag between A — B can be solved by pausing the clients
> connecting to A (half of them need to be changed to B anyway) while B is
> catching up on its lag. You probably have at least that much down-time to
> change the connections anyway.
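>
> To make that a bit more concrete: a rough, untested sketch of hanging C off
> B with a slot (host, user and slot names are placeholders, B needs to allow
> replication connections, and given your earlier trouble this assumes
> pg_basebackup off a hot standby behaves this time):
>
>   -- on B (the cascading standby), reserve WAL for C:
>   SELECT pg_create_physical_replication_slot('slot_c');
>
>   # on C, seed the data directory from B; -R writes a recovery.conf stub:
>   pg_basebackup -h hostB -p 5432 -U replicator -D /pgdata/c \
>       -X stream -S slot_c -R -P -v
>
>   # C's recovery.conf should end up along these lines (add primary_slot_name
>   # yourself if -R didn't put it there):
>   #   standby_mode = 'on'
>   #   primary_conninfo = 'host=hostB port=5432 user=replicator'
>   #   primary_slot_name = 'slot_c'
>
>   -- afterwards, on B, check that the slot is active and retaining WAL:
>   SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
>
> The same recipe, with different names, should do for seeding D from A.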
>
> Regards,
>
> Alban.
>
>
>> From: Alban Hertroys <haram...@gmail.com>
>> Sent: Monday, August 5, 2019 5:01 AM
>> To: Julie Nishimura <juliez...@hotmail.com>
>> Cc: Adrian Klaver <adrian.kla...@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-gene...@postgresql.org>
>> Subject: Re: adding more space to the existing server
>>
>>
>>> On 5 Aug 2019, at 0:39, Julie Nishimura <juliez...@hotmail.com> wrote:
>>>
>>> Alban, thank you for your reply. Your suggestion makes sense, and I will
>>> be talking to our engineers about it.
>>>
>>> Currently we need to understand:
>>>
>>> a) How do we break A -> B replication such that both can become
>>> independent primaries
>>
>> That is pretty much like normal failover from A to B, except that you don’t
>> reverse replication.
>>
>> You will need to stop your clients from sending data for a bit (if it’s
>> continuous data, having a buffer in between is a big help - at our company
>> we’re looking into Apache Kafka for that), so that you can switch half of
>> them to connect to B instead of A.
>>
>> Next, you promote B to master. I used the docs for that last time, and they
>> were pretty clear on the subject.
>>
>>> b) How do we reassign C from B->C replication to A->C replication
>>
>> I don’t think you need to. If you indeed already have A->B->C, after
>> promoting B to master, you end up with B->C, which is alright.
>> You just need to add A->D for the other set.
>>
>>> c) Thoughts on why this isn’t a good plan
>>
>> That depends on your clients and how you decide which database in the
>> current cluster they connect to. If you connect specific clients to
>> specific databases, then all you need to do is configure half your clients
>> to connect to B instead.
>>
>> Another option is to put a virtual database layer in front, such that both
>> clusters still look like a single database to the outside world. We have
>> some experience with Dremio for similar purposes (although for read-only
>> reporting). Mind that the community edition doesn’t do authorization.
>>
>>> Current:
>>> A replicates to B
>>> all requests go to A
>>>
>>> Soon:
>>> A replicates to B -> cascading to C and D
>>>
>>> Transition:
>>> break A replication to B such that both can become primary
>>
>> Correct.
>>
>>> stop B replication to C then setup A to replicate to C
>>
>> I would change this to:
>> setup A to replicate to D
>>
>>> End state:
>>> A replicates to C
>>> B replicates to D
>>
>> End state:
>> A replicates to D
>> B replicates to C
>>
>>> we remove some of the dbs from A and B, then reassign the traffic based on
>>> db selections
>>>
>>> I hope it all makes sense...
>>>
>>> Thank you
>>
>> It does to me. Now would be a good time for people to chime in if they
>> don't agree ;)
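>>
>> Coming back to (a): once the clients are stopped and B has caught up, the
>> promotion itself is a single step in 9.6. A minimal sketch (the data
>> directory path is just an example; do check the docs for your setup):
>>
>>   # on B:
>>   pg_ctl promote -D /pgdata/b
>>
>>   -- then verify that B has left recovery and accepts writes:
>>   SELECT pg_is_in_recovery();   -- should now return false
>>
>> Touching the trigger_file named in B's recovery.conf (if you configured
>> one) does the same thing.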
>>
>>> From: Alban Hertroys <haram...@gmail.com>
>>> Sent: Saturday, August 3, 2019 3:15 AM
>>> To: Julie Nishimura <juliez...@hotmail.com>
>>> Cc: Adrian Klaver <adrian.kla...@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>; pgsql-general <pgsql-gene...@postgresql.org>
>>> Subject: Re: adding more space to the existing server
>>>
>>>
>>>> On 2 Aug 2019, at 21:45, Julie Nishimura <juliez...@hotmail.com> wrote:
>>>>
>>>> 1) We use streaming replication, and due to hardware limitations, we
>>>> cannot add more drives to the existing host. That is why we thought that,
>>>> by breaking the existing streaming replication (from a->b), instead of
>>>> the currently identical standby (b) we can introduce a twice-larger host
>>>> and start replication to that new, larger host. When it is caught up, we
>>>> break the replication again, make modifications to the 'a' host, making
>>>> it larger, then replicate b->a. After it is caught up, break the
>>>> replication again and switch master->standby (if necessary).
>>>
>>> Let’s be clear, I’m in no way an expert on replication. In fact, I’ve only
>>> done (streaming) replication once, and I managed to mess that up in a
>>> minor way (disabled the wrong service during failover, so data still went
>>> to the database I was attempting to replace for, like, 15 minutes).
>>>
>>>> 2) I am not sure about the time, but it is understood that it requires 2
>>>> full replication cycles, and might be up to 2 weeks with no standby
>>>> situation
>>>
>>> No standby situation? Murphy is probably just waiting for that to strike…
>>> I recall a fairly recent story on the FreeBSD ML about someone on Malta
>>> doing a migration of a couple dozen terabytes from her main server
>>> (because of some failing disks in her RAID set) using her backup server to
>>> move data around (with backups removed to make room), when, due to an
>>> accident outside the building, an aerial 10 kV power line hit another
>>> power line in the ground, causing a fire in one UPS and frying the other
>>> one. Losing power at that point meant that the file systems (ZFS) on both
>>> servers ended up in an unrecoverable state, with no backups. It didn’t
>>> help that the UPSes were at the bottom of the rack, with the heat and
>>> smoke going up into the servers. What are the chances, right? (And then it
>>> turned out that it is really hard to recover data from a ZFS file system
>>> in such a state, which is what her actual inquiry was about.)
>>>
>>> I would definitely prefer to add a 3rd machine into the mix, even if it
>>> were just a temporary machine - a rental, perhaps?
>>>
>>> From there, I’m certain Adrian knows more about replication than I do. I’d
>>> go with the approach he suggested.
>>>
>>>> 4) by pg_basebackup and restore
>>>>
>>>> As of now, we are thinking about possible other solutions, such as
>>>> splitting the existing 37 databases on the cluster into 2 hosts with
>>>> their own standbys. This solution requires breaking up the existing
>>>> replication as well. Can you please point me to some document which lists
>>>> all the steps for breaking up the existing replication properly? We are
>>>> using 9.6 postgres.
>>>
>>> I’m going to assume that you will have data coming in while this split is
>>> taking place and that you therefore cannot take the entire set of
>>> databases offline for as long as this takes. If not, that would probably
>>> allow for a simpler (faster) scenario.
>>>
>>> I think the easiest approach for this scenario would be to add two more
>>> machines (c and d) and replicate them off the current setup. You want that
>>> to happen as much in parallel as possible, so perhaps replicate c off a
>>> and d off b.
>>>
>>> If you aren’t already using “replication slots”, I found those make things
>>> both easier to understand and more reliable. You can query their status,
>>> for one thing.
>>>
>>> Those replicas will take extra time of course (about double), because
>>> you’re replicating twice what you need, but I don’t think you can
>>> replicate parts of a cluster with your setup unless you go for a different
>>> replication approach (I think per-database replication requires
>>> statement-level replication?).
>>>
>>> After that, decouple both sets into:
>>> a —> b (your current machines)
>>> c —> d (the new ones)
>>>
>>> (Although any order should be fine, really, as long as they have caught
>>> up.)
>>>
>>> At that point I would probably (temporarily) pause replication in at least
>>> one set and create a backup of that.
>>>
>>> This is the point to start removing superfluous databases from a and c (so
>>> that a+c make up the complete set again).
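>>>
>>> One way to do that pausing is to hold back WAL replay on the standby of
>>> the set you're cleaning up. Roughly, with 9.6 function names (the database
>>> name is just an example):
>>>
>>>   -- on the standby (b or d): stop applying WAL, so the drops below don't
>>>   -- propagate until you've verified them
>>>   SELECT pg_xlog_replay_pause();
>>>
>>>   -- on the corresponding primary (a or c): drop what moved to the other set
>>>   DROP DATABASE some_moved_db;
>>>
>>>   -- the counterpart for un-pausing later is:
>>>   SELECT pg_xlog_replay_resume();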
>>>
>>> After verifying that no databases are missing, unpause replication.
>>>
>>> If instead you find that you accidentally removed a database from both a
>>> and c, you still have the replicas to recover it from. And the backups, of
>>> course, but those will not contain the data that came in after replication
>>> was paused.
>>>
>>> I do hope the remaining 3% of disk space is enough to cover all that,
>>> though...
>>>
>>> Regards,
>>>
>>> Alban Hertroys
>>> --
>>> If you can't see the forest for the trees,
>>> cut the trees and you'll find there is no forest.
>>
>> Alban Hertroys
>> --
>> There is always an exception to always.
>
> Alban Hertroys
> --
> There is always an exception to always.