Actually, please ignore my email. Re-reading it, it looks like I did not research this thoroughly and just asked without actually implementing both options, and without a clear goal on what I can compromise on while still having no read downtime. I'll write better next time.
On Wed, Oct 26, 2022, 10:04 PM Vijaykumar Jain <vijaykumarjain.git...@gmail.com> wrote:

> Hi all,
>
> tl;dr
> I have a simple question: given a choice, if I can write the same data
> to two databases in parallel, should I opt for a primary/replica setup
> or a multi-writer setup? The multi-writer setup can make use of Kafka
> consumer groups (like two replication slots, each having their own LSN
> offsets) to write to both DB nodes of a pair in parallel via the
> application layer.
>
> The churn of data is really high; a lot of WAL is generated, around
> 500 GB/hr.
>
> If I go with primary/replica (logical replication is not feasible):
>
> I need to ensure both are on the same major version. Upgrades are
> tricky (we don't have QA), so we just have the option of a schema dump
> to check upgrade compatibility. The data, we trust PostgreSQL for
> that :). (I wish we had ZFS everywhere, but no.)
>
> Any operation that takes an exclusive table lock (although with later
> versions there are far fewer blocking operations) can impact queries
> on the replica as well (excluding delay settings).
>
> Corruption can cause downtime (we have had tons of it, and RAID to
> protect against it). If the replica is having issues, we can zero the
> pages on the replica and do some surgery if we can isolate the problem
> pages; otherwise we resync the replica from the primary. But if the
> primary is having issues, we copy data from the replica to disk and
> copy it into the primary after truncating, etc. Some downtime, but not
> a lot. (I am no expert at data recovery and mostly rely on amcheck,
> dd, and RAID checks.)
>
> We don't use PITR (too much WAL × 58) or delayed replication, as we
> can't afford more servers.
>
> DDL deploys are guaranteed by replication, so there is no need to try
> 2PC-like stuff at the app layer. (Although apps use deploy tools to
> ensure the DDLs are eventually consistent and idempotent.)
>
> Basically, primary/replica relieves the app: whatever is on the
> primary is also on the replica eventually, so there is a single source
> of truth.
>
> But with multiple writers, any app mishandling, like a bug in catching
> an exception, can result in divergence, and it is no longer a mirrored
> setup. We need checks/reconciliation to ensure both write nodes in a
> pair have almost the same data at the end of the day, so we can trust
> this setup independent of any app mistakes.
>
> But if the app layer is robust, we have almost no downtime for reads
> and writes, we can keep the two nodes on different versions (without
> logical replication), we can query both nodes in real time, there are
> no real replication lag issues or conflicts, and we can upgrade
> blue/green style or canary test some changes on one node if needed,
> etc.
>
> Am I making sense at all? Or am I sounding confused, not knowing the
> difference between primary/replica and multi-writer? This is not a
> BDR-like thing; the two nodes don't really need each other unless we
> are doing some recovery.
>
> My point is, we have 58 such primary/replica shards (each 10 TB+, with
> consistent hashing at the app layer, no FDW), and there is no scope
> for read downtime, so any issue like post-upgrade performance
> degradation (if any) gives me chills. And we have no QA to test with
> real data.
>
> There are too many dimensions to shard on, and aggregations need to
> run across the shards (yes, there is no scope for data isolation).
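For what it's worth, below is roughly what I mean by the multi-writer path: a minimal sketch in Python, assuming kafka-python and psycopg2, with a hypothetical "events" topic and table (none of this is our actual code). Two consumer groups read the same topic, and each group applies the stream to its own Postgres node, so each node's progress is tracked by its own offsets, much like two replication slots with independent LSNs.

import json
import sys

import psycopg2                    # pip install psycopg2-binary
from kafka import KafkaConsumer    # pip install kafka-python

def run_writer(group_id, dsn):
    """Consume events from Kafka and apply them to one Postgres node.

    The offset is committed only after the DB transaction commits, so a
    crash re-delivers the message; the upsert keeps re-delivery idempotent.
    """
    consumer = KafkaConsumer(
        "events",                      # hypothetical topic name
        group_id=group_id,             # independent offsets per node
        bootstrap_servers=["localhost:9092"],
        enable_auto_commit=False,      # commit manually, after the DB commit
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    conn = psycopg2.connect(dsn)
    for msg in consumer:
        event = msg.value              # e.g. {"id": 1, "payload": "..."}
        with conn.cursor() as cur:
            # Idempotent write: a retry after a crash must not duplicate rows.
            cur.execute(
                "INSERT INTO events (id, payload) VALUES (%s, %s)"
                " ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload",
                (event["id"], event["payload"]),
            )
        conn.commit()                  # Postgres first ...
        consumer.commit()              # ... then the Kafka offset

if __name__ == "__main__":
    # One process per node, each with its own consumer group, e.g.:
    #   python writer.py writers-node-a "dbname=app host=node-a"
    #   python writer.py writers-node-b "dbname=app host=node-b"
    run_writer(sys.argv[1], sys.argv[2])

In practice you would batch messages and commit per batch rather than per message; the point is only the ordering: the Postgres commit happens before the Kafka offset commit, so a crash replays messages instead of dropping them, and the upsert keeps replays idempotent. The end-of-day reconciliation then only has to catch application bugs, not crash windows.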