Hi Eric,

Thanks for confirming what we suspected all along!

On Tue, Aug 9, 2022 at 10:56 AM
eric-van.l...@klm.com> wrote:

> Hi Zoltan,
> I checked with the developer and basically there's no difference between a
> CANCEL PROC and a CANCEL REPLICATION. It might be improved in a future
> release.
> -----Original Message-----
> From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> On Behalf Of Zoltan
> Forray
> Sent: dinsdag 9 augustus 2022 14:25
> Subject: Re: Cancel Session with EXTREME PREJUDICE (a.k.a. FORCE)
> Hi Eric,
> Thank you for confirming what my co-worker and I have been experiencing
> and that we aren't alone with these issues.  We would never have jumped from
> 8.1.12 to .14 if it were not for log4j and the plethora of other possible
> intrusions!
> I have seen way too many APARs related to STGRULE and have no desire to
> inflict even more pain upon ourselves! These are (were) standard
> administrative schedule or console "replicate node" commands using node
> groups or sometimes just a single node trying to reconcile "errors" caused
> by canceling replication.
> This brings up another question - what is the purpose of the "cancel
> replication" command if it causes such damage that subsequent replications
> issue these dire warnings about *"detected partially replicated data from a
> previous replication operation. This might result in extended processing
> time while the server is replicating"*. IMO, it sounds like a regular
> "cancel process" would do about the same?  I understand about special
> commands like "cancel expiration" which allows subsequent expire inventory
> commands to resume processing where the "canceled expiration" left off but
> cancel replication doesn't seem to offer much benefit - or is this another
> APAR waiting for mitigation?
> On Tue, Aug 9, 2022 at 2:54 AM
> eric-van.l...@klm.com> wrote:
> > Hi Zoltan,
> >
> > This all sounds so familiar. In my opinion, all releases after 8.1.12
> > are the most buggy versions IBM ever released (yes, maybe even more
> > buggy than the infamous 6.1). I'm in close contact with somebody from
> > development and it has already resulted in 6 or 7 APARs, and still not
> > everything is working OK.
> > The crashes, the hanging replications, the slow replications, the
> > stale sessions which can't be canceled, I have seen them all...
> > One question: are you using 'traditional' node replication or are you
> > using stgrule replication? The latter contains a nasty bug: when
> > replication fails or gets canceled, the next replication runs VERY slowly.
> > The only way to fix this is by running a special script I received
> > from support, along with several DB2 commands... No permanent fix
> > available yet.
> > But to come back to your question: I have never been able to cancel
> > those sessions; the only way to get rid of them is by bouncing the server.
> >
> > Kind regards,
> > Eric van Loon
> > Air France/KLM Core Infra
> >
> >
> > First off, we do not know how anyone can run replication of data from
> > a FILE-based storage pool (yes, we know that CONTAINERS fixes
> > everything🙄) to another server.  Every attempt we have made usually
> > ends up in a mess that we have to undo/clean up.  We find it is very
> > slow (10G on both ends) and the replication processes never seem to
> > finish/end.
> >
> > We have observed that no matter how many or how few replication
> > sessions we start, most of them seem to go idle/wait (e.g.
> > MAXSESSIONS=10 starts 20 sessions to the target server, of which 16+
> > become idle even though there is 4TB to replicate).
> >
> > Since we need to get off magnetic tape (moving to a new building with
> > restricted space so existing ATL has to go!), we have been using the
> > offsite server as Virtual Volumes and creating offsite backups to it.
> > This was working pretty well until we started experiencing server
> > crashes/cores after upgrading to 8.1.14 (support confirmed a bug and
> > sent us an eFix for it; we continued to have intermittent crashes;
> > support discovered another related bug via the RECONCILE VOLUMES
> > command; we just installed another eFix that is supposed to address
> > the crashes).
> >
> > While waiting for a fix for the original server problem, we decided to
> > try to transition back to replication - only to have more problems
> > than the crashes. We have had to bounce the target and source servers
> > multiple times due to replication sessions that won't go away/end as
> > well as the performance issues I mentioned above.
> >
> > Did I mention the issues with the 8.1.15 Linux client also related to
> > replication?
> >
> > Since we have the shared stgpools/reconcile volumes eFix (
> > installed on all servers, we have decided to go back to virtual volumes.
> >
> > Now back to the subject of this post.  Right now I have 4 replication
> > sessions on the target server that say they are doing something (i.e.
> > not in a WAIT), but in reality have been hung since August 1st (we had
> > installed the eFix but forgot to disable the admin command that kicks
> > off replication).  There are no replication sessions on the source
> > server.
> >
> > All attempts to cancel the ghost sessions on the target server say
> > they can't be canceled.
> >
> > So before we bounce it one-more-time, we were wondering if there is a
> > super-secret "cancel session with force" we are not aware of?
> >
> >