[resent by mbp]

On Tue, 2002-05-28 at 10:22, Jonathan Kamens wrote:

> If you run rsync with a subshell through ssh.com's ssh and sshd and
> then kill the client with ctrl-C, the rsync server process running on
> the remote machine grows without bound and eventually completely hoses
> the machine.

I feel some responsiblity for this, because it was my patch (to solve
another, but similar problem) which caused rsync to ignore SIGPIPE.

>   When the client rsync and ssh exit, and thus sshd on the other end
>   exits, and then the server rsync tries to write to the client, it
>   gets SIGPIPE.  Alas, SIGPIPE is being ignored, because, quoting a
>   comment in the rsync source code, "Ignore SIGPIPE; we consistently
>   check error codes and will see the EPIPE."
> 
>   The comment is wrong; it does *not* see the SIGPIPE.  

I assume you mean it doesn't see the EPIPE.  SIGPIPE is ignored, so we
won't see it.

>   What happens
>   is that as a result of the SIGPIPE

You mean as the result of the EPIPE return code.

> exit_cleanup gets called.
>   That's a macro which calls _exit_cleanup.  That calls log_exit That
>   calls rwrite.  Rwrite tries to write an error to stderr, but that
>   fails because of (you guessed it!) SIGPIPE, and so rwrite calls
>   exit_cleanup.  Presto, an infinite loop is born.

This analysis appears to be correct.

>   The most straightforward fix I came up with is to modify rwrite so
>   that it doesn't actually try to write the error if the FILE to which
>   the error has been written is showing a true error condition
>   (checked with ferror).  I will attach a patch.

I think a better solution is to create a new function
exit_cleanup_nolog, which will prevent exit_cleanup from logging
anything.  The attached patch implements that, and changes that section
of log.c to use it. There are probably other places that should be
changed to use it as well.

>   I suspect that the rsync server running under openssh doesn't have
>   this problem because openssh sshd causes it it to get a SIGHUP
>   whereas the ssh.com version of sshd does not.  

You should be using OpenSSH anyways; the ssh.com version of ssh is
proprietary software.

Could you test the attached patch?


Index: rsync.h
===================================================================
RCS file: /cvsroot/rsync/rsync.h,v
retrieving revision 1.131
diff -u -d -r1.131 rsync.h
--- rsync.h	11 Apr 2002 02:18:51 -0000	1.131
+++ rsync.h	3 Jun 2002 03:54:11 -0000
@@ -603,7 +603,8 @@
 #define	WEXITSTATUS(stat)	((int)(((stat)>>8)&0xFF))
 #endif
 
-#define exit_cleanup(code) _exit_cleanup(code, __FILE__, __LINE__)
+#define exit_cleanup(code) _exit_cleanup(code, 0, __FILE__, __LINE__)
+#define exit_cleanup_nolog(code) _exit_cleanup(code, 1, __FILE__, __LINE__)
 
 
 extern int verbose;
Index: proto.h
===================================================================
RCS file: /cvsroot/rsync/proto.h,v
retrieving revision 1.148
diff -u -d -r1.148 proto.h
--- proto.h	11 Apr 2002 02:25:53 -0000	1.148
+++ proto.h	3 Jun 2002 03:54:11 -0000
@@ -31,7 +31,7 @@
 void sum_init(void);
 void sum_update(char *p, int len);
 void sum_end(char *sum);
-void _exit_cleanup(int code, const char *file, int line);
+void _exit_cleanup(int code, int nolog, const char *file, int line);
 void cleanup_disable(void);
 void cleanup_set(char *fnametmp, char *fname, struct file_struct *file,
 		 struct map_struct *buf, int fd1, int fd2);
@@ -221,13 +221,13 @@
 OFF_T do_lseek(int fd, OFF_T offset, int whence);
 void *do_mmap(void *start, int len, int prot, int flags, int fd, OFF_T offset);
 char *d_name(struct dirent *di);
-int main(int argc, char **argv);
 int main (int argc, char *argv[]);
 void set_compression(char *fname);
 void send_token(int f,int token,struct map_struct *buf,OFF_T offset,
 		int n,int toklen);
 int recv_token(int f,char **data);
 void see_token(char *data, int toklen);
+int main(int argc, char **argv);
 int main(int argc, char **argv);
 void add_uid(uid_t uid);
 void add_gid(gid_t gid);
Index: log.c
===================================================================
RCS file: /cvsroot/rsync/log.c,v
retrieving revision 1.61
diff -u -d -r1.61 log.c
--- log.c	8 Apr 2002 09:10:50 -0000	1.61
+++ log.c	3 Jun 2002 03:54:11 -0000
@@ -275,9 +275,9 @@
 			f = stdout;
 	} 
 
-	if (!f) exit_cleanup(RERR_MESSAGEIO);
+	if (!f) exit_cleanup_nolog(RERR_MESSAGEIO);
 
-	if (fwrite(buf, len, 1, f) != 1) exit_cleanup(RERR_MESSAGEIO);
+	if (fwrite(buf, len, 1, f) != 1) exit_cleanup_nolog(RERR_MESSAGEIO);
 
 	if (buf[len-1] == '\r' || buf[len-1] == '\n') fflush(f);
 }
Index: cleanup.c
===================================================================
RCS file: /cvsroot/rsync/cleanup.c,v
retrieving revision 1.14
diff -u -d -r1.14 cleanup.c
--- cleanup.c	9 Apr 2002 05:26:46 -0000	1.14
+++ cleanup.c	3 Jun 2002 03:54:11 -0000
@@ -53,10 +53,11 @@
 
 /**
  * Eventually calls exit(), passing @p code, therefore does not return.
+ * If nolog is nonzero, then don't attempt to log the error.
  *
  * @param code one of the RERR_* codes from errcode.h.
  **/
-void _exit_cleanup(int code, const char *file, int line)
+void _exit_cleanup(int code, int nolog, const char *file, int line)
 {
 	int ocode = code;
 	extern int keep_partial;
@@ -65,7 +66,7 @@
 	signal(SIGUSR1, SIG_IGN);
 	signal(SIGUSR2, SIG_IGN);
 
-	if (verbose > 3)
+	if (verbose > 3 && !nolog)
 		rprintf(FINFO,"_exit_cleanup(code=%d, file=%s, line=%d): entered\n", 
 			code, file, line);
 
@@ -102,9 +103,9 @@
 		code = RERR_PARTIAL;
 	}
 
-	if (code) log_exit(code, file, line);
+	if (code && !nolog) log_exit(code, file, line);
 
-	if (verbose > 2)
+	if (verbose > 2 && !nolog)
 		rprintf(FINFO,"_exit_cleanup(code=%d, file=%s, line=%d): about to call exit(%d)\n", 
 			ocode, file, line, code);
 

Reply via email to