Set debug to 100 (-d100) on both ends. The debug messages will give more details about what is going on ...
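The telnet test in the quoted report below separates two cases: a port where nothing listens ("Connection refused") and a port that accepts the TCP connection but is closed by the remote end before anything is exchanged ("Connection closed by foreign host"). Here is a minimal Python sketch of the same check, assuming Python 3 on Linux. The function name probe() and its timeout are illustrative, not part of Bacula. The working strace in the report shows that the client speaks first ("Hello *UserAgent* calling"), so a healthy Bacula port should stay connected and silent until the timeout.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP port the way the telnet test in the report does (hypothetical helper).

    "refused"        nothing is listening (telnet: Connection refused)
    "closed-by-peer" the connection is accepted, then the remote end hangs up
                     before sending anything (telnet: Connection closed by foreign host)
    "open"           the connection stays up, or the peer sends data
    Other errors (unreachable host, connect timeout) are not caught and propagate.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    with sock:
        try:
            # Wait for at most one byte; a Bacula daemon waits for the client's Hello.
            data = sock.recv(1)
        except socket.timeout:
            return "open"  # still connected and silent: what a healthy port looks like
        except ConnectionResetError:
            return "closed-by-peer"  # remote end answered with a TCP reset
        # b"" means the remote end closed the connection (FIN) without sending anything.
        return "closed-by-peer" if data == b"" else "open"
```

For example, given the telnet output in the report, probe("newton", 9101) would be expected to return "closed-by-peer" and probe("newton", 9100) "refused". Bacula's default ports are 9101 (Director), 9102 (File daemon) and 9103 (Storage daemon).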
On Monday 07 November 2005 21:20, Tássia Camões Araújo wrote:
> Hi all,
>
> About 10 days ago I wrote a message saying that I had some
> "Authorization Errors" going on, but I didn't have any idea of what
> could be causing the problem.
> At that time I didn't give enough information for someone to help me,
> thanks Arno Lehmann for trying :)
>
> The scenario is this: I have a Bacula Director running on server
> "maculele" and Bacula-fd running on almost 10 different servers. The
> problem only happens on server "newton". The File Daemon is refusing
> the Bacula Director connection, logging the following error:
>
> -------------------------------------------------------------------------
> 05-Nov 21:00 maculele-dir: Start Backup JobId 603,
> Job=Newton.2005-11-05_21.00.00
> 05-Nov 21:00 maculele-dir: Newton.2005-11-05_21.00.00 Fatal error: Unable
> to authenticate with File daemon. Possible causes:
> Passwords or names not the same or
> Maximum Concurrent Jobs exceeded on the FD or
> FD networking messed up (restart daemon).
> Please see http://www.bacula.org/html-manual/faq.html#AuthorizationErrors
> for help.
> ------------------------------------------------------------------------
>
> Well, this time I tried to collect some more information.
>
> 1) I compared the versions of almost all packages running on my servers
> and I didn't find anything different.
>
> 2) I installed the Director on the server where the problem was occurring,
> and then I realized that the problem affected every attempt to connect
> and authenticate with Bacula services. I couldn't even run the console,
> because it couldn't connect to the Director.
>
> 3) I tried to connect through telnet but it didn't work. In fact, all
> Bacula ports are open, but as soon as the connection is established it is
> reset by the remote host:
>
> -----------------------------------------------------------------
> [EMAIL PROTECTED]:~$ telnet newton 9101
> Trying 192.168.0.8...
> Connected to newton.dcc.ufba.br.
> Escape character is '^]'.
> Connection closed by foreign host.
> [EMAIL PROTECTED]:~$
> ------------------------------------------------------------------
>
> But that is different from trying to connect to a closed port:
>
> -------------------------------------------------------------------------------
> [EMAIL PROTECTED]:~$ telnet newton 9100
> Trying 192.168.0.8...
> telnet: Unable to connect to remote host: Connection refused
> [EMAIL PROTECTED]:~$
> ---------------------------------------------------------------------------------
>
> 4) Last attempt: I ran strace to monitor the system calls on both
> systems and try to figure out the differences.
>
> I ran the bconsole program on maculele (this server works fine), just to
> see the console trying to connect to the Director, and got the following
> result (I'm showing only the final lines):
>
> -------------------------------------------------------------------------------
> clone(child_stack=0x40ab1b48,
> flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
> parent_tidptr=0x40ab1bf8, {entry_number:6, base_addr:0x40ab1bb0,
> limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
> limit_in_pages:1, seg_not_present:0, useable:1},
> child_tidptr=0x40ab1bf8) = 12726
> futex(0x806bb50, FUTEX_WAKE, 1) = 1
> time(NULL) = 1131378562
> write(3, "\0\0\0\32", 4) = 4
> write(3, "Hello *UserAgent* calling\n", 26) = 26
> read(3, "\0\0\0009", 4) = 4
> read(3, "auth cram-md5 <1703362652.113137"..., 57) = 57
> write(3, "\0\0\0\27", 4) = 4
> write(3, "P4+EI119mW+PsS85V//BuC\0", 23) = 23
> select(4, [3], NULL, NULL, {180, 0}) = 1 (in [3], left {179, 960000})
> read(3, "\0\0\0\r", 4) = 4
> read(3, "1000 OK auth\n", 13) = 13
> gettimeofday({1131378562, 978543}, {120, 0}) = 0
> gettimeofday({1131378562, 978705}, {120, 0}) = 0
> gettimeofday({1131378562, 978863}, {120, 0}) = 0
> gettimeofday({1131378562, 979020}, {120, 0}) = 0
> gettimeofday({1131378562, 979156}, {120, 0}) = 0
> uname({sys="Linux", node="maculele", ...}) = 0
> time(NULL) = 1131378562
> write(3, "\0\0\0005", 4) = 4
> write(3, "auth cram-md5 <1032677570.113137"..., 53) = 53
> select(4, [3], NULL, NULL, {180, 0}) = 1 (in [3], left {179, 960000})
> read(3, "\0\0\0\27", 4) = 4
> read(3, "/61y86MgOH474FsNak+t2D\0", 23) = 23
> write(3, "\0\0\0\r", 4) = 4
> write(3, "1000 OK auth\n", 13) = 13
> read(3, "\0\0\0009", 4) = 4
> read(3, "1000 OK: maculele-dir Version: 1"..., 57) = 57
> time(NULL) = 1131378563
> futex(0x806bb50, FUTEX_WAKE, 1) = 1
> futex(0x806bb40, FUTEX_WAKE, 1) = 1
> futex(0x808ec84, FUTEX_WAKE, 1) = 1
> write(1, "1000 OK: maculele-dir Version: 1"..., 571000 OK: maculele-dir Version: 1.36.2 (28 February 2005)
> ) = 57
> write(1, "Enter a period to cancel a comma"..., 36Enter a period to cancel a command.
> ) = 36
> open("/root/.bconsolerc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
> ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
> write(1, "*", 1*) = 1
> select(1, [0], NULL, NULL, {30, 0}
> -------------------------------------------------------------------------------
>
> Then I did the same with newton (the server with the problem):
>
> -------------------------------------------------------------------------------
> clone(child_stack=0x40abbb48,
> flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
> parent_tidptr=0x40abbbf8, {entry_number:6, base_addr:0x40abbbb0,
> limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
> limit_in_pages:1, seg_not_present:0, useable:1},
> child_tidptr=0x40abbbf8) = 12967
> futex(0x806bb50, FUTEX_WAKE, 1) = 1
> time(NULL) = 1131383077
> write(3, "\0\0\0\32", 4) = 4
> write(3, "Hello *UserAgent* calling\n", 26) = -1 EPIPE (Broken pipe)
> --- SIGPIPE (Broken pipe) @ 0 (0) ---
> time(NULL) = 1131383077
> open("/etc/localtime", O_RDONLY) = 4
> fstat64(4, {st_mode=S_IFREG|0644, st_size=286, ...}) = 0
> mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40abc000
> read(4, "TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\3\0"..., 131072) = 286
> close(4) = 0
> munmap(0x40abc000, 131072) = 0
> time([1131383077]) = 1131383077
> rt_sigaction(SIGPIPE, {0x40232a70, [], 0}, {SIG_IGN}, 8) = 0
> socket(PF_FILE, SOCK_DGRAM, 0) = 4
> fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
> connect(4, {sa_family=AF_FILE, path="/dev/log"}, 16) = 0
> send(4, "<27>Nov 7 14:04:37 bacula-conso"..., 142, 0) = 142
> rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
> write(1, "07-Nov 14:04 bconsole: Error: b"..., 11907-Nov 14:04 bconsole: Error: bnet.c:406 Write error sending 26 bytes to Director daemon:newton:9101: ERR=Broken pipe
> ) = 119
> nanosleep({5, 0}, NULL) = 0
> time(NULL) = 1131383082
> futex(0x806bb50, FUTEX_WAKE, 1) = 1
> futex(0x806bb40, FUTEX_WAKE, 1) = 1
> futex(0x808ec84, FUTEX_WAKE, 1) = 1
> write(1, "Director authorization problem.\n"..., 156Director authorization problem.
> Most likely the passwords do not agree.
> Please see http://www.bacula.org/html-manual/faq.html#AuthorizationErrors
> for help.
> ) = 156
> write(2, "ERR=", 4ERR=) = 4
> ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
> ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon echo ...}) = 0
> ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
> munmap(0x402a8000, 4096) = 0
> exit_group(1)
> -------------------------------------------------------------------------------
>
> I think the problem is here:
> When it works: "write(3, "Hello *UserAgent* calling\n", 26) = 26"
> When it doesn't work: "write(3, "Hello *UserAgent* calling\n", 26) = -1 EPIPE (Broken pipe)"
>
> And when the problem occurs, the password information isn't even sent!
> I think the authentication happens at this line (in the first example):
> "read(3, "auth cram-md5 <1703362652.113137"..., 57) = 57"
> I can't see anything like that in the second example.
>
> Is there any log that would help me track down the cause of this
> problem?
> Could it be a problem with some library whose name I don't even know?
> Please, help me....
>
> Tássia.
>
>
> -------------------------------------------------------
> SF.Net email is sponsored by:
> Tame your development challenges with Apache's Geronimo App Server.
> Download it for free - -and be entered to win a 42" plasma tv or your very
> own Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users

--
Best regards,

Kern

  (">
  /\
  V_V
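The two strace listings in the quoted report also show how bconsole frames its messages. Each payload is preceded by a 4-byte big-endian length: "\0\0\0\32" (octal 32, i.e. 26) comes before the 26-byte "Hello *UserAgent* calling\n", "\0\0\0\r" (13) comes before "1000 OK auth\n", and "\0\0\0009" (the byte '9', i.e. 57) comes before the 57-byte cram-md5 challenge. On newton the 4-byte header write succeeded and the 26-byte payload write failed with EPIPE. That is consistent with the remote end having already closed the connection, so the cram-md5 challenge never arrived. Below is a minimal Python sketch of that framing, inferred only from the traces. frame() and unframe() are hypothetical helper names. Rejecting negative lengths as non-data signals is an assumption about Bacula's network layer, not something the traces show.

```python
import struct
from typing import Tuple

# 4-byte signed length in network (big-endian) byte order, as seen in the strace output.
HEADER = struct.Struct(">i")


def frame(payload: bytes) -> bytes:
    """Prefix payload with its 4-byte length header (hypothetical helper)."""
    return HEADER.pack(len(payload)) + payload


def unframe(buf: bytes) -> Tuple[bytes, bytes]:
    """Split one length-prefixed message off the front of buf; return (message, rest)."""
    if len(buf) < HEADER.size:
        raise ValueError("need at least 4 bytes for the length header")
    (length,) = HEADER.unpack_from(buf)
    if length < 0:
        # Assumption: negative lengths carry signals rather than data.
        raise ValueError(f"length {length} is a signal, not a data message")
    end = HEADER.size + length
    if len(buf) < end:
        raise ValueError("incomplete message")
    return buf[HEADER.size:end], buf[end:]


if __name__ == "__main__":
    hello = b"Hello *UserAgent* calling\n"
    wire = frame(hello) + frame(b"1000 OK auth\n")
    first, rest = unframe(wire)
    second, rest = unframe(rest)
    print(frame(hello)[:4], first, second, rest)
```

Writing the header and the payload as two separate write() calls, as bconsole does in the traces, is why a connection that the peer has already closed can make the first write succeed and the second fail with EPIPE.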