Kern Sibbald wrote:
> On Monday 30 July 2007 17:29, Ryan Novosielski wrote:
>> Kern Sibbald wrote:
>>> On Sunday 29 July 2007 19:28, Ryan Novosielski wrote:
>>>> Hi all,
>>>>
>>>> Ever since I added the TapeAlert/smartmonctl command to my tape drive,
>>>> it appears as if I get a fairly regular crash of that bacula-sd. I know
>>>> there is a case where Bacula and the utility can go for the tape drive
>>>> at the same time and cause problems, but I don't think Bacula should go
>>>> KABOOM when this happens.
>>> The traceback, unfortunately, doesn't demangle the C++ subroutine names nor
>>> provide source line numbers, but the best that I can tell is that the heap
>>> has been corrupted, Bacula detects it, then does a Kaboom (self-inflicted seg
>>> fault).
>> I'm not too good with development tools -- is there a reason for this
>> that is on my end (stripped binaries or something like that)?
>
> Yes, if they are stripped, that would at least explain the lack of line
> numbers and possibly the fact that the names were not demangled.
>
>>> Are you by any chance pointing the tapealert/smartmonctl at the tape drive
>>> device rather than at the scsi control device? If you are, I am not
>>> surprised, and you should remove it as two different programs cannot properly
>>> exist using the same tape device.
>> Yes.
>
> Well, that is almost surely the cause of the problem.
>
>> And I actually will remove that, but near as I can tell, Solaris
>> does not have a way of addressing the tape drive as two different
>> devices.
>
> Well, Solaris has 20 ways to do everything connected with devices. In any
> case, it is not a question of addressing the tape drive as two different
> devices. One addresses the tape drive through the standard tape driver,
> which is the /dev/rmt/xxxx stuff.
>
> The other is a SCSI pass through driver that allows SCSI commands to go
> directly to the SCSI controller, and if I remember right they are addressed
> with /dev/sg/xxxx.
> That you will need to find out from someone else on the
> list or from your manuals -- I no longer have a Solaris box.
>
>> From what I've read, the reason for this is that Solaris
>> supposedly has the ability to do two actions on the device at once --
>> I'm not sure where I read that though in order to confirm it. I went
>> looking for information about using the 'sgen' Solaris driver in order
>> to instead use the control interface, but it appears as if sgen is only
>> used to pick up devices that don't already have a type elsewhere; in
>> other words, I could stop using 'st' and start using 'sgen', but that
>> really wouldn't get me anywhere. Perhaps someone else will read about
>> this and give me a pointer.
>
> I'll leave this to others.
There is no such thing in Solaris.

The st driver supports "scsi pass through" via the uscsi(7I) interface,
and mtx (and therefore tapeinfo) does indeed use this interface on Solaris.

From the mtx sources:

mtx.h:/* the 'uscsi' interface, as used on Solaris: */
mtx.h:#include <sys/scsi/impl/uscsi.h>

man uscsi:

Ioctl Requests                                          uscsi(7I)

NAME
     uscsi - user SCSI command interface

SYNOPSIS
     #include <sys/scsi/impl/uscsi.h>

     ioctl(int fildes, int request, struct uscsi_cmd *cmd);

DESCRIPTION
     The uscsi command is very powerful and somewhat dangerous;
     therefore it has some permission restrictions. See WARNINGS
     for more details.

     Drivers supporting this ioctl(2) provide a general interface
     allowing user-level applications to cause individual SCSI
     commands to be directed to a particular SCSI or ATAPI device
     under control of that driver. The uscsi command is supported
     by the sd driver for SCSI disks and ATAPI CD-ROM drives, and
     by the st driver for SCSI tape drives. uscsi may also be
     supported by other device drivers; see the specific device
     driver manual page for complete information.
........

The uscsi(7I) interface is only accessible by root. The sgen(7D) driver
exports the uscsi(7I) interface to user (non-root) processes. But the
sgen(7D) man page clearly states:

     In general, the uscsi(7I) interface exported by sd(7D) or
     st(7D) should be used to gain access to direct access and
     sequential devices.

Sorry, but I can't say more since I have no Solaris box with an attached
tape right now, so I can't test any interaction between Bacula and tapeinfo.

>>> If you are pointing it at the scsi control device, I would be interested to
>>> see what the normal output of the command gives back as there may be a
>>> possible buffer overrun though that really should not happen.
>>>
>>> In any case, I recommend that you remove the tape alert for a time and see if
>>> that eliminates the problem.
>> I suspect it will, as it only showed up when I added it, near as I can
>> tell.
>> A KABOOM seems like something that ought not happen either way,
>> though, although I suppose if something is corrupting buffers, it can't
>> be avoided. Curious, though, as the tapealert often returns "Device
>> busy" which would seem to mean that there's no chance that the other
>> thing using the device would actually have an error.
>
> Well, when Bacula calls the tapealert command, it releases the drive, so it
> doesn't get a busy.
>
> Many OSes such as Linux and Solaris permit addressing the SCSI controller
> directly through the normal tape driver, but this is a very bad idea (it is
> apparently what you are doing), and from everything users have said, it
> causes lots of problems such as resetting the SCSI controller. If you do
> that, all bets are off concerning Bacula correctly interfacing through the
> normal tape driver.
> If you would like to track it down, I think it would be good to eliminate the
> KABOOM if it is possible (it may well not be possible). However, you *are*
> apparently doing something very non-standard, and that is where things go
> wrong.
>
>>>> This does not happen every day, but every once in a while... it occurs at
>>>> the end of a set of concurrent backups to tape -- all incrementals, 7 in
>>>> total. By the time my catalog backup runs 2 hours later, the -sd has
>>>> died and there is no connection made.
>>>>
>>>> The host machine is running Solaris 9, and the binaries are from
>>>> BlastWave (currently version 2.0.3 with 2.0.2 clients, but until the day
>>>> before yesterday, the admin/server machine was running 2.0.2 with
>>>> identical results). I have not tried 2.1.x, but I would not be allowed
>>>> to run a production schedule on a beta -- perhaps an exact copy on the
>>>> same machine but writing to disk might yield the same results, but I
>>>> suspect that this is caused by the TapeAlert, so maybe not.
>>> For a problem with tape alert, it is very unlikely that upgrading to 2.1.x
>>> will help.
>>>
>>>> Thanks for any insights you can provide -- I'd be happy to report a bug
>>>> if it is needed.
>>> Until I see your response and think about it, I don't think this is worth a
>>> bug report, at least not just yet.
>> OK, that is fine. If there's any easy way to try to get more information
>> out of this thing, let me know. I actually had a fair amount of trouble
>> getting this much in the first place -- if you run your bacula-dir as a
>> non-root user as one really should, it then cannot run proper traces
>> against daemons that run as root. I had to involve sudo; originally, I
>> had no idea that this even ran by itself until I saw a number of empty
>> traceback e-mails in root's box.
>>
>
> Yes, well, system security measures often make debugging more difficult.
> Unfortunately, there is not much I can do about that except to say either you
> need to understand the finer points of your OS's system security or run as
> root when attempting to debug these kinds of problems (I *never* debug as
> root here though).
>
> My recommendation is to stop using tapealert until you figure out what the
> pass through SCSI driver is, then you could try using it. If you can figure
> out how to run the debugger manually and dig deeper into the problem, that
> would be interesting too, but it is likely to take a lot of time ...
>
> Regards,
>
> Kern
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
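The uscsi(7I) pass-through discussed in this thread is how a tool such as tapeinfo can send raw SCSI commands to a drive through the st driver. As a rough, untested sketch (not taken from mtx or Bacula), the C below builds a LOG SENSE request for the TapeAlert log page (page 0x2E) and decodes the returned page into the list of set TapeAlert flags. On Solaris only, it also submits the request with the USCSICMD ioctl on an already-open st file descriptor. The helper names (build_tapealert_cdb, decode_tapealert, uscsi_log_sense) are hypothetical. The page layout assumed is the standard TapeAlert one: 5-byte parameters whose single value byte carries the flag in bit 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_SENSE_OPCODE    0x4D
#define TAPEALERT_PAGE      0x2E  /* TapeAlert log page code */
#define TAPEALERT_MAX_FLAGS 64    /* flags are numbered 1..64 */

/* Hypothetical helper: fill a 10-byte LOG SENSE CDB requesting the
 * TapeAlert page with PC = 01b (current cumulative values). */
static void build_tapealert_cdb(uint8_t cdb[10], uint16_t alloc_len)
{
    memset(cdb, 0, 10);
    cdb[0] = LOG_SENSE_OPCODE;
    cdb[2] = (uint8_t)(0x40 | TAPEALERT_PAGE); /* PC in bits 7-6, page code in 5-0 */
    cdb[7] = (uint8_t)(alloc_len >> 8);        /* allocation length, big-endian */
    cdb[8] = (uint8_t)(alloc_len & 0xFF);
}

/* Hypothetical helper: walk the returned log page. Each parameter is a
 * 2-byte parameter code (the flag number), a control byte, a length byte
 * and `length` value bytes; for TapeAlert the flag is bit 0 of the first
 * value byte. Stores the numbers of the set flags in `active` and returns
 * how many were found, or -1 if this is not a TapeAlert page. */
static int decode_tapealert(const uint8_t *page, size_t len,
                            int active[TAPEALERT_MAX_FLAGS])
{
    assert(page != NULL && active != NULL);
    if (len < 4 || (page[0] & 0x3F) != TAPEALERT_PAGE)
        return -1;

    size_t end = 4 + (((size_t)page[2] << 8) | page[3]);
    if (end > len)
        end = len;                        /* tolerate a short transfer */

    int n = 0;
    size_t off = 4;
    while (off + 4 <= end) {
        unsigned code = ((unsigned)page[off] << 8) | page[off + 1];
        size_t plen = page[off + 3];
        if (off + 4 + plen > end)
            break;                        /* truncated parameter */
        if (plen >= 1 && (page[off + 4] & 0x01) &&
            code >= 1 && code <= TAPEALERT_MAX_FLAGS &&
            n < TAPEALERT_MAX_FLAGS)
            active[n++] = (int)code;
        off += 4 + plen;
    }
    return n;
}

#ifdef __sun
/* Solaris only: send the request through the st driver's uscsi(7I)
 * interface. `fd` is an open /dev/rmt/... descriptor; root is required. */
#include <sys/types.h>
#include <sys/scsi/impl/uscsi.h>
#include <stropts.h>
#include <unistd.h>

static int uscsi_log_sense(int fd, uint8_t *buf, uint16_t buflen)
{
    uint8_t cdb[10];
    struct uscsi_cmd cmd;

    build_tapealert_cdb(cdb, buflen);
    memset(&cmd, 0, sizeof cmd);
    cmd.uscsi_flags   = USCSI_READ | USCSI_SILENT;  /* data flows to host */
    cmd.uscsi_timeout = 30;                         /* seconds */
    cmd.uscsi_cdb     = (caddr_t)cdb;
    cmd.uscsi_cdblen  = sizeof cdb;
    cmd.uscsi_bufaddr = (caddr_t)buf;
    cmd.uscsi_buflen  = buflen;
    return ioctl(fd, USCSICMD, &cmd);               /* -1 and errno on failure */
}
#endif
```

In line with Kern's advice, a query like this should only be issued once bacula-sd has released the drive. Sending it through the same st device node that Bacula is actively using is the non-standard setup the thread identifies as the likely cause of the crash.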