-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Hoogendyk wrote:
> Attila Fülöp wrote:
>> Kern Sibbald wrote:
>>> On Monday 30 July 2007 17:29, Ryan Novosielski wrote:
>>>> Kern Sibbald wrote:
>>>>> On Sunday 29 July 2007 19:28, Ryan Novosielski wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Ever since I added the TapeAlert/smartmonctl command to my tape
>>>>>> drive, it appears as if I get a fairly regular crash of that
>>>>>> bacula-sd. I know there is a case where Bacula and the utility
>>>>>> can go for the tape drive at the same time and cause problems,
>>>>>> but I don't think Bacula should go KABOOM when this happens.
>>>>>
>>>>> The traceback, unfortunately, doesn't demangle the C++ subroutine
>>>>> names nor provide source line numbers, but the best that I can
>>>>> tell is that the heap has been corrupted, Bacula detects it, then
>>>>> does a Kaboom (self-inflicted seg fault).
>>>>
>>>> I'm not too good with development tools -- is there a reason for
>>>> this that would be on my end (stripped binaries or something like
>>>> that)?
>>>
>>> Yes, if they are stripped, that would at least explain the lack of
>>> line numbers and possibly the fact that the names were not
>>> demangled.
>>>
>>>>> Are you by any chance pointing the tapealert/smartmonctl at the
>>>>> tape drive device rather than at the scsi control device? If you
>>>>> are, I am not surprised, and you should remove it as two different
>>>>> programs cannot properly exist using the same tape device.
>>>>
>>>> Yes.
>>>
>>> Well, that is almost surely the cause of the problem.
>>>
>>>> And I actually will remove that, but near as I can tell, Solaris
>>>> does not have a way of addressing the tape drive as two different
>>>> devices.
>>>
>>> Well, Solaris has 20 ways to do everything connected with devices.
>>> In any case, it is not a question of addressing the tape drive as
>>> two different devices.
>>> One addresses the tape drive through the standard tape driver,
>>> which is the /dev/rmt/xxxx stuff.
>>>
>>> The other is a SCSI pass through driver that allows SCSI commands
>>> to go directly to the SCSI controller, and if I remember right
>>> they are addressed with /dev/sg/xxxx. That you will need to find
>>> out from someone else on the list or from your manuals -- I no
>>> longer have a Solaris.
>>>
>>>> From what I've read, the reason for this is that Solaris
>>>> supposedly has the ability to do two actions on the device at
>>>> once -- I'm not sure where I read that though in order to confirm
>>>> it. I went looking for information about using the 'sgen' Solaris
>>>> driver in order to instead use the control interface, but it
>>>> appears as if sgen is only used to pick up devices that don't
>>>> already have a type elsewhere; in other words, I could stop using
>>>> 'st' and start using 'sgen', but that really wouldn't get me
>>>> anywhere. Perhaps someone else will read about this and give me a
>>>> pointer.
>>>
>>> I'll leave this to others.
>>
>> There is no such thing in Solaris. The st driver supports "scsi
>> pass through" via the uscsi(7I) interface, and mtx (and therefore
>> tapeinfo) does indeed use this interface on Solaris.
>>
>> from mtx sources:
>>
>> mtx.h:/* the 'uscsi' interface, as used on Solaris: */
>> mtx.h:#include <sys/scsi/impl/uscsi.h>
>>
>> man uscsi:
>>
>> Ioctl Requests                                        uscsi(7I)
>>
>> NAME
>>      uscsi - user SCSI command interface
>>
>> SYNOPSIS
>>      #include <sys/scsi/impl/uscsi.h>
>>
>>      ioctl(int fildes, int request, struct uscsi_cmd *cmd);
>>
>> DESCRIPTION
>>      The uscsi command is very powerful and somewhat dangerous;
>>      therefore it has some permission restrictions. See WARNINGS
>>      for more details.
>>
>>      Drivers supporting this ioctl(2) provide a general interface
>>      allowing user-level applications to cause individual SCSI
>>      commands to be directed to a particular SCSI or ATAPI device
>>      under control of that driver.
>>      The uscsi command is supported by the sd driver for SCSI
>>      disks and ATAPI CD-ROM drives, and by the st driver for SCSI
>>      tape drives. uscsi may also be supported by other device
>>      drivers; see the specific device driver manual page for
>>      complete information.
>>
>> ........
>>
>> The uscsi(7I) interface is only accessible by root. The sgen(7D)
>> driver exports the uscsi(7I) interface to user (non root)
>> processes. But the sgen(7D) man page clearly states:
>>
>>      In general, the uscsi(7I) interface exported by sd(7D) or
>>      st(7D) should be used to gain access to direct access and
>>      sequential devices.
>>
>> Sorry, but I can't say more since I have no Solaris box with an
>> attached tape right now. So I can't test any interaction between
>> bacula and tapeinfo.
>>
>>>>> If you are pointing it at the scsi control device, I would be
>>>>> interested to see what the normal output of the command gives
>>>>> back as there may be a possible buffer overrun though that really
>>>>> should not happen.
>>>>>
>>>>> In any case, I recommend that you remove the tape alert for a
>>>>> time and see if that eliminates the problem.
>>>>
>>>> I suspect it will, as it only showed up when I added it, near as I
>>>> can tell. A KABOOM seems like something that ought not happen
>>>> either way, though, although I suppose if something is corrupting
>>>> buffers, it can't be avoided. Curious, though, as the tapealert
>>>> often returns "Device busy" which would seem to mean that there's
>>>> no chance that the other thing using the device would actually
>>>> have an error.
>>>
>>> Well, when Bacula calls the tapealert command, it releases the
>>> drive, so it doesn't get a busy.
>>> Many OSes such as Linux and Solaris permit addressing the SCSI
>>> controller directly through the normal tape driver, but this is a
>>> very bad idea (it is apparently what you are doing), and from
>>> everything users have said, it causes lots of problems such as
>>> resetting the SCSI controller. If you do that, all bets are off
>>> concerning Bacula correctly interfacing through the normal tape
>>> driver. If you would like to track it down, I think it would be
>>> good to eliminate the KABOOM if it is possible (it may well not be
>>> possible). However, you *are* apparently doing something very
>>> non-standard, and that is where things go wrong.
>>>
>>>>>> This does not happen every day, but every once in a while... it
>>>>>> occurs at the end of a set of concurrent backups to tape -- all
>>>>>> incrementals, 7 in total. By the time my catalog backup runs 2
>>>>>> hours later, the -sd has died and there is no connection made.
>>>>>>
>>>>>> The host machine is running Solaris 9, and the binaries are from
>>>>>> BlastWave (currently version 2.0.3 with 2.0.2 clients, but until
>>>>>> the day before yesterday, the admin/server machine was running
>>>>>> 2.0.2 with identical results). I have not tried 2.1.x, but I
>>>>>> would not be allowed to run a production schedule on a beta --
>>>>>> perhaps an exact copy on the same machine but writing to disk
>>>>>> might yield the same results, but I suspect that this is caused
>>>>>> by the TapeAlert, so maybe not.
>>>>>
>>>>> For a problem with tape alert, it is very unlikely that upgrading
>>>>> to 2.1.x will help.
>>>>>
>>>>>> Thanks for any insights you can provide -- I'd be happy to
>>>>>> report a bug if it is needed.
>>>>>
>>>>> Until I see your response and think about it, I don't think this
>>>>> is worth a bug report, at least not just yet.
>>>>
>>>> OK, that is fine.
>>>> If there's any easy way to try to get more information out of
>>>> this thing, let me know. I actually had a fair amount of trouble
>>>> getting this much in the first place -- if you run your bacula-dir
>>>> as a non-root user as one really should, it then cannot run proper
>>>> traces against daemons that run as root. I had to involve sudo;
>>>> originally, I had no idea that this even ran by itself until I saw
>>>> a number of empty traceback e-mails in root's box.
>>>
>>> Yes, well, system security measures often make debugging more
>>> difficult. Unfortunately, there is not much I can do about that
>>> except to say either you need to understand the finer points of
>>> your OS's system security or run as root when attempting to debug
>>> these kinds of problems (I *never* debug as root here though).
>>>
>>> My recommendation is to stop using tapealert until you figure out
>>> what the pass through SCSI driver is, then you could try using it.
>>> If you can figure out how to run the debugger manually and dig
>>> deeper into the problem, that would be interesting too, but it is
>>> likely to take a lot of time ...
>>>
>>> Regards,
>>>
>>> Kern
>
> Sorry, I'm tapping into this message late. Wasn't paying much
> attention until my eyes caught Solaris, st, sgen, etc. Now I'm not
> clear on the original setup -- seems it's coming from Ryan?

That's correct.

> What kind of tape drive/library do you have and how have you
> configured it?

No library, just two DDS class drives (DAT72 and DDS4).

> I have a tape library that I have configured on Solaris 9 using the
> st.conf for the tape drive and sgen.conf for the library functions.
> That gave me /dev/rmt/1 for the tape drive (I already had /dev/rmt/0
> for the built in dds/3), and /dev/scsi/changer/c7t0d0 for the
> library. From there, I got mt and mtx working, and then plugging into
> backup software was elementary. I don't know if I can be of specific
> help, but maybe some of the details are transferable even if we have
> different tape hardware.

Trouble is, Solaris only wants to see one driver for the tape drive,
it appears. 'st' picks up my tape drives, and no other driver is
needed. 'st' apparently utilizes uscsi to pass the direct stuff
through to the tape drive. Trouble is, I get "Device busy" for all but
the last job that runs (these jobs are running concurrently in a
spooled situation). That by itself is unfortunate, but I guess if I
only refer to the last-run job, I still get the information at least.
Sometimes though, as a result of this interaction somehow, there will
be a Bacula KABOOM that occurs just after the last run job. That was
the original reason for this thread.

However, I really would like to be able to check that tape drive after
every job if there's some way. Do the SCSI generic drivers on Linux
have the ability to read the tape while the tape is being written to,
or does the same trouble exist?

> I think another spawn of this thread came up with a specific point in
> the code that could be a bug, so I'm not even sure if help is still
> needed on the tape library configuration. Let me know.

Maybe, maybe not. :)

- --
 ---- _ _ _ _ ___ _ _ _
 |Y#| | | |\/| | \ |\ | | |Ryan Novosielski - Systems Programmer II
 |$&| |__| | | |__/ | \| _| |[EMAIL PROTECTED] - 973/972.0922 (2-0922)
 \__/ Univ. of Med. and Dent.|IST/AST - NJMS Medical Science Bldg - C630
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGrj1jmb+gadEcsb4RAh6EAJ0Rj4F/RDh7M7yJ1aob6RhsnxNOVwCeOCEG
VUPJHnvhi6dzVYhkiB5a0+Q=
=4Y1C
-----END PGP SIGNATURE-----

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users