Attila Fülöp wrote:
> Kern Sibbald wrote:
>> On Monday 30 July 2007 17:29, Ryan Novosielski wrote:
>>> Kern Sibbald wrote:
>>>> On Sunday 29 July 2007 19:28, Ryan Novosielski wrote:
>>>>> Hi all,
>>>>>
>>>>> Ever since I added the TapeAlert/smartmonctl command to my tape drive,
>>>>> it appears as if I get a fairly regular crash of that bacula-sd. I know
>>>>> there is a case where Bacula and the utility can go for the tape drive
>>>>> at the same time and cause problems, but I don't think Bacula should go
>>>>> KABOOM when this happens.
>>>>
>>>> The traceback, unfortunately, doesn't demangle the C++ subroutine names
>>>> nor provide source line numbers, but the best that I can tell is that
>>>> the heap has been corrupted, Bacula detects it, then does a Kaboom
>>>> (self inflicted seg fault).
>>>
>>> I'm not too good with development tools -- is there a reason this would
>>> be that is on my end (stripped binaries or something like that)?
>>
>> Yes, if they are stripped, that would at least explain the lack of line
>> numbers and possibly the fact that the names were not demangled.
>>
>>>> Are you by any chance pointing the tapealert/smartmonctl at the tape
>>>> drive device rather than at the scsi control device? If you are, I am
>>>> not surprised, and you should remove it as two different programs
>>>> cannot properly exist using the same tape device.
>>>
>>> Yes.
>>
>> Well, that is almost surely the cause of the problem.
>>
>>> And I actually will remove that, but near as I can tell, Solaris
>>> does not have a way of addressing the tape drive as two different
>>> devices.
>>
>> Well, Solaris has 20 ways to do everything connected with devices. In any
>> case, it is not a question of addressing the tape drive as two different
>> devices. One addresses the tape drive through the standard tape driver,
>> which is the /dev/rmt/xxxx stuff.
>>
>> The other is a SCSI pass through driver that allows SCSI commands to go
>> directly to the SCSI controller, and if I remember right they are
>> addressed with /dev/sg/xxxx. That you will need to find out from someone
>> else on the list or from your manuals -- I no longer have a Solaris
>> system.
>>
>>> From what I've read, the reason for this is that Solaris
>>> supposedly has the ability to do two actions on the device at once --
>>> I'm not sure where I read that though in order to confirm it. I went
>>> looking for information about using the 'sgen' Solaris driver in order
>>> to instead use the control interface, but it appears as if sgen is only
>>> used to pick up devices that don't already have a type elsewhere; in
>>> other words, I could stop using 'st' and start using 'sgen', but that
>>> really wouldn't get me anywhere. Perhaps someone else will read about
>>> this and give me a pointer.
>>
>> I'll leave this to others.
>
> There is no such thing in Solaris. The st driver supports
> "scsi pass through" via the uscsi(7I) interface, and mtx (and
> therefore tapeinfo) does indeed use this interface on Solaris.
>
> from mtx sources:
>
> mtx.h:/* the 'uscsi' interface, as used on Solaris: */
> mtx.h:#include <sys/scsi/impl/uscsi.h>
>
>
> man uscsi:
>
> Ioctl Requests                                          uscsi(7I)
>
> NAME
>      uscsi - user SCSI command interface
>
> SYNOPSIS
>      #include <sys/scsi/impl/uscsi.h>
>
>      ioctl(int fildes, int request, struct uscsi_cmd *cmd);
>
> DESCRIPTION
>      The uscsi command is very powerful and somewhat dangerous;
>      therefore it has some permission restrictions. See WARNINGS
>      for more details.
>
>      Drivers supporting this ioctl(2) provide a general interface
>      allowing user-level applications to cause individual SCSI
>      commands to be directed to a particular SCSI or ATAPI device
>      under control of that driver. The uscsi command is supported
>      by the sd driver for SCSI disks and ATAPI CD-ROM drives, and
>      by the st driver for SCSI tape drives. uscsi may also be
>      supported by other device drivers; see the specific device
>      driver manual page for complete information.
>
> ........
>
> The uscsi(7I) interface is only accessible by root. The sgen(7D)
> driver exports the uscsi(7I) interface to user (non root)
> processes. But the sgen(7D) man page clearly states:
>
>      In general, the uscsi(7I) interface exported by sd(7D) or
>      st(7D) should be used to gain access to direct access and
>      sequential devices.
>
>
> Sorry, but I can't say more since I have no Solaris box with an
> attached tape right now. So I can't test any interaction between
> bacula and tapeinfo.
>
>>>> If you are pointing it at the scsi control device, I would be
>>>> interested to see what the normal output of the command gives back as
>>>> there may be a possible buffer overrun though that really should not
>>>> happen.
>>>>
>>>> In any case, I recommend that you remove the tape alert for a time and
>>>> see if that eliminates the problem.
>>>
>>> I suspect it will, as it only showed up when I added it, near as I can
>>> tell. A KABOOM seems like something that ought not happen either way,
>>> though, although I suppose if something is corrupting buffers, it can't
>>> be avoided. Curious, though, as the tapealert often returns "Device
>>> busy" which would seem to mean that there's no chance that the other
>>> thing using the device would actually have an error.
>>
>> Well, when Bacula calls the tapealert command, it releases the drive, so
>> it doesn't get a busy.
>>
>> Many OSes such as Linux and Solaris permit addressing the SCSI controller
>> directly through the normal tape driver, but this is a very bad idea (it
>> is apparently what you are doing), and from everything users have said,
>> it causes lots of problems such as resetting the SCSI controller. If you
>> do that, all bets are off concerning Bacula correctly interfacing through
>> the normal tape driver.
>>
>> If you would like to track it down, I think it would be good to eliminate
>> the KABOOM if it is possible (it may well not be possible). However, you
>> *are* apparently doing something very non-standard, and that is where
>> things go wrong.
>>
>>>>> This does not happen every day, but every once in a while... it occurs
>>>>> at the end of a set of concurrent backups to tape -- all incrementals,
>>>>> 7 in total. By the time my catalog backup runs 2 hours later, the -sd
>>>>> has died and there is no connection made.
>>>>>
>>>>> The host machine is running Solaris 9, and the binaries are from
>>>>> BlastWave (currently version 2.0.3 with 2.0.2 clients, but until the
>>>>> day before yesterday, the admin/server machine was running 2.0.2 with
>>>>> identical results). I have not tried 2.1.x, but I would not be allowed
>>>>> to run a production schedule on a beta -- perhaps an exact copy on the
>>>>> same machine but writing to disk might yield the same results, but I
>>>>> suspect that this is caused by the TapeAlert, so maybe not.
>>>>
>>>> For a problem with tape alert, it is very unlikely that upgrading to
>>>> 2.1.x will help.
>>>>
>>>>> Thanks for any insights you can provide -- I'd be happy to report a bug
>>>>> if it is needed.
>>>>
>>>> Until I see your response and think about it, I don't think this is
>>>> worth a bug report, at least not just yet.
>>>
>>> OK, that is fine. If there's any easy way to try to get more information
>>> out of this thing, let me know. I actually had a fair amount of trouble
>>> getting this much in the first place -- if you run your bacula-dir as a
>>> non-root user as one really should, it then cannot run proper traces
>>> against daemons that run as root. I had to involve sudo; originally, I
>>> had no idea that this even ran by itself until I saw a number of empty
>>> traceback e-mails in root's box.
>>
>> Yes, well, system security measures often make debugging more difficult.
>> Unfortunately, there is not much I can do about that except to say either
>> you need to understand the finer points of your OS's system security or
>> run as root when attempting to debug these kinds of problems (I *never*
>> debug as root here though).
>>
>> My recommendation is to stop using tapealert until you figure out what
>> the pass through SCSI driver is, then you could try using it. If you can
>> figure out how to run the debugger manually and dig deeper into the
>> problem, that would be interesting too, but it is likely to take a lot of
>> time ...
>>
>> Regards,
>>
>> Kern

Sorry, I'm tapping into this message late. Wasn't paying much attention
until my eyes caught Solaris, st, sgen, etc.

Now I'm not clear on the original setup -- seems it's coming from Ryan?
What kind of tape drive/library do you have and how have you configured it?

I have a tape library that I have configured on Solaris 9 using the st.conf
for the tape drive and sgen.conf for the library functions. That gave me
/dev/rmt/1 for the tape drive (I already had /dev/rmt/0 for the built in
dds/3), and /dev/scsi/changer/c7t0d0 for the library. From there, I got mt
and mtx working, and then plugging into backup software was elementary.

I don't know if I can be of specific help, but maybe some of the details
are transferable even if we have different tape hardware. I think another
spawn of this thread came up with a specific point in the code that could
be a bug, so I'm not even sure if help is still needed on the tape library
configuration. Let me know.


---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<[EMAIL PROTECTED]>

---------------

Erdös 4

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
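
For anyone following the pass-through discussion in the quoted thread: as far as I understand it, a TapeAlert check comes down to sending a SCSI LOG SENSE command for log page 0x2E through the pass-through interface (uscsi(7I) on Solaris, per the mtx sources quoted above) and decoding the flags that come back. The sketch below covers only the portable part, building the command block and decoding a reply buffer. It is not the mtx/tapeinfo code and not anything Bacula does; the function names are hypothetical, the ioctl call itself is left out, and the page layout (4-byte header, then one 5-byte parameter per flag with the flag in bit 0 of the value byte) is my reading of the TapeAlert log page, so check it against your drive's SCSI reference.

```python
import struct

LOG_SENSE = 0x4D        # SCSI LOG SENSE(10) opcode
TAPEALERT_PAGE = 0x2E   # TapeAlert log page code


def build_log_sense_cdb(page_code=TAPEALERT_PAGE, alloc_len=4 + 64 * 5):
    """Build a 10-byte LOG SENSE CDB asking for cumulative values of a page."""
    page_control = 0x01  # 01b = cumulative values
    byte2 = (page_control << 6) | (page_code & 0x3F)
    # opcode, flags, PC|page, subpage, reserved,
    # parameter pointer (2 bytes), allocation length (2 bytes), control
    return struct.pack(">BBBBBHHB", LOG_SENSE, 0, byte2, 0, 0, 0, alloc_len, 0)


def decode_tapealert(page):
    """Return the sorted TapeAlert flag numbers that are set in a log page."""
    if len(page) < 4 or (page[0] & 0x3F) != TAPEALERT_PAGE:
        raise ValueError("not a TapeAlert log page")
    (page_len,) = struct.unpack_from(">H", page, 2)
    end = min(len(page), 4 + page_len)
    flags = []
    pos = 4
    while pos + 4 <= end:
        # Each parameter: code (2 bytes), control byte, value length, value
        code, _control, plen = struct.unpack_from(">HBB", page, pos)
        value = page[pos + 4:pos + 4 + plen]
        if plen >= 1 and len(value) == plen and value[0] & 0x01:
            flags.append(code)
        pos += 4 + plen
    return sorted(flags)


def make_sample_page(active):
    """Build a synthetic 64-flag TapeAlert page (demonstration data only)."""
    body = b"".join(
        struct.pack(">HBBB", n, 0, 1, 1 if n in active else 0)
        for n in range(1, 65)
    )
    return struct.pack(">BBH", TAPEALERT_PAGE, 0, len(body)) + body


if __name__ == "__main__":
    print(build_log_sense_cdb().hex())
    print(decode_tapealert(make_sample_page({3, 20})))
```

On a real system the bytes from build_log_sense_cdb() would go into the command field of whatever pass-through request structure the OS uses, and the reply buffer would be handed to decode_tapealert(). The sample page here is synthetic, so it says nothing about how an actual drive behaves while Bacula has it open.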