Attila Fülöp wrote:
> Kern Sibbald wrote:
>   
>> On Monday 30 July 2007 17:29, Ryan Novosielski wrote:
>>     
>>> Kern Sibbald wrote:
>>>       
>>>> On Sunday 29 July 2007 19:28, Ryan Novosielski wrote:
>>>>         
>>>>> Hi all,
>>>>>
>>>>> Ever since I added the TapeAlert/smartmonctl command to my tape drive,
>>>>> it appears as if I get a fairly regular crash of that bacula-sd. I know
>>>>> there is a case where Bacula and the utility can go for the tape drive
>>>>> at the same time and cause problems, but I don't think Bacula should go
>>>>> KABOOM when this happens.
>>>>>           
>>>> The traceback, unfortunately, doesn't demangle the C++ subroutine names
>>>> nor provide source line numbers, but the best that I can tell is that the
>>>> heap has been corrupted, Bacula detects it, then does a Kaboom (self
>>>> inflicted seg fault).
>>>>         
>>> I'm not too good with development tools -- is there a reason for this
>>> that would be on my end (stripped binaries or something like that)?
>>>       
>> Yes, if they are stripped, that would at least explain the lack of line 
>> numbers and possibly the fact that the names were not demangled.
>>
>>     
>>>> Are you by any chance pointing the tapealert/smartmonctl at the tape drive
>>>> device rather than at the scsi control device?  If you are, I am not
>>>> surprised, and you should remove it as two different programs cannot
>>>> properly exist using the same tape device.
>>>>         
>>> Yes. 
>>>       
>> Well, that is almost surely the cause of the problem.
>>
>>     
>>> And I actually will remove that, but near as I can tell, Solaris 
>>> does not have a way of addressing the tape drive as two different
>>> devices. 
>>>       
>> Well, Solaris has 20 ways to do everything connected with devices.  In any 
>> case, it is not a question of addressing the tape drive as two different 
>> devices.  One addresses the tape drive through the standard tape driver, 
>> which is the /dev/rmt/xxxx stuff.
>>
>> The other is a SCSI pass through driver that allows SCSI commands to go 
>> directly to the SCSI controller, and if I remember right they are addressed 
>> with /dev/sg/xxxx.  That you will need to find out from someone else on the 
>> list or from your manuals -- I no longer have a Solaris.
>>
>>     
>>> From what I've read, the reason for this is that Solaris 
>>> supposedly has the ability to do two actions on the device at once --
>>> I'm not sure where I read that though in order to confirm it. I went
>>> looking for information about using the 'sgen' Solaris driver in order
>>> to instead use the control interface, but it appears as if sgen is only
>>> used to pick up devices that don't already have a type elsewhere; in
>>> other words, I could stop using 'st' and start using 'sgen', but that
>>> really wouldn't get me anywhere. Perhaps someone else will read about
>>> this and give me a pointer.
>>>       
>> I'll leave this to others.
>>     
>
> There is no such thing in Solaris. The st driver supports
> "scsi pass through" via the uscsi(7I) interface, and mtx (and
> therefore tapeinfo) does indeed use this interface on Solaris.
>
> from mtx sources:
>
> mtx.h:/* the 'uscsi' interface, as used on Solaris: */
> mtx.h:#include <sys/scsi/impl/uscsi.h>
>
>
> man uscsi:
>
> Ioctl Requests                                          uscsi(7I)
>
> NAME
>       uscsi - user SCSI command interface
>
> SYNOPSIS
>       #include <sys/scsi/impl/uscsi.h>
>
>       ioctl(int fildes, int request, struct uscsi_cmd *cmd);
>
> DESCRIPTION
>       The uscsi command is very powerful and  somewhat  dangerous;
>       therefore  it has some permission restrictions. See WARNINGS
>       for more details.
>
>       Drivers supporting this ioctl(2) provide a general interface
>       allowing  user-level  applications  to cause individual SCSI
>       commands to be directed to a particular SCSI or ATAPI device
>       under control of that driver. The uscsi command is supported
>       by the sd driver for SCSI disks and ATAPI CD-ROM drives, and
>       by  the  st  driver  for SCSI tape drives. uscsi may also be
>       supported by other device drivers; see the  specific  device
>       driver manual page for complete information.
>
>       ........
>
> The uscsi(7I) interface is only accessible by root. The sgen(7D)
> driver exports the uscsi(7I) interface to user (non root)
> processes. But the sgen(7D) man page clearly states:
>
>     In general, the  uscsi(7I)  interface  exported  by  sd(7D)  or
>        st(7D)  should  be  used to gain access to direct access and
>        sequential devices.
>
>
> Sorry, but I can't say more since I have no Solaris box with an
> attached tape right now. So I can't test any interaction between
> bacula and tapeinfo.
>
>
>   
>>>> If you are pointing it at the scsi control device, I would be interested
>>>> to see what the normal output of the command gives back as there may be a
>>>> possible buffer overrun though that really should not happen.
>>>>
>>>> In any case, I recommend that you remove the tape alert for a time and see
>>>> if that eliminates the problem.
>>>>         
>>> I suspect it will, as it only showed up when I added it, near as I can
>>> tell. A KABOOM seems like something that ought not happen either way,
>>> though, although I suppose if something is corrupting buffers, it can't
>>> be avoided. Curious, though, as the tapealert often returns "Device
>>> busy" which would seem to mean that there's no chance that the other
>>> thing using the device would actually have an error.
>>>       
>> Well, when Bacula calls the tapealert command, it releases the drive, so it 
>> doesn't get a busy. 
>>
>> Many OSes such as Linux and Solaris permit addressing the SCSI controller 
>> directly through the normal tape driver, but this is a very bad idea (it is 
>> apparently what you are doing), and from everything users have said, it 
>> causes lots of problems such as resetting the SCSI controller.  If you do 
>> that, all bets are off concerning Bacula correctly interfacing through the 
>> normal tape driver.  
>> If you would like to track it down, I think it would be good to eliminate
>> the KABOOM if it is possible (it may well not be possible).  However, you
>> *are* apparently doing something very non-standard, and that is where
>> things go wrong.
>>
>>     
>>>>> This does not happen every day, but every once in a while... it occurs at
>>>>> the end of a set of concurrent backups to tape -- all incrementals, 7 in
>>>>> total. By the time my catalog backup runs 2 hours later, the -sd has
>>>>> died and there is no connection made.
>>>>>
>>>>> The host machine is running Solaris 9, and the binaries are from
>>>>> BlastWave (currently version 2.0.3 with 2.0.2 clients, but until the day
>>>>> before yesterday, the admin/server machine was running 2.0.2 with
>>>>> identical results). I have not tried 2.1.x, but I would not be allowed
>>>>> to run a production schedule on a beta -- perhaps an exact copy on the
>>>>> same machine but writing to disk might yield the same results, but I
>>>>> suspect that this is caused by the TapeAlert, so maybe not.
>>>>>           
>>>> For a problem with tape alert, it is very unlikely that upgrading to 2.1.x
>>>> will help.
>>>>
>>>>         
>>>>> Thanks for any insights you can provide -- I'd be happy to report a bug
>>>>> if it is needed.
>>>>>           
>>>> Until I see your response and think about it, I don't think this is worth
>>>> a bug report, at least not just yet.
>>>>         
>>> OK, that is fine. If there's any easy way to try to get more information
>>> out of this thing, let me know. I actually had a fair amount of trouble
>>> getting this much in the first place -- if you run your bacula-dir as a
>>> non-root user as one really should, it then cannot run proper traces
>>> against daemons that run as root. I had to involve sudo; originally, I
>>> had no idea that this even ran by itself until I saw a number of empty
>>> traceback e-mails in root's box.
>>>
>>>       
>> Yes, well, system security measures often make debugging more difficult.
>> Unfortunately, there is not much I can do about that except to say either
>> you need to understand the finer points of your OS's system security or run
>> as root when attempting to debug these kinds of problems (I *never* debug as
>> root here though).
>>
>> My recommendation is to stop using tapealert until you figure out what the 
>> pass through SCSI driver is, then you could try using it.  If you can figure 
>> out how to run the debugger manually and dig deeper into the problem, that 
>> would be interesting too, but it is likely to take a lot of time ...
>>
>> Regards,
>>
>> Kern


Sorry, I'm tapping into this message late. Wasn't paying much attention 
until my eyes caught Solaris, st, sgen, etc. Now I'm not clear on the 
original setup -- seems it's coming from Ryan?

What kind of tape drive/library do you have and how have you configured it?

I have a tape library that I have configured on Solaris 9 using the 
st.conf for the tape drive and sgen.conf for the library functions. That 
gave me /dev/rmt/1 for the tape drive (I already had /dev/rmt/0 for the 
built-in DDS-3), and /dev/scsi/changer/c7t0d0 for the library. From 
there, I got mt and mtx working, and then plugging into backup software 
was elementary. I don't know if I can be of specific help, but maybe 
some of the details are transferable even if we have different tape 
hardware.
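
For anyone following along, here is a rough sketch of the kind of sanity 
check I mean by "got mt and mtx working". It only builds and prints the 
two status commands using the device paths from my setup above; the 
function name status_commands is made up for illustration, and your 
device paths will almost certainly differ.

```python
import shlex

# Device paths as given in the message above; adjust for your own system.
TAPE_DEVICE = "/dev/rmt/1"
CHANGER_DEVICE = "/dev/scsi/changer/c7t0d0"


def status_commands(tape=TAPE_DEVICE, changer=CHANGER_DEVICE):
    """Return the argv lists for a basic drive and changer status check.

    Hypothetical helper for illustration only; it does not run anything.
    """
    return [
        ["mt", "-f", tape, "status"],      # tape drive, via the st driver
        ["mtx", "-f", changer, "status"],  # library robot, via sgen
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them, since they
    # need real hardware and usually root privileges.
    for argv in status_commands():
        print(shlex.join(argv))
```

The point of keeping the two paths separate is that the drive and the 
changer are addressed through different device nodes, so a status check 
on the library never has to open the tape device itself.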

I think another spawn of this thread came up with a specific point in 
the code that could be a bug, so I'm not even sure if help is still 
needed on the tape library configuration. Let me know.



---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst 

<[EMAIL PROTECTED]>

--------------- 

Erdös 4







_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
