-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Chris Hoogendyk wrote:
>
> Ryan Novosielski wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Chris Hoogendyk wrote:
>>
>>> Attila Fülöp wrote:
>>>
>>>> Kern Sibbald wrote:
>>>>
>>>>
>>>>> On Monday 30 July 2007 17:29, Ryan Novosielski wrote:
>>>>>
>>>>>
>>>>>> Kern Sibbald wrote:
>>>>>>
>>>>>>
>>>>>>> On Sunday 29 July 2007 19:28, Ryan Novosielski wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Ever since I added the TapeAlert/smartmonctl command to my tape drive,
>>>>>>>> it appears as if I get a fairly regular crash of that bacula-sd. I know
>>>>>>>> there is a case where Bacula and the utility can go for the tape drive
>>>>>>>> at the same time and cause problems, but I don't think Bacula should go
>>>>>>>> KABOOM when this happens.
>>>>>>>>
>>>>>>>>
>>>>>>> The traceback, unfortunately, doesn't demangle the C++ subroutine
>>>>>>> names nor provide source line numbers, but the best that I can tell is
>>>>>>> that the heap has been corrupted, Bacula detects it, then does a Kaboom
>>>>>>> (self-inflicted seg fault).
>>>>>>>
>>>>>>>
>>>>>> I'm not too good with development tools -- is there a reason this would
>>>>>> be that is on my end (stripped binaries or something like that)?
>>>>>>
>>>>>>
>>>>> Yes, if they are stripped, that would at least explain the lack of
>>>>> line numbers and possibly the fact that the names were not demangled.
>>>>>
>>>>>
>>>>>
>>>>>>> Are you by any chance pointing the tapealert/smartmonctl at the tape
>>>>>>> drive device rather than at the scsi control device? If you are, I am
>>>>>>> not surprised, and you should remove it as two different programs
>>>>>>> cannot properly exist using the same tape device.
>>>>>>>
>>>>>>>
>>>>>> Yes.
>>>>>>
>>>>> Well, that is almost surely the cause of the problem.
>>>>>
>>>>>
>>>>>
>>>>>> And I actually will remove that, but near as I can tell, Solaris
>>>>>> does not have a way of addressing the tape drive as two different
>>>>>> devices.
>>>>>>
>>>>> Well, Solaris has 20 ways to do everything connected with devices.
>>>>> In any case, it is not a question of addressing the tape drive as two
>>>>> different devices. One addresses the tape drive through the standard
>>>>> tape driver, which is the /dev/rmt/xxxx stuff.
>>>>>
>>>>> The other is a SCSI pass through driver that allows SCSI commands to
>>>>> go directly to the SCSI controller, and if I remember right they are
>>>>> addressed with /dev/sg/xxxx. That you will need to find out from
>>>>> someone else on the list or from your manuals -- I no longer have a
>>>>> Solaris.
>>>>>
>>>>>
>>>>>
>>>>>> From what I've read, the reason for this is that Solaris supposedly
>>>>>> has the ability to do two actions on the device at once --
>>>>>> I'm not sure where I read that though in order to confirm it. I went
>>>>>> looking for information about using the 'sgen' Solaris driver in order
>>>>>> to instead use the control interface, but it appears as if sgen is only
>>>>>> used to pick up devices that don't already have a type elsewhere; in
>>>>>> other words, I could stop using 'st' and start using 'sgen', but that
>>>>>> really wouldn't get me anywhere. Perhaps someone else will read about
>>>>>> this and give me a pointer.
>>>>>>
>>>>>>
>>>>> I'll leave this to others.
>>>>>
>>>>>
>>>> There is no such thing in Solaris. The st driver supports
>>>> "scsi pass through" via the uscsi(7I) interface, and mtx (and
>>>> therefore tapeinfo) does indeed use this interface on solaris.
>>>>
>>>> from mtx sources:
>>>>
>>>> mtx.h:/* the 'uscsi' interface, as used on Solaris: */
>>>> mtx.h:#include <sys/scsi/impl/uscsi.h>
>>>>
>>>>
>>>> man uscsi:
>>>>
>>>> Ioctl Requests uscsi(7I)
>>>>
>>>> NAME
>>>> uscsi - user SCSI command interface
>>>>
>>>> SYNOPSIS
>>>> #include <sys/scsi/impl/uscsi.h>
>>>>
>>>> ioctl(int fildes, int request, struct uscsi_cmd *cmd);
>>>>
>>>> DESCRIPTION
>>>> The uscsi command is very powerful and somewhat dangerous;
>>>> therefore it has some permission restrictions. See WARNINGS
>>>> for more details.
>>>>
>>>> Drivers supporting this ioctl(2) provide a general interface
>>>> allowing user-level applications to cause individual SCSI
>>>> commands to be directed to a particular SCSI or ATAPI device
>>>> under control of that driver. The uscsi command is supported
>>>> by the sd driver for SCSI disks and ATAPI CD-ROM drives, and
>>>> by the st driver for SCSI tape drives. uscsi may also be
>>>> supported by other device drivers; see the specific device
>>>> driver manual page for complete information.
>>>>
>>>> ........
>>>>
>>>> The uscsi(7I) interface is only accessible by root. The sgen(7D)
>>>> driver exports the uscsi(7I) interface to user (non root)
>>>> processes. But the sgen(7D) man page clearly states:
>>>>
>>>> In general, the uscsi(7I) interface exported by sd(7D) or
>>>> st(7D) should be used to gain access to direct access and
>>>> sequential devices.
>>>>
>>>>
>>>> Sorry, but I can't say more since I have no solaris box with an
>>>> attached tape right now. So I can't test any interaction between
>>>> bacula and tapeinfo.
>>>>
>>>>
>>>>
>>>>
>>>>>>> If you are pointing it at the scsi control device, I would be
>>>>>>> interested to see what the normal output of the command gives back as
>>>>>>> there may be a possible buffer overrun though that really should not
>>>>>>> happen.
>>>>>>>
>>>>>>> In any case, I recommend that you remove the tape alert for a time
>>>>>>> and see if that eliminates the problem.
>>>>>>>
>>>>>>>
>>>>>> I suspect it will, as it only showed up when I added it, near as I can
>>>>>> tell. A KABOOM seems like something that ought not happen either way,
>>>>>> though, although I suppose if something is corrupting buffers, it can't
>>>>>> be avoided. Curious, though, as the tapealert often returns "Device
>>>>>> busy" which would seem to mean that there's no chance that the other
>>>>>> thing using the device would actually have an error.
>>>>>>
>>>>>>
>>>>> Well, when Bacula calls the tapealert command, it releases the drive,
>>>>> so it doesn't get a busy.
>>>>> Many OSes such as Linux and Solaris permit addressing the SCSI
>>>>> controller directly through the normal tape driver, but this is a
>>>>> very bad idea (it is apparently what you are doing), and from
>>>>> everything users have said, it causes lots of problems such as
>>>>> resetting the SCSI controller. If you do that, all bets are off
>>>>> concerning Bacula correctly interfacing through the normal tape
>>>>> driver. If you would like to track it down, I think it would be good
>>>>> to eliminate the KABOOM if it is possible (it may well not be
>>>>> possible). However, you *are* apparently doing something very
>>>>> non-standard, and that is where things go wrong.
>>>>>
>>>>>
>>>>>
>>>>>>>> This does not happen every day, but every once in awhile... it occurs
>>>>>>>> at the end of a set of concurrent backups to tape -- all incrementals,
>>>>>>>> 7 in total. By the time my catalog backup runs 2 hours later, the -sd
>>>>>>>> has died and there is no connection made.
>>>>>>>>
>>>>>>>> The host machine is running Solaris 9, and the binaries are from
>>>>>>>> BlastWave (currently version 2.0.3 with 2.0.2 clients, but until the
>>>>>>>> day before yesterday, the admin/server machine was running 2.0.2 with
>>>>>>>> identical results). I have not tried 2.1.x, but I would not be allowed
>>>>>>>> to run a production schedule on a beta -- perhaps an exact copy on the
>>>>>>>> same machine but writing to disk might yield the same results, but I
>>>>>>>> suspect that this is caused by the TapeAlert, so maybe not.
>>>>>>>>
>>>>>>>>
>>>>>>> For a problem with tape alert, it is very unlikely that upgrading to
>>>>>>> 2.1.x will help.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks for any insights you can provide -- I'd be happy to report a
>>>>>>>> bug if it is needed.
>>>>>>>>
>>>>>>>>
>>>>>>> Until I see your response and think about it, I don't think this is
>>>>>>> worth a bug report, at least not just yet.
>>>>>>>
>>>>>>>
>>>>>> OK, that is fine. If there's any easy way to try to get more information
>>>>>> out of this thing, let me know. I actually had a fair amount of trouble
>>>>>> getting this much in the first place -- if you run your bacula-dir as a
>>>>>> non-root user as one really should, it then cannot run proper traces
>>>>>> against daemons that run as root. I had to involve sudo; originally, I
>>>>>> had no idea that this even ran by itself until I saw a number of empty
>>>>>> traceback e-mails in root's box.
>>>>>>
>>>>>>
>>>>>>
>>>>> Yes, well, system security measures often make debugging more
>>>>> difficult. Unfortunately, there is not much I can do about that
>>>>> except to say either you need to understand the finer points of your
>>>>> OS's system security or run as root when attempting to debug these
>>>>> kinds of problems (I *never* debug as root here though).
>>>>>
>>>>> My recommendation is to stop using tapealert until you figure out
>>>>> what the pass through SCSI driver is, then you could try using it.
>>>>> If you can figure out how to run the debugger manually and dig deeper
>>>>> into the problem, that would be interesting too, but it is likely to
>>>>> take a lot of time ...
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kern
>>>>>
>>> Sorry, I'm tapping into this message late. Wasn't paying much attention
>>> until my eyes caught Solaris, st, sgen, etc. Now I'm not clear on the
>>> original setup -- seems it's coming from Ryan?
>>>
>> That's correct.
>>
>>
>>> What kind of tape drive/library do you have and how have you configured it?
>>>
>> No library, just two DDS class drives (DAT72 and DDS4).
>>
>
> hmm. Sorry if I'm being dense, but I'm puzzled as to why there would be
> a problem. These aren't exactly bleeding edge drives where you have to
> roll your own drivers (I almost had to do that with AIT5, but Sony tech
> support rescued me).
>
>>> I have a tape library that I have configured on Solaris 9 using the
>>> st.conf for the tape drive and sgen.conf for the library functions. That
>>> gave me /dev/rmt/1 for the tape drive (I already had /dev/rmt/0 for the
>>> built in dds/3), and /dev/scsi/changer/c7t0d0 for the library. From
>>> there, I got mt and mtx working, and then plugging into backup software
>>> was elementary. I don't know if I can be of specific help, but maybe
>>> some of the details are transferable even if we have different tape
>>> hardware.
>>>
>> Trouble is, Solaris only wants to see one driver for the tape drive, it
>> appears. 'st' picks up my tape drives, and no other driver is needed.
>> 'st' apparently utilizes uscsi to pass the direct stuff through to the
>> tape drive. Trouble is, I get "Device busy" for all but the last job
>> that runs (these jobs are running concurrently in a spooled situation).
>> That by itself is unfortunate, but I guess if I only refer to the
>> last-run job, I still get the information at least.
>>
>
> So, the drives configure, st catches them, and you have something like
> /dev/rmt/0 and /dev/rmt/1?
Yup.
> And mt works? If you have something writing to the tape and, on another
> terminal, do `mt status`, you get busy?
I haven't tried this. That is quite possible, however -- that's sort of
similar to what I want to do (or really, what Bacula wants to do). I'm
perfectly happy to have an Alert run after a batch of jobs. I suppose I
could schedule an admin job for it.
> And what exactly is the problem? Are the jobs running? What are you
> doing that is giving you device busy?
I'm guessing other jobs that are running concurrently in the spooling
situation. Sorry if I repeat myself some in this e-mail -- I read it
somewhat out-of-order.
>> Sometimes though, as a result of this interaction somehow, there will be
>> a Bacula KABOOM that occurs just after the last run job. That was the
>> original reason for this thread. However, I really would like to be able
>> to check that tape drive after every job if there's some way. Do
>> the SCSI generic drivers have the ability to read the tape while the
>> tape is being written to in Linux, or does the same trouble exist?
>
> My understanding, in general, not just Solaris, is that if a process has
> a device and is using it (writing data to it), then other processes
> cannot access it. They will get "busy". With a disk drive, it is the OS
> that has control of the device and moderates requests from other
> processes. So it seems as though you can have multiple processes reading
> and writing at the same time, but it's really the OS putting requests in
> sequence, and worst cases can get the drive whipping back and forth and
> slowing to a crawl. With a tape drive, the linearity and relative
> slowness preclude that kind of moderation of multiple requests. One
> process has to release the device before another can use it. In either
> case (disk or tape), if you try to dive under the OS or driver and do
> something to the device directly, you're likely to crash something.
I'm not sure if it's different when I don't really want to read or write
to the drive, just ask it how it's doing... but I suppose it's hard to
say. This question might also be one for the smartmontools folks. I've
asked Sun and they are clueless (one engineer told me to symlink the
device to another name -- don't think that would have any effect).
> Is the following a passable paraphrase of the situation? -- What you are
> after is scheduling several jobs to run concurrently and spool, and yet
> you want to jump in in the middle and do something yourself to see how
> things are going. Since the jobs are busy doing stuff, you can't do much
> more than get a busy signal until they are done. Or am I just off base
> and missing what you're talking about?
Not in the middle, I suppose at the end of a job... however, the end of
one job may not be the end of another if there are concurrent jobs
spooling to tape. This may mean that the Alert Command on a single tape
drive is not possible with concurrent jobs. Don't know. I'm not "jumping
in" though, I've configured a feature (though possibly incorrectly).
> Doesn't bacula have other ways of telling you how far along it is or how
> much it has done?
I think what you're missing is what is failing. :) I'm using the Alert
Command:
- ---
Alert Command = name-string
The name-string specifies an external program to be called at the
completion of each Job after the device is released. The purpose of this
command is to check for Tape Alerts, which are present when something is
wrong with your tape drive (at least for most modern tape drives). The
same substitution characters that may be specified in the Changer
Command may also be used in this string. For more information, please
see the Autochangers chapter of this manual.
Note, it is not necessary to have an autochanger to use this
command. The example below uses the tapeinfo program that comes with the
mtx package, but it can be used on any tape drive. However, you will
need to specify a Changer Device directive in your Device resource (see
above) so that the generic SCSI device name can be edited into the
command (with the %c).
An example of the use of this command to print Tape Alerts in the
Job report is:
Alert Command = "sh -c 'tapeinfo -f %c | grep TapeAlert'"
and an example output when there is a problem could be:
bacula-sd Alert: TapeAlert[32]: Interface: Problem with SCSI interface
between tape drive and initiator.
- ---
...but I'm having trouble since apparently Solaris is unhappy about
having me communicate with the tape drive while jobs are running (Bacula
first closes the drive and then runs the command, but in a concurrent
situation, things may work differently). I'm using smartmontools to
check the drive for errors, basically.
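For what it's worth, one way to cope with the "Device busy" replies might
be a small wrapper around the query. What follows is only a minimal Python
sketch, not tested against a real drive: it retries a few times and prints
only the TapeAlert lines. The script, its function names, the retry count
and delay, and the assumption that the busy condition appears as the text
"Device busy" in the tool's output are all mine, not anything from Bacula
or mtx. The tapeinfo -f call is the one from the manual excerpt above.

```python
#!/usr/bin/env python3
"""Hypothetical Alert Command wrapper: retry while the drive reports busy."""
import subprocess
import sys
import time


def tape_alert_lines(output):
    """Return only the lines that mention TapeAlert (what the grep does)."""
    return [line for line in output.splitlines() if "TapeAlert" in line]


def query_with_retry(run_query, attempts=5, delay=30.0, sleep=time.sleep):
    """Call run_query() until its output no longer contains 'Device busy'.

    run_query is any callable returning the tool's combined output as text.
    Returns the last output seen, busy or not, after at most `attempts` calls.
    """
    output = ""
    for attempt in range(attempts):
        output = run_query()
        if "Device busy" not in output:
            break
        if attempt < attempts - 1:
            # Assumption: whatever holds the drive lets go of it before long.
            sleep(delay)
    return output


def run_tapeinfo(device):
    """Run `tapeinfo -f <device>` and return stdout and stderr together."""
    result = subprocess.run(
        ["tapeinfo", "-f", device],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def main(device):
    """Query the drive, waiting out busy replies, and print any alerts."""
    output = query_with_retry(lambda: run_tapeinfo(device))
    for line in tape_alert_lines(output):
        print(line)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])  # e.g. the %c substitution from the Alert Command
```

If something like this were saved as, say, /usr/local/bin/tapealert-retry
(a made-up path), the directive could become
Alert Command = "/usr/local/bin/tapealert-retry %c". It only waits out a
busy drive; it does nothing about two programs going for the same device
at once, so it would not by itself explain or prevent the KABOOM.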
Now folks have told me that this is not a problem in Linux because you
talk to a different interface. I'm not 100% sure that's true, because
they still may be either querying the tape after only one job has run,
or perhaps are using an autochanger, where the generic device and the
tape drive really ARE two different things. Don't know, but I'm asking. :)
- --
---- _ _ _ _ ___ _ _ _
|Y#| | | |\/| | \ |\ | | |Ryan Novosielski - Systems Programmer II
|$&| |__| | | |__/ | \| _| |[EMAIL PROTECTED] - 973/972.0922 (2-0922)
\__/ Univ. of Med. and Dent.|IST/AST - NJMS Medical Science Bldg - C630
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGrl35mb+gadEcsb4RAltuAKCLfgTKgzckZJeNZEhPxUKse0iPwACdF1UU
CPyXXDMCL+NHNX1QddnZ42k=
=oFgG
-----END PGP SIGNATURE-----
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users