On Fri, 2005-03-04 at 23:57 +0100, Pavel Machek wrote:
> What prevents driver from being run on another CPU, maybe just doing
> mdelay() between hardware accesses?
Almost all drivers that I know have some sort of locking. Nothing nasty
about it. Besides, you can't expect everything to be as simp
On Fri, 2005-03-04 at 14:54 +0100, Pavel Machek wrote:
> Hi!
>
> > > If there's no ->error method, at least call ->remove so one device only
> > > takes itself down.
> > >
> > > Does this make sense?
> >
> > This was my thought too last time we had this discussion. A completely
> > asynchronous
On Sat, 2005-03-05 at 00:18 +0100, Pavel Machek wrote:
> On Sat 05-03-05 10:03:37, Benjamin Herrenschmidt wrote:
> > On Fri, 2005-03-04 at 23:57 +0100, Pavel Machek wrote:
> >
> > > What prevents driver from being run on another CPU, maybe just doing
> > > mdelay() between hardware accesses?
> >
On Sat 05-03-05 10:03:37, Benjamin Herrenschmidt wrote:
> On Fri, 2005-03-04 at 23:57 +0100, Pavel Machek wrote:
>
> > What prevents driver from being run on another CPU, maybe just doing
> > mdelay() between hardware accesses?
>
> Almost all drivers that I know have some sort of locking. Nothing
Hi!
> > Hmm, before we go async way (nasty locking, no?) could driver simply
> > ask "did something bad happen while I was sleeping?" at beginning of each
> > function?
> >
> > For DMA problems, driver probably has its own, timer-based,
> > "something is wrong" timer, anyway, no?
>
> No, there is
On Friday, March 4, 2005 5:54 am, Pavel Machek wrote:
> Hi!
>
> > > If there's no ->error method, at least call ->remove so one device only
> > > takes itself down.
> > >
> > > Does this make sense?
> >
> > This was my thought too last time we had this discussion. A completely
> > asynchronous call
On Fri, Mar 04, 2005 at 11:03:29AM +0900, Hidetoshi Seto was heard to remark:
> >p.s. I would like to have iochk_read() take struct pci_dev * as an
> >argument. (I could store a pointer to pci_dev in the "cookie" but
> >that seems odd).
>
> I'd like to store the pointer and handle all only with t
Hi!
> > If there's no ->error method, at least call ->remove so one device only
> > takes itself down.
> >
> > Does this make sense?
>
> This was my thought too last time we had this discussion. A completely
> asynchronous call is probably needed in addition to Hidetoshi's proposed API,
> since
Thanks for all comments!
OK, I'd like to sort out our situation:
$ Here are 2 features:
- iochk_clear/read() interface for error "detection"
by Seto ... me :-)
- callback, thread, and event notification for error "recovery"
by Linas ... expert in PPC64
$ What will "dete
Linas Vepstas wrote:
Below is some "pseudocode" version (mentally substitute
"pci error event" for every occurrence of "eeh"). It's got some
ppc64-specific crud in there that we have to fix to make it
truly generic (I just cut and pasted from current code).
Would a cleaned up version of this code
Linas Vepstas wrote:
If their defaults are no-ops, device
maintainers who develop their drivers on an arch where it is not
implemented should be more careful.
Why? People who write device drivers already know if/when they need to
disable interrupts, and so they already disable if they need it.
OK, I'll remake
On Wednesday, March 2, 2005 3:30 pm, Linas Vepstas wrote:
> Put it another way: a device driver author should have the opportunity
> to poll the pci bus status if they so desire. Polling for bus status
> on ppc64 is real easy. Given what Jesse Barnes was saying, it sounded
> like a simple (option
On Thu, Mar 03, 2005 at 09:41:43AM +1100, Benjamin Herrenschmidt was heard to
remark:
> On Wed, 2005-03-02 at 12:22 -0600, Linas Vepstas wrote:
> > On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to
> > remark:
> > >
> > > The new API is what _allows_ a driver to care. It does
On Thu, Mar 03, 2005 at 09:46:12AM +1100, Benjamin Herrenschmidt was heard to
remark:
> On Wed, 2005-03-02 at 14:02 -0600, Linas Vepstas wrote:
> > On Wed, Mar 02, 2005 at 09:27:27AM +1100, Benjamin Herrenschmidt was heard
> > to remark:
>
> > That's a style issue. Propose an API, I'll code it.
On Wed, 2005-03-02 at 13:03 -0500, linux-os wrote:
> > event->dev = dev;
> > event->reset_state = rets[0];
> > event->time_unavail = rets[2];
> >
> > /* We may be called in an interrupt context */
> > spin_lock_irqsave(&eeh_eventlist_lock, flags);
> ^^
> > list_add
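The snippet quoted above records an error event and adds it to a shared list from a context that may be an interrupt handler, which is why it takes `spin_lock_irqsave()`. The following user-space sketch models only the queueing. It is an assumption-laden stand-in, not the ppc64 code: a C11 `atomic_flag` spinlock replaces the kernel spinlock (it cannot disable interrupts), `malloc()` replaces a non-sleeping kernel allocation, and apart from the `reset_state` and `time_unavail` fields seen in the snippet, the struct and function names are hypothetical. Events are appended at the tail so the recovery thread sees them in arrival order; the original `list_add` call is cut off, so its actual ordering is unknown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_dev. */
struct fake_pci_dev { int id; };

/* Hypothetical event record; reset_state and time_unavail mirror the
 * fields filled in by the quoted snippet. */
struct pci_err_event {
    struct pci_err_event *next;
    struct fake_pci_dev *dev;
    int reset_state;
    int time_unavail;
};

/* C11 spinlock standing in for eeh_eventlist_lock. Unlike
 * spin_lock_irqsave(), it cannot mask local interrupts. */
static atomic_flag eventlist_lock = ATOMIC_FLAG_INIT;
static struct pci_err_event *event_head, *event_tail;

static void eventlist_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&eventlist_lock,
                                             memory_order_acquire))
        ; /* spin until the holder releases the flag */
}

static void eventlist_release(void)
{
    atomic_flag_clear_explicit(&eventlist_lock, memory_order_release);
}

/* Producer side: called from the (simulated) error-detection path. */
int queue_err_event(struct fake_pci_dev *dev, int reset_state, int time_unavail)
{
    /* In the kernel this would need a non-sleeping allocation
     * (GFP_ATOMIC), since the caller may be in interrupt context. */
    struct pci_err_event *ev = malloc(sizeof *ev);

    assert(dev != NULL);
    if (ev == NULL)
        return -1;
    ev->next = NULL;
    ev->dev = dev;
    ev->reset_state = reset_state;
    ev->time_unavail = time_unavail;

    eventlist_acquire();
    if (event_tail != NULL)
        event_tail->next = ev;
    else
        event_head = ev;
    event_tail = ev;
    eventlist_release();
    return 0;
}

/* Consumer side: the recovery thread pops events in arrival order.
 * The caller owns (and must free) the returned event. */
struct pci_err_event *dequeue_err_event(void)
{
    struct pci_err_event *ev;

    eventlist_acquire();
    ev = event_head;
    if (ev != NULL) {
        event_head = ev->next;
        if (event_head == NULL)
            event_tail = NULL;
    }
    eventlist_release();
    return ev;
}
```

Keeping the critical section down to a few pointer updates matters here: the producer may run with interrupts off, so all real work (resetting the slot, calling drivers) belongs in the thread that dequeues.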
On Wed, 2005-03-02 at 12:22 -0600, Linas Vepstas wrote:
> On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to remark:
> >
> > The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> > I think that's because nobody knows how to handle it (ie it's probably
> > h
> One issue with that is how to notify drivers that they need to make this
> call.
> In many cases, DMA completion will be signalled by an interrupt, but if the
> DMA failed, that interrupt may never happen, which means the call to
> pci_unmap or the above function from the interrupt handler m
On Wed, 2005-03-02 at 14:02 -0600, Linas Vepstas wrote:
> On Wed, Mar 02, 2005 at 09:27:27AM +1100, Benjamin Herrenschmidt was heard to
> remark:
> That's a style issue. Propose an API, I'll code it. We can have
> the master recovery thread be a state machine, and so every device
> driver gets
On Wed, Mar 02, 2005 at 09:27:27AM +1100, Benjamin Herrenschmidt was heard to
remark:
> On Tue, 2005-03-01 at 12:33 -0600, Linas Vepstas wrote:
>
> > The current proposal (and prototype) has a "master recovery thread"
> > to handle the coordinated reset of the pci controller. This master
> > rec
On Wed, Mar 02, 2005 at 10:41:06AM -0800, Jesse Barnes was heard to remark:
> On Wednesday, March 2, 2005 10:22 am, Linas Vepstas wrote:
> > On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to
> remark:
> > > The new API is what _allows_ a driver to care. It doesn't handle DMA, b
On Wed, Mar 02, 2005 at 03:13:05PM +0900, Hidetoshi Seto was heard to remark:
[ .. iochk_clear() and iochk_read() ...]
> And then, I don't think it needs to have "pci" ... limitation of this
> API's target. It would not match if there is a recoverable device
> over some PCI to XXX bridge, or i
On Wednesday, March 2, 2005 10:22 am, Linas Vepstas wrote:
> On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to
remark:
> > The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> > I think that's because nobody knows how to handle it (ie it's probably
> > hw
On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to remark:
>
> The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> I think that's because nobody knows how to handle it (ie it's probably
> hw-dependent and all existing implementations would thus be
> drive
On Wed, 2 Mar 2005, Linas Vepstas wrote:
On Wed, Mar 02, 2005 at 11:28:01AM +0900, Hidetoshi Seto was heard to remark:
Note that here is a difficulty: the MCA handler on some arch would run on
special context - MCA environment. In other words, since some MCA handler
[SNIPPED...]
/**
* queue up a pc
On Wed, Mar 02, 2005 at 11:28:01AM +0900, Hidetoshi Seto was heard to remark:
>
> Note that here is a difficulty: the MCA handler on some arch would run on
> special context - MCA environment. In other words, since some MCA handler
> would be called by non-maskable interrupt (e.g. NMI), so it's dif
Linas Vepstas wrote:
>> I'd prefer to see it as ioerr_clear(), ioerr_read() ...
>
> I'd prefer pci_io_start() and pci_io_check_err()
>
> The names should have "pci" in them.
>
> I don't like "ioerr_clear" because it implies we are clearing the io error; we are not; we are clearing the checker for
Jesse Barnes wrote:
This was my thought too last time we had this discussion. A completely
asynchronous call is probably needed in addition to Hidetoshi's proposed API,
since as you point out, the driver may not be running when an error occurs
(e.g. in the case of a DMA error or more general bu
Matthew Wilcox wrote:
I think what Jeff meant was "this new API handles none of this".
And that's true, it doesn't handle DMA errors. But I think that's just
something that hasn't been written/designed yet.
Yes, this API just supports drivers wanting to be more RAS-aware.
It would be happy if how
> In fact, I'd argue that even a driver that _uses_ the interface should not
> necessarily shut itself down on error. Obviously, it should always log the
> error, but outside of that it might be good if the operator can decide and
> set a flag whether it should try to re-try (which may not always
On Tue, 2005-03-01 at 12:33 -0600, Linas Vepstas wrote:
> The current proposal (and prototype) has a "master recovery thread"
> to handle the coordinated reset of the pci controller. This master
> recovery thread makes three calls in struct pci_driver:
>
>void (*frozen) (struct pci_dev *);
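Linas's message lists three callbacks the master recovery thread would call in `struct pci_driver`, but only `frozen` survives in this excerpt. Combining it with the suggestion quoted earlier in the thread (if a driver has no error method, at least call `->remove` so only that one device takes itself down), a dispatch loop might look like the user-space sketch below. `struct fake_pci_driver`, `struct binding`, `notify_frozen()` and the demo callbacks are hypothetical; only the `frozen` and `remove` member names come from the thread, and the real recovery thread's signature and state machine may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev, with flags for the demo. */
struct fake_pci_dev {
    int notified; /* set by a driver that handled the frozen event */
    int removed;  /* set when the driver was torn down instead */
};

/* Hypothetical subset of struct pci_driver: only frozen and remove. */
struct fake_pci_driver {
    void (*frozen)(struct fake_pci_dev *);
    void (*remove)(struct fake_pci_dev *);
};

/* One device bound to its driver on the failed bus segment. */
struct binding {
    struct fake_pci_dev *dev;
    const struct fake_pci_driver *drv;
};

/* Tell every affected driver the slot is frozen. Drivers without a
 * frozen method are removed, so only that device goes away.
 * Returns the number of devices removed. */
int notify_frozen(const struct binding *b, int n)
{
    int removed = 0;

    assert(b != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        const struct fake_pci_driver *drv = b[i].drv;

        if (drv->frozen != NULL) {
            drv->frozen(b[i].dev);
        } else if (drv->remove != NULL) {
            drv->remove(b[i].dev);
            removed++;
        }
    }
    return removed;
}

/* Demo callbacks for a RAS-aware driver and a legacy one. */
static void demo_frozen(struct fake_pci_dev *dev) { dev->notified = 1; }
static void demo_remove(struct fake_pci_dev *dev) { dev->removed = 1; }

static const struct fake_pci_driver aware_drv  = { demo_frozen, demo_remove };
static const struct fake_pci_driver legacy_drv = { NULL, demo_remove };
```

The fallback gives unmodified drivers a safe default (the device is detached rather than left talking to a frozen slot), while drivers that opt in can attempt a real recovery.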
On Tue, 2005-03-01 at 18:19 +0100, Andi Kleen wrote:
> Hidetoshi Seto <[EMAIL PROTECTED]> writes:
>
> >
> > int sample_read_with_iochk(struct pci_dev *dev, u32 *buf, int words)
> > {
> > unsigned long ofs = pci_resource_start(dev, 0) + DATA_OFFSET;
> > int i;
> >
> > /* Create magical
On Tue, 2005-03-01 at 09:10 -0800, Jesse Barnes wrote:
> On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> > The MCA handler has to go and figure out what the hell just happened
> > (was it a DIMM error, PCI bus error, etc). OK, fine, it finds that it
> > was an error on PCI bus 73. At t
On Tue, 2005-03-01 at 08:49 -0800, Linus Torvalds wrote:
>
> On Tue, 1 Mar 2005, Jeff Garzik wrote:
> >
> > A new API handles none of this.
>
> Ehh?
>
> The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> I think that's because nobody knows how to handle it (ie it's pro
> I have been thinking about PCI system and parity errors, and how to
> handle them. I do not think this is the correct approach.
>
> A simple retry is... too simple. If you are having a massive problem on
> your PCI bus, more action should be taken than a retry.
It goes beyond that, see bel
On Tue, 1 Mar 2005, Linas Vepstas wrote:
>
> > > - Additionally adds special token - abstract "iocookie" structure
> > > to control/identifies/manage I/Os, by passing it to OS.
> > > Actual type of "iocookie" could be arch-specific. Device drivers
> > > could use the iocookie structure wit
On Tue, Mar 01, 2005 at 02:42:11PM +, Matthew Wilcox was heard to remark:
> On Tue, Mar 01, 2005 at 05:33:48PM +0900, Hidetoshi Seto wrote:
> > Today's patch is 3rd one - iochk_clear/read() interface.
> > - This also adds pair-interface, but not to sandwich only readX().
> > Depends on platfo
On Tue, Mar 01, 2005 at 11:37:24AM -0500, Jeff Garzik was heard to remark:
>
> A new API handles none of this.
Seto is proposing an API that solves a different problem than what
you are thinking about.
In my case, the hardware (pci controller) will shut down a pci
slot(s) in the case of a pci err
On Tue, Mar 01, 2005 at 10:08:48AM -0800, Linus Torvalds was heard to remark:
>
> On Tue, 1 Mar 2005, Andi Kleen wrote:
> >
> > But what would the default handling be? It would be nice if there
> > was a simple way for a driver to say "just shut me down on an error"
> > without adding iochk_* to
On Tue, Mar 01, 2005 at 10:08:48AM -0800, Linus Torvalds wrote:
> The thing is, IO errors just will be very architecture-dependent. Some
> might have exceptions happening, without the exception handler really
> having much of an idea of who caused it, unless that driver had prepared
> it some wa
On Tue, Mar 01, 2005 at 09:10:29AM -0800, Jesse Barnes was heard to remark:
> On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> > The MCA handler has to go and figure out what the hell just happened
> > (was it a DIMM error, PCI bus error, etc).
I assume "MCA" stands for machine check a
On Tue, 1 Mar 2005, Andi Kleen wrote:
>
> But what would the default handling be? It would be nice if there
> was a simple way for a driver to say "just shut me down on an error"
> without adding iochk_* to each function. Ideally this would be just
> a standard callback that knows how to clean u
Hidetoshi Seto <[EMAIL PROTECTED]> writes:
>
> int sample_read_with_iochk(struct pci_dev *dev, u32 *buf, int words)
> {
> unsigned long ofs = pci_resource_start(dev, 0) + DATA_OFFSET;
> int i;
>
> /* Create magical cookie on the stack */
> iocookie cookie;
>
> /* Crit
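The message quoted above breaks off inside Seto's `sample_read_with_iochk()`. As discussed in the thread, the proposal brackets a group of accesses with `iochk_clear()` and `iochk_read()` and checks afterwards whether an error was recorded in between. Below is a user-space model of that pattern, a sketch under stated assumptions: the `iocookie` layout, the error counter, `fake_readl()`, `MAX_RETRY` and the retry policy are all hypothetical, and the real signatures in the patch may differ. Storing the device pointer in the cookie follows Seto's stated preference over passing `struct pci_dev *` to `iochk_read()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev; NOT the kernel structure. */
struct fake_pci_dev {
    const uint32_t *bar0;        /* simulated MMIO window */
    unsigned int error_count;    /* bumped whenever a bus error is recorded */
    unsigned int pending_faults; /* test hook: how many upcoming reads fail */
};

/* Hypothetical cookie: keeps the device pointer and a snapshot of its
 * error count, so iochk_read() needs no separate pci_dev argument. */
typedef struct {
    struct fake_pci_dev *dev;
    unsigned int start_count;
} iocookie;

static void iochk_clear(iocookie *cookie, struct fake_pci_dev *dev)
{
    assert(cookie != NULL && dev != NULL);
    cookie->dev = dev;
    cookie->start_count = dev->error_count;
}

/* Nonzero if any error was recorded since the matching iochk_clear(). */
static int iochk_read(const iocookie *cookie)
{
    return cookie->dev->error_count != cookie->start_count;
}

/* Simulated readl(): a faulting access is recorded and returns all ones,
 * the value a master-aborted PCI read typically yields. */
static uint32_t fake_readl(struct fake_pci_dev *dev, size_t idx)
{
    if (dev->pending_faults > 0) {
        dev->pending_faults--;
        dev->error_count++;
        return 0xffffffffu;
    }
    return dev->bar0[idx];
}

#define MAX_RETRY 3 /* hypothetical policy; the thread debates retrying at all */

/* Returns 0 on a clean read, -1 if every attempt saw an error. */
int sample_read_with_iochk(struct fake_pci_dev *dev, uint32_t *buf, int words)
{
    for (int attempt = 0; attempt < MAX_RETRY; attempt++) {
        iocookie cookie;

        iochk_clear(&cookie, dev);          /* start of the checked region */
        for (int i = 0; i < words; i++)
            buf[i] = fake_readl(dev, (size_t)i);
        if (!iochk_read(&cookie))           /* end of the checked region */
            return 0;
    }
    return -1;
}
```

A kernel driver would more likely return `-EIO` than `-1`, and, as others in the thread argue, might log the error and leave the retry decision to the operator instead of retrying blindly.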
On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> The MCA handler has to go and figure out what the hell just happened
> (was it a DIMM error, PCI bus error, etc). OK, fine, it finds that it
> was an error on PCI bus 73. At this point, I think the architecture
> error handler needs to ca
On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds wrote:
> On Tue, 1 Mar 2005, Jeff Garzik wrote:
> > A new API handles none of this.
>
> Ehh?
I think what Jeff meant was "this new API handles none of this".
And that's true, it doesn't handle DMA errors. But I think that's just
something
On Tue, 1 Mar 2005, Jeff Garzik wrote:
>
> A new API handles none of this.
Ehh?
The new API is what _allows_ a driver to care. It doesn't handle DMA, but
I think that's because nobody knows how to handle it (ie it's probably
hw-dependent and all existing implementations would thus be
driver-s
Hidetoshi Seto wrote:
Hi, long time no see :-)
Currently, I/O error is not a leading cause of system failure.
However, since Linux nowadays is making great progress on its
scalability, and an ever larger number of PCI devices are being
connected to a single high-performance server, the risk of the
I/O
On Tue, Mar 01, 2005 at 05:33:48PM +0900, Hidetoshi Seto wrote:
> Today's patch is 3rd one - iochk_clear/read() interface.
> - This also adds pair-interface, but not to sandwich only readX().
> Depends on platform, starting with ioreadX(), inX(), writeX()
> if possible... and so on could be tar
Hi, long time no see :-)
Currently, I/O error is not a leading cause of system failure.
However, since Linux nowadays is making great progress on its
scalability, and an ever larger number of PCI devices are being
connected to a single high-performance server, the risk of the
I/O error is increasing d