On Wed, 19 Feb 2020, jasper.low...@bt.com wrote:
When configuring devices, Solaris 10 uses the SET_FEATURE command on the CMD646
to set the transfer mode to MDMA mode.
From what I can tell, this is successful and the emulated IDE controller
raises an interrupt acknowledging that the command was completed
successfully. To determine whether or not this interrupt was
successfully propagated to Solaris 10, I made manual changes to ensure
that the interrupt was not raised for this event at this specific time.
This resulted in a new error from Solaris 10 regarding "set_features".
- Solaris 10 appears to be able to see the interrupt from the completion of the
SET_FEATURE command.
- Solaris 10 appears to then perform two reads on the status register. From
what I understand, this has the side effect of clearing interrupts.
- Solaris 10 then writes to the device/head register.
- Solaris 10 then spins on ARTTIM23_INTR_CH1 expecting it to be set.
When it is not set, the operation times out and we are presented with
the fatal error regarding set_features.
I am not intimately familiar with the workings of the CMD646 or the ATA
specification so I can only speculate.
- If the interrupt that Solaris 10 expects is the one from the
SET_FEATURE command, then Solaris 10 is not expecting reading from the
status register to clear ARTTIM23_INTR_CH1.
- If the interrupt that Solaris 10 expects is not the one from the
SET_FEATURE command, then it must expect an interrupt to occur from
writing to the device/head register.
I don't have definitive answers so these are some ideas but I may be
completely wrong.
I don't know about Solaris but what I've seen on PPC and via-ide is that
it works until switched to UDMA mode then it freezes on the first command
issued after switching to UDMA so it seems like it expects an interrupt
that's not generated or not routed correctly but only in DMA mode, in the
initial PIO mode it works. Not sure if this is useful at all for your case
though so you may just disregard it.
I found it strange that Solaris 10 was spinning on ARTTIM23_INTR_CH1. Is
it possible that Solaris 10 is not expecting the values of
ARTTIM23_INTR_CH1 and MRDMODE_INTR_CH1 to be synced? I made changes to
disable the syncing and the fatal error from Solaris 10 disappeared.
Unfortunately, I can't tell whether or not this actually improved the
emulation of Solaris 10 as the serial console is still unresponsive.
I think the syncing was added in commit 271dddd1 and the commit log:
https://lists.gnu.org/archive/html/qemu-devel/2014-08/msg02644.html
cites the data sheet for that and there were other commits around it that
were changing similar things as well. I guess this was fixing some problem
at the time (Mark may remember more) so maybe these are correct but I
don't know what actual hardware does. I also remember this IDE controller
chip had different versions with early ones having implementation bugs
that could cause problems so people generally avoided it or drivers may
have hacks to fix those so it's possible that this tries to work around
some hardware bug? I don't remember the details but maybe Linux kernel
source has some history on this.
If there is a bug in the Solaris 10 driver I would expect this error to
be more widely referenced online. I suspect that the emulated CMD646 is
not perfectly faithful to the hardware and this is causing problems for
Solaris 10.
I am not convinced that this problem is related to IRQ routing as
Solaris 10 appears to recognise interrupts when they happen (or don't).
Because of this, I don't think this error is related to the DMA problem
under MorphOS but I could be wrong.
I'm not sure either because during testing I've seen two cases and in one
IRQ was raised but did not reach CPU due to being masked in interrupt
controller so that suggests it's not a problem with generating the IRQ in
IDE code but problem is afterwards but still could not understand why it
fails. (Seems to work on Linux though so maybe understanding what the
working and non-working cases do differently could get closer to the
answer.)
Does anyone have any ideas that might explain why Solaris 10 insists
that ARTTIM23_INTR_CH1 is set despite two previous reads of the status
register?
I can only guess. The data sheet says that in PCI native mode these bits
should be checked to determine if an interrupt on PCI INTA is coming from
this controller (but for PIO mode, for DMA it just refers to Intel's spec
without any more info). Specifically:
"When an IDE port is in PCI IDE Native Mode, the IDE task file registers
may be mapped to non-standard port addresses, and IDE drive interrupts
occur at PCI INTA. [As opposed to Legacy mode when it uses standard ISA
IDE port numbers and IRQ14 and 15.] Therefore, if both IDE ports are in
PCI IDE Native Mode, drive interrupts from both IDE ports are multiplexed
into PCI INTA. In this case, the interrupt status bits must be polled to
determine which IDE port generated the interrupt, or whether the interrupt
was generated by another PCI device sharing INTA on the bus.
1) The host reads CFR (index 50h). If bit 2 is set, then the interrupt
occurred on the primary IDE port.
2) The host reads ARTTIM23. If bit 4 is set, then the interrupt occurred
on the secondary IDE port.
3) If 1) and 2) are both false, then the interrupt was generated by
another PCI device sharing INTA with the PCI0646."
As for why it polls this reg, if it's not expecting interrupt on primary
port and just reading both as described above, it may be expecting more
interrupts than QEMU is generating or it may expect them to arrive with
some delay or after previous one is cleared that QEMU could just raise
once due to being faster or doing something differently? Does someone know
what interrupts are generated on real hardware in DMA mode so we can
compare that to what we see with QEMU?
Regards,
BALATON Zoltan