On Tue, Nov 06, 2012 at 12:09:42AM -0000, Steven Hartland wrote:
| Thanks Doug, actually just finished another test run with some more
| debugging in and I believe I've found the reason for the non-recursive
| lock and at least some of the queuing issues.
|
| The non-recursive lock is due to mfi_tbolt_reset calling
| mfi_process_fw_state_chg_isr with mfi_io_lock held, which in turn calls
| mfi_tbolt_init_MFI_queue, which tries to acquire mfi_io_lock, hence
| the problem.
|
| mfi-lock.txt attached I believe fixes this as well as what appears
| to be an invalid call to mtx_unlock(&sc->mfi_io_lock) in mfi_attach
| which never acquires the lock as far as I can see, possibly a cut and
| paste error.

I don't seem to see the attachment.

Yer, seems like some mail fail by me there, but I've had some more
locking panics during today's tests anyway, requiring additional fixes.
Will update and post when I'm happy with it.
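To make the call chain above concrete, here is a minimal userland sketch of the same shape, not the driver code. It assumes a POSIX error-checking mutex as a stand-in for mfi_io_lock, so a second lock by the owner returns EDEADLK instead of panicking. All names (sketch_softc, reset_buggy, reset_fixed and so on) are hypothetical and only mirror the roles of mfi_tbolt_reset and mfi_tbolt_init_MFI_queue.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the driver softc; only the lock matters here. */
struct sketch_softc {
	pthread_mutex_t io_lock;	/* plays the role of mfi_io_lock */
	int queue_initialised;
};

int
sketch_softc_init(struct sketch_softc *sc)
{
	pthread_mutexattr_t attr;
	int error;

	error = pthread_mutexattr_init(&attr);
	if (error != 0)
		return (error);
	/* ERRORCHECK: relocking by the owner returns EDEADLK, it does not hang. */
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(&sc->io_lock, &attr);
	pthread_mutexattr_destroy(&attr);
	sc->queue_initialised = 0;
	return (error);
}

/* Returns 1 if the calling thread already owns the error-checking mutex. */
int
io_lock_owned(struct sketch_softc *sc)
{
	int error = pthread_mutex_lock(&sc->io_lock);

	if (error == EDEADLK)
		return (1);
	if (error == 0)
		pthread_mutex_unlock(&sc->io_lock);
	return (0);
}

/* Problem shape: the callee takes the lock its caller already holds. */
int
init_queue_locking(struct sketch_softc *sc)
{
	int error = pthread_mutex_lock(&sc->io_lock);

	if (error != 0)
		return (error);		/* EDEADLK when called from reset_buggy() */
	sc->queue_initialised = 1;
	pthread_mutex_unlock(&sc->io_lock);
	return (0);
}

/* One possible fix: the callee requires the lock to be held on entry. */
void
init_queue_locked(struct sketch_softc *sc)
{
	assert(io_lock_owned(sc));
	sc->queue_initialised = 1;
}

/* Reset path that holds the lock across the whole call chain. */
int
reset_buggy(struct sketch_softc *sc)
{
	int error;

	pthread_mutex_lock(&sc->io_lock);
	error = init_queue_locking(sc);
	pthread_mutex_unlock(&sc->io_lock);
	return (error);
}

int
reset_fixed(struct sketch_softc *sc)
{
	pthread_mutex_lock(&sc->io_lock);
	init_queue_locked(sc);
	pthread_mutex_unlock(&sc->io_lock);
	return (0);
}
```

In the kernel the equivalent ownership check would be mtx_assert(9) with MA_OWNED. Whether the actual patch makes the callee lock-free or drops the lock in the caller is not something this sketch claims to show.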
OK, two patches attached.

== zz-mfi-lock.patch ==
Fixes mfi panic on recursed on non-recursive mutex MFI I/O lock.

Removes a mtx_unlock call for mfi_io_lock, which is never acquired.

== zz-mfi-queue.patch ==
Fixes queuing issues where mfi_release_command blindly sets
cm_flags = 0 without first removing the command from the relevant
queue. This was causing panics in the queue functions, which check to
ensure a command is not on another queue.

Also fixed some cases where the error from mfi_mapcmd was lost and
where the command was never released / dequeued in error cases.

Ensure that all failures of mfi_mapcmd are logged.

Fixed possible NULL pointer dereference in mfi_aen_setup if
mfi_get_log_state failed.

Fixed mfi_parse_entries & mfi_aen_setup not returning possible errors.

Corrected MFI_DUMP_CMDS calls with invalid vars (SC vs sc).

Commands which have timed out now set cm_error to ETIMEDOUT and call
mfi_complete, which prevents them getting stuck in the busy queue
forever.

Fixed possible use of NULL pointer in mfi_tbolt_get_cmd.

Changed output formats to be more easily recognisable when debugging.

A few style(9) fixes, e.g. braced single line conditions and double
blank lines.

----------

I've just had another panic, trace below, but it doesn't seem to be
related to my changes, so I'd appreciate your feedback on them as they
are for now.

While the lock patch fixes the problems I've seen, it's not clear to me
why mfi_tbolt_reset is acquiring the lock and hence requiring
mfi_process_fw_state_chg_isr to jump through hoops to ensure locking
around queue manipulation is done correctly. Given what it's doing
(resetting the entire adapter) I wouldn't be surprised if it should
really be acquiring the config lock.

Other things I've noticed / questions:
* Should mfi_abort sleep even if its call to mfi_mapcmd fails?
* Should mfi_get_controller_info really ignore the error from
  mfi_mapcmd?
* Do these controllers not support non-512-byte requests?
  Currently all syspd requests are done assuming 512 byte sectors,
  which the disk may not have. This will either reduce performance or
  potentially break totally if the firmware isn't translating it under
  the surface correctly.

Anyway, the new panic, manually transcribed, is:

panic: Bad link elm 0xffffff0069b0fc0 next->prev != elm
...
mfi_tbolt_get_cmd()
mfi_build_mpt_pass_thru()
mfi_tbolt_build_mpt_cmd()
mfi_tbolt_send_frame()
bus_dmamap_load()
mfi_mapcmd()
mfi_startio()
mfi_syspd_strategy()
g_disk_start()
g_io_schedule_down()
g_down_proc_body()
fork_exit()
fork_trampoline()

Looks like mfi_cmd_tbolt_tqh has become corrupted somehow, but as far
as I can tell all manipulation is done using the TAILQ macros and under
mfi_io_lock, so it's not obvious to me at this time why this is. Any
ideas?

There was an obvious error in mfi_tbolt_get_cmd, which is now fixed in
the queue patch, where cmd could be used even if the queue was empty
and TAILQ_FIRST returned NULL, but I can't see this causing this panic.

This is running with a debug kernel with:
options WITNESS
options INVARIANTS
options INVARIANT_SUPPORT
options DDB
options GDB
options PRINTF_BUFR_SIZE=2048
options MFI_DEBUG

Unfortunately I've only got this hardware till Friday, so any ideas
would be most appreciated so I can get testing done before then.

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
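
For readers following the queue fixes described in the message above, here is a minimal userland sketch of the two patterns involved: dequeuing a command before its flags are cleared on release, and checking TAILQ_FIRST for NULL before using the result. It uses the TAILQ macros from <sys/queue.h>. Every sk_* name and flag value is hypothetical and stands in for the driver's command, queue and cm_flags handling. It is not the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical queue-membership flags, standing in for bits in cm_flags. */
#define SK_ON_FREE	0x1u
#define SK_ON_BUSY	0x2u
#define SK_ON_ANY	(SK_ON_FREE | SK_ON_BUSY)

struct sk_cmd {
	TAILQ_ENTRY(sk_cmd) link;
	unsigned int flags;
	int error;
};
TAILQ_HEAD(sk_cmdq, sk_cmd);

struct sk_softc {
	struct sk_cmdq free_q;
	struct sk_cmdq busy_q;
};

void
sk_init(struct sk_softc *sc)
{
	TAILQ_INIT(&sc->free_q);
	TAILQ_INIT(&sc->busy_q);
}

void
sk_enqueue_busy(struct sk_softc *sc, struct sk_cmd *cm)
{
	/* Mirrors the driver's check that a command is on no other queue. */
	assert((cm->flags & SK_ON_ANY) == 0);
	TAILQ_INSERT_TAIL(&sc->busy_q, cm, link);
	cm->flags |= SK_ON_BUSY;
}

/*
 * Release a command. The flags are the only record of which queue the
 * command is linked on, so dequeue first and clear the flags afterwards.
 * Clearing them first would leave the command linked on the busy queue
 * while claiming to be on none.
 */
void
sk_release(struct sk_softc *sc, struct sk_cmd *cm)
{
	if (cm->flags & SK_ON_FREE)
		return;			/* already released */
	if (cm->flags & SK_ON_BUSY)
		TAILQ_REMOVE(&sc->busy_q, cm, link);
	cm->flags = 0;
	cm->error = 0;
	TAILQ_INSERT_TAIL(&sc->free_q, cm, link);
	cm->flags |= SK_ON_FREE;
}

/* Take a command from the free queue, or return NULL if there is none. */
struct sk_cmd *
sk_get_free(struct sk_softc *sc)
{
	struct sk_cmd *cm = TAILQ_FIRST(&sc->free_q);

	if (cm == NULL)
		return (NULL);		/* empty queue: cm must not be touched */
	TAILQ_REMOVE(&sc->free_q, cm, link);
	cm->flags &= ~SK_ON_FREE;
	return (cm);
}
```

In the driver all of this would run with mfi_io_lock held. The sketch is single-threaded and leaves locking out to keep the queue logic visible.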
zz-mfi-lock.patch
Description: Binary data
zz-mfi-queue.patch
Description: Binary data
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"