On Tue, Mar 10, 2015 at 01:34:14AM +0100, Thorsten Bschorr wrote:
> It looks like your patch runs into deadlock problems:
I was testing reading and removing the slave; I didn't test two
readers, so I'll keep that in mind.  For some reason I was thinking it
was a try lock after the sleep; it isn't.  A signal can cause it to
return a value, but yes, that's not going to work.  I have another
version that uses an atomic counter in place of this mutex; it
requires a loop in the remove-slave path.

I have a one-wire network with 14 temperature sensors on it.  What I
do is run a program that talks to the kernel over netlink, which is
now non-blocking for one-wire after an earlier set of changes I made.
It sends out all the conversion requests, waits, sends the read
request packets, then gets back the results and relays them to the
program that requested them.  It doesn't use w1_therm and avoids all
these problems.  If I read the sensors sequentially with w1_therm,
that would take more than 14 * 0.750, or 10.5 seconds, and there's no
way with w1_therm to read them all at once without having 14 threads
or programs doing so.  The problem is that netlink takes a lot more
programming.

> I have a cron job reading my sensors. If I read the sensors on another
> thread (e.g. via cat), the 2nd thread can produce a dead lock:
>
> * thread 1 has bus & sl lock
> * thread 1 drops bus lock, keeps sl lock and sleeps
> * thread 2 gets bus lock, waits for sl lock
> * thread 1 returns from sleep, waits for bus lock, but this is held
>   by thread 2
>
> Acquiring the sl lock before the bus lock prevents this dead lock (no
> change of locking-order).
> With search enabled, removing a device by w1_search_process_cb then
> also worked as intended:
>
> Mar 10 01:29:04 pi kernel: [  924.870893] w1_therm_remove_slave about to lock faily_data->lock
> Mar 10 01:29:04 pi kernel: [  924.870935] w1_therm_remove_slave unlocked faily_data->lock
>
> Mar 10 01:29:04 pi kernel: [  924.871115] w1_therm_remove_slave about to lock faily_data->lock
> Mar 10 01:29:05 pi kernel: [  925.151242] start sleep d117a600 refcnt 1 **
> Mar 10 01:29:05 pi kernel: [  925.151277] end sleep d117a600 refcnt 0 **
> Mar 10 01:29:11 pi kernel: [  931.295344] w1_slave_driver 10-000802cc045a: Read failed CRC check
> Mar 10 01:29:11 pi kernel: [  931.295437] w1_therm_remove_slave unlocked faily_data->lock
>
> ** I get the ref-cnt before and after the sleep and only log if they
> differ.
>
> But I am unable to judge if moving the sl lock to the beginning of
> w1_slave_show can cause dead locks in other scenarios.

I'm not sure.  I would probably switch back to the reference-counting
version I wrote earlier, or make the bus mutex lock a timed lock or do
a try lock first.

-- 
David Fries <da...@fries.net>    PGP pub CB1EE8F0
http://fries.net/~david/