>> Mar 02 11:40:35 localhost.localdomain multipathd[85474]: directio
>> checker refcount 6
>> Mar 02 11:40:35 localhost.localdomain multipathd[85474]: lxk free tur
>> checker //checker_put
>
>
> So we do not see "unloading tur checker". Like you said, that suggests
> that the crash occurs between libcheck_free() and the thread exiting.
"lxk free tur checker" is added in free_checker(), which is called by
checker_put(). I didn't change the log level of "unloading tur checker",
so we don't see it.
@@ -58,7 +58,7 @@ void free_checker (struct checker * c)
return;
c->refcount--;
if (c->refcount) {
- condlog(3, "%s checker refcount %d",
+ condlog(2, "%s checker refcount %d",
c->name, c->refcount);
return;
}
@@ -77,6 +77,7 @@ void free_checker (struct checker * c)
pthread_join(ct->thread, NULL);
};
}
+ condlog(2, "lxk free %s checker", c->name);
FREE(c);
}
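To illustrate why the level matters, here is a minimal sketch of
level-filtered logging. It assumes a verbosity of 2 and that the
"unloading" message is logged at level 3, as implied above.
demo_condlog() and demo_verbosity are hypothetical names for this
illustration only; this is not the real condlog() implementation.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the configured verbosity (assumed to be 2). */
static int demo_verbosity = 2;

/*
 * Hypothetical stand-in for condlog(): print the message only if its
 * level does not exceed the verbosity.
 * Returns 1 if the message was printed, 0 if it was filtered out.
 */
int demo_condlog(int prio, const char *fmt, ...)
{
	va_list ap;

	if (prio > demo_verbosity)
		return 0;	/* too verbose for the current setting */

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return 1;
}
```

With these assumptions, demo_condlog(2, "lxk free %s checker", "tur")
is printed, while demo_condlog(3, "unloading %s checker", "tur") is
silently dropped.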
> I suggest you put a message in tur.c:libcheck_free (), AFTER the call
> to cleanup_context(), printing the values of "running" and "holders"
> Anyway:
>
> holders = uatomic_sub_return(&ct->holders, 1);
> if (!holders)
> cleanup_context(ct);
>
> Whatever mistakes we have made, only one actor can have seen
> holders == 0, and have called cleanup_context().
>
diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 4ea63af..900f960 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -105,8 +105,11 @@ void libcheck_free (struct checker * c)
pthread_cancel(ct->thread);
ct->thread = 0;
holders = uatomic_sub_return(&ct->holders, 1);
- if (!holders)
+ if (!holders) {
+ running = uatomic_xchg(&ct->running, 0);
cleanup_context(ct);
+ condlog(2, "lxk tur running is %d", running);
+ }
c->context = NULL;
}
return;
Here I added a print of "running", but it is zero.
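
For reference, the "only one actor can have seen holders == 0" argument
quoted above can be sketched as follows. This is only an illustration
under stated assumptions: it uses C11 atomics instead of liburcu's
uatomic_sub_return(), and struct demo_context, drop_reference() and
run_demo() are hypothetical names, not code from tur.c.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NUM_ACTORS 8

/* Hypothetical stand-in for the checker context; not the real struct. */
struct demo_context {
	atomic_int holders;	/* actors still referencing the context */
	atomic_int cleanups;	/* how often the cleanup path was entered */
};

/* Each actor drops one reference; only the one that sees 0 cleans up. */
void *drop_reference(void *arg)
{
	struct demo_context *ctx = arg;
	/* atomic_fetch_sub() returns the old value, so old - 1 is the new count */
	int holders = atomic_fetch_sub(&ctx->holders, 1) - 1;

	if (!holders)
		atomic_fetch_add(&ctx->cleanups, 1);	/* stands in for cleanup_context() */
	return NULL;
}

/* Let NUM_ACTORS actors race to drop their reference; return the cleanup count. */
int run_demo(void)
{
	struct demo_context ctx;
	pthread_t threads[NUM_ACTORS];
	int started[NUM_ACTORS];
	int i;

	atomic_init(&ctx.holders, NUM_ACTORS);
	atomic_init(&ctx.cleanups, 0);

	for (i = 0; i < NUM_ACTORS; i++) {
		started[i] = !pthread_create(&threads[i], NULL,
					     drop_reference, &ctx);
		if (!started[i])
			drop_reference(&ctx);	/* fall back to dropping it inline */
	}
	for (i = 0; i < NUM_ACTORS; i++)
		if (started[i])
			pthread_join(threads[i], NULL);

	return atomic_load(&ctx.cleanups);
}
```

Because the decrement and the read of the new value are a single atomic
operation, run_demo() returns 1 no matter how the threads interleave.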
> The stacks you have shown indicate that the instruction pointers were
> broken. That would suggest something similar as discussed in the ML
> thread leading to 38ffd89 ("libmultipath: prevent DSO unloading with
> astray checker threads"). Your logs show "tur checker refcount 1", so
> the next call to checker_put would have unloaded the DSO.
Here I tested the 0.8.5 master code with commit 38ffd89. There was no
crash in five hours (without the patch, the crash happens within 30 to
40 minutes of running the test script).
Regards,
Lixiaokeng
--
dm-devel mailing list
[email protected]
https://listman.redhat.com/mailman/listinfo/dm-devel