Am 05.03.2018 um 18:22 schrieb Felix Kuehling:
[SNIP]
I've been looking for documentation about the expected behaviour for
system calls interrupted by signals. All I can find is this statement in
Documentation/kernel-hacking/hacking.rst:
After you slept you should check if a signal occurred: the Unix/Linux
way of handling signals is to temporarily exit the system call with the
``-ERESTARTSYS`` error. The system call entry code will switch back to
user context, process the signal handler and then your system call will
be restarted (unless the user disabled that). So you should be prepared
to process the restart, e.g. if you're in the middle of manipulating
some data structure.
The same paragraph is also copied in
https://www.kernel.org/doc/htmldocs/kernel-hacking/ioctls.html.
The key sentence is "should be prepared to process the restart, e.g. if
you're in the middle of manipulating some data structure". Rolling back
everything would achieve that, but it's not the only way and not the
most efficient way.
In this case, I'm handling the restart by checking whether the VM is
already a compute VM. So on the restart the conversion to a compute VM
becomes a no-op.
Similarly, in the map_memory_to_gpu ioctl, mapping on a GPU where the
memory is already mapped becomes a no-op, which handles the restart
case. In this case we even added a specific test for this condition in
kfdtest, where we deliberately interrupt a big mapping operation with a
signal.
This approach is a clear no-go.
Changes done before interrupting a system call doesn't necessary need
to be reverted, but only if they don't have a user visible effect.
First of all, I don't see that requirement document anywhere, except in
emails from you. The documentation that I've been able to find states no
such requirement. Can you point me to where this rule is documented?
That is documented in mail archives as far as I know, but I clearly
remember that Daniel tried to write down documentation about that not so
long ago. Maybe I can dig up the mails about that.
Second, the way I see it, a mapping that is not complete has no user
visible effect. Before the mapping, the behaviour of accessing the
mapping is undefined. After an interrupted mapping, the behaviour is
still undefined.
That is a possibility, but I'm not sure if that counts here.
For example: When we map something we need to allocate multiple page
directories. It is not necessary to release all previous allocated
ones if an allocation suddenly says that it was interrupted, but only
because those page directories are not user accessible.
Similar exceptions can be done by avoiding things when the IOCTL was
interrupted. E.g. in the VA IOCTL we usually directly submit updating
the page tables, but if that is interrupted we just return success
instead of restarting the IOCTL and wait with the update until the
first command submission.
The easiest way to work around this to to add a separate IOCTL which
waits for VM updates to complete. Should be trivial to use
vm->last_update for this.
For the cost of an extra system call. That means every GPU mapping
operation now requires two system calls. With Spectre workarounds, that
can be a lot more expensive.
There is still very little overhead associated with that.
And additional to this as long you actually need to wait it doesn't
matter if you wait directly in one system call or if you split it into
two. The time from the initial call to the end of the operation stays
the same.
You can then also pipeline multiple mappings and wait for all of them to
finish in the end.
Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx