[PATCH v2] selftests/futex: Create test for robust list
Create a test for the robust list mechanism. Signed-off-by: André Almeida --- Changes from v1: - Change futex type from int to _Atomic(unsigned int) - Use old futex(FUTEX_WAIT) instead of the new sys_futex_wait() --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 3 +- .../selftests/futex/functional/robust_list.c | 448 ++ 3 files changed, 451 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/futex/functional/robust_list.c diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore index fbcbdb6963b3..4726e1be7497 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -9,3 +9,4 @@ futex_wait_wouldblock futex_wait futex_requeue futex_waitv +robust_list diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile index f79f9bac7918..b8635a1ac7f6 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -17,7 +17,8 @@ TEST_GEN_PROGS := \ futex_wait_private_mapped_file \ futex_wait \ futex_requeue \ - futex_waitv + futex_waitv \ + robust_list TEST_PROGS := run.sh diff --git a/tools/testing/selftests/futex/functional/robust_list.c b/tools/testing/selftests/futex/functional/robust_list.c new file mode 100644 index ..9308eb189d48 --- /dev/null +++ b/tools/testing/selftests/futex/functional/robust_list.c @@ -0,0 +1,448 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2024 Igalia S.L. + * + * Robust list test by André Almeida + * + * The robust list uAPI allows userspace to create "robust" locks, in the sense + * that if the lock holder thread dies, the remaining threads that are waiting + * for the lock won't block forever, waiting for a lock that will never be + * released. + * + * This is achieve by userspace setting a list where a thread can enter all the + * locks (futexes) that it is holding. The robust list is a linked list, and + * userspace register the start of the list with the syscall set_robust_list(). + * If such thread eventually dies, the kernel will walk this list, waking up one + * thread waiting for each futex and marking the futex word with the flag + * FUTEX_OWNER_DIED. + * + * See also + * man set_robust_list + * Documententation/locking/robust-futex-ABI.rst + * Documententation/locking/robust-futexes.rst + */ + +#define _GNU_SOURCE + +#include "../../kselftest_harness.h" + +#include "futextest.h" + +#include +#include +#include + +#define STACK_SIZE (1024 * 1024) + +#define FUTEX_TIMEOUT 3 + +static pthread_barrier_t barrier, barrier2; + +int set_robust_list(struct robust_list_head *head, size_t len) +{ + return syscall(SYS_set_robust_list, head, len); +} + +int get_robust_list(int pid, struct robust_list_head **head, size_t *len_ptr) +{ + return syscall(SYS_get_robust_list, pid, head, len_ptr); +} + +/* + * Basic lock struct, contains just the futex word and the robust list element + * Real implementations have also a *prev to easily walk in the list + */ +struct lock_struct { + _Atomic(unsigned int) futex; + struct robust_list list; +}; + +/* + * Helper function to spawn a child thread. Returns -1 on error, pid on success + */ +static int create_child(int (*fn)(void *arg), void *arg) +{ + char *stack; + pid_t pid; + + stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE, +MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); + if (stack == MAP_FAILED) + return -1; + + stack += STACK_SIZE; + + pid = clone(fn, stack, CLONE_VM | SIGCHLD, arg); + + if (pid == -1) + return -1; + + return pid; +} + +/* + * Helper function to prepare and register a robust list + */ +static int set_list(struct robust_list_head *head) +{ + int ret; + + ret = set_robust_list(head, sizeof(struct robust_list_head)); + if (ret) + return ret; + + head->futex_offset = (size_t) offsetof(struct lock_struct, futex) - +(size_t) offsetof(struct lock_struct, list); + head->list.next = &head->list; + head->list_op_pending = NULL; + + return 0; +} + +/* + * A basic (and incomplete) mutex lock function with robustness + */ +static int mutex_lock(struct lock_struct *lock, struct robust_list_head *head, bool error_inject) +{ + _Atomic(unsigned int) *futex = &lock->futex; + int zero = 0, ret = -1; + pid_t tid = gettid(); + + /* +* Set list_op_pending before starting the lock, so the kernel can catch +* the case where the thread died during the lock operation +*/
Re: [PATCH RESEND] selftests/futex: Order calls in futex_requeue
Hi Edward, Thanks for your patch! Em 03/09/2024 17:39, Edward Liaw escreveu: Similar to fbf4dec70277 ("selftests/futex: Order calls to futex_lock_pi"), which fixed a flake in futex_lock_pi due to racing between the parent and child threads. The same issue can occur in the futex_requeue test, because it expects waiterfn to make progress to futex_wait before the parent starts to requeue. This is mitigated by the parent sleeping for WAKE_WAIT_US, but it still fails occasionally. This can be reproduced by adding a sleep in the waiterfn before futex_wait: TAP version 13 1..2 not ok 1 futex_requeue simple returned: 0 not ok 2 futex_requeue simple returned: 0 not ok 3 futex_requeue many returned: 0 not ok 4 futex_requeue many returned: 0 Instead, replace the sleep with barriers to make the sequencing explicit. Reviewed-by: André Almeida
[RFC PATCH 00/13] Add futex2 syscalls
of the interface and shouldn't be merged as it is. * Testing: This patchset provides selftests for each operation and their flags. Along with that, the following work was done: ** Stability To stress the interface in "real world scenarios": - glibc[4]: nptl's low level locking was modified to use futex2 API (except for robust and PI things). All relevant nptl/ tests passed. - Wine[5]: Proton/Wine was modified in order to use futex2() for the emulation of Windows NT sync mechanisms based on futex, called "fsync". Triple-A games with huge CPU's loads and tons of parallel jobs worked as expected when compared with the previous FUTEX_WAIT_MULTIPLE implementation at futex(). Some games issue 42k futex2() calls per second. - Full GNU/Linux distro: I installed the modified glibc in my host machine, so all pthread's programs would use futex2(). After tweaking systemd[6] to allow futex2() calls at seccomp, everything worked as expected (web browsers do some syscall sandboxing and need some configuration as well). - perf: The perf benchmarks tests can also be used to stress the interface, and they can be found in this patchset. ** Performance - For comparing futex() and futex2() performance, I used the artificial benchmarks implemented at perf (wake, wake-parallel, hash and requeue). The setup was 200 runs for each test and using 8, 80, 800, 8000 for the number of threads, Note that for this test, I'm not using patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained at "The patchset" section. - For the first three ones, I measured an average of 4% gain in performance. This is not a big step, but it shows that the new interface is at least comparable in performance with the current one. - For requeue, I measured an average of 21% decrease in performance compared to the original futex implementation. This is expected given the new design with individual flags. The performance trade-offs are explained at patch 4 ("futex2: Implement requeue operation"). [4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2 [5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13 [6] https://gitlab.collabora.com/tonyk/systemd * FAQ ** "Where's the code for NUMA and FUTEX_8/16?" The current code is already complex enough to take some time for review, so I believe it's better to split that work out to a future iteration of this patchset. Besides that, this RFC is the core part of the infrastructure, and the following features will not pose big design changes to it, the work will be more about wiring up the flags and modifying some functions. ** "And what's about FUTEX_64?" By supporting 64 bit futexes, the kernel structure for futex would need to have a 64 bit field for the value, and that could defeat one of the purposes of having different sized futexes in the first place: supporting smaller ones to decrease memory usage. This might be something that could be disabled for 32bit archs (and even for CONFIG_BASE_SMALL). Which use case would benefit for FUTEX_64? Does it worth the trade-offs? ** "Where's the PI/robust stuff?" As said by Peter Zijlstra at [3], all those new features are related to the "simple" futex interface, that doesn't use PI or robust. Do we want to have this complexity at futex2() and if so, should it be part of this patchset or can it be future work? Thanks, André André Almeida (13): futex2: Implement wait and wake functions futex2: Add support for shared futexes futex2: Implement vectorized wait futex2: Implement requeue operation futex2: Add compatibility entry point for x86_x32 ABI docs: locking: futex2: Add documentation selftests: futex2: Add wake/wait test selftests: futex2: Add timeout test selftests: futex2: Add wouldblock test selftests: futex2: Add waitv test selftests: futex2: Add requeue test perf bench: Add futex2 benchmark tests kernel: Enable waitpid() for futex2 Documentation/locking/futex2.rst | 198 +++ Documentation/locking/index.rst |1 + MAINTAINERS |2 +- arch/arm/tools/syscall.tbl|4 + arch/arm64/include/asm/unistd.h |2 +- arch/arm64/include/asm/unistd32.h |4 + arch/x86/entry/syscalls/syscall_32.tbl|4 + arch/x86/entry/syscalls/syscall_64.tbl|4 + fs/inode.c|1 + include/linux/compat.h| 23 + include/linux/fs.h|1 + include/linux/syscalls.h | 18 + include/uapi/asm-generic/unistd.h | 14 +- include/uapi/linux/futex.h| 56 + init/Kconfig
[RFC PATCH 01/13] futex2: Implement wait and wake functions
Create a new set of futex syscalls known as futex2. This new interface is aimed to implement a more maintainable code, while removing obsolete features and expanding it with new functionalities. Implements wait and wake semantics for futexes, along with the base infrastructure for future operations. The whole wait path is designed to be used by N waiters, thus making easier to implement vectorized wait. * Syscalls implemented by this patch: - futex_wait(void *uaddr, unsigned int val, unsigned int flags, struct timespec *timo) The user thread is put to sleep, waiting for a futex_wake() at uaddr, if the value at *uaddr is the same as val (otherwise, the syscall returns immediately with -EAGAIN). timo is an optional timeout value for the operation. Return 0 on success, error code otherwise. - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags) Wake `nr_wake` threads waiting at uaddr. Return the number of woken threads on success, error code otherwise. ** The `flag` argument The flag is used to specify the size of the futex word (FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no default size. By default, the timeout uses a monotonic clock, but can be used as a realtime one by using the FUTEX_REALTIME_CLOCK flag. By default, futexes are of the private type, that means that this user address will be accessed by threads that shares the same memory region. This allows for some internal optimizations, so they are faster. However, if the address needs to be shared with different processes (like using `mmap()` or `shm()`), they need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. By default, the operation has no NUMA-awareness, meaning that the user can't choose the memory node where the kernel side futex data will be stored. The user can choose the node where it wants to operate by setting the FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or 32): struct futexX_numa { __uX value; __sX hint; }; This structure should be passed at the `void *uaddr` of futex functions. The address of the structure will be used to be waited/waken on, and the `value` will be compared to `val` as usual. The `hint` member is used to defined which node the futex will use. When waiting, the futex will be registered on a kernel-side table stored on that node; when waking, the futex will be searched for on that given table. That means that there's no redundancy between tables, and the wrong `hint` value will led to undesired behavior. Userspace is responsible for dealing with node migrations issues that may occur. `hint` can range from [0, MAX_NUMA_NODES], for specifying a node, or -1, to use the same node the current process is using. When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a global table on some node, defined at compilation time. ** The `timo` argument As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout options known to be buggy. Given that, `timo` should be a 64bit timeout at all platforms, using an absolute timeout value. Signed-off-by: André Almeida --- MAINTAINERS | 2 +- arch/arm/tools/syscall.tbl| 2 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 4 + arch/x86/entry/syscalls/syscall_32.tbl| 2 + arch/x86/entry/syscalls/syscall_64.tbl| 2 + include/linux/syscalls.h | 7 + include/uapi/asm-generic/unistd.h | 8 +- include/uapi/linux/futex.h| 56 ++ init/Kconfig | 7 + kernel/Makefile | 1 + kernel/futex2.c | 625 ++ kernel/sys_ni.c | 4 + tools/include/uapi/asm-generic/unistd.h | 8 +- .../arch/x86/entry/syscalls/syscall_64.tbl| 2 + 15 files changed, 728 insertions(+), 4 deletions(-) create mode 100644 kernel/futex2.c diff --git a/MAINTAINERS b/MAINTAINERS index bfc1b86e3e73..86ed91b72aad 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7332,7 +7332,7 @@ F:Documentation/locking/*futex* F: include/asm-generic/futex.h F: include/linux/futex.h F: include/uapi/linux/futex.h -F: kernel/futex.c +F: kernel/futex* F: tools/perf/bench/futex* F: tools/testing/selftests/futex/ diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 20e1170e2e0a..4eef220cd2a2 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -455,3 +455,5 @@ 439common faccessat2 sys_faccessat2 440common process_madvise sys_process_madvise 441common epoll_pwait2sys_epoll_pwait2 +
[RFC PATCH 02/13] futex2: Add support for shared futexes
Add support for shared futexes for cross-process resources. This design relies on the same approach done in old futex to create an unique id for file-backed shared memory, by using a counter at struct inode. There are two types of futexes: private and shared ones. The private are futexes meant to be used by threads that shares the same memory space, are easier to be uniquely identified an thus can have some performance optimization. The elements for identifying one are: the start address of the page where the address is, the address offset within the page and the current->mm pointer. Now, for uniquely identifying shared futex: - If the page containing the user address is an anonymous page, we can just use the same data used for private futexes (the start address of the page, the address offset within the page and the current->mm pointer) that will be enough for uniquely identifying such futex. We also set one bit at the key to differentiate if a private futex is used on the same address (mixing shared and private calls are not allowed). - If the page is file-backed, current->mm maybe isn't the same one for every user of this futex, so we need to use other data: the page->index, an UUID for the struct inode and the offset within the page. Note that members of futex_key doesn't have any particular meaning after they are part of the struct - they are just bytes to identify a futex. Given that, we don't need to use a particular name or type that matches the original data, we only need to care about the bitsize of each component and make both private and shared data fit in the same memory space. Signed-off-by: André Almeida --- fs/inode.c | 1 + include/linux/fs.h | 1 + kernel/futex2.c| 220 +++-- 3 files changed, 217 insertions(+), 5 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 6442d97d9a4a..886fe11ccdaf 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -139,6 +139,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode) inode->i_blkbits = sb->s_blocksize_bits; inode->i_flags = 0; atomic64_set(&inode->i_sequence, 0); + atomic64_set(&inode->i_sequence2, 0); atomic_set(&inode->i_count, 1); inode->i_op = &empty_iops; inode->i_fop = &no_open_fops; diff --git a/include/linux/fs.h b/include/linux/fs.h index fd47deea7c17..516bda982b3d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -681,6 +681,7 @@ struct inode { }; atomic64_t i_version; atomic64_t i_sequence; /* see futex */ + atomic64_t i_sequence2; /* see futex2 */ atomic_ti_count; atomic_ti_dio_count; atomic_ti_writecount; diff --git a/kernel/futex2.c b/kernel/futex2.c index 802578ad695b..27767b2d060a 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c @@ -14,8 +14,10 @@ */ #include +#include #include #include +#include #include #include #include @@ -23,8 +25,8 @@ /** * struct futex_key - Components to build unique key for a futex - * @pointer: Pointer to current->mm - * @index: Start address of the page containing futex + * @pointer: Pointer to current->mm or inode's UUID for file backed futexes + * @index: Start address of the page containing futex or index of the page * @offset: Address offset of uaddr in a page */ struct futex_key { @@ -97,6 +99,11 @@ struct futex_single_waiter { /* Mask for each futex in futex_waitv list */ #define FUTEXV_WAITER_MASK (FUTEX_SIZE_MASK | FUTEX_SHARED_FLAG) +#define is_object_shared ((futexv->objects[i].flags & FUTEX_SHARED_FLAG) ? true : false) + +#define FUT_OFF_INODE1 /* We set bit 0 if key has a reference on inode */ +#define FUT_OFF_MMSHARED 2 /* We set bit 1 if key has a reference on mm */ + struct futex_bucket *futex_table; unsigned int futex2_hashsize; @@ -143,16 +150,200 @@ static inline int bucket_get_waiters(struct futex_bucket *bucket) #endif } +/** + * futex_get_inode_uuid - Gets an UUID for an inode + * @inode: inode to get UUID + * + * Generate a machine wide unique identifier for this inode. + * + * This relies on u64 not wrapping in the life-time of the machine; which with + * 1ns resolution means almost 585 years. + * + * This further relies on the fact that a well formed program will not unmap + * the file while it has a (shared) futex waiting on it. This mapping will have + * a file reference which pins the mount and inode. + * + * If for some reason an inode gets evicted and read back in again, it will get + * a new sequence number and will _NOT_ match, even though it is the exact same + * file. + * + * It is important that match_futex() will never have a false-positive, esp. + * for PI futexes that can mess up the state. The above argues that false-negative
[RFC PATCH 03/13] futex2: Implement vectorized wait
Add support to wait on multiple futexes. This is the interface implemented by this syscall: futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes, unsigned int flags, struct timespec *timo) struct futex_waitv { void *uaddr; unsigned int val; unsigned int flags; }; Given an array of struct futex_waitv, wait on each uaddr. The thread wakes if a futex_wake() is performed at any uaddr. The syscall returns immediately if any waiter has *uaddr != val. *timo is an optional timeout value for the operation. The flags argument of the syscall should be used solely for specifying the timeout as realtime, if needed. Flags for shared futexes, sizes, etc. should be used on the individual flags of each waiter. Returns the array index of one of the awakened futexes. There’s no given information of how many were awakened, or any particular attribute of it (if it’s the first awakened, if it is of the smaller index...). Signed-off-by: André Almeida --- arch/arm/tools/syscall.tbl| 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/x86/entry/syscalls/syscall_32.tbl| 1 + arch/x86/entry/syscalls/syscall_64.tbl| 1 + include/linux/compat.h| 11 ++ include/linux/syscalls.h | 4 + include/uapi/asm-generic/unistd.h | 5 +- kernel/futex2.c | 171 ++ kernel/sys_ni.c | 1 + tools/include/uapi/asm-generic/unistd.h | 5 +- .../arch/x86/entry/syscalls/syscall_64.tbl| 1 + 11 files changed, 200 insertions(+), 3 deletions(-) diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 4eef220cd2a2..6d0f6626a166 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -457,3 +457,4 @@ 441common epoll_pwait2sys_epoll_pwait2 442common futex_wait sys_futex_wait 443common futex_wake sys_futex_wake +444common futex_waitv sys_futex_waitv diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index d1f7d35f986e..64ebdc1ec581 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 444 +#define __NR_compat_syscalls 445 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index ece90c8d9739..fe242fa0baea 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -448,3 +448,4 @@ 441i386epoll_pwait2sys_epoll_pwait2 compat_sys_epoll_pwait2 442i386futex_wait sys_futex_wait 443i386futex_wake sys_futex_wake +444i386futex_waitv sys_futex_waitv compat_sys_futex_waitv diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 72fb65ef996a..9d0f07e054d1 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -365,6 +365,7 @@ 441common epoll_pwait2sys_epoll_pwait2 442common futex_wait sys_futex_wait 443common futex_wake sys_futex_wake +444common futex_waitv sys_futex_waitv # # Due to a historical design error, certain syscalls are numbered differently diff --git a/include/linux/compat.h b/include/linux/compat.h index 6e65be753603..b63dc0899c2a 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -365,6 +365,12 @@ struct compat_robust_list_head { compat_uptr_t list_op_pending; }; +struct compat_futex_waitv { + compat_uptr_t uaddr; + compat_uint_t val; + compat_uint_t flags; +}; + #ifdef CONFIG_COMPAT_OLD_SIGACTION struct compat_old_sigaction { compat_uptr_t sa_handler; @@ -654,6 +660,11 @@ asmlinkage long compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr, compat_size_t __user *len_ptr); +/* kernel/futex2.c */ +asmlinkage long compat_sys_futex_waitv(struct compat_futex_waitv *waiters, + compat_uint_t nr_futexes, compat_uint_t flags + struct __kernel_timespec __user *timo); + /* kernel/itimer.c */ asmlinkage long compat_sys_getitimer(int which, struct old_itimerval32 __user *it); diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index bf146c2b0c77..7da1ceb3685d 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -68,6 +68,7
[RFC PATCH 04/13] futex2: Implement requeue operation
Implement requeue interface similarly to FUTEX_CMP_REQUEUE operation. This is the syscall implemented by this patch: futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2, unsigned int nr_wake, unsigned int nr_requeue, unsigned int cmpval, unsigned int flags) struct futex_requeue { void *uaddr; unsigned int flags; }; If (uaddr1->uaddr == cmpval), wake at uaddr1->uaddr a nr_wake number of waiters and then, remove a number of nr_requeue waiters at uaddr1->uaddr and add them to uaddr2->uaddr list. Each uaddr has its own set of flags, that must be defined at struct futex_requeue (such as size, shared, NUMA). The flags argument of the syscall is there just for the sake of extensibility, and right now it needs to be zero. Return the number of the woken futexes + the number of requeued ones on success, error code otherwise. Signed-off-by: André Almeida --- The original FUTEX_CMP_REQUEUE interfaces is such as follows: futex(*uaddr1, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, *uaddr2, cmpval); Given that when this interface was created they was only one type of futex (as opposed to futex2, where there is shared, sizes, and NUMA), there was no way to specify individual flags for uaddr1 and 2. When FUTEX_PRIVATE was implemented, a new opcode was created as well (FUTEX_CMP_REQUEUE_PRIVATE), but they apply both futexes, so they should be of the same type regarding private/shared. This imposes a limitation on the use cases of the operation, and to overcome that at futex2, `struct futex_requeue` was created, so one can set individual flags for each futex. This flexibility is a trade-off with performance, given that now we need to perform two extra copy_from_user(). One alternative would be to use the upper half of flags bits to the first one, and the bottom half for the second futex, but this would also impose limitations, given that we would limit by half the flags possibilities. If equal futexes are common enough, the following extension could be added to overcome the current performance: - A flag FUTEX_REQUEUE_EQUAL is added to futex2() flags; - If futex_requeue() see this flag, that means that both futexes uses the same set of attributes. - Then, the function parses the flags as of futex_wait/wake(). - *uaddr1 and *uaddr2 are used as void* (instead of struct futex_requeue) just like wait/wake(). In that way, we could avoid the copy_from_user(). --- arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h| 2 +- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + include/linux/compat.h | 12 ++ include/linux/syscalls.h | 5 + include/uapi/asm-generic/unistd.h | 5 +- kernel/futex2.c| 215 + kernel/sys_ni.c| 1 + 9 files changed, 241 insertions(+), 2 deletions(-) diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6d0f6626a166..9aa1088024b0 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -458,3 +458,4 @@ 442common futex_wait sys_futex_wait 443common futex_wake sys_futex_wake 444common futex_waitv sys_futex_waitv +445common futex_requeue sys_futex_requeue diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 64ebdc1ec581..d1cc2849dc00 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 445 +#define __NR_compat_syscalls 446 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index fe242fa0baea..0cd1df235e70 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -449,3 +449,4 @@ 442i386futex_wait sys_futex_wait 443i386futex_wake sys_futex_wake 444i386futex_waitv sys_futex_waitv compat_sys_futex_waitv +445i386futex_requeue sys_futex_requeue compat_sys_futex_requeue diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 9d0f07e054d1..abbfddcdb15d 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -366,6 +366,7 @@ 442common futex_wait sys_futex_wait 443common futex_wake sys_futex_wake 444common futex_waitv sys_futex_waitv +445common futex_requeue sys_futex_requeue # # Due to a historical design error, certain syscalls
[RFC PATCH 08/13] selftests: futex2: Add timeout test
Adapt existing futex wait timeout file to test the same mechanism for futex2. futex2 accepts only absolute 64bit timers, but supports both monotonic and realtime clocks. Signed-off-by: André Almeida --- .../futex/functional/futex_wait_timeout.c | 58 --- 1 file changed, 49 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/futex/functional/futex_wait_timeout.c b/tools/testing/selftests/futex/functional/futex_wait_timeout.c index ee55e6d389a3..b4dffe9e3b44 100644 --- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c +++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c @@ -11,6 +11,7 @@ * * HISTORY * 2009-Nov-6: Initial version by Darren Hart + * 2021-Feb-5: Add futex2 test by André * */ @@ -20,7 +21,7 @@ #include #include #include -#include "futextest.h" +#include "futex2test.h" #include "logging.h" #define TEST_NAME "futex-wait-timeout" @@ -40,7 +41,8 @@ void usage(char *prog) int main(int argc, char *argv[]) { futex_t f1 = FUTEX_INITIALIZER; - struct timespec to; + struct timespec to = {.tv_sec = 0, .tv_nsec = timeout_ns}; + struct timespec64 to64; int res, ret = RET_PASS; int c; @@ -65,22 +67,60 @@ int main(int argc, char *argv[]) } ksft_print_header(); - ksft_set_plan(1); + ksft_set_plan(3); ksft_print_msg("%s: Block on a futex and wait for timeout\n", basename(argv[0])); ksft_print_msg("\tArguments: timeout=%ldns\n", timeout_ns); - /* initialize timeout */ - to.tv_sec = 0; - to.tv_nsec = timeout_ns; - info("Calling futex_wait on f1: %u @ %p\n", f1, &f1); res = futex_wait(&f1, f1, &to, FUTEX_PRIVATE_FLAG); if (!res || errno != ETIMEDOUT) { - fail("futex_wait returned %d\n", ret < 0 ? errno : ret); + ksft_test_result_fail("futex_wait returned %d\n", ret < 0 ? errno : ret); + ret = RET_FAIL; + } else { + ksft_test_result_pass("futex_wait timeout succeeds\n"); + } + + /* setting absolute monotonic timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + info("Calling futex2_wait on f1: %u @ %p\n", f1, &f1); + res = futex2_wait(&f1, f1, FUTEX_32, &to64); + if (!res || errno != ETIMEDOUT) { + ksft_test_result_fail("futex2_wait monotonic returned %d\n", ret < 0 ? errno : ret); + ret = RET_FAIL; + } else { + ksft_test_result_pass("futex2_wait monotonic timeout succeeds\n"); + } + + /* setting absolute realtime timeout for futex2 */ + if (gettime64(CLOCK_REALTIME, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + info("Calling futex2_wait on f1: %u @ %p\n", f1, &f1); + res = futex2_wait(&f1, f1, FUTEX_32 | FUTEX_CLOCK_REALTIME, &to64); + if (!res || errno != ETIMEDOUT) { + ksft_test_result_fail("futex2_wait realtime returned %d\n", ret < 0 ? errno : ret); ret = RET_FAIL; + } else { + ksft_test_result_pass("futex2_wait realtime timeout succeeds\n"); } - print_result(TEST_NAME, ret); + ksft_print_cnts(); return ret; } -- 2.30.1
[RFC PATCH 06/13] docs: locking: futex2: Add documentation
Add a new documentation file specifying both userspace API and internal implementation details of futex2 syscalls. Signed-off-by: André Almeida --- Documentation/locking/futex2.rst | 198 +++ Documentation/locking/index.rst | 1 + 2 files changed, 199 insertions(+) create mode 100644 Documentation/locking/futex2.rst diff --git a/Documentation/locking/futex2.rst b/Documentation/locking/futex2.rst new file mode 100644 index ..edd47c22f2df --- /dev/null +++ b/Documentation/locking/futex2.rst @@ -0,0 +1,198 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +futex2 +== + +:Author: André Almeida + +futex, or fast user mutex, is a set of syscalls to allow the userspace to create +performant synchronization mechanisms, such as mutexes, semaphores and +conditional variables in userspace. C standard libraries, like glibc, uses it +as means to implements more high level interfaces like pthreads. + +The interface += + +uAPI functions +-- + +.. kernel-doc:: kernel/futex2.c + :identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue + +uAPI structures +--- + +.. kernel-doc:: include/uapi/linux/futex.h + +The ``flag`` argument +- + +The flag is used to specify the size of the futex word +(FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no +default size. + +By default, the timeout uses a monotonic clock, but can be used as a realtime +one by using the FUTEX_REALTIME_CLOCK flag. + +By default, futexes are of the private type, that means that this user address +will be accessed by threads that shares the same memory region. This allows for +some internal optimizations, so they are faster. However, if the address needs +to be shared with different processes (like using ``mmap()`` or ``shm()``), they +need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. + +By default, the operation has no NUMA-awareness, meaning that the user can't +choose the memory node where the kernel side futex data will be stored. The +user can choose the node where it wants to operate by setting the +FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or +32):: + + struct futexX_numa { + __uX value; + __sX hint; + }; + +This structure should be passed at the ``void *uaddr`` of futex functions. The +address of the structure will be used to be waited on/waken on, and the +``value`` will be compared to ``val`` as usual. The ``hint`` member is used to +defined which node the futex will use. When waiting, the futex will be +registered on a kernel-side table stored on that node; when waking, the futex +will be searched for on that given table. That means that there's no redundancy +between tables, and the wrong ``hint`` value will led to undesired behavior. +Userspace is responsible for dealing with node migrations issues that may +occur. ``hint`` can range from [0, MAX_NUMA_NODES], for specifying a node, or +-1, to use the same node the current process is using. + +When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a +global table on some node, defined at compilation time. + +The ``timo`` argument +- + +As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout +options known to be buggy. Given that, ``timo`` should be a 64bit timeout at +all platforms, using an absolute timeout value. + +Implementation +== + +The internal implementation follows a similar design to the original futex. +Given that we want to replicate the same external behavior of current futex, +this should be somewhat expected. + +Waiting +--- + +For the wait operations, they are all treated as if you want to wait on N +futexes, so the path for futex_wait and futex_waitv is the basically the same. +For both syscalls, the first step is to prepare an internal list for the list +of futexes to wait for (using struct futexv_head). For futex_wait() calls, this +list will have a single object. + +We have a hash table, were waiters register themselves before sleeping. Then, +the wake function checks this table looking for waiters at uaddr. The hash +bucket to be used is determined by a struct futex_key, that stores information +to uniquely identify an address from a given process. Given the huge address +space, there'll be hash collisions, so we store information to be later used on +collision treatment. + +First, for every futex we want to wait on, we check if (``*uaddr == val``). +This check is done holding the bucket lock, so we are correctly serialized with +any futex_wake() calls. If any waiter fails the check above, we dequeue all +futexes. The check (``*uaddr == val``) can fail for two reasons: + +- The values are different, and we return -EAGAIN. However, if while + dequeueing we found that some futex were awakened, we prioritize this + and return success. + +-
[RFC PATCH 07/13] selftests: futex2: Add wake/wait test
Add a simple file to test wake/wait mechanism using futex2 interface. Test three scenarios: using a common local int variable as private futex, a shm futex as shared futex and a file-backed shared memory as a shared futex. This should test all branches of futex_get_key(). Create helper files so more tests can evaluate futex2. While 32bit ABIs from glibc aren't yet able to use 64 bit sized time variables, add a temporary workaround that implements the required types and calls the appropriated syscalls, since futex2 doesn't supports 32 bit sized time. Signed-off-by: André Almeida --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 6 +- .../selftests/futex/functional/futex2_wait.c | 209 ++ .../testing/selftests/futex/functional/run.sh | 3 + .../selftests/futex/include/futex2test.h | 79 +++ 5 files changed, 296 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c create mode 100644 tools/testing/selftests/futex/include/futex2test.h diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore index 0efcd494daab..d61f1df94360 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -6,3 +6,4 @@ futex_wait_private_mapped_file futex_wait_timeout futex_wait_uninitialized_heap futex_wait_wouldblock +futex2_wait diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile index 23207829ec75..9b334f190759 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -1,10 +1,11 @@ # SPDX-License-Identifier: GPL-2.0 -INCLUDES := -I../include -I../../ +INCLUDES := -I../include -I../../ -I../../../../../usr/include/ CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE -pthread $(INCLUDES) LDLIBS := -lpthread -lrt HEADERS := \ ../include/futextest.h \ + ../include/futex2test.h \ ../include/atomic.h \ ../include/logging.h TEST_GEN_FILES := \ @@ -14,7 +15,8 @@ TEST_GEN_FILES := \ futex_requeue_pi_signal_restart \ futex_requeue_pi_mismatched_ops \ futex_wait_uninitialized_heap \ - futex_wait_private_mapped_file + futex_wait_private_mapped_file \ + futex2_wait TEST_PROGS := run.sh diff --git a/tools/testing/selftests/futex/functional/futex2_wait.c b/tools/testing/selftests/futex/functional/futex2_wait.c new file mode 100644 index ..4b5416585c79 --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_wait.c @@ -0,0 +1,209 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/** + * + * Copyright Collabora Ltd., 2021 + * + * DESCRIPTION + * Test wait/wake mechanism of futex2, using 32bit sized futexes. + * + * AUTHOR + * André Almeida + * + * HISTORY + * 2021-Feb-5: Initial version by André + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "futex2test.h" +#include "logging.h" + +#define TEST_NAME "futex2-wait" +#define timeout_ns 3000 +#define WAKE_WAIT_US 1 +#define SHM_PATH "futex2_shm_file" +futex_t *f1; + +void usage(char *prog) +{ + printf("Usage: %s\n", prog); + printf(" -cUse color\n"); + printf(" -hDisplay this help message\n"); + printf(" -v L Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n", + VQUIET, VCRITICAL, VINFO); +} + +void *waiterfn(void *arg) +{ + struct timespec64 to64; + unsigned int flags = 0; + + if (arg) + flags = *((unsigned int *) arg); + + /* setting absolute timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + if (futex2_wait(f1, *f1, FUTEX_32 | flags, &to64)) + printf("waiter failed errno %d\n", errno); + + return NULL; +} + +void *waitershm(void *arg) +{ + futex2_wait(arg, 0, FUTEX_32 | FUTEX_SHARED_FLAG, NULL); + + return NULL; +} + +int main(int argc, char *argv[]) +{ + pthread_t waiter; + unsigned int flags = FUTEX_SHARED_FLAG; + int res, ret = RET_PASS; + int c; + futex_t f_private = 0; + + f1 = &f_private; + + while ((c = getopt(argc, argv, "cht:v:")) != -1) { +
[RFC PATCH 05/13] futex2: Add compatibility entry point for x86_x32 ABI
New syscalls should use the same entry point for x86_64 and x86_x32 paths. Add a wrapper for x32 calls to use parse functions that assumes 32bit pointers. Signed-off-by: André Almeida --- kernel/futex2.c | 42 +++--- 1 file changed, 35 insertions(+), 7 deletions(-) diff --git a/kernel/futex2.c b/kernel/futex2.c index bad8c183cf9f..8a8b45f98d3b 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c @@ -23,6 +23,10 @@ #include #include +#ifdef CONFIG_X86_64 +#include +#endif + /** * struct futex_key - Components to build unique key for a futex * @pointer: Pointer to current->mm or inode's UUID for file backed futexes @@ -875,7 +879,16 @@ SYSCALL_DEFINE4(futex_waitv, struct futex_waitv __user *, waiters, futexv->hint = false; futexv->task = current; - ret = futex_parse_waitv(futexv, waiters, nr_futexes); +#ifdef CONFIG_X86_X32_ABI + if (in_x32_syscall()) { + ret = compat_futex_parse_waitv(futexv, (struct compat_futex_waitv *)waiters, + nr_futexes); + } else +#endif + { + ret = futex_parse_waitv(futexv, waiters, nr_futexes); + } + if (!ret) ret = futex_set_timer_and_wait(futexv, nr_futexes, timo, flags); @@ -1181,13 +1194,28 @@ SYSCALL_DEFINE6(futex_requeue, struct futex_requeue __user *, uaddr1, if (flags) return -EINVAL; - ret = futex_parse_requeue(&rq1, uaddr1, &shared1); - if (ret) - return ret; +#ifdef CONFIG_X86_X32_ABI + if (in_x32_syscall()) { + ret = compat_futex_parse_requeue(&rq1, (struct compat_futex_requeue *)uaddr1, +&shared1); + if (ret) + return ret; - ret = futex_parse_requeue(&rq2, uaddr2, &shared2); - if (ret) - return ret; + ret = compat_futex_parse_requeue(&rq2, (struct compat_futex_requeue *)uaddr2, +&shared2); + if (ret) + return ret; + } else +#endif + { + ret = futex_parse_requeue(&rq1, uaddr1, &shared1); + if (ret) + return ret; + + ret = futex_parse_requeue(&rq2, uaddr2, &shared2); + if (ret) + return ret; + } return __futex_requeue(rq1, rq2, nr_wake, nr_requeue, cmpval, shared1, shared2); } -- 2.30.1
[RFC PATCH 10/13] selftests: futex2: Add waitv test
Create a new file to test the waitv mechanism. Test both private and shared futexes. Wake the last futex in the array, and check if the return value from futex_waitv() is the right index. Signed-off-by: André Almeida --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 3 +- .../selftests/futex/functional/futex2_waitv.c | 157 ++ .../testing/selftests/futex/functional/run.sh | 3 + .../selftests/futex/include/futex2test.h | 26 +++ 5 files changed, 189 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore index d61f1df94360..d0b8f637b786 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -7,3 +7,4 @@ futex_wait_timeout futex_wait_uninitialized_heap futex_wait_wouldblock futex2_wait +futex2_waitv diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile index 9b334f190759..09c08ccdeaf2 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -16,7 +16,8 @@ TEST_GEN_FILES := \ futex_requeue_pi_mismatched_ops \ futex_wait_uninitialized_heap \ futex_wait_private_mapped_file \ - futex2_wait + futex2_wait \ + futex2_waitv TEST_PROGS := run.sh diff --git a/tools/testing/selftests/futex/functional/futex2_waitv.c b/tools/testing/selftests/futex/functional/futex2_waitv.c new file mode 100644 index ..2f81d296d95d --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_waitv.c @@ -0,0 +1,157 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/** + * + * Copyright Collabora Ltd., 2021 + * + * DESCRIPTION + * Test waitv/wake mechanism of futex2, using 32bit sized futexes. + * + * AUTHOR + * André Almeida + * + * HISTORY + * 2021-Feb-5: Initial version by André + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "futex2test.h" +#include "logging.h" + +#define TEST_NAME "futex2-wait" +#define timeout_ns 10 +#define WAKE_WAIT_US 1 +#define NR_FUTEXES 30 +struct futex_waitv waitv[NR_FUTEXES]; +u_int32_t futexes[NR_FUTEXES] = {0}; + +void usage(char *prog) +{ + printf("Usage: %s\n", prog); + printf(" -cUse color\n"); + printf(" -hDisplay this help message\n"); + printf(" -v L Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n", + VQUIET, VCRITICAL, VINFO); +} + +void *waiterfn(void *arg) +{ + struct timespec64 to64; + int res; + + /* setting absolute timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_sec++; + + res = futex2_waitv(waitv, NR_FUTEXES, 0, &to64); + if (res < 0) { + ksft_test_result_fail("futex2_waitv private returned: %d %s\n", + res ? errno : res, + res ? strerror(errno) : ""); + } else if (res != NR_FUTEXES - 1) { + ksft_test_result_fail("futex2_waitv private returned: %d %s\n", + res ? errno : res, + res ? strerror(errno) : ""); + } + + return NULL; +} + +int main(int argc, char *argv[]) +{ + pthread_t waiter; + int res, ret = RET_PASS; + int c, i; + + while ((c = getopt(argc, argv, "cht:v:")) != -1) { + switch (c) { + case 'c': + log_color(1); + break; + case 'h': + usage(basename(argv[0])); + exit(0); + case 'v': + log_verbosity(atoi(optarg)); + break; + default: + usage(basename(argv[0])); + exit(1); + } + } + + ksft_print_header(); + ksft_set_plan(2); + ksft_print_msg("%s: Test FUTEX2_WAITV\n", + basename(argv[0])); + + for (i = 0; i < NR_FUTEXES; i++) { + waitv[i].uaddr = &futexes[i]; + waitv[i].flags = FUTEX_32; + waitv[i].val = 0; + } + + /* Private waitv */ + if (pthread_create(&waiter, NULL, waite
[RFC PATCH 11/13] selftests: futex2: Add requeue test
Add testing for futex_requeue(). The first test just requeue from one waiter to another one, and wake it. The second performs both wake and requeue, and we check return values to see if the operation woke/requeued the expected number of waiters. Signed-off-by: André Almeida --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 3 +- .../futex/functional/futex2_requeue.c | 164 ++ .../selftests/futex/include/futex2test.h | 16 ++ 4 files changed, 183 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore index d0b8f637b786..af7557e821da 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -8,3 +8,4 @@ futex_wait_uninitialized_heap futex_wait_wouldblock futex2_wait futex2_waitv +futex2_requeue diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile index 09c08ccdeaf2..3ccb9ea58ddd 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -17,7 +17,8 @@ TEST_GEN_FILES := \ futex_wait_uninitialized_heap \ futex_wait_private_mapped_file \ futex2_wait \ - futex2_waitv + futex2_waitv \ + futex2_requeue TEST_PROGS := run.sh diff --git a/tools/testing/selftests/futex/functional/futex2_requeue.c b/tools/testing/selftests/futex/functional/futex2_requeue.c new file mode 100644 index ..1bc3704dc8c2 --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_requeue.c @@ -0,0 +1,164 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/** + * + * Copyright Collabora Ltd., 2021 + * + * DESCRIPTION + * Test requeue mechanism of futex2, using 32bit sized futexes. + * + * AUTHOR + * André Almeida + * + * HISTORY + * 2021-Feb-5: Initial version by André + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "futex2test.h" +#include "logging.h" + +#define TEST_NAME "futex2-wait" +#define timeout_ns 3000 +#define WAKE_WAIT_US 1 +volatile futex_t *f1; + +void usage(char *prog) +{ + printf("Usage: %s\n", prog); + printf(" -cUse color\n"); + printf(" -hDisplay this help message\n"); + printf(" -v L Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n", + VQUIET, VCRITICAL, VINFO); +} + +void *waiterfn(void *arg) +{ + struct timespec64 to64; + + /* setting absolute timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + if (futex2_wait(f1, *f1, FUTEX_32, &to64)) + printf("waiter failed errno %d\n", errno); + + return NULL; +} + +int main(int argc, char *argv[]) +{ + pthread_t waiter[10]; + int res, ret = RET_PASS; + int c, i; + volatile futex_t _f1 = 0; + volatile futex_t f2 = 0; + struct futex_requeue r1, r2; + + f1 = &_f1; + + r1.flags = FUTEX_32; + r2.flags = FUTEX_32; + + r1.uaddr = f1; + r2.uaddr = &f2; + + while ((c = getopt(argc, argv, "cht:v:")) != -1) { + switch (c) { + case 'c': + log_color(1); + break; + case 'h': + usage(basename(argv[0])); + exit(0); + case 'v': + log_verbosity(atoi(optarg)); + break; + default: + usage(basename(argv[0])); + exit(1); + } + } + + ksft_print_header(); + ksft_set_plan(2); + ksft_print_msg("%s: Test FUTEX2_REQUEUE\n", + basename(argv[0])); + + /* +* Requeue a waiter from f1 to f2, and wake f2. +*/ + if (pthread_create(&waiter[0], NULL, waiterfn, NULL)) + error("pthread_create failed\n", errno); + + usleep(WAKE_WAIT_US); + + res = futex2_requeue(&r1, &r2, 0, 1, 0, 0); + if (res != 1) { + ksft_test_result_fail("futex2_requeue private returned: %d %s\n", +
[RFC PATCH 12/13] perf bench: Add futex2 benchmark tests
Add support at the existing futex benchmarking code base to enable futex2 calls. `perf bench` tests can be used not only as a way to measure the performance of implementation, but also as stress testing for the kernel infrastructure. Signed-off-by: André Almeida --- tools/arch/x86/include/asm/unistd_64.h | 12 ++ tools/perf/bench/bench.h | 4 ++ tools/perf/bench/futex-hash.c | 24 +-- tools/perf/bench/futex-requeue.c | 57 -- tools/perf/bench/futex-wake-parallel.c | 41 +++--- tools/perf/bench/futex-wake.c | 37 + tools/perf/bench/futex.h | 47 + tools/perf/builtin-bench.c | 18 ++-- 8 files changed, 206 insertions(+), 34 deletions(-) diff --git a/tools/arch/x86/include/asm/unistd_64.h b/tools/arch/x86/include/asm/unistd_64.h index 4205ed4158bf..cf5ad4ea1c1f 100644 --- a/tools/arch/x86/include/asm/unistd_64.h +++ b/tools/arch/x86/include/asm/unistd_64.h @@ -17,3 +17,15 @@ #ifndef __NR_setns #define __NR_setns 308 #endif + +#ifndef __NR_futex_wait +# define __NR_futex_wait 442 +#endif + +#ifndef __NR_futex_wake +# define __NR_futex_wake 443 +#endif + +#ifndef __NR_futex_requeue +# define __NR_futex_requeue 445 +#endif diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h index eac36afab2b3..12346844b354 100644 --- a/tools/perf/bench/bench.h +++ b/tools/perf/bench/bench.h @@ -38,9 +38,13 @@ int bench_mem_memcpy(int argc, const char **argv); int bench_mem_memset(int argc, const char **argv); int bench_mem_find_bit(int argc, const char **argv); int bench_futex_hash(int argc, const char **argv); +int bench_futex2_hash(int argc, const char **argv); int bench_futex_wake(int argc, const char **argv); +int bench_futex2_wake(int argc, const char **argv); int bench_futex_wake_parallel(int argc, const char **argv); +int bench_futex2_wake_parallel(int argc, const char **argv); int bench_futex_requeue(int argc, const char **argv); +int bench_futex2_requeue(int argc, const char **argv); /* pi futexes */ int bench_futex_lock_pi(int argc, const char **argv); int bench_epoll_wait(int argc, const char **argv); diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c index 915bf3da7ce2..6e62e7708fde 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c @@ -34,7 +34,7 @@ static unsigned int nthreads = 0; static unsigned int nsecs= 10; /* amount of futexes per thread */ static unsigned int nfutexes = 1024; -static bool fshared = false, done = false, silent = false; +static bool fshared = false, done = false, silent = false, futex2 = false; static int futex_flag = 0; struct timeval bench__start, bench__end, bench__runtime; @@ -86,7 +86,10 @@ static void *workerfn(void *arg) * such as internal waitqueue handling, thus enlarging * the critical region protected by hb->lock. */ - ret = futex_wait(&w->futex[i], 1234, NULL, futex_flag); + if (!futex2) + ret = futex_wait(&w->futex[i], 1234, NULL, futex_flag); + else + ret = futex2_wait(&w->futex[i], 1234, futex_flag, NULL); if (!silent && (!ret || errno != EAGAIN || errno != EWOULDBLOCK)) warn("Non-expected futex return call"); @@ -117,7 +120,7 @@ static void print_summary(void) (int)bench__runtime.tv_sec); } -int bench_futex_hash(int argc, const char **argv) +static int __bench_futex_hash(int argc, const char **argv) { int ret = 0; cpu_set_t cpuset; @@ -149,7 +152,9 @@ int bench_futex_hash(int argc, const char **argv) if (!worker) goto errmem; - if (!fshared) + if (futex2) + futex_flag = FUTEX_32 | (fshared * FUTEX_SHARED_FLAG); + else if (!fshared) futex_flag = FUTEX_PRIVATE_FLAG; printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n", @@ -229,3 +234,14 @@ int bench_futex_hash(int argc, const char **argv) errmem: err(EXIT_FAILURE, "calloc"); } + +int bench_futex_hash(int argc, const char **argv) +{ + return __bench_futex_hash(argc, argv); +} + +int bench_futex2_hash(int argc, const char **argv) +{ + futex2 = true; + return __bench_futex_hash(argc, argv); +} diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c index 7a15c2e61022..4c7486fbe923 100644 --- a/tools/perf/bench/futex-requeue.c +++ b/tools/perf/bench/futex-requeue.c @@ -2,8 +2,8 @@ /* * Copyright (C) 2013 Davidlohr Bueso * - * futex-requeue: Block a bunch of threads on futex1 and requeue them - *on futex2, N at a
[RFC PATCH 09/13] selftests: futex2: Add wouldblock test
Adapt existing futex wait wouldblock file to test the same mechanism for futex2. Signed-off-by: André Almeida --- .../futex/functional/futex_wait_wouldblock.c | 33 --- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c index 0ae390ff8164..ed3660090907 100644 --- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c +++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c @@ -12,6 +12,7 @@ * * HISTORY * 2009-Nov-14: Initial version by Gowrishankar + * 2021-Feb-5: Add futex2 test by André * */ @@ -21,7 +22,7 @@ #include #include #include -#include "futextest.h" +#include "futex2test.h" #include "logging.h" #define TEST_NAME "futex-wait-wouldblock" @@ -39,6 +40,7 @@ void usage(char *prog) int main(int argc, char *argv[]) { struct timespec to = {.tv_sec = 0, .tv_nsec = timeout_ns}; + struct timespec64 to64; futex_t f1 = FUTEX_INITIALIZER; int res, ret = RET_PASS; int c; @@ -61,18 +63,41 @@ int main(int argc, char *argv[]) } ksft_print_header(); - ksft_set_plan(1); + ksft_set_plan(2); ksft_print_msg("%s: Test the unexpected futex value in FUTEX_WAIT\n", basename(argv[0])); info("Calling futex_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1); res = futex_wait(&f1, f1+1, &to, FUTEX_PRIVATE_FLAG); if (!res || errno != EWOULDBLOCK) { - fail("futex_wait returned: %d %s\n", + ksft_test_result_fail("futex_wait returned: %d %s\n", res ? errno : res, res ? strerror(errno) : ""); ret = RET_FAIL; + } else { + ksft_test_result_pass("futex_wait wouldblock succeeds\n"); } - print_result(TEST_NAME, ret); + /* setting absolute timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + info("Calling futex2_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1); + res = futex2_wait(&f1, f1+1, FUTEX_32, &to64); + if (!res || errno != EWOULDBLOCK) { + ksft_test_result_fail("futex2_wait returned: %d %s\n", +res ? errno : res, res ? strerror(errno) : ""); + ret = RET_FAIL; + } else { + ksft_test_result_pass("futex2_wait wouldblock succeeds\n"); + } + + ksft_print_cnts(); return ret; } -- 2.30.1
[RFC PATCH 13/13] kernel: Enable waitpid() for futex2
To make pthreads works as expected if they are using futex2, wake clear_child_tid with futex2 as well. This is make applications that uses waitpid() (and clone(CLONE_CHILD_SETTID)) wake while waiting for the child to terminate. Given that apps should not mix futex() and futex2(), any correct app will trigger a harmless noop wakeup on the interface that it isn't using. Signed-off-by: André Almeida --- This commit is here for the intend to show what we need to do in order to get a full NPTL working on top of futex2. It should be merged after we talk to glibc folks on the details around the futex_wait() side. For instance, we could use this as an opportunity to use private futexes or 8bit sized futexes, but both sides need to use the exactly same flags. --- include/linux/syscalls.h | 2 ++ kernel/fork.c| 2 ++ kernel/futex2.c | 30 ++ 3 files changed, 22 insertions(+), 12 deletions(-) diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 06823bc7ef9d..38c2fc50ada9 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1312,6 +1312,8 @@ int ksys_ipc(unsigned int call, int first, unsigned long second, unsigned long third, void __user * ptr, long fifth); int compat_ksys_ipc(u32 call, int first, int second, u32 third, u32 ptr, u32 fifth); +long ksys_futex_wake(void __user *uaddr, unsigned long nr_wake, +unsigned int flags); /* * The following kernel syscall equivalents are just wrappers to fs-internal diff --git a/kernel/fork.c b/kernel/fork.c index d66cd1014211..e39846a73a43 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1308,6 +1308,8 @@ static void mm_release(struct task_struct *tsk, struct mm_struct *mm) put_user(0, tsk->clear_child_tid); do_futex(tsk->clear_child_tid, FUTEX_WAKE, 1, NULL, NULL, 0, 0); + ksys_futex_wake(tsk->clear_child_tid, 1, + FUTEX_32 | FUTEX_SHARED_FLAG); } tsk->clear_child_tid = NULL; } diff --git a/kernel/futex2.c b/kernel/futex2.c index 8a8b45f98d3b..a810b7f5c3a0 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c @@ -942,18 +942,8 @@ static inline bool futex_match(struct futex_key key1, struct futex_key key2) key1.offset == key2.offset); } -/** - * sys_futex_wake - Wake a number of futexes waiting on an address - * @uaddr: Address of futex to be woken up - * @nr_wake: Number of futexes waiting in uaddr to be woken up - * @flags: Flags for size and shared - * - * Wake `nr_wake` threads waiting at uaddr. - * - * Returns the number of woken threads on success, error code otherwise. - */ -SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, - unsigned int, flags) +long ksys_futex_wake(void __user *uaddr, unsigned long nr_wake, +unsigned int flags) { bool shared = (flags & FUTEX_SHARED_FLAG) ? true : false; unsigned int size = flags & FUTEX_SIZE_MASK; @@ -990,6 +980,22 @@ SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, return ret; } +/** + * sys_futex_wake - Wake a number of futexes waiting on an address + * @uaddr: Address of futex to be woken up + * @nr_wake: Number of futexes waiting in uaddr to be woken up + * @flags: Flags for size and shared + * + * Wake `nr_wake` threads waiting at uaddr. + * + * Returns the number of woken threads on success, error code otherwise. + */ +SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, + unsigned int, flags) +{ + return ksys_futex_wake(uaddr, nr_wake, flags); +} + static void futex_double_unlock(struct futex_bucket *b1, struct futex_bucket *b2) { spin_unlock(&b1->lock); -- 2.30.1
Re: [PATCH 00/16] media: vimc: Add support for multiplanar formats
On 3/15/19 1:43 PM, André Almeida wrote: Hello, This series implements support for multiplane pixel formats at vimc. A lot of changes were required since vimc support for singleplane was "hardcoded". The code has been adapted in order to support both formats. When was possible, the functions were written generically, avoiding functions for just one type of pixel format (single/multi) and favoring code reuse. The debayer subdevice is the only one that currently doesn't supports multiplanar formats. Documentation to each device will be made in a future patch. Forgot to mention that this patch series depends on this one: "[PATCH] media: vimc: propagate pixel format in the stream" Thanks, André André Almeida (16): media: Move sp2mp functions to v4l2-common media: vimc: Remove unnecessary stream check media: vimc: Check if the stream is on using ved.stream media: vimc: cap: Change vimc_cap_device.format type media: vimc: Create multiplanar parameter media: vimc: cap: Dynamically define stream pixelformat media: vimc: cap: Add handler for singleplanar fmt ioctls media: vimc: cap: Add handler for multiplanar fmt ioctls media: vimc: cap: Add multiplanar formats media: vimc: cap: Add multiplanar default format media: vimc: cap: Allocate and verify mplanar buffers media: vimc: Add and use new struct vimc_frame media: vimc: sen: Add support for multiplanar formats media: vimc: sca: Add support for multiplanar formats media: vimc: cap: Add support for multiplanar formats media: vimc: cap: Dynamically define device caps drivers/media/platform/vimc/vimc-capture.c| 310 +++--- drivers/media/platform/vimc/vimc-common.c | 37 +++ drivers/media/platform/vimc/vimc-common.h | 50 ++- drivers/media/platform/vimc/vimc-core.c | 8 + drivers/media/platform/vimc/vimc-debayer.c| 38 +-- drivers/media/platform/vimc/vimc-scaler.c | 125 --- drivers/media/platform/vimc/vimc-sensor.c | 62 ++-- drivers/media/platform/vimc/vimc-streamer.c | 2 +- drivers/media/platform/vivid/vivid-vid-cap.c | 6 +- .../media/platform/vivid/vivid-vid-common.c | 59 .../media/platform/vivid/vivid-vid-common.h | 9 - drivers/media/platform/vivid/vivid-vid-out.c | 6 +- drivers/media/v4l2-core/v4l2-common.c | 62 include/media/v4l2-common.h | 31 ++ 14 files changed, 580 insertions(+), 225 deletions(-)
Re: [RFC PATCH 01/13] futex2: Implement wait and wake functions
Hi Gabriel, Às 16:59 de 15/02/21, Gabriel Krisman Bertazi escreveu: André Almeida writes: +/** + * struct futexv_head - List of futexes to be waited + * @task:Task to be awaken + * @hint:Was someone on this list awakened? + * @objects: List of futexes + */ +struct futexv_head { + struct task_struct *task; + bool hint; + struct futex_waiter objects[0]; +}; this structure is also used for a single futex. maybe struct futex_waiter_head? One could argue that a single futex is a futexv of one element, but I can see that futex_waiter_head makes more sense. Fixed. +/** + * struct futex_single_waiter - Wrapper for a futexv_head of one element + * @futexv: Single futexv element + * @waiter: Single waiter element + */ +struct futex_single_waiter { + struct futexv_head futexv; + struct futex_waiter waiter; +} __packed; Is this struct necessary? can't you just allocate the necessary space, i.e. a struct futexv_head with 1 futexv_head->object? I don't feel that makes sense to use dynamic allocation for a fixed sized memory. Given that, using this struct was the way I found to have a futexv_head of a single element in a static allocation fashion. + + key->offset = address % PAGE_SIZE; + address -= key->offset; + key->pointer = (u64)address; + key->index = (unsigned long)current->mm; Why split the key in offset and pointer and waste 1/3 more space to store each key? We need three fields for storing the shared key in the current design, and given that the futex key currently lives inside struct futex_waiter, private and shared keys need to use the same amount of space. Even if I don't use offset for now, the next patch would expand the memory anyway. I see that the way I organized the patches made this confusing. To avoid that we could allocate the key space in futex_wait and make futex key point there. + + /* Generate hash key for this futex using uaddr and current->mm */ + hash_key = jhash2((u32 *)key, sizeof(*key) / sizeof(u32), 0); + + /* Since HASH_SIZE is 2^n, subtracting 1 makes a perfect bit mask */ + return &futex_table[hash_key & (futex2_hashsize - 1)]; If someone inadvertely changes futex2_hashsize to something not 2^n this will silently break. futex2_hashsize should be constant and you need a BUILD_BUG_ON(). Given that futex2_hashsize is calcutated at boot time, not sure what we could to about this, maybe BUG_ON()? +static int futex_enqueue(struct futexv_head *futexv, unsigned int nr_futexes, +int *awakened) +{ + int i, ret; + u32 uval, *uaddr, val; + struct futex_bucket *bucket; + +retry: + set_current_state(TASK_INTERRUPTIBLE); + + for (i = 0; i < nr_futexes; i++) { + uaddr = (u32 * __user)futexv->objects[i].uaddr; + val = (u32)futexv->objects[i].val; + + bucket = futexv->objects[i].bucket; + + bucket_inc_waiters(bucket); + spin_lock(&bucket->lock); + + ret = futex_get_user(&uval, uaddr); + + if (unlikely(ret)) { + spin_unlock(&bucket->lock); + + bucket_dec_waiters(bucket); + __set_current_state(TASK_RUNNING); + *awakened = futex_dequeue_multiple(futexv, i); + + if (__get_user(uval, uaddr)) + return -EFAULT; + + if (*awakened >= 0) + return 1; If you are awakened, you don't need to waste time with trying to get the next key. Yes, and this is what this return is supposed to do. What I'm missing? +/** + * futex_wait - Setup the timer (if there's one) and wait on a list of futexes + * @futexv: List of futexes + * @nr_futexes: Length of futexv + * @timo: Timeout + * @flags: Timeout flags + * + * Return: + * * 0 >= - Hint of which futex woke us + * * 0 < - Error code + */ +static int futex_set_timer_and_wait(struct futexv_head *futexv, + unsigned int nr_futexes, + struct __kernel_timespec __user *timo, + unsigned int flags) +{ + struct hrtimer_sleeper timeout; + int ret; + + if (timo) { + ret = futex_setup_time(timo, &timeout, flags); + if (ret) + return ret; + } + + ret = __futex_wait(futexv, nr_futexes, timo ? &timeout : NULL); + + if (timo) + hrtimer_cancel(&timeout.timer); + + return ret; +} I'm having a hard time understanding why this function exists. part of the futex is set up outside of it, part inside. Not sure if this isn't just part of sys_futex_wait. I wrote this function si
Re: [RFC PATCH 06/13] docs: locking: futex2: Add documentation
Hi Randy, Thanks for the feedback. All suggestions/fixes were applied for next version. Às 15:34 de 16/02/21, Randy Dunlap escreveu: On 2/15/21 7:23 AM, André Almeida wrote: Add a new documentation file specifying both userspace API and internal implementation details of futex2 syscalls. Signed-off-by: André Almeida
Re: [RFC PATCH 01/13] futex2: Implement wait and wake functions
Hi Peter, Às 06:02 de 16/02/21, Peter Zijlstra escreveu: On Mon, Feb 15, 2021 at 12:23:52PM -0300, André Almeida wrote: +static int __futex_wait(struct futexv_head *futexv, unsigned int nr_futexes, + struct hrtimer_sleeper *timeout) +{ + int ret; + + while (1) { + int awakened = -1; + Might be easier to understand if the set_current_state() is here, instead of squirreled away in futex_enqueue(). I placed set_current_state() inside futex_enqueue() because we need to set RUNNING and then INTERRUPTIBLE again for the retry path. + ret = futex_enqueue(futexv, nr_futexes, &awakened); + + if (ret) { + if (awakened >= 0) + return awakened; + return ret; + } + + /* Before sleeping, check if someone was woken */ + if (!futexv->hint && (!timeout || timeout->task)) + freezable_schedule(); + + __set_current_state(TASK_RUNNING); This is typically after the loop. Sorry, which loop? + + /* +* One of those things triggered this wake: +* +* * We have been removed from the bucket. futex_wake() woke +* us. We just need to dequeue and return 0 to userspace. +* +* However, if no futex was dequeued by a futex_wake(): +* +* * If the there's a timeout and it has expired, +* return -ETIMEDOUT. +* +* * If there is a signal pending, something wants to kill our +* thread, return -ERESTARTSYS. +* +* * If there's no signal pending, it was a spurious wake +* (scheduler gave us a change to do some work, even if we chance? Indeed, fixed. +* don't want to). We need to remove ourselves from the +* bucket and add again, to prevent losing wakeups in the +* meantime. +*/ Anyway, doing a dequeue and enqueue for spurious wakes is a bit of an anti-pattern that can lead to starvation. I've not actually looked at much detail yet as this is my first read-through, but did figure I'd mention it. So we could just leave everything enqueued for spurious wake? I was expecting that we would need to do all the work again (including rechecking *uaddr == val) to see if we didn't miss a futex_wake() between the kernel thread waking (spuriously) and going to sleep again. + + ret = futex_dequeue_multiple(futexv, nr_futexes); + + /* Normal wake */ + if (ret >= 0) + return ret; + + if (timeout && !timeout->task) + return -ETIMEDOUT; + + if (signal_pending(current)) + return -ERESTARTSYS; + + /* Spurious wake, do everything again */ + } +} Thanks, André
[PATCH 2/2] Documentation: admin-guide: Update kvm/xen config option
Since commit 9bba03d4473d ("kconfig: remove 'kvmconfig' and 'xenconfig' shorthands") kvm/xen config shortcuts are not available anymore. Update the file to reflect how they should be used, with the full filename. Signed-off-by: André Almeida --- Documentation/admin-guide/README.rst | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 261b7b4cca1f..35314b63008c 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -226,10 +226,11 @@ Configuring the kernel all module options to built in (=y) options. You can also preserve modules by LMC_KEEP. - "make kvmconfig" Enable additional options for kvm guest kernel support. + "make kvm_guest.config" Enable additional options for kvm guest kernel + support. - "make xenconfig" Enable additional options for xen dom0 guest kernel -support. + "make xen.config" Enable additional options for xen dom0 guest kernel + support. "make tinyconfig" Configure the tiniest possible kernel. -- 2.30.0
[PATCH 1/2] docs: Make syscalls' helpers naming consistent
The documentation explains the need to create internal syscalls' helpers, and that they should be called `kern_xyzzy()`. However, the comment at include/linux/syscall.h says that they should be named as `ksys_xyzzy()`, and so are all the helpers declared bellow it. Change the documentation to reflect this. Cc: Dominik Brodowski Fixes: 819671ff849b ("syscalls: define and explain goal to not call syscalls in the kernel") Signed-off-by: André Almeida --- Documentation/process/adding-syscalls.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/process/adding-syscalls.rst b/Documentation/process/adding-syscalls.rst index a3ecb236576c..61bdaec188ea 100644 --- a/Documentation/process/adding-syscalls.rst +++ b/Documentation/process/adding-syscalls.rst @@ -501,7 +501,7 @@ table, but not from elsewhere in the kernel. If the syscall functionality is useful to be used within the kernel, needs to be shared between an old and a new syscall, or needs to be shared between a syscall and its compatibility variant, it should be implemented by means of a "helper" function (such as -``kern_xyzzy()``). This kernel function may then be called within the +``ksys_xyzzy()``). This kernel function may then be called within the syscall stub (``sys_xyzzy()``), the compatibility syscall stub (``compat_sys_xyzzy()``), and/or other kernel code. -- 2.30.0
Re: [RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()"
Hi Matthew, Às 17:15 de 23/03/21, Matthew Wilcox escreveu: On Tue, Mar 23, 2021 at 04:59:38PM -0300, André Almeida wrote: This reverts commit 794c43f716845e2d48ce195ed5c4179a4e05ce5f. For implementing casefolding support at tmpfs, it needs to set dentry operations at superblock level, given that tmpfs has no support for fscrypt and we don't need to set operations on a per-dentry basis. Revert this commit so we can access those exported function from tmpfs code. But tmpfs / shmem are Kconfig bools, not tristate. They can't be built as modules, so there's no need to export the symbols. +#ifdef CONFIG_UNICODE +extern int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str); +extern int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, + const char *str, const struct qstr *name); +#endif There's no need for the ifdef (it only causes unnecessary rebuilds) and the 'extern' keyword is also unwelcome. Thank you. Instead of reverting the commit, I'll do a new commit doing this in a properly way.
Re: [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups
Às 17:18 de 23/03/21, Gabriel Krisman Bertazi escreveu: André Almeida writes: opt = fs_parse(fc, shmem_fs_parameters, param, &result); if (opt < 0) @@ -3468,6 +3519,23 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param) ctx->full_inums = true; ctx->seen |= SHMEM_SEEN_INUMS; break; + case Opt_casefold: + if (strncmp(param->string, "utf8-", 5)) + return invalfc(fc, "Only utf8 encondings are supported"); + ret = strscpy(version, param->string + 5, sizeof(version)); Ugh. Now we are doing two strscpy for the parse api (in unicode_load). Can change the unicode_load api to reuse it? So instead of getting just the version number (e.g. "12.1.0") as parameter, utf8_load/unicode_load would get the full encoding string (e.g. "utf8-12.1.0") right?
Re: [RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups
Hi Al Viro, Às 20:19 de 23/03/21, Al Viro escreveu: On Tue, Mar 23, 2021 at 04:59:39PM -0300, André Almeida wrote: * dcache handling: For now, negative lookups are not inserted in the dcache, since they would need to be invalidated anyway, because we can't trust missing file dentries. This is bad for performance but requires some leveraging of the VFS layer to fix. We can live without that for now, and so does everyone else. "For now"? Not a single practical suggestion has ever materialized. Pardon me, but by now I'm very sceptical about the odds of that ever changing. And no, I don't have any suggestions either. Right, I'll reword this to reflect that there's no expectation that this will be done, while keeping documented this performance issue. The lookup() path at tmpfs creates negatives dentries, that are later instantiated if the file is created. In that way, all files in tmpfs have a dentry given that the filesystem exists exclusively in memory. As explained above, we don't have negative dentries for casefold files, so dentries are created at lookup() iff files aren't casefolded. Else, the dentry is created just before being instantiated at create path. At the remove path, dentries are invalidated for casefolded files. Umm... What happens to those assertions if previously sane directory gets case-buggered? You've got an ioctl for doing just that... Incidentally, that ioctl is obviously racy - result of that simple_empty() might have nothing to do with reality before it is returned to caller. And while we are at it, simple_empty() doesn't check a damn thing about negative dentries in there... Thanks for pointing those issues. I'll move my lock at IOCTL to make impossible to change directory attributes and add a file there at the same time. About the negative dentries that existed before at that directory, I believe the way to solve this is by invalidating them all. How that sound to you? Thanks, André
Re: [RFC PATCH 4/4] docs: tmpfs: Add casefold options
Hi Gabriel, Às 19:19 de 23/03/21, Gabriel Krisman Bertazi escreveu: André Almeida writes: Document mounting options to enable casefold support in tmpfs. Signed-off-by: André Almeida --- Documentation/filesystems/tmpfs.rst | 26 ++ 1 file changed, 26 insertions(+) diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst index 0408c245785e..84c87c309bd7 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -170,6 +170,32 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs' will give you tmpfs instance on /mytmpfs which can allocate 10GB RAM/SWAP in 10240 inodes and it is only accessible by root. +tmpfs has the following mounting options for case-insesitive lookups support: + += == +casefoldEnable casefold support at this mount point using the given +argument as enconding. Currently only utf8 encondings are supported. +cf_strict Enable strict casefolding at this mouting point (disabled by +default). This means that invalid strings should be reject by the +file system. strict mode refers to the encoding, not exactly casefold. Maybe we could have a parameter encoding_flags that accepts the flag 'strict'. This would make it closer to the ext4 interface. What are the other enconding flags? Or is this more about having a properly extensible interface? Alternatively, call this option strict_encoding. Thanks, André
Re: [RFC PATCH 4/4] docs: tmpfs: Add casefold options
Às 18:58 de 23/03/21, Randy Dunlap escreveu: Hi-- On 3/23/21 12:59 PM, André Almeida wrote: Document mounting options to enable casefold support in tmpfs. Signed-off-by: André Almeida --- Documentation/filesystems/tmpfs.rst | 26 ++ 1 file changed, 26 insertions(+) diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst index 0408c245785e..84c87c309bd7 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -170,6 +170,32 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs' will give you tmpfs instance on /mytmpfs which can allocate 10GB RAM/SWAP in 10240 inodes and it is only accessible by root. +tmpfs has the following mounting options for case-insesitive lookups support: + += == +casefoldEnable casefold support at this mount point using the given +argument as enconding. Currently only utf8 encondings are supported. encoding. encodings +cf_strict Enable strict casefolding at this mouting point (disabled by mount +default). This means that invalid strings should be reject by the rejected +file system. += == + +Note that this option doesn't enable casefold by default, one needs to set default; one needs to set the +casefold flag per directory, setting the +F attribute in an empty directory. New +directories within a casefolded one will inherit the flag. Thanks for the feedback Randy, all changes applied.
[RFC PATCH 0/4] mm: shmem: Add case-insensitive support for tmpfs
Hello, This patchset adds support for case-insensitive file name lookups in tmpfs. The implementation (and even the commit message) was based on the work done at b886ee3e778e ("ext4: Support case-insensitive file name lookups"). * Use case The use case for this feature is similar to the use case for ext4, to better support compatibility layers (like Wine), particularly in combination with sandboxing/container tools (like Flatpak). Those containerization tools can share a subset of the host filesystem with an application. In the container, the root directory and any parent directories required for a shared directory are on tmpfs, with the shared directories bind-mounted into the container's view of the filesystem. If the host filesystem is using case-insensitive directories, then the application can do lookups inside those directories in a case-insensitive way, without this needing to be implemented in user-space. However, if the host is only sharing a subset of a case-insensitive directory with the application, then the parent directories of the mount point will be part of the container's root tmpfs. When the application tries to do case-insensitive lookups of those parent directories on a case-sensitive tmpfs, the lookup will fail. For example, if /srv/games is a case-insensitive directory on the host, then applications will expect /srv/games/Steam/Half-Life and /srv/games/steam/half-life to be interchangeable; but if the container framework is only sharing /srv/games/Steam/Half-Life and /srv/games/Steam/Portal (and not the rest of /srv/games) with the container, with /srv, /srv/games and /srv/games/Steam as part of the container's tmpfs root, then making /srv/games a case-insensitive directory inside the container would be necessary to meet that expectation. * The patchset Note that, since there's no on disk information about this filesystem (and thus, no mkfs support) we need to pass this information in the mount options. This is the main difference with other fs supporting casefolding like ext4 and f2fs. The folder attribute uses the same value used by ext4/f2fs, so userspace tools like chattr already works with this implementation. - Patch 1 reverts the unexportation of casefolding functions for dentry operations that are going to be used by tmpfs. - Patch 2 does the wiring up of casefold functions inside tmpfs, along with creating the mounting options for casefold support. - Patch 3 gives tmpfs support for IOCTL for get/set file flags. This is needed since the casefold is done in a per-directory basis at supported mount points, via directory flags. - Patch 4 documents the new options, along with an usage example. This work is also available at https://gitlab.collabora.com/tonyk/linux/-/tree/tmpfs-ic * Testing xfstests already has a test for casefold filesystems (generic/556). I have adapted it to work with tmpfs in a hacky way and this work can be found at https://gitlab.collabora.com/tonyk/xfstests. All tests succeed. Whenever we manage to get in a common ground around the interface, I'll make it more upstreamable so it can get merged along with the kernel work. * FAQ - Can't this be done in userspace? Yes, but it's slow and can't assure correctness (imagine two files named file.c and FILE.C; an app asks for FiLe.C, which one is the correct?). - Which changes are required in userspace? Apart of the container tools that will use this feature, no change is needed. Both mount and chattr already work with this patchset. - This will completely obliterate my setup! Casefold support in tmpfs is disabled by default. Thanks, André André Almeida (4): Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()" mm: shmem: Support case-insensitive file name lookups mm: shmem: Add IOCTL support for tmpfs docs: tmpfs: Add casefold options Documentation/filesystems/tmpfs.rst | 26 + fs/libfs.c | 8 +- include/linux/fs.h | 5 + include/linux/shmem_fs.h| 5 + mm/shmem.c | 175 +++- 5 files changed, 213 insertions(+), 6 deletions(-) -- 2.31.0
[RFC PATCH 1/4] Revert "libfs: unexport generic_ci_d_compare() and generic_ci_d_hash()"
This reverts commit 794c43f716845e2d48ce195ed5c4179a4e05ce5f. For implementing casefolding support at tmpfs, it needs to set dentry operations at superblock level, given that tmpfs has no support for fscrypt and we don't need to set operations on a per-dentry basis. Revert this commit so we can access those exported function from tmpfs code. Signed-off-by: André Almeida --- fs/libfs.c | 8 +--- include/linux/fs.h | 5 + 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/libfs.c b/fs/libfs.c index e2de5401abca..d1d06494463a 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -1387,8 +1387,8 @@ static bool needs_casefold(const struct inode *dir) * * Return: 0 if names match, 1 if mismatch, or -ERRNO */ -static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, - const char *str, const struct qstr *name) +int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, + const char *str, const struct qstr *name) { const struct dentry *parent = READ_ONCE(dentry->d_parent); const struct inode *dir = READ_ONCE(parent->d_inode); @@ -1425,6 +1425,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, return 1; return !!memcmp(str, name->name, len); } +EXPORT_SYMBOL(generic_ci_d_compare); /** * generic_ci_d_hash - generic d_hash implementation for casefolding filesystems @@ -1433,7 +1434,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, * * Return: 0 if hash was successful or unchanged, and -EINVAL on error */ -static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str) +int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str) { const struct inode *dir = READ_ONCE(dentry->d_inode); struct super_block *sb = dentry->d_sb; @@ -1448,6 +1449,7 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str) return -EINVAL; return 0; } +EXPORT_SYMBOL(generic_ci_d_hash); static const struct dentry_operations generic_ci_dentry_ops = { .d_hash = generic_ci_d_hash, diff --git a/include/linux/fs.h b/include/linux/fs.h index ec8f3ddf4a6a..29a0c6371f7d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3313,6 +3313,11 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int); extern int generic_check_addressable(unsigned, u64); +#ifdef CONFIG_UNICODE +extern int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str); +extern int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, + const char *str, const struct qstr *name); +#endif extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry); #ifdef CONFIG_MIGRATION -- 2.31.0
[RFC PATCH 2/4] mm: shmem: Support case-insensitive file name lookups
This patch implements the support for case-insensitive file name lookups in tmpfs, based on the encoding passed in the mount options. A filesystem that has the casefold feature set is able to configure directories with the +F (TMPFS_CASEFOLD_FL) attribute, enabling lookups to succeed in that directory in a case-insensitive fashion, i.e: match a directory entry even if the name used by userspace is not a byte per byte match with the disk name, but is an equivalent case-insensitive version of the Unicode string. This operation is called a case-insensitive file name lookup. The feature is configured as an inode attribute applied to directories and inherited by its children. This attribute can only be enabled on empty directories for filesystems that support the encoding feature, thus preventing collision of file names that only differ by case. * dcache handling: For a +F directory, tmpfs only stores the first equivalent name dentry used in the dcache. This is done to prevent unintentional duplication of dentries in the dcache, while also allowing the VFS code to quickly find the right entry in the cache despite which equivalent string was used in a previous lookup, without having to resort to ->lookup(). d_hash() of casefolded directories is implemented as the hash of the casefolded string, such that we always have a well-known bucket for all the equivalencies of the same string. d_compare() uses the utf8_strncasecmp() infrastructure, which handles the comparison of equivalent, same case, names as well. For now, negative lookups are not inserted in the dcache, since they would need to be invalidated anyway, because we can't trust missing file dentries. This is bad for performance but requires some leveraging of the VFS layer to fix. We can live without that for now, and so does everyone else. The lookup() path at tmpfs creates negatives dentries, that are later instantiated if the file is created. In that way, all files in tmpfs have a dentry given that the filesystem exists exclusively in memory. As explained above, we don't have negative dentries for casefold files, so dentries are created at lookup() iff files aren't casefolded. Else, the dentry is created just before being instantiated at create path. At the remove path, dentries are invalidated for casefolded files. * Dealing with invalid sequences: By default, when an invalid UTF-8 sequence is identified, tmpfs will treat it as an opaque byte sequence, ignoring the encoding and reverting to the old behavior for that unique file. This means that case-insensitive file name lookup will not work only for that file. An optional flag (cf_strict) can be set in the mount arguments telling the filesystem code and userspace tools to enforce the encoding. When that optional flag is set, any attempt to create a file name using an invalid UTF-8 sequence will fail and return an error to userspace. Signed-off-by: André Almeida --- include/linux/shmem_fs.h | 1 + mm/shmem.c | 91 +++- 2 files changed, 90 insertions(+), 2 deletions(-) diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index d82b6f396588..29ee64352807 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -43,6 +43,7 @@ struct shmem_sb_info { spinlock_t shrinklist_lock; /* Protects shrinklist */ struct list_head shrinklist; /* List of shinkable inodes */ unsigned long shrinklist_len; /* Length of shrinklist */ + bool casefold; /* If this mount point supports casefolding */ }; static inline struct shmem_inode_info *SHMEM_I(struct inode *inode) diff --git a/mm/shmem.c b/mm/shmem.c index b2db4ed0fbc7..20df81763995 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -38,6 +38,7 @@ #include #include #include +#include #include /* for arch/microblaze update_mmu_cache() */ @@ -117,6 +118,8 @@ struct shmem_options { bool full_inums; int huge; int seen; + struct unicode_map *encoding; + bool cf_strict; #define SHMEM_SEEN_BLOCKS 1 #define SHMEM_SEEN_INODES 2 #define SHMEM_SEEN_HUGE 4 @@ -161,6 +164,13 @@ static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb) return sb->s_fs_info; } +#ifdef CONFIG_UNICODE +static const struct dentry_operations casefold_dentry_ops = { + .d_hash = generic_ci_d_hash, + .d_compare = generic_ci_d_compare, +}; +#endif + /* * shmem_file_setup pre-accounts the whole fixed size of a VM object, * for shared memory and for shared anonymous (/dev/zero) mappings @@ -2859,8 +2869,18 @@ shmem_mknod(struct user_namespace *mnt_userns, struct inode *dir, struct inode *inode; int error = -ENOSPC; +#ifdef CONFIG_UNICODE + struct super_block *sb = dir->i_sb; + + if (sb_has_strict_encoding(sb) && IS_CASEFOLDED(dir) && + sb->s_encoding && utf8_validate(sb->s_encoding, &am
[RFC PATCH 4/4] docs: tmpfs: Add casefold options
Document mounting options to enable casefold support in tmpfs. Signed-off-by: André Almeida --- Documentation/filesystems/tmpfs.rst | 26 ++ 1 file changed, 26 insertions(+) diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst index 0408c245785e..84c87c309bd7 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -170,6 +170,32 @@ So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs' will give you tmpfs instance on /mytmpfs which can allocate 10GB RAM/SWAP in 10240 inodes and it is only accessible by root. +tmpfs has the following mounting options for case-insesitive lookups support: + += == +casefoldEnable casefold support at this mount point using the given +argument as enconding. Currently only utf8 encondings are supported. +cf_strict Enable strict casefolding at this mouting point (disabled by +default). This means that invalid strings should be reject by the +file system. += == + +Note that this option doesn't enable casefold by default, one needs to set +casefold flag per directory, setting the +F attribute in an empty directory. New +directories within a casefolded one will inherit the flag. + +Example:: + +$ mount -t tmpfs -o casefold=utf8-12.1.0,cf_strict tmpfs /mytmpfs +$ cd /mytmpfs +$ touch a; touch A +$ ls +A a +$ mkdir dir +$ chattr +F dir +$ touch dir/a; touch dir/A +$ ls dir +a :Author: Christoph Rohland , 1.12.01 -- 2.31.0
[RFC PATCH 3/4] mm: shmem: Add IOCTL support for tmpfs
Implement IOCTL operations for files to set/get file flags. Implement the only supported flag by now, that is S_CASEFOLD. Signed-off-by: André Almeida --- include/linux/shmem_fs.h | 4 ++ mm/shmem.c | 84 +++- 2 files changed, 87 insertions(+), 1 deletion(-) diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 29ee64352807..2c89c5a66508 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -140,4 +140,8 @@ extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm, dst_addr) ({ BUG(); 0; }) #endif +#define TMPFS_CASEFOLD_FL 0x4000 /* Casefolded file */ +#define TMPFS_USER_FLS TMPFS_CASEFOLD_FL /* Userspace supported flags */ +#define TMPFS_FLS S_CASEFOLD /* Kernel supported flags */ + #endif diff --git a/mm/shmem.c b/mm/shmem.c index 20df81763995..2f2c996d215b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -258,6 +258,7 @@ static inline void shmem_inode_unacct_blocks(struct inode *inode, long pages) static const struct super_operations shmem_ops; const struct address_space_operations shmem_aops; static const struct file_operations shmem_file_operations; +static const struct file_operations shmem_dir_operations; static const struct inode_operations shmem_inode_operations; static const struct inode_operations shmem_dir_inode_operations; static const struct inode_operations shmem_special_inode_operations; @@ -2347,7 +2348,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode /* Some things misbehave if size == 0 on a directory */ inode->i_size = 2 * BOGO_DIRENT_SIZE; inode->i_op = &shmem_dir_inode_operations; - inode->i_fop = &simple_dir_operations; + inode->i_fop = &shmem_dir_operations; break; case S_IFLNK: /* @@ -2838,6 +2839,76 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset, return error; } +static long shmem_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + int ret; + u32 fsflags = 0, old, new = 0; + struct inode *inode = file_inode(file); + struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb); + + switch (cmd) { + case FS_IOC_GETFLAGS: + if ((inode->i_flags & S_CASEFOLD) && S_ISDIR(inode->i_mode)) + fsflags |= TMPFS_CASEFOLD_FL; + + if (put_user(fsflags, (int __user *)arg)) + return -EFAULT; + + return 0; + + case FS_IOC_SETFLAGS: + if (get_user(fsflags, (int __user *)arg)) + return -EFAULT; + + old = inode->i_flags; + + if (fsflags & ~TMPFS_USER_FLS) + return -EINVAL; + + if (fsflags & TMPFS_CASEFOLD_FL) { + if (!sbinfo->casefold) { + pr_err("tmpfs: casefold disabled at this mount point\n"); + return -EOPNOTSUPP; + } + + if (!S_ISDIR(inode->i_mode)) + return -ENOTDIR; + + if (!simple_empty(file_dentry(file))) + return -ENOTEMPTY; + + new |= S_CASEFOLD; + } else if (old & S_CASEFOLD) { + if (!simple_empty(file_dentry(file))) + return -ENOTEMPTY; + } + + ret = mnt_want_write_file(file); + if (ret) + return ret; + + inode_lock(inode); + + ret = vfs_ioc_setflags_prepare(inode, old, new); + if (ret) { + inode_unlock(inode); + mnt_drop_write_file(file); + return ret; + } + + inode_set_flags(inode, new, TMPFS_FLS); + + inode_unlock(inode); + mnt_drop_write_file(file); + return 0; + + default: + return -ENOTTY; + } + + return 0; +} + static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf) { struct shmem_sb_info *sbinfo = SHMEM_SB(dentry->d_sb); @@ -3916,6 +3987,7 @@ static const struct file_operations shmem_file_operations = { .splice_read= generic_file_splice_read, .splice_write = iter_file_splice_write, .fallocate = shmem_fallocate, + .unlocked_ioctl = shmem_ioctl, #endif }; @@ -3928,6 +4000,16 @@ static const struct inode_operations shmem_inode_operations = { #endif }; +static const struct file_operations shmem_dir_operations = { +
Re: [RFC PATCH v2 00/13] Add futex2 syscall
Hi Peter, Às 02:44 de 04/03/21, Peter Oskolkov escreveu: On Wed, Mar 3, 2021 at 5:22 PM André Almeida wrote: Hi, This patch series introduces the futex2 syscalls. * FAQ ** "And what's about FUTEX_64?" By supporting 64 bit futexes, the kernel structure for futex would need to have a 64 bit field for the value, and that could defeat one of the purposes of having different sized futexes in the first place: supporting smaller ones to decrease memory usage. This might be something that could be disabled for 32bit archs (and even for CONFIG_BASE_SMALL). Which use case would benefit for FUTEX_64? Does it worth the trade-offs? The ability to store a pointer value on 64bit platforms is an important use case. Imagine a simple producer/consumer scenario, with the producer updating some shared memory data and waking the consumer. Storing the pointer in the futex makes it so that only one shared memory location needs to be accessed "atomically", etc. With two atomics synchronization becomes more involved (= slower). So the idea is to, instead of doing this: T1: atomic_set(&shm_addr, buffer_addr); atomic_set(&futex, 0); futex_wake(&futex, 1); T2: consume(shm_addr); To do that: T1: atomic_set(&futex, buffer_addr); futex_wake(&futex, 1); T2: consume(futex); Right? I'll try to write a small test to see how the perf numbers looks like.
Re: [RFC PATCH v2 00/13] Add futex2 syscall
Hi Ted, Às 12:01 de 04/03/21, Theodore Ts'o escreveu: On Wed, Mar 03, 2021 at 09:42:06PM -0300, André Almeida wrote: ** Performance - For comparing futex() and futex2() performance, I used the artificial benchmarks implemented at perf (wake, wake-parallel, hash and requeue). The setup was 200 runs for each test and using 8, 80, 800, 8000 for the number of threads, Note that for this test, I'm not using patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained at "The patchset" section. How heavily contended where the benchmarks? One of the benefits of the original futex was that no system call was necessary in the happy path when the lock is uncontended. futex2 has the same design in that aspect, no syscall is needed in the happy path. Did something in the cover letter gave the impression that is not the case? I would like to reword it to clarify this. Especially on a non-NUMA system (which are the far more common case), since that's where relying on a single memory access was a huge win for the original futex. I would expect that futex2 will fare worse in this particular case, since it requires a system call entry for all operations --- the question is how large is the delta in this worst case (for futex2) and best case (for futex) scenario. Cheers, - Ted Thanks, André
Re: [PATCH 1/3] fs/dcache: Add d_clear_dir_neg_dentries()
Hi Eric, Às 22:48 de 29/03/21, Eric Biggers escreveu: On Sun, Mar 28, 2021 at 11:43:54AM -0300, André Almeida wrote: For directories with negative dentries that are becoming case-insensitive dirs, we need to remove all those negative dentries, otherwise they will become dangling dentries. During the creation of a new file, if a d_hash collision happens and the names match in a case-insensitive way, the name of the file will be the name defined at the negative dentry, that may be different from the specified by the user. To prevent this from happening, we need to remove all dentries in a directory. Given that the directory must be empty before we call this function we are sure that all dentries there will be negative. Create a function to remove all negative dentries from a directory, to be used as explained above by filesystems that support case-insensitive lookups. Signed-off-by: André Almeida --- fs/dcache.c| 27 +++ include/linux/dcache.h | 1 + 2 files changed, 28 insertions(+) diff --git a/fs/dcache.c b/fs/dcache.c index 7d24ff7eb206..fafb3016d6fd 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1723,6 +1723,33 @@ void d_invalidate(struct dentry *dentry) } EXPORT_SYMBOL(d_invalidate); +/** + * d_clear_dir_neg_dentries - Remove negative dentries in an inode + * @dir: Directory to clear negative dentries + * + * For directories with negative dentries that are becoming case-insensitive + * dirs, we need to remove all those negative dentries, otherwise they will + * become dangling dentries. During the creation of a new file, if a d_hash + * collision happens and the names match in a case-insensitive, the name of + * the file will be the name defined at the negative dentry, that can be + * different from the specified by the user. To prevent this from happening, we + * need to remove all dentries in a directory. Given that the directory must be + * empty before we call this function we are sure that all dentries there will + * be negative. + */ +void d_clear_dir_neg_dentries(struct inode *dir) +{ + struct dentry *alias, *dentry; + + hlist_for_each_entry(alias, &dir->i_dentry, d_u.d_alias) { + list_for_each_entry(dentry, &alias->d_subdirs, d_child) { + d_drop(dentry); + dput(dentry); + } + } +} +EXPORT_SYMBOL(d_clear_dir_neg_dentries); As Al already pointed out, this doesn't work as intended, for a number of different reasons. Did you consider just using shrink_dcache_parent()? That already does what you are trying to do here, I think. When I wrote this patch, I didn't know it, but after Al Viro comments I get back to the code and found it, and it seems do do what I intend indeed, and my test is happy as well. The harder part (which I don't think you've considered) is how to ensure that all negative dentries really get invalidated even if there are lookups of them happening concurrently. Concurrent lookups can take temporary references to the negative dentries, preventing them from being invalidated. I didn't consider that, thanks for the feedback. So this means that those lookups will increase the refcount of the dentry, and it will only get really invalidated when refcount reaches 0? Or do would I need to call d_invalidate() again, until I succeed? - Eric
[RFC PATCH v2 00/13] Add futex2 syscall
ther than part of the interface and shouldn't be merged as it is. * Testing: This patchset provides selftests for each operation and their flags. Along with that, the following work was done: ** Stability To stress the interface in "real world scenarios": - glibc[4]: nptl's low level locking was modified to use futex2 API (except for robust and PI things). All relevant nptl/ tests passed. - Wine[5]: Proton/Wine was modified in order to use futex2() for the emulation of Windows NT sync mechanisms based on futex, called "fsync". Triple-A games with huge CPU's loads and tons of parallel jobs worked as expected when compared with the previous FUTEX_WAIT_MULTIPLE implementation at futex(). Some games issue 42k futex2() calls per second. - Full GNU/Linux distro: I installed the modified glibc in my host machine, so all pthread's programs would use futex2(). After tweaking systemd[6] to allow futex2() calls at seccomp, everything worked as expected (web browsers do some syscall sandboxing and need some configuration as well). - perf: The perf benchmarks tests can also be used to stress the interface, and they can be found in this patchset. ** Performance - For comparing futex() and futex2() performance, I used the artificial benchmarks implemented at perf (wake, wake-parallel, hash and requeue). The setup was 200 runs for each test and using 8, 80, 800, 8000 for the number of threads, Note that for this test, I'm not using patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained at "The patchset" section. - For the first three ones, I measured an average of 4% gain in performance. This is not a big step, but it shows that the new interface is at least comparable in performance with the current one. - For requeue, I measured an average of 21% decrease in performance compared to the original futex implementation. This is expected given the new design with individual flags. The performance trade-offs are explained at patch 4 ("futex2: Implement requeue operation"). [4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2 [5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13 [6] https://gitlab.collabora.com/tonyk/systemd * FAQ ** "Where's the code for NUMA and FUTEX_8/16?" The current code is already complex enough to take some time for review, so I believe it's better to split that work out to a future iteration of this patchset. Besides that, this RFC is the core part of the infrastructure, and the following features will not pose big design changes to it, the work will be more about wiring up the flags and modifying some functions. ** "And what's about FUTEX_64?" By supporting 64 bit futexes, the kernel structure for futex would need to have a 64 bit field for the value, and that could defeat one of the purposes of having different sized futexes in the first place: supporting smaller ones to decrease memory usage. This might be something that could be disabled for 32bit archs (and even for CONFIG_BASE_SMALL). Which use case would benefit for FUTEX_64? Does it worth the trade-offs? ** "Where's the PI/robust stuff?" As said by Peter Zijlstra at [3], all those new features are related to the "simple" futex interface, that doesn't use PI or robust. Do we want to have this complexity at futex2() and if so, should it be part of this patchset or can it be future work? Thanks, André * Changelog Changes from v1: - Unified futex_set_timer_and_wait and __futex_wait code - Dropped _carefull from linked list function calls - Fixed typos on docs patch - uAPI flags are now added as features are introduced, instead of all flags in patch 1 - Removed struct futex_single_waiter in favor of an anon struct v1: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealm...@collabora.com/ André Almeida (13): futex2: Implement wait and wake functions futex2: Add support for shared futexes futex2: Implement vectorized wait futex2: Implement requeue operation futex2: Add compatibility entry point for x86_x32 ABI docs: locking: futex2: Add documentation selftests: futex2: Add wake/wait test selftests: futex2: Add timeout test selftests: futex2: Add wouldblock test selftests: futex2: Add waitv test selftests: futex2: Add requeue test perf bench: Add futex2 benchmark tests kernel: Enable waitpid() for futex2 Documentation/locking/futex2.rst | 198 +++ Documentation/locking/index.rst |1 + MAINTAINERS |2 +- arch/arm/tools/syscall.tbl|4 + arch/arm64/include/asm/unistd.h |2 +- arch/arm64/include/asm/unistd32.h |8 + arch/x86/entry/syscalls/syscall_32.tbl|4 + arch/x86/en
[RFC PATCH v2 01/13] futex2: Implement wait and wake functions
Create a new set of futex syscalls known as futex2. This new interface is aimed to implement a more maintainable code, while removing obsolete features and expanding it with new functionalities. Implements wait and wake semantics for futexes, along with the base infrastructure for future operations. The whole wait path is designed to be used by N waiters, thus making easier to implement vectorized wait. * Syscalls implemented by this patch: - futex_wait(void *uaddr, unsigned int val, unsigned int flags, struct timespec *timo) The user thread is put to sleep, waiting for a futex_wake() at uaddr, if the value at *uaddr is the same as val (otherwise, the syscall returns immediately with -EAGAIN). timo is an optional timeout value for the operation. Return 0 on success, error code otherwise. - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags) Wake `nr_wake` threads waiting at uaddr. Return the number of woken threads on success, error code otherwise. ** The `flag` argument The flag is used to specify the size of the futex word (FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no default size. By default, the timeout uses a monotonic clock, but can be used as a realtime one by using the FUTEX_REALTIME_CLOCK flag. By default, futexes are of the private type, that means that this user address will be accessed by threads that shares the same memory region. This allows for some internal optimizations, so they are faster. However, if the address needs to be shared with different processes (like using `mmap()` or `shm()`), they need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. By default, the operation has no NUMA-awareness, meaning that the user can't choose the memory node where the kernel side futex data will be stored. The user can choose the node where it wants to operate by setting the FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or 32): struct futexX_numa { __uX value; __sX hint; }; This structure should be passed at the `void *uaddr` of futex functions. The address of the structure will be used to be waited/waken on, and the `value` will be compared to `val` as usual. The `hint` member is used to defined which node the futex will use. When waiting, the futex will be registered on a kernel-side table stored on that node; when waking, the futex will be searched for on that given table. That means that there's no redundancy between tables, and the wrong `hint` value will led to undesired behavior. Userspace is responsible for dealing with node migrations issues that may occur. `hint` can range from [0, MAX_NUMA_NODES], for specifying a node, or -1, to use the same node the current process is using. When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a global table on some node, defined at compilation time. ** The `timo` argument As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout options known to be buggy. Given that, `timo` should be a 64bit timeout at all platforms, using an absolute timeout value. Signed-off-by: André Almeida --- MAINTAINERS | 2 +- arch/arm/tools/syscall.tbl| 2 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 4 + arch/x86/entry/syscalls/syscall_32.tbl| 2 + arch/x86/entry/syscalls/syscall_64.tbl| 2 + include/linux/syscalls.h | 7 + include/uapi/asm-generic/unistd.h | 8 +- include/uapi/linux/futex.h| 5 + init/Kconfig | 7 + kernel/Makefile | 1 + kernel/futex2.c | 603 ++ kernel/sys_ni.c | 4 + tools/include/uapi/asm-generic/unistd.h | 8 +- .../arch/x86/entry/syscalls/syscall_64.tbl| 2 + 15 files changed, 655 insertions(+), 4 deletions(-) create mode 100644 kernel/futex2.c diff --git a/MAINTAINERS b/MAINTAINERS index d92f85ca831d..01aceb92aa40 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7370,7 +7370,7 @@ F:Documentation/locking/*futex* F: include/asm-generic/futex.h F: include/linux/futex.h F: include/uapi/linux/futex.h -F: kernel/futex.c +F: kernel/futex* F: tools/perf/bench/futex* F: tools/testing/selftests/futex/ diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index dcc1191291a2..2bf93c69e00a 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -456,3 +456,5 @@ 440common process_madvise sys_process_madvise 441common epoll_pwait2sys_epoll_pwait2 442common mount_setattr sys_mount_setattr +
[RFC PATCH v2 03/13] futex2: Implement vectorized wait
Add support to wait on multiple futexes. This is the interface implemented by this syscall: futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes, unsigned int flags, struct timespec *timo) struct futex_waitv { void *uaddr; unsigned int val; unsigned int flags; }; Given an array of struct futex_waitv, wait on each uaddr. The thread wakes if a futex_wake() is performed at any uaddr. The syscall returns immediately if any waiter has *uaddr != val. *timo is an optional timeout value for the operation. The flags argument of the syscall should be used solely for specifying the timeout as realtime, if needed. Flags for shared futexes, sizes, etc. should be used on the individual flags of each waiter. Returns the array index of one of the awakened futexes. There’s no given information of how many were awakened, or any particular attribute of it (if it’s the first awakened, if it is of the smaller index...). Signed-off-by: André Almeida --- arch/arm/tools/syscall.tbl| 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 + arch/x86/entry/syscalls/syscall_32.tbl| 1 + arch/x86/entry/syscalls/syscall_64.tbl| 1 + include/linux/compat.h| 11 ++ include/linux/syscalls.h | 4 + include/uapi/asm-generic/unistd.h | 5 +- include/uapi/linux/futex.h| 14 ++ kernel/futex2.c | 177 ++ kernel/sys_ni.c | 1 + tools/include/uapi/asm-generic/unistd.h | 5 +- .../arch/x86/entry/syscalls/syscall_64.tbl| 1 + 13 files changed, 222 insertions(+), 3 deletions(-) diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 2bf93c69e00a..f9b55f2ea444 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -458,3 +458,4 @@ 442common mount_setattr sys_mount_setattr 443common futex_wait sys_futex_wait 444common futex_wake sys_futex_wake +445common futex_waitv sys_futex_waitv diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 64ebdc1ec581..d1cc2849dc00 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 445 +#define __NR_compat_syscalls 446 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 15c2cd5f1c95..1e19604b8885 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -897,6 +897,8 @@ __SYSCALL(__NR_mount_setattr, sys_mount_setattr) __SYSCALL(__NR_futex_wait, sys_futex_wait) #define __NR_futex_wake 444 __SYSCALL(__NR_futex_wake, sys_futex_wake) +#define __NR_futex_waitv 445 +__SYSCALL(__NR_futex_waitv, compat_sys_futex_waitv) /* * Please add new compat syscalls above this comment and update diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 17d22509d780..4bc546c841b0 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -449,3 +449,4 @@ 442i386mount_setattr sys_mount_setattr 443i386futex_wait sys_futex_wait 444i386futex_wake sys_futex_wake +445i386futex_waitv sys_futex_waitv compat_sys_futex_waitv diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 3336b5cd5bdb..a715e88e3d6d 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -366,6 +366,7 @@ 442common mount_setattr sys_mount_setattr 443common futex_wait sys_futex_wait 444common futex_wake sys_futex_wake +445common futex_waitv sys_futex_waitv # # Due to a historical design error, certain syscalls are numbered differently diff --git a/include/linux/compat.h b/include/linux/compat.h index 6e65be753603..041d18174350 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -365,6 +365,12 @@ struct compat_robust_list_head { compat_uptr_t list_op_pending; }; +struct compat_futex_waitv { + compat_uptr_t uaddr; + compat_uint_t val; + compat_uint_t flags; +}; + #ifdef CONFIG_COMPAT_OLD_SIGACTION struct compat_old_sigaction { compat_uptr_t sa_handler; @@ -654,6 +660,11 @@ asmlinkage long compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr, compat_size_t
[RFC PATCH v2 04/13] futex2: Implement requeue operation
Implement requeue interface similarly to FUTEX_CMP_REQUEUE operation. This is the syscall implemented by this patch: futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2, unsigned int nr_wake, unsigned int nr_requeue, unsigned int cmpval, unsigned int flags) struct futex_requeue { void *uaddr; unsigned int flags; }; If (uaddr1->uaddr == cmpval), wake at uaddr1->uaddr a nr_wake number of waiters and then, remove a number of nr_requeue waiters at uaddr1->uaddr and add them to uaddr2->uaddr list. Each uaddr has its own set of flags, that must be defined at struct futex_requeue (such as size, shared, NUMA). The flags argument of the syscall is there just for the sake of extensibility, and right now it needs to be zero. Return the number of the woken futexes + the number of requeued ones on success, error code otherwise. Signed-off-by: André Almeida --- The original FUTEX_CMP_REQUEUE interfaces is such as follows: futex(*uaddr1, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, *uaddr2, cmpval); Given that when this interface was created they was only one type of futex (as opposed to futex2, where there is shared, sizes, and NUMA), there was no way to specify individual flags for uaddr1 and 2. When FUTEX_PRIVATE was implemented, a new opcode was created as well (FUTEX_CMP_REQUEUE_PRIVATE), but they apply both futexes, so they should be of the same type regarding private/shared. This imposes a limitation on the use cases of the operation, and to overcome that at futex2, `struct futex_requeue` was created, so one can set individual flags for each futex. This flexibility is a trade-off with performance, given that now we need to perform two extra copy_from_user(). One alternative would be to use the upper half of flags bits to the first one, and the bottom half for the second futex, but this would also impose limitations, given that we would limit by half the flags possibilities. If equal futexes are common enough, the following extension could be added to overcome the current performance: - A flag FUTEX_REQUEUE_EQUAL is added to futex2() flags; - If futex_requeue() see this flag, that means that both futexes uses the same set of attributes. - Then, the function parses the flags as of futex_wait/wake(). - *uaddr1 and *uaddr2 are used as void* (instead of struct futex_requeue) just like wait/wake(). In that way, we could avoid the copy_from_user(). --- arch/arm/tools/syscall.tbl| 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 + arch/x86/entry/syscalls/syscall_32.tbl| 1 + arch/x86/entry/syscalls/syscall_64.tbl| 1 + include/linux/compat.h| 12 + include/linux/syscalls.h | 5 + include/uapi/asm-generic/unistd.h | 5 +- include/uapi/linux/futex.h| 10 + kernel/futex2.c | 215 ++ kernel/sys_ni.c | 1 + .../arch/x86/entry/syscalls/syscall_64.tbl| 1 + 12 files changed, 254 insertions(+), 2 deletions(-) diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index f9b55f2ea444..24a700535747 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -459,3 +459,4 @@ 443common futex_wait sys_futex_wait 444common futex_wake sys_futex_wake 445common futex_waitv sys_futex_waitv +446common futex_requeue sys_futex_requeue diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index d1cc2849dc00..727bfc3be99b 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls(__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END(__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 446 +#define __NR_compat_syscalls 447 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 1e19604b8885..e5015a2b9c94 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -899,6 +899,8 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait) __SYSCALL(__NR_futex_wake, sys_futex_wake) #define __NR_futex_waitv 445 __SYSCALL(__NR_futex_waitv, compat_sys_futex_waitv) +#define __NR_futex_waitv 446 +__SYSCALL(__NR_futex_requeue, compat_sys_futex_requeue) /* * Please add new compat syscalls above this comment and update diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 4bc546c841b0..4d0111f44d79 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -450,3 +450,4 @@ 443i386futex_wait sys_futex_wait
[RFC PATCH v2 06/13] docs: locking: futex2: Add documentation
Add a new documentation file specifying both userspace API and internal implementation details of futex2 syscalls. Signed-off-by: André Almeida --- Documentation/locking/futex2.rst | 198 +++ Documentation/locking/index.rst | 1 + 2 files changed, 199 insertions(+) create mode 100644 Documentation/locking/futex2.rst diff --git a/Documentation/locking/futex2.rst b/Documentation/locking/futex2.rst new file mode 100644 index ..3ab49f0e741c --- /dev/null +++ b/Documentation/locking/futex2.rst @@ -0,0 +1,198 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +futex2 +== + +:Author: André Almeida + +futex, or fast user mutex, is a set of syscalls to allow userspace to create +performant synchronization mechanisms, such as mutexes, semaphores and +conditional variables in userspace. C standard libraries, like glibc, uses it +as a means to implement more high level interfaces like pthreads. + +The interface += + +uAPI functions +-- + +.. kernel-doc:: kernel/futex2.c + :identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue + +uAPI structures +--- + +.. kernel-doc:: include/uapi/linux/futex.h + +The ``flag`` argument +- + +The flag is used to specify the size of the futex word +(FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no +default size. + +By default, the timeout uses a monotonic clock, but can be used as a realtime +one by using the FUTEX_REALTIME_CLOCK flag. + +By default, futexes are of the private type, that means that this user address +will be accessed by threads that share the same memory region. This allows for +some internal optimizations, so they are faster. However, if the address needs +to be shared with different processes (like using ``mmap()`` or ``shm()``), they +need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. + +By default, the operation has no NUMA-awareness, meaning that the user can't +choose the memory node where the kernel side futex data will be stored. The +user can choose the node where it wants to operate by setting the +FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or +32):: + + struct futexX_numa { + __uX value; + __sX hint; + }; + +This structure should be passed at the ``void *uaddr`` of futex functions. The +address of the structure will be used to be waited on/waken on, and the +``value`` will be compared to ``val`` as usual. The ``hint`` member is used to +define which node the futex will use. When waiting, the futex will be +registered on a kernel-side table stored on that node; when waking, the futex +will be searched for on that given table. That means that there's no redundancy +between tables, and the wrong ``hint`` value will lead to undesired behavior. +Userspace is responsible for dealing with node migrations issues that may +occur. ``hint`` can range from [0, MAX_NUMA_NODES), for specifying a node, or +-1, to use the same node the current process is using. + +When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a +global table on allocated on the first node. + +The ``timo`` argument +- + +As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout +options known to be buggy. Given that, ``timo`` should be a 64-bit timeout at +all platforms, using an absolute timeout value. + +Implementation +== + +The internal implementation follows a similar design to the original futex. +Given that we want to replicate the same external behavior of current futex, +this should be somewhat expected. + +Waiting +--- + +For the wait operations, they are all treated as if you want to wait on N +futexes, so the path for futex_wait and futex_waitv is the basically the same. +For both syscalls, the first step is to prepare an internal list for the list +of futexes to wait for (using struct futexv_head). For futex_wait() calls, this +list will have a single object. + +We have a hash table, where waiters register themselves before sleeping. Then +the wake function checks this table looking for waiters at uaddr. The hash +bucket to be used is determined by a struct futex_key, that stores information +to uniquely identify an address from a given process. Given the huge address +space, there'll be hash collisions, so we store information to be later used on +collision treatment. + +First, for every futex we want to wait on, we check if (``*uaddr == val``). +This check is done holding the bucket lock, so we are correctly serialized with +any futex_wake() calls. If any waiter fails the check above, we dequeue all +futexes. The check (``*uaddr == val``) can fail for two reasons: + +- The values are different, and we return -EAGAIN. However, if while + dequeueing we found that some futexes were awakened, we prioritize this + and return success. + +- When trying to
[RFC PATCH v2 08/13] selftests: futex2: Add timeout test
Adapt existing futex wait timeout file to test the same mechanism for futex2. futex2 accepts only absolute 64bit timers, but supports both monotonic and realtime clocks. Signed-off-by: André Almeida --- .../futex/functional/futex_wait_timeout.c | 58 --- 1 file changed, 49 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/futex/functional/futex_wait_timeout.c b/tools/testing/selftests/futex/functional/futex_wait_timeout.c index ee55e6d389a3..b4dffe9e3b44 100644 --- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c +++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c @@ -11,6 +11,7 @@ * * HISTORY * 2009-Nov-6: Initial version by Darren Hart + * 2021-Feb-5: Add futex2 test by André * */ @@ -20,7 +21,7 @@ #include #include #include -#include "futextest.h" +#include "futex2test.h" #include "logging.h" #define TEST_NAME "futex-wait-timeout" @@ -40,7 +41,8 @@ void usage(char *prog) int main(int argc, char *argv[]) { futex_t f1 = FUTEX_INITIALIZER; - struct timespec to; + struct timespec to = {.tv_sec = 0, .tv_nsec = timeout_ns}; + struct timespec64 to64; int res, ret = RET_PASS; int c; @@ -65,22 +67,60 @@ int main(int argc, char *argv[]) } ksft_print_header(); - ksft_set_plan(1); + ksft_set_plan(3); ksft_print_msg("%s: Block on a futex and wait for timeout\n", basename(argv[0])); ksft_print_msg("\tArguments: timeout=%ldns\n", timeout_ns); - /* initialize timeout */ - to.tv_sec = 0; - to.tv_nsec = timeout_ns; - info("Calling futex_wait on f1: %u @ %p\n", f1, &f1); res = futex_wait(&f1, f1, &to, FUTEX_PRIVATE_FLAG); if (!res || errno != ETIMEDOUT) { - fail("futex_wait returned %d\n", ret < 0 ? errno : ret); + ksft_test_result_fail("futex_wait returned %d\n", ret < 0 ? errno : ret); + ret = RET_FAIL; + } else { + ksft_test_result_pass("futex_wait timeout succeeds\n"); + } + + /* setting absolute monotonic timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + info("Calling futex2_wait on f1: %u @ %p\n", f1, &f1); + res = futex2_wait(&f1, f1, FUTEX_32, &to64); + if (!res || errno != ETIMEDOUT) { + ksft_test_result_fail("futex2_wait monotonic returned %d\n", ret < 0 ? errno : ret); + ret = RET_FAIL; + } else { + ksft_test_result_pass("futex2_wait monotonic timeout succeeds\n"); + } + + /* setting absolute realtime timeout for futex2 */ + if (gettime64(CLOCK_REALTIME, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + info("Calling futex2_wait on f1: %u @ %p\n", f1, &f1); + res = futex2_wait(&f1, f1, FUTEX_32 | FUTEX_CLOCK_REALTIME, &to64); + if (!res || errno != ETIMEDOUT) { + ksft_test_result_fail("futex2_wait realtime returned %d\n", ret < 0 ? errno : ret); ret = RET_FAIL; + } else { + ksft_test_result_pass("futex2_wait realtime timeout succeeds\n"); } - print_result(TEST_NAME, ret); + ksft_print_cnts(); return ret; } -- 2.30.1
[RFC PATCH v2 07/13] selftests: futex2: Add wake/wait test
Add a simple file to test wake/wait mechanism using futex2 interface. Test three scenarios: using a common local int variable as private futex, a shm futex as shared futex and a file-backed shared memory as a shared futex. This should test all branches of futex_get_key(). Create helper files so more tests can evaluate futex2. While 32bit ABIs from glibc aren't yet able to use 64 bit sized time variables, add a temporary workaround that implements the required types and calls the appropriated syscalls, since futex2 doesn't supports 32 bit sized time. Signed-off-by: André Almeida --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 6 +- .../selftests/futex/functional/futex2_wait.c | 209 ++ .../testing/selftests/futex/functional/run.sh | 3 + .../selftests/futex/include/futex2test.h | 79 +++ 5 files changed, 296 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c create mode 100644 tools/testing/selftests/futex/include/futex2test.h diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore index 0efcd494daab..d61f1df94360 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -6,3 +6,4 @@ futex_wait_private_mapped_file futex_wait_timeout futex_wait_uninitialized_heap futex_wait_wouldblock +futex2_wait diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile index 23207829ec75..9b334f190759 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -1,10 +1,11 @@ # SPDX-License-Identifier: GPL-2.0 -INCLUDES := -I../include -I../../ +INCLUDES := -I../include -I../../ -I../../../../../usr/include/ CFLAGS := $(CFLAGS) -g -O2 -Wall -D_GNU_SOURCE -pthread $(INCLUDES) LDLIBS := -lpthread -lrt HEADERS := \ ../include/futextest.h \ + ../include/futex2test.h \ ../include/atomic.h \ ../include/logging.h TEST_GEN_FILES := \ @@ -14,7 +15,8 @@ TEST_GEN_FILES := \ futex_requeue_pi_signal_restart \ futex_requeue_pi_mismatched_ops \ futex_wait_uninitialized_heap \ - futex_wait_private_mapped_file + futex_wait_private_mapped_file \ + futex2_wait TEST_PROGS := run.sh diff --git a/tools/testing/selftests/futex/functional/futex2_wait.c b/tools/testing/selftests/futex/functional/futex2_wait.c new file mode 100644 index ..4b5416585c79 --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_wait.c @@ -0,0 +1,209 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/** + * + * Copyright Collabora Ltd., 2021 + * + * DESCRIPTION + * Test wait/wake mechanism of futex2, using 32bit sized futexes. + * + * AUTHOR + * André Almeida + * + * HISTORY + * 2021-Feb-5: Initial version by André + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "futex2test.h" +#include "logging.h" + +#define TEST_NAME "futex2-wait" +#define timeout_ns 3000 +#define WAKE_WAIT_US 1 +#define SHM_PATH "futex2_shm_file" +futex_t *f1; + +void usage(char *prog) +{ + printf("Usage: %s\n", prog); + printf(" -cUse color\n"); + printf(" -hDisplay this help message\n"); + printf(" -v L Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n", + VQUIET, VCRITICAL, VINFO); +} + +void *waiterfn(void *arg) +{ + struct timespec64 to64; + unsigned int flags = 0; + + if (arg) + flags = *((unsigned int *) arg); + + /* setting absolute timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + if (futex2_wait(f1, *f1, FUTEX_32 | flags, &to64)) + printf("waiter failed errno %d\n", errno); + + return NULL; +} + +void *waitershm(void *arg) +{ + futex2_wait(arg, 0, FUTEX_32 | FUTEX_SHARED_FLAG, NULL); + + return NULL; +} + +int main(int argc, char *argv[]) +{ + pthread_t waiter; + unsigned int flags = FUTEX_SHARED_FLAG; + int res, ret = RET_PASS; + int c; + futex_t f_private = 0; + + f1 = &f_private; + + while ((c = getopt(argc, argv, "cht:v:")) != -1) { +
[RFC PATCH v2 02/13] futex2: Add support for shared futexes
Add support for shared futexes for cross-process resources. This design relies on the same approach done in old futex to create an unique id for file-backed shared memory, by using a counter at struct inode. There are two types of futexes: private and shared ones. The private are futexes meant to be used by threads that shares the same memory space, are easier to be uniquely identified an thus can have some performance optimization. The elements for identifying one are: the start address of the page where the address is, the address offset within the page and the current->mm pointer. Now, for uniquely identifying shared futex: - If the page containing the user address is an anonymous page, we can just use the same data used for private futexes (the start address of the page, the address offset within the page and the current->mm pointer) that will be enough for uniquely identifying such futex. We also set one bit at the key to differentiate if a private futex is used on the same address (mixing shared and private calls are not allowed). - If the page is file-backed, current->mm maybe isn't the same one for every user of this futex, so we need to use other data: the page->index, an UUID for the struct inode and the offset within the page. Note that members of futex_key doesn't have any particular meaning after they are part of the struct - they are just bytes to identify a futex. Given that, we don't need to use a particular name or type that matches the original data, we only need to care about the bitsize of each component and make both private and shared data fit in the same memory space. Signed-off-by: André Almeida --- fs/inode.c | 1 + include/linux/fs.h | 1 + include/uapi/linux/futex.h | 2 + kernel/futex2.c| 222 - 4 files changed, 220 insertions(+), 6 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index a047ab306f9a..c5e1dd13fd40 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -139,6 +139,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode) inode->i_blkbits = sb->s_blocksize_bits; inode->i_flags = 0; atomic64_set(&inode->i_sequence, 0); + atomic64_set(&inode->i_sequence2, 0); atomic_set(&inode->i_count, 1); inode->i_op = &empty_iops; inode->i_fop = &no_open_fops; diff --git a/include/linux/fs.h b/include/linux/fs.h index ec8f3ddf4a6a..33683ff94cb3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -683,6 +683,7 @@ struct inode { }; atomic64_t i_version; atomic64_t i_sequence; /* see futex */ + atomic64_t i_sequence2; /* see futex2 */ atomic_ti_count; atomic_ti_dio_count; atomic_ti_writecount; diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h index 8d30f4b6d094..70ea66fffb1c 100644 --- a/include/uapi/linux/futex.h +++ b/include/uapi/linux/futex.h @@ -46,6 +46,8 @@ #define FUTEX_SIZE_MASK0x3 +#define FUTEX_SHARED_FLAG 8 + /* * Support for robust futexes: the kernel cleans up held futexes at * thread exit time. diff --git a/kernel/futex2.c b/kernel/futex2.c index 91bbf06fef8a..0ac669fe6edd 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c @@ -14,8 +14,10 @@ */ #include +#include #include #include +#include #include #include #include @@ -23,8 +25,8 @@ /** * struct futex_key - Components to build unique key for a futex - * @pointer: Pointer to current->mm - * @index: Start address of the page containing futex + * @pointer: Pointer to current->mm or inode's UUID for file backed futexes + * @index: Start address of the page containing futex or index of the page * @offset: Address offset of uaddr in a page */ struct futex_key { @@ -78,7 +80,12 @@ struct futex_bucket { }; /* Mask for futex2 flag operations */ -#define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_CLOCK_REALTIME) +#define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_CLOCK_REALTIME | FUTEX_SHARED_FLAG) + +#define is_object_shared ((futexv->objects[i].flags & FUTEX_SHARED_FLAG) ? true : false) + +#define FUT_OFF_INODE1 /* We set bit 0 if key has a reference on inode */ +#define FUT_OFF_MMSHARED 2 /* We set bit 1 if key has a reference on mm */ static struct futex_bucket *futex_table; static unsigned int futex2_hashsize; @@ -126,16 +133,200 @@ static inline int bucket_get_waiters(struct futex_bucket *bucket) #endif } +/** + * futex_get_inode_uuid - Gets an UUID for an inode + * @inode: inode to get UUID + * + * Generate a machine wide unique identifier for this inode. + * + * This relies on u64 not wrapping in the life-time of the machine; which with + * 1ns resolution means almost 585 years. + * + * This further relies on the fact that a well formed program will n
[RFC PATCH v2 10/13] selftests: futex2: Add waitv test
Create a new file to test the waitv mechanism. Test both private and shared futexes. Wake the last futex in the array, and check if the return value from futex_waitv() is the right index. Signed-off-by: André Almeida --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 3 +- .../selftests/futex/functional/futex2_waitv.c | 157 ++ .../testing/selftests/futex/functional/run.sh | 3 + .../selftests/futex/include/futex2test.h | 26 +++ 5 files changed, 189 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore index d61f1df94360..d0b8f637b786 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -7,3 +7,4 @@ futex_wait_timeout futex_wait_uninitialized_heap futex_wait_wouldblock futex2_wait +futex2_waitv diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile index 9b334f190759..09c08ccdeaf2 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -16,7 +16,8 @@ TEST_GEN_FILES := \ futex_requeue_pi_mismatched_ops \ futex_wait_uninitialized_heap \ futex_wait_private_mapped_file \ - futex2_wait + futex2_wait \ + futex2_waitv TEST_PROGS := run.sh diff --git a/tools/testing/selftests/futex/functional/futex2_waitv.c b/tools/testing/selftests/futex/functional/futex2_waitv.c new file mode 100644 index ..2f81d296d95d --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_waitv.c @@ -0,0 +1,157 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/** + * + * Copyright Collabora Ltd., 2021 + * + * DESCRIPTION + * Test waitv/wake mechanism of futex2, using 32bit sized futexes. + * + * AUTHOR + * André Almeida + * + * HISTORY + * 2021-Feb-5: Initial version by André + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "futex2test.h" +#include "logging.h" + +#define TEST_NAME "futex2-wait" +#define timeout_ns 10 +#define WAKE_WAIT_US 1 +#define NR_FUTEXES 30 +struct futex_waitv waitv[NR_FUTEXES]; +u_int32_t futexes[NR_FUTEXES] = {0}; + +void usage(char *prog) +{ + printf("Usage: %s\n", prog); + printf(" -cUse color\n"); + printf(" -hDisplay this help message\n"); + printf(" -v L Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n", + VQUIET, VCRITICAL, VINFO); +} + +void *waiterfn(void *arg) +{ + struct timespec64 to64; + int res; + + /* setting absolute timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_sec++; + + res = futex2_waitv(waitv, NR_FUTEXES, 0, &to64); + if (res < 0) { + ksft_test_result_fail("futex2_waitv private returned: %d %s\n", + res ? errno : res, + res ? strerror(errno) : ""); + } else if (res != NR_FUTEXES - 1) { + ksft_test_result_fail("futex2_waitv private returned: %d %s\n", + res ? errno : res, + res ? strerror(errno) : ""); + } + + return NULL; +} + +int main(int argc, char *argv[]) +{ + pthread_t waiter; + int res, ret = RET_PASS; + int c, i; + + while ((c = getopt(argc, argv, "cht:v:")) != -1) { + switch (c) { + case 'c': + log_color(1); + break; + case 'h': + usage(basename(argv[0])); + exit(0); + case 'v': + log_verbosity(atoi(optarg)); + break; + default: + usage(basename(argv[0])); + exit(1); + } + } + + ksft_print_header(); + ksft_set_plan(2); + ksft_print_msg("%s: Test FUTEX2_WAITV\n", + basename(argv[0])); + + for (i = 0; i < NR_FUTEXES; i++) { + waitv[i].uaddr = &futexes[i]; + waitv[i].flags = FUTEX_32; + waitv[i].val = 0; + } + + /* Private waitv */ + if (pthread_create(&waiter, NULL, waite
[RFC PATCH v2 09/13] selftests: futex2: Add wouldblock test
Adapt existing futex wait wouldblock file to test the same mechanism for futex2. Signed-off-by: André Almeida --- .../futex/functional/futex_wait_wouldblock.c | 33 --- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c index 0ae390ff8164..ed3660090907 100644 --- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c +++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c @@ -12,6 +12,7 @@ * * HISTORY * 2009-Nov-14: Initial version by Gowrishankar + * 2021-Feb-5: Add futex2 test by André * */ @@ -21,7 +22,7 @@ #include #include #include -#include "futextest.h" +#include "futex2test.h" #include "logging.h" #define TEST_NAME "futex-wait-wouldblock" @@ -39,6 +40,7 @@ void usage(char *prog) int main(int argc, char *argv[]) { struct timespec to = {.tv_sec = 0, .tv_nsec = timeout_ns}; + struct timespec64 to64; futex_t f1 = FUTEX_INITIALIZER; int res, ret = RET_PASS; int c; @@ -61,18 +63,41 @@ int main(int argc, char *argv[]) } ksft_print_header(); - ksft_set_plan(1); + ksft_set_plan(2); ksft_print_msg("%s: Test the unexpected futex value in FUTEX_WAIT\n", basename(argv[0])); info("Calling futex_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1); res = futex_wait(&f1, f1+1, &to, FUTEX_PRIVATE_FLAG); if (!res || errno != EWOULDBLOCK) { - fail("futex_wait returned: %d %s\n", + ksft_test_result_fail("futex_wait returned: %d %s\n", res ? errno : res, res ? strerror(errno) : ""); ret = RET_FAIL; + } else { + ksft_test_result_pass("futex_wait wouldblock succeeds\n"); } - print_result(TEST_NAME, ret); + /* setting absolute timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + info("Calling futex2_wait on f1: %u @ %p with val=%u\n", f1, &f1, f1+1); + res = futex2_wait(&f1, f1+1, FUTEX_32, &to64); + if (!res || errno != EWOULDBLOCK) { + ksft_test_result_fail("futex2_wait returned: %d %s\n", +res ? errno : res, res ? strerror(errno) : ""); + ret = RET_FAIL; + } else { + ksft_test_result_pass("futex2_wait wouldblock succeeds\n"); + } + + ksft_print_cnts(); return ret; } -- 2.30.1
[RFC PATCH v2 05/13] futex2: Add compatibility entry point for x86_x32 ABI
New syscalls should use the same entry point for x86_64 and x86_x32 paths. Add a wrapper for x32 calls to use parse functions that assumes 32bit pointers. Signed-off-by: André Almeida --- kernel/futex2.c | 42 +++--- 1 file changed, 35 insertions(+), 7 deletions(-) diff --git a/kernel/futex2.c b/kernel/futex2.c index b3277ab39b3c..92b56020 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c @@ -23,6 +23,10 @@ #include #include +#ifdef CONFIG_X86_64 +#include +#endif + /** * struct futex_key - Components to build unique key for a futex * @pointer: Pointer to current->mm or inode's UUID for file backed futexes @@ -856,7 +860,16 @@ SYSCALL_DEFINE4(futex_waitv, struct futex_waitv __user *, waiters, futexv->hint = false; futexv->task = current; - ret = futex_parse_waitv(futexv, waiters, nr_futexes); +#ifdef CONFIG_X86_X32_ABI + if (in_x32_syscall()) { + ret = compat_futex_parse_waitv(futexv, (struct compat_futex_waitv *)waiters, + nr_futexes); + } else +#endif + { + ret = futex_parse_waitv(futexv, waiters, nr_futexes); + } + if (!ret) ret = __futex_waitv(futexv, nr_futexes, timo, flags); @@ -1163,13 +1176,28 @@ SYSCALL_DEFINE6(futex_requeue, struct futex_requeue __user *, uaddr1, if (flags) return -EINVAL; - ret = futex_parse_requeue(&rq1, uaddr1, &shared1); - if (ret) - return ret; +#ifdef CONFIG_X86_X32_ABI + if (in_x32_syscall()) { + ret = compat_futex_parse_requeue(&rq1, (struct compat_futex_requeue *)uaddr1, +&shared1); + if (ret) + return ret; - ret = futex_parse_requeue(&rq2, uaddr2, &shared2); - if (ret) - return ret; + ret = compat_futex_parse_requeue(&rq2, (struct compat_futex_requeue *)uaddr2, +&shared2); + if (ret) + return ret; + } else +#endif + { + ret = futex_parse_requeue(&rq1, uaddr1, &shared1); + if (ret) + return ret; + + ret = futex_parse_requeue(&rq2, uaddr2, &shared2); + if (ret) + return ret; + } return __futex_requeue(rq1, rq2, nr_wake, nr_requeue, cmpval, shared1, shared2); } -- 2.30.1
[RFC PATCH v2 11/13] selftests: futex2: Add requeue test
Add testing for futex_requeue(). The first test just requeue from one waiter to another one, and wake it. The second performs both wake and requeue, and we check return values to see if the operation woke/requeued the expected number of waiters. Signed-off-by: André Almeida --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 3 +- .../futex/functional/futex2_requeue.c | 164 ++ .../selftests/futex/include/futex2test.h | 16 ++ 4 files changed, 183 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore index d0b8f637b786..af7557e821da 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -8,3 +8,4 @@ futex_wait_uninitialized_heap futex_wait_wouldblock futex2_wait futex2_waitv +futex2_requeue diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile index 09c08ccdeaf2..3ccb9ea58ddd 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -17,7 +17,8 @@ TEST_GEN_FILES := \ futex_wait_uninitialized_heap \ futex_wait_private_mapped_file \ futex2_wait \ - futex2_waitv + futex2_waitv \ + futex2_requeue TEST_PROGS := run.sh diff --git a/tools/testing/selftests/futex/functional/futex2_requeue.c b/tools/testing/selftests/futex/functional/futex2_requeue.c new file mode 100644 index ..1bc3704dc8c2 --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_requeue.c @@ -0,0 +1,164 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/** + * + * Copyright Collabora Ltd., 2021 + * + * DESCRIPTION + * Test requeue mechanism of futex2, using 32bit sized futexes. + * + * AUTHOR + * André Almeida + * + * HISTORY + * 2021-Feb-5: Initial version by André + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "futex2test.h" +#include "logging.h" + +#define TEST_NAME "futex2-wait" +#define timeout_ns 3000 +#define WAKE_WAIT_US 1 +volatile futex_t *f1; + +void usage(char *prog) +{ + printf("Usage: %s\n", prog); + printf(" -cUse color\n"); + printf(" -hDisplay this help message\n"); + printf(" -v L Verbosity level: %d=QUIET %d=CRITICAL %d=INFO\n", + VQUIET, VCRITICAL, VINFO); +} + +void *waiterfn(void *arg) +{ + struct timespec64 to64; + + /* setting absolute timeout for futex2 */ + if (gettime64(CLOCK_MONOTONIC, &to64)) + error("gettime64 failed\n", errno); + + to64.tv_nsec += timeout_ns; + + if (to64.tv_nsec >= 10) { + to64.tv_sec++; + to64.tv_nsec -= 10; + } + + if (futex2_wait(f1, *f1, FUTEX_32, &to64)) + printf("waiter failed errno %d\n", errno); + + return NULL; +} + +int main(int argc, char *argv[]) +{ + pthread_t waiter[10]; + int res, ret = RET_PASS; + int c, i; + volatile futex_t _f1 = 0; + volatile futex_t f2 = 0; + struct futex_requeue r1, r2; + + f1 = &_f1; + + r1.flags = FUTEX_32; + r2.flags = FUTEX_32; + + r1.uaddr = f1; + r2.uaddr = &f2; + + while ((c = getopt(argc, argv, "cht:v:")) != -1) { + switch (c) { + case 'c': + log_color(1); + break; + case 'h': + usage(basename(argv[0])); + exit(0); + case 'v': + log_verbosity(atoi(optarg)); + break; + default: + usage(basename(argv[0])); + exit(1); + } + } + + ksft_print_header(); + ksft_set_plan(2); + ksft_print_msg("%s: Test FUTEX2_REQUEUE\n", + basename(argv[0])); + + /* +* Requeue a waiter from f1 to f2, and wake f2. +*/ + if (pthread_create(&waiter[0], NULL, waiterfn, NULL)) + error("pthread_create failed\n", errno); + + usleep(WAKE_WAIT_US); + + res = futex2_requeue(&r1, &r2, 0, 1, 0, 0); + if (res != 1) { + ksft_test_result_fail("futex2_requeue private returned: %d %s\n", +
[RFC PATCH v2 12/13] perf bench: Add futex2 benchmark tests
Add support at the existing futex benchmarking code base to enable futex2 calls. `perf bench` tests can be used not only as a way to measure the performance of implementation, but also as stress testing for the kernel infrastructure. Signed-off-by: André Almeida --- tools/arch/x86/include/asm/unistd_64.h | 12 ++ tools/perf/bench/bench.h | 4 ++ tools/perf/bench/futex-hash.c | 24 +-- tools/perf/bench/futex-requeue.c | 57 -- tools/perf/bench/futex-wake-parallel.c | 41 +++--- tools/perf/bench/futex-wake.c | 37 + tools/perf/bench/futex.h | 47 + tools/perf/builtin-bench.c | 18 ++-- 8 files changed, 206 insertions(+), 34 deletions(-) diff --git a/tools/arch/x86/include/asm/unistd_64.h b/tools/arch/x86/include/asm/unistd_64.h index 4205ed4158bf..b65c51e8d675 100644 --- a/tools/arch/x86/include/asm/unistd_64.h +++ b/tools/arch/x86/include/asm/unistd_64.h @@ -17,3 +17,15 @@ #ifndef __NR_setns #define __NR_setns 308 #endif + +#ifndef __NR_futex_wait +# define __NR_futex_wait 443 +#endif + +#ifndef __NR_futex_wake +# define __NR_futex_wake 444 +#endif + +#ifndef __NR_futex_requeue +# define __NR_futex_requeue 446 +#endif diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h index eac36afab2b3..12346844b354 100644 --- a/tools/perf/bench/bench.h +++ b/tools/perf/bench/bench.h @@ -38,9 +38,13 @@ int bench_mem_memcpy(int argc, const char **argv); int bench_mem_memset(int argc, const char **argv); int bench_mem_find_bit(int argc, const char **argv); int bench_futex_hash(int argc, const char **argv); +int bench_futex2_hash(int argc, const char **argv); int bench_futex_wake(int argc, const char **argv); +int bench_futex2_wake(int argc, const char **argv); int bench_futex_wake_parallel(int argc, const char **argv); +int bench_futex2_wake_parallel(int argc, const char **argv); int bench_futex_requeue(int argc, const char **argv); +int bench_futex2_requeue(int argc, const char **argv); /* pi futexes */ int bench_futex_lock_pi(int argc, const char **argv); int bench_epoll_wait(int argc, const char **argv); diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c index b65373ce5c4f..1068749af40c 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c @@ -33,7 +33,7 @@ static unsigned int nthreads = 0; static unsigned int nsecs= 10; /* amount of futexes per thread */ static unsigned int nfutexes = 1024; -static bool fshared = false, done = false, silent = false; +static bool fshared = false, done = false, silent = false, futex2 = false; static int futex_flag = 0; struct timeval bench__start, bench__end, bench__runtime; @@ -85,7 +85,10 @@ static void *workerfn(void *arg) * such as internal waitqueue handling, thus enlarging * the critical region protected by hb->lock. */ - ret = futex_wait(&w->futex[i], 1234, NULL, futex_flag); + if (!futex2) + ret = futex_wait(&w->futex[i], 1234, NULL, futex_flag); + else + ret = futex2_wait(&w->futex[i], 1234, futex_flag, NULL); if (!silent && (!ret || errno != EAGAIN || errno != EWOULDBLOCK)) warn("Non-expected futex return call"); @@ -116,7 +119,7 @@ static void print_summary(void) (int)bench__runtime.tv_sec); } -int bench_futex_hash(int argc, const char **argv) +static int __bench_futex_hash(int argc, const char **argv) { int ret = 0; cpu_set_t cpuset; @@ -148,7 +151,9 @@ int bench_futex_hash(int argc, const char **argv) if (!worker) goto errmem; - if (!fshared) + if (futex2) + futex_flag = FUTEX_32 | (fshared * FUTEX_SHARED_FLAG); + else if (!fshared) futex_flag = FUTEX_PRIVATE_FLAG; printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n", @@ -228,3 +233,14 @@ int bench_futex_hash(int argc, const char **argv) errmem: err(EXIT_FAILURE, "calloc"); } + +int bench_futex_hash(int argc, const char **argv) +{ + return __bench_futex_hash(argc, argv); +} + +int bench_futex2_hash(int argc, const char **argv) +{ + futex2 = true; + return __bench_futex_hash(argc, argv); +} diff --git a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c index 5fa23295ee5f..6cdd649b54f4 100644 --- a/tools/perf/bench/futex-requeue.c +++ b/tools/perf/bench/futex-requeue.c @@ -2,8 +2,8 @@ /* * Copyright (C) 2013 Davidlohr Bueso * - * futex-requeue: Block a bunch of threads on futex1 and requeue them - *on futex2, N at a
[RFC PATCH v2 13/13] kernel: Enable waitpid() for futex2
To make pthreads works as expected if they are using futex2, wake clear_child_tid with futex2 as well. This is make applications that uses waitpid() (and clone(CLONE_CHILD_SETTID)) wake while waiting for the child to terminate. Given that apps should not mix futex() and futex2(), any correct app will trigger a harmless noop wakeup on the interface that it isn't using. Signed-off-by: André Almeida --- This commit is here for the intend to show what we need to do in order to get a full NPTL working on top of futex2. It should be merged after we talk to glibc folks on the details around the futex_wait() side. For instance, we could use this as an opportunity to use private futexes or 8bit sized futexes, but both sides need to use the exactly same flags. --- include/linux/syscalls.h | 2 ++ kernel/fork.c| 2 ++ kernel/futex2.c | 30 ++ 3 files changed, 22 insertions(+), 12 deletions(-) diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index b0675f236066..b07b7d4334a6 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1316,6 +1316,8 @@ int ksys_ipc(unsigned int call, int first, unsigned long second, unsigned long third, void __user * ptr, long fifth); int compat_ksys_ipc(u32 call, int first, int second, u32 third, u32 ptr, u32 fifth); +long ksys_futex_wake(void __user *uaddr, unsigned long nr_wake, +unsigned int flags); /* * The following kernel syscall equivalents are just wrappers to fs-internal diff --git a/kernel/fork.c b/kernel/fork.c index d66cd1014211..e39846a73a43 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1308,6 +1308,8 @@ static void mm_release(struct task_struct *tsk, struct mm_struct *mm) put_user(0, tsk->clear_child_tid); do_futex(tsk->clear_child_tid, FUTEX_WAKE, 1, NULL, NULL, 0, 0); + ksys_futex_wake(tsk->clear_child_tid, 1, + FUTEX_32 | FUTEX_SHARED_FLAG); } tsk->clear_child_tid = NULL; } diff --git a/kernel/futex2.c b/kernel/futex2.c index 92b56020..dd6f54ae0220 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c @@ -924,18 +924,8 @@ static inline bool futex_match(struct futex_key key1, struct futex_key key2) key1.offset == key2.offset); } -/** - * sys_futex_wake - Wake a number of futexes waiting on an address - * @uaddr: Address of futex to be woken up - * @nr_wake: Number of futexes waiting in uaddr to be woken up - * @flags: Flags for size and shared - * - * Wake `nr_wake` threads waiting at uaddr. - * - * Returns the number of woken threads on success, error code otherwise. - */ -SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, - unsigned int, flags) +long ksys_futex_wake(void __user *uaddr, unsigned long nr_wake, +unsigned int flags) { bool shared = (flags & FUTEX_SHARED_FLAG) ? true : false; unsigned int size = flags & FUTEX_SIZE_MASK; @@ -972,6 +962,22 @@ SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, return ret; } +/** + * sys_futex_wake - Wake a number of futexes waiting on an address + * @uaddr: Address of futex to be woken up + * @nr_wake: Number of futexes waiting in uaddr to be woken up + * @flags: Flags for size and shared + * + * Wake `nr_wake` threads waiting at uaddr. + * + * Returns the number of woken threads on success, error code otherwise. + */ +SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, + unsigned int, flags) +{ + return ksys_futex_wake(uaddr, nr_wake, flags); +} + static void futex_double_unlock(struct futex_bucket *b1, struct futex_bucket *b2) { spin_unlock(&b1->lock); -- 2.30.1
[PATCH 0/3] fs: Fix dangling dentries on casefold directories
Hello, This patchset fixes a bug in case-insensitive directories. When I submitted a patchset for adding case-insensitive support for tmpfs[0], Al Viro noted that my implementation didn't take in account previous dentries that the directory could have created before being changed. Further investigation showed that neither ext4 or f2fs also doesn't take this case in consideration as well. * Why can't we have negative dentries with casefold? The assumption that the directory has no dentries can lead to a buggy behavior (note that since the directory must be empty when setting the casefold flag, all dentries there are negative). Imagine the following operation on a mounted ext4 with casefold support enabled: mkdir dir mkdir dir/C # creates a dentry for `C` (dentry D) rm -r dir/C # makes dentry D a negative one Now, let's make it case-insensitive: chattr +F dir/ # now dir/ is a casefold directory mkdir dir/c # if hash for `c` collides with dentry D # d_compare does a case-insensitive compare # and assumes that dentry D is the one to be used ls dir/ # VFS uses the name at dentry D for the final file C # and here's the bug In that way, all negative dentries at dir/ will become dangling dentries that can't be trusted to be used an will just waste memory. The problem with negative dentries is well-know, and both the current code and commits documents it, but this case hasn't been taken in consideration so far. * Reproducing Given that the bug only happens with a hash collision, I added the following snippet at the beginning of generic_ci_d_hash(): str->hash = 0; return 0; This means that all dentries will have the same hash. This is not good for performance, but it should not break anything AFAIK. Then, just run the example showed in the latter section. * Fixing To fix this bug, I added a function that, given an inode, for each alias of it, will remove all the sub-dentries at that directory. Given that they are all negative dentries, we don't need to do the whole d_walk, since they don't have children and are also ready to be d_droped and dputed. Then, at ext4 and f2fs, when a dir is going to turn on the casefold flag, we call this function. Thanks, André [0] https://lore.kernel.org/linux-fsdevel/20210323195941.69720-1-andrealm...@collabora.com/T/#m3265579197095b792ee8b8e8b7f84a58c25c456b André Almeida (3): fs/dcache: Add d_clear_dir_neg_dentries() ext4: Prevent dangling dentries on casefold directories f2fs: Prevent dangling dentries on casefold directories fs/dcache.c| 27 +++ fs/ext4/ioctl.c| 3 +++ fs/f2fs/file.c | 4 include/linux/dcache.h | 1 + 4 files changed, 35 insertions(+) -- 2.31.0
[PATCH 1/3] fs/dcache: Add d_clear_dir_neg_dentries()
For directories with negative dentries that are becoming case-insensitive dirs, we need to remove all those negative dentries, otherwise they will become dangling dentries. During the creation of a new file, if a d_hash collision happens and the names match in a case-insensitive way, the name of the file will be the name defined at the negative dentry, that may be different from the specified by the user. To prevent this from happening, we need to remove all dentries in a directory. Given that the directory must be empty before we call this function we are sure that all dentries there will be negative. Create a function to remove all negative dentries from a directory, to be used as explained above by filesystems that support case-insensitive lookups. Signed-off-by: André Almeida --- fs/dcache.c| 27 +++ include/linux/dcache.h | 1 + 2 files changed, 28 insertions(+) diff --git a/fs/dcache.c b/fs/dcache.c index 7d24ff7eb206..fafb3016d6fd 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1723,6 +1723,33 @@ void d_invalidate(struct dentry *dentry) } EXPORT_SYMBOL(d_invalidate); +/** + * d_clear_dir_neg_dentries - Remove negative dentries in an inode + * @dir: Directory to clear negative dentries + * + * For directories with negative dentries that are becoming case-insensitive + * dirs, we need to remove all those negative dentries, otherwise they will + * become dangling dentries. During the creation of a new file, if a d_hash + * collision happens and the names match in a case-insensitive, the name of + * the file will be the name defined at the negative dentry, that can be + * different from the specified by the user. To prevent this from happening, we + * need to remove all dentries in a directory. Given that the directory must be + * empty before we call this function we are sure that all dentries there will + * be negative. + */ +void d_clear_dir_neg_dentries(struct inode *dir) +{ + struct dentry *alias, *dentry; + + hlist_for_each_entry(alias, &dir->i_dentry, d_u.d_alias) { + list_for_each_entry(dentry, &alias->d_subdirs, d_child) { + d_drop(dentry); + dput(dentry); + } + } +} +EXPORT_SYMBOL(d_clear_dir_neg_dentries); + /** * __d_alloc - allocate a dcache entry * @sb: filesystem it will belong to diff --git a/include/linux/dcache.h b/include/linux/dcache.h index c1e48014106f..c43cd0be077f 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -250,6 +250,7 @@ extern void shrink_dcache_sb(struct super_block *); extern void shrink_dcache_parent(struct dentry *); extern void shrink_dcache_for_umount(struct super_block *); extern void d_invalidate(struct dentry *); +extern void d_clear_dir_neg_dentries(struct inode *); /* only used at mount-time */ extern struct dentry * d_make_root(struct inode *); -- 2.31.0
[PATCH 2/3] ext4: Prevent dangling dentries on casefold directories
Before making a folder a case-insensitive one, this folder could have been used before and created some negative dentries (given that the folder needs to be empty before making it case-insensitive, all detries there are negative ones). During a new file creation, if a d_hash() collision happens and the name matches a negative dentry, the new file might have a name different than the specified by user. To prevent this from happening, remove all negative dentries in a directory before making it a case-folded one. Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups") Signed-off-by: André Almeida --- fs/ext4/ioctl.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c index a2cf35066f46..0eede4c93c22 100644 --- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -381,6 +381,9 @@ static int ext4_ioctl_setflags(struct inode *inode, err = -ENOTEMPTY; goto flags_out; } + + if (!(oldflags & EXT4_CASEFOLD_FL) && (flags & EXT4_CASEFOLD_FL)) + d_clear_dir_neg_dentries(inode); } /* -- 2.31.0
[PATCH 3/3] f2fs: Prevent dangling dentries on casefold directories
Before making a folder a case-insensitive one, this folder could have been used before and created some negative dentries (given that the folder needs to be empty before making it case-insensitive, all detries there are negative ones). During a new file creation, if a d_hash() collision happens and the name matches a negative dentry, the new file might have a name different than the specified by user. To prevent this from happening, remove all negative dentries in a directory before making it a case-folded one. Fixes: 2c2eb7a300cd ("f2fs: Support case-insensitive file name lookups") Signed-off-by: André Almeida --- fs/f2fs/file.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index d26ff2ae3f5e..616b7eb43795 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1826,6 +1826,10 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask) return -EOPNOTSUPP; if (!f2fs_empty_dir(inode)) return -ENOTEMPTY; + + if (!(masked_flags & F2FS_CASEFOLD_FL) && + (iflags & F2FS_CASEFOLD_FL)) + d_clear_dir_neg_dentries(inode); } if (iflags & (F2FS_COMPR_FL | F2FS_NOCOMP_FL)) { -- 2.31.0
Re: [PATCH 1/3] fs/dcache: Add d_clear_dir_neg_dentries()
Às 12:07 de 28/03/21, Matthew Wilcox escreveu: On Sun, Mar 28, 2021 at 11:43:54AM -0300, André Almeida wrote: +/** + * d_clear_dir_neg_dentries - Remove negative dentries in an inode + * @dir: Directory to clear negative dentries + * + * For directories with negative dentries that are becoming case-insensitive + * dirs, we need to remove all those negative dentries, otherwise they will + * become dangling dentries. During the creation of a new file, if a d_hash + * collision happens and the names match in a case-insensitive, the name of + * the file will be the name defined at the negative dentry, that can be + * different from the specified by the user. To prevent this from happening, we + * need to remove all dentries in a directory. Given that the directory must be + * empty before we call this function we are sure that all dentries there will + * be negative. + */ This is quite the landmine of a function. It _assumes_ that the directory is empty, and clears all dentries in it. +void d_clear_dir_neg_dentries(struct inode *dir) +{ + struct dentry *alias, *dentry; + + hlist_for_each_entry(alias, &dir->i_dentry, d_u.d_alias) { + list_for_each_entry(dentry, &alias->d_subdirs, d_child) { + d_drop(dentry); + dput(dentry); + } I would be happier if it included a check for negativity. d_is_negative() or maybe this newfangled d_really_is_negative() (i haven't stayed up to speed on the precise difference between the two) Makes sense. And given that this only makes sense if the directory is empty, if it founds a non-negative dentry, it should return some error right? + } +} +EXPORT_SYMBOL(d_clear_dir_neg_dentries); I'd rather see this _GPL for such an internal thing.
Re: [PATCH] media: vimc: Implement debayer control for mean window size
Hello Arthur and Laís, Thanks for your patch! On 9/1/19 4:40 PM, Arthur Moraes do Lago wrote: > Add mean window size parameter for debayer filter as a control in > vimc-debayer. > > vimc-debayer was patched to allow changing mean windows parameter > of the filter without needing to reload the driver. The parameter > can now be set using a v4l2-ctl control(mean_window_size). > > Co-developed-by: Laís Pessine do Carmo > Signed-off-by: Laís Pessine do Carmo > Signed-off-by: Arthur Moraes do Lago > > --- > This patch was made on top of Shuah Khan's patch (162623). > Thanks. > --- > drivers/media/platform/vimc/vimc-common.h | 1 + > drivers/media/platform/vimc/vimc-debayer.c | 81 ++ > 2 files changed, 70 insertions(+), 12 deletions(-) > > diff --git a/drivers/media/platform/vimc/vimc-common.h > b/drivers/media/platform/vimc/vimc-common.h > index 5b2282de395c..547ff04a415e 100644 > --- a/drivers/media/platform/vimc/vimc-common.h > +++ b/drivers/media/platform/vimc/vimc-common.h > @@ -19,6 +19,7 @@ > #define VIMC_CID_VIMC_BASE (0x00f0 | 0xf000) > #define VIMC_CID_VIMC_CLASS (0x00f0 | 1) > #define VIMC_CID_TEST_PATTERN(VIMC_CID_VIMC_BASE + 0) > +#define VIMC_CID_MEAN_WIN_SIZE (VIMC_CID_VIMC_BASE + 1) > > #define VIMC_FRAME_MAX_WIDTH 4096 > #define VIMC_FRAME_MAX_HEIGHT 2160 > diff --git a/drivers/media/platform/vimc/vimc-debayer.c > b/drivers/media/platform/vimc/vimc-debayer.c > index 6cee911bf149..aa3edeed96bc 100644 > --- a/drivers/media/platform/vimc/vimc-debayer.c > +++ b/drivers/media/platform/vimc/vimc-debayer.c > @@ -11,17 +11,11 @@ > #include > #include > #include > +#include > #include > > #include "vimc-common.h" > > -static unsigned int deb_mean_win_size = 3; > -module_param(deb_mean_win_size, uint, ); > -MODULE_PARM_DESC(deb_mean_win_size, " the window size to calculate the > mean.\n" > - "NOTE: the window size needs to be an odd number, as the main pixel " > - "stays in the center of the window, otherwise the next odd number " > - "is considered"); > - Since you removed the module parameter, please make sure to update vimc documentation section explaining about it: https://git.linuxtv.org/media_tree.git/tree/Documentation/media/v4l-drivers/vimc.rst#n94 Thanks, André > #define IS_SINK(pad) (!pad) > #define IS_SRC(pad) (pad) > > @@ -49,6 +43,8 @@ struct vimc_deb_device { > u8 *src_frame; > const struct vimc_deb_pix_map *sink_pix_map; > unsigned int sink_bpp; > + unsigned int mean_win_size; > + struct v4l2_ctrl_handler hdl; > }; > > static const struct v4l2_mbus_framefmt sink_fmt_default = { > @@ -387,7 +383,7 @@ static void vimc_deb_calc_rgb_sink(struct vimc_deb_device > *vdeb, >* the top left corner of the mean window (considering the current >* pixel as the center) >*/ > - seek = deb_mean_win_size / 2; > + seek = vdeb->mean_win_size / 2; > > /* Sum the values of the colors in the mean window */ > > @@ -477,6 +473,33 @@ static void *vimc_deb_process_frame(struct > vimc_ent_device *ved, > > } > > +static inline void vimc_deb_s_mean_win_size(struct vimc_deb_device *vdeb, > + unsigned int mean_win_size) > +{ > + if (vdeb->mean_win_size == mean_win_size) > + return; > + vdeb->mean_win_size = mean_win_size; > +} > + > +static int vimc_deb_s_ctrl(struct v4l2_ctrl *ctrl) > +{ > + struct vimc_deb_device *vdeb = > + container_of(ctrl->handler, struct vimc_deb_device, hdl); > + > + switch (ctrl->id) { > + case VIMC_CID_MEAN_WIN_SIZE: > + vimc_deb_s_mean_win_size(vdeb, ctrl->val); > + break; > + default: > + return -EINVAL; > + } > + return 0; > +} > + > +static const struct v4l2_ctrl_ops vimc_deb_ctrl_ops = { > + .s_ctrl = vimc_deb_s_ctrl, > +}; > + > static void vimc_deb_release(struct v4l2_subdev *sd) > { > struct vimc_deb_device *vdeb = > @@ -502,6 +525,24 @@ void vimc_deb_rm(struct vimc_device *vimc, struct > vimc_ent_config *vcfg) > vimc_ent_sd_unregister(ved, &vdeb->sd); > } > > +static const struct v4l2_ctrl_config vimc_deb_ctrl_class = { > + .flags = V4L2_CTRL_FLAG_READ_ONLY | V4L2_CTRL_FLAG_WRITE_ONLY, > + .id = VIMC_CID_VIMC_CLASS, > + .name = "VIMC Controls", > + .type = V4L2_CTRL_TYPE_CTRL_CLASS, > +}; > + > +static const struct v4l2_ctrl_config vimc_deb_ctrl_mean_win_size = { > + .ops = &vimc_deb_ctrl_ops, > + .id = VIMC_CID_MEAN_WIN_SIZE, > + .name = "Mean window size", > + .type = V4L2_CTRL_TYPE_INTEGER, > + .min = 1, > + .max = 99, > + .step = 2, > + .def = 3, > +}; > + > int vimc_deb_add(struct vimc_device *vimc, struct vimc_ent_config *vcfg) > { > struct v4l2_device *v4l2_dev = &vimc->v4l2_dev; > @@ -513,6 +554,16 @@ int vimc_deb_add(struct vim
Re: [PATCH 1/1] blk-mq: fill header with kernel-doc
On 9/30/19 6:54 PM, Minwoo Im wrote: > Hi André, > >> -/* >> +/** >> + * blk_mq_rq_from_pdu - cast a PDU to a request >> + * @pdu: the PDU (protocol unit request) to be casted > > It makes sense, but it looks like PDU stands for protocol unit request. > Could we have it "PDU(Protocol Data Unit)" ? > Ops, thank your for the input :) > Thanks, > >
Re: [PATCH 1/1] blk-mq: fill header with kernel-doc
On 9/30/19 7:01 PM, Bart Van Assche wrote: > On 9/30/19 12:48 PM, André Almeida wrote: >> Insert documentation for structs, enums and functions at header file. >> Format existing and new comments at struct blk_mq_ops as >> kernel-doc comments. > > Hi André, > > Seeing the documentation being improved is great. However, this patch > conflicts with a patch series in my tree and that I plan to post soon. > So I would appreciate it if this patch would be withheld until after my > patch series has been accepted. > Sure, no problem. If it helps the workflow, I could rebase my patch at the top of your tree. > Thanks, > > Bart. >
Re: [Lkcamp] [RFC PATCH] media: vimc: vimc_pix_map_fmt_info() can be static
Hi Helen, On 10/5/19 10:36 PM, Helen Koike wrote: > Hi Carlos, > > On 10/5/19 9:28 PM, kbuild test robot wrote: >> Fixes: 4d124d159dff ("media: vimc: get pixformat info from v4l2_format_info >> to avoid code repetition") > Usually, the Fixes flag is used for something that is already accepted in > mainline. > If you want to fix anything in the previous version of the patch you sent, > you should update the last patch > and re-send it as a new version, i.e. [PATCH v2], adding a change log just > after the 3 dashes to explain > what you changed. The author of this commit is the "kbuild test robot" rather than Carlos, it was generated automatically to fix a warning the robot found at Carlos commit :) > Check this example: > > https://www.spinics.net/lists/linux-media/msg158251.html > > Let me know if you need help! > And thanks for working on this :) > Helen > >> Signed-off-by: kbuild test robot >> --- >> vimc-common.c |2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/media/platform/vimc/vimc-common.c >> b/drivers/media/platform/vimc/vimc-common.c >> index 9ea698810e1a1..c37442aba70b1 100644 >> --- a/drivers/media/platform/vimc/vimc-common.c >> +++ b/drivers/media/platform/vimc/vimc-common.c >> @@ -118,7 +118,7 @@ static struct vimc_pix_map_fmt vimc_pix_map_fmt_list[] = >> { >> }, >> }; >> >> -struct vimc_pix_map vimc_pix_map_fmt_info(struct vimc_pix_map_fmt *vfmt) >> +static struct vimc_pix_map vimc_pix_map_fmt_info(struct vimc_pix_map_fmt >> *vfmt) >> { >> >> struct vimc_pix_map vpix = { >> > ___ > Lkcamp mailing list > lkc...@lists.libreplanetbr.org > https://lists.libreplanetbr.org/mailman/listinfo/lkcamp
[PATCH v3 1/1] blk-mq: fill header with kernel-doc
Insert documentation for structs, enums and functions at header file. Format existing and new comments at struct blk_mq_ops as kernel-doc comments. Signed-off-by: André Almeida --- Hello, This patch is an effort to enhance the documentation of the multiqueue API. To check if the comments are confirming to kernel-doc format, you first need to apply a patch[1] solving a bug around the `__cacheline...` directive. Then, just run ./scripts/kernel-doc -none include/linux/blk-mq.h to check that there is no warning at this documentation. I've done my best at the source to check the purpose of each struct and its members, but please let me know if I get something wrong. Changes since v2: - Rewrite and improve definition of a lot of members (thanks Bart and Gabriel!) Changes since v1: - Fix the meaning of PDU - Uses nested comments for long structs - Slight reword member definition to look closer to work made at commit "block: Document all members of blk_mq_tag_set and bkl_mq_queue_map" [1] https://lkml.org/lkml/2019/9/17/820 --- include/linux/blk-mq.h | 225 + 1 file changed, 185 insertions(+), 40 deletions(-) diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index e0fce93ac127..4919d22e1aff 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -10,74 +10,171 @@ struct blk_mq_tags; struct blk_flush_queue; /** - * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware block device + * struct blk_mq_hw_ctx - State for a hardware queue facing the hardware + * block device */ struct blk_mq_hw_ctx { struct { + /** @lock: Protects the dispatch list. */ spinlock_t lock; + /** +* @dispatch: Used for requests that are ready to be +* dispatched to the hardware but for some reason (e.g. lack of +* resources) could not be sent to the hardware. As soon as the +* driver can send new requests, requests at this list will +* be sent first for a fairer dispatch. +*/ struct list_headdispatch; - unsigned long state; /* BLK_MQ_S_* flags */ +/** + * @state: BLK_MQ_S_* flags. Defines the state of the hw + * queue (active, scheduled to restart, stopped). + */ + unsigned long state; } cacheline_aligned_in_smp; + /** +* @run_work: Used for scheduling a hardware queue run at a later time. +*/ struct delayed_work run_work; + /** @cpumask: Map of available CPUs where this hctx can run. */ cpumask_var_t cpumask; + /** +* @next_cpu: Used by blk_mq_hctx_next_cpu() for round-robin CPU +* selection from @cpumask. +*/ int next_cpu; + /** +* @next_cpu_batch: Counter of how many works left in the batch before +* changing to the next CPU. +*/ int next_cpu_batch; - unsigned long flags; /* BLK_MQ_F_* flags */ + /** @flags: BLK_MQ_F_* flags. Defines the behaviour of the queue. */ + unsigned long flags; + /** +* @sched_data: Pointer owned by the IO scheduler attached to a request +* queue. It's up to the IO scheduler how to use this pointer. +*/ void*sched_data; + /** +* @queue: Pointer to the request queue that owns this hardware context. +*/ struct request_queue*queue; + /** @fq: Queue of requests that need to perform a flush operation. */ struct blk_flush_queue *fq; + /** +* @driver_data: Pointer to data owned by the block driver that created +* this hctx +*/ void*driver_data; + /** +* @ctx_map: Bitmap for each software queue. If bit is on, there is a +* pending request in that software queue. +*/ struct sbitmap ctx_map; + /** +* @dispatch_from: Software queue to be used when no scheduler was +* selected. +*/ struct blk_mq_ctx *dispatch_from; + /** +* @dispatch_busy: Number used by blk_mq_update_dispatch_busy() to +* decide if the hw_queue is busy using Exponential Weighted Moving +* Average algorithm. +*/ unsigned intdispatch_busy; + /** @type: HCTX_TYPE_* flags. Type of hardware queue. */ unsigned short type; + /** @nr_ctx: Number of software queues. */ unsigned short nr_ctx; + /** @ctxs: Array of software queues. */ struct blk_mq_ctx **ctxs; + /** @dispatch_wait_lock: Lock for dispatch_wait queue. *
Re: [PATCH 0/7] media: vimc: Add a V4L2 output device
Hello, On 7/10/19 4:33 AM, Hans Verkuil wrote: > On 7/10/19 12:19 AM, Helen Koike wrote: >> Hi André, >> >> Thanks for the patches. >> >> On 7/2/19 12:47 PM, André Almeida wrote: >>> Hello, >>> >>> This patch adds a V4L2 output device on vimc, that comply with V4L2 API >>> for video output. If there is an output device and a capture device at the >>> same pipeline, one can get a video loopback pipeline feeding frames at >>> the output and then seeing them at the capture. It's possible to insert >>> vimc submodules at the pipeline to modify the image (e.g. a scaler). >>> >>> If one starts a streaming at the capture, with the output off, the >>> capture will display a noisy frame. If one starts a streaming at the >>> output with the capture off, the output will just consume the buffers, >>> without sending them to the pipeline. If both output and capture are >>> streaming, the loopback will happen. >> I understand why it is done like this in vivid, but I was wondering, if we >> have a pipeline like: >> output -> capture >> Shouldn't streaming from the capture just stalls if there is no frame >> available in the output (i.e. streaming in the output is off) ? But then I'm >> not sure what the framerate in the capture would mean. >> >> Hans, what do you think? > If you set up the pipeline like this: > > Video Output -> Scaler -> Video Capture If the capture will stall if there's no frame from the video output, how can I add support for this kind of pipeline at test-media? It would be required to send frames to the output device while running `v4l2-compliance` at the capture device to make testing possible. Thanks, André > Then this is a mem2mem device (except with two separate video devices) and > framerate doesn't apply anymore. And video capture will just stall if there > is no video output frame provided. > > It's how e.g. omap3isp works. > > Regards, > > Hans > >> Thanks, >> Helen >> >>> The patches 1 and 2 provide some ground to create the output >>> device. The patch 3 creates the device and modify how the vimc-streamer >>> was dealing with the s_stream callback on other vimc modules, to make >>> simpler implementing this callback at vimc-output. Patch 4 change the >>> behavior of the pipeline in order to be closer to a real life hardware. >>> Patches 5-7 updates the default pipeline and the documentation to >>> include the new output device. >>> >>> This is the result of v4l2-compliance after this patch series: >>> $ v4l2-compliance -m0 -s50 >>> Grand Total for vimc device /dev/media0: 476, Succeeded: 476, Failed: 0, >>> Warnings: 0 >>> >>> A git tree up to date with media-master and with this changes can be found >>> at: https://gitlab.collabora.com/tonyk/linux/tree/vimc/output >>> >>> In order to test it, one can follow these instructions: >>> >>> 1 - Configure the pipeline (requires v4l-utils): >>> >>> $ media-ctl -d platform:vimc -V '"Sensor A":0[fmt:SBGGR8_1X8/640x480]' >>> $ media-ctl -d platform:vimc -V '"Debayer A":0[fmt:SBGGR8_1X8/640x480]' >>> $ media-ctl -d platform:vimc -V '"Sensor B":0[fmt:SBGGR8_1X8/640x480]' >>> $ media-ctl -d platform:vimc -V '"Debayer B":0[fmt:SBGGR8_1X8/640x480]' >>> $ v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v width=1920,height=1440 >>> $ v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v pixelformat=BA81 >>> $ v4l2-ctl -z platform:vimc -d "Raw Capture 1" -v pixelformat=BA81 >>> $ v4l2-ctl -z platform:vimc -e "RGB/YUV Input" -v width=640,height=480 >>> >>> 2 - Use a userspace application: >>> 2.a gst-launch (requires gstreamer and gst-plugins-good): >>> >>> Feed frames into the output and grab from the capture (rescaled for >>> convenience): >>> >>> $ gst-launch-1.0 videotestsrc pattern=ball ! \ >>> video/x-raw,width=640,height=480,format=RGB \ >>> ! v4l2sink device=/dev/video2 v4l2src device=/dev/video3 ! \ >>> video/x-raw,width=1920,height=1440,format=RGB ! videoscale ! \ >>> video/x-raw,width=640,height=480 ! videoconvert ! ximagesink >>> >>> 2.b qv4l2 (requires v4l-utils): >>> >>> Open the output device: >>> >>> $ qv4l2 -d2 >>> >>> Open the capture device: >>> >>> $ qv4l2 -d
Re: [PATCH 0/3] Collapse vimc into single monolithic driver
Hi Shuah, On 8/12/19 11:08 AM, Shuah Khan wrote: > On 8/9/19 9:51 PM, Helen Koike wrote: >> Hi Andre, >> >> Thanks for testing this. >> >> On 8/9/19 9:24 PM, André Almeida wrote: >>> On 8/9/19 9:17 PM, Shuah Khan wrote: >>>> Hi Andre, >>>> >>>> On 8/9/19 5:52 PM, André Almeida wrote: >>>>> Hello Shuah, >>>>> >>>>> Thanks for the patch, I did some comments below. >>>>> >>>>> On 8/9/19 6:45 PM, Shuah Khan wrote: >>>>>> vimc uses Component API to split the driver into functional >>>>>> components. >>>>>> The real hardware resembles a monolith structure than component and >>>>>> component structure added a level of complexity making it hard to >>>>>> maintain without adding any real benefit. >>>>>> The sensor is one vimc component that would makes sense to be a >>>>>> separate >>>>>> module to closely align with the real hardware. It would be easier to >>>>>> collapse vimc into single monolithic driver first and then split the >>>>>> sensor off as a separate module. >>>>>> >>>>>> This patch series emoves the component API and makes minimal >>>>>> changes to >>>>>> the code base preserving the functional division of the code >>>>>> structure. >>>>>> Preserving the functional structure allows us to split the sensor off >>>>>> as a separate module in the future. >>>>>> >>>>>> Major design elements in this change are: >>>>>> - Use existing struct vimc_ent_config and struct >>>>>> vimc_pipeline_config >>>>>> to drive the initialization of the functional components. >>>>>> - Make vimc_ent_config global by moving it to vimc.h >>>>>> - Add two new hooks add and rm to initialize and register, >>>>>> unregister >>>>>> and free subdevs. >>>>>> - All component API is now gone and bind and unbind hooks are >>>>>> modified >>>>>> to do "add" and "rm" with minimal changes to just add and rm >>>>>> subdevs. >>>>>> - vimc-core's bind and unbind are now register and unregister. >>>>>> - vimc-core invokes "add" hooks from its >>>>>> vimc_register_devices(). >>>>>> The "add" hooks remain the same and register subdevs. They >>>>>> don't >>>>>> create platform devices of their own and use vimc's >>>>>> pdev.dev as >>>>>> their reference device. The "add" hooks save their >>>>>> vimc_ent_device(s) >>>>>> in the corresponding vimc_ent_config. >>>>>> - vimc-core invokes "rm" hooks from its unregister to >>>>>> unregister >>>>>> subdevs >>>>>> and cleanup. >>>>>> - vimc-core invokes "add" and "rm" hooks with pointer to struct >>>>>> vimc_device >>>>>> and the corresponding struct vimc_ent_config pointer. >>>>>> The following configure and stream test works on all devices. >>>>>> media-ctl -d platform:vimc -V '"Sensor >>>>>> A":0[fmt:SBGGR8_1X8/640x480]' >>>>>> media-ctl -d platform:vimc -V '"Debayer >>>>>> A":0[fmt:SBGGR8_1X8/640x480]' >>>>>> media-ctl -d platform:vimc -V '"Sensor >>>>>> B":0[fmt:SBGGR8_1X8/640x480]' >>>>>> media-ctl -d platform:vimc -V '"Debayer >>>>>> B":0[fmt:SBGGR8_1X8/640x480]' >>>>>> v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v >>>>>> width=1920,height=1440 >>>>>> v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v >>>>>> pixelformat=BA81 >>>>>> v4l2-ctl -z platform:vimc -d "Raw Capture 1" -v >>>>>> pixelformat=BA81 >>>>>> v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video1 >>>>>> v4l2-ctl --stream-mmap --stream-count
Re: [PATCH 0/3] Collapse vimc into single monolithic driver
Hello Shuah, Thanks for the patch, I did some comments below. On 8/9/19 6:45 PM, Shuah Khan wrote: > vimc uses Component API to split the driver into functional components. > The real hardware resembles a monolith structure than component and > component structure added a level of complexity making it hard to > maintain without adding any real benefit. > > The sensor is one vimc component that would makes sense to be a separate > module to closely align with the real hardware. It would be easier to > collapse vimc into single monolithic driver first and then split the > sensor off as a separate module. > > This patch series emoves the component API and makes minimal changes to > the code base preserving the functional division of the code structure. > Preserving the functional structure allows us to split the sensor off > as a separate module in the future. > > Major design elements in this change are: > - Use existing struct vimc_ent_config and struct vimc_pipeline_config > to drive the initialization of the functional components. > - Make vimc_ent_config global by moving it to vimc.h > - Add two new hooks add and rm to initialize and register, unregister > and free subdevs. > - All component API is now gone and bind and unbind hooks are modified > to do "add" and "rm" with minimal changes to just add and rm subdevs. > - vimc-core's bind and unbind are now register and unregister. > - vimc-core invokes "add" hooks from its vimc_register_devices(). > The "add" hooks remain the same and register subdevs. They don't > create platform devices of their own and use vimc's pdev.dev as > their reference device. The "add" hooks save their vimc_ent_device(s) > in the corresponding vimc_ent_config. > - vimc-core invokes "rm" hooks from its unregister to unregister subdevs > and cleanup. > - vimc-core invokes "add" and "rm" hooks with pointer to struct > vimc_device > and the corresponding struct vimc_ent_config pointer. > > The following configure and stream test works on all devices. > > media-ctl -d platform:vimc -V '"Sensor A":0[fmt:SBGGR8_1X8/640x480]' > media-ctl -d platform:vimc -V '"Debayer A":0[fmt:SBGGR8_1X8/640x480]' > media-ctl -d platform:vimc -V '"Sensor B":0[fmt:SBGGR8_1X8/640x480]' > media-ctl -d platform:vimc -V '"Debayer B":0[fmt:SBGGR8_1X8/640x480]' > > v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v width=1920,height=1440 > v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v pixelformat=BA81 > v4l2-ctl -z platform:vimc -d "Raw Capture 1" -v pixelformat=BA81 > > v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video1 > v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video2 > v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video3 > > The third patch in the series fixes a general protection fault found > when rmmod is done while stream is active. I applied your patch on top of media_tree/master and I did some testing. Not sure if I did something wrong, but just adding and removing the module generated a kernel panic: ~# modprobe vimc ~# rmmod vimc [ 16.452974] stack segment: [#1] SMP PTI [ 16.453688] CPU: 0 PID: 2038 Comm: rmmod Not tainted 5.3.0-rc2+ #36 [ 16.454678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 [ 16.456191] RIP: 0010:kfree+0x4d/0x240 [ 16.469188] Call Trace: [ 16.469666] vimc_remove+0x35/0x90 [vimc] [ 16.470436] platform_drv_remove+0x1f/0x40 [ 16.471233] device_release_driver_internal+0xd3/0x1b0 [ 16.472184] driver_detach+0x37/0x6b [ 16.472882] bus_remove_driver+0x50/0xc1 [ 16.473569] vimc_exit+0xc/0xca0 [vimc] [ 16.474231] __x64_sys_delete_module+0x18d/0x240 [ 16.475036] do_syscall_64+0x43/0x110 [ 16.475656] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 16.476504] RIP: 0033:0x7fceb8dafa4b [ 16.484853] Modules linked in: vimc(-) videobuf2_vmalloc videobuf2_memops v4l2_tpg videobuf2_v4l2 videobuf2_common [ 16.486187] ---[ end trace 91e5e0894e254d49 ]--- [ 16.486758] RIP: 0010:kfree+0x4d/0x240 fish: “rmmod vimc” terminated by signal SIGSEGV (Address boundary error) I just added the module after booting, no other action was made. Here is how my `git log --oneline` looks like: 897d708e922b media: vimc: Fix gpf in rmmod path when stream is active 2e4a5ad8ad6d media: vimc: Collapse component structure into a single monolithic driver 7c8da1687e92 media: vimc: move private defines to a common header 97299a303532 media: Remove dev_err() usage after platform_get_irq() 25a3d6bac6b9 media: adv7511/cobalt: rename driver name to adv7511-v4l2 ... > > vimc_print_dot (--print-dot) topology after this change: > digraph board { > rankdir=TB > n0001 [label="{{} | Sensor A\n/dev/v4l-subdev0 | { 0}}", > shape=Mrecord, style=filled, fillcolor=green] > n0001:port0 -> n0005:port0 [style=bold] > n00
Re: [PATCH 0/3] Collapse vimc into single monolithic driver
On 8/9/19 9:17 PM, Shuah Khan wrote: > Hi Andre, > > On 8/9/19 5:52 PM, André Almeida wrote: >> Hello Shuah, >> >> Thanks for the patch, I did some comments below. >> >> On 8/9/19 6:45 PM, Shuah Khan wrote: >>> vimc uses Component API to split the driver into functional components. >>> The real hardware resembles a monolith structure than component and >>> component structure added a level of complexity making it hard to >>> maintain without adding any real benefit. >>> The sensor is one vimc component that would makes sense to be a >>> separate >>> module to closely align with the real hardware. It would be easier to >>> collapse vimc into single monolithic driver first and then split the >>> sensor off as a separate module. >>> >>> This patch series emoves the component API and makes minimal changes to >>> the code base preserving the functional division of the code structure. >>> Preserving the functional structure allows us to split the sensor off >>> as a separate module in the future. >>> >>> Major design elements in this change are: >>> - Use existing struct vimc_ent_config and struct >>> vimc_pipeline_config >>> to drive the initialization of the functional components. >>> - Make vimc_ent_config global by moving it to vimc.h >>> - Add two new hooks add and rm to initialize and register, >>> unregister >>> and free subdevs. >>> - All component API is now gone and bind and unbind hooks are >>> modified >>> to do "add" and "rm" with minimal changes to just add and rm >>> subdevs. >>> - vimc-core's bind and unbind are now register and unregister. >>> - vimc-core invokes "add" hooks from its vimc_register_devices(). >>> The "add" hooks remain the same and register subdevs. They don't >>> create platform devices of their own and use vimc's pdev.dev as >>> their reference device. The "add" hooks save their >>> vimc_ent_device(s) >>> in the corresponding vimc_ent_config. >>> - vimc-core invokes "rm" hooks from its unregister to unregister >>> subdevs >>> and cleanup. >>> - vimc-core invokes "add" and "rm" hooks with pointer to struct >>> vimc_device >>> and the corresponding struct vimc_ent_config pointer. >>> The following configure and stream test works on all devices. >>> media-ctl -d platform:vimc -V '"Sensor >>> A":0[fmt:SBGGR8_1X8/640x480]' >>> media-ctl -d platform:vimc -V '"Debayer >>> A":0[fmt:SBGGR8_1X8/640x480]' >>> media-ctl -d platform:vimc -V '"Sensor >>> B":0[fmt:SBGGR8_1X8/640x480]' >>> media-ctl -d platform:vimc -V '"Debayer >>> B":0[fmt:SBGGR8_1X8/640x480]' >>> v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v >>> width=1920,height=1440 >>> v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v pixelformat=BA81 >>> v4l2-ctl -z platform:vimc -d "Raw Capture 1" -v pixelformat=BA81 >>> v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video1 >>> v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video2 >>> v4l2-ctl --stream-mmap --stream-count=100 -d /dev/video3 >>> >>> The third patch in the series fixes a general protection fault found >>> when rmmod is done while stream is active. >> >> I applied your patch on top of media_tree/master and I did some testing. >> Not sure if I did something wrong, but just adding and removing the >> module generated a kernel panic: > > Thanks for testing. > > Odd. I tested modprobe and rmmod both.I was working on Linux 5.3-rc2. > I will apply these to media latest and work from there. I have to > rebase these on top of the reverts from Lucas and Helen Ok, please let me know if I succeeded to reproduce. >> >> ~# modprobe vimc >> ~# rmmod vimc >> [ 16.452974] stack segment: [#1] SMP PTI >> [ 16.453688] CPU: 0 PID: 2038 Comm: rmmod Not tainted 5.3.0-rc2+ #36 >> [ 16.454678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >> BIOS 1.12.0-20181126_142135-anatol 04/01/2014 >> [ 16.456191] RIP: 0010:kfree+0x4d/0x240 >> >> >> >> [ 16.469188] Call Trace: >> [ 16.469666] vimc_remove+0x35/0x90
Re: [PATCH 0/7] media: vimc: Add a V4L2 output device
On 7/13/19 7:03 AM, Hans Verkuil wrote: > On 7/12/19 5:38 PM, André Almeida wrote: >> Hello, >> >> On 7/10/19 4:33 AM, Hans Verkuil wrote: >>> On 7/10/19 12:19 AM, Helen Koike wrote: >>>> Hi André, >>>> >>>> Thanks for the patches. >>>> >>>> On 7/2/19 12:47 PM, André Almeida wrote: >>>>> Hello, >>>>> >>>>> This patch adds a V4L2 output device on vimc, that comply with V4L2 API >>>>> for video output. If there is an output device and a capture device at the >>>>> same pipeline, one can get a video loopback pipeline feeding frames at >>>>> the output and then seeing them at the capture. It's possible to insert >>>>> vimc submodules at the pipeline to modify the image (e.g. a scaler). >>>>> >>>>> If one starts a streaming at the capture, with the output off, the >>>>> capture will display a noisy frame. If one starts a streaming at the >>>>> output with the capture off, the output will just consume the buffers, >>>>> without sending them to the pipeline. If both output and capture are >>>>> streaming, the loopback will happen. >>>> I understand why it is done like this in vivid, but I was wondering, if we >>>> have a pipeline like: >>>> output -> capture >>>> Shouldn't streaming from the capture just stalls if there is no frame >>>> available in the output (i.e. streaming in the output is off) ? But then >>>> I'm >>>> not sure what the framerate in the capture would mean. >>>> >>>> Hans, what do you think? >>> If you set up the pipeline like this: >>> >>> Video Output -> Scaler -> Video Capture >> >> If the capture will stall if there's no frame from the video output, how >> can I add support for this kind of pipeline at test-media? It would be >> required to send frames to the output device while running >> `v4l2-compliance` at the capture device to make testing possible. > > The compliance test doesn't support such devices at the moment. The implementation of the expected behavior can be found here: https://gitlab.collabora.com/tonyk/linux/tree/vimc/output-v2 > > I think a new option (or options) are needed to tell the compliance test > that the capture and output video devices together constitute an m2m device. I've reading the v4l-utils code base and I had a look at both m2m tests and capture/output tests, but I'm not sure how to implement this new option. How do you think it should be implemented? Should it resemble how v4l2-compliance tests vim2m? Something like this: v4l2-compliance -m platform:vim2m -z platform:vivid-002 -e vivid-002-vid-cap -s10 -P -a Thanks, André > > Regards, > > Hans > >> >> Thanks, >> André >> >>> Then this is a mem2mem device (except with two separate video devices) and >>> framerate doesn't apply anymore. And video capture will just stall if there >>> is no video output frame provided. >>> >>> It's how e.g. omap3isp works. >>> >>> Regards, >>> >>> Hans >>> >>>> Thanks, >>>> Helen >>>> >>>>> The patches 1 and 2 provide some ground to create the output >>>>> device. The patch 3 creates the device and modify how the vimc-streamer >>>>> was dealing with the s_stream callback on other vimc modules, to make >>>>> simpler implementing this callback at vimc-output. Patch 4 change the >>>>> behavior of the pipeline in order to be closer to a real life hardware. >>>>> Patches 5-7 updates the default pipeline and the documentation to >>>>> include the new output device. >>>>> >>>>> This is the result of v4l2-compliance after this patch series: >>>>> $ v4l2-compliance -m0 -s50 >>>>> Grand Total for vimc device /dev/media0: 476, Succeeded: 476, Failed: 0, >>>>> Warnings: 0 >>>>> >>>>> A git tree up to date with media-master and with this changes can be found >>>>> at: https://gitlab.collabora.com/tonyk/linux/tree/vimc/output >>>>> >>>>> In order to test it, one can follow these instructions: >>>>> >>>>> 1 - Configure the pipeline (requires v4l-utils): >>>>> >>>>> $ media-ctl -d platform:vimc -V '"Sensor A":0[fmt:SBGGR8_1X8/640x480]' >>>>
Re: [PATCH v3 1/2] media: vimc: stream: add missing function documentation
Hello Mauro, On 6/21/19 6:17 PM, Mauro Carvalho Chehab wrote: > Em Mon, 17 Jun 2019 10:32:20 -0300 > André Almeida escreveu: > >> Add comments at vimc_streamer_s_stream and vimc_streamer_thread, making >> the vimc-stream totally documented. > I'm applying it right now. > > Yet, if this is fully documented, IMO you should add it to > Documentation/media/v4l-drivers, replacing the comments to kernel-doc > markups. This suggestion is a great improvement and it's simple to apply to the source. Where do you believe we can place this at[1]? Maybe something like Source code documentation - vimc-streamer .. kernel-doc:: drivers/media/platform/vimc/vimc-streamer.c :internal: at the end of the file? > That would make easier for the ones to read the comments and, if someone > changes a function call, warnings will be produced, and the developer > will be warned. > Thanks, > Mauro Thanks, André [1] https://git.linuxtv.org/media_tree.git/tree/Documentation/media/v4l-drivers/vimc.rst
[PATCH 4/5] media: vimc.rst: Add a proper alt attribute to vimc.dot
According to W3C, "the content of the alt attribute is: use text that fulfills the same function as the image". While it's hard to describe the whole content of this image, replace the actual alt to something more useful to people with slow connection or that uses screen readers. Signed-off-by: André Almeida --- Documentation/media/v4l-drivers/vimc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/media/v4l-drivers/vimc.rst b/Documentation/media/v4l-drivers/vimc.rst index 4628b12d417f..bece85867424 100644 --- a/Documentation/media/v4l-drivers/vimc.rst +++ b/Documentation/media/v4l-drivers/vimc.rst @@ -15,7 +15,7 @@ recompile the driver to achieve your own topology. This is the default topology: .. _vimc_topology_graph: .. kernel-figure:: vimc.dot -:alt: vimc.dot +:alt: Diagram of the default media pipeline topology :align: center Media pipeline graph on vimc -- 2.22.0
[PATCH 1/5] media: vimc: stream: remove obsolete function doc
As a more complete version of vimc_streamer_s_streamer comment was added at "media: vimc: stream: add missing function documentation" commit in .c file, remove the old documentation from .h file. Signed-off-by: André Almeida --- drivers/media/platform/vimc/vimc-streamer.h | 8 1 file changed, 8 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-streamer.h b/drivers/media/platform/vimc/vimc-streamer.h index 2b3667408794..28c3706e3c21 100644 --- a/drivers/media/platform/vimc/vimc-streamer.h +++ b/drivers/media/platform/vimc/vimc-streamer.h @@ -43,14 +43,6 @@ struct vimc_stream { u32 producer_pixfmt; }; -/** - * vimc_streamer_s_streamer - start/stop the stream - * - * @stream:the pointer to the stream to start or stop - * @ved: The last entity of the streamer pipeline - * @enable:any non-zero number start the stream, zero stop - * - */ int vimc_streamer_s_stream(struct vimc_stream *stream, struct vimc_ent_device *ved, int enable); -- 2.22.0
[PATCH 5/5] media: vimc.rst: add vimc-streamer source documentation
Since vimc-streamer.{c, h} are fully documented and conforming with the kernel-doc syntax, add those files to vimc.rst Signed-off-by: André Almeida Suggested-by: Mauro Carvalho Chehab --- Documentation/media/v4l-drivers/vimc.rst | 11 +++ 1 file changed, 11 insertions(+) diff --git a/Documentation/media/v4l-drivers/vimc.rst b/Documentation/media/v4l-drivers/vimc.rst index bece85867424..406417680db5 100644 --- a/Documentation/media/v4l-drivers/vimc.rst +++ b/Documentation/media/v4l-drivers/vimc.rst @@ -96,3 +96,14 @@ those arguments to each subdevice, not to the vimc module. For example:: Window size to calculate the mean. Note: the window size needs to be an odd number, as the main pixel stays in the center of the window, otherwise the next odd number is considered (the default value is 3). + +Source code documentation +- + +vimc-streamer +~ + +.. kernel-doc:: drivers/media/platform/vimc/vimc-streamer.h + :internal: + +.. kernel-doc:: drivers/media/platform/vimc/vimc-streamer.c -- 2.22.0
[PATCH 3/5] media: vimc: stream: format comments as kernel-doc
Format the current existing comments as kernel-doc comments, to be reused at kernel documention. Add opening marks (/**) and return values. Signed-off-by: André Almeida --- drivers/media/platform/vimc/vimc-streamer.c | 38 + 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-streamer.c b/drivers/media/platform/vimc/vimc-streamer.c index 3b3f36357a0e..9970650b0f26 100644 --- a/drivers/media/platform/vimc/vimc-streamer.c +++ b/drivers/media/platform/vimc/vimc-streamer.c @@ -20,6 +20,8 @@ * * Helper function that returns the media entity containing the source pad * linked with the first sink pad from the given media entity pad list. + * + * Return: The source pad or NULL, if it wasn't found. */ static struct media_entity *vimc_get_source_entity(struct media_entity *ent) { @@ -35,7 +37,7 @@ static struct media_entity *vimc_get_source_entity(struct media_entity *ent) return NULL; } -/* +/** * vimc_streamer_pipeline_terminate - Disable stream in all ved in stream * * @stream: the pointer to the stream structure with the pipeline to be @@ -63,15 +65,18 @@ static void vimc_streamer_pipeline_terminate(struct vimc_stream *stream) } } -/* - * vimc_streamer_pipeline_init - initializes the stream structure +/** + * vimc_streamer_pipeline_init - Initializes the stream structure * * @stream: the pointer to the stream structure to be initialized * @ved:the pointer to the vimc entity initializing the stream * * Initializes the stream structure. Walks through the entity graph to * construct the pipeline used later on the streamer thread. - * Calls s_stream to enable stream in all entities of the pipeline. + * Calls ``vimc_streamer_s_stream`` to enable stream in all entities of + * the pipeline. + * + * Return: 0 if success, error code otherwise. */ static int vimc_streamer_pipeline_init(struct vimc_stream *stream, struct vimc_ent_device *ved) @@ -122,13 +127,17 @@ static int vimc_streamer_pipeline_init(struct vimc_stream *stream, return -EINVAL; } -/* - * vimc_streamer_thread - process frames through the pipeline +/** + * vimc_streamer_thread - Process frames through the pipeline * * @data: vimc_stream struct of the current stream * * From the source to the sink, gets a frame from each subdevice and send to * the next one of the pipeline at a fixed framerate. + * + * Return: + * Always zero (created as ``int`` instead of ``void`` to comply with + * kthread API). */ static int vimc_streamer_thread(void *data) { @@ -157,19 +166,20 @@ static int vimc_streamer_thread(void *data) return 0; } -/* - * vimc_streamer_s_stream - start/stop the streaming on the media pipeline +/** + * vimc_streamer_s_stream - Start/stop the streaming on the media pipeline * * @stream:the pointer to the stream structure of the current stream * @ved: pointer to the vimc entity of the entity of the stream * @enable:flag to determine if stream should start/stop * - * When starting, check if there is no stream->kthread allocated. This should - * indicate that a stream is already running. Then, it initializes - * the pipeline, creates and runs a kthread to consume buffers through the - * pipeline. - * When stopping, analogously check if there is a stream running, stop - * the thread and terminates the pipeline. + * When starting, check if there is no ``stream->kthread`` allocated. This + * should indicate that a stream is already running. Then, it initializes the + * pipeline, creates and runs a kthread to consume buffers through the pipeline. + * When stopping, analogously check if there is a stream running, stop the + * thread and terminates the pipeline. + * + * Return: 0 if success, error code otherwise. */ int vimc_streamer_s_stream(struct vimc_stream *stream, struct vimc_ent_device *ved, -- 2.22.0
[PATCH 2/5] media: vimc: stream: fix style of argument description
As in "Function parameters" at doc-guide/kernel-doc.rst, "the continuation of the description should start at the same column as the previous line". Make the @producer_pixfmt comply with that. Signed-off-by: André Almeida --- drivers/media/platform/vimc/vimc-streamer.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-streamer.h b/drivers/media/platform/vimc/vimc-streamer.h index 28c3706e3c21..d744a787e0e7 100644 --- a/drivers/media/platform/vimc/vimc-streamer.h +++ b/drivers/media/platform/vimc/vimc-streamer.h @@ -26,10 +26,12 @@ * @pipe_size: size of @ved_pipeline * @kthread: thread that generates the frames of the stream. * @producer_pixfmt: the pixel format requested from the pipeline. This must - * be set just before calling vimc_streamer_s_stream(ent, 1). This value is - * propagated up to the source of the base image (usually a sensor node) and - * can be modified by entities during s_stream callback to request a different - * format from rest of the pipeline. + * be set just before calling + * vimc_streamer_s_stream(ent, 1). This value is propagated + * up to the source of the base image (usually a sensor + * node) and can be modified by entities during s_stream + * callback to request a differentformat from rest of + * the pipeline. * * When the user call stream_on in a video device, struct vimc_stream is * used to keep track of all entities and subdevices that generates and -- 2.22.0
Re: [PATCH 3/5] media: vimc: stream: format comments as kernel-doc
On 6/23/19 1:40 PM, André Almeida wrote: > Format the current existing comments as kernel-doc comments, to be > reused at kernel documention. Add opening marks (/**) and return values. > > Signed-off-by: André Almeida > --- > drivers/media/platform/vimc/vimc-streamer.c | 38 + > 1 file changed, 24 insertions(+), 14 deletions(-) > > diff --git a/drivers/media/platform/vimc/vimc-streamer.c > b/drivers/media/platform/vimc/vimc-streamer.c > index 3b3f36357a0e..9970650b0f26 100644 > --- a/drivers/media/platform/vimc/vimc-streamer.c > +++ b/drivers/media/platform/vimc/vimc-streamer.c > @@ -20,6 +20,8 @@ > * > * Helper function that returns the media entity containing the source pad > * linked with the first sink pad from the given media entity pad list. > + * > + * Return: The source pad or NULL, if it wasn't found. > */ > static struct media_entity *vimc_get_source_entity(struct media_entity *ent) > { > @@ -35,7 +37,7 @@ static struct media_entity *vimc_get_source_entity(struct > media_entity *ent) > return NULL; > } > > -/* > +/** > * vimc_streamer_pipeline_terminate - Disable stream in all ved in stream > * > * @stream: the pointer to the stream structure with the pipeline to be > @@ -63,15 +65,18 @@ static void vimc_streamer_pipeline_terminate(struct > vimc_stream *stream) > } > } > > -/* > - * vimc_streamer_pipeline_init - initializes the stream structure > +/** > + * vimc_streamer_pipeline_init - Initializes the stream structure > * > * @stream: the pointer to the stream structure to be initialized > * @ved:the pointer to the vimc entity initializing the stream > * > * Initializes the stream structure. Walks through the entity graph to > * construct the pipeline used later on the streamer thread. > - * Calls s_stream to enable stream in all entities of the pipeline. > + * Calls ``vimc_streamer_s_stream`` to enable stream in all entities of ``vimc_streamer_s_stream`` could also been written as :c:func:`vimc_streamer_s_stream`. In this latest setup, the Documentation output would display a nice hyperlink to the documentation of vimc_streamer_s_stream function. Is this a good improvement or it will be too verbose? > + * the pipeline. > + * > + * Return: 0 if success, error code otherwise. > */ > static int vimc_streamer_pipeline_init(struct vimc_stream *stream, > struct vimc_ent_device *ved) > @@ -122,13 +127,17 @@ static int vimc_streamer_pipeline_init(struct > vimc_stream *stream, > return -EINVAL; > } > > -/* > - * vimc_streamer_thread - process frames through the pipeline > +/** > + * vimc_streamer_thread - Process frames through the pipeline > * > * @data:vimc_stream struct of the current stream > * > * From the source to the sink, gets a frame from each subdevice and send to > * the next one of the pipeline at a fixed framerate. > + * > + * Return: > + * Always zero (created as ``int`` instead of ``void`` to comply with > + * kthread API). > */ > static int vimc_streamer_thread(void *data) > { > @@ -157,19 +166,20 @@ static int vimc_streamer_thread(void *data) > return 0; > } > > -/* > - * vimc_streamer_s_stream - start/stop the streaming on the media pipeline > +/** > + * vimc_streamer_s_stream - Start/stop the streaming on the media pipeline > * > * @stream: the pointer to the stream structure of the current stream > * @ved: pointer to the vimc entity of the entity of the stream > * @enable: flag to determine if stream should start/stop > * > - * When starting, check if there is no stream->kthread allocated. This should > - * indicate that a stream is already running. Then, it initializes > - * the pipeline, creates and runs a kthread to consume buffers through the > - * pipeline. > - * When stopping, analogously check if there is a stream running, stop > - * the thread and terminates the pipeline. > + * When starting, check if there is no ``stream->kthread`` allocated. This > + * should indicate that a stream is already running. Then, it initializes the > + * pipeline, creates and runs a kthread to consume buffers through the > pipeline. > + * When stopping, analogously check if there is a stream running, stop the > + * thread and terminates the pipeline. > + * > + * Return: 0 if success, error code otherwise. > */ > int vimc_streamer_s_stream(struct vimc_stream *stream, > struct vimc_ent_device *ved,
Re: [PATCH 3/5] media: vimc: stream: format comments as kernel-doc
On 6/24/19 6:40 AM, Mauro Carvalho Chehab wrote: > Em Sun, 23 Jun 2019 18:27:22 -0300 > André Almeida escreveu: > >> On 6/23/19 1:40 PM, André Almeida wrote: >>> - * Calls s_stream to enable stream in all entities of the pipeline. >>> + * Calls ``vimc_streamer_s_stream`` to enable stream in all entities of >> ``vimc_streamer_s_stream`` could also been written as >> :c:func:`vimc_streamer_s_stream`. In this latest setup, the >> Documentation output would display a nice hyperlink to the documentation >> of vimc_streamer_s_stream function. Is this a good improvement or it >> will be too verbose? > The best would be to use: vimc_streamer_s_stream(). Kernel-doc already > handles it (don't remember if it uses :c:func:, but I guess it does), > and this is the recommended way. Just tested here, and worked fine. I'll send a v2 using this style. > > Anyway, there's a patch under discussion right now at Linux docs ML that > will auto-replace these to :c:func`` automatically, not only on kernel-doc > tags, but also within the .rst files. It should be able to recognize > existing :c:func: tags, so no harm done if it is there somewhere. > > Thanks, > Mauro Thanks, André
[PATCH v2 1/5] media: vimc: stream: remove obsolete function doc
As a more complete version of vimc_streamer_s_streamer comment was added at "media: vimc: stream: add missing function documentation" commit in .c file, remove the old documentation from .h file. Signed-off-by: André Almeida --- Changes in v2: none drivers/media/platform/vimc/vimc-streamer.h | 8 1 file changed, 8 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-streamer.h b/drivers/media/platform/vimc/vimc-streamer.h index 2b3667408794..28c3706e3c21 100644 --- a/drivers/media/platform/vimc/vimc-streamer.h +++ b/drivers/media/platform/vimc/vimc-streamer.h @@ -43,14 +43,6 @@ struct vimc_stream { u32 producer_pixfmt; }; -/** - * vimc_streamer_s_streamer - start/stop the stream - * - * @stream:the pointer to the stream to start or stop - * @ved: The last entity of the streamer pipeline - * @enable:any non-zero number start the stream, zero stop - * - */ int vimc_streamer_s_stream(struct vimc_stream *stream, struct vimc_ent_device *ved, int enable); -- 2.22.0
[PATCH v2 2/5] media: vimc: stream: fix style of argument description
As in "Function parameters" at doc-guide/kernel-doc.rst, "the continuation of the description should start at the same column as the previous line". Make the @producer_pixfmt comply with that. Signed-off-by: André Almeida --- Changes in v2: none drivers/media/platform/vimc/vimc-streamer.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-streamer.h b/drivers/media/platform/vimc/vimc-streamer.h index 28c3706e3c21..d744a787e0e7 100644 --- a/drivers/media/platform/vimc/vimc-streamer.h +++ b/drivers/media/platform/vimc/vimc-streamer.h @@ -26,10 +26,12 @@ * @pipe_size: size of @ved_pipeline * @kthread: thread that generates the frames of the stream. * @producer_pixfmt: the pixel format requested from the pipeline. This must - * be set just before calling vimc_streamer_s_stream(ent, 1). This value is - * propagated up to the source of the base image (usually a sensor node) and - * can be modified by entities during s_stream callback to request a different - * format from rest of the pipeline. + * be set just before calling + * vimc_streamer_s_stream(ent, 1). This value is propagated + * up to the source of the base image (usually a sensor + * node) and can be modified by entities during s_stream + * callback to request a differentformat from rest of + * the pipeline. * * When the user call stream_on in a video device, struct vimc_stream is * used to keep track of all entities and subdevices that generates and -- 2.22.0
[PATCH v2 5/5] media: vimc.rst: add vimc-streamer source documentation
Since vimc-streamer.{c, h} are fully documented and conforming with the kernel-doc syntax, add those files to vimc.rst Signed-off-by: André Almeida Suggested-by: Mauro Carvalho Chehab --- Changes in v2: none Documentation/media/v4l-drivers/vimc.rst | 11 +++ 1 file changed, 11 insertions(+) diff --git a/Documentation/media/v4l-drivers/vimc.rst b/Documentation/media/v4l-drivers/vimc.rst index bece85867424..406417680db5 100644 --- a/Documentation/media/v4l-drivers/vimc.rst +++ b/Documentation/media/v4l-drivers/vimc.rst @@ -96,3 +96,14 @@ those arguments to each subdevice, not to the vimc module. For example:: Window size to calculate the mean. Note: the window size needs to be an odd number, as the main pixel stays in the center of the window, otherwise the next odd number is considered (the default value is 3). + +Source code documentation +- + +vimc-streamer +~ + +.. kernel-doc:: drivers/media/platform/vimc/vimc-streamer.h + :internal: + +.. kernel-doc:: drivers/media/platform/vimc/vimc-streamer.c -- 2.22.0
[PATCH v2 4/5] media: vimc.rst: Add a proper alt attribute to vimc.dot
According to W3C, "the content of the alt attribute is: use text that fulfills the same function as the image". While it's hard to describe the whole content of this image, replace the actual alt to something more useful to people with slow connection or that uses screen readers. Signed-off-by: André Almeida --- Changes in v2: none Documentation/media/v4l-drivers/vimc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/media/v4l-drivers/vimc.rst b/Documentation/media/v4l-drivers/vimc.rst index 4628b12d417f..bece85867424 100644 --- a/Documentation/media/v4l-drivers/vimc.rst +++ b/Documentation/media/v4l-drivers/vimc.rst @@ -15,7 +15,7 @@ recompile the driver to achieve your own topology. This is the default topology: .. _vimc_topology_graph: .. kernel-figure:: vimc.dot -:alt: vimc.dot +:alt: Diagram of the default media pipeline topology :align: center Media pipeline graph on vimc -- 2.22.0
[PATCH v2 3/5] media: vimc: stream: format comments as kernel-doc
Format the current existing comments as kernel-doc comments, to be reused at kernel documention. Add opening marks (/**) and return values. Signed-off-by: André Almeida --- Changes in v2: replace ``vimc_streamer_s_stream`` for vimc_streamer_s_stream(). drivers/media/platform/vimc/vimc-streamer.c | 38 + 1 file changed, 24 insertions(+), 14 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-streamer.c b/drivers/media/platform/vimc/vimc-streamer.c index 3b3f36357a0e..62dde7d74c24 100644 --- a/drivers/media/platform/vimc/vimc-streamer.c +++ b/drivers/media/platform/vimc/vimc-streamer.c @@ -20,6 +20,8 @@ * * Helper function that returns the media entity containing the source pad * linked with the first sink pad from the given media entity pad list. + * + * Return: The source pad or NULL, if it wasn't found. */ static struct media_entity *vimc_get_source_entity(struct media_entity *ent) { @@ -35,7 +37,7 @@ static struct media_entity *vimc_get_source_entity(struct media_entity *ent) return NULL; } -/* +/** * vimc_streamer_pipeline_terminate - Disable stream in all ved in stream * * @stream: the pointer to the stream structure with the pipeline to be @@ -63,15 +65,18 @@ static void vimc_streamer_pipeline_terminate(struct vimc_stream *stream) } } -/* - * vimc_streamer_pipeline_init - initializes the stream structure +/** + * vimc_streamer_pipeline_init - Initializes the stream structure * * @stream: the pointer to the stream structure to be initialized * @ved:the pointer to the vimc entity initializing the stream * * Initializes the stream structure. Walks through the entity graph to * construct the pipeline used later on the streamer thread. - * Calls s_stream to enable stream in all entities of the pipeline. + * Calls vimc_streamer_s_stream() to enable stream in all entities of + * the pipeline. + * + * Return: 0 if success, error code otherwise. */ static int vimc_streamer_pipeline_init(struct vimc_stream *stream, struct vimc_ent_device *ved) @@ -122,13 +127,17 @@ static int vimc_streamer_pipeline_init(struct vimc_stream *stream, return -EINVAL; } -/* - * vimc_streamer_thread - process frames through the pipeline +/** + * vimc_streamer_thread - Process frames through the pipeline * * @data: vimc_stream struct of the current stream * * From the source to the sink, gets a frame from each subdevice and send to * the next one of the pipeline at a fixed framerate. + * + * Return: + * Always zero (created as ``int`` instead of ``void`` to comply with + * kthread API). */ static int vimc_streamer_thread(void *data) { @@ -157,19 +166,20 @@ static int vimc_streamer_thread(void *data) return 0; } -/* - * vimc_streamer_s_stream - start/stop the streaming on the media pipeline +/** + * vimc_streamer_s_stream - Start/stop the streaming on the media pipeline * * @stream:the pointer to the stream structure of the current stream * @ved: pointer to the vimc entity of the entity of the stream * @enable:flag to determine if stream should start/stop * - * When starting, check if there is no stream->kthread allocated. This should - * indicate that a stream is already running. Then, it initializes - * the pipeline, creates and runs a kthread to consume buffers through the - * pipeline. - * When stopping, analogously check if there is a stream running, stop - * the thread and terminates the pipeline. + * When starting, check if there is no ``stream->kthread`` allocated. This + * should indicate that a stream is already running. Then, it initializes the + * pipeline, creates and runs a kthread to consume buffers through the pipeline. + * When stopping, analogously check if there is a stream running, stop the + * thread and terminates the pipeline. + * + * Return: 0 if success, error code otherwise. */ int vimc_streamer_s_stream(struct vimc_stream *stream, struct vimc_ent_device *ved, -- 2.22.0
[PATCH v3 1/2] media: vimc: stream: add missing function documentation
Add comments at vimc_streamer_s_stream and vimc_streamer_thread, making the vimc-stream totally documented. Signed-off-by: André Almeida --- Changes in v3: replace "streaming" by "stream" at vimc_streamer_thread(). Changes in v2: fix typos drivers/media/platform/vimc/vimc-streamer.c | 22 + 1 file changed, 22 insertions(+) diff --git a/drivers/media/platform/vimc/vimc-streamer.c b/drivers/media/platform/vimc/vimc-streamer.c index 236ade38f1da..1fea6d666c2e 100644 --- a/drivers/media/platform/vimc/vimc-streamer.c +++ b/drivers/media/platform/vimc/vimc-streamer.c @@ -122,6 +122,14 @@ static int vimc_streamer_pipeline_init(struct vimc_stream *stream, return -EINVAL; } +/* + * vimc_streamer_thread - process frames through the pipeline + * + * @data: vimc_stream struct of the current stream + * + * From the source to the sink, gets a frame from each subdevice and send to + * the next one of the pipeline in a fixed framerate. + */ static int vimc_streamer_thread(void *data) { struct vimc_stream *stream = data; @@ -149,6 +157,20 @@ static int vimc_streamer_thread(void *data) return 0; } +/* + * vimc_streamer_s_stream - start/stop the streaming on the media pipeline + * + * @stream:the pointer to the stream structure of the current stream + * @ved: pointer to the vimc entity of the entity of the stream + * @enable:flag to determine if stream should start/stop + * + * When starting, check if there is no stream->kthread allocated. This should + * indicate that a streaming is already running. Then, it initializes + * the pipeline, creates and runs a kthread to consume buffers through the + * pipeline. + * When stopping, analogously check if there is a stream running, stops + * the thread and terminates the pipeline. + */ int vimc_streamer_s_stream(struct vimc_stream *stream, struct vimc_ent_device *ved, int enable) -- 2.22.0
[PATCH v3 2/2] media: docs: create vimc documentation
Create vimc documentation file to explain it basics features, it's topology, how to configure it and to document vimc's subdevices. Signed-off-by: André Almeida Suggested-by: Helen Koike --- Changes in v3: none Changes in v2: - Fix typos - Make clear what does means scale factor Documentation/media/v4l-drivers/index.rst | 1 + Documentation/media/v4l-drivers/vimc.dot | 22 + Documentation/media/v4l-drivers/vimc.rst | 98 +++ 3 files changed, 121 insertions(+) create mode 100644 Documentation/media/v4l-drivers/vimc.dot create mode 100644 Documentation/media/v4l-drivers/vimc.rst diff --git a/Documentation/media/v4l-drivers/index.rst b/Documentation/media/v4l-drivers/index.rst index 33a055907258..c4c78a28654c 100644 --- a/Documentation/media/v4l-drivers/index.rst +++ b/Documentation/media/v4l-drivers/index.rst @@ -64,5 +64,6 @@ For more details see the file COPYING in the source distribution of Linux. si476x soc-camera uvcvideo + vimc vivid zr364xx diff --git a/Documentation/media/v4l-drivers/vimc.dot b/Documentation/media/v4l-drivers/vimc.dot new file mode 100644 index ..57863a13fa39 --- /dev/null +++ b/Documentation/media/v4l-drivers/vimc.dot @@ -0,0 +1,22 @@ +# SPDX-License-Identifier: GPL-2.0 + +digraph board { + rankdir=TB + n0001 [label="{{} | Sensor A\n/dev/v4l-subdev0 | { 0}}", shape=Mrecord, style=filled, fillcolor=green] + n0001:port0 -> n0005:port0 [style=bold] + n0001:port0 -> n000b [style=bold] + n0003 [label="{{} | Sensor B\n/dev/v4l-subdev1 | { 0}}", shape=Mrecord, style=filled, fillcolor=green] + n0003:port0 -> n0008:port0 [style=bold] + n0003:port0 -> n000f [style=bold] + n0005 [label="{{ 0} | Debayer A\n/dev/v4l-subdev2 | { 1}}", shape=Mrecord, style=filled, fillcolor=green] + n0005:port1 -> n0017:port0 + n0008 [label="{{ 0} | Debayer B\n/dev/v4l-subdev3 | { 1}}", shape=Mrecord, style=filled, fillcolor=green] + n0008:port1 -> n0017:port0 [style=dashed] + n000b [label="Raw Capture 0\n/dev/video0", shape=box, style=filled, fillcolor=yellow] + n000f [label="Raw Capture 1\n/dev/video1", shape=box, style=filled, fillcolor=yellow] + n0013 [label="RGB/YUV Input\n/dev/video2", shape=box, style=filled, fillcolor=yellow] + n0013 -> n0017:port0 [style=dashed] + n0017 [label="{{ 0} | Scaler\n/dev/v4l-subdev4 | { 1}}", shape=Mrecord, style=filled, fillcolor=green] + n0017:port1 -> n001a [style=bold] + n001a [label="RGB/YUV Capture\n/dev/video3", shape=box, style=filled, fillcolor=yellow] +} diff --git a/Documentation/media/v4l-drivers/vimc.rst b/Documentation/media/v4l-drivers/vimc.rst new file mode 100644 index ..e235f806e252 --- /dev/null +++ b/Documentation/media/v4l-drivers/vimc.rst @@ -0,0 +1,98 @@ +.. SPDX-License-Identifier: GPL-2.0 + +The Virtual Media Controller Driver (vimc) +== + +The vimc driver emulates complex video hardware using the V4L2 API and the Media +API. It has a capture device and three subdevices: sensor, debayer and scaler. + +Topology + + +The topology is hardcoded, although you could modify it in vimc-core and +recompile the driver to achieve your own topology. This is the default topology: + +.. _vimc_topology_graph: + +.. kernel-figure:: vimc.dot +:alt: vimc.dot +:align: center + +Media pipeline graph on vimc + +Configuring the topology + + +Each subdevice will come with its default configuration (pixelformat, height, +width, ...). One needs to configure the topology in order to match the +configuration on each linked subdevice to stream frames through the pipeline. +If the configuration doesn't match, the stream will fail. The ``v4l2-utils`` +is a bundle of user-space applications, that comes with ``media-ctl`` and +``v4l2-ctl`` that can be used to configure the vimc configuration. This sequence +of commands fits for the default topology: + +.. code-block:: bash + +media-ctl -d platform:vimc -V '"Sensor A":0[fmt:SBGGR8_1X8/640x480]' +media-ctl -d platform:vimc -V '"Debayer A":0[fmt:SBGGR8_1X8/640x480]' +media-ctl -d platform:vimc -V '"Sensor B":0[fmt:SBGGR8_1X8/640x480]' +media-ctl -d platform:vimc -V '"Debayer B":0[fmt:SBGGR8_1X8/640x480]' +v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v width=1920,height=1440 +v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v pixelformat=BA81 +v4l2-ctl -z platform:vimc -d "Raw Capture 1" -v pixelformat=BA81 + +Subdevices +-- + +Sub
[PATCH] media: vimc: cap: check v4l2_fill_pixfmt return value
v4l2_fill_pixfmt() returns -EINVAL if the pixelformat used as parameter is invalid or if the user is trying to use a multiplanar format with the singleplanar API. Currently, the vimc_cap_try_fmt_vid_cap() returns such value, but vimc_cap_s_fmt_vid_cap() is ignoring it. Fix that and returns an error value if vimc_cap_try_fmt_vid_cap() has failed. Signed-off-by: André Almeida Suggested-by: Helen Koike --- Hello, This commit was suggest by Helen at "[v3,05/14] media: vimc: cap: refactor singleplanar as a subset of multiplanar" drivers/media/platform/vimc/vimc-capture.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/media/platform/vimc/vimc-capture.c b/drivers/media/platform/vimc/vimc-capture.c index 946dc0908566..664855708fdf 100644 --- a/drivers/media/platform/vimc/vimc-capture.c +++ b/drivers/media/platform/vimc/vimc-capture.c @@ -142,12 +142,15 @@ static int vimc_cap_s_fmt_vid_cap(struct file *file, void *priv, struct v4l2_format *f) { struct vimc_cap_device *vcap = video_drvdata(file); + int ret; /* Do not change the format while stream is on */ if (vb2_is_busy(&vcap->queue)) return -EBUSY; - vimc_cap_try_fmt_vid_cap(file, priv, f); + ret = vimc_cap_try_fmt_vid_cap(file, priv, f); + if (ret) + return ret; dev_dbg(vcap->dev, "%s: format update: " "old:%dx%d (0x%x, %d, %d, %d, %d) " -- 2.22.0
[PATCH 0/7] media: vimc: Add a V4L2 output device
Hello, This patch adds a V4L2 output device on vimc, that comply with V4L2 API for video output. If there is an output device and a capture device at the same pipeline, one can get a video loopback pipeline feeding frames at the output and then seeing them at the capture. It's possible to insert vimc submodules at the pipeline to modify the image (e.g. a scaler). If one starts a streaming at the capture, with the output off, the capture will display a noisy frame. If one starts a streaming at the output with the capture off, the output will just consume the buffers, without sending them to the pipeline. If both output and capture are streaming, the loopback will happen. The patches 1 and 2 provide some ground to create the output device. The patch 3 creates the device and modify how the vimc-streamer was dealing with the s_stream callback on other vimc modules, to make simpler implementing this callback at vimc-output. Patch 4 change the behavior of the pipeline in order to be closer to a real life hardware. Patches 5-7 updates the default pipeline and the documentation to include the new output device. This is the result of v4l2-compliance after this patch series: $ v4l2-compliance -m0 -s50 Grand Total for vimc device /dev/media0: 476, Succeeded: 476, Failed: 0, Warnings: 0 A git tree up to date with media-master and with this changes can be found at: https://gitlab.collabora.com/tonyk/linux/tree/vimc/output In order to test it, one can follow these instructions: 1 - Configure the pipeline (requires v4l-utils): $ media-ctl -d platform:vimc -V '"Sensor A":0[fmt:SBGGR8_1X8/640x480]' $ media-ctl -d platform:vimc -V '"Debayer A":0[fmt:SBGGR8_1X8/640x480]' $ media-ctl -d platform:vimc -V '"Sensor B":0[fmt:SBGGR8_1X8/640x480]' $ media-ctl -d platform:vimc -V '"Debayer B":0[fmt:SBGGR8_1X8/640x480]' $ v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v width=1920,height=1440 $ v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v pixelformat=BA81 $ v4l2-ctl -z platform:vimc -d "Raw Capture 1" -v pixelformat=BA81 $ v4l2-ctl -z platform:vimc -e "RGB/YUV Input" -v width=640,height=480 2 - Use a userspace application: 2.a gst-launch (requires gstreamer and gst-plugins-good): Feed frames into the output and grab from the capture (rescaled for convenience): $ gst-launch-1.0 videotestsrc pattern=ball ! \ video/x-raw,width=640,height=480,format=RGB \ ! v4l2sink device=/dev/video2 v4l2src device=/dev/video3 ! \ video/x-raw,width=1920,height=1440,format=RGB ! videoscale ! \ video/x-raw,width=640,height=480 ! videoconvert ! ximagesink 2.b qv4l2 (requires v4l-utils): Open the output device: $ qv4l2 -d2 Open the capture device: $ qv4l2 -d3 Start the streaming at both, at any order. You can change the frame content at "Test Pattern Generator" -> "Test Pattern" on the output. Thanks, André André Almeida (7): media: vimc: Create video module media: vimc: video: Add write file operation media: vimc: Create a V4L2 output device media: vimc: Send null buffer through the pipeline media: vimc: core: Add output device on the pipeline media: vimc.dot: Update default topology diagram media: vimc.rst: Add output device Documentation/media/v4l-drivers/vimc.dot| 4 +- Documentation/media/v4l-drivers/vimc.rst| 12 +- drivers/media/platform/vimc/Makefile| 4 +- drivers/media/platform/vimc/vimc-capture.c | 356 +++ drivers/media/platform/vimc/vimc-common.h | 5 +- drivers/media/platform/vimc/vimc-core.c | 7 +- drivers/media/platform/vimc/vimc-debayer.c | 14 +- drivers/media/platform/vimc/vimc-output.c | 362 drivers/media/platform/vimc/vimc-scaler.c | 13 +- drivers/media/platform/vimc/vimc-sensor.c | 10 +- drivers/media/platform/vimc/vimc-streamer.c | 24 +- drivers/media/platform/vimc/vimc-video.c| 273 +++ drivers/media/platform/vimc/vimc-video.h| 130 +++ 13 files changed, 849 insertions(+), 365 deletions(-) create mode 100644 drivers/media/platform/vimc/vimc-output.c create mode 100644 drivers/media/platform/vimc/vimc-video.c create mode 100644 drivers/media/platform/vimc/vimc-video.h -- 2.22.0
[PATCH 1/7] media: vimc: Create video module
V4L2 video capture and video output devices shares a lot of common code. To enhance code reuse with future video output, split vimc-capture.c in three files: vimc-capture.c and vimc-video.{c,h}. Keep strict capture related functions on vimc-capture.c. This change is meant to the future addition of a video output device in vimc, as it will make easier for code reuse and simplicity. Signed-off-by: André Almeida --- drivers/media/platform/vimc/Makefile | 2 +- drivers/media/platform/vimc/vimc-capture.c | 341 ++--- drivers/media/platform/vimc/vimc-video.c | 264 drivers/media/platform/vimc/vimc-video.h | 127 4 files changed, 417 insertions(+), 317 deletions(-) create mode 100644 drivers/media/platform/vimc/vimc-video.c create mode 100644 drivers/media/platform/vimc/vimc-video.h diff --git a/drivers/media/platform/vimc/Makefile b/drivers/media/platform/vimc/Makefile index 96d06f030c31..fb90aa0f33a5 100644 --- a/drivers/media/platform/vimc/Makefile +++ b/drivers/media/platform/vimc/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 vimc-y := vimc-core.o vimc-common.o vimc-streamer.o -obj-$(CONFIG_VIDEO_VIMC) += vimc.o vimc-capture.o vimc-debayer.o \ +obj-$(CONFIG_VIDEO_VIMC) += vimc.o vimc-video.o vimc-capture.o vimc-debayer.o \ vimc-scaler.o vimc-sensor.o diff --git a/drivers/media/platform/vimc/vimc-capture.c b/drivers/media/platform/vimc/vimc-capture.c index 664855708fdf..e80fa1ee3dc1 100644 --- a/drivers/media/platform/vimc/vimc-capture.c +++ b/drivers/media/platform/vimc/vimc-capture.c @@ -5,231 +5,18 @@ * Copyright (C) 2015-2017 Helen Koike */ -#include -#include -#include -#include -#include -#include -#include - -#include "vimc-common.h" -#include "vimc-streamer.h" +#include "vimc-video.h" #define VIMC_CAP_DRV_NAME "vimc-capture" -static const u32 vimc_cap_supported_pixfmt[] = { - V4L2_PIX_FMT_BGR24, - V4L2_PIX_FMT_RGB24, - V4L2_PIX_FMT_ARGB32, - V4L2_PIX_FMT_SBGGR8, - V4L2_PIX_FMT_SGBRG8, - V4L2_PIX_FMT_SGRBG8, - V4L2_PIX_FMT_SRGGB8, - V4L2_PIX_FMT_SBGGR10, - V4L2_PIX_FMT_SGBRG10, - V4L2_PIX_FMT_SGRBG10, - V4L2_PIX_FMT_SRGGB10, - V4L2_PIX_FMT_SBGGR10ALAW8, - V4L2_PIX_FMT_SGBRG10ALAW8, - V4L2_PIX_FMT_SGRBG10ALAW8, - V4L2_PIX_FMT_SRGGB10ALAW8, - V4L2_PIX_FMT_SBGGR10DPCM8, - V4L2_PIX_FMT_SGBRG10DPCM8, - V4L2_PIX_FMT_SGRBG10DPCM8, - V4L2_PIX_FMT_SRGGB10DPCM8, - V4L2_PIX_FMT_SBGGR12, - V4L2_PIX_FMT_SGBRG12, - V4L2_PIX_FMT_SGRBG12, - V4L2_PIX_FMT_SRGGB12, -}; - -struct vimc_cap_device { - struct vimc_ent_device ved; - struct video_device vdev; - struct device *dev; - struct v4l2_pix_format format; - struct vb2_queue queue; - struct list_head buf_list; - /* -* NOTE: in a real driver, a spin lock must be used to access the -* queue because the frames are generated from a hardware interruption -* and the isr is not allowed to sleep. -* Even if it is not necessary a spinlock in the vimc driver, we -* use it here as a code reference -*/ - spinlock_t qlock; - struct mutex lock; - u32 sequence; - struct vimc_stream stream; -}; - -static const struct v4l2_pix_format fmt_default = { - .width = 640, - .height = 480, - .pixelformat = V4L2_PIX_FMT_RGB24, - .field = V4L2_FIELD_NONE, - .colorspace = V4L2_COLORSPACE_DEFAULT, -}; - -struct vimc_cap_buffer { - /* -* struct vb2_v4l2_buffer must be the first element -* the videobuf2 framework will allocate this struct based on -* buf_struct_size and use the first sizeof(struct vb2_buffer) bytes of -* memory as a vb2_buffer -*/ - struct vb2_v4l2_buffer vb2; - struct list_head list; -}; - -static int vimc_cap_querycap(struct file *file, void *priv, -struct v4l2_capability *cap) -{ - strscpy(cap->driver, VIMC_PDEV_NAME, sizeof(cap->driver)); - strscpy(cap->card, KBUILD_MODNAME, sizeof(cap->card)); - snprintf(cap->bus_info, sizeof(cap->bus_info), -"platform:%s", VIMC_PDEV_NAME); - - return 0; -} - -static void vimc_cap_get_format(struct vimc_ent_device *ved, - struct v4l2_pix_format *fmt) -{ - struct vimc_cap_device *vcap = container_of(ved, struct vimc_cap_device, - ved); - - *fmt = vcap->format; -} - -static int vimc_cap_g_fmt_vid_cap(struct file *file, void *priv, - struct v4l2_format *f) -{ - struct vimc_cap_device *vcap = video_drvdata(file); - - f->fmt.pix = vcap->format; - - return 0; -} - -static int vimc_cap_try_fmt_vid_cap(struct file
[PATCH 2/7] media: vimc: video: Add write file operation
Add write on the list of vb2 file operations. This is required to create a V4L2 output device. Signed-off-by: André Almeida --- drivers/media/platform/vimc/vimc-video.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/media/platform/vimc/vimc-video.h b/drivers/media/platform/vimc/vimc-video.h index d329345cc77f..22cb0e2dbdbb 100644 --- a/drivers/media/platform/vimc/vimc-video.h +++ b/drivers/media/platform/vimc/vimc-video.h @@ -35,6 +35,7 @@ static const struct v4l2_file_operations vimc_vid_fops = { .open = v4l2_fh_open, .release= vb2_fop_release, .read = vb2_fop_read, + .write = vb2_fop_write, .poll = vb2_fop_poll, .unlocked_ioctl = video_ioctl2, .mmap = vb2_fop_mmap, -- 2.22.0
[PATCH 3/7] media: vimc: Create a V4L2 output device
Using the functions on vimc-video, create a V4L2 output device. When a streaming is initialized on the output device, it'll start returning those buffers on to the userspace, while always keeping one on the list. When the capture device starts streaming on the same pipeline, it will copy the first frame on the list to the process_frame pipeline. The vimc-streamer has a callback to warn all subdevices when a streaming will start or stop. This is useful so the subdevices can allocate a frame with suitable size to send through the pipeline, and then free then when it not used anymore. Since the vimc-output will also be part of the pipeline (but it is not a V4L2 subdevice), change the callback to be part of vimc_ent_device, rather than a linked with the subdevice API. In this way, it's possible to have a generic solution to call start/stop streaming callbacks to all devices in the pipeline. Add a vb2 buffer validate function. This is required to create a V4L2 output device. Signed-off-by: André Almeida --- drivers/media/platform/vimc/Makefile| 4 +- drivers/media/platform/vimc/vimc-common.h | 5 +- drivers/media/platform/vimc/vimc-debayer.c | 11 +- drivers/media/platform/vimc/vimc-output.c | 362 drivers/media/platform/vimc/vimc-scaler.c | 10 +- drivers/media/platform/vimc/vimc-sensor.c | 10 +- drivers/media/platform/vimc/vimc-streamer.c | 22 +- drivers/media/platform/vimc/vimc-video.c| 9 + drivers/media/platform/vimc/vimc-video.h| 2 + 9 files changed, 395 insertions(+), 40 deletions(-) create mode 100644 drivers/media/platform/vimc/vimc-output.c diff --git a/drivers/media/platform/vimc/Makefile b/drivers/media/platform/vimc/Makefile index fb90aa0f33a5..611331b9fceb 100644 --- a/drivers/media/platform/vimc/Makefile +++ b/drivers/media/platform/vimc/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 vimc-y := vimc-core.o vimc-common.o vimc-streamer.o -obj-$(CONFIG_VIDEO_VIMC) += vimc.o vimc-video.o vimc-capture.o vimc-debayer.o \ -vimc-scaler.o vimc-sensor.o +obj-$(CONFIG_VIDEO_VIMC) += vimc.o vimc-video.o vimc-capture.o vimc-output.o \ +vimc-debayer.o vimc-scaler.o vimc-sensor.o diff --git a/drivers/media/platform/vimc/vimc-common.h b/drivers/media/platform/vimc/vimc-common.h index 7b4d988b208b..003024bcdaed 100644 --- a/drivers/media/platform/vimc/vimc-common.h +++ b/drivers/media/platform/vimc/vimc-common.h @@ -78,6 +78,8 @@ struct vimc_platform_data { * @vdev_get_format: callback that returns the current format a pad, used * only when is_media_entity_v4l2_video_device(ent) returns * true + * @s_stream: callback used when vimc-capture will start/stop a + * streaming, to subdevice alloc/free it frames * * Each node of the topology must create a vimc_ent_device struct. Depending on * the node it will be of an instance of v4l2_subdev or video_device struct @@ -94,7 +96,8 @@ struct vimc_ent_device { void * (*process_frame)(struct vimc_ent_device *ved, const void *frame); void (*vdev_get_format)(struct vimc_ent_device *ved, - struct v4l2_pix_format *fmt); + struct v4l2_pix_format *fmt); + int (*s_stream)(struct vimc_ent_device *ved, int enable); }; /** diff --git a/drivers/media/platform/vimc/vimc-debayer.c b/drivers/media/platform/vimc/vimc-debayer.c index 00598fbf3cba..9e0c7c2c3c72 100644 --- a/drivers/media/platform/vimc/vimc-debayer.c +++ b/drivers/media/platform/vimc/vimc-debayer.c @@ -320,9 +320,10 @@ static void vimc_deb_set_rgb_pix_rgb24(struct vimc_deb_device *vdeb, vdeb->src_frame[index + i] = rgb[i]; } -static int vimc_deb_s_stream(struct v4l2_subdev *sd, int enable) +static int vimc_deb_s_stream(struct vimc_ent_device *ved, int enable) { - struct vimc_deb_device *vdeb = v4l2_get_subdevdata(sd); + struct vimc_deb_device *vdeb = container_of(ved, struct vimc_deb_device, + ved); if (enable) { u32 src_pixelformat = vdeb->ved.stream->producer_pixfmt; @@ -373,13 +374,8 @@ static int vimc_deb_s_stream(struct v4l2_subdev *sd, int enable) return 0; } -static const struct v4l2_subdev_video_ops vimc_deb_video_ops = { - .s_stream = vimc_deb_s_stream, -}; - static const struct v4l2_subdev_ops vimc_deb_ops = { .pad = &vimc_deb_pad_ops, - .video = &vimc_deb_video_ops, }; static unsigned int vimc_deb_get_val(const u8 *bytes, @@ -549,6 +545,7 @@ static int vimc_deb_comp_bind(struct device *comp, struct device *master, } vdeb->ved.process_frame = vimc_deb_process_frame; + vdeb->ved.s_stream = vimc_deb_s_stream; dev_set_drvdata(comp, &vdeb->ved); vdeb->dev = comp; diff --git a/d
[PATCH 4/7] media: vimc: Send null buffer through the pipeline
Send a NULL buffer through the video pipeline. If the Capture device gets a NULL buffer, it uses it default fallback frame. Make the capture device behave more like real devices when there's no frame to show. Signed-off-by: André Almeida --- drivers/media/platform/vimc/vimc-capture.c | 15 +++ drivers/media/platform/vimc/vimc-debayer.c | 3 +++ drivers/media/platform/vimc/vimc-scaler.c | 3 +++ drivers/media/platform/vimc/vimc-streamer.c | 2 +- 4 files changed, 22 insertions(+), 1 deletion(-) diff --git a/drivers/media/platform/vimc/vimc-capture.c b/drivers/media/platform/vimc/vimc-capture.c index e80fa1ee3dc1..4c65bf73530f 100644 --- a/drivers/media/platform/vimc/vimc-capture.c +++ b/drivers/media/platform/vimc/vimc-capture.c @@ -37,6 +37,15 @@ static int vimc_cap_start_streaming(struct vb2_queue *vq, unsigned int count) vcap->sequence = 0; + /* this is a fallback frame, it will be used if the pipeline +* is sending NULL frames +*/ + vcap->frame = vmalloc(vcap->format.sizeimage); + if (!vcap->frame) { + vimc_vid_return_all_buffers(vcap, VB2_BUF_STATE_QUEUED); + return -ENOMEM; + } + /* Start the media pipeline */ ret = media_pipeline_start(entity, &vcap->stream.pipe); if (ret) { @@ -65,6 +74,9 @@ void vimc_cap_stop_streaming(struct vb2_queue *vq) vimc_streamer_s_stream(&vcap->stream, &vcap->ved, 0); + vfree(vcap->frame); + vcap->frame = NULL; + media_pipeline_stop(&vcap->vdev.entity); vimc_vid_return_all_buffers(vcap, VB2_BUF_STATE_ERROR); @@ -96,6 +108,9 @@ static void *vimc_cap_process_frame(struct vimc_ent_device *ved, struct vimc_vid_buffer *vimc_buf; void *vbuf; + if (!frame) + frame = &vcap->frame; + spin_lock(&vcap->qlock); /* Get the first entry of the list */ diff --git a/drivers/media/platform/vimc/vimc-debayer.c b/drivers/media/platform/vimc/vimc-debayer.c index 9e0c7c2c3c72..d5f370f94573 100644 --- a/drivers/media/platform/vimc/vimc-debayer.c +++ b/drivers/media/platform/vimc/vimc-debayer.c @@ -483,6 +483,9 @@ static void *vimc_deb_process_frame(struct vimc_ent_device *ved, unsigned int rgb[3]; unsigned int i, j; + if (!sink_frame) + return NULL; + /* If the stream in this node is not active, just return */ if (!vdeb->src_frame) return ERR_PTR(-EINVAL); diff --git a/drivers/media/platform/vimc/vimc-scaler.c b/drivers/media/platform/vimc/vimc-scaler.c index 2c3e486a29c0..9edad3b14526 100644 --- a/drivers/media/platform/vimc/vimc-scaler.c +++ b/drivers/media/platform/vimc/vimc-scaler.c @@ -320,6 +320,9 @@ static void *vimc_sca_process_frame(struct vimc_ent_device *ved, struct vimc_sca_device *vsca = container_of(ved, struct vimc_sca_device, ved); + if (!sink_frame) + return NULL; + /* If the stream in this node is not active, just return */ if (!ved->stream) return ERR_PTR(-EINVAL); diff --git a/drivers/media/platform/vimc/vimc-streamer.c b/drivers/media/platform/vimc/vimc-streamer.c index fd330415ac38..f06c9308a41b 100644 --- a/drivers/media/platform/vimc/vimc-streamer.c +++ b/drivers/media/platform/vimc/vimc-streamer.c @@ -133,7 +133,7 @@ static int vimc_streamer_thread(void *data) for (i = stream->pipe_size - 1; i >= 0; i--) { frame = stream->ved_pipeline[i]->process_frame( stream->ved_pipeline[i], frame); - if (!frame || IS_ERR(frame)) + if (IS_ERR(frame)) break; } -- 2.22.0
[PATCH 7/7] media: vimc.rst: Add output device
Add information about the output device. Remove wrong information about what the capture exposes. Signed-off-by: André Almeida --- Documentation/media/v4l-drivers/vimc.rst | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/Documentation/media/v4l-drivers/vimc.rst b/Documentation/media/v4l-drivers/vimc.rst index 4628b12d417f..ccc04101ea51 100644 --- a/Documentation/media/v4l-drivers/vimc.rst +++ b/Documentation/media/v4l-drivers/vimc.rst @@ -4,7 +4,8 @@ The Virtual Media Controller Driver (vimc) == The vimc driver emulates complex video hardware using the V4L2 API and the Media -API. It has a capture device and three subdevices: sensor, debayer and scaler. +API. It has a capture device, an output device and three subdevices: sensor, +debayer and scaler. Topology @@ -40,6 +41,7 @@ of commands fits for the default topology: v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v width=1920,height=1440 v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v pixelformat=BA81 v4l2-ctl -z platform:vimc -d "Raw Capture 1" -v pixelformat=BA81 +v4l2-ctl -z platform:vimc -e "RGB/YUV Input" -v width=640,height=480 Subdevices -- @@ -74,7 +76,13 @@ vimc-capture: Exposes: * 1 Pad sink - * 1 Pad source + +vimc-output: +Exposes node /dev/videoX to allow userspace to send frames to the +stream. +Exposes: + +* 1 Source sink Module options --- -- 2.22.0
[PATCH 6/7] media: vimc.dot: Update default topology diagram
Update the default topology diagram to reflect the current state of the driver. Signed-off-by: André Almeida --- Documentation/media/v4l-drivers/vimc.dot | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/media/v4l-drivers/vimc.dot b/Documentation/media/v4l-drivers/vimc.dot index 57863a13fa39..b40c151ff790 100644 --- a/Documentation/media/v4l-drivers/vimc.dot +++ b/Documentation/media/v4l-drivers/vimc.dot @@ -9,13 +9,13 @@ digraph board { n0003:port0 -> n0008:port0 [style=bold] n0003:port0 -> n000f [style=bold] n0005 [label="{{ 0} | Debayer A\n/dev/v4l-subdev2 | { 1}}", shape=Mrecord, style=filled, fillcolor=green] - n0005:port1 -> n0017:port0 + n0005:port1 -> n0017:port0 [style=dashed] n0008 [label="{{ 0} | Debayer B\n/dev/v4l-subdev3 | { 1}}", shape=Mrecord, style=filled, fillcolor=green] n0008:port1 -> n0017:port0 [style=dashed] n000b [label="Raw Capture 0\n/dev/video0", shape=box, style=filled, fillcolor=yellow] n000f [label="Raw Capture 1\n/dev/video1", shape=box, style=filled, fillcolor=yellow] n0013 [label="RGB/YUV Input\n/dev/video2", shape=box, style=filled, fillcolor=yellow] - n0013 -> n0017:port0 [style=dashed] + n0013 -> n0017:port0 [style=bold] n0017 [label="{{ 0} | Scaler\n/dev/v4l-subdev4 | { 1}}", shape=Mrecord, style=filled, fillcolor=green] n0017:port1 -> n001a [style=bold] n001a [label="RGB/YUV Capture\n/dev/video3", shape=box, style=filled, fillcolor=yellow] -- 2.22.0
[PATCH 5/7] media: vimc: core: Add output device on the pipeline
Add the output video device on the hardcoded pipeline. Change the link to it be enabled by default. Signed-off-by: André Almeida --- drivers/media/platform/vimc/vimc-core.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-core.c b/drivers/media/platform/vimc/vimc-core.c index 571c55aa0e16..ecdea1d631c5 100644 --- a/drivers/media/platform/vimc/vimc-core.c +++ b/drivers/media/platform/vimc/vimc-core.c @@ -95,8 +95,7 @@ static const struct vimc_ent_config ent_config[] = { }, { .name = "RGB/YUV Input", - /* TODO: change this to vimc-input when it is implemented */ - .drv = "vimc-sensor", + .drv = "vimc-output", }, { .name = "Scaler", @@ -118,11 +117,11 @@ static const struct vimc_ent_link ent_links[] = { /* Link: Sensor B (Pad 0)->(Pad 0) Raw Capture 1 */ VIMC_ENT_LINK(1, 0, 5, 0, MEDIA_LNK_FL_ENABLED | MEDIA_LNK_FL_IMMUTABLE), /* Link: Debayer A (Pad 1)->(Pad 0) Scaler */ - VIMC_ENT_LINK(2, 1, 7, 0, MEDIA_LNK_FL_ENABLED), + VIMC_ENT_LINK(2, 1, 7, 0, 0), /* Link: Debayer B (Pad 1)->(Pad 0) Scaler */ VIMC_ENT_LINK(3, 1, 7, 0, 0), /* Link: RGB/YUV Input (Pad 0)->(Pad 0) Scaler */ - VIMC_ENT_LINK(6, 0, 7, 0, 0), + VIMC_ENT_LINK(6, 0, 7, 0, MEDIA_LNK_FL_ENABLED | MEDIA_LNK_FL_IMMUTABLE), /* Link: Scaler (Pad 1)->(Pad 0) RGB/YUV Capture */ VIMC_ENT_LINK(7, 1, 8, 0, MEDIA_LNK_FL_ENABLED | MEDIA_LNK_FL_IMMUTABLE), }; -- 2.22.0
Re: [PATCH v7 02/14] media: doc: add document for rkisp1 meta buffer format
Hello Helen, On 7/3/19 4:08 PM, Helen Koike wrote: > From: Jacob Chen > > This commit add document for rkisp1 meta buffer format > > Signed-off-by: Jacob Chen > Acked-by: Hans Verkuil > [update for upstream] > Signed-off-by: Helen Koike > > --- > > Changes in v7: > - s/correspond/corresponding > - s/use/uses > - s/docuemnt/document > > Documentation/media/uapi/v4l/meta-formats.rst | 2 ++ > .../uapi/v4l/pixfmt-meta-rkisp1-params.rst| 20 +++ > .../uapi/v4l/pixfmt-meta-rkisp1-stat.rst | 18 + > 3 files changed, 40 insertions(+) > create mode 100644 Documentation/media/uapi/v4l/pixfmt-meta-rkisp1-params.rst > create mode 100644 Documentation/media/uapi/v4l/pixfmt-meta-rkisp1-stat.rst > > diff --git a/Documentation/media/uapi/v4l/meta-formats.rst > b/Documentation/media/uapi/v4l/meta-formats.rst > index b10ca9ee3968..5de621fea3cc 100644 > --- a/Documentation/media/uapi/v4l/meta-formats.rst > +++ b/Documentation/media/uapi/v4l/meta-formats.rst > @@ -24,3 +24,5 @@ These formats are used for the :ref:`metadata` interface > only. > pixfmt-meta-uvc > pixfmt-meta-vsp1-hgo > pixfmt-meta-vsp1-hgt > +pixfmt-meta-rkisp1-params > +pixfmt-meta-rkisp1-stat > diff --git a/Documentation/media/uapi/v4l/pixfmt-meta-rkisp1-params.rst > b/Documentation/media/uapi/v4l/pixfmt-meta-rkisp1-params.rst > new file mode 100644 > index ..61b81331f820 > --- /dev/null > +++ b/Documentation/media/uapi/v4l/pixfmt-meta-rkisp1-params.rst > @@ -0,0 +1,20 @@ You can add a license information here, like this: .. SPDX-License-Identifier: GPL-2.0 > +.. -*- coding: utf-8; mode: rst -*- I believe it's not a good idea to include Emacs configuration in the source [1]. > + > +.. _v4l2-meta-fmt-rkisp1-params: > + > +*** > +V4L2_META_FMT_RK_ISP1_PARAMS > +*** There's 3 extra `*`, keep the length of marks just as the length of the text. Also, for titles, you should use `=` [2] instead of `*`, like this: = Title = > + > +Rockchip ISP1 Parameters Data > + > +Description > +=== > + > +This format describes input parameters for the Rockchip ISP1. > + > +It uses c-struct :c:type:`rkisp1_isp_params_cfg`, which is defined in > +the ``linux/rkisp1-config.h`` header file, see it for details. Since the struct is already using kernel-doc syntax, you can including in this file using something like this: .. kernel-doc:: include/uapi/linux/rkisp1-config.h :functions: rkisp1_isp_params_cfg > + > +The parameters consist of multiple modules. > +The module won't be updated if the corresponding bit was not set in > module_*_update. > diff --git a/Documentation/media/uapi/v4l/pixfmt-meta-rkisp1-stat.rst > b/Documentation/media/uapi/v4l/pixfmt-meta-rkisp1-stat.rst > new file mode 100644 > index ..5496e1d42273 > --- /dev/null > +++ b/Documentation/media/uapi/v4l/pixfmt-meta-rkisp1-stat.rst The previous feedback also applies to this file. > @@ -0,0 +1,18 @@ > +.. -*- coding: utf-8; mode: rst -*- > + > +.. _v4l2-meta-fmt-rkisp1-stat: > + > +*** > +V4L2_META_FMT_RK_ISP1_STAT_3A > +*** > + > +Rockchip ISP1 Statistics Data > + > +Description > +=== > + > +This format describes image color statistics information generated by the > Rockchip > +ISP1. > + > +It uses c-struct :c:type:`rkisp1_stat_buffer`, which is defined in > +the ``linux/cifisp_stat.h`` header file, see it for details. Regards, André [1] https://www.kernel.org/doc/html/latest/process/coding-style.html#editor-modelines-and-other-cruft [2] https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html#specific-guidelines-for-the-kernel-documentation
Re: [PATCH v7 10/14] dt-bindings: Document the Rockchip ISP1 bindings
Hello Helen, On 7/3/19 4:09 PM, Helen Koike wrote: > From: Jacob Chen > > Add DT bindings documentation for Rockchip ISP1 > > Signed-off-by: Jacob Chen > Reviewed-by: Rob Herring > [update for upstream] > Signed-off-by: Helen Koike > > --- > > Changes in v7: > - update document with new design and tested example > > .../bindings/media/rockchip-isp1.txt | 71 +++ > 1 file changed, 71 insertions(+) > create mode 100644 Documentation/devicetree/bindings/media/rockchip-isp1.txt > > diff --git a/Documentation/devicetree/bindings/media/rockchip-isp1.txt > b/Documentation/devicetree/bindings/media/rockchip-isp1.txt > new file mode 100644 > index ..a97fef0f189f > --- /dev/null > +++ b/Documentation/devicetree/bindings/media/rockchip-isp1.txt > @@ -0,0 +1,71 @@ > +Rockchip SoC Image Signal Processing unit v1 > +-- > + > +Rockchip ISP1 is the Camera interface for the Rockchip series of SoCs > +which contains image processing, scaling, and compression funcitons. > + > +Required properties: > +- compatible: value should be one of the following > + "rockchip,rk3288-cif-isp"; > + "rockchip,rk3399-cif-isp"; > +- reg : offset and length of the register set for the device. > +- interrupts: should contain ISP interrupt. > +- clocks: phandle to the required clocks. > +- clock-names: required clock name. > +- iommus: required a iommu node. > +- phys: the phandle for the PHY port > +- phy-names: must contain "dphy" > + > +port node > +--- I would remove those extra `---`, and keep as: ...node ... The same applies for the title. Thanks, André > + > +The device node should contain one 'ports' child node, with children 'port' > +with child 'endpoint'. > +nodes, according to the bindings defined in > Documentation/devicetree/bindings/ > +media/video-interfaces.txt. > + > +- endpoint(mipi): > + - remote-endpoint: Connecting to Rockchip MIPI-DPHY, > + which is defined in rockchip-mipi-dphy.txt. > + > +The port node must contain at least one endpoint, either parallel or mipi. > +It could have multiple endpoints, but please note the hardware don't support > +two sensors work at a time, they are supposed to work asynchronously. > + > +Device node example > +--- > + > + isp0: isp0@ff91 { > + compatible = "rockchip,rk3399-cif-isp"; > + reg = <0x0 0xff91 0x0 0x4000>; > + interrupts = ; > + clocks = <&cru SCLK_ISP0>, > + <&cru ACLK_ISP0>, <&cru ACLK_ISP0_WRAPPER>, > + <&cru HCLK_ISP0>, <&cru HCLK_ISP0_WRAPPER>; > + clock-names = "clk_isp", > + "aclk_isp", "aclk_isp_wrap", > + "hclk_isp", "hclk_isp_wrap"; > + power-domains = <&power RK3399_PD_ISP0>; > + iommus = <&isp0_mmu>; > + phys = <&dphy>; > + phy-names = "dphy"; > + > + ports { > + #address-cells = <1>; > + #size-cells = <0>; > + > + port@0 { > + mipi_in_wcam: endpoint@0 { > + reg = <0>; > + remote-endpoint = <&wcam_out>; > + data-lanes = <1 2>; > + }; > + > + mipi_in_ucam: endpoint@1 { > + reg = <1>; > + remote-endpoint = <&ucam_out>; > + data-lanes = <1>; > + }; > + }; > + }; > + };
Re: [PATCH v7 11/14] dt-bindings: Document the Rockchip MIPI RX D-PHY bindings
Hello Helen, On 7/3/19 4:09 PM, Helen Koike wrote: > From: Jacob Chen > > Add DT bindings documentation for Rockchip MIPI D-PHY RX > > Signed-off-by: Jacob Chen > Reviewed-by: Rob Herring > [update for upstream] > Signed-off-by: Helen Koike > > --- > > Changes in v7: > - updated doc with new design and tested example > > .../bindings/media/rockchip-mipi-dphy.txt | 38 +++ > 1 file changed, 38 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/media/rockchip-mipi-dphy.txt > > diff --git a/Documentation/devicetree/bindings/media/rockchip-mipi-dphy.txt > b/Documentation/devicetree/bindings/media/rockchip-mipi-dphy.txt > new file mode 100644 > index ..2305d44d92db > --- /dev/null > +++ b/Documentation/devicetree/bindings/media/rockchip-mipi-dphy.txt > @@ -0,0 +1,38 @@ > +Rockchip SoC MIPI RX D-PHY > +- Here I would also remove those extra `---`. Thanks, André > + > +Required properties: > +- compatible: value should be one of the following > + "rockchip,rk3288-mipi-dphy" > + "rockchip,rk3399-mipi-dphy" > +- clocks : list of clock specifiers, corresponding to entries in > + clock-names property; > +- clock-names: required clock name. > +- #phy-cells: Number of cells in a PHY specifier; Should be 0. > + > +MIPI RX D-PHY use registers in "general register files", it > +should be a child of the GRF. > + > +Optional properties: > +- reg: offset and length of the register set for the device. > +- rockchip,grf: MIPI TX1RX1 D-PHY not only has its own register but also > + the GRF, so it is only necessary for MIPI TX1RX1 D-PHY. > + > +Device node example > +--- > + > +grf: syscon@ff77 { > + compatible = "rockchip,rk3399-grf", "syscon", "simple-mfd"; > + > +... > + > + dphy: mipi-dphy { > + compatible = "rockchip,rk3399-mipi-dphy"; > + clocks = <&cru SCLK_MIPIDPHY_REF>, > + <&cru SCLK_DPHY_RX0_CFG>, > + <&cru PCLK_VIO_GRF>; > + clock-names = "dphy-ref", "dphy-cfg", "grf"; > + power-domains = <&power RK3399_PD_VIO>; > + #phy-cells = <0>; > + }; > +};
Re: [PATCH v7 08/14] media: rkisp1: add capture device driver
Hello Helen, On 7/3/19 4:09 PM, Helen Koike wrote: > From: Jacob Chen > > This is the capture device interface driver that provides the v4l2 > user interface. Frames can be received from ISP1. > > Signed-off-by: Jacob Chen > Signed-off-by: Shunqian Zheng > Signed-off-by: Yichong Zhong > Signed-off-by: Jacob Chen > Signed-off-by: Eddie Cai > Signed-off-by: Jeffy Chen > Signed-off-by: Allon Huang > Signed-off-by: Tomasz Figa > [updated for upstream] > Signed-off-by: Helen Koike > > --- > Hi, > > Hans, I removed your Acked-by tag since some things changed. Please let > me know if I can keep it. > > Thanks > Helen > > Changes in v7: > - s/strlcpy/strscpy > - Fix v4l2-compliance issues: > * remove input ioctls > media api can be used to define the topology, this input api is not > required. Besides it, if an input is enumerated, v4l2-compliance is not > happy with G_FMT returning the default colorspace instead of something > more specific. > * return the pixelformat to the userspace > G_/S_/TRY_ FORMAT should return a valid pixelformat to the user, even if > the user gave an invalid one > * add missing default colorspace and ycbcr > * fix wrong pixformat in mp_fmts[] table > * add buf type check in s_/g_selection > * queue_setup - check sizes > * normalize bus_info name > * fix field any v4l2-compliance -s complain - set field none > when streaming > - Fix compiling error: > s/vidioc_enum_fmt_vid_cap_mplane/vidioc_enum_fmt_vid_cap > - Replace stream state with a boolean > The rkisp1_state enum consists only of 3 entries, where 1 is completely > unused and the other two respectively mean not streaming or streaming. > Replace it with a boolean called "streaming". > - Simplify MI interrupt handling > Rather than adding unnecessary indirection, just use stream index to > handle MI interrupt enable/disable/clear, since the stream index matches > the order of bits now, thanks to previous patch. While at it, remove > some dead code. > - code styling and checkpatch fixes > - add link_validate: don't allow a link with bayer/non-bayer mismatch > > .../media/platform/rockchip/isp1/capture.c| 1754 + > .../media/platform/rockchip/isp1/capture.h| 164 ++ > drivers/media/platform/rockchip/isp1/regs.c | 223 +++ > drivers/media/platform/rockchip/isp1/regs.h | 1525 ++ > 4 files changed, 3666 insertions(+) > create mode 100644 drivers/media/platform/rockchip/isp1/capture.c > create mode 100644 drivers/media/platform/rockchip/isp1/capture.h > create mode 100644 drivers/media/platform/rockchip/isp1/regs.c > create mode 100644 drivers/media/platform/rockchip/isp1/regs.h > > diff --git a/drivers/media/platform/rockchip/isp1/capture.c > b/drivers/media/platform/rockchip/isp1/capture.c > new file mode 100644 > index ..86ceeb8443e7 > --- /dev/null > +++ b/drivers/media/platform/rockchip/isp1/capture.c > @@ -0,0 +1,1754 @@ > +// SPDX-License-Identifier: (GPL-2.0+ OR MIT) > +/* > + * Rockchip isp1 driver > + * > + * Copyright (C) 2017 Rockchip Electronics Co., Ltd. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "dev.h" > +#include "regs.h" > + > +/* > + * NOTE: > + * 1. There are two capture video devices in rkisp1, selfpath and mainpath > + * 2. Two capture device have separated memory-interface/crop/scale units. > + * 3. Besides describing stream hardware, this file also contain entries > + *for pipeline operations. > + * 4. The register read/write operations in this file are put into regs.c. > + */ > + > +/* > + * differences between selfpatch and mainpath > + * available mp sink input: isp > + * available sp sink input : isp, dma(TODO) > + * available mp sink pad fmts: yuv422, raw > + * available sp sink pad fmts: yuv422, yuv420.. > + * available mp source fmts: yuv, raw, jpeg(TODO) > + * available sp source fmts: yuv, rgb > + */ > + > +#define CIF_ISP_REQ_BUFS_MIN 1 > +#define CIF_ISP_REQ_BUFS_MAX 8 > + > +#define STREAM_PAD_SINK 0 > +#define STREAM_PAD_SOURCE1 > + > +#define STREAM_MAX_MP_RSZ_OUTPUT_WIDTH 4416 > +#define STREAM_MAX_MP_RSZ_OUTPUT_HEIGHT 3312 > +#define STREAM_MAX_SP_RSZ_OUTPUT_WIDTH 1920 > +#define STREAM_MAX_SP_RSZ_OUTPUT_HEIGHT 1920 > +#define STREAM_MIN_RSZ_OUTPUT_WIDTH 32 > +#define STREAM_MIN_RSZ_OUTPUT_HEIGHT 16 > + > +#define STREAM_MAX_MP_SP_INPUT_WIDTH STREAM_MAX_MP_RSZ_OUTPUT_WIDTH > +#define STREAM_MAX_MP_SP_INPUT_HEIGHT STREAM_MAX_MP_RSZ_OUTPUT_HEIGHT > +#define STREAM_MIN_MP_SP_INPUT_WIDTH 32 > +#define STREAM_MIN_MP_SP_INPUT_HEIGHT32 > + > +/* Get xsubs and ysubs for fourcc formats > + * > + * @xsubs: horizontal color samples in a 4*4 matrix, for yuv > + * @ysubs: vertical color samples in a 4*4 matrix, for yuv > + */ > +static
Re: [PATCH 2/2] Revert "media: vimc: propagate pixel format in the stream"
Hello Helen, On 7/9/19 4:43 PM, Helen Koike wrote: > This reverts commit b6c61a6c37317efd7327199bfe24770af3d7e799. > > The requested pixelformat is being propagated from the capture to the > tpg in the sensor. > > This was a bad design choice, as we start having the following issues: > > * We set a pixelformat in the capture; > * We set matching media bus formats in the subdevices pads; > * Link validate looks fine (sizes matches, media bus formats matches); > * Issue: if some of the subdevice doesn't know how to generate the > requested pixelformat in the capture, then stream_on fails. This is bad > because capture says it supports that pixelformat, everything looks > fine, but it is not, and there is no way to find it out through the > links. > > This patch was implemented so we could request any pixelformat from the > pipeline regardeless of the media bus format configured between pads. > Not all pixelformat can be mapped into a media bus code (e.g. > multiplanar formats), so with this patch we could request those > pixelformats from the tpg. > > Solution: map pixelformats to media bus codes as before, and implement > conversions to other pixelformats in the capture to support multiplanar. > > So first step to this solution is to revert this patch. > > Signed-off-by: Helen Koike The commit title of [PATCH 1/2] starts with "media: Revert..." and [PATCH 2/2] starts with "Revert...", not sure if this is relevant. I've applied this changes at media-master and `test-media vimc` (from v4l-utils) got 0 errors and 0 warnings. I've also tested with qv4l2, and the streaming went as expected for all capture devices. Tested-by: André Almeida > --- > drivers/media/platform/vimc/vimc-capture.c | 76 ++--- > drivers/media/platform/vimc/vimc-common.c | 309 > drivers/media/platform/vimc/vimc-common.h | 58 ++-- > drivers/media/platform/vimc/vimc-debayer.c | 83 ++ > drivers/media/platform/vimc/vimc-scaler.c | 63 ++-- > drivers/media/platform/vimc/vimc-sensor.c | 51 +++- > drivers/media/platform/vimc/vimc-streamer.c | 2 - > drivers/media/platform/vimc/vimc-streamer.h | 6 - > 8 files changed, 342 insertions(+), 306 deletions(-) > > diff --git a/drivers/media/platform/vimc/vimc-capture.c > b/drivers/media/platform/vimc/vimc-capture.c > index 664855708fdf..1d56b91830ba 100644 > --- a/drivers/media/platform/vimc/vimc-capture.c > +++ b/drivers/media/platform/vimc/vimc-capture.c > @@ -18,32 +18,6 @@ > > #define VIMC_CAP_DRV_NAME "vimc-capture" > > -static const u32 vimc_cap_supported_pixfmt[] = { > - V4L2_PIX_FMT_BGR24, > - V4L2_PIX_FMT_RGB24, > - V4L2_PIX_FMT_ARGB32, > - V4L2_PIX_FMT_SBGGR8, > - V4L2_PIX_FMT_SGBRG8, > - V4L2_PIX_FMT_SGRBG8, > - V4L2_PIX_FMT_SRGGB8, > - V4L2_PIX_FMT_SBGGR10, > - V4L2_PIX_FMT_SGBRG10, > - V4L2_PIX_FMT_SGRBG10, > - V4L2_PIX_FMT_SRGGB10, > - V4L2_PIX_FMT_SBGGR10ALAW8, > - V4L2_PIX_FMT_SGBRG10ALAW8, > - V4L2_PIX_FMT_SGRBG10ALAW8, > - V4L2_PIX_FMT_SRGGB10ALAW8, > - V4L2_PIX_FMT_SBGGR10DPCM8, > - V4L2_PIX_FMT_SGBRG10DPCM8, > - V4L2_PIX_FMT_SGRBG10DPCM8, > - V4L2_PIX_FMT_SRGGB10DPCM8, > - V4L2_PIX_FMT_SBGGR12, > - V4L2_PIX_FMT_SGBRG12, > - V4L2_PIX_FMT_SGRBG12, > - V4L2_PIX_FMT_SRGGB12, > -}; > - > struct vimc_cap_device { > struct vimc_ent_device ved; > struct video_device vdev; > @@ -117,25 +91,29 @@ static int vimc_cap_try_fmt_vid_cap(struct file *file, > void *priv, > struct v4l2_format *f) > { > struct v4l2_pix_format *format = &f->fmt.pix; > + const struct vimc_pix_map *vpix; > > format->width = clamp_t(u32, format->width, VIMC_FRAME_MIN_WIDTH, > VIMC_FRAME_MAX_WIDTH) & ~1; > format->height = clamp_t(u32, format->height, VIMC_FRAME_MIN_HEIGHT, >VIMC_FRAME_MAX_HEIGHT) & ~1; > > - vimc_colorimetry_clamp(format); > + /* Don't accept a pixelformat that is not on the table */ > + vpix = vimc_pix_map_by_pixelformat(format->pixelformat); > + if (!vpix) { > + format->pixelformat = fmt_default.pixelformat; > + vpix = vimc_pix_map_by_pixelformat(format->pixelformat); > + } > + /* TODO: Add support for custom bytesperline values */ > + format->bytesperline = format->width * vpix->bpp; > + format->sizeimage = format->bytesperline * format->height; > > if (format->field == V4L2_FIELD_ANY) > format->field = fmt_default.field; &g
Re: [PATCH 5/8] media: vimc: stream: cleanup frame field from struct vimc_stream
Hello, I've tested the stream (with the default media bus format and using an alternative one) using a custom user space application and tested all capture entities using qv4l2 -d /dev/videoX (where X is each of the capture entities). Also, I have checked v4l2-compliance -d /dev/videoX and with v4l2-compliance --streaming=5 -d /dev/videoX and no regressions where found. On 3/6/19 7:42 PM, Helen Koike wrote: There is no need to have the frame field in the vimc_stream struct. Signed-off-by: Helen Koike Tested-by: André Almeida --- drivers/media/platform/vimc/vimc-streamer.c | 10 -- drivers/media/platform/vimc/vimc-streamer.h | 1 - 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-streamer.c b/drivers/media/platform/vimc/vimc-streamer.c index 392754c18046..b7c1fdef5f0d 100644 --- a/drivers/media/platform/vimc/vimc-streamer.c +++ b/drivers/media/platform/vimc/vimc-streamer.c @@ -117,6 +117,7 @@ static int vimc_streamer_pipeline_init(struct vimc_stream *stream, static int vimc_streamer_thread(void *data) { struct vimc_stream *stream = data; + u8 *frame = NULL; int i; set_freezable(); @@ -127,12 +128,9 @@ static int vimc_streamer_thread(void *data) break; for (i = stream->pipe_size - 1; i >= 0; i--) { - stream->frame = stream->ved_pipeline[i]->process_frame( - stream->ved_pipeline[i], - stream->frame); - if (!stream->frame) - break; - if (IS_ERR(stream->frame)) + frame = stream->ved_pipeline[i]->process_frame( + stream->ved_pipeline[i], frame); + if (!frame || IS_ERR(frame)) break; } //wait for 60hz diff --git a/drivers/media/platform/vimc/vimc-streamer.h b/drivers/media/platform/vimc/vimc-streamer.h index 752af2e2d5a2..dc1d0be431cb 100644 --- a/drivers/media/platform/vimc/vimc-streamer.h +++ b/drivers/media/platform/vimc/vimc-streamer.h @@ -19,7 +19,6 @@ struct vimc_stream { struct media_pipeline pipe; struct vimc_ent_device *ved_pipeline[VIMC_STREAMER_PIPELINE_MAX_SIZE]; unsigned int pipe_size; - u8 *frame; struct task_struct *kthread; };
Re: [PATCH 7/8] media: vimc: stream: init/terminate the first entity
On 3/6/19 7:42 PM, Helen Koike wrote: The s_stream callback was not being called for the first entity in the stream pipeline array. Instead of verifying the type of the node (video or subdevice) and calling s_stream from the second entity in the pipeline, do this process for all the entities in the pipeline for consistency. The previous code was not a problem because the first entity is a video device and not a subdevice, but this patch prepares vimc to allow setting some configuration in the entity before calling s_stream. Signed-off-by: Helen Koike Hello, I've tested the stream (with the default media bus format and using an alternative one) using a custom user space application and tested all capture entities using qv4l2 -d /dev/videoX (where X is each of the capture entities). Also, I have checked v4l2-compliance -d /dev/videoX and with v4l2-compliance --streaming=5 -d /dev/videoX and no regressions where found. Tested-by: André Almeida --- drivers/media/platform/vimc/vimc-streamer.c | 25 - 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-streamer.c b/drivers/media/platform/vimc/vimc-streamer.c index b7c1fdef5f0d..5a3bda62fbc8 100644 --- a/drivers/media/platform/vimc/vimc-streamer.c +++ b/drivers/media/platform/vimc/vimc-streamer.c @@ -46,19 +46,18 @@ static struct media_entity *vimc_get_source_entity(struct media_entity *ent) */ static void vimc_streamer_pipeline_terminate(struct vimc_stream *stream) { - struct media_entity *entity; + struct vimc_ent_device *ved; struct v4l2_subdev *sd; while (stream->pipe_size) { stream->pipe_size--; - entity = stream->ved_pipeline[stream->pipe_size]->ent; - entity = vimc_get_source_entity(entity); + ved = stream->ved_pipeline[stream->pipe_size]; stream->ved_pipeline[stream->pipe_size] = NULL; - if (!is_media_entity_v4l2_subdev(entity)) + if (!is_media_entity_v4l2_subdev(ved->ent)) continue; - sd = media_entity_to_v4l2_subdev(entity); + sd = media_entity_to_v4l2_subdev(ved->ent); v4l2_subdev_call(sd, video, s_stream, 0); } } @@ -89,18 +88,24 @@ static int vimc_streamer_pipeline_init(struct vimc_stream *stream, } stream->ved_pipeline[stream->pipe_size++] = ved; + if (is_media_entity_v4l2_subdev(ved->ent)) { + sd = media_entity_to_v4l2_subdev(ved->ent); + ret = v4l2_subdev_call(sd, video, s_stream, 1); + if (ret && ret != -ENOIOCTLCMD) { + pr_err("subdev_call error %s\n", ved->ent->name); + vimc_streamer_pipeline_terminate(stream); + return ret; + } + } + entity = vimc_get_source_entity(ved->ent); /* Check if the end of the pipeline was reached*/ if (!entity) return 0; + /* Get the next device in the pipeline */ if (is_media_entity_v4l2_subdev(entity)) { sd = media_entity_to_v4l2_subdev(entity); - ret = v4l2_subdev_call(sd, video, s_stream, 1); - if (ret && ret != -ENOIOCTLCMD) { - vimc_streamer_pipeline_terminate(stream); - return ret; - } ved = v4l2_get_subdevdata(sd); } else { vdev = container_of(entity,
Re: [PATCH 8/8] media: vimc: propagate pixel format in the stream
On 3/6/19 7:42 PM, Helen Koike wrote: Media bus codes were being mapped to pixelformats, which causes a limitation on vimc because not all pixelformats can be mapped to media bus codes. Also, media bus codes are an internal configuration from the device. Userspace only assures media bus codes matches between pads and expects the image in a given pixelformat. So we can allow almost any media bus format to be configured between pads, except for debayer that expects a media bus code of type bayer in the sink pad. Signed-off-by: Helen Koike Hello, I've tested the stream (with the default media bus format and using an alternative one) using a custom user space application and tested all capture entities using qv4l2 -d /dev/videoX (where X is each of the capture entities). Also, I have checked v4l2-compliance -d /dev/videoX and with v4l2-compliance --streaming=5 -d /dev/videoX and no regressions where found. Tested-by: André Almeida --- drivers/media/platform/vimc/vimc-capture.c | 76 +++-- drivers/media/platform/vimc/vimc-common.c | 307 drivers/media/platform/vimc/vimc-common.h | 13 + drivers/media/platform/vimc/vimc-debayer.c | 78 +++-- drivers/media/platform/vimc/vimc-scaler.c | 60 ++-- drivers/media/platform/vimc/vimc-sensor.c | 48 +-- drivers/media/platform/vimc/vimc-streamer.c | 2 + drivers/media/platform/vimc/vimc-streamer.h | 6 + 8 files changed, 281 insertions(+), 309 deletions(-) diff --git a/drivers/media/platform/vimc/vimc-capture.c b/drivers/media/platform/vimc/vimc-capture.c index e976a9d6b460..6377974879d7 100644 --- a/drivers/media/platform/vimc/vimc-capture.c +++ b/drivers/media/platform/vimc/vimc-capture.c @@ -28,6 +28,32 @@ #define VIMC_CAP_DRV_NAME "vimc-capture" +static const u32 vimc_cap_supported_pixftm[] = { + V4L2_PIX_FMT_BGR24, + V4L2_PIX_FMT_RGB24, + V4L2_PIX_FMT_ARGB32, + V4L2_PIX_FMT_SBGGR8, + V4L2_PIX_FMT_SGBRG8, + V4L2_PIX_FMT_SGRBG8, + V4L2_PIX_FMT_SRGGB8, + V4L2_PIX_FMT_SBGGR10, + V4L2_PIX_FMT_SGBRG10, + V4L2_PIX_FMT_SGRBG10, + V4L2_PIX_FMT_SRGGB10, + V4L2_PIX_FMT_SBGGR10ALAW8, + V4L2_PIX_FMT_SGBRG10ALAW8, + V4L2_PIX_FMT_SGRBG10ALAW8, + V4L2_PIX_FMT_SRGGB10ALAW8, + V4L2_PIX_FMT_SBGGR10DPCM8, + V4L2_PIX_FMT_SGBRG10DPCM8, + V4L2_PIX_FMT_SGRBG10DPCM8, + V4L2_PIX_FMT_SRGGB10DPCM8, + V4L2_PIX_FMT_SBGGR12, + V4L2_PIX_FMT_SGBRG12, + V4L2_PIX_FMT_SGRBG12, + V4L2_PIX_FMT_SRGGB12, +}; + struct vimc_cap_device { struct vimc_ent_device ved; struct video_device vdev; @@ -101,29 +127,25 @@ static int vimc_cap_try_fmt_vid_cap(struct file *file, void *priv, struct v4l2_format *f) { struct v4l2_pix_format *format = &f->fmt.pix; - const struct vimc_pix_map *vpix; format->width = clamp_t(u32, format->width, VIMC_FRAME_MIN_WIDTH, VIMC_FRAME_MAX_WIDTH) & ~1; format->height = clamp_t(u32, format->height, VIMC_FRAME_MIN_HEIGHT, VIMC_FRAME_MAX_HEIGHT) & ~1; - /* Don't accept a pixelformat that is not on the table */ - vpix = vimc_pix_map_by_pixelformat(format->pixelformat); - if (!vpix) { - format->pixelformat = fmt_default.pixelformat; - vpix = vimc_pix_map_by_pixelformat(format->pixelformat); - } - /* TODO: Add support for custom bytesperline values */ - format->bytesperline = format->width * vpix->bpp; - format->sizeimage = format->bytesperline * format->height; + vimc_colorimetry_clamp(format); if (format->field == V4L2_FIELD_ANY) format->field = fmt_default.field; - vimc_colorimetry_clamp(format); + /* TODO: Add support for custom bytesperline values */ - return 0; + /* Don't accept a pixelformat that is not on the table */ + if (!v4l2_format_info(format->pixelformat)) + format->pixelformat = fmt_default.pixelformat; + + return v4l2_fill_pixfmt(format, format->pixelformat, + format->width, format->height); } static int vimc_cap_s_fmt_vid_cap(struct file *file, void *priv, @@ -159,27 +181,31 @@ static int vimc_cap_s_fmt_vid_cap(struct file *file, void *priv, static int vimc_cap_enum_fmt_vid_cap(struct file *file, void *priv, struct v4l2_fmtdesc *f) { - const struct vimc_pix_map *vpix = vimc_pix_map_by_index(f->index); - - if (!vpix) + if (f->index >= ARRAY_SIZE(vimc_cap_supported_pixftm)) return -EINVAL; - f->pixelformat = vpix->pixelformat; + f->pixelformat = vimc_cap_supported_pixftm[f->index]; return 0; } +static bool vimc_cap_is_pix
[PATCH 0/4] futex: Minor code improvements
Hello, This series aims to make some small code improvements that I found in futex.c, removing some lines and trying to make the code easier to read and understand. All commits tested with futex tests from kselftest. André Almeida (4): futex: Remove put_futex_key() futex: Remove needless goto's futex: Remove unused or redundant includes futex: Consistently use fshared as boolean kernel/futex.c | 115 +++-- 1 file changed, 26 insertions(+), 89 deletions(-) -- 2.26.2
[PATCH 1/4] futex: Remove put_futex_key()
Since 4b39f99c ("futex: Remove {get,drop}_futex_key_refs()"), function put_futex_key() is empty. Remove all references for this function and redundant labels. Signed-off-by: André Almeida --- kernel/futex.c | 61 ++ 1 file changed, 12 insertions(+), 49 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index b59532862bc0..1f0287a51dce 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -674,10 +674,6 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a return err; } -static inline void put_futex_key(union futex_key *key) -{ -} - /** * fault_in_user_writeable() - Fault in user address and verify RW access * @uaddr: pointer to faulting user space address @@ -1614,7 +1610,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset) /* Make sure we really have tasks to wakeup */ if (!hb_waiters_pending(hb)) - goto out_put_key; + goto out; spin_lock(&hb->lock); @@ -1637,8 +1633,6 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset) spin_unlock(&hb->lock); wake_up_q(&wake_q); -out_put_key: - put_futex_key(&key); out: return ret; } @@ -1709,7 +1703,7 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, goto out; ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE); if (unlikely(ret != 0)) - goto out_put_key1; + goto out; hb1 = hash_futex(&key1); hb2 = hash_futex(&key2); @@ -1727,13 +1721,13 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, * an MMU, but we might get them from range checking */ ret = op_ret; - goto out_put_keys; + goto out; } if (op_ret == -EFAULT) { ret = fault_in_user_writeable(uaddr2); if (ret) - goto out_put_keys; + goto out; } if (!(flags & FLAGS_SHARED)) { @@ -1741,8 +1735,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, goto retry_private; } - put_futex_key(&key2); - put_futex_key(&key1); cond_resched(); goto retry; } @@ -1778,10 +1770,6 @@ futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2, out_unlock: double_unlock_hb(hb1, hb2); wake_up_q(&wake_q); -out_put_keys: - put_futex_key(&key2); -out_put_key1: - put_futex_key(&key1); out: return ret; } @@ -1993,7 +1981,7 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, requeue_pi ? FUTEX_WRITE : FUTEX_READ); if (unlikely(ret != 0)) - goto out_put_key1; + goto out; /* * The check above which compares uaddrs is not sufficient for @@ -2001,7 +1989,7 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, */ if (requeue_pi && match_futex(&key1, &key2)) { ret = -EINVAL; - goto out_put_keys; + goto out; } hb1 = hash_futex(&key1); @@ -2022,13 +2010,11 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, ret = get_user(curval, uaddr1); if (ret) - goto out_put_keys; + goto out; if (!(flags & FLAGS_SHARED)) goto retry_private; - put_futex_key(&key2); - put_futex_key(&key1); goto retry; } if (curval != *cmpval) { @@ -2087,8 +2073,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, case -EFAULT: double_unlock_hb(hb1, hb2); hb_waiters_dec(hb2); - put_futex_key(&key2); - put_futex_key(&key1); ret = fault_in_user_writeable(uaddr2); if (!ret) goto retry; @@ -2103,8 +2087,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags, */ double_unlock_hb(hb1, hb2); hb_waiters_dec(hb2); - put_futex_key(&key2); -
[PATCH 3/4] futex: Remove unused or redundant includes
Since 82af7aca ("Removal of FUTEX_FD"), some includes related to file operations aren't needed anymore. More investigation around the includes showed that a lot of includes aren't required for compilation, possible due to redundant includes. Simplify the code by removing unused includes. Signed-off-by: André Almeida --- To test this code, I compiled with different configurations (x86_64, i386, with x32 ABI supported enabled/disabled), and ran futex selftests. --- kernel/futex.c | 19 --- 1 file changed, 19 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index ec07de620d1e..7bc5340396aa 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -31,31 +31,12 @@ * "The futexes are also cursed." * "But they come in a choice of three flavours!" */ -#include -#include -#include -#include -#include #include -#include -#include -#include -#include #include -#include -#include -#include -#include -#include -#include -#include -#include -#include #include #include #include #include -#include #include -- 2.26.2
[PATCH 4/4] futex: Consistently use fshared as boolean
Since fshared is meant to true/false values, declare it as bool. If the code ever reaches the code beneath again label, we are sure that the futex is shared, so we can use the true value instead of the variable. Signed-off-by: André Almeida --- kernel/futex.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 7bc5340396aa..a04f3793114f 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -457,7 +457,7 @@ static u64 get_inode_sequence_number(struct inode *inode) /** * get_futex_key() - Get parameters which are the keys for a futex * @uaddr: virtual address of the futex - * @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED + * @fshared: false for a PROCESS_PRIVATE futex, true for PROCESS_SHARED * @key: address where result is stored. * @rw:mapping needs to be read/write (values: FUTEX_READ, * FUTEX_WRITE) @@ -479,7 +479,8 @@ static u64 get_inode_sequence_number(struct inode *inode) * lock_page() might sleep, the caller should not hold a spinlock. */ static int -get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_access rw) +get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key, + enum futex_access rw) { unsigned long address = (unsigned long)uaddr; struct mm_struct *mm = current->mm; @@ -516,7 +517,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a again: /* Ignore any VERIFY_READ mapping (futex common case) */ - if (unlikely(should_fail_futex(fshared))) + if (unlikely(should_fail_futex(true))) return -EFAULT; err = get_user_pages_fast(address, 1, FOLL_WRITE, &page); @@ -604,7 +605,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, enum futex_a * A RO anonymous page will never change and thus doesn't make * sense for futex operations. */ - if (unlikely(should_fail_futex(fshared)) || ro) { + if (unlikely(should_fail_futex(true)) || ro) { err = -EFAULT; goto out; } -- 2.26.2