On Fri, Sep 16, 2022 at 1:01 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Mon, Aug 15, 2022 at 10:39 PM John Naylor
> <john.nay...@enterprisedb.com> wrote:
> >
> > bool, both = and <=. Should be pretty close. Also, I believe if you
> > left this for last as a possible refactoring, it might save some work.
v6 demonstrates why this should have been put off towards the end. (more below)

> > In any case, I'll take a look at the latest patch next month.

Since the CF entry said "Needs Review", I began looking at v5 again this
week. Hopefully not too much has changed, but in the future I strongly
recommend setting it to "Waiting on Author" if a new version is
forthcoming. I realize many here share updated patches at any time, but
I'd like to discourage that practice, especially for large patches.

> I've updated the radix tree patch. It's now separated into two patches.
>
> 0001 patch introduces pg_lsearch8() and pg_lsearch8_ge() (we may find
> better names) that are similar to the pg_lfind8() family but they
> return the index of the key in the vector instead of true/false. The
> patch includes regression tests.

I don't want to do a full review of this just yet, but I'll just point
out some problems from a quick glance.

+/*
+ * Return the index of the first element in the vector that is greater than
+ * or eual to the given scalar. Return sizeof(Vector8) if there is no such
+ * element.

That's a bizarre API to indicate non-existence.

+ *
+ * Note that this function assumes the elements in the vector are sorted.
+ */

That is *completely* unacceptable for a general-purpose function.

+#else							/* USE_NO_SIMD */
+	Vector8		r = 0;
+	uint8	   *rp = (uint8 *) &r;
+
+	for (Size i = 0; i < sizeof(Vector8); i++)
+		rp[i] = (((const uint8 *) &v1)[i] == ((const uint8 *) &v2)[i]) ? 0xFF : 0;

I don't think we should try to force the non-SIMD case to adopt the
special semantics of vector comparisons. It's much easier to just use
the same logic as the assert builds.
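To illustrate what I mean, here is a hypothetical sketch (the function name and the -1 not-found convention are my own invention, not the patch's code): in a non-SIMD build the search can simply be a plain scalar loop, the same logic the assert builds already use to cross-check the vector path, with no need to synthesize 0xFF/0x00 lane masks:

```c
#include <stdint.h>

/*
 * Hypothetical scalar fallback: return the index of the first element in
 * the sorted byte array that is >= key, or -1 if there is no such element.
 * This sidesteps both the lane-mask gymnastics and the odd
 * "return sizeof(Vector8)" convention.
 */
static int
lsearch8_ge_scalar(const uint8_t *base, int nelem, uint8_t key)
{
	for (int i = 0; i < nelem; i++)
	{
		if (base[i] >= key)
			return i;
	}
	return -1;					/* no element >= key */
}
```

On a 16-byte node chunk array this is at most 16 compares, which the compiler can often vectorize on its own anyway.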
+#ifdef USE_SSE2
+	return (uint32) _mm_movemask_epi8(v);
+#elif defined(USE_NEON)
+	static const uint8 mask[16] = {
+		1 << 0, 1 << 1, 1 << 2, 1 << 3,
+		1 << 4, 1 << 5, 1 << 6, 1 << 7,
+		1 << 0, 1 << 1, 1 << 2, 1 << 3,
+		1 << 4, 1 << 5, 1 << 6, 1 << 7,
+	};
+
+	uint8x16_t masked = vandq_u8(vld1q_u8(mask), (uint8x16_t) vshrq_n_s8(v, 7));
+	uint8x16_t maskedhi = vextq_u8(masked, masked, 8);
+
+	return (uint32) vaddvq_u16((uint16x8_t) vzip1q_u8(masked, maskedhi));

For Arm, we need to be careful here. This article goes into a lot of
detail for this situation:

https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon

Here again, I'd rather put this off and focus on getting the "large
details" in good enough shape so we can go towards integrating with
vacuum.

> In addition to two patches, I've attached the third patch. It's not
> part of radix tree implementation but introduces a contrib module
> bench_radix_tree, a tool for radix tree performance benchmarking. It
> measures loading and lookup performance of both the radix tree and a
> flat array.

Excellent! This was high on my wish list.

--
John Naylor
EDB: http://www.enterprisedb.com