On Fri, Sep 16, 2022 at 1:01 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Mon, Aug 15, 2022 at 10:39 PM John Naylor
> <john.nay...@enterprisedb.com> wrote:
> >
> > bool, both = and <=. Should be pretty close. Also, I believe if you
> > left this for last as a possible refactoring, it might save some work.
v6 demonstrates why this should have been put off towards the end. (more below)

> > In any case, I'll take a look at the latest patch next month.

Since the CF entry said "Needs Review", I began looking at v5 again this
week. Hopefully not too much has changed, but in the future I strongly
recommend setting it to "Waiting on Author" if a new version is
forthcoming. I realize many here share updated patches at any time, but
I'd like to discourage that practice, especially for large patches.

> I've updated the radix tree patch. It's now separated into two patches.
>
> 0001 patch introduces pg_lsearch8() and pg_lsearch8_ge() (we may find
> better names) that are similar to the pg_lfind8() family but they
> return the index of the key in the vector instead of true/false. The
> patch includes regression tests.

I don't want to do a full review of this just yet, but I'll just point
out some problems from a quick glance.

+/*
+ * Return the index of the first element in the vector that is greater than
+ * or eual to the given scalar. Return sizeof(Vector8) if there is no such
+ * element.

That's a bizarre API to indicate non-existence.

+ *
+ * Note that this function assumes the elements in the vector are sorted.
+ */

That is *completely* unacceptable for a general-purpose function.

+#else							/* USE_NO_SIMD */
+	Vector8		r = 0;
+	uint8	   *rp = (uint8 *) &r;
+
+	for (Size i = 0; i < sizeof(Vector8); i++)
+		rp[i] = (((const uint8 *) &v1)[i] == ((const uint8 *) &v2)[i]) ? 0xFF : 0;

I don't think we should try to force the non-SIMD case to adopt the
special semantics of vector comparisons. It's much easier to just use
the same logic as the assert builds.
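To illustrate what I mean, here is a hypothetical sketch (the function name and the -1 not-found convention are my own invention, not the patch's code): in a non-SIMD build the search can simply be a plain scalar loop, the same logic the assert builds already use to cross-check the vector path, with no need to synthesize 0xFF/0x00 lane masks:

```c
#include <stdint.h>

/*
 * Hypothetical scalar fallback: return the index of the first element in
 * the sorted byte array that is >= key, or -1 if there is no such element.
 * This sidesteps both the lane-mask gymnastics and the odd
 * "return sizeof(Vector8)" convention.
 */
static int
lsearch8_ge_scalar(const uint8_t *base, int nelem, uint8_t key)
{
	for (int i = 0; i < nelem; i++)
	{
		if (base[i] >= key)
			return i;
	}
	return -1;					/* no element >= key */
}
```

On a 16-byte node chunk array this is at most 16 compares, which the compiler can often vectorize on its own anyway.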
+#ifdef USE_SSE2
+	return (uint32) _mm_movemask_epi8(v);
+#elif defined(USE_NEON)
+	static const uint8 mask[16] = {
+		1 << 0, 1 << 1, 1 << 2, 1 << 3,
+		1 << 4, 1 << 5, 1 << 6, 1 << 7,
+		1 << 0, 1 << 1, 1 << 2, 1 << 3,
+		1 << 4, 1 << 5, 1 << 6, 1 << 7,
+	};
+
+	uint8x16_t masked = vandq_u8(vld1q_u8(mask), (uint8x16_t) vshrq_n_s8(v, 7));
+	uint8x16_t maskedhi = vextq_u8(masked, masked, 8);
+
+	return (uint32) vaddvq_u16((uint16x8_t) vzip1q_u8(masked, maskedhi));

For Arm, we need to be careful here. This article goes into a lot of
detail for this situation:

https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon

Here again, I'd rather put this off and focus on getting the "large
details" in good enough shape so we can go towards integrating with
vacuum.

> In addition to two patches, I've attached the third patch. It's not
> part of radix tree implementation but introduces a contrib module
> bench_radix_tree, a tool for radix tree performance benchmarking. It
> measures loading and lookup performance of both the radix tree and a
> flat array.

Excellent! This was high on my wish list.

--
John Naylor
EDB: http://www.enterprisedb.com