Hi,

Tom (CC'ed) recently pointed out [1] that adt/varlena.c uses common logic for sorting string types and bytea. There are several problems with this.
Firstly, this is difficult to reason about. For instance, you have to keep in mind that when the "C" locale is used you might be sorting not strings but bytea, for which it is legal to have internal NUL bytes.

Secondly, this is error-prone. Changing the logic for string types may affect the bytea logic and vice versa.

Lastly, performance and memory consumption could be optimized for the bytea case. The win is arguably small, if noticeable at all, but still.

Attached is a PoC patch that fixes this. There are some TODOs and FIXMEs, but all in all it works and passes the tests. The code becomes longer, but the new code is simple and easier to understand.

If we agree on this refactoring, we could split adt/varlena.c into varlena.c and bytea.c, which is also something Tom proposed. IMO it would be a good move, but this is not implemented in the patch.

Thoughts?

[1]: https://postgr.es/m/1502394.1725398354%40sss.pgh.pa.us

--
Best regards,
Aleksander Alekseev
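
P.S. To illustrate the point about internal NUL bytes, here is a minimal standalone sketch. It is not code from varlena.c or from the patch; bytes_cmp and cstring_cmp are hypothetical names used only for this example. It contrasts a length-aware byte comparison with a comparison that assumes NUL-terminated strings.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from PostgreSQL): compare two byte buffers of
 * known length. Bytes are compared with memcmp() over the common prefix;
 * if the prefixes are equal, the shorter buffer sorts first.
 * Returns -1, 0 or 1.
 */
static int
bytes_cmp(const unsigned char *a, size_t alen,
          const unsigned char *b, size_t blen)
{
    size_t      minlen = (alen < blen) ? alen : blen;
    int         cmp = (minlen > 0) ? memcmp(a, b, minlen) : 0;

    if (cmp != 0)
        return (cmp < 0) ? -1 : 1;
    if (alen != blen)
        return (alen < blen) ? -1 : 1;
    return 0;
}

/*
 * Hypothetical helper (not from PostgreSQL): compare the same buffers as
 * if they were NUL-terminated C strings. Everything after the first NUL
 * byte is ignored, which is wrong for binary data.
 * Returns -1, 0 or 1.
 */
static int
cstring_cmp(const unsigned char *a, const unsigned char *b)
{
    int         cmp = strcmp((const char *) a, (const char *) b);

    return (cmp > 0) - (cmp < 0);
}
```

For the three-byte values 'a', 0, 'b' and 'a', 0, 'c', cstring_cmp reports them as equal because it stops at the first NUL byte, while bytes_cmp orders the first value before the second.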
v1-0001-Refactor-bytea_sortsupport.patch
Description: Binary data