Re: NAMEDATALEN increase because of non-latin languages

2022-08-10 Thread John Naylor
I wrote: > The syscache use of GETSTRUCT still uses a simple cast of the tuple (for > pg_cast those calls live in parse_coerce.c, which is unchanged from master in > v3). Next step I think is to see about the syscache piece -- teaching a > syscache miss to deform the entire tuple into a struct

Re: NAMEDATALEN increase because of non-latin languages

2022-07-22 Thread John Naylor
On Tue, Jul 19, 2022 at 10:57 PM Andres Freund wrote: > > Hi, > > On 2022-07-19 14:30:34 +0700, John Naylor wrote: > > I'm thinking where the first few attributes are fixed length, not null, and > > (because of AIX) not double-aligned, we can do a single memcpy on multiple > > columns at once. Tha

Re: NAMEDATALEN increase because of non-latin languages

2022-07-19 Thread Andres Freund
Hi, On 2022-07-19 14:30:34 +0700, John Naylor wrote: > I wrote: > > > On Mon, Jul 18, 2022 at 9:58 AM Andres Freund wrote: > > > Hm. Wouldn't it make sense to just use the normal tuple deforming > routines and > > > then map the results to the structs? > > > > I wasn't sure if they'd be suitable

Re: NAMEDATALEN increase because of non-latin languages

2022-07-19 Thread John Naylor
I wrote: > On Mon, Jul 18, 2022 at 9:58 AM Andres Freund wrote: > > Hm. Wouldn't it make sense to just use the normal tuple deforming routines and > > then map the results to the structs? > > I wasn't sure if they'd be suitable for this, but if they are, that'd make this easier and more maintaina

Re: NAMEDATALEN increase because of non-latin languages

2022-07-18 Thread John Naylor
On Mon, Jul 18, 2022 at 9:58 AM Andres Freund wrote: > > 0001 is just boilerplate, same as v1 > > If we were to go for this, I wonder if we should backpatch the cast-containing > version of GETSTRUCT for less pain backpatching bugfixes. That'd likely require > using a different name for the cast c

Re: NAMEDATALEN increase because of non-latin languages

2022-07-17 Thread Andres Freund
Hi, On 2022-07-18 09:46:44 +0700, John Naylor wrote: > I've made a small step in this direction. Thanks for working on this! > 0001 is just boilerplate, same as v1 If we were to go for this, I wonder if we should backpatch the cast-containing version of GETSTRUCT for less pain backpatching bugf

Re: NAMEDATALEN increase because of non-latin languages

2022-06-27 Thread Julien Rouhaud
On Sat, Jun 25, 2022 at 08:00:04PM -0700, Andres Freund wrote: > > On 2022-06-26 10:48:24 +0800, Julien Rouhaud wrote: > > Anyway, per the nearby discussions I don't see much interest, especially > > not in > > the context of varlena identifiers (I should have started a different > > thread, > >

Re: NAMEDATALEN increase because of non-latin languages

2022-06-25 Thread Andres Freund
Hi, On 2022-06-26 10:48:24 +0800, Julien Rouhaud wrote: > Anyway, per the nearby discussions I don't see much interest, especially not > in > the context of varlena identifiers (I should have started a different thread, > sorry about that), so I don't think it's worth investing more efforts into

Re: NAMEDATALEN increase because of non-latin languages

2022-06-25 Thread Julien Rouhaud
On Thu, Jun 23, 2022 at 10:19:44AM -0400, Robert Haas wrote: > On Thu, Jun 23, 2022 at 6:13 AM Julien Rouhaud wrote: > > > And should record_in / record_out use the logical position, as in: > > SELECT ab::text FROM ab / SELECT (a, b)::ab; > > > > I would think not, as relying on a possibly dynamic

Re: NAMEDATALEN increase because of non-latin languages

2022-06-24 Thread Tom Lane
Robert Haas writes: > I don't know whether we can or should move all the "name" columns to > the end of the catalog. It would be user-visible and probably not > user-desirable, I'm a strong -1 on changing that if we're not absolutely forced to. > but it would save something in terms of tuple > d

Re: NAMEDATALEN increase because of non-latin languages

2022-06-24 Thread Robert Haas
On Thu, Jun 23, 2022 at 11:11 PM John Naylor wrote: > Hmm, I must have misunderstood this aspect. In my mind I was thinking > that if a varlen attribute were at the end, these functions would make > it easier to access them quickly. But from this and the follow-on > responses, these would be used

Re: NAMEDATALEN increase because of non-latin languages

2022-06-24 Thread Robert Haas
On Thu, Jun 23, 2022 at 6:43 PM Tom Lane wrote: > Nonetheless, the presence of GETSTRUCT calls should be a good guide > to where we need to do something. Indubitably. -- Robert Haas EDB: http://www.enterprisedb.com

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread John Naylor
On Thu, Jun 23, 2022 at 9:17 PM Andres Freund wrote: > > Hi, > > On 2022-06-03 13:28:16 +0700, John Naylor wrote: > > 1. That would require putting the name physically closer to the end of > > the column list. To make this less annoying for users, we'd need to > > separate physical order from disp

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Tom Lane
Robert Haas writes: > On Thu, Jun 23, 2022 at 5:49 PM Andres Freund wrote: >> I was thinking we'd basically do it wherever we do a GETSTRUCT() today. > That seems a little fraught, because you'd be turning what's now > basically a trivial operation into a non-trivial operation involving > memory

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 5:49 PM Andres Freund wrote: > I was thinking we'd basically do it wherever we do a GETSTRUCT() today. > > A first step could be to transform code like > (Form_pg_attribute) GETSTRUCT(tuple) > into >GETSTRUCT(pg_attribute, tuple) > > then, in a subsequent step, we'd
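The first step quoted above (moving the cast inside the macro so that the catalog name becomes an argument) can be sketched in plain C. This is only an illustration under stated assumptions: every `Toy`/`TOY_` name below is hypothetical, the tuple layout is a simplified stand-in rather than PostgreSQL's real heap tuple, and the token-pasting approach is one possible way to write such a macro, not necessarily what any posted patch does.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for illustration only; NOT PostgreSQL's definitions. */
typedef struct ToyTupleHeader
{
    size_t      t_hoff;     /* byte offset from header start to row data */
} ToyTupleHeader;

typedef struct ToyTuple
{
    ToyTupleHeader *t_data;
} ToyTuple;

typedef struct ToyFormData_pg_attribute
{
    int         attnum;
    int         attlen;
} ToyFormData_pg_attribute;

typedef ToyFormData_pg_attribute *ToyForm_pg_attribute;

/* Hypothetical header size; a multiple of 16 keeps the row data aligned. */
#define TOY_HOFF 16

/* Old style: the macro yields an untyped pointer and every caller casts. */
#define TOY_GETSTRUCT(tup) \
    ((void *) ((char *) (tup)->t_data + (tup)->t_data->t_hoff))

/*
 * New style: the caller names the catalog and the macro supplies the cast
 * by pasting the name onto a (hypothetical) ToyForm_ prefix.
 */
#define TOY_GETSTRUCT_TYPED(catalog, tup) \
    ((ToyForm_##catalog) TOY_GETSTRUCT(tup))

/* Build a toy tuple: header first, then a copy of the row at TOY_HOFF. */
ToyTuple
toy_form_tuple(const ToyFormData_pg_attribute *row)
{
    ToyTuple    tup;

    tup.t_data = malloc(TOY_HOFF + sizeof(*row));
    if (tup.t_data == NULL)
        abort();
    tup.t_data->t_hoff = TOY_HOFF;
    memcpy((char *) tup.t_data + TOY_HOFF, row, sizeof(*row));
    return tup;
}
```

With this shape, `(ToyForm_pg_attribute) TOY_GETSTRUCT(&tup)` and `TOY_GETSTRUCT_TYPED(pg_attribute, &tup)` yield the same pointer. The point of the second form is that each call site names its catalog, so a later step could replace the macro body for individual catalogs without touching callers again.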

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Andres Freund
Hi, On 2022-06-23 14:42:17 -0400, Robert Haas wrote: > On Thu, Jun 23, 2022 at 2:07 PM Tom Lane wrote: > > The extra cost of the deforming step could also be repaid, in some > > cases, by not having to use SysCacheGetAttr etc later on to fetch > > variable-length fields. That is, I'm imagining t

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 2:07 PM Tom Lane wrote: > Sounds worth investigating, anyway. It'd also get us out from under > C-struct-related problems such as the nearby AIX alignment issue. Yeah. > The extra cost of the deforming step could also be repaid, in some > cases, by not having to use SysC

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Tom Lane
Robert Haas writes: > On Thu, Jun 23, 2022 at 10:17 AM Andres Freund wrote: >> FWIW, I don't agree that this is a reasonable way to tackle changing >> NAMEDATALEN. It'd be nice to have, but to me it seems a pretty small >> fraction of the problem of making Names variable length. You'll still h

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 10:17 AM Andres Freund wrote: > > This would require: > > > > - changing star expansion in SELECTs (expandRTE etc) > > - adjusting pg_dump, \d, etc > > > > That much seems clear and agreed upon. > > FWIW, I don't agree that this is a reasonable way to tackle changing > NAME

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 6:13 AM Julien Rouhaud wrote: > While some problems wouldn't happen if we restricted the feature to system > catalogs only (e.g. with renamed / dropped attributes, inheritance...), a lot > would still exist and would have to be dealt with initially. However I'm not > sure w

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Andres Freund
Hi, On 2022-06-03 13:28:16 +0700, John Naylor wrote: > 1. That would require putting the name physically closer to the end of > the column list. To make this less annoying for users, we'd need to > separate physical order from display order (at least/at first only for > system catalogs). FWIW, it

Re: NAMEDATALEN increase because of non-latin languages

2022-06-02 Thread John Naylor
Hi, I wanted to revive this thread to summarize what was discussed and get a sense of next steps we could take. The idea that gained the most traction is to make identifiers variable-length in the catalogs, which has the added benefit of reducing memory in syscaches in the common case. That prese

Re: NAMEDATALEN increase because of non-latin languages

2021-08-20 Thread Matthias van de Meent
On Thu, 19 Aug 2021 at 14:58, Andres Freund wrote: > > Hi, > > On 2021-08-19 14:47:42 +0200, Matthias van de Meent wrote: > > I tried to implement this 'compact attribute access descriptor' a few > > months ago in my effort to improve btree index performance. > > cool > > > > The patch allocates a

Re: NAMEDATALEN increase because of non-latin languages

2021-08-19 Thread Andres Freund
Hi, On 2021-08-19 14:47:42 +0200, Matthias van de Meent wrote: > I tried to implement this 'compact attribute access descriptor' a few > months ago in my effort to improve btree index performance. cool > The patch allocates an array of 'TupleAttrAlignData'-structs at the > end of the attrs-arra

Re: NAMEDATALEN increase because of non-latin languages

2021-08-19 Thread Matthias van de Meent
On Thu, 19 Aug 2021 at 13:44, Andres Freund wrote: > > > Another fun thing --- and, I fear, another good argument against just > > raising NAMEDATALEN --- is what about TupleDescs, which last I checked > > used an array of fixed-width pg_attribute images. But maybe we could > > replace that with

Re: NAMEDATALEN increase because of non-latin languages

2021-08-19 Thread Andres Freund
Hi, On 2021-08-18 10:21:03 -0400, Tom Lane wrote: > Anyway, this whole argument could be rendered moot if we could convert > name to a variable-length type. That would satisfy *both* sides of > the argument, since those who need long names could have them, while > those who don't would see net re

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Julien Rouhaud
On Thu, Aug 19, 2021 at 12:12 AM Tom Lane wrote: > > Yeah, exactly: conceptually that's simple, but flushing all the bugs > out would be a years-long nightmare. It'd make all the fun we had > with missed attisdropped checks look like a walk in the park. Unless > somebody can figure out a way to

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Andrew Dunstan
On 8/18/21 12:39 PM, Alvaro Herrera wrote: > On 2021-Aug-18, Tom Lane wrote: > >> I wonder though if we could fix the immediate problem with something >> less ambitious. The hard part of the full proposal, I think, is >> separating permanent identity from physical position. If we were to >> spl

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Alvaro Herrera
On 2021-Aug-18, Tom Lane wrote: > I wonder though if we could fix the immediate problem with something > less ambitious. The hard part of the full proposal, I think, is > separating permanent identity from physical position. If we were to > split out *only* the display order from that, the patch

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Tom Lane
Andrew Dunstan writes: > On 8/18/21 10:53 AM, Tom Lane wrote: >> Yeah, it would annoy the heck out of me too. Again there's a potential >> technical solution, which is to decouple the user-visible column order >> from the storage order. However, multiple people have tilted at that >> windmill wi

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Andrew Dunstan
On 8/18/21 10:53 AM, Tom Lane wrote: > Julien Rouhaud writes: >> On Wed, Aug 18, 2021 at 10:21 PM Tom Lane wrote: >>> I wonder if we'd get complaints from changing the catalog column layouts >>> that much. People are used to the name at the front, I think. OTOH, >>> I expected a lot of bleati

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Tom Lane
Julien Rouhaud writes: > On Wed, Aug 18, 2021 at 10:21 PM Tom Lane wrote: >> I wonder if we'd get complaints from changing the catalog column layouts >> that much. People are used to the name at the front, I think. OTOH, >> I expected a lot of bleating about the OID column becoming frontmost, >

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Peter Eisentraut
On 18.08.21 13:33, Julien Rouhaud wrote: Agreed, but I don't have access to such hardware. However this won't influence the memory overhead part, and there are already frequent problems with that, especially since declarative partitioning, On the flip side, with partitioning you need room for l

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Julien Rouhaud
On Wed, Aug 18, 2021 at 10:21 PM Tom Lane wrote: > > Anyway, this whole argument could be rendered moot if we could convert > name to a variable-length type. That would satisfy *both* sides of > the argument, since those who need long names could have them, while > those who don't would see net r

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Tom Lane
John Naylor writes: > The main thing I'm worried about is the fact that a name would no longer > fit in a Datum. The rest I think we can mitigate in some way. Not sure what you mean by that? name is a pass-by-ref data type. Anyway, this whole argument could be rendered moot if we could convert

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Ranier Vilela
On Wed, Aug 18, 2021 at 09:33, Laurenz Albe wrote: > On Wed, 2021-08-18 at 08:16 -0300, Ranier Vilela wrote: > > On Wed, Aug 18, 2021 at 08:08, Денис Романенко < > deromane...@gmail.com> wrote: > > > Hello dear hackers. I understand the position of the developers > community a

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Laurenz Albe
On Wed, 2021-08-18 at 08:16 -0300, Ranier Vilela wrote: > On Wed, Aug 18, 2021 at 08:08, Денис Романенко > wrote: > > Hello dear hackers. I understand the position of the developers community > > about > > NAMEDATALEN length - and, in fact, 63 bytes is more than enough - but only > >

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Julien Rouhaud
On Wed, Aug 18, 2021 at 8:03 PM Hannu Krosing wrote: > > Also - have we checked that at least the truncation does not cut utf-8 > characters in half ? Yes, same for all other places that can truncate text (like the query text in pg_stat_activity and similar). See usage of pg_mbcliplen() in trunc
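The property described above (truncation never cuts a multibyte character in half) can be illustrated with a small standalone sketch. This is not the implementation of `pg_mbcliplen()`; `toy_utf8_clip_len` is a hypothetical helper that handles UTF-8 only and assumes its input is already valid UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: return the largest length
 * <= limit at which s[0..len) can be cut without splitting a UTF-8
 * sequence.  Assumes s holds valid UTF-8.
 */
size_t
toy_utf8_clip_len(const char *s, size_t len, size_t limit)
{
    size_t      clip;

    if (len <= limit)
        return len;             /* nothing to truncate */

    clip = limit;

    /*
     * s[clip] is the first byte that would be dropped.  If it is a
     * continuation byte (bit pattern 10xxxxxx), cutting here would split
     * a character, so back up until the cut lands on a lead byte.
     */
    while (clip > 0 && ((unsigned char) s[clip] & 0xC0) == 0x80)
        clip--;

    return clip;
}
```

For a name made of 32 two-byte Cyrillic letters (64 bytes), clipping to 63 bytes gives 62 under this sketch: the 32nd letter is dropped whole instead of leaving a dangling lead byte.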

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Julien Rouhaud
On Wed, Aug 18, 2021 at 8:04 PM John Naylor wrote: > > > > > Agreed, but I don't have access to such hardware. However this won't > > Well, by "recent" I had in mind something more recent than 2002, which is the > time where I see a lot of hits in the archives if you search for this topic. Yeah

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread John Naylor
On Wed, Aug 18, 2021 at 8:03 AM Hannu Krosing wrote: > > Could we just make the limitation to be 64 (or 128) _characters_ not _bytes_ ? That couldn't work because characters are variable length. The limit has to be a fixed length in bytes so we can quickly compute offsets in the attribute tuple.
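A minimal sketch of the point about offsets, assuming a toy row layout: with a fixed-size name buffer, the offset of the next column is a compile-time constant, while the number of characters that fit depends on the encoding. `ToyCatalogRow`, `TOY_NAMEDATALEN`, and both helpers are hypothetical names used for illustration; they are not PostgreSQL's catalog structs or functions.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NAMEDATALEN 64      /* fixed byte size, including the trailing NUL */

/* Toy catalog row: a fixed-width name followed by another fixed-width column. */
typedef struct ToyCatalogRow
{
    char        relname[TOY_NAMEDATALEN];
    unsigned int relnamespace;
} ToyCatalogRow;

/* Count UTF-8 characters by counting bytes that are not continuation bytes. */
size_t
toy_utf8_char_count(const char *s)
{
    size_t      n = 0;

    for (; *s != '\0'; s++)
    {
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* A name fits if its byte length, not its character count, leaves room for the NUL. */
int
toy_name_fits(const char *s)
{
    return strlen(s) < TOY_NAMEDATALEN;
}
```

Here `offsetof(ToyCatalogRow, relnamespace)` is 64 no matter what the name contains, which is what makes fixed-width access cheap. A 63-letter ASCII name fits, but a 32-letter Cyrillic name already needs 64 bytes in UTF-8 and does not.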

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread John Naylor
On Wed, Aug 18, 2021 at 7:33 AM Julien Rouhaud wrote: > > Some actual numbers on recent hardware would show what kind of tradeoff is involved. No one has done that for a long time that I recall. > > Agreed, but I don't have access to such hardware. However this won't Well, by "recent" I had in m

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Hannu Krosing
Could we just make the limitation to be 64 (or 128) _characters_ not _bytes_ ? Memory sizes and processor speeds have grown by order(s) of magnitude since the 64 byte limit was decided and supporting non-ASCII charsets properly seems like a prudent thing to do. Also - have we checked that at leas

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Денис Романенко
I'm not very familiar with PG testing methodology, but I can pay for a server (virtual or dedicated, DO maybe) and give access to it, if anyone has time for that. Or if someone describes the steps to me and shows me where to look, I can do it myself.

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Julien Rouhaud
On Wed, Aug 18, 2021 at 7:27 PM John Naylor wrote: > > On Wed, Aug 18, 2021 at 7:15 AM Julien Rouhaud wrote: > > > > Unfortunately, the problem isn't really the additional disk space it > > would require. The problem is the additional performance hit and > > memory overhead, as the catalog names

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread John Naylor
On Wed, Aug 18, 2021 at 7:15 AM Julien Rouhaud wrote: > > Unfortunately, the problem isn't really the additional disk space it > would require. The problem is the additional performance hit and > memory overhead, as the catalog names are part of the internal > syscache. Some actual numbers on re

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Ranier Vilela
On Wed, Aug 18, 2021 at 08:08, Денис Романенко wrote: > Hello dear hackers. I understand the position of the developers community > about NAMEDATALEN length - and, in fact, 63 bytes is more than enough - but > only if we speak about latin languages. > > Postgresql has wonderful support

Re: NAMEDATALEN increase because of non-latin languages

2021-08-18 Thread Julien Rouhaud
On Wed, Aug 18, 2021 at 7:08 PM Денис Романенко wrote: > > Hello dear hackers. I understand the position of the developers community > about NAMEDATALEN length - and, in fact, 63 bytes is more than enough - but > only if we speak about latin languages. > > Postgresql has wonderful support for un