On Wed, Jan 29, 2020 at 3:13 AM Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote: > But I heard only two arguments: > > 1. Concurrent building of indexes by all backends may consume much memory > (n_backends * maintenance_work_mem) and consume a lot of disk/CPU resources. > 2. GTT in one session can contains large amount of data and we need index for > it, but small amount of data in another session and we do not need index for > it.
You seem to be ignoring the fact that two committers told you this probably wasn't safe. Perhaps your view is that those people made no argument, and therefore you don't have to respond to it. But the onus is not on somebody else to tell you why a completely novel idea is not safe. The onus is on you to analyze it in detail and prove that it is safe. What you need to show is that there is no code anywhere in the system which will be confused by an index springing into existence at whatever time you're creating it. One problem is that there are various backend-local data structures in the relcache, the planner, and the executor that remember information about indexes, and that may not respond well to having more indexes show up unexpectedly. On the one hand, they might crash; on the other hand, they might ignore the new index when they shouldn't. Another problem is that the code which creates indexes might fail or misbehave when run in an environment different from the one in which it currently runs. I haven't really studied your code, so I don't know exactly what it does, but for example it would be really bad to try to build an index while holding a buffer lock, both because it might cause (low-probability) undetected deadlocks and also because it might block another process that wants that buffer lock in a non-interruptible wait state for a long time. Now, maybe you can make an argument that you only create indexes at points in the query that are "safe." But I am skeptical, because of this example: rhaas=# create table foo (a int primary key, b text, c text, d text); CREATE TABLE rhaas=# create function blump() returns trigger as $$begin create index on foo (b); return new; end$$ language plpgsql; CREATE FUNCTION rhaas=# create trigger thud before insert on foo execute function blump(); CREATE TRIGGER rhaas=# insert into foo (a) select generate_series(1,10); ERROR: cannot CREATE INDEX "foo" because it is being used by active queries in this session CONTEXT: SQL statement "create index on foo (b)" PL/pgSQL function blump() line 1 at SQL statement That prohibition is there for some reason. Someone did not just decide to arbitrarily prohibit it. A CREATE INDEX command run in that context won't run afoul of many of the things that might be problems in other places -- e.g. there won't be a buffer lock held. Yet, despite the fact that a trigger context is safe for executing a wide variety of user-defined code, this particular operation is not allowed here. That is the sort of thing that should worry you. At any rate, even if this somehow were or could be made safe, on-the-fly index creation is a feature that cannot and should not be combined with a patch to implement global temporary tables. Surely, it will require a lot of study and work to get the details right. And so will GTT. As I said in the other email I wrote, this feature is hard enough without adding this kind of thing to it. There's a reason why I never got around to implementing this ten years ago when I did unlogged tables; I was intending that to be a precursor to the GTT work. I found that it was too hard and I gave up. I'm glad to see people trying again, but the idea that we can afford to add in extra features, or frankly that either of the dueling patches on this thread are close to committable, is just plain wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company