>> The planner should recognize this situation and avoid use of hash
>> join in such cases, but maybe the statistics aren't reflecting the
>> problem, or maybe there's something wrong with the logic specific
>> to parallel hash join.  You've not really provided enough information
>> to diagnose why the poor choice of plan.
>>
>> regards, tom lane

Thanks for looking into this. I'm not sure what information would be
needed to look at the choice of plan.

The statistics for the join conditions in the query would be:

 join_condition | min_count | max_count |         avg_count
----------------+-----------+-----------+----------------------------
 snd_tro        |         0 |         0 | 0.000000000000000000000000
 rpl_rec_tro    |         0 |         2 |     0.99869222814474470477
 rec_tro        |         0 |         2 |     0.99869222814474470477
 rpl_snd_tro    |         0 |         0 | 0.000000000000000000000000
 r              |         0 |         1 |     0.49850916663490161653

The relevant columns for the tables are:

postgres=# \d inputrequest
                                Table "public.inputrequest"
         Column          |            Type             | Collation | Nullable | Default
-------------------------+-----------------------------+-----------+----------+---------
 input_sequence          | bigint                      |           | not null |
 msg_type                | character varying(8)        |           | not null |
 msg_content             | text                        |           | not null |
 msg_reference           | character varying(35)       |           |          |
 originalrequest_id      | bigint                      |           |          |
 receive_time            | timestamp without time zone |           | not null |
 related_output_sequence | bigint                      |           |          |
 msg_status              | character varying(15)       |           |          |
Indexes:
    "inputrequest_pkey" PRIMARY KEY, btree (input_sequence)
    "inputrequest_originalrequest_id_idx" btree (originalrequest_id)

postgres=# \d outputrequest
                               Table "public.outputrequest"
         Column         |            Type             | Collation | Nullable | Default
------------------------+-----------------------------+-----------+----------+---------
 output_sequence        | bigint                      |           | not null |
 input_sequence         | bigint                      |           |          |
 msg_type               | character varying(8)        |           |          |
 msg_content            | text                        |           | not null |
 msg_reference          | character varying(35)       |           |          |
 reply_input_sequence   | bigint                      |           |          |
 status                 | integer                     |           | not null |
 related_input_sequence | bigint                      |           |          |
Indexes:
    "outputrequest_pkey" PRIMARY KEY, btree (output_sequence)
    "outputrequest_input_sequence_idx" btree (input_sequence)
    "outputrequest_reply_input_sequence_idx" btree (reply_input_sequence)

I wonder if our choice of primary keys (input_sequence and output_sequence) has something to do with the skew in the hash bucket distribution. We use the following format: yyyymmdd????????xx, where ???????? is more or less a sequence and xx is the node generating the id, i.e. 01, 02, etc. (with only one or two distinct values in the dataset).
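To make the skew hypothesis concrete, here is a toy sketch in Python (this is not PostgreSQL's actual hash function, and the key values are made up to mimic the format above): keys whose low-order digits are nearly constant collapse into a handful of buckets under naive modulo bucketing, whereas a real hash of the full value spreads them evenly.

```python
import hashlib

def naive_bucket(key: int, nbuckets: int) -> int:
    # Deliberately weak: bucket by the low-order digits only.
    return key % nbuckets

def hashed_bucket(key: int, nbuckets: int) -> int:
    # Any decent hash of the whole key value spreads such keys out.
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % nbuckets

def distribution(keys, nbuckets, bucket_fn):
    counts = [0] * nbuckets
    for k in keys:
        counts[bucket_fn(k, nbuckets)] += 1
    return counts

# Hypothetical ids in the yyyymmdd<sequence><node> shape: date prefix
# 20230101, an 8-digit sequence, and the node id fixed at 01.
keys = [2023010100000001 + seq * 100 for seq in range(10_000)]

naive = distribution(keys, 100, naive_bucket)
good = distribution(keys, 100, hashed_bucket)
print(max(naive))  # 10000 -- every key lands in the same naive bucket
print(max(good))   # close to the ideal 100 per bucket
```

If the planner's bucketing behaved anything like the naive case for these ids, one bucket would receive essentially every row, which would match the skew observed.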
I wonder if it would be difficult to put an upper limit on the private memory that can be allocated by one process (or by all processes together, similar to Oracle's pga_aggregate_limit). I would rather have one query fail with an error message than have postgres eat up all the memory and swap on the server.

Best regards,
Radu
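P.S. A back-of-envelope sketch (Python, numbers purely illustrative) of why there is no single cap today: work_mem is an allowance per sort/hash operation, not per process or per query, so a plan with several such nodes running under parallel workers can legitimately use a multiple of it.

```python
def worst_case_memory_kb(work_mem_kb: int, sort_hash_nodes: int,
                         parallel_workers: int) -> int:
    """Rough upper bound: each sort/hash node in the leader and in every
    parallel worker may use up to work_mem (illustrative, not exact)."""
    return work_mem_kb * sort_hash_nodes * (parallel_workers + 1)

# e.g. work_mem = 4 MB, 5 sort/hash nodes, 4 parallel workers:
print(worst_case_memory_kb(4096, 5, 4))  # 102400 KB, i.e. ~100 MB
```

An aggregate limit would have to account for this fan-out rather than capping a single allocation.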