Re: Index scan is not pushed down to union all subquery
On Thu, Oct 5, 2023 at 4:25 PM Lauri Kajan wrote:
> In my dreams the plan would be something like this:
>
> Nested Loop
>   ->  Index Scan using dealers_pkey on dealers
>         Index Cond: (id = ANY ('{54,12,456,315,468}'::integer[]))
>   ->  Append
>         ->  Bitmap Heap Scan on bikes
>               Recheck Cond: (dealer_name = dealers.name)
>               Filter: (frame_size = 52)
>               Rows Removed by Filter: 91
>               ->  Bitmap Index Scan on bikes_dealer_name_idx
>                     Index Cond: (dealer_name = dealers.name)
>         ->  Bitmap Heap Scan on cars
>               Recheck Cond: (dealer_name = dealers.name)
>               ->  Bitmap Index Scan on cars_dealer_name_idx
>                     Index Cond: (dealer_name = dealers.name)

OK, I'm getting pretty close. With the following query, where I select the filtered
column in the UNION ALL and add the WHERE clause to the top-level query, I get exactly
the query plan I want:

EXPLAIN (ANALYZE)
WITH targets AS (
  SELECT 'bike' vehicle, id, dealer_name, frame_size AS filter FROM bikes
  UNION ALL
  SELECT 'car' vehicle, id, dealer_name, null AS filter FROM cars
  -- In the real use case I have dozens of tables here
)
SELECT dealers.name dealer, targets.vehicle, targets.id
FROM dealers
JOIN targets ON dealers.name = targets.dealer_name
WHERE dealers.id IN (54,12,456,887,468)
  AND (filter IS NULL OR filter = 52);

But! This is not quite usable, since the query becomes tedious to write when several
tables have filters: every filter column must be added to every subquery. Regardless,
this more or less proves that the desired plan can be executed by Postgres, so maybe
this is just a missing feature in the optimizer.

-Lauri
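For illustration, a sketch of how that workaround grows once a second table carries its
own filter column (the "trucks" table and its "load_capacity" column are invented for
the example): every branch of the UNION ALL has to list every filter column, padding
with NULL where it does not apply.

WITH targets AS (
  SELECT 'bike'  vehicle, id, dealer_name, frame_size AS filter1, NULL::int AS filter2 FROM bikes
  UNION ALL
  SELECT 'car'   vehicle, id, dealer_name, NULL AS filter1, NULL::int AS filter2 FROM cars
  UNION ALL
  SELECT 'truck' vehicle, id, dealer_name, NULL AS filter1, load_capacity AS filter2 FROM trucks
)
SELECT dealers.name dealer, targets.vehicle, targets.id
FROM dealers
JOIN targets ON dealers.name = targets.dealer_name
WHERE dealers.id IN (54,12,456,887,468)
  AND (filter1 IS NULL OR filter1 = 52)      -- only bikes have frame_size
  AND (filter2 IS NULL OR filter2 >= 1000);  -- only trucks have load_capacity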
RE: Ask about Foreign Table Plug-in on Windows Server.
Tyvvm, will try

Best Regards,
Anuwat Sagulmontreechai (Max)
Soft Square Group of Companies
Tel. 02-997-2000 ext. 2305

-----Original Message-----
From: Laurenz Albe
Sent: Friday, October 6, 2023 2:43 AM
To: Anuwat Sagulmontreechai; pgsql-general@lists.postgresql.org
Cc: jakari...@softsquaregroup.com; phasi...@softsquaregroup.com
Subject: Re: Ask about Foreign Table Plug-in on Windows Server.

On Thu, 2023-10-05 at 10:39 +, Anuwat Sagulmontreechai wrote:
> Our customer installs PG on Windows Server 2022 and needs to use a foreign
> table to connect to SQL Server on WS2022 too, so we tried to find the
> plug-in, but unfortunately we found only a Linux version. Could you please
> advise us?

There are no official releases of tds_fdw for Windows. But in this issue
(https://github.com/tds-fdw/tds_fdw/issues/53) somebody claims to have built
the extension on Windows. You could try the same.

Yours,
Laurenz Albe
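For reference, once the extension is built and installed, using it looks roughly like
this (the server address, database, credentials and foreign table definition below are
placeholders; see the tds_fdw README for the authoritative list of options):

CREATE EXTENSION tds_fdw;

CREATE SERVER mssql_svr
  FOREIGN DATA WRAPPER tds_fdw
  OPTIONS (servername 'sqlserver.example.com', port '1433', database 'sales');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER mssql_svr
  OPTIONS (username 'pg_reader', password 'secret');

CREATE FOREIGN TABLE mssql_orders (
  id       integer,
  customer varchar(100)
) SERVER mssql_svr
  OPTIONS (schema_name 'dbo', table_name 'orders');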
GIST index and ORDER BY
Hi All,

Maybe somebody could help come up with an idea on how to resolve this. The situation
is as follows:

1. We have a huge (10 billion rows) table with customer operations.
2. It needs to be searched by several criteria at the same time - one of them is text
   similarity (and we decided to try pg_trgm).
3. For the purposes of this question the important columns are:
   - customer account id (a customer can have multiple accounts)
   - operation date
   - operation pk (autonumber)
4. We want to partition the table by range on operation date for easier archiving.
5. The queries are always in the form of TOP N ordered by operation date DESC
   (i.e. the latest operations meeting the search criteria).

I've decided to have a single "universal" index that covers all search criteria: a
multicolumn GiST index using the btree_gist extension. (Multiple indexes don't work
well because the selectivity of subsets of the columns is too low.)

To work around the lack of GiST support for ORDER BY and to implement paging (i.e. TOP N),
I use the following trick:

(SELECT ...
 WHERE customer_id = $cust AND operation_date = $date AND operation_id < $id
   AND ...other criteria
 ORDER BY ('2200-01-01' <-> operation_date), (MAX_BIGINT <-> operation_id))
UNION ALL
(SELECT ...
 WHERE customer_id = $cust AND operation_date < $date
   AND ...other criteria
 ORDER BY ('2200-01-01' <-> operation_date), (MAX_BIGINT <-> operation_id))
LIMIT $limit

It all works really well for a single partition - it is a straightforward index scan
with a limit. The problem is partition pruning based on ORDER BY, because the sort
criteria are different from the partition criteria - so the plan ends up scanning all
partitions and merging their sorted outputs.

I tried to partition by the ('2200-01-01' <-> operation_date) expression instead, and
that helps with partition pruning for ORDER BY, but it disables partition pruning based
on the search criteria. Artificially adding this expression to the search criteria does
not help, because it is not covered by the index and triggers an additional filter in
the index scan.

I also tried - instead of indexing operation_date - to index
('2200-01-01' <-> operation_date). But that does not solve the problem either, because
the sort criterion then becomes (MAX_INT <-> indexed_expression), and the issue is
exactly the same.

The only solution I could come up with is manual sorting by partition ranges using a
series of UNION ALLs:

SELECT * FROM part_100 ... ORDER BY ...
UNION ALL
SELECT * FROM part_99 ... ORDER BY ...
...

Does anyone have any other idea? The problem seems to be fundamentally caused by the
fact that btree_gist requires different expressions for searching and for ordering.
It is possibly worth posting this to pgsql-hackers to find out whether there are
fundamental reasons why GiST does not support ORDER BY _even_ for btree_gist (where
the above trick could be implemented internally).

Thanks for help!

--
Michal
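For reference, the "universal" index described above would be defined roughly like this
(table and column names are invented; the real schema obviously differs):

CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- One multicolumn GiST index: btree_gist supplies GiST operator classes for the
-- scalar columns, and pg_trgm's gist_trgm_ops covers the text-similarity column.
CREATE INDEX operations_search_gist ON operations
  USING gist (customer_account_id, operation_date, operation_id,
              counterparty_name gist_trgm_ops);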
Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
Hello PostgreSQL Community,

I am writing to propose an alternative compilation of PostgreSQL that allows for a
256-byte identifier length limit, alongside the existing 64-byte version.

*Problem:*
The current limit of 63 bytes can be quite restrictive, especially for databases that
use multi-byte character sets like UTF-8. In such cases, a Chinese character takes up
3 bytes, limiting the name to just 21 characters.

*Use-Case:*
In specific use-cases, table names like
"能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and
"气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed the current limit.

*Proposed Solution:*
I propose that we offer an alternative compilation of PostgreSQL that increases the
NAMEDATALEN constant to allow for 256-byte identifiers. This would be particularly
useful for databases that make extensive use of multi-byte character sets like UTF-8.

*Dual Compilation:*
To maintain backward compatibility and to offer more flexibility, I suggest compiling
this 256-byte version alongside the existing 64-byte version. Users could then choose
which version best suits their needs.

*Backward Compatibility:*
I understand that this change could affect backward compatibility. Offering this as an
alternative compilation could be a way to mitigate those concerns. I am open to
suggestions on how this could be implemented to minimize disruptions.

I look forward to your feedback and hope we can make PostgreSQL even more adaptable to
various needs.

Best regards,
Chuxiong
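For context, this is roughly what the current limit looks like in practice (the exact
NOTICE wording may vary between versions):

-- Each Chinese character is 3 bytes in UTF-8, so 63 bytes fit only 21 of them:
SELECT octet_length('能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表');
-- returns well over 63

-- Identifiers longer than NAMEDATALEN - 1 = 63 bytes are truncated, with a NOTICE:
CREATE TABLE "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" (id int);
-- NOTICE:  identifier "..." will be truncated to "..."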
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
On Tue, 2023-10-10 at 11:49 +0800, David HJ wrote:
> I am writing to propose an alternative compilation of PostgreSQL that allows
> for a 256-byte identifier length limit, alongside the existing 64-byte
> version.
>
> Problem:
> The current limit of 63 bytes can be quite restrictive, especially for databases
> that use multi-byte character sets like UTF-8. In such cases, a Chinese character
> takes up 3 bytes, limiting the name to just 21 characters.
>
> Use-Case:
> In specific use-cases, table names like
> "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and
> "气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed
> the current limit.

I have little understanding for people who name their tables
energy_consumption_statistics_oil_and_natural_gas_usage_in_major_countries_around_the_world_annual_carbon_emissions_and_renewable_energy_ratio_table.
The table could just as well be named "能源统计排放比". The exact description could go
in the table comment.

Apart from that, it is a good idea to use table names that are standard SQL identifiers,
so that you don't have to double-quote them all the time.

Anyway, you are not the first person to hit the limit, so there is clearly a pain that
some people feel.

> Proposed Solution:
> I propose that we offer an alternative compilation of PostgreSQL that increases
> the NAMEDATALEN constant to allow for 256-byte identifiers. This would be
> particularly useful for databases that make extensive use of multi-byte character
> sets like UTF-8.

Wouldn't it be a good solution to promote the #define to a configure option, like
"./configure --identifier-length-limit=256"?

Yours,
Laurenz Albe
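For illustration, the "short name plus table comment" suggestion above would look like
this (the column is just a placeholder):

CREATE TABLE "能源统计排放比" (id integer);
COMMENT ON TABLE "能源统计排放比"
  IS '能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表';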
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
On Tue, Oct 10, 2023 at 08:22:55AM +0200, Laurenz Albe wrote:
> On Tue, 2023-10-10 at 11:49 +0800, David HJ wrote:
> > I am writing to propose an alternative compilation of PostgreSQL that allows
> > for a 256-byte identifier length limit, alongside the existing 64-byte
> > version.
> >
> > Problem:
> > The current limit of 63 bytes can be quite restrictive, especially for databases
> > that use multi-byte character sets like UTF-8. In such cases, a Chinese character
> > takes up 3 bytes, limiting the name to just 21 characters.
>
> Anyway, you are not the first person to hit the limit, so there is clearly a
> pain that some people feel.
>
> > Proposed Solution:
> > I propose that we offer an alternative compilation of PostgreSQL that increases
> > the NAMEDATALEN constant to allow for 256-byte identifiers. This would be
> > particularly useful for databases that make extensive use of multi-byte character
> > sets like UTF-8.
>
> Wouldn't it be a good solution to promote the #define to a configure option,
> like "./configure --identifier-length-limit=256"?

Note that there was a recent thread [1] discussing the possibility of having some kind
of compilation matrix to generate multiple sets of binaries with various compile-time
values, so I guess this could fit well with that approach.

[1] https://www.postgresql.org/message-id/20230630211153.kbysulcjedxa5...@awork3.anarazel.de
and following messages
Re: Proposal to Compile a 256-Byte Identifier Length Version Alongside the Current 64-Byte Version
> Hello PostgreSQL Community,
>
> I am writing to propose an alternative compilation of PostgreSQL that
> allows for a 256-byte identifier length limit, alongside the existing
> 64-byte version.
>
> *Problem:*
> The current limit of 63 bytes can be quite restrictive, especially for
> databases that use multi-byte character sets like UTF-8. In such cases, a
> Chinese character takes up 3 bytes, limiting the name to just 21 characters.
>
> *Use-Case:*
> In specific use-cases, table names like
> "能源消耗统计_全球主要国家石油与天然气使用量_年度碳排放与可再生能源比例表" and
> "气候变化研究_全球主要地区极端天气事件频率_年度灾害损失与应对措施表" can far exceed the current limit.
>
> *Proposed Solution:*
> I propose that we offer an alternative compilation of PostgreSQL that
> increases the NAMEDATALEN constant to allow for 256-byte identifiers. This
> would be particularly useful for databases that make extensive use of
> multi-byte character sets like UTF-8.

Another solution would be letting NAMEDATALEN mean the number of *characters* rather
than the number of bytes. That way you could use up to 64 UTF-8 characters. In my
understanding, MySQL already works this way. I know this requires non-trivial code
modifications to PostgreSQL, but it would be better than making binaries with random
NAMEDATALEN values.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
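For illustration, the character/byte distinction in a UTF-8 database:

-- length() counts characters, octet_length() counts bytes:
SELECT length('能源消耗统计')       AS characters,  -- 6
       octet_length('能源消耗统计') AS bytes;       -- 18 with UTF-8 server encoding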