Re: [GENERAL] Optimizing queries that use multiple tables and many order by columns
On 2010-08-25, Joshua Berry wrote:

> --Here's what explain analyze says for the query
> explain analyze
> declare "SQL_CUR0453D910" cursor with hold for
> select Anl.Priority, Anl.Lab, Anl.Job, JOB.DateIn,
>        JOB.CompanyCode, Anl.SampleName
> from analysis anl join job on anl.job = job.job
> order by job.companycode, anl.job, anl.lab;
>
> Sort (cost=38047.92..38495.65 rows=179095 width=32)
>      (actual time=1890.796..2271.248 rows=178979 loops=1)
>   Sort Key: job.companycode, anl.job, anl.lab
>   Sort Method: external merge  Disk: 8416kB
>   -> Hash Join (cost=451.20..18134.05 rows=179095 width=32)
>                (actual time=8.239..260.848 rows=178979 loops=1)
>        Hash Cond: (anl.job = job.job)
>        -> Seq Scan on analysis anl (cost=0.00..14100.95 rows=179095 width=23)
>                                    (actual time=0.026..91.602 rows=178979 loops=1)
>        -> Hash (cost=287.20..287.20 rows=13120 width=17)
>                (actual time=8.197..8.197 rows=13120 loops=1)
>             -> Seq Scan on job (cost=0.00..287.20 rows=13120 width=17)
>                                (actual time=0.007..4.166 rows=13120 loops=1)
> Total runtime: 2286.224 ms
>
>> Maybe, the planner decides for a Sort Join, if there are sorted indexes
>> for anl.job and job.job. But the speed-up may vary depending on the data.
>
> It seems to be reading the entire dataset, then sorting, right? There's
> not much more that could be done to improve such queries, aside from
> increasing memory and IO bandwidth.

It has to, because it has to start with the smallest row in the result
set, which may be the last one read, if it cannot read the tuples in
sorted order.

> But now that I've said that, there's the following query that deals
> with exactly the same set of data, but the ordering involves only one
> of the two joined tables.
>
> explain analyze
> declare "SQL_CUR0453D910" cursor with hold for
> select Anl.Priority, Anl.Lab, Anl.Job, JOB.DateIn,
>        JOB.CompanyCode, Anl.SampleName
> from analysis anl join job on anl.job = job.job
> order by job.companycode --, anl.job, anl.lab;
>   --Only order by indexed columns from job.
>
> Nested Loop (cost=0.00..65305.66 rows=179095 width=32)
>             (actual time=0.084..288.976 rows=178979 loops=1)
>   -> Index Scan using job_companycode on job
>      (cost=0.00..972.67 rows=13120 width=17)
>      (actual time=0.045..7.328 rows=13120 loops=1)
>   -> Index Scan using analysis_job_lab on analysis anl
>      (cost=0.00..4.63 rows=22 width=23)
>      (actual time=0.006..0.015 rows=14 loops=13120)
>        Index Cond: (anl.job = job.job)
> Total runtime: 303.230 ms
>
> If I order by columns from the other table, analysis only, I get the
> following query and results:
>
> explain analyze
> declare "SQL_CUR0453D910" cursor with hold for
> select Anl.Priority, Anl.Lab, Anl.Job, JOB.DateIn,
>        JOB.CompanyCode, Anl.SampleName
> from analysis anl join job on anl.job = job.job
> order by --job.companycode,
>   anl.job, anl.lab; --Only order by indexed columns from analysis.
>
> Merge Join (cost=0.56..44872.45 rows=179095 width=32)
>            (actual time=0.078..368.620 rows=178979 loops=1)
>   Merge Cond: (anl.job = job.job)
>   -> Index Scan using analysis_job_lab on analysis anl
>      (cost=0.00..35245.47 rows=179095 width=23)
>      (actual time=0.035..128.460 rows=178979 loops=1)
>   -> Index Scan using job_job_pk on job
>      (cost=0.00..508.53 rows=13120 width=17)
>      (actual time=0.039..53.733 rows=179005 loops=1)
> Total runtime: 388.884 ms
>
> Notice that in these cases the query completes in <400 ms, while the
> query that involves ordering on columns from both of the joined tables
> completes in >2300 ms.
Because these queries don't need to sort the result; they can read it in
order. What I don't really get is that you compare queries with different
sort orders, especially with a different number of sort keys. Of course,
that has a big influence on the time needed for sorting. While your first
query, which sorts on three keys, really does sort the result, the latter
two don't need to, because they can read the tuples in the correct order
from the indexes. If I read the first plan correctly, that sort costs you
about 2 sec in query execution time, because the Hash Join is done after
260 ms.

Do you really have the requirement to sort anything? Or let me ask it the
other way round: assuming you have too much data to sort it on the
application side, which user can read all this from one single table in
the user interface?

> In the application here, these queries are used by a client application
> to fill a window's listbox that can be scrolled up or down. If the user
> changes direction of the scroll, it initiates a new cursor and query to
> fetch a page of results. If the scrolling motion is in the same
> direction, it simply continues to fetch more results from the cursor.
> But each time the direction of movement changes, there can be a
> significant lag.

Then, obviously, you shouldn't create a new cursor. You can create
backwards scrollable cursors. See the SCROLL option of the DECLARE
statement.
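[To make the suggestion concrete, a minimal sketch of a backwards
scrollable cursor against the tables from this thread; the cursor name
and page size are arbitrary:]

BEGIN;
DECLARE pagecur SCROLL CURSOR FOR
    SELECT anl.priority, anl.lab, anl.job, job.datein, job.companycode
    FROM analysis anl JOIN job ON anl.job = job.job
    ORDER BY job.companycode, anl.job, anl.lab;
FETCH FORWARD 50 FROM pagecur;   -- first page
FETCH BACKWARD 50 FROM pagecur;  -- direction change: same cursor, no new query
CLOSE pagecur;
COMMIT;

[Separately, the "Sort Method: external merge  Disk: 8416kB" line in the
first plan shows the three-key sort spilled to disk; raising work_mem for
the session, e.g. SET work_mem = '16MB';, might let it sort in memory,
though that is untested here.]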
[GENERAL] Using FULLTEXT search with different weights for various fields
I'm trying to make some improvements to my search results by taking
advantage of Postgres' setweight function, but am having a lot of
problems getting a query to run. Here's the query that I run now (I've
removed some parts that just make it more complicated than you need to be
bothered with):

SELECT R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating
FROM Recipes R
WHERE (to_tsvector('english', title || ' ' || coalesce(description, '')
       || ' ' || coalesce(steps, '')) @@ plainto_tsquery(:search))
ORDER BY R.Rating DESC LIMIT 100;

:search will be something the user types in, such as "cookies eggs".
This runs well, since I have an index on that vector expression.
However, I'd like to do the following:

title should have a weight of A.
description should have a weight of B.
steps should have a weight of C.

I've tried a few things based on the documentation at
http://www.postgresql.org/docs/8.4/static/textsearch-controls.html, but
am not having a lot of luck. Here's what I've come up with so far:

SELECT R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
    setweight(to_tsvector(title), 'A') ||
    setweight(to_tsvector(coalesce(description, '')), 'B') ||
    setweight(to_tsvector(coalesce(steps, '')), 'C') as vector,
    ts_rank_cd(vector, query) as rank
FROM Recipes R,
    plainto_tsquery('cookies eggs') query
WHERE vector @@ query
ORDER BY rank desc LIMIT 100;

This doesn't work due to the error: column 'vector' does not exist, which
I guess is a valid point the way the query is parsed. The examples
basically assume you have a column called textsearch which is a tsvector,
and you use a trigger or something to update it. I'm trying to avoid
modifying my table schema for now. Is there a way to express this query
without pre-computing the tsvector on the table? Also, is having a
tsvector in the table basically the standard approach and something I
should just get used to doing? Maybe I can use a view that computes the
tsvector and index that? Sorry, this is probably a totally brain dead
fulltext question, but I'm new to this whole concept. I make pretty web
pages, and am not as smart as the people on this list.

Mike
Re: [GENERAL] IBATIS support for postgres cursors
Hi Guy,

Thanks for your reply. We were able to work it out with type OTHER. Many
thanks for the support.

Regards,
Atul Goel

-----Original Message-----
From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Guy Rouillier
Sent: 25 August 2010 08:03
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] IBATIS support for postgres cursors

On 8/23/2010 12:11 PM, atul.g...@globaldatapoint.com wrote:
> Hi,
>
> Can cursors be returned to IBATIS? If so, how? This is what I have
> written:
>
> javaType="java.sql.ResultSet" mode="OUT" resultMap="allUsersResult"/>
>
> javaType="java.lang.String" mode="OUT"/>
>
> This does not work. Please help.

There is a dedicated mailing list for MyBatis (renamed from iBATIS) over
at http://www.mybatis.org. This question would probably be better
addressed there. I use both PG and MyBatis, so I saw your question.

REF is not an acceptable MyBatis jdbcType. I'm trying with type OTHER as
documented here:
http://jdbc.postgresql.org/documentation/83/callproc.html#callproc-resultset-setof

But I'm running into an issue which I'll pursue on the MyBatis mailing
list. Note that with the current refcursor implementation, there is no
advantage over returning SETOF from your function. Both materialize the
entire result set before returning to the caller. So, if you can't get
refcursor to work, I'd suggest switching to SETOF. I'll get back to you
when I find out why OTHER is not working. Or else you can join the
MyBatis mailing list.

--
Guy Rouillier
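[For reference, the SETOF alternative described above might look roughly
like this; the function and table names are invented for illustration:]

-- Returns rows directly instead of a refcursor; with the current
-- refcursor implementation both approaches materialize the full
-- result set anyway.
CREATE OR REPLACE FUNCTION all_users()
RETURNS SETOF users AS $$
    SELECT * FROM users;
$$ LANGUAGE sql STABLE;

SELECT * FROM all_users();

[A plain SELECT over the function result should map onto an ordinary
MyBatis result map, with no OUT parameters or jdbcType juggling.]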
Re: [GENERAL] Using FULLTEXT search with different weights for various fields
The simple answer is this:

SELECT
    R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
    setweight(to_tsvector(title), 'A') ||
    setweight(to_tsvector(coalesce(description, '')), 'B') ||
    setweight(to_tsvector(coalesce(steps, '')), 'C') as vector,
    ts_rank_cd(
        setweight(to_tsvector(title), 'A') ||
        setweight(to_tsvector(coalesce(description, '')), 'B') ||
        setweight(to_tsvector(coalesce(steps, '')), 'C'),
        query
    ) as rank
FROM Recipes R,
    plainto_tsquery('cookies eggs') query
WHERE
    setweight(to_tsvector(title), 'A') ||
    setweight(to_tsvector(coalesce(description, '')), 'B') ||
    setweight(to_tsvector(coalesce(steps, '')), 'C') @@ query
ORDER BY rank desc LIMIT 100;

In the end I declared a function. Then you get:

CREATE INDEX ... ON Recipes (f(title, description, steps));

SELECT
    R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
    f(title, description, steps) as vector,
    ts_rank_cd(f(title, description, steps), query) as rank
FROM Recipes R,
    plainto_tsquery('cookies eggs') query
WHERE f(title, description, steps) @@ query
ORDER BY rank desc LIMIT 100;

On Thu, Aug 26, 2010 at 10:44 AM, Mike Christensen wrote:
> I'm trying to make some improvements to my search results by taking
> advantage of Postgres' setweight function, but am having a lot of
> problems getting a query to run.
> [...]
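[The function declaration is elided above; a plausible definition is
sketched below. It is hypothetical -- the key requirement is that the
function be IMMUTABLE so the expression index is allowed, which is why
the two-argument to_tsvector form with an explicit configuration is used
here. The index name is invented:]

CREATE OR REPLACE FUNCTION f(title text, description text, steps text)
RETURNS tsvector AS $$
    SELECT setweight(to_tsvector('english', $1), 'A') ||
           setweight(to_tsvector('english', coalesce($2, '')), 'B') ||
           setweight(to_tsvector('english', coalesce($3, '')), 'C');
$$ LANGUAGE sql IMMUTABLE;

-- the expression index the planner can then match:
CREATE INDEX recipes_fts_idx
    ON Recipes USING gin (f(title, description, steps));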
Re: [GENERAL] Using FULLTEXT search with different weights for various fields
Thanks, the function idea is a good one. It makes the query look a lot
better. My next question is: are there any obvious advantages to making
the tsvector part of the table and using a trigger to update it (or
refreshing the vectors every night or something)?

Thanks!
Mike

On Thu, Aug 26, 2010 at 3:16 AM, Arjen Nienhuis wrote:
> The simple answer is this:
> [...]
> In the end I declared a function.
> [...]
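[The trigger-maintained column referred to above looks roughly like the
sketch below; names are invented, and a custom trigger function is used
because the built-in tsvector_update_trigger doesn't assign per-column
weights:]

ALTER TABLE Recipes ADD COLUMN textsearch tsvector;

CREATE OR REPLACE FUNCTION recipes_tsv_update() RETURNS trigger AS $$
BEGIN
    NEW.textsearch :=
        setweight(to_tsvector('english', NEW.title), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.description, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.steps, '')), 'C');
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER recipes_tsv_trg
    BEFORE INSERT OR UPDATE ON Recipes
    FOR EACH ROW EXECUTE PROCEDURE recipes_tsv_update();

-- index the stored column instead of the expression
CREATE INDEX recipes_textsearch_idx ON Recipes USING gin (textsearch);

[The trade-off: to_tsvector runs once per write instead of being
recomputed whenever the index isn't used, at the cost of extra storage
and a trigger.]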
Re: [GENERAL] Feature proposal
On Wed, Aug 25, 2010 at 08:47:10PM +0200, Wojciech Strzaaaka wrote:
> The data set is 9mln rows - about 250 columns

250 columns sounds very strange to me as well! I start getting worried
when I hit a tenth of that.

> CPU utilization - 1,2% (half of the one core)
> iostat shows writes ~6MB/s, 20% busy
> when I run 2 loads in parallel the CPU is split to 2*0,6%, hdd write
> ~7MB (almost the same)

If you've got indexes set up on the table then I'd expect this sort of
behavior; you could try dropping them before the COPY and then
recreating them afterward.

It would be great if PG could do these sorts of bulk index updates
automatically! Maybe run the first few tens/hundreds of changes in the
main index, then start logging the rows that will need indexing and bulk
process and merge them at the end. Concurrent access seems a bit more
complicated, but shouldn't be too bad.

The case of a UNIQUE index seems to require a change in behavior. For
example, the following are executed concurrently:

  Client A: COPY foo (id) FROM stdin;
  Client B: INSERT INTO foo (id) VALUES (1);

with A starting before and finishing after B, and A sending a row with
id=1. At the moment the behavior would be for A's data to be indexed
immediately, and hence B's conflicting change would fail. If PG did bulk
index merging at the end, this would change to B's succeeding and A's
failing when the index was brought up to date. These semantics are still
compatible with SQL, just different from before, so some code may be
(incorrectly) relying on this.

I've read discussions from:

  http://archives.postgresql.org/pgsql-hackers/2008-02/msg00811.php

and

  http://archives.postgresql.org/pgsql-general/2008-01/msg01048.php

but not found much recent. It seems to hold together better than the
first suggestion. The second post notes that you may be better off
working in work_mem batches to help prevent spilling to disk. Sounds
reasonable, and if it's OK to assume the new rows will be physically
close to each other then they can be recorded as ranges/run-length
encoded to reduce the chance of spilling to disk for even very large
inserts. As per the second post, I'm struggling with BEFORE INSERT
triggers as well; their semantics seem to preclude most optimizations.

> what's also interesting - table is empty when I start (by truncate)
> but while the COPY is working, I see it grows (by \d+ or
> pg_total_relation_size) about 1MB per second
> what I'd expect it should grow at checkpoints only, not all the
> time - am I wrong?

AFAIU, it'll constantly grow.

--
Sam  http://samason.me.uk/
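[A sketch of the drop-and-recreate approach suggested above; the index,
table, and file names are placeholders:]

BEGIN;
DROP INDEX foo_a_idx;                 -- drop the secondary indexes first
COPY foo FROM '/path/to/data' CSV;    -- bulk load with no index maintenance
CREATE INDEX foo_a_idx ON foo (a);    -- one bulk build at the end
COMMIT;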
Re: [GENERAL] How to convert a binary filed to an integer field?
On Thu, Aug 26, 2010 at 12:28 AM, wrote:
> You are right. I typed the message in a hurry before I left home this
> morning. It is the boolean type. Thanks for your suggestion. The NULL
> value may not work for jdbc. On the application level, a fixed set of
> constants is used to represent the three statuses, which are converted
> into an integer.

TBH, you are probably best off taking those constants and writing them
directly into the database (unless they happen to be dictionary pages),
and applying a check constraint or using referential integrity to match
against the known list.

merlin
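[Sketches of the two options just mentioned, assuming the three statuses
end up as the integers 0, 1, and 2; the labels are invented:]

-- a CHECK constraint against the known list
ALTER TABLE product
    ADD CONSTRAINT product_status_chk CHECK (status IN (0, 1, 2));

-- or referential integrity against a small lookup table
CREATE TABLE product_status (status int PRIMARY KEY, label text NOT NULL);
INSERT INTO product_status VALUES
    (0, 'inactive'), (1, 'active'), (2, 'pending');
ALTER TABLE product
    ADD CONSTRAINT product_status_fk
    FOREIGN KEY (status) REFERENCES product_status (status);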
Re: [GENERAL] Feature proposal
On Wed, Aug 25, 2010 at 8:48 PM, Craig Ringer wrote:
> synchronous_commit also has effects on data safety. It permits the loss
> of transactions committed within the commit delay interval if the
> server crashes. If you turn it off, you need to decide how much recent
> work you can afford to lose if the database crashes. Not sure if it can
> be applied with a reload or whether it requires a full server restart.

I routinely set synchronous_commit = off on a per-connection or
per-transaction basis. The beauty of it is that it still honors
transaction boundaries. That is, if there is a server crash, the
transaction will be either there or not as a whole; it will not be
partially applied. This works great for bulk imports and changes to the
DB for me, since I can always just re-run my programs on such a failure
and everything will pick up where it left off. It takes some planning
but is worth it.

> So: if you don't know exactly what you're doing, leave fsync alone.

I agree -- leave fsync alone. You get the benefit of synchronous_commit
without the corruption risk.

The other advice on boosting checkpoint segments and timeout is spot on.
Make them pretty big and it will make your import go way faster. If you
have a spare disk onto which to move the checkpoint segments so that you
eliminate the seek time on them, move them to get even more speed. After
your import, you can make the number of segments smaller again if that
suits your workload.
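[What the per-transaction form looks like in practice:]

BEGIN;
SET LOCAL synchronous_commit TO off;  -- scoped to this transaction only
-- ... bulk inserts/updates ...
COMMIT;  -- on a crash this commit may be lost, but never half-applied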
Re: [GENERAL] How to convert a binary filed to an integer field?
On Wed, Aug 25, 2010 at 9:29 PM, wrote:
> I have an application in the product. Now, one status field needs to
> have three statuses instead of two. How to make such a change in
> PostgreSQL?

ALTER TABLE product ALTER status TYPE int USING status::integer;
[GENERAL] Is TRUNCATE a DML statement?
Hi,

Can we say that TRUNCATE belongs/falls under DML statements?

TRUNCATE: http://www.postgresql.org/docs/8.2/interactive/sql-truncate.html

I also understand that "There is no TRUNCATE command in the SQL standard."

Regards,
Gnanam
Re: [GENERAL] Is TRUNCATE a DML statement?
Hello

2010/8/26 Gnanakumar :
> Can we say that TRUNCATE belongs/falls under DML statements?
>
> I also understand that "There is no TRUNCATE command in the SQL
> standard."

This is old information - TRUNCATE is now part of the SQL standard (ANSI
SQL:2008, feature F200).

Regards
Pavel Stehule
Re: [GENERAL] Is TRUNCATE a DML statement?
>> I also understand that "There is no TRUNCATE command in the SQL
>> standard."
> this is old information - now TRUNCATE is part of SQL standard ANSI
> SQL 2008 - F200

Thanks for the update.
Re: [GENERAL] Optimizing queries that use multiple tables and many order by columns
On Thu, Aug 26, 2010 at 2:51 AM, Wappler, Robert wrote:
> Do you really have the requirement to sort anything? Or let me ask it
> the other way round: assuming you have too much data to sort it on the
> application side, which user can read all this from one single table
> in the user interface?

The tool that I'm using to pull this information together is really easy
to use and maintain when you use its database drivers to generate the
queries. The extra sort here is so that I could order the dataset by
company, then by job number, then by the specific lab number, where jobs
are assigned to a single company, and labs are assigned to a given job.
The idea is for the application to be a substitute for bringing the
dataset into a spreadsheet and perusing it there.

I could just sort by company XOR both job and lab, but in the case of
sorting by company, all of the company's job numbers would not
necessarily be in order, and likewise the labs within the jobs would
also not be. This could be smoothed over by cutting down the dataset to
a subset based on a few criteria, which is the next approach to take.

>> In the application here, these queries are used by a client
>> application to fill a window's listbox that can be scrolled up or
>> down. If the user changes direction of the scroll, it initiates a new
>> cursor and query to fetch a page of results. If the scrolling motion
>> is in the same direction, it simply continues to fetch more results
>> from the cursor. But each time the direction of movement changes,
>> there can be a significant lag.
>
> Then, obviously, you shouldn't create a new cursor. You can create
> backwards scrollable cursors. See the SCROLL option of the DECLARE
> statement.

These queries are generated by the database driver and are not easily
tweakable. Generally they use a subset of whatever is available via the
ODBC interface. So, although not optimal, it's not something that I can
improve in the short term.

>> Any suggestions would be helpful! I'll assume for now that the
>> indexes and queries can't be improved, but rather that I should tweak
>> more of the postmaster settings. Please correct me if you know better
>> and have time to reply.
>
> These options heavily depend on the environment and the data set; I
> always see them as some last resort, because they might slow down
> other queries if tweaked too much towards a specific thing. I have not
> yet played around with this a lot. The things simply work fast enough
> here. Others can give you better hints on this.

Thanks for your tips and insight. I'll aim to make this portion of the
system "good enough" and look to refactor later when needed.

>> P.S. Is it possible to have indexes that involve several columns from
>> different but related tables? If so, where can I learn about them?
>
> Nope. An index is tied to one table only. But another option is to
> precalculate the join. Depending on your needs (especially
> INSERT/UPDATE performance), you could use triggers and/or a regular
> batch job which writes the joined results to another table. There you
> can index these columns accordingly. In general, this is ugly and
> leads to redundancy, but can give a big performance boost and is
> sometimes the only option.

That's an option. I do use triggers now to log user changes to the
tables, so this wouldn't be too hard to do, but perhaps a bit hard to
maintain down the road. It's great to have a backup plan in case I end
up with a backlog of support requests about the UI being too laggy.

Kind Regards,
-Joshua

> --
> Robert...
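[For the record, a sketch of the precalculated-join idea from the
exchange above. The table, index, and trigger names are invented; only
the INSERT path is shown, and UPDATE/DELETE on either base table would
need the same treatment:]

CREATE TABLE analysis_job_flat AS
    SELECT anl.priority, anl.lab, anl.job, job.datein, job.companycode,
           anl.samplename
    FROM analysis anl JOIN job ON anl.job = job.job;

CREATE INDEX analysis_job_flat_sort_idx
    ON analysis_job_flat (companycode, job, lab);

-- keep it current on new analysis rows
CREATE OR REPLACE FUNCTION analysis_flat_ins() RETURNS trigger AS $$
BEGIN
    INSERT INTO analysis_job_flat
        SELECT NEW.priority, NEW.lab, NEW.job, j.datein, j.companycode,
               NEW.samplename
        FROM job j WHERE j.job = NEW.job;
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER analysis_flat_ins_trg AFTER INSERT ON analysis
    FOR EACH ROW EXECUTE PROCEDURE analysis_flat_ins();

[With the sort columns in a single table, the ORDER BY from the first
message can be satisfied by one index scan instead of a full sort.]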
[GENERAL] Weird behavior with custom operators
Greetings,

I am using postgresql 8.4 (debian backport). In order to optimize some of
my code, I decided to go with a custom data type, to which I associated
operators and an operator class for indexation. Here is the code I use:

-- 8<---
CREATE TYPE tagvalue AS (storedvalue text);

CREATE OR REPLACE FUNCTION num_lt(tagvalue, double precision) RETURNS BOOLEAN AS $$
DECLARE r FLOAT; retval BOOLEAN;
BEGIN
  r := CAST(($1).storedvalue AS double precision);
  SELECT r < $2 INTO retval;
  RETURN retval;
EXCEPTION WHEN OTHERS THEN RETURN FALSE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION num_gt(tagvalue, double precision) RETURNS BOOLEAN AS $$
DECLARE r FLOAT; retval BOOLEAN;
BEGIN
  r := CAST(($1).storedvalue AS double precision);
  SELECT r > $2 INTO retval;
  RETURN retval;
EXCEPTION WHEN OTHERS THEN RETURN FALSE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION num_lte(tagvalue, double precision) RETURNS BOOLEAN AS $$
DECLARE r FLOAT; retval BOOLEAN;
BEGIN
  r := CAST(($1).storedvalue AS double precision);
  SELECT r <= $2 INTO retval;
  RETURN retval;
EXCEPTION WHEN OTHERS THEN RETURN FALSE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION num_gte(tagvalue, double precision) RETURNS BOOLEAN AS $$
DECLARE r FLOAT; retval BOOLEAN;
BEGIN
  r := CAST(($1).storedvalue AS double precision);
  SELECT r >= $2 INTO retval;
  RETURN retval;
EXCEPTION WHEN OTHERS THEN RETURN FALSE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION num_eq(tagvalue, double precision) RETURNS BOOLEAN AS $$
DECLARE r FLOAT; retval BOOLEAN;
BEGIN
  r := CAST(($1).storedvalue AS double precision);
  SELECT r = $2 INTO retval;
  RETURN retval;
EXCEPTION WHEN OTHERS THEN RETURN FALSE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION num_neq(tagvalue, double precision) RETURNS BOOLEAN AS $$
DECLARE r FLOAT; retval BOOLEAN;
BEGIN
  r := CAST(($1).storedvalue AS double precision);
  SELECT r != $2 INTO retval;
  RETURN retval;
EXCEPTION WHEN OTHERS THEN RETURN FALSE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

CREATE OR REPLACE FUNCTION txt_lt(tagvalue, text) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue < $2; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION txt_gt(tagvalue, text) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue > $2; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION txt_lte(tagvalue, text) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue <= $2; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION txt_gte(tagvalue, text) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue >= $2; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION txt_eq(tagvalue, text) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue = $2; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION txt_neq(tagvalue, text) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue != $2; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION tv_lt(tagvalue, tagvalue) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue < ($2).storedvalue; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION tv_gt(tagvalue, tagvalue) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue > ($2).storedvalue; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION tv_lte(tagvalue, tagvalue) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue <= ($2).storedvalue; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION tv_gte(tagvalue, tagvalue) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue >= ($2).storedvalue; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION tv_eq(tagvalue, tagvalue) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue = ($2).storedvalue; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION tv_neq(tagvalue, tagvalue) RETURNS BOOLEAN AS
$$ SELECT ($1).storedvalue != ($2).storedvalue; $$ LANGUAGE 'sql' IMMUTABLE;

CREATE OPERATOR > (
  LEFTARG = tagvalue, RIGHTARG = double precision, PROCEDURE = num_gt,
  COMMUTATOR = <, NEGATOR = <=,
  RESTRICT = scalargtsel, JOIN = scalargtjoinsel
);

CREATE OPERATOR < (
  LEFTARG = tagvalue, RIGHTARG = double precision, PROCEDURE = num_lt,
  COMMUTATOR = >, NEGATOR = >=,
  RESTRICT = scalarltsel, JOIN = scalarltjoinsel
);

CREATE OPERATOR >= (
  LEFTARG = tagvalue, RIGHTARG = double precision, PROCEDURE = num_gte,
  COMMUTATOR = <=, NEGATOR = <,
  RESTRICT = scalargtsel, JOIN = scalargtjoinsel
);

CREATE OPERATOR <= (
  LEFTARG = tagvalue, RIGHTARG = double precision, PROCEDURE = num_lte,
  COMMUTATOR = >=, NEGATOR = >,
  RESTRICT = scalarltsel, JOIN = scalarltjoinsel
);

CREATE OPERATOR = (
  LEFTARG = tagvalue, RIGHTARG = double precision, PROCEDURE = num_eq,
  COMMUTATOR = =, NEGATOR = !=,
  RESTRICT = eqsel, JOIN = eqjoinsel,
  HASHES, MERGES
);

CREATE OPERATOR != (
  LEFTARG = tagvalue, RIGHTARG = double precision, PROCEDURE = num_neq,
  COMMUTATOR = !=, NEGATOR = =,
  RESTRICT = neqsel, JOIN = neqjoinsel
);
Re: [GENERAL] Weird behavior with custom operators
Matthieu HUIN writes:
> xxx=> SELECT value FROM tags WHERE value > 3 LIMIT 1;
> ERROR:  unsupported type: 17886

I think you're probably hitting this:

    /*
     * Can't get here unless someone tries to use scalarltsel/scalargtsel on
     * an operator with one numeric and one non-numeric operand.
     */
    elog(ERROR, "unsupported type: %u", typid);

While you could possibly make it work by writing wrappers around those
selectivity functions instead of using them directly, I'm kind of
wondering what is the point of this datatype anyway? Seems like
declaring it as a domain over text might be easier.

    regards, tom lane
Re: [GENERAL] IBATIS support for postgres cursors
On 8/26/2010 5:27 AM, atul.g...@globaldatapoint.com wrote:
> Hi Guy,
>
> Thanks for your reply. We were able to work it out with type OTHER.

How did you get OTHER to work? Did you define your own TypeHandler?

--
Guy Rouillier
Re: [GENERAL] unexpected message type 0x58 during COPY from stdin
On Wed, Aug 25, 2010 at 12:20 PM, bricklen wrote:
> On Wed, Aug 25, 2010 at 12:01 PM, Tom Lane wrote:
>> bricklen writes:
>>> I'm getting a strange error during reload of one of our databases.
>>
>> This appears to indicate that the server's first idea that there was
>> trouble came when the client side disconnected partway through a COPY:
>>
>>> 2010-08-25 04:16:46 PDT [2401]: [1-1] (user=postgres) ERROR: unexpected
>>> message type 0x58 during COPY from stdin
>>
>> (0x58 = 'X' = Terminate message, implying client called PQfinish)
>>
>> So it would be useful to look at what pg_restore thinks happened.
>> I rather suspect your cron setup sends pg_restore's output to
>> /dev/null ... or at least you failed to show it.
>>
>> regards, tom lane
>
> cron:
>
> MAILTO=""
>
> 30 2 * * * /var/lib/pgsql/sync-db.sh 2>&1
>
> I'll modify that to append to a log file and see what else turns up.

FWIW, the output to a logfile from the cron was minimal and didn't show
anything new. I'm going to try it again tomorrow starting a couple hours
later, to rule out some odd 4am-ish problem that affects the cron jobs.
[GENERAL] looping on NEW and OLD in a trigger
Hi,

I'm very new to writing postgres procedures, and I'm trying to loop over
the fields in the NEW and OLD variables available in an after trigger,
and I can't quite get the syntax correct. Could someone point me at an
example?

Thanks,
Mike

--
Michael P. Soulier , 613-592-2122 x2522
"Any intelligent fool can make things bigger and more complex... It takes
a touch of genius - and a lot of courage to move in the opposite
direction." --Albert Einstein
Re: [GENERAL] looping on NEW and OLD in a trigger
"Michael P. Soulier" writes:
> I'm very new to writing postgres procedures, and I'm trying to loop
> over the fields in the NEW and OLD variables available in an after
> trigger, and I can't quite get the syntax correct.

If you're trying to do this in plpgsql, the answer is you can't.
plpgsql doesn't support dynamic field references, which is what you'd
need for what (I think) you're trying to do.

You can do it in pltcl or plperl, and probably also plpython though
I don't know enough python to be sure.

    regards, tom lane
Re: [GENERAL] looping on NEW and OLD in a trigger
On 26/08/10 Tom Lane did say:
> If you're trying to do this in plpgsql, the answer is you can't.
> plpgsql doesn't support dynamic field references, which is what you'd
> need for what (I think) you're trying to do.
>
> You can do it in pltcl or plperl, and probably also plpython though
> I don't know enough python to be sure.

Ok, I'll try plpython then.

Thanks,
Mike
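[A minimal sketch of a PL/Python trigger that walks the NEW/OLD fields.
It assumes the plpythonu language is installed; the table and function
names are invented:]

CREATE OR REPLACE FUNCTION log_changed_fields() RETURNS trigger AS $$
# TD["new"] and TD["old"] are dictionaries keyed by column name
changed = []
if TD["event"] == "UPDATE":
    for col in TD["new"].keys():
        if TD["new"][col] != TD["old"][col]:
            changed.append(col)
plpy.notice("changed fields: %s" % ", ".join(changed))
return None  # return value is ignored for AFTER triggers
$$ LANGUAGE plpythonu;

CREATE TRIGGER log_changes AFTER UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE log_changed_fields();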
Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?
> That index doesn't match the query, so it can't be used. Try
>
> select item_pk from node
>   where tag='primaryIssuer.entityType'
>   and to_tsvector('english', val) @@ plainto_tsquery('Limited Partnership');

Tom and Oleg: thank you for clarifying this. I see where I made the
mistake in applying the example from the documentation.

> Note that seeing that the scan on the other index is pretty cheap, it's
> not obvious that indexing the @@ clause is better anyway.

So is it the case that, as in the documented example, I need to add a
column of type tsvector to the table for the index to be most effective?
Re: [GENERAL] Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index?
Denis Papathanasiou writes:
> So is it the case that, as in the documented example, I need to add a
> column of type tsvector to the table for the index to be most
> effective?

You can do it either way, depending on what you want the queries to look
like. One advantage of materializing the tsvector entries in the table
is that the to_tsvector() work doesn't have to be done over again if the
planner decides not to use that index for some reason (for instance, it
thinks the other index is more effective for the particular query).

    regards, tom lane
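[For the query upthread, the expression-index variant would be roughly
as below; the index name is invented, and the query must repeat the
identical to_tsvector('english', val) expression for the planner to
match it:]

CREATE INDEX node_val_fts_idx
    ON node USING gin (to_tsvector('english', val));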
Re: [GENERAL] Weird behavior with custom operators
Hello Tom,

Thanks for the answer. I think this is exactly what's happening here.
What I cannot understand, though, is why this specific error isn't
thrown when using the other operators I defined that have the same kinds
of operands.

About your last question: I created this datatype so that I could store
values as text and compare them to floats easily if they can be
converted into a numeric type (since numeric order differs from
alphanumeric order). I used to call CASTs in my queries when needed, but
I realized that doing so made my various existing indexes useless. And
since I'd rather not create a specific table for numeric values, this is
what I came up with. I have very little experience with postgresql or
databases in general, so any advice on that will be gladly taken.

You are suggesting using a domain: would I be able to extend operators
that way?

On Thursday, 26 August 2010 at 11:05 -0400, Tom Lane wrote:
> Matthieu HUIN writes:
>> xxx=> SELECT value FROM tags WHERE value > 3 LIMIT 1;
>> ERROR:  unsupported type: 17886
>
> I think you're probably hitting this:
> [...]
> While you could possibly make it work by writing wrappers around those
> selectivity functions instead of using them directly, I'm kind of
> wondering what is the point of this datatype anyway? Seems like
> declaring it as a domain over text might be easier.
Re: [GENERAL] How to convert a binary filed to an integer field?
On Thu, Aug 26, 2010 at 1:47 PM, wrote:
> I know what I need to do in the application for the change. My problem
> is that I don't know a good process to make such a change in
> PostgreSQL. Since the application is in production already, do I need
> to create another field or convert the data from the boolean field
> into the new field?

Recent versions of postgres allow you to do transformations w/ ALTER
TABLE:

alter table foo alter bool_col type text
    using case when bool_col then 'foo' else 'bar' end;

The expression can hook to a function if you want to do something really
complex (yes, this is awesome :-).

merlin
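[Applied to the boolean-to-three-state question that started this
thread, that might look like the following; the 0/1 mapping is an
assumption:]

ALTER TABLE product ALTER status TYPE int
    USING CASE WHEN status THEN 1 ELSE 0 END;
-- leaving e.g. 2 free for the new third status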
Re: [GENERAL] How to convert a binary filed to an integer field?
That shall work. I assume 0 = false, 1 = true. Thank you, everyone, for
your suggestions.

----- Original Message -----
From: "Arjen Nienhuis"
To: wei...@lycos.com
Cc: pgsql-general@postgresql.org
Sent: Thursday, August 26, 2010 4:56:09 AM GMT -08:00 US/Canada Pacific
Subject: Re: [GENERAL] How to convert a binary filed to an integer field?

On Wed, Aug 25, 2010 at 9:29 PM, wei...@lycos.com wrote:
> I have an application in the product. Now, one status field needs to
> have three statuses instead of two. How to make such a change in
> PostgreSQL?

ALTER TABLE product ALTER status TYPE int USING status::integer;
Re: [GENERAL] looping on NEW and OLD in a trigger
On Thu, Aug 26, 2010 at 12:59 PM, Michael P. Soulier wrote:
> On 26/08/10 Tom Lane did say:
>> If you're trying to do this in plpgsql, the answer is you can't.
>> plpgsql doesn't support dynamic field references, which is what you'd
>> need for what (I think) you're trying to do.
>>
>> You can do it in pltcl or plperl, and probably also plpython though
>> I don't know enough python to be sure.
>
> Ok, I'll try plpython then.

For a more in-depth treatment of the topic, see:

http://www.mail-archive.com/pgsql-hack...@postgresql.org/msg139182.html

(the entire thread is worth reading IMO if you want to understand
dynamic record inspection really well, both in pl/pgsql and other PLs).

merlin
Re: [GENERAL] How to convert a binary filed to an integer field?
On Wed, Aug 25, 2010 at 1:29 PM, wrote:
> I have an application in the product. Now, one status field needs to
> have three statuses instead of two. How to make such a change in
> PostgreSQL?

You could also use bitstrings.

--
To understand recursion, one must first understand recursion.
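[A hypothetical bitstring version of the conversion, for comparison; the
B'00'/B'01' mapping is invented:]

ALTER TABLE product ALTER status TYPE bit(2)
    USING CASE WHEN status THEN B'01' ELSE B'00' END;
-- B'10' is then available as the third status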
[GENERAL] WAL Archive Log
Is there a way to get postgres to write a line to the log file when it
creates a WAL file? We wrote a script that tries to grab the times
between WAL file creation and ingestion, without stopping to make
absolutely sure that postgres actually logs WAL file creation, so we're
kinda stuck staring at our screens and blinking. Seriously. It took me a
good five minutes to muster the brain power to write this email.

-Sam
Re: [GENERAL] WAL Archive Log
Excerpts from Sam Nelson's message of Thu Aug 26 19:24:00 -0400 2010:
> Is there a way to get postgres to write a line to the log file when it
> creates a WAL file? We wrote a script that tries to grab the times
> between WAL file creation and ingestion, without stopping to make
> absolutely sure that postgres actually logs WAL file creation, so
> we're kinda stuck staring at our screens and blinking.

Not currently. WAL files are created well in advance of them being used,
so perhaps there is no point, depending on what you want to do with the
information. File recycling (which is the usual mechanism for them being
created) is logged, sort of: it happens during checkpoint.

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
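[The checkpoint logging mentioned above is controlled by a single
setting; a sketch, assuming a stock postgresql.conf:]

# postgresql.conf -- takes effect on reload, no restart needed
log_checkpoints = on
# each checkpoint then logs a summary line reporting, among other
# things, how many WAL segment files were added, removed, and recycled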
Re: [GENERAL] Feature proposal
Heyho!

On Wednesday 25 August 2010 09.15:33 wstrzalka wrote:
> I'm currently playing with very large data import using COPY from
> file.
>
> As this can be extremely long operation (hours in my case) the nice
> feature would be some option to show operation progress - how many
> rows were already imported.

Recently, I've found /proc/<pid>/fdinfo/<fd> extremely helpful (on
Linux; I don't know if other OSs export this information). It tells you
the position of the file pointer of file descriptor <fd> in process
<pid> (I guess for a COPY import this would be the postgresql backend
handling your import session).

Unlike other options, you can also use this if you only notice that the
process is long-running after you've already started it. Of course, it
probably will not work if the file is mmapped or otherwise not read in a
sequential fashion.

cheers -- vb

--
All Hail Discordia!
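[To find the <pid> to inspect, the backend handling the COPY can be
identified from a second session; this sketch uses the pg_stat_activity
column names of the 8.x era this thread dates from:]

-- run from another session; shows backend PIDs and their queries
SELECT procpid, current_query
FROM pg_stat_activity
WHERE current_query LIKE 'COPY%';
-- then, on the server: list /proc/<procpid>/fd to find the data file's
-- descriptor, and watch its offset in /proc/<procpid>/fdinfo/<fd>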
[GENERAL] Is your data okay?
I found this tool pretty helpful for validating my architectural
decisions:

http://www.howfuckedismydatabase.com/