[GENERAL] Searching for Duplicates and Hosed the System
I'm new to PostgreSQL and I ran into a problem I don't want to repeat. I have a database with a little more than 18 million records that takes up about 3GB. I needed to check for duplicate records, so I tried a command like this:

    SELECT count(*) AS count, fld1, fld2, fld3, fld4
    FROM MyTable
    GROUP BY fld1, fld2, fld3, fld4
    ORDER BY 1 DESC;

I knew this would take some time, but what I didn't expect was that about an hour into the select, my mouse and keyboard locked up and I also couldn't log in from another computer via SSH. This is a Linux machine running Fedora Core 6, and PostgreSQL is 8.1.4. There's about 50GB free on the disk, too. I finally had to shut the power off and reboot to regain control of my computer (that wasn't a good idea either, but eventually I got everything working again). Is this normal behavior by PG with large databases? Did I misconfigure something? Does anyone know what might be wrong?

- Bill Thoen
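[For later readers: if the goal is just to list the duplicate combinations rather than rank every group, a HAVING clause keeps the result set down to the offending rows. A minimal sketch, reusing the generic fld1..fld4 column names from the query above:]

    -- return only the combinations that occur more than once
    SELECT fld1, fld2, fld3, fld4, count(*) AS dups
    FROM MyTable
    GROUP BY fld1, fld2, fld3, fld4
    HAVING count(*) > 1;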
Re: [GENERAL] Searching for Duplicates and Hosed the System
Tom, here's the "explain" result. Does this help explain what went wrong? (And yes, I think there will be a *lot* of groups.)

    explain select count(*) as count, fips_st_cd, fips_cnty_cd, farm_nbr,
        tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
    from compliance_2006
    group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr,
        crop_cd, crop_status_cd, practice_cd, seq_nbr
    order by 1 desc;

                                   QUERY PLAN
    Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
      Sort Key: count(*)
      ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
            ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
                  Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
                  ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
    (6 rows)

On Sun, Aug 19, 2007 at 01:19:51PM -0400, Tom Lane wrote:
> Bill Thoen <[EMAIL PROTECTED]> writes:
> > I knew this would take some time, but what I didn't expect was that about
> > an hour into the select, my mouse and keyboard locked up and I also
> > couldn't log in from another computer via SSH. This is a Linux machine
> > running Fedora Core 6, and PostgreSQL is 8.1.4. There's about 50GB free
> > on the disk, too.
>
> > I finally had to shut the power off and reboot to regain control of my
> > computer (that wasn't a good idea either, but eventually I got
> > everything working again).
>
> I've seen Fedora go nuts like that when it ran out of memory. Once it
> starts to swap heavily, performance goes into the tank; and once the
> kernel realizes it's in memory trouble, it starts to kill processes
> more or less at random. That might explain why ssh stopped working.
>
> One thing to do to make it more robust is to disable memory overcommit.
> I suspect also that configuring it with lots of swap space is
> counterproductive, because that just encourages the kernel to allow lots
> of swapping. I haven't actually experimented with that part though.
>
> As for why PG ran the system out of memory, I suspect that the planner
> drastically underestimated the number of groups to be created by your
> GROUP BY, and thought it could get away with a hash aggregation. We
> don't currently have any provision for spilling hash aggregation to
> disk, so if there's a very large number of groups the table just gets
> very big :-(. The planner is not supposed to choose hash agg if the
> estimated table size exceeds work_mem ... but if it had out-of-date
> statistics to work with it might have gotten the wrong answer. Have
> you ANALYZEd this table recently? What does EXPLAIN show as the
> estimated number of result rows?
>
> regards, tom lane
Re: [GENERAL] Searching for Duplicates and Hosed the System
Something is really screwy here. I tried what looked like a simpler task than the one I started this message thread with. The only unusual issue here is that the table compliance_2006 is rather big (over 18 million records). The table ers_regions is pretty small (about 3,100 records) and all the WHERE fields are indexed (except e.region). Here's the not-too-complicated SQL:

    SELECT a.*
    FROM compliance_2006 a, ers_regions e
    WHERE a.fips_st_cd=e.fips_st
      AND a.fips_cnty_cd=e.fips_cou
      AND e.region=1;

I ran VACUUM ANALYZE just before I launched this, and there were no other postgres jobs running. I'm the only user as well. I also ran EXPLAIN prior to the run and got this:

    Nested Loop  (cost=11.71..28800.34 rows=7219 width=584)
      ->  Seq Scan on ers_regions e  (cost=0.00..71.90 rows=16 width=28)
            Filter: (region = 1)
      ->  Bitmap Heap Scan on compliance_2006 a  (cost=11.71..1788.76 rows=451 width=584)
            Recheck Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
            ->  Bitmap Index Scan on key_tract  (cost=0.00..11.71 rows=451 width=0)
                  Index Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
    (7 rows)

So I launched it to see what would happen. This resulted in a very busy disk drive for about an hour, and one by one things started dying on my FC6 Linux machine. First the mouse disappeared, then the cursor in the terminal window where I was running psql vanished, the clock stopped, and the keyboard stopped responding. Meanwhile, the disk drive thrashed on. Finally the screen saver kicked in, and shortly thereafter I logged back in only to see the word "Killed" sitting there on the last line, and all my machine's services were running again. Just no data output. I looked in the log and saw this:

    LOG: transaction ID wrap limit is 1073746500, limited by database "postgres"
    LOG: transaction ID wrap limit is 1073746500, limited by database "postgres"
    LOG: could not send data to client: Broken pipe

I guess this is what killed it, but does it make sense that such a simple request wrapped around the transaction limit? Is the bad behavior of my machine while running this query consistent with an error like this, or is the machine broken and that's what caused PostgreSQL to crater? What should I set the wrap limit to? What suggestions would you make for tracking down the exact problem and fixing it? Any help would be appreciated.

- Bill Thoen

Tom Lane wrote:
> Bill Thoen <[EMAIL PROTECTED]> writes:
> > Tom, here's the "explain" result. Does this help explain what went
> > wrong? (And yes, I think there will be a *lot* of groups.)
> > [EXPLAIN output quoted above]
>
> Hmm ... no, actually, that shows the planner doing the right thing for
> lotsa groups: picking GroupAggregate instead of HashAggregate.
> The estimated number of groups is 1804832, which might or might not have
> much to do with reality, but in any case seems enough to keep it away
> from HashAggregate. Do you have autovacuum running, or a scheduled
> cronjob that runs ANALYZE or VACUUM ANALYZE? The only theory I can think
> of at this point is that your database statistics are more correct now
> than they were when you had the problem. If you try the query again,
> does it behave more sanely?
>
> regards, tom lane
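[Tom's suggestion boils down to refreshing the statistics and re-checking the row estimates before re-running the query. A minimal check along those lines, using the table name from the thread:]

    VACUUM ANALYZE compliance_2006;   -- refresh the planner's statistics
    EXPLAIN
    SELECT count(*), fips_st_cd, fips_cnty_cd
    FROM compliance_2006
    GROUP BY fips_st_cd, fips_cnty_cd;
    -- if the row estimates and the GroupAggregate/HashAggregate choice
    -- now look sane, try the full query again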
Re: [GENERAL] do you have an easy example of postgis and mapserver?
On Tue, Aug 21, 2007 at 12:44:49PM +0200, Ottavio Campana wrote:
> I'm sorry this mail is not very on topic, but I hope you can help me.

Just so you know, perhaps a better list to contact with this is the MapServer mailing list at http://lists.umn.edu/archives/mapserver-users.html, or maybe the PostGIS list at http://www.postgis.org/mailman/listinfo/postgis-users.

> I'm trying to learn how postgis and mapserver work together, but I
> cannot understand nearly anything. I mean, I read the documentation of
> postgis and I think I understood it, but I cannot do anything useful
> with it.

I think before you try to do something customized with Python you should get familiar with the MapServer MAP file, which contains all the instructions to map features from various sources like shapefiles, MapInfo TAB files, PostGIS data sources, and so on. An example for mapping a layer from a PostGIS source in a MAP file looks like this:

    LAYER
      NAME states
      TYPE POLYGON
      STATUS ON
      CONNECTION "user=gisuser password=*** dbname=us_data host=localhost port=5432"
      CONNECTIONTYPE POSTGIS
      DATA "the_geom from states"
      CLASSITEM 'name'
      PROJECTION
        "+proj=latlong +datum=WGS84"
      END
      CLASS
        NAME "US States"
        OUTLINECOLOR 0 0 0
        COLOR 255 255 196
      END
    END

The key bits here related to PostGIS are the CONNECTION, where you specify all the info needed to get access to your PostgreSQL database; the CONNECTIONTYPE, which you set to POSTGIS; and the DATA directive, in which you supply a string that contains the SQL expression that pulls out the geometry features you want to map.

If you do this with Python, then you would first need to create an instance of a mapObj, and then create a layerObj and add it to the map object, setting the properties for the connection. However, my knowledge of this approach is still very sketchy. Documentation is spread around, but the two sites you should peruse are the MapServer docs at http://mapserver.gis.umn.edu/docs and the PostGIS docs at http://postgis.refractions.net/docs/.

- Bill Thoen
[GENERAL] PG Seg Faults Performing a Query
How would you suggest I try to track down this problem? I run the following query:

    SELECT a.* FROM compliance_2006 a, ers_regions b
    WHERE a.fips_st_cd=b.fips_st
      AND a.fips_cnty_cd=b.fips_cou
      AND b.region=1
      AND a.fips_st_cd='17'
      AND a.fips_cnty_cd='003';

and it works. But when I try this:

    SELECT a.* FROM compliance_2006 a, ers_regions b
    WHERE a.fips_st_cd=b.fips_st
      AND a.fips_cnty_cd=b.fips_cou
      AND b.region=1
      AND a.fips_st_cd='17';

psql dies with the message: Segmentation Fault. Any suggestions?
Re: [GENERAL] PG Seg Faults Performing a Query
On Tue, Aug 21, 2007 at 04:38:42PM -0500, Scott Marlowe wrote:
> On 8/21/07, Bill Thoen <[EMAIL PROTECTED]> wrote:
> > How would you suggest I try to track down this problem?
> > [queries quoted above]
>
> So the client psql is what's dying, right? In that case you likely
> are getting too big a result set for psql to handle at once. Try
> declaring a cursor to hold your query and fetching 100 or 1000 or so
> rows at a time.
>
> Just guessing. What's the exact text of the error message?

The exact message was: Segmentation Fault. But the table compliance_2006 is very big (18 million plus records), so I'll try that cursor idea. But even so, an error like that makes me think that something's broken.
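[For reference, Scott's cursor suggestion would look something like this in psql; the FETCH size of 1000 is an arbitrary choice, and the query is the one from the thread:]

    BEGIN;
    DECLARE big_cur CURSOR FOR
        SELECT a.* FROM compliance_2006 a, ers_regions b
        WHERE a.fips_st_cd=b.fips_st
          AND a.fips_cnty_cd=b.fips_cou
          AND b.region=1
          AND a.fips_st_cd='17';
    FETCH 1000 FROM big_cur;   -- repeat until no more rows come back
    CLOSE big_cur;
    COMMIT;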
Re: [GENERAL] PG Seg Faults Performing a Query
On Wed, Aug 22, 2007 at 09:46:21AM +1200, Andrej Ricnik-Bay wrote:
> On 8/22/07, Bill Thoen <[EMAIL PROTECTED]> wrote:
> > How would you suggest I try to track down this problem?
> > Any suggestions?
> postgres version?
> Operating system?
> Anything in the log(s)?

PostgreSQL version is 8.1.5, running on Linux (Fedora Core 6). The last few lines in the server log are:

    LOG: unexpected EOF on client connection
    LOG: transaction ID wrap limit is 1073746500, limited by database "postgres"
    LOG: transaction ID wrap limit is 1073746500, limited by database "postgres"

(I ran VACUUM FULL after it crashed to make sure there was no loose disk space floating around, so that last line was probably from that.) I assume that bit about the "transaction wrap limit" is informational and not related to this problem.

My PostgreSQL works great for small SQL queries, even from my large table (18 million records). But when I ask it to retrieve anything that takes more than 10 minutes to assemble, it crashes with this "Segmentation Fault" error. I get so little feedback, and I'm still unfamiliar enough with PostgreSQL that I don't even know where to begin. This version of PostgreSQL was compiled from source with support for various other packages needed for GIS support, but the tables I'm trying to extract data from contain no GIS information, so I believe this operation is plain PostgreSQL. Any help you can offer as to how I can track down what's wrong would be greatly appreciated. If I can't get this to work and can only use small tables in PG, then its usefulness to me will be pretty limited.

- Bill Thoen
Re: [GENERAL] PG Seg Faults Performing a Query
As you requested, here's what bt in gdb reports:

    (gdb) bt
    #0 0x003054264571 in fputc () from /lib64/libc.so.6
    #1 0x0040dbd2 in print_aligned_text ()
    #2 0x0040f10b in printTable ()
    #3 0x0041020b in printQuery ()
    #4 0x00407906 in SendQuery ()
    #5 0x00409153 in MainLoop ()
    #6 0x0040b16e in main ()

Please tell me what it means if you can, and whether I can fix this problem. Thanks,

- Bill Thoen

Alvaro Herrera wrote:
> Martijn van Oosterhout escribió:
> > That said, it would be nice if it returned an error instead of crashing.
>
> In my opinion it isn't just a matter of "would be nice". It is a
> possible bug that should be investigated. A look at a stack trace from
> the crashing process would be the first place to start. In order to do
> that, please set "ulimit -c unlimited" and rerun the query under psql.
> That should produce a core file. Then run "gdb psql core" and inside
> gdb, execute "bt". Please send that output our way.
Re: [GENERAL] PG Seg Faults Performing a Query
I'm a bit out of my depth with using these debugging tools and interpreting their results, but I think the problem is due to the output being just too big for interactive display. Using the same query with tighter limits in the WHERE clause works perfectly, and when I changed the SQL script to write its output into a table, it worked with the same query using even looser limits in the WHERE clause. So sending output to a table instead of to the monitor when a query produces a large amount of output is reliable, faster, and doesn't tie up the machine.

I tried using strace, but it produced so much telemetry (and unfortunately I couldn't understand it anyway) that I don't think it would do me any good. I don't want to bug the PostgreSQL list with a problem that's probably not a PostgreSQL one, but if someone here would be willing to help me track down this apparent popen or fork problem, I'd appreciate it. However, I managed to get the results I needed, so we could also call this "fixed via workaround."

Thanks for the help, Tom and others!

- Bill Thoen

Tom Lane wrote:
> Bill Thoen <[EMAIL PROTECTED]> writes:
> > (gdb) bt
> > #0 0x003054264571 in fputc () from /lib64/libc.so.6
> > #1 0x0040dbc2 in print_aligned_text (title=0x0, headers=0x5665d0,
> >     cells=0x2f8fc010, footers=0x557c90, opt_align=0x557ef0 ...,
> >     opt_tuples_only=0 '\0', opt_numeric_locale=0 '\0', opt_border=1,
> >     encoding=8, fout=0x0) at print.c:448
> > #2 0x0040f0eb in printTable (title=0x0, headers=0x5665d0,
> >     cells=0x2f8fc010, footers=0x557c90, align=0x557ef0 ...,
> >     opt=0x7fff3e3be8c0, fout=0x3054442760, flog=0x0) at print.c:1551
>
> OK, so the problem is that print_aligned_text is being passed fout =
> NULL. Since that wasn't what was passed to printTable, the conclusion
> must be that PageOutput() was called and returned NULL --- that is, that
> its popen() call failed. Obviously we should put in some sort of check
> for that. I can see three reasonable responses: either make psql abort
> entirely (akin to its out-of-memory behavior), or have it fall back to
> not using the pager, either silently or after printing an error message.
> Any thoughts which way to jump?
>
> Meanwhile, the question Bill needs to look into is why popen() is
> failing for him. I'm guessing it's a fork() failure at bottom, but why
> so consistent? strace'ing the psql run might provide some more info.
>
> regards, tom lane
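[Since the crash turned out to be in psql's pager output path, the two workarounds described above can be made concrete. Both use standard commands; the table and query names are the thread's own:]

    -- 1. Materialize the big result instead of paging it to the screen:
    CREATE TABLE region1_compliance AS
    SELECT a.* FROM compliance_2006 a, ers_regions e
    WHERE a.fips_st_cd=e.fips_st
      AND a.fips_cnty_cd=e.fips_cou
      AND e.region=1;

    -- 2. Or, in psql, bypass the pager (and its popen() call) entirely:
    -- \pset pager off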
[GENERAL] Why does a normally fast query run so slow when the table is in a partition?
...ce_disc, vfm.clu.ogc_fid, vfm.clu.statecd, vfm.clu.countycd, vfm.clu.tractnbr, vfm.clu.farmnbr, vfm.clu.clunbr, vfm.clu.acres
    ->  Append  (cost=0.00..1098364.31 rows=2 width=52) (actual time=38367.325..38367.339 rows=1 loops=1)
          ->  Seq Scan on clu  (cost=0.00..1098356.00 rows=1 width=57) (actual time=38367.222..38367.222 rows=0 loops=1)
                Output: vfm.clu.source_disc, vfm.clu.ogc_fid, vfm.clu.statecd, vfm.clu.countycd, vfm.clu.tractnbr, vfm.clu.farmnbr, vfm.clu.clunbr, vfm.clu.acres
                Filter: ((source_disc = 'co'::bpchar) AND (ogc_fid = 116337))
          ->  Index Scan using clu_co_pkey on clu_co clu  (cost=0.00..8.31 rows=1 width=48) (actual time=0.090..0.096 rows=1 loops=1)
                Output: vfm.clu.source_disc, vfm.clu.ogc_fid, vfm.clu.statecd, vfm.clu.countycd, vfm.clu.tractnbr, vfm.clu.farmnbr, vfm.clu.clunbr, vfm.clu.acres
                Index Cond: ((source_disc = 'co'::bpchar) AND (ogc_fid = 116337))
    Total runtime: 38367.450 ms
    (10 rows)

TIA,
- Bill

--
*Bill Thoen*
GISnet - www.gisnet.com
1401 Walnut St., Suite C, Boulder, CO 80302
303-786-9961 tel, 303-443-4856 fax
bth...@gisnet.com
Re: [GENERAL] Why does a normally fast query run so slow when the table is in a partition?
Got it solved! The problem was one of two things, or maybe both. I had somehow gotten over 15 million records into the master table, and even though I "deleted" them and ran VACUUM ANALYZE over the table, they were still taking up space in it. Perhaps even just opening a table with that much garbage in it is what was taking 30-40 seconds. So I made a copy of the structure, blew away the original table (taking the bad records out with it), then renamed the copy and used that as the master table.

Also, I found that some of the partition tables had a third index besides the primary key index. This was an earlier unique index that I was using before I learned how to add a primary key to an existing table. So I cleaned up all the partition tables, making sure that they all had exactly the same indexes and constraints. Then I relinked everything and tried it with several queries. Wow! Over 20 million records (so far), and now I can retrieve any one of them in less than a second! It seems that partitioning is even better than I expected.

Thanks for the help. Your comment, Steve, about the planner being finicky was the clue. I guess it's intolerant of inconsistency. Probably a good thing for a database.

On 2/2/2011 3:10 PM, Steve Crawford wrote:
> On 02/02/2011 01:35 PM, Bill Thoen wrote:
> > Steve Crawford wrote:
> > > On 02/02/2011 12:17 PM, Bill Thoen wrote:
> > > > I've got a large (and growing) database set up as a partitioned
> > > > database
> > > What is the setting of constraint_exclusion?
> > > http://www.postgresql.org/docs/9.0/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION
> > > Cheers, Steve
> > It's set to 'partition'.
>
> That sounds good. Out of curiosity, what happens if you use an explicit
> cast?: ...where 'co'::char(2)... I've seen lots of cases where the
> planner doesn't use indexes when the data type differs, sometimes even
> subtly. Might be the same for constraint exclusion.
> Cheers, Steve

--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961
[GENERAL] Directing Partitioned Table Searches
In a partitioned table, is it possible to specify the partition for a query to search using a variable instead of a constant?

EXAMPLE: Join another table to the partitioned one.

Table: clu (partitioned by state)
    ogc_fid  bigint
    cluid    char(16)
    state    bpchar(2)
    constraint: state='mi' (or 'co', 'ks', etc. for each partition)

Table: farms
    link     bigint
    zone     bpchar(2)
    farmid   char(7)
    ...

This selection will result in all partitions being searched:

    SELECT cluid, farmid FROM clu JOIN farms ON ogc_fid=link;

This selection will result in only the 'mi' (Michigan) partition being searched:

    SELECT cluid, farmid FROM clu JOIN farms ON ogc_fid=link WHERE state='mi';

And this selection will result in ALL partitions being searched. But why?

    SELECT cluid, farmid FROM clu JOIN farms ON ogc_fid=link WHERE state=zone;

I'd like to be able to run some queries without the overhead of searching partitions unnecessarily. Can it be done?

Regards,
--
*Bill Thoen*
GISnet - www.gisnet.com
1401 Walnut St., Suite C, Boulder, CO 80302
303-786-9961 tel, 303-443-4856 fax
bth...@gisnet.com
Re: [GENERAL] Directing Partitioned Table Searches
Vick Khera wrote:
> Your best bet is to know which partition you need and write your query
> that way dynamically, rather than trying to use a generic query and have
> the DB do the constraint exclusion. In your above case, if you know that
> 'zone' will limit you to just the MI table, then specify the MI table
> instead of the base clu table.

Thanks. That sounds simple enough. Since I want to automate this, I guess the next step is to learn how to create and execute a "dynamic" query. I think I know how to do that.

--
*Bill Thoen*
GISnet - www.gisnet.com
1401 Walnut St., Suite C, Boulder, CO 80302
303-786-9961 tel, 303-443-4856 fax
bth...@gisnet.com
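[For what it's worth, the dynamic version Vick describes might look like this in a PL/pgSQL function (8.4+ for RETURN QUERY EXECUTE ... USING). This is only a sketch using the thread's table names, and it assumes the partitions are named clu_mi, clu_co, and so on; quote_ident() guards the assembled table name:]

    CREATE OR REPLACE FUNCTION clu_for_zone(p_zone text)
    RETURNS SETOF clu AS $$
    BEGIN
        -- build the partition name (e.g. clu_mi) and query it directly,
        -- so the planner never has to consider the other partitions
        RETURN QUERY EXECUTE
            'SELECT c.* FROM ' || quote_ident('clu_' || p_zone) || ' c '
            || 'JOIN farms f ON c.ogc_fid = f.link '
            || 'WHERE f.zone = $1'
        USING p_zone;
    END;
    $$ LANGUAGE plpgsql;

[Calling SELECT * FROM clu_for_zone('mi'); would then touch only the Michigan partition.]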
Re: [GENERAL] plsql question
On 2/10/2011 6:40 PM, Geoffrey Myers wrote:
> I am trying to write a plsql routine that will delete a range of
> characters based on their octal or hexadecimal values. Something like
> the 'tr' shell command will do:
>
>     cat file | tr -d ['\177'-'\377']
>
> Can't seem to figure this one out. Pointers would be appreciated.

Would the string functions to_hex(), translate(), replace() and friends work for you? (See http://www.postgresql.org/docs/9.0/static/functions-string.html)
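[A regexp_replace() sketch may also be worth trying: PostgreSQL's regular expressions accept \x hex escapes, so a character range like tr's can be written directly. This assumes a single-byte database encoding such as LATIN1, so the range means the same thing as the byte range tr sees; the column and table names are placeholders:]

    -- delete every character in the range 0x7F..0xFF, analogous to
    --   tr -d ['\177'-'\377']
    SELECT regexp_replace(some_column, '[\x7f-\xff]', '', 'g')
    FROM some_table;
    -- with standard_conforming_strings = off, write E'[\\x7f-\\xff]'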
Re: [GENERAL] Logic AND between some strings
Just to make sure: you're asking for the logical AND, not the bitwise AND? In other words, you're not talking about getting into bit shifting with << and >> and masking with &?

For the logical AND, you need to use expressions that evaluate to TRUE or FALSE, and follow the rules in this truth table:

    Expr 1   Expr 2   AND returns
    true     true     true
    true     false    false
    false    true     false
    false    false    false

For example, 01100010 == 1100 AND 01100010 > 1100 would evaluate to false AND true (depending on how you interpret these bitstrings), and so the whole expression evaluates to false.

If you're dealing with bitstrings (text made up of only zeros and ones), then I'd implement this by just using text comparisons, since bitstrings are just text and have the same comparison behavior as the unsigned integers they could represent.

On 3/7/2011 7:25 AM, yagru_alvarez wrote:
> I want to make a Logic AND between some strings of 0s and 1s. Here you
> have an example:
>
> 1-> 01100010
> 2-> 1100
>
> I want to make a LOGIC AND between 01100010 and 1100. I'm working with
> C++, and I need some code to get an idea of how I can perform that.
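[If the bitwise AND is actually what's wanted, PostgreSQL's bit-string type can do it natively; the & operator requires both operands to be the same length, so the shorter value from the example has to be zero-padded first. A sketch using the two values from the question:]

    -- bitwise AND of the example values; B'...' is a bit-string literal
    SELECT B'01100010' & B'00001100' AS anded;   -- yields 00000000

    -- padding the 4-bit value out to 8 bits on the fly:
    SELECT B'01100010' & lpad('1100', 8, '0')::bit(8) AS anded;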
Re: [GENERAL] PG and dynamic statements in stored procedures/triggers?
On 3/7/2011 7:55 AM, Adrian Klaver wrote:
> On Monday, March 07, 2011 6:45:11 am Durumdara wrote:
> > Hi! Thanks! How do I create a "cursor" or "for select" in PGSQL in a
> > dynamic way? For example:
> >
> >     :tbl = GenTempTableName()
> >     insert into :tbl ...
> >     insert into :tbl ...
> >     insert into :tbl ...
> >     for select :part_id from :tbl begin
> >         exec 'select count(*) from subitems where id = ?'
> >             using :part_id into :sumof
> >         update :tbl set sumof = :sumof where part_id = :part_id
> >     end;
> >
> > Can you show me a similar example?
>
> There are examples in the docs at the link provided. Though I would
> suggest reading the pl/pgsql documentation from the beginning to get an
> idea of its structure.

You won't find this easy. I've spent an awful lot of time the last two days trying to figure out how to pass variables between SQL and plpgsql, and the examples don't cover all the things you'd think you should be able to do, because Postgres SQL doesn't have variables. What it does have comes from psql, and psql variables seem to be more like text-replacement placeholders than variables you can evaluate.

For example, I needed a tool that gets an initial record id from the user, looks up that key to find the primary keys of two other tables related to the first key, then looks those tables up and displays the data from each side by side so I can check the differences between the records. (Basically, it's a case of data from two vendors that carry a common key, and I'm just spot checking.) I've been using interactive psql, and I thought an app as simple as this is in concept wouldn't be so hard to do, but it is if you don't know enough of what's in the API. Like, isn't there a function to enumerate a table's attributes? Or how do you capture the results of a select that calls a function in SQL? For example, neither of these will fly:

    \set myResults
    :myResults = SELECT myFunction();    -- this won't fly
    SELECT INTO :myResults myFunction(); -- nor will this

Anyway, I'm beginning to see that I had some misconceptions about what you can do within SQL and what you're better off doing in plpgsql. Or C.

> Read the whole section on variables in the manual.

That's very good advice. In fact, peruse it, because if you read it lightly, you'll have to go over it again and again. But after reading your note, dynamic SQL seems like it might be just what I'm looking for too. I didn't realize it was an option, since it's documented near the end of the manual and there's only so much RTFMing I can do at a sitting, so that's all new territory to me. But if it works like you've sketched out here... well, I'm going to try it and see.
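[To make the psql-vs-PL/pgSQL distinction above concrete: capturing a query result into a variable has to happen server-side, for example in a DO block (9.0+). A minimal sketch; the subitems table name is borrowed from the quoted pseudocode:]

    DO $$
    DECLARE
        n bigint;
    BEGIN
        -- dynamic SQL: the statement is built as a string, and the
        -- result lands in a real PL/pgSQL variable via INTO
        EXECUTE 'SELECT count(*) FROM ' || quote_ident('subitems') INTO n;
        RAISE NOTICE 'subitems has % rows', n;
    END;
    $$;

[In psql, by contrast, :variables are substituted as text before the statement is sent to the server, which is why "SELECT INTO :myResults" can't work.]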
Re: [GENERAL] output screen in psql
On Wed, Mar 09, 2011 at 05:51:51PM -0800, abcdef wrote:
> Hi, I use postgres on Red Hat Linux.
> When I use psql and select the contents of a table,
> I find the format is not tidy.
> I changed align and unalign, and the output still doesn't fulfill my
> requirement. Is there a special setting that can change the line size,
> like Oracle's, so the whole table can be viewed within the psql screen?

I use \x to toggle between block display and row display. And if you log in using ssh on Linux, you can zoom in and out with Ctrl-Shift-+ and Ctrl-Shift-- (hold down the Control and Shift keys and press the plus key to zoom in and increase the font size, or the minus key to decrease it).
[GENERAL] Partitioned Database and Choosing Subtables
I've got a version 8.4.5 partitioned database with records organized by US state, so the partitions are set up by state. When I query this database and include the key field that tells PostgreSQL which partition to look in, everything works as I expect: it searches only the specified partition, and it's fast. But that's only if I use a constant, like this:

    SELECT lions, tigers, bears FROM WildLife WHERE state_pt = 'CO';

What I want to be able to do is put this key value in a table and have PG look in whatever partition the column specifies, like so:

    SELECT lions, tigers, bears, statecode FROM WildLife WHERE state_pt = statecode;

However, when I try anything other than a constant, it searches EVERY partition, sequentially, which is not what I want it to do. So is there any way to specify the partition to search using a variable/column name?

--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961
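[For context, this is the usual inheritance-partitioning setup the question assumes. Constraint exclusion compares each child's CHECK constraint against the query's predicates at plan time, which is why only a constant, known when the plan is made, can prune partitions. A sketch with hypothetical child-table names:]

    CREATE TABLE wildlife_co (
        CHECK (state_pt = 'CO')    -- the planner prunes using this
    ) INHERITS (wildlife);

    CREATE TABLE wildlife_wy (
        CHECK (state_pt = 'WY')
    ) INHERITS (wildlife);

    -- requires: SET constraint_exclusion = partition;  (or 'on')
    -- WHERE state_pt = 'CO'      -> only wildlife_co is scanned
    -- WHERE state_pt = statecode -> the value is per-row and unknown at
    --    plan time, so every child table must be scanned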
Re: [GENERAL] Partitioned Database and Choosing Subtables
On 3/15/2011 12:02 PM, Igor Neyman wrote:
> -----Original Message-----
> From: Bill Thoen [mailto:bth...@gisnet.com]
> Sent: Monday, March 14, 2011 11:31 PM
> To: pgsql-general@postgresql.org
> Subject: Partitioned Database and Choosing Subtables
>
> [original question quoted above]
>
> Try using "dynamic" sql:
>
>     EXECUTE 'SELECT lions, tigers, bears, statecode
>              FROM WildLife WHERE state_pt = ' || statecode INTO ...;
>
> See: http://www.postgresql.org/docs/8.4/interactive/plpgsql-statements.html

Thanks Igor. It was a nice try -- and I thought it would work, but the planner had other plans. Basically, I tried the interactive method using a PREPARE statement with one text parameter followed by an EXECUTE statement. Unfortunately, the query still went rummaging across the entire database sequentially.

To recap (I still hope there's a chance for a workable answer), here's a simple contrived scenario.

Incidents table:
    total     integer   -- total number of human/critter encounters
    statecode char(2)   -- by state
    outcome   char(20)  -- outcome of incident {fatal, serious, minor, close call}

Wildlife table (the partitioned table; uses state_pt to separate the data tables):
    table 1:  state_pt='AK'
    table 2:  state_pt='AL'
    ...
    table 50: state_pt='WY'

    lions    integer   -- total population of lions in the state
    tigers   integer   -- total tigers
    bears    integer   -- total bears
    outcome  char(20)  -- outcome of incident {fatal, serious, minor, close call}
    ... more attributes ...
    state_pt           -- key index that separates the subtables in the partitioned database

Some incidents table data:

    total | statecode | outcome
    ------+-----------+------------
        7 | CA        | fatal
       12 | CA        | close call
        3 | CO        | fatal
       16 | CO        | close call
        4 | WY        | minor

If I do the following, it finds the data and goes directly to the right table:

    SELECT lions, tigers, bears FROM Wildlife WHERE state_pt = 'CO';

Or this:

    \set theState '\'CO\''
    SELECT lions, tigers, bears FROM Wildlife WHERE state_pt = :theState;

It works perfectly. But when I tried this:

    PREPARE foo (text) AS
        SELECT lions, tigers, bears FROM Wildlife WHERE state_pt = $1;
    EXECUTE foo( 'CO' );

it sequentially scanned everything, so I didn't even try the scenario I'm aiming for, which is using the results of a select to supply the key code that tells the planner which partition to search for each record:

    SELECT lions, tigers, bears
    FROM Wildlife a JOIN incidents b ON a.outcome=b.outcome
    WHERE state_pt=statecode;

Apparently, the planner is pretty fussy about how you refer to one of your subtables, but I almost always use queries that involve multiple states, and the knowledge of which state table a particular record needs to access is carried as one of that record's attributes.
I'm surprised that there's so little info on this. It seems like an obvious application of partitioned databases, but maybe I just don't understand how it should be used? It seems that the only variables it likes are host variables, and there may be something down that road. Anyway, the question is still open if anyone has some good ideas.
Re: [GENERAL] Partitioned Database and Choosing Subtables
On 3/16/2011 12:40 AM, Alban Hertroys wrote:
> > > Try using "dynamic" sql:
> > > EXECUTE 'SELECT lions, tigers, bears, statecode FROM WildLife WHERE
> > > state_pt = ' || statecode INTO ...;
> >
> > Thanks Igor. It was a nice try -- and I thought it would work, but the
> > planner had other plans. Basically, I tried the interactive method
> > using a PREPARE statement with one text parameter followed by an
> > EXECUTE statement. Unfortunately, the query still went rummaging
> > across the entire database sequentially.
>
> That's not the same as using dynamic SQL. In fact, that would behave
> just like your previous, problematic, query. Dynamic SQL means the
> planner sees a new query each time, one that needs to be planned from
> scratch.

Thanks for explaining this. I think I can just about see what you mean. I can also see that I'm way out of my depth here. It's just so much fun when you mix deadlines with ignorance.

> If you can't see the forest for the trees, cut the trees and you'll see
> there is no forest.

Good point. I'm feeling a bit chainsaw-ish right now.

--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961
[GENERAL] Importing undelimited files (Flat Files or Fixed-Length records)
I've got to load some large fixed-length ASCII records into PG and I was wondering how this is done. The COPY command looks like it works only with delimited files, and I would hate to have to convert these files to INSERT-type SQL to run them through psql. Is there a way one can specify a table structure with raw field widths and then just pass it a flat file?
[GENERAL] Abnormal termination of PostgreSQL
I recently installed Fedora Core 9 and included PostgreSQL, built from source. I also built PostGIS, GEOS and GDAL to work with it. I had no problems compiling and installing PostgreSQL, but I did have some problems building GEOS and GDAL, although I think I've resolved those (FC9 comes with gcc 4.3.0, which has had some changes that affect building older source). I'm using the latest version of everything; PostgreSQL is 8.3.3.

Anyway, I created a database and "spatialized" it, then created and populated a table of about 1.7 million records that have no spatial component. Then I ran an SQL update command to convert some double precision columns to 1/100 and 1/10 of their original values. During this update, about ten minutes into the operation (about when I expected it would be done), the postgres server shut down suddenly. It reports this message:

    vfm=> \i update.sql
    psql:update.sql:1: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
    psql:update.sql:1: connection to server was lost

Nothing very helpful is reported in the server log except the shutdown message. In /var/log/messages I get:

    Jun 21 10:42:57 sandbox kernel: postmaster[2534]: segfault at b8cfeabf ip 08081357 sp bfb113c0 error 4 in postgres[8048000+335000]

I tried changing the SQL to update only one field (e.g. 'update corn0716 set average_yield = average_yield / 100;') and it also failed in the same manner in about the same time (~10 minutes). Does anyone know what happened and how I can fix it?

- Bill Thoen
Re: [GENERAL] Importing undelimited files (Flat Files or Fixed-Length records)
Patrick TJ McPhee wrote:
> In article <[EMAIL PROTECTED]>, Bill Thoen <[EMAIL PROTECTED]> wrote:
> % I've got to load some large fixed-length ASCII records into PG and I was
> % wondering how this is done. The COPY command looks like it works only
> % with delimited files, and I would hate to have to convert these files to
> % INSERT-type SQL to run them through psql. Is there a way one can
> % specify a table structure with raw field widths and then just pass it a
> % flat file?
>
> pg_loader is supposed to handle this.
> http://pgfoundry.org/projects/pgloader

Thanks, but as it turned out I also had to skip blank lines, so I wrote a short sed script and piped the data through that into COPY. That worked just fine.
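[For anyone searching the archives, the workaround described above can be as simple as a pipeline like the following. The field widths, file, and table names here are hypothetical; the sed script deletes blank lines and inserts a delimiter between two fixed-width fields before handing the stream to COPY:]

    # delete blank lines, then split a 9-char field and a 7-char field
    # with a '|' delimiter (adjust the widths to the actual layout)
    sed -e '/^[[:space:]]*$/d' \
        -e 's/^\(.\{9\}\)\(.\{7\}\)/\1|\2|/' fixed_width.dat \
      | psql -d mydb -c "COPY mytable FROM STDIN WITH DELIMITER '|'"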
[GENERAL] Advice Wanted on Selecting Multi-row Data Requests in 10-Row Blocks
What I want to do is present the results of a query in a web page, but only 10 rows at a time. My PostgreSQL table has millions of records, and if I don't add a LIMIT 10 to the SQL selection, the request can take too long. The worst case scenario is when the user requests all records without adding filtering conditions (e.g. SELECT * FROM MyTable;). That can take 10-15 minutes, which won't work in a web application.

What I'm wondering is how, in PostgreSQL, do you select only the first 10 records from a selection, then the next 10, then the next, and possibly go back to a previous 10? Or do you do the full selection into a temporary table once, adding a row number to the columns, and then perform sub-selects on that temporary table using the row id? Or do you run the query with LIMIT 10 set and then run another copy with no limit into a temporary table while you let the user gaze thoughtfully at the first ten records?

I know how to get records from the database into a web page, and I know how to sense user actions (PageDown, PageUp, etc.), so I'm basically looking for techniques to extract the data quickly. Also, if this isn't the best forum to ask this sort of question, I'd appreciate being pointed to a more appropriate one.

TIA,
- Bill Thoen
Re: [GENERAL] Advice Wanted on Selecting Multi-row Data Requests in 10-Row Blocks
Thanks for the tip on OFFSET. That's just what I needed. It's so easy when you know the command you're looking for, and so hard when you know what you want to do but don't know what the command is called!

Thanks,
- Bill Thoen
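[For the archives, the pattern being referred to looks like this; the page size, page number, and ORDER BY column are illustrative:]

    -- page 3 of the results, 10 rows per page (rows 21-30)
    SELECT *
    FROM MyTable
    ORDER BY id          -- must be deterministic, or pages can overlap
    LIMIT 10 OFFSET 20;

[One caveat: OFFSET still scans and discards the skipped rows, so very deep pages get slower; for big tables, a WHERE id > last_seen_id filter on an indexed column scales better.]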
[GENERAL] Accessing a database form another database
I've got some GIS tables that I want to use across several projects, and currently I've got each project in a separate database. So say I've got two databases, one for the Spacely Sprockets project and one for the Cogswell Cogs contract, but I want to access the WorldMap tables in my GIS database to make maps for both of these projects. How do I reference a table that's in another database? Or should I organize my PostgreSQL data differently?

Thanks,
- Bill Thoen
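[One common answer, offered here as a hedged aside since the thread's replies aren't included: the contrib dblink module runs a query in another database and returns the rows. A sketch, assuming dblink is installed and a worldmap table with these (hypothetical) columns exists in the gis database:]

    -- query a table in the 'gis' database from another database
    SELECT *
    FROM dblink('dbname=gis', 'SELECT id, name FROM worldmap')
        AS t(id integer, name text);

[Since a backend connects to only one database at a time, the other usual approach is to keep the shared GIS layers and each project in separate schemas of a single database, where plain cross-schema queries work.]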
[GENERAL] When to use Vacuum?
I'm putting some large read-only data sets together, and occasionally I change my mind about something, drop a table or two, and replace them with something different. Do I need to use VACUUM when I drop or rearrange tables, or does PG release disk space when you drop tables? And finally, if the tables are strictly read-only, do I need to use VACUUM at all on a regular basis? (Like, do selections take up disk space that needs to be cleaned up regularly?) It's my understanding that VACUUM only clears or marks for re-use deleted records, but the docs say nothing about dropped tables, and I don't want to get into an intolerable situation someday (like a full disk).

TIA,
- Bill Thoen
[GENERAL] How to Declare Functions Containing OUT PArameters?
I'm having some difficulty getting plpgsql to recognize a function with a couple of OUT parameters. I'm either declaring the function incorrectly, making the call to it in the wrong way, or my program is simply possessed by evil spirits. I'm using Postgres 8.1.5.

What appears to be happening is that it's declaring the function as if it returned a record and had only two parameters, but I'm trying to call it with four parameters, with two of them being OUT parameters. So the compiler sees two different versions of the function and refuses to do anything more. The example below shows the problem, but it's just something to exercise the function calls and generate the error. Can anyone spot the screw-up in this little example? (The error message is listed below in the block comment.)

TIA,
- Bill Thoen

    CREATE OR REPLACE FUNCTION fishy( s1 text, s2 text, OUT n integer, OUT f real ) AS $$
    DECLARE
        c integer;
    BEGIN
        c := length( s1 );
        n := length( s1 || s2 );
        f := c::real / n::real;
    END;
    $$ LANGUAGE plpgsql;

    CREATE OR REPLACE FUNCTION main() RETURNS VOID AS $$
    DECLARE
        str1 text;
        str2 text;
        num  integer := 0;
        fnum real := 0.0;
    BEGIN
        str1 := 'One fish, two fish';
        str2 := 'Shark fish, No fish';
        SELECT fishy( str1, str2, num, fnum );
        RAISE NOTICE 'fishy() analysis: % %', num, fnum;
    END;
    $$ LANGUAGE plpgsql;

    SELECT main();

    /* ERROR MESSAGE
    psql:ex_out_fail.sql:28: ERROR: function fishy(text, text, integer, real) does not exist
    HINT: No function matches the given name and argument types. You may need to add explicit type casts.
    CONTEXT: SQL statement "SELECT fishy( $1 , $2 , $3 , $4 )"
    PL/pgSQL function "main" line 9 at SQL statement

    And when I run \df from the psql command line, it shows up like this:
    | fishy | record | text, text
    */
Re: [GENERAL] How to Declare Functions Containing OUT PArameters?
Thanks guys, I think I see now. I was thinking it was a more transparent pass-by-value / pass-by-reference thing. Anyway, I solved my problem by going back into my comfort zone and explicitly returning a record, so I'm not using OUT parameters. They aren't what I thought they were, and I'm working on a tight schedule, so I don't have much time to explore. The compiler seems happier without them, and when it's happy, I'm happy.

Tom Lane wrote:
> "Joshua D. Drake" writes:
> > On Wed, 2010-07-14 at 12:21 -0600, Bill Thoen wrote:
> > > I'm having some difficulty getting plpgsql to recognize a function
> > > with a couple of OUT parameters.
> > > psql:ex_out_fail.sql:28: ERROR: function fishy(text, text, integer,
> > > real) does not exist
> >
> > You are passing four IN parameters. The OUT parameters are used in the
> > return, are they not? You'd need to do something like
> > SELECT * INTO num, fnum FROM fishy(str1, str2);
>
> OUT parameters in PG are just syntactic sugar for defining a
> record-returning function. The call syntax still works as if you'd
> written CREATE FUNCTION foo (IN-parameters-only) RETURNS
> some-record-type.
>
> regards, tom lane
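[Putting Joshua's and Tom's points together, the fix to the original main() is just to call fishy() with its two IN arguments and capture the record it returns; this is the thread's own example with only the call site changed:]

    -- inside main(): call with the IN parameters only, and let the
    -- OUT parameters come back as the function's result record
    SELECT * INTO num, fnum FROM fishy( str1, str2 );
    RAISE NOTICE 'fishy() analysis: % %', num, fnum;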
Re: [GENERAL] Re: What to do if space for database decrease and no additional hard drive is possible?
A. Kretschmer wrote:
> In response to Ivan Voras:
> > * buy external storage (NAS, or even an external USB drive), move the
> > database to it
>
> Buy an external USB drive, create a new tablespace, and move some large
> tables into this new tablespace and/or use the new tablespace for new
> tables. You can also use table partitioning with different tablespaces.

Can you then unmount that USB drive without causing any damage to the other databases?

--
- Bill Thoen
GISnet - www.gisnet.com
303-786-9961
[GENERAL] How Big is Too Big for Tables?
I'm building a national database of agricultural information, and one of the layers is a bit more than a gigabyte per state. That's 1-2 million records per state, with a multipolygon geometry, and I've got about 40 states' worth of data. I'm trying to store everything in a single PG table. What I'm concerned about is whether performance will be terrible if I combine every state into one big table, even with indexes. On the other hand, if I store the data in several smaller files, then if a user zooms in on a multi-state region, I've got to build or find a much more complicated way to query multiple files.

So I'm wondering: should I be concerned about building a single national-size table (possibly 80-100 GB) for all these records, or should I keep the files smaller and hope there's something like ogrtindex out there for PG tables? What do you all recommend in this case? I just moved over to Postgres to handle big files, but I don't know its limits. With a background working with MS Access and bitter memories of what happens when you get near Access's two-gigabyte database size limit, I'm a little nervous of these much bigger files. So I'd appreciate anyone's advice here.

TIA,
- Bill Thoen
Re: [GENERAL] Need magic for identifieing double adresses
> On 9/16/2010 5:18 AM, Sam Mason wrote:
> > On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote:
> > > I need to clean up a lot of contact data because of a merge of
> > > customer lists that used to be kept separate.
> > > I already know that there are double entries within the lists, and
> > > they overlap, too.
> > >
> > > Relevant fields could be name, street, zip, city, phone
>
> Since you're working with words, a better "fuzzy" match algorithm is
> Levenshtein distance. It's surprisingly good at coming up with an index
> number for words (or even phrases). Calculate the Levenshtein distance
> for two words, and the closer the indexes are to each other the more
> alike the words are. It's well documented on Wikipedia (and other
> places).

The problem with using something like this is that it's awkward to do anything apart from the Cartesian product and compare everything with everything else. If you've got 500k rows to start with, that's a *lot* of comparisons to be doing. The nice thing about using something like soundex is that it allows you to use equality when comparing, and hence PG has a chance to finish the query in a reasonable amount of time. If you're dealing with a small number of possibilities, I'd agree that some form of edit distance is a very useful tool.

> But overall, you're basically normalizing an address list with people's
> names. So I'd first go through the lists with an eye to getting rid of
> the junk (regexp is great for identifying names with "illegal"
> characters) and creating a starting name in uppercase. There's always at
> least one data entry clerk (usually several) who does quirky things like
> include notes in the last name field.

Yup, normalisation is very important when dealing with this sort of thing. Reducing free-form text down to something more constrained is important.

> You really need to parse each word, correct spelling and consider it in
> the local context, i.e. does it "fit" where it is? (as a geocoding
> routine would do with street names). There are canonical forms for
> names, and though these vary from country to country, they make good
> templates for a name parser.

I've tried doing something like this before, but tended to give up. I guess all the matching I've been doing has been small enough that it was quicker to handle the odd cases by hand than to spend more time writing general-purpose code that never seemed to be quite good enough.

--
Sam http://samason.me.uk/
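[Both techniques Sam contrasts live in the contrib fuzzystrmatch module. A sketch of the equality-joinable soundex approach versus the pairwise edit distance, with hypothetical table and column names:]

    -- soundex gives each name a short code, so candidate duplicates can
    -- be found with an ordinary (indexable) equality join:
    SELECT a.name, b.name
    FROM contacts a
    JOIN contacts b ON soundex(a.name) = soundex(b.name)
                   AND a.id < b.id;   -- avoid self-pairs and mirror pairs

    -- levenshtein() is pairwise, so it's best used to re-rank the small
    -- candidate set, not to compare everything with everything:
    SELECT levenshtein('Thoen', 'Thoene');   -- small value = similar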
[GENERAL] Problems with pg_dump and -t wildcards
I'm having some problems with pg_dump and the -t switch. I can't get it to work using wildcards or with multiple -t switches. I'm using PostgreSQL 8.1.5 and trying to dump just a table and its associated sequence. As shown below, the table il_sections and the sequence il_sections_gid_seq both exist in the database spatial1. But neither

    pg_dump -Fc -t 'il_sections*' spatial1 > il_plss.dump

nor

    pg_dump -Fc -t il_sections -t il_sections_gid_seq spatial1 > il_plss.dump

will work. The former generates the error listed below, while the latter just creates a tiny file with nothing but a couple of comments in it. Anyone know what's wrong?

- Bill Thoen

Here's what I get:

    $ psql -dspatial1
    Welcome to psql 8.1.5, the PostgreSQL interactive terminal.
    ...
    spatial1=# \d
                     List of relations
     Schema |          Name          |   Type   |  Owner
    --------+------------------------+----------+----------
     public | US_States_ogc_fid_seq  | sequence | bthoen
     public | counties               | table    | bthoen
     public | counties_gid_seq       | sequence | bthoen
     public | geometry_columns       | table    | postgres
     public | il_sections            | table    | bthoen
     public | il_sections_gid_seq    | sequence | bthoen
     public | spatial_ref_sys        | table    | postgres
     public | states                 | table    | bthoen
    (8 rows)

    spatial1=# \q
    $ pg_dump -Fc -t 'il_section*' spatial1 > il_plss.dump
    pg_dump: specified table "il_section*" does not exist
Re: [GENERAL] Problems with pg_dump and -t wildcards
Ah ha! I'm reading the wrong docs. Duh... I guess it's upgrade time! Thank you.

Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> > On Fri, 2008-07-25 at 11:36 -0600, Bill Thoen wrote:
> > > I'm having some problems with pg_dump and the -t switch. I can't get
> > > it to work using wildcards or with multiple -t switches. I'm using
> > > PostgreSQL 8.1.5 and trying to dump just a table and its associated
> > > sequence.
> >
> > That is because it doesn't work like that. IIRC you can dump exactly
> > one table with the -t switch.
>
> -t does take a wildcard pattern ... in 8.2 and later. I don't believe
> 8.1 handled more than one -t switch, either.
>
> regards, tom lane
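[Per Tom's note, on 8.2 or later the original commands should work essentially as first written:]

    # wildcard pattern (quote it so the shell doesn't expand it):
    pg_dump -Fc -t 'il_section*' spatial1 > il_plss.dump

    # or multiple -t switches, also supported from 8.2 on:
    pg_dump -Fc -t il_sections -t il_sections_gid_seq spatial1 > il_plss.dump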
[GENERAL] Need Some Explanation of an EXPLAIN
I'm trying to update a field in a table that has about 5 million records from a table that has about 3.5 million records. I've created indexes for the joined columns, but PostgreSQL 8.1 doesn't seem to want to use them. This makes for a very slow update. Below are descriptions of the two tables, followed by the query plan that my PostgreSQL wants to use. So I have two questions:

1.) Why won't it use the indexes?
2.) How can I make this update faster?

TIA,
Bill Thoen

            Table "public.id2"
      Column   |     Type     | Modifiers
    -----------+--------------+-----------
     grower_id | integer      |
     fmid      | character(7) |
     fsa_id    | character(9) |
    Indexes:
        "id2_fsa_is_key" UNIQUE, btree (fsa_id)

             Table "public.growers"
      Column    |         Type          | Modifiers
    ------------+-----------------------+-----------
     grower_id  | integer               |
     fsa_id     | character(9)          |
     co_name    | character varying(45) |
     . . .
    Indexes:
        "grower_fsa_id_key" btree (fsa_id)

    fsa=# EXPLAIN UPDATE growers SET grower_id = id2.grower_id
          FROM id2 WHERE growers.fsa_id = id2.fsa_id;
                                   QUERY PLAN
    --------------------------------------------------------------------------
     Hash Join  (cost=70375.50..1603795.30 rows=6802720 width=355)
       Hash Cond: ("outer".fsa_id = "inner".fsa_id)
       ->  Seq Scan on growers  (cost=0.00..672373.20 rows=6802720 width=351)
       ->  Hash  (cost=46249.20..46249.20 rows=1966920 width=44)
             ->  Seq Scan on id2  (cost=0.00..46249.20 rows=1966920 width=44)
    (5 rows)
[GENERAL] Why Does UPDATE Take So Long?
Working with PG 8.1, I'm trying to update a char(4) column, and it's taking a very long time; 15 minutes so far and no end in sight. From the explain, it doesn't seem like it should take that long, and this column is not indexed. Sure, there are 2.7 million records, but it only takes a few minutes to scan the whole file. Is there some special overhead I should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too. Or am I just expecting too much? Here's the explain:

    explain UPDATE farms SET prog_year='2007';
                              QUERY PLAN
    ----------------------------------------------------------------
     Seq Scan on farms  (cost=0.00..59144.07 rows=2772207 width=54)
    (1 row)

TIA,
- Bill Thoen
Re: [GENERAL] Why Does UPDATE Take So Long?
Doesn't look like that's the problem. I moved my table over to another Linux box running PG 8.3, and update performance was pretty bad there as well. In the time that PG 8.3 was struggling with the update there, I created a copy of my table on my PG 8.1 machine and inserted all columns, with one containing the altered values I wanted, and that took less than two minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still thrashing away trying to update that one column that's not even part of any index. Something is really wrong with UPDATE in PostgreSQL, I think.

Scott Marlowe wrote:
> On Tue, Sep 30, 2008 at 1:00 PM, Bill Thoen <[EMAIL PROTECTED]> wrote:
> > Working with PG 8.1, I'm trying to update a char(4) column, and it's
> > taking a very long time; 15 minutes so far and no end in sight. From
> > the explain, it doesn't seem like it should take that long, and this
> > column is not indexed. Sure, there are 2.7 million records, but it only
> > takes a few minutes to scan the whole file. Is there some special
> > overhead I should be aware of with an UPDATE? I VACUUMed and ANALYZEd
> > first, too. Or am I just expecting too much?
>
> The problem is that on older versions of pgsql, the db had to update
> each index for each row updated as well as the rows. The latest version,
> with a low enough fill factor, can update non-indexed fields by using
> the free space in each page and not have to hit the indexes. But on 8.1
> you don't get that optimization.
Re: [GENERAL] Why Does UPDATE Take So Long?
Sorry for the hyperbole; I should have qualified that ridiculous statement with "...on my machines." No doubt the problem has something to do with configuration, because I don't know much about that. One of my machines is running PG 8.1 on Linux Fedora Core 5. It's got an AMD 64-bit CPU with a GB of RAM and plenty of normal disk space (not running RAID 5). The other machine is running Linux FC9 and PG 8.3. It's got an i686 CPU with a GB of RAM, and it's also not using RAID.

Since I don't understand much about configuring PostgreSQL, both of these machines use the default PostgreSQL configuration. I figured that it was optimized for general use, but since my files are large-ish (in the low multi-million record range), maybe that doesn't qualify as general use. Anyway, here are the configuration settings you mentioned:

    shared_buffers = 1000
    #checkpoint_segments = 3
    #checkpoint_timeout = 300
    #checkpoint_warning = 30

What should I be looking for in the configuration to improve UPDATE performance?

Thanks,
- Bill Thoen

Bill Moran wrote:
> In response to Bill Thoen <[EMAIL PROTECTED]>:
> > Doesn't look like that's the problem. I moved my table over to another
> > Linux box running PG 8.3, and update performance was pretty bad there
> > as well. [...] Something is really wrong with UPDATE in PostgreSQL, I
> > think.
>
> That's an interesting theory, although it's completely wrong and founded
> in ridiculosity. If something were "really wrong with UPDATE" in every
> version of PostgreSQL, you'd be reading about it on the mailing lists,
> and you won't.
>
> What I suspect is that the typical tuning advice applies here. I don't
> see any information about your configuration or your hardware setup.
> * What are shared_buffers set at?
> * What do the checkpoint configs look like?
> * In general, what does your postgresql.conf look like, how much tuning
>   have you done?
> * What is your hardware setup? You're not running RAID 5 are you?
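[For readers hitting the same wall: those values are the stock 8.1 defaults (shared_buffers = 1000 is only about 8 MB, and three checkpoint segments force very frequent checkpoints during a bulk UPDATE). Illustrative starting points for a 1 GB-RAM box might look like the following; these numbers are assumptions for the sketch, not recommendations from the thread:]

    # postgresql.conf sketch (8.1 units: shared_buffers is in 8 kB pages)
    shared_buffers = 25000        # roughly 200 MB
    checkpoint_segments = 10      # fewer, larger checkpoints for bulk writes
    checkpoint_timeout = 300
    # work_mem and maintenance_work_mem also matter, but measure first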
Re: [GENERAL] Why Does UPDATE Take So Long?
Alvaro Herrera wrote:
> Bill Moran wrote:
>> What I suspect is that the typical tuning advice applies here. I don't
>> see any information about your configuration or your hardware setup.
>> * What are shared_buffers set at?
>> * What do the checkpoint configs look like?
>> * In general, what does your postgresql.conf look like? How much
>>   tuning have you done?
>> * What is your hardware setup? You're not running RAID 5, are you?
>
> Also, how many indexes does this table have?

Two, but the column I'm updating isn't included in either one of them.
Re: [GENERAL] Why Does UPDATE Take So Long?
Many thanks to everyone who helped me with this. It'll be a while before I understand enough to be able to do a good job of tuning my system's configuration, but there seem to be a few basics I can apply right away. Also, pointing out how UPDATE actually works was very helpful. Since I'm at the data-building stage, most of my updates apply to an entire column, and in cases like that it's much more efficient to simply write the new values into a fresh table and delete the old one. In this case:

CREATE TABLE farm2 (LIKE farms);
INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year)
    SELECT farm_id, fips_cd, farm_nbr, '2007' FROM farms;
DROP TABLE farms;
ALTER TABLE farm2 RENAME TO farms;
CREATE UNIQUE INDEX farms_id_key ON farms (farm_id);
CREATE UNIQUE INDEX farms_fips_nbr_key ON farms (fips_cd, farm_nbr);

takes only a few minutes for this 2.77 million record table. The alternative,

UPDATE farms SET prog_year = '2007';

takes hours! I don't know how many, because I gave up after waiting 1.5 hrs.

Thanks all,
- Bill Thoen
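One refinement worth noting, since DDL is transactional in PostgreSQL: the same swap can be wrapped in a transaction, so other sessions never see the table missing. A sketch with the statements above:

BEGIN;
CREATE TABLE farm2 (LIKE farms);
INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year)
    SELECT farm_id, fips_cd, farm_nbr, '2007' FROM farms;
DROP TABLE farms;
ALTER TABLE farm2 RENAME TO farms;
CREATE UNIQUE INDEX farms_id_key ON farms (farm_id);
CREATE UNIQUE INDEX farms_fips_nbr_key ON farms (fips_cd, farm_nbr);
COMMIT;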
[GENERAL] Question About UNION
I'm trying to combine two tables, but I only want unique records based on the first two columns. Can UNION be used to join three-column tables but only include records based on the uniqueness of the first two columns? If not, how would I do this with PostgreSQL 8.1?
Re: [GENERAL] Question About UNION
Raymond O'Donnell wrote:
> On 09/10/2008 17:36, Bill Thoen wrote:
>> I'm trying to combine two tables, but I only want unique records based
>> on the first two columns. Can UNION be used to join three-column
>> tables but only include records based on the uniqueness of the first
>> two columns? If not, how would I do this with PostgreSQL 8.1?
>
> How do you decide which records you want? - e.g. given the following
> rows...
>
> (a, b, c)
> (a, b, d)
>
> ...how do you decide whether you want the one with c or the one with d?

The physical order in which they appear will take care of that.
Re: [GENERAL] Question About UNION
Raymond O'Donnell wrote:
> On 09/10/2008 17:59, Bill Thoen wrote:
>> The physical order in which they appear will take care of that.
>
> Hmmm. Leaving that aside, how about SELECTing the columns you want to
> be unique from the two tables, and then JOINing the UNION of those back
> with the UNION of the complete (three-column) tables?

Say, that just might work! I'll give it a try. Thanks!
Re: [GENERAL] Question About UNION
Raymond O'Donnell wrote:
> On 09/10/2008 17:59, Bill Thoen wrote:
>> The physical order in which they appear will take care of that.
>
> Hmmm. Leaving that aside, how about SELECTing the columns you want to
> be unique from the two tables, and then JOINing the UNION of those back
> with the UNION of the complete (three-column) tables?

No, this won't work. Here's an example of my tables:

Table1
1, 12, A
2, 16, B
8, 6, A
19, 9, C

Table2
1, 13, D
2, 16, B
8, 6, B
12, 5, A

A simple UNION will remove the duplicate row 2, 16, B, but it won't block row 8, 6, B in table 2 from being included in the output. What I want is for records in table 1 to take precedence and for the output records to be unique based only on the first two columns. In other words, I want this output:

Table3
1, 12, A
2, 16, B
8, 6, A
19, 9, C
1, 13, D
12, 5, A

Trying your suggestion:

create table tmp as
    select col1, col2 from table1
    union
    select col1, col2 from table2;

create table tmp2 as
    select * from table1
    union
    select * from table2;

I'll get:

tmp
1, 12
2, 16
8, 6
19, 9
1, 13
12, 5

tmp2
1, 12, A
2, 16, B
8, 6, A
19, 9, C
1, 13, D
8, 6, B
12, 5, A

I now have two rows with the same first two columns (8, 6, A and 8, 6, B), and if I try to JOIN tmp and tmp2 I'm going to get duplicates on the 8, 6 key. Or am I misunderstanding what you suggested?
Re: [GENERAL] Question About UNION
David Wilson wrote:
> On Thu, Oct 9, 2008 at 1:48 PM, Bill Thoen <[EMAIL PROTECTED]> wrote:
>> No, this won't work. Here's an example of my tables: [...]
>
> select * from table1
> union
> select table2.*
> from table2
>     left join table1 on table2.a = table1.a and table2.b = table1.b
> where table2.a is null;
>
> (Written in gmail, but you should get the basic idea.)

Thanks, but that didn't work. That selected only the records from table1. However, this did work:

CREATE TABLE table3 AS SELECT * FROM table1;
CREATE UNIQUE INDEX table3_pk ON table3 (a, b);
INSERT INTO table3
    SELECT * FROM table2
    WHERE NOT EXISTS (
        SELECT a, b FROM table1 t1
        WHERE table2.a = t1.a AND table2.b = t1.b
    );
Re: [GENERAL] Question About UNION
David Wilson wrote:

On Thu, Oct 9, 2008 at 3:31 PM, Bill Thoen <[EMAIL PROTECTED]> wrote:
> Thanks, but that didn't work. That selected only the records from table1.

That's why I warned you about it being written in gmail. :) I'm sorry; I had it wrong the first time. Here's a script that verifies it:

create table table1 (
    a integer,
    b integer,
    c char(1)
);
insert into table1 values ( 1, 12, 'A');
insert into table1 values ( 2, 16, 'B');
insert into table1 values ( 8,  6, 'A');
insert into table1 values (19,  9, 'C');
insert into table1 values (20, 12, 'A');

create table table2 (
    a integer,
    b integer,
    c char(1)
);
insert into table2 values ( 1, 13, 'D');
insert into table2 values ( 2, 16, 'B');
insert into table2 values ( 8,  6, 'B');
insert into table2 values (12,  5, 'A');

select * from table1
union
select table2.*
from table2
    left join table1 on table2.a = table1.a and table2.b = table1.b
where table1.a is null;

drop table table1;
drop table table2;
Re: [GENERAL] Question About UNION
Josh Williams wrote:
> On Thu, 2008-10-09 at 10:59 -0600, Bill Thoen wrote:
>> I'm trying to combine two tables, but I only want unique records based
>> on the first two columns. [...]
>
> If the rest of the columns don't matter, how about:
>
> SELECT DISTINCT ON (col1, col2) *
> FROM (
>     SELECT col1, col2, col3 FROM table1
>     UNION
>     SELECT col1, col2, col3 FROM table2
>     ORDER BY col1, col2
> ) AS uniontable;
>
> - Josh Williams

Great! This works too. And to correct my last post, David Wilson had it right the SECOND time. I've been wrestling with this stupid problem all morning, and now my mind is so far gone I don't even trust whether I can get 'SELECT * FROM table1;' to work! Thanks for the help once again, everybody!
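One caveat: DISTINCT ON keeps whichever row sorts first within each group, so to guarantee that table1 takes precedence over table2 an explicit precedence key is needed in the ORDER BY. A sketch along the same lines (the "pref" column is an invented marker, not from the thread):

SELECT DISTINCT ON (col1, col2) col1, col2, col3
FROM (
    SELECT col1, col2, col3, 1 AS pref FROM table1
    UNION ALL
    SELECT col1, col2, col3, 2 AS pref FROM table2
) AS u
ORDER BY col1, col2, pref;
-- UNION ALL keeps both candidate rows; DISTINCT ON then keeps the one
-- with the lowest pref, i.e. the table1 row whenever both tables match.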
[GENERAL] Bug In COPY TO?
I noticed that in PG 8.1 the docs for COPY TO say that you can use WITH HEADER, but when I do that it throws an error saying that this works only in CSV mode. Is this a bug, or do I have something wrong with my syntax?

COPY mytable TO '/data/out.txt' WITH HEADER DELIMITER '|' NULL '';
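The error is accurate rather than a bug: in 8.1, HEADER is accepted only together with CSV. A sketch of the same command in CSV mode, with the same hypothetical path and delimiter:

COPY mytable TO '/data/out.txt' WITH CSV HEADER DELIMITER '|' NULL '';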
[GENERAL] Update with a Repeating Sequence
I've got a table with repeated records that I want to make unique by adding a sequence code of 0,1,2,...,n for each set of repeated records. Basically, I want to turn:

field_id | seq
---------+-----
       1 | 0
       2 | 0
       3 | 0
       3 | 0
       3 | 0
       4 | 0
       4 | 0
       5 | 0
       6 | 0

into:

field_id | seq
---------+-----
       1 | 0
       2 | 0
       3 | 0
       3 | 1
       3 | 2
       4 | 0
       4 | 1
       5 | 0
       6 | 0

What's the best way to do that?

TIA,
- Bill Thoen
Re: [GENERAL] Update with a Repeating Sequence
The table exists already; all I need to do is update the sequence code to make the records unique, but I also need each repeating set numbered from 0 (zero) so I can select a list of unique farm field records where seq = 0. I think the suggestion to use a cursor sounds good, but I'm concerned that, according to the PG 8.1 documentation, update cursors are not supported. However, with a read-only cursor can I FETCH a row, read the field variables in it, update one or two of them, and have that change posted back into that record to update the table? I'm not at all familiar with PostgreSQL cursors yet, so any help on the syntax would be welcome too. An example script or function showing how to step through all records and update just one or two column values in each row would be appreciated, if someone could point me to one. Again, here's the table structure for my small example:

create table farm_fields (
    field_id integer,
    seq integer
);

And I want to convert something like this:

field_id | seq
---------+-----
   34556 | 0
   34556 | 0
   34556 | 0
   37000 | 0
   37002 | 0
   37002 | 0
   37005 | 0

to this:

field_id | seq
---------+-----
   34556 | 0
   34556 | 1
   34556 | 2
   37000 | 0
   37002 | 0
   37002 | 1
   37005 | 0

Webb Sprague wrote:
> Untested ideas (beware): use an insert trigger that does:
>
> curr_seq := select max(seq) from foo where field_id = NEW.field_id
> if curr_seq is null then NEW.seq := 0
> else NEW.seq := curr_seq + 1
>
> (You have to figure out how to build the trigger infrastructure...)
>
> If you need to do it on a table that is already populated, let us know.
>
> On Tue, Oct 14, 2008 at 9:21 AM, Bill Thoen <[EMAIL PROTECTED]> wrote:
>> Grzegorz Jas'kiewicz wrote:
>>> alter table foo add newid sequencial;
>>> alter table foo drop field_id;
>>> alter table foo rename newid to field_id;
>>
>> I can't do that; I need to preserve the field_id values.
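To make Webb's pseudocode concrete, here is a sketch of the trigger infrastructure for the farm_fields table above; untested in the same spirit as the original, and it only fills in seq for newly inserted rows, not rows already in the table:

CREATE OR REPLACE FUNCTION farm_fields_seq() RETURNS trigger AS $$
DECLARE
    curr_seq integer;
BEGIN
    -- Highest seq already assigned to this field_id, if any.
    SELECT max(seq) INTO curr_seq
        FROM farm_fields WHERE field_id = NEW.field_id;
    IF curr_seq IS NULL THEN
        NEW.seq := 0;
    ELSE
        NEW.seq := curr_seq + 1;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER farm_fields_seq_trg
    BEFORE INSERT ON farm_fields
    FOR EACH ROW EXECUTE PROCEDURE farm_fields_seq();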
Re: [GENERAL] Update with a Repeating Sequence
Steve Atkins wrote:
> On Oct 14, 2008, at 9:04 AM, Bill Thoen wrote:
>> I've got a table with repeated records that I want to make unique by
>> adding a sequence code of 0,1,2,...,n for each set of repeated
>> records. [...] What's the best way to do that?
>
> This is mildly tricky to do, and hard to maintain. In most cases where
> people say they need this, they're actually perfectly happy with the
> seq value being enough to make the row unique, and ideally increasing
> in order of something such as insertion time ...

Thanks, but in this case I really need both unique records and a repeated sequence, so I can select the first occurrence of each record (i.e. WHERE seq = 0), and sometimes I need the max(seq) for particular records. Since this is a read-only table, maintaining it is not a problem. Anyway, I did manage to solve it. Here's a little test script that shows how:

CREATE TABLE test (
    field_id integer,
    seq integer
);

INSERT INTO test VALUES (1, 0);
INSERT INTO test VALUES (2, 0);
INSERT INTO test VALUES (3, 0);
INSERT INTO test VALUES (3, 0);
INSERT INTO test VALUES (3, 0);
INSERT INTO test VALUES (4, 0);
INSERT INTO test VALUES (4, 0);
INSERT INTO test VALUES (5, 0);
INSERT INTO test VALUES (6, 0);

-- Create table to hold static variables
CREATE TABLE tmp (last_id integer, cnt integer);
INSERT INTO tmp VALUES (0, 0);

-- Function to fill in repeated sequence
CREATE OR REPLACE FUNCTION test_it(field_id integer) RETURNS integer AS $$
DECLARE
    r tmp%ROWTYPE;
    nLast_id integer;
    nCnt integer;
BEGIN
    SELECT * INTO r FROM tmp;
    nLast_id = r.last_id;
    nCnt = r.cnt;
    IF field_id = nLast_id THEN
        nCnt = nCnt + 1;
    ELSE
        nCnt = 0;
        nLast_id = field_id;
    END IF;
    UPDATE tmp SET last_id = nLast_id, cnt = nCnt;
    RETURN nCnt;
END;
$$ LANGUAGE plpgsql;

-- Fill in repeated sequence
UPDATE test SET seq = test_it(field_id);

-- Show results
SELECT * FROM test ORDER BY field_id, seq;

-- Clean up
DROP FUNCTION test_it(integer);
DROP TABLE tmp;
DROP TABLE test;
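A note on this solution: it depends on the UPDATE visiting rows grouped by field_id, which PostgreSQL does not strictly guarantee. On 8.4 and later the same numbering can be computed declaratively with a window function; a sketch against the same test table:

UPDATE test t
SET seq = s.rn
FROM (
    SELECT ctid, row_number() OVER (PARTITION BY field_id) - 1 AS rn
    FROM test
) s
WHERE t.ctid = s.ctid;
-- row_number() restarts at 1 for each field_id, so subtracting 1 yields
-- the 0, 1, 2, ... sequence; ctid identifies each physical row for the join.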
Re: [GENERAL] Update with a Repeating Sequence
Grzegorz Jas'kiewicz wrote:
> alter table foo add newid sequencial;
> alter table foo drop field_id;
> alter table foo rename newid to field_id;

I can't do that; I need to preserve the field_id values.
[GENERAL] UPDATE and Indexes and Performance
Does PG (8.1) ever use existing indexes when executing an UPDATE? I've got some tables with millions of records, and whenever I update a column that involves most or all the records, the EXPLAIN command seems to indicate that it isn't using the pre-existing indexes. This results in a slow update, which is further slowed by the presence of indexes. So when doing a large update should I just drop the indexes first, or is there some good reason to keep them?

Thanks,
- Bill Thoen
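For the drop-first approach the question describes, a sketch using the index names from the earlier thread (illustrative, not prescriptive):

-- Dropping the index avoids maintaining an index entry for every
-- updated row; rebuilding afterwards is a single bulk operation.
DROP INDEX farms_fips_nbr_key;
UPDATE farms SET prog_year = '2007';
CREATE UNIQUE INDEX farms_fips_nbr_key ON farms (fips_cd, farm_nbr);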
[GENERAL] How Do I Find the Date When A Table Was Last Changed?
Is there a way to find out when the data in a PostgreSQL table was last changed?
[GENERAL] Need Help With a Simple Query That's Not So Simple
I think this should be easy, but I can't seem to put the SQL together correctly and would appreciate any help. (I'm using Pg 8.4 on CentOS 5.5, if that matters.)

I have a table of Farms and a table of Crops in a 1:M relationship of Farms : Crops. There are lots of different crops to choose from, but for now I'm only interested in two crops: corn and soybeans.

Some farms grow only corn, some grow only soybeans, and some grow both. What I'd like to know is: which farms and how many are growing only corn, which and how many are growing only soybeans, and which and how many are growing both? I can easily get all the corn growers with:

SELECT a.*
FROM farms a
JOIN crops b ON a.farm_id = b.farm_id
WHERE crop_cd = '0041';

I can do the same with soybeans (crop_cd = '0081'), and then I could subtract the sum of these from the total of all farms that grow either corn or soybeans to get the number of farms growing both, but having to do all those queries sounds very time-consuming and inefficient. Is there a better way to get the farm counts or data by categories like farms growing only corn, farms growing only soybeans, and farms growing both? I'm also interested in possibly expanding to a general case where I could select more than two crops and get counts of the permutations.

Here's a sketch of the relevant pieces of the database:

Tables:
farms                      crops
=====                      =====
farm_id bigint (pkey)      crop_id (pkey)
type                       farm_id   foreign key to farms
size                       crop_cd   0041 = corn, 0081 = soybeans
...                        year
                           ...

Any help would be much appreciated.

TIA,
- Bill Thoen
Re: [GENERAL] Need Help With a Simple Query That's Not So Simple
On 10/31/2011 5:05 PM, David Johnston wrote:
> From: pgsql-general-ow...@postgresql.org
> [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Bill Thoen
> Sent: Monday, October 31, 2011 6:51 PM
> To: Postgrresql
> Subject: [GENERAL] Need Help With a Simple Query That's Not So Simple
>
> [...] What I'd like to know is, which farms and how many are growing
> only corn, which and how many are growing soybeans, and which and how
> many are growing both? [...] Is there a better way to get the farm
> counts or data by categories like farms growing only corn, farms
> growing only soybeans, farms growing both? I'm also interested in
> possibly expanding to a general case where I could select more than two
> crops and get counts of the permutations. [...]
>
> ---
>
> General idea:
>
> WITH crop_one AS (
>     SELECT farm_id, crop_cd AS crop_one_cd ...
> ), crop_two AS (
>     SELECT farm_id, crop_cd AS crop_two_cd ...
> )
> SELECT *
> FROM crop_one
> FULL OUTER JOIN crop_two USING (farm_id);
>
> Records with NULL for "crop_one_cd" only grow crop 2, records with NULL
> for "crop_two_cd" only grow crop 1, and records where neither field is
> NULL grow both.
>
> Not sure regarding the general case. You likely want to use ARRAY_AGG
> to get a result like:
>
> Farm_id_100, { 'CROP_CD_1', 'CROP_CD_2' }
>
> You could then probably get a query to output something like:
>
> (crop_id, farms_exclusive, farms_shared, farms_without)
>
> where each of the "farms_" columns is an array of farm_ids that match
> the particular conditional: = ALL (exclusive); != ALL && = ANY
> (shared); != ANY (without).
>
> David J.

Thanks David! That worked great! When I filled in the query from the "general idea" in your example above like so:

WITH crop_one AS (
    SELECT farm_id, crop_cd AS corn
    FROM gfc_inilmoidia_2007
    WHERE crop_cd = '0041'
), crop_two AS (
    SELECT farm_id, crop_cd AS soybeans
    FROM gfc_inilmoidia_2007
    WHERE crop_cd = '0081'
)
SELECT *
FROM crop_one
FULL OUTER JOIN crop_two USING (farm_id);

it produced the following (which is essentially the base of what I'm looking for):

 farm_id | corn | soybeans
---------+------+----------
    1473 | 0041 | 0081
    1474 | 0041 | 0081
    1474 | 0041 | 0081
    1474 | 0041 | 0081
    1474 | 0041 | 0081
    1475 | 0041 |
    1475 | 0041 |
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1477 | 0041 |
    1478 | 0041 | 0081
    1479 | 0041 |
    1480 |      | 0081
    1480 |      | 0081

Thanks so much for the quick reply. You've also just opened up a whole new area of query possibilities for me of which I wasn't aware.

- Bill Thoen
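To get the counts the original question asked for, the same join can be bucketed and aggregated; a sketch built on the query above, with invented category labels:

WITH crop_one AS (
    SELECT farm_id, crop_cd AS corn
    FROM gfc_inilmoidia_2007 WHERE crop_cd = '0041'
), crop_two AS (
    SELECT farm_id, crop_cd AS soybeans
    FROM gfc_inilmoidia_2007 WHERE crop_cd = '0081'
)
SELECT CASE
           WHEN corn IS NOT NULL AND soybeans IS NOT NULL THEN 'both'
           WHEN corn IS NOT NULL THEN 'corn only'
           ELSE 'soybeans only'
       END AS category,
       count(DISTINCT farm_id) AS farms
FROM crop_one FULL OUTER JOIN crop_two USING (farm_id)
GROUP BY 1;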
Re: [GENERAL] Re: [pgsql-general] Need Help With a Simple Query That's Not So Simple
Thanks! Half the problem searching the 'Net for answers is knowing what it's called.

Regards,
Bill Thoen
GISnet http://gisnet.com
303-786-9961

On Nov 1, 2011, at 10:01 AM, "James B. Byrne" wrote:

>> Date: Mon, 31 Oct 2011 16:51:02 -0600
>> From: Bill Thoen
>> To: Postgrresql
>> Subject: Need Help With a Simple Query That's Not So Simple
>>
>> [...] Some farms grow only corn, some grow only soybeans, and some
>> grow both. What I'd like to know is, which farms and how many are
>> growing only corn, which and how many are growing soybeans, and which
>> and how many are growing both? [...] I'm also interested in possibly
>> expanding to a general case where I could select more than two crops
>> and get counts of the permutations. [...]
>
> I believe that what you are trying to do is called relational algebra
> division. Take a look at these references and see if either fits your
> needs:
>
> http://en.wikipedia.org/wiki/Relational_algebra#Division_.28.C3.B7.29
>
> http://www.cs.arizona.edu/~mccann/research/divpresentation.pdf
>
> --
> James B. Byrne                mailto:byrn...@harte-lyne.ca
> Harte & Lyne Limited          http://www.harte-lyne.ca
> 9 Brockley Drive              vox: +1 905 561 1241
> Hamilton, Ontario             fax: +1 905 561 0757
> Canada  L8E 3C3
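As a concrete illustration of relational division on the crops table from this thread: "farms that grow every crop in a given list" can be expressed by grouping and counting, with the count matched to the length of the list:

SELECT farm_id
FROM crops
WHERE crop_cd IN ('0041', '0081')    -- corn and soybeans
GROUP BY farm_id
HAVING count(DISTINCT crop_cd) = 2;  -- must equal the list length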
[GENERAL] Using the internal data dictionary
I need to assemble a complete data dictionary for project documentation and other purposes, and I was wondering about the pros and cons of using the pg_catalog metadata. I hesitate to poke around in there because I don't know why it's kept so out of sight and not much documented, but it seems like an ideal source of information to tap with a program to generate accurate, current reports of what's in the database. Is this a bad idea? (Everything I'm thinking of doing would be read-only except for the description fields.) I'd just like to make sure that there's not some innocent-looking table in there that acts as a doomsday device if you so much as read its first record. I'm just not sure why this isn't more widely used or talked about.

Regards,
Bill Thoen
GISnet http://gisnet.com
303-786-9961
Re: [GENERAL] Using the internal data dictionary
Thanks, guys! I'll take a closer look at the information_schema and pgAdmin and Maestro. Reinventing the wheel isn't a problem, as this job is not critical, and the educational experience of looking at the system from another POV may be the bigger prize.

- Bill

On 11/17/2011 8:34 PM, David Johnston wrote:
> On Nov 17, 2011, at 22:17, Bill Thoen wrote:
>> I need to assemble a complete data dictionary for project
>> documentation and other purposes, and I was wondering about the pros
>> and cons of using the pg_catalog metadata. [...]
>
> information_schema is the more standardized point of entry into the
> database metadata; pg_catalog is generally intended for internal use
> and thus has a less stable API contract.
>
> That said, you are reinventing the wheel if you are looking for a
> straight dump of the current reality. Various third-party tools
> already do this. I've used, but am not affiliated with, PostgreSQL
> Maestro. Also, pgAdmin, I think, provides access to this information as
> well (as does psql via its various commands).
>
> You should never directly update the catalog but instead use the
> appropriate SQL command. For descriptions you need to use "COMMENT ON".
> Reading it should never cause a problem.
>
> David J.
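A sketch of the kind of data-dictionary query information_schema supports, listing every user column with its type; excluding the two system schemas is the usual way to keep the output to your own tables:

SELECT table_schema, table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position;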
[GENERAL] Permission Problems
I'm trying to put together a very simple web application to display information about any table in my database that a web user wants to see. The general idea is to present a list of schemata and their associated tables to the user, who then picks one, which causes the server to send the list of fields in that table. Pretty simple.

My thought was to use an internal, low-privileged account to do the look-up and pass the requested info back to the client, but I think I'm running into permission problems, and I don't know where to add all the "GRANTs" so that the go-between account has enough privilege to do its job, but not more. I'm getting the list of schemata from the information_schema.schemata table, and using my superuser account it works fine. However, the account I've set up for this job isn't getting very far, and I'm getting nothing returned. I've granted permissions for SELECT and REFERENCES on all of my tables, and granted USAGE on the schemas, including information_schema and pg_catalog and the relevant views; I've even granted EXECUTE privilege on the functions used in the view, but I'm still not getting results for this pseudo-user, even though the SQL selection works fine for my account.

I'd prefer not to just hand out superuser privilege to the database go-between, because my purpose is to keep this db-web interface role's reach short. Am I digging too deep here, or what am I missing? Is there a better way to transfer info between my database and the web than by using a generic account? It sure seems like I'm granting too much access to too little a player.

Any advice would be welcome.

TIA,
- Bill Thoen
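One possible explanation, offered tentatively: information_schema.schemata only lists schemas whose owning role the current role belongs to, so a low-privileged account can get an empty result even when its grants are correct. A sketch of a workaround via pg_catalog, plus the minimal grants such a role typically needs (the role name and password are hypothetical; the ALL TABLES form requires 9.0 or later):

-- List all non-system schemas; pg_namespace applies no ownership filter.
SELECT nspname
FROM pg_catalog.pg_namespace
WHERE nspname NOT LIKE 'pg_%'
  AND nspname <> 'information_schema';

-- Minimal read-only grants for the go-between role.
CREATE ROLE webuser LOGIN PASSWORD 'secret';
GRANT USAGE ON SCHEMA public TO webuser;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO webuser;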