Re: [GENERAL] Adding more space, and a vacuum question.
On 30/01/2011, at 13:03, Alban Hertroys wrote:

> On 28 Jan 2011, at 22:12, Herouth Maoz wrote:
>
>> 2. That database has a few really huge tables. I think they are not being
>> automatically vacuumed properly. In the past few days I've noticed a vacuum
>> process on one of them which has been running since January 14th.
>> Unfortunately, it never finished, because we were informed of a scheduled
>> power down in our building yesterday, and had to shut down the machine. The
>> questions are:
>>
>> a. Is it normal for vacuum processes to take two weeks?
>
> For a 200M record table that's definitely on the long side. It was probably
> waiting on a lock by another transaction. In most cases that means that some
> transaction was kept open for that duration.
> If that transaction came into existence by accident, then vacuum should be
> fine now that the server has restarted - that transaction is gone now. You
> may want to keep an eye out for long-running transactions though; that's
> usually a programming error - it's sometimes done deliberately, but it's
> still a bad idea from the database's point of view.

Unless my eyes were deceiving me, this was not the case. Sure, there have
been heavy transactions during that time (e.g. the daily backup of the
database, the daily inserts into other tables, which take a long time, and a
few selects for which I haven't been able to find an optimal index). But this
is the query I use to see these processes (run as a superuser):

SELECT usename, procpid, query_start, client_addr, client_port,
       current_query, waiting
FROM pg_stat_activity
WHERE query_start < now() - interval '3 seconds'
  AND xact_start IS NOT NULL
ORDER BY xact_start;

Any long transaction should be caught by it, but most of the time, all I see
are vacuum workers.

By the way, the autovacuum on that table has started again - but only after
more records were deleted from it. It has now been running since yesterday at
17:00. Here is the pg_stat_user_tables record for this table (which was also
updated after the deletes):

relid            | 17806
schemaname       | sms
relname          | billing__archive
seq_scan         | 9
seq_tup_read     | 2053780855
idx_scan         | 2553
idx_tup_fetch    | 8052678
n_tup_ins        | 11437874
n_tup_upd        | 0
n_tup_del        | 7987450
n_tup_hot_upd    | 0
n_live_tup       | 218890768
n_dead_tup       | 33710378
last_vacuum      |
last_autovacuum  |
last_analyze     | 2011-01-29 15:29:37.059176+02
last_autoanalyze |

> In older PG versions autovacuum could get stuck like that on large tables.
> It keeps starting over trying to vacuum that same table, but never reaches
> the end of it. Since it's only a single worker process (in those versions),
> it also will never vacuum any tables beyond the table it got stuck on.

How old? Mine is 8.3.11.

> If you don't delete or update tuples a lot, then the tables are probably
> just that big. If you do delete/update them regularly, try whether a normal
> vacuum will shrink them enough (probably not), and if not, schedule a VACUUM
> FULL and a REINDEX at some time the database isn't too busy. Both are quite
> heavy operations that take exclusive locks on things (tables, indices).

Yes, I do delete many tuples from that table. My mode of usage is like this:
I have a small table called billing which receives new data every night. I
want to keep that table small so that those nightly updates don't take an
overly long time, because all data (several such tables) has to be ready in
the database by the next morning.
Therefore, once a week on the weekend, I move a week's worth of data to
billing__archive (the table we are discussing), and delete a week's worth
from its end. Now, the indexes on that table would make this impossible to do
within the weekend, so what I do is drop all the indexes before I do the
inserts, then recreate them, and then do the deletes.

What you are saying is that in this mode of operation, there's basically no
hope that autovacuum will ever salvage the deleted records? Does removing and
recreating the indexes have any effect on the vacuuming process?

If a vacuum takes me several days (let alone over a week!) then a VACUUM FULL
is out of the question. VACUUM FULL locks the table completely, and that
table is essential to our customer care. If push comes to shove, I think I'd
rather dump that table, drop it, and restore it over the weekend, which I
believe will be faster than a VACUUM FULL.

One other important question: a tuple marked by VACUUM as reusable (as
opposed to VACUUM FULL, which returns the space to the operating system) -
can its space ever be used by another table, or can it only be used for new
inserts into the same table?

>> d. After restarting the server, all the data in pg_stat_user_tables seem to
>> have been reset. What does this mean and how does this affect vacuum
>> scheduling?
>
> I recall reading somewhere that that's normal; probably this
Re: [GENERAL] Adding more space, and a vacuum question.
On Sun, Jan 30, 2011 at 04:56:29PM +0200, Herouth Maoz wrote:

> Unless my eyes were deceiving me, this was not the case. Sure, there have
> been heavy transactions during that time (e.g. the daily backup of the
> database, and the daily inserts into other tables, which take a long time,
> and a few selects which I haven't been able to find an optimal index for).
> But this is the query I use to see these processes (run as a superuser):
>
> SELECT usename, procpid, query_start, client_addr, client_port,
>        current_query, waiting
> FROM pg_stat_activity
> WHERE query_start < now() - interval '3 seconds'
>   AND xact_start IS NOT NULL
> ORDER BY xact_start;
>
> Any long transactions should be caught by it, but most of the time, all I
> see are vacuum workers.

Well, what's your I/O on the disk? Have you tuned vacuum? Maybe you're just
saturating the ability of the table to be vacuumed, or else vacuum is being
told to back off?

> Yes, I do delete many tuples from that table. My mode of usage is
> like this: I have a small table called billing which receives new
> data every night. I want to keep that table small so that those
> nightly updates don't take an overly long time, because all data
> (several such tables) has to be ready in the database by the next
> morning. Therefore, once a week on the weekend, I move a week's
> worth of data to billing__archive (the table we are discussing), and
> delete a week's worth from its end. Now, the indexes on that table
> would make this impossible to do within the weekend, so what I do is
> drop all the indexes before I do the inserts, and then recreate
> them, and then do the deletes.

Without looking at the details of your database, I have to say that the above
sounds to me like more work than letting the system handle this itself. I
have a suspicion that what you really want to do is trickle out the changes
rather than trying to do things in big batches this way.

> If a vacuum takes me several days (let alone over a week!) then a
> VACUUM FULL is out of the question. VACUUM FULL locks the table
> completely and that table is essential to our customer care. If push
> comes to shove, I think I'd rather dump that table, drop it, and
> restore it over the weekend, which I believe will be faster than a
> VACUUM FULL.

Yes, I think so too. And I bet at the current state of affairs, that's a good
bet. Whatever the situation, I suspect things are too bad off to be worth
trying to get through a vacuum with.

> One other important question: a tuple marked by VACUUM as reusable
> (not VACUUM FULL which returns it to the operating system) - can
> its space ever be used by another table, or can it only be used for
> new inserts into the same table?

It's managed by Postgres, but given your churn rate on these tables I'd be
tempted to set a fillfactor with a lot of room, and let the tables be "big"
(i.e. with a lot of empty space) so that their long-term storage footprint is
stable.

A

--
Andrew Sullivan
a...@crankycanuck.ca
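On the "vacuum being told to back off" point, a quick way to check the
cost-based delay settings on 8.3 (a sketch; the commented values are the
stock defaults, not readings from this server):

    SHOW vacuum_cost_delay;             -- default 0: manual VACUUM never sleeps
    SHOW autovacuum_vacuum_cost_delay;  -- default 20ms: autovacuum sleeps regularly
    SHOW autovacuum_vacuum_cost_limit;  -- default -1: inherits vacuum_cost_limit (200)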
Re: [GENERAL] Adding more space, and a vacuum question.
On 30/01/2011, at 12:27, Craig Ringer wrote:

> OK, so you're pre-8.4, which means you have the max_fsm settings to play
> with. Have you seen any messages in the logs about the free space map (fsm)?
> If your install didn't have a big enough fsm to keep track of deleted
> tuples, you'd face massive table bloat that a regular vacuum couldn't fix.

Ouch. You're absolutely right. There are messages about max_fsm_pages in the
postgres log. It's currently set to 153600. According to the documentation, I
can increase it up to 20. Will that even help? How do I find out how many I
need to set it to?

> You also don't have the visibility map, which means that (auto)vacuum can't
> skip bits of the tables it knows don't need vacuuming. Your vacuums will be
> slower.
>
> Autovacuum improved significantly in both 8.4 and 9.0; consider an upgrade.

I will consider it. Thank you.

Herouth
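One way to size it (a sketch, not a suggestion from the thread): on 8.3, a
database-wide VACUUM VERBOSE run as a superuser ends with a free space map
summary, and when the map is too small it reports the number actually needed.
NNNNNN below is a placeholder, not a real reading:

    VACUUM VERBOSE;
    -- NOTICE:  number of page slots needed (NNNNNN) exceeds max_fsm_pages (153600)
    -- HINT:  Consider increasing the configuration parameter "max_fsm_pages"
    --        to a value over NNNNNN.

Raising max_fsm_pages past the reported figure (which requires a restart)
stops new bloat from accumulating, though it won't undo bloat that already
exists.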
Re: [GENERAL] Full Text Index Scanning
Doesn't seem to work either. Maybe something changed in 9.1?

create index test_idx on testtable
    using gin(to_tsvector(wordcolumn||' '||reverse(wordcolumn)));
ERROR:  functions in index expression must be marked IMMUTABLE

On Sun, Jan 30, 2011 at 3:28 AM, Oleg Bartunov wrote:

> I used 9.1dev, but you can try an immutable function (from
> http://andreas.scherbaum.la/blog/archives/10-Reverse-a-text-in-PostgreSQL.html):
>
> create function reverse(text) returns text as $$
>     select case when length($1) > 0
>            then substring($1, length($1), 1) ||
>                 reverse(substring($1, 1, length($1)-1))
>            else '' end
> $$ language sql immutable strict;
>
> On Sat, 29 Jan 2011, Matt Warner wrote:
>
>> 9.0.2
>>
>> On Sat, Jan 29, 2011 at 9:35 AM, Oleg Bartunov wrote:
>>
>>> What version of Pg do you run? Try the latest version.
>>>
>>> Oleg
>>>
>>> On Sat, 29 Jan 2011, Matt Warner wrote:
>>>
>>>> Reverse isn't a built-in Postgres function, so I found one and installed
>>>> it. However, attempting to use it in creating an index gets me the
>>>> message "ERROR: functions in index expression must be marked IMMUTABLE",
>>>> even though the function declaration already has the immutable argument.
>>>> Is there a specific version of the reverse function you're using? Or am
>>>> I just missing something obvious? This is Postgres 9, BTW.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>> On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner wrote:
>>>>
>>>>> Thanks Oleg. I'm going to have to experiment with this so that I
>>>>> understand it better.
>>>>>
>>>>> Matt
>>>>>
>>>>> On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov wrote:
>>>>>
>>>>>> Matt, I'd try to use prefix search on the original string concatenated
>>>>>> with the reversed string. Just tried on some spare table:
>>>>>>
>>>>>> knn=# \d spot_toulouse
>>>>>>     Table "public.spot_toulouse"
>>>>>>    Column   |       Type        | Modifiers
>>>>>>  -----------+-------------------+-----------
>>>>>>  clean_name | character varying |
>>>>>>
>>>>>> 1. Create the index:
>>>>>>
>>>>>> knn=# create index clean_name_tlz_idx on spot_toulouse
>>>>>>       using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
>>>>>>
>>>>>> 2. Query it:
>>>>>>
>>>>>> select clean_name from spot_toulouse
>>>>>> where to_tsvector('french', clean_name || ' ' || reverse(clean_name))
>>>>>>       @@ to_tsquery('french', 'the:* | et:*');
>>>>>>
>>>>>> The select looks cumbersome, but you can always write wrapper
>>>>>> functions. The only drawback I see for now is that the ranking
>>>>>> function will be a bit confused, since the coordinates of the original
>>>>>> and reversed words will not be the same, but again, it's possible to
>>>>>> obtain the tsvector with a custom function that is aware of the
>>>>>> reversing.
>>>>>>
>>>>>> Good luck, and let me know if this helps you.
>>>>>>
>>>>>> Oleg
>>>>>>
>>>>>> On Fri, 28 Jan 2011, Matt Warner wrote:
>>>>>>
>>>>>>> I'm in the process of migrating a project from Oracle to Postgres and
>>>>>>> have run into a feature question. I know that Postgres has a
>>>>>>> full-text search feature, but it does not allow scanning the index
>>>>>>> (as opposed to the data). Specifically, in Oracle you can do "select
>>>>>>> * from table where contains(colname,'%part_of_word%')>1". While this
>>>>>>> isn't terribly efficient, it's much faster than full-scanning the raw
>>>>>>> data and is relatively quick.
>>>>>>>
>>>>>>> It doesn't seem that Postgres works this way. Attempting to do this
>>>>>>> returns no rows: "select * from table where to_tsvector(colname) @@
>>>>>>> to_tsquery('%part_of_word%')"
>>>>>>>
>>>>>>> The reason I want to do this is that the partial word search does not
>>>>>>> involve dictionary words (it's scanning names).
>>>>>>>
>>>>>>> Is this something Postgres can do? Or is there a different way to
>>>>>>> scan the index?
>>>>>>> TIA,
>>>>>>>
>>>>>>> Matt
>>>>>>
>>>>>> Regards,
>>>>>> Oleg
>>>>>> _____________________________________________________________
>>>>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>>>>> Sternberg Astronomical Institute, Moscow University, Russia
>>>>>> Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
>>>>>> phone: +007(495)939-16-83, +007(495)939-23-83
Re: [GENERAL] Full Text Index Scanning
Matt Warner writes:

> Doesn't seem to work either. Maybe something changed in 9.1?
> create index test_idx on testtable
>     using gin(to_tsvector(wordcolumn||' '||reverse(wordcolumn)));
> ERROR:  functions in index expression must be marked IMMUTABLE

That's not the same case he tested. The single-parameter form of to_tsvector
isn't immutable, because it depends on the default text search configuration
parameter. It should work, AFAICS, with the two-parameter form.

        regards, tom lane
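Concretely, the two-parameter form names the configuration in the index
expression itself (a sketch against Matt's table; 'english' is an assumed
configuration, and Matt confirms below that this form builds):

    create index test_idx on testtable
        using gin(to_tsvector('english', wordcolumn || ' ' || reverse(wordcolumn)));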
Re: [GENERAL] Full Text Index Scanning
Aha! Thanks for pointing that out. It's indexing now.

Thanks!

Matt

On Sun, Jan 30, 2011 at 9:12 AM, Tom Lane wrote:

> Matt Warner writes:
>> Doesn't seem to work either. Maybe something changed in 9.1?
>> create index test_idx on testtable
>>     using gin(to_tsvector(wordcolumn||' '||reverse(wordcolumn)));
>> ERROR:  functions in index expression must be marked IMMUTABLE
>
> That's not the same case he tested. The single-parameter form of
> to_tsvector isn't immutable, because it depends on the default text
> search configuration parameter. It should work, AFAICS, with the
> two-parameter form.
>
>         regards, tom lane
Re: [GENERAL] Full Text Index Scanning
If I understand this, it looks like this approach allows me to match the
beginnings and endings of words, but not the middle sections. Is that
correct? That is, if I search for "jag" I will find "jaeger" but not
"lobenjager". Or am I (again) not understanding how this works?

TIA,

Matt

On Sun, Jan 30, 2011 at 9:59 AM, Matt Warner wrote:

> Aha! Thanks for pointing that out. It's indexing now.
>
> Thanks!
>
> Matt
>
> On Sun, Jan 30, 2011 at 9:12 AM, Tom Lane wrote:
>
>> Matt Warner writes:
>>> Doesn't seem to work either. Maybe something changed in 9.1?
>>> create index test_idx on testtable
>>>     using gin(to_tsvector(wordcolumn||' '||reverse(wordcolumn)));
>>> ERROR:  functions in index expression must be marked IMMUTABLE
>>
>> That's not the same case he tested. The single-parameter form of
>> to_tsvector isn't immutable, because it depends on the default text
>> search configuration parameter. It should work, AFAICS, with the
>> two-parameter form.
>>
>>         regards, tom lane
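For reference, a sketch of what the reversed-word trick covers, reusing
Oleg's spot_toulouse example (the fragment 'jag' and its reversal 'gaj' are
illustrative): the first prefix term matches words beginning with 'jag', the
second matches words ending in 'jag', and a 'jag' buried in the middle of a
word matches neither.

    select clean_name from spot_toulouse
    where to_tsvector('french', clean_name || ' ' || reverse(clean_name))
          @@ to_tsquery('french', 'jag:* | gaj:*');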
Re: [GENERAL] Full Text Index Scanning
Matt Warner writes:

> If I understand this, it looks like this approach allows me to match the
> beginnings and endings of words, but not the middle sections.

Yeah, probably. You might consider using contrib/pg_trgm instead if you need
arbitrary substrings.

        regards, tom lane
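A minimal pg_trgm sketch, borrowing the table and column names Matt uses in
his next message and assuming the module is installed: on 9.0 a trigram index
accelerates the % similarity operator (LIKE acceleration for trigram indexes
arrived in later releases).

    create index test_trgm_idx on test using gist(columnname gist_trgm_ops);
    select * from test where columnname % 'jag';  -- rows similar to 'jag'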
[GENERAL] New index structure with Open MP
Hi pgsql-general mailing list users,

I have a question related to the development of a new index structure. I am
writing my master's thesis on index structures and the possibility of
parallelizing them. I already found a post in the archives regarding OpenMP,
but my question is somewhat different. I am currently not aiming at a
production-ready implementation; a prototype is sufficient.

I already checked whether I could use a different database, e.g. Apache Derby
or MySQL (because they are already multithreaded), but it is rather
complicated to extend them; I think adding new index structures within one of
them is not intended. So, long story short, PostgreSQL is optimal for
development of a new index structure and well documented (yeah! really!
great! thanks a lot for that!).

I am not aiming for full parallelization; only some parts of the algorithms
for build, insert and search are going to be extended with OpenMP. E.g. I
want to parallelize searching in multiple pages by directing some portions to
one thread and other portions to another thread. Do you think that this small
amount of parallelization is possible? Or will there be complications with
the methods used by the buffer manager and so on? What do you think? What are
your thoughts?

Greets,
Yves W.
Re: [GENERAL] New index structure with Open MP
Yves Weißig writes:

> I am not aiming for full parallelization, only some parts of the algorithms
> for build, insert and search are going to be extended with OpenMP. E.g. I
> want to parallelize searching in multiple pages by directing some portions
> to one thread and other portions to another thread. Do you think that this
> small amount of parallelization is possible? Or will there be complications
> with the methods used by the buffer manager and so on? What do you think?
> What are your thoughts?

The backend code is not designed for thread safety. This is not a case where
"only a little bit" of parallelism is going to be safe. It *will* break.

        regards, tom lane
Re: [GENERAL] iPad and Pg revisited...
Trapped in Steve Jobs' Reality Distortion Field

On Jan 25, 2011, at 8:21 AM, John DeSoi wrote:

> On Jan 24, 2011, at 3:25 PM, Jerry LeVan wrote:
>
>> I assume that if I were to jump to Pg 9.x.x that phpPgAdmin would die, yes?
>
> I have not tried it, but my guess is it will work. I don't recall seeing
> that there were any major protocol changes for version 9, so I suspect
> whatever libpq version is linked to PHP should work just fine with
> Postgres 9.

You are correct :) I upgraded three of my Mac systems to 9.0.2 from 8.4.4.
All went smoothly, except that one of my GUI apps quit displaying selection
results... It turns out that the status I was checking now returns 'SELECT '
instead of just 'SELECT'. It took more than a few minutes to find and fix the
problem.

Jerry

> John DeSoi, Ph.D.
Re: [GENERAL] Adding more space, and a vacuum question.
On 01/31/2011 12:14 AM, Herouth Maoz wrote:

> On 30/01/2011, at 12:27, Craig Ringer wrote:
>
>> OK, so you're pre-8.4, which means you have the max_fsm settings to play
>> with. Have you seen any messages in the logs about the free space map
>> (fsm)? If your install didn't have a big enough fsm to keep track of
>> deleted tuples, you'd face massive table bloat that a regular vacuum
>> couldn't fix.
>
> Ouch. You're absolutely right. There are messages about max_fsm_pages in
> the postgres log. It's currently set to 153600. According to the
> documentation, I can increase it up to 20. Will that even help? How do I
> find out how many I need to set it to?

I think the logs suggest what to set. I haven't used 8.3 in ages and don't
remember well.

Increasing it won't help after the fact. You almost certainly have badly
bloated tables. Fixing that will be interesting in your current
low-disk-space situation.

VACUUM FULL would work - but it will exclusively lock the table being
vacuumed for *ages*, so nothing else can do any work, not even reads. CLUSTER
will do the same, and while it's much faster, to work it requires enough free
disk space to store a complete copy of the still-valid parts of the table
while the bloated original is still on disk. You may have to look into some
of the lockless fake-vacuum-full approaches.

I think table bloat identification and management is one of the worst
problems PostgreSQL has remaining. It's too hard, out of the box, to discover
bloat developing, and it's too disruptive to fix it if and when it does
happen. The automatic free space map management in 8.4, and the ongoing
autovacuum improvements, help reduce the chances of bloat happening, but it's
still a pain to monitor for and a pain to fix when it does happen.

For approaches to possibly fixing your problem, see:

http://www.depesz.com/index.php/2010/10/17/reduce-bloat-of-table-without-longexclusive-locks/
http://blog.endpoint.com/2010/09/reducing-bloat-without-locking.html

--
Craig Ringer
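Of the two, a sketch of the CLUSTER route (the index name below is
hypothetical; CLUSTER ... USING is available on 8.3, takes an exclusive lock
for its duration, and needs enough free disk for a copy of the live rows):

    CLUSTER billing__archive USING billing__archive_pkey;
    ANALYZE billing__archive;  -- CLUSTER does not update planner statistics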
Re: [GENERAL] Full Text Index Scanning
Thanks. pg_trgm looks interesting, but after installing the pg_trgm.sql, I
get error messages when following the documentation.

sggeeorg=> create index test_idx on test using gist(columnname gist_trgm_ops);
ERROR:  operator class "gist_trgm_ops" does not exist for access method "gist"
STATEMENT:  create index test_idx on test using gist(columnname gist_trgm_ops);
ERROR:  operator class "gist_trgm_ops" does not exist for access method "gist"

On Sun, Jan 30, 2011 at 10:36 AM, Tom Lane wrote:

> Matt Warner writes:
>> If I understand this, it looks like this approach allows me to match the
>> beginnings and endings of words, but not the middle sections.
>
> Yeah, probably. You might consider using contrib/pg_trgm instead if
> you need arbitrary substrings.
>
>         regards, tom lane
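That error usually means the contrib script never ran in this particular
database, or that its objects landed in a schema that isn't on the
search_path - a guess, since the thread doesn't resolve it. On 9.0 the module
is loaded per database; the path below is an assumption and varies by
installation:

    \i /usr/local/pgsql/share/contrib/pg_trgm.sql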
Re: [GENERAL] One last Ruby question for tonight - Regarding UUID type
Each database adapter in ActiveRecord sets up a mapping between ActiveRecord
types and the native database types. If the type is not defined, it just
defaults to a string. If you are using Rails, in one of your environment
initializers, you can add the following code:

ActiveRecord::Base.connection.native_database_types[:uuid] = 'uuid' # this is using ActiveRecord 3.0.3, but also works with 2.3.8

Then, when you add the column type in the migration as "uuid", it should
work. The reverse mapping, which allows AR to use the UUID as a string, is
already handled:

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb#L67

Jonathan

On Jan 28, 2011, at 9:28 PM, Mike Christensen wrote:

>> My goal is to learn Ruby by porting one of my existing PG web
>> applications over to Rails. However, my existing data heavily relies
>> on the UUID data type. I've noticed when I create a new model with
>> something like:
>>
>> guidtest name:string value:uuid
>>
>> And then do a rake:migrate, the CREATE TABLE that gets generated looks
>> like:
>>
>> CREATE TABLE guidtests
>> (
>>   id serial NOT NULL,
>>   "name" character varying(255),
>>   created_at timestamp without time zone,
>>   updated_at timestamp without time zone,
>>   CONSTRAINT guidtests_pkey PRIMARY KEY (id)
>> )
>> ...
>>
>> In other words, it just ignores my "uuid" type. However, the views
>> and stuff do include this column, so the page will crash when I load it
>> since the column doesn't exist in the DB.
>> Is there some special thing I have to do to use the uuid type in
>> ActiveRecord? Thanks!
>
> Update: If I manually add the column in using pgAdmin (as a uuid type
> of course), the program actually runs (I can create new rows and
> display data).. So RoR does support this type (probably gets
> marshalled as a string??) but I guess the ActiveRecord schema
> generation stuff just doesn't support uuid. Hmmm.
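With the :uuid mapping in place, the migration should generate DDL like the
following (a sketch extrapolated from Mike's quoted example, not captured
Rails output):

    CREATE TABLE guidtests
    (
      id serial NOT NULL,
      "name" character varying(255),
      value uuid,
      created_at timestamp without time zone,
      updated_at timestamp without time zone,
      CONSTRAINT guidtests_pkey PRIMARY KEY (id)
    );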
Re: [GENERAL] New index structure with Open MP
Hi, thanks for the answer. I understand that the backend is not thread-safe,
but would it be possible to parallelize, say, a big for-loop without breaking
anything?

Greets,
Yves W.

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: Sunday, January 30, 2011 10:38 PM
To: weis...@rbg.informatik.tu-darmstadt.de
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] New index structure with Open MP

Yves Weißig writes:

> I am not aiming for full parallelization, only some parts of the
> algorithms for build, insert and search are going to be extended
> with OpenMP. E.g. I want to parallelize searching in multiple pages by
> directing some portions to one thread and other portions to another
> thread. Do you think that this small amount of parallelization is
> possible? Or will there be complications with the methods used by the
> buffer manager and so on? What do you think? What are your thoughts?

The backend code is not designed for thread safety. This is not a case where
"only a little bit" of parallelism is going to be safe. It *will* break.

        regards, tom lane