Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Carlo Stonebanks
> Maybe "group by", "order by", "distinct on" and hand-written functions > and aggregates (like first() or best()) may help. We use these - we have lexical analysis functions which assign a rating to each row in a set, and the likelyhood that the data is a match, and then we sort our results. I

Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Carlo Stonebanks
> I still think that using a PL in the backend might be more performant > than having an external client, alone being the SPI interface more > efficient compared to the network serialization for external applications. I would actually love for this to work better, as this is technology that I wou

Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Markus Schaber
Hi, Carlo, Carlo Stonebanks wrote: >> Trying to achieve a high level of data quality in one large project is >> not often possible. Focus on the most critical areas of checking and get >> that working first with acceptable performance, then layer on additional >> checks while tuning. The complexi

Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Markus Schaber
Hi, Carlo, Carlo Stonebanks wrote: >> Did you think about putting the whole data into PostgreSQL using COPY in >> a nearly unprocessed manner, index it properly, and then use SQL and >> stored functions to transform the data inside the database to the >> desired result? > > This is actually what

Re: [PERFORM] Performace Optimization for Dummies

2006-10-02 Thread Carlo Stonebanks
> 1. fork your import somhow to get all 4 cores running This is already happening, albeit only 3. No improvement - it appears we have taken the same problem, and divided it by 3. Same projected completion time. this is really curious, to say the least. > 2. write the code that actually does the

Re: [PERFORM] Performace Optimization for Dummies

2006-10-02 Thread Carlo Stonebanks
> My experience with that type of load process is that doing this > row-by-row is a very expensive approach and your results bear that out. I expected this, and had warned the client before the project started that this is exactly where SQL underperforms. > It is often better to write each step

Re: [PERFORM] Performace Optimization for Dummies

2006-10-02 Thread Carlo Stonebanks
> Did you think about putting the whole data into PostgreSQL using COPY in > a nearly unprocessed manner, index it properly, and then use SQL and > stored functions to transform the data inside the database to the > desired result? This is actually what we are doing. The slowness is on the row-by-

Re: [PERFORM] Performace Optimization for Dummies

2006-10-02 Thread Carlo Stonebanks
I have loaded three of the four cores by running three different versions of the import program to import three different segments of the table to import. The server jumps to 75% usage, with three postgresql processes eating up 25% each., the actual client itself taking up just a few ticks. "He

Re: [PERFORM] Performace Optimization for Dummies

2006-09-29 Thread Merlin Moncure
On 9/29/06, Carlo Stonebanks <[EMAIL PROTECTED]> wrote: For reasons I've exlained elsewhere, the import process is not well suited to breaking up the data into smaller segments. However, I'm interested in what can be indexed. I am used to the idea that indexing only applies to expressions that al

Re: [PERFORM] Performace Optimization for Dummies

2006-09-29 Thread Bill Moran
In response to "Carlo Stonebanks" <[EMAIL PROTECTED]>: > >> indexes. I don't know whether autovacuum will also analyze tables > >> for you automagically, but it would be a good idea to analyze the table > > > > It does. > > So, I have checked my log and I see an autovacuum running once every minu

Re: [PERFORM] Performace Optimization for Dummies

2006-09-29 Thread Markus Schaber
Hi, Carlo, Carlo Stonebanks wrote: > From what I can see, autovacuum is hitting the db's in question about once > every five minutes. Does this imply an ANALYZE is being done automatically > that would meet the requirements we are talking about here? Is there any > benefit ot explicitly perfor

Re: [PERFORM] Performace Optimization for Dummies

2006-09-29 Thread Markus Schaber
Hi, Carlo, Carlo Stonebanks wrote: > We urgently need a major performance improvement. Did you think about putting the whole data into PostgreSQL using COPY in a nearly unprocessed manner, index it properly, and then use SQL and stored functions to transform the data inside the database to the d

Re: [PERFORM] Performace Optimization for Dummies

2006-09-29 Thread Simon Riggs
On Thu, 2006-09-28 at 12:44 -0400, Carlo Stonebanks wrote: > At this early stage in the project, we are initializing our portal's > database with millions of rows of imported data in over 50 different > flattened tables; each table's structure is unique to the data provider. > This requires a p

Re: [PERFORM] Performace Optimization for Dummies

2006-09-29 Thread Heikki Linnakangas
Carlo Stonebanks wrote: We urgently need a major performance improvement. We are running the PostgreSQL 8.1.4 on a Windows 2003 x64 Server on a dual processor, dual core 3.2Ghz Xeon box with 4gb RAM and a RAID (sorry, I don't know what type) disc subsystem. Sorry about the long intro, but here

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
>> indexes. I don't know whether autovacuum will also analyze tables >> for you automagically, but it would be a good idea to analyze the table > > It does. So, I have checked my log and I see an autovacuum running once every minute on our various databases being hosted on the server - once every

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Tom Lane
"Carlo Stonebanks" <[EMAIL PROTECTED]> writes: > But is there any way to create an index expression that will help with: > where foo like '%bar%'? If you are concerned about that, what you are probably really looking for is full-text indexing. See contrib/tsearch2 for our current best answer to t

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
> Don't forget about EXPLAIN ANALYZE, which is related to EXPLAIN but has > nothing to do with the ANALYZE command. Ah, hence my confusion. Thanks for the clarification... I never knew about ANALYZE as a seperate command. ---(end of broadcast)--

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
> by the way, stats_command_string is a known performance killer that > iirc was improved in 8.2. just fyi. This is a handy fact, I will get on this right away. > bulk load denomalized tables into scratch tables into the postgresql > database. create indexes appropriate to the nomalization proces

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
> imo, the key to high performance big data movements in postgresql is > mastering sql and pl/pgsql, especially the latter. once you get good > at it, your net time of copy+plpgsql is going to be less than > insert+tcl. If this implies bulk inserts, I'm afraid I have to consider something else.

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
> Something else to consider... databases love doing bulk operations. It > might be useful to load prospective data into a temporary table, and > then do as many operations as you can locally (ie: within the database) > on that table, hopefully eleminating as many candidate rows as possible > along

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Matthew Nuzum
Carlo Stonebanks wrote: > Lots of great info here, I will see what applies to my situation. However, I > don't see bulk inserts of the tables working, because all of the tables need > to be refreshed as values to deduplicate and match will change with every > row added. In order for this to work

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Jim C. Nasby
On Thu, Sep 28, 2006 at 02:04:21PM -0700, Steve Atkins wrote: > I think you're confusing "explain" and "analyze". "Explain" gives you > human readable output as to what the planner decided to do with the > query you give it. Don't forget about EXPLAIN ANALYZE, which is related to EXPLAIN but has

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Steve Atkins
On Sep 28, 2006, at 12:10 PM, Carlo Stonebanks wrote: Are you wrapping all this in a transaction? Yes, the transactions can typically wrap 1 to 10 single-table, single-row inserts and updates. You're doing some dynamically generated selects as part of the "de-duping" process? They're

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
Lots of great info here, I will see what applies to my situation. However, I don't see bulk inserts of the tables working, because all of the tables need to be refreshed as values to deduplicate and match will change with every row added. In order for this to work, i would have to write queries

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Merlin Moncure
On 9/28/06, Carlo Stonebanks <[EMAIL PROTECTED]> wrote: The deduplication process requires so many programmed procedures that it runs on the client. Most of the de-dupe lookups are not "straight" lookups, but calculated ones emplying fuzzy logic. This is because we cannot dictate the format of ou

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
The deduplication process requires so many programmed procedures that it runs on the client. Most of the de-dupe lookups are not "straight" lookups, but calculated ones emplying fuzzy logic. This is because we cannot dictate the format of our input data and must deduplicate with what we get. Th

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Merlin Moncure
On 9/28/06, Carlo Stonebanks <[EMAIL PROTECTED]> wrote: > are you using the 'copy' interface? Straightforward inserts - the import data has to transformed, normalised and de-duped by the import program. I imagine the copy interface is for more straightforward data importing. These are - buy nece

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
> So your program first transforms the data and then inserts it? And it is > the transforming process which is running select statements that is slow? There are cross-referencing and deduplication processes. Does this person have an office at this exact address? In a similarily named building in

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
> Are you wrapping all this in a transaction? Yes, the transactions can typically wrap 1 to 10 single-table, single-row inserts and updates. > You're doing some dynamically generated selects as part of the > "de-duping" process? They're probably the expensive bit. What > do those queries tend

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Matthew Nuzum
Carlo Stonebanks wrote: >> are you using the 'copy' interface? > > Straightforward inserts - the import data has to transformed, normalised and > de-duped by the import program. I imagine the copy interface is for more > straightforward data importing. These are - buy necessity - single row > i

RES: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Leandro GuimarĂ£es dos Santos
Hi All, I have a Java application using hiber nate that connects to PostgreSQl 8.1.4. So, looking forward the log files I got the following error: 2006-09-28 09:24:25 LOG: unexpected EOF on client connection 2006-09-28 09:26:06 LOG: unexpected EOF on client connection 2006-09-28 09:48:24 LOG:

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Jim C. Nasby
On Thu, Sep 28, 2006 at 01:53:22PM -0400, Carlo Stonebanks wrote: > > are you using the 'copy' interface? > > Straightforward inserts - the import data has to transformed, normalised and > de-duped by the import program. I imagine the copy interface is for more > straightforward data importing.

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Jim C. Nasby
On Thu, Sep 28, 2006 at 01:47:44PM -0400, Carlo Stonebanks wrote: > > How are you loading the tables? Copy? Insert? > > Once the data is transformed, it is inserted. I don't have stats, but the > programs visual feedback does not spend a lot of time on the "inserting > data" message. Then again,

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Jim C. Nasby
On Thu, Sep 28, 2006 at 10:11:31AM -0700, Joshua D. Drake wrote: > > 4) Can anyone recommend any commercial PostgreSQL service providers that > > may > > be able to swiftly come in and assist us with our performance issues? > > http://www.commandprompt.com/ (disclaimer, I am an employee) You f

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Dave Dutcher
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Carlo Stonebanks > Subject: [PERFORM] Performace Optimization for Dummies > > At this early stage in the project, we are initializing our portal's > database with millions of r

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Steve Atkins
On Sep 28, 2006, at 10:53 AM, Carlo Stonebanks wrote: are you using the 'copy' interface? Straightforward inserts - the import data has to transformed, normalised and de-duped by the import program. I imagine the copy interface is for more straightforward data importing. These are - buy

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
> are you using the 'copy' interface? Straightforward inserts - the import data has to transformed, normalised and de-duped by the import program. I imagine the copy interface is for more straightforward data importing. These are - buy necessity - single row inserts. > thats a tough question.

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
> How are you loading the tables? Copy? Insert? Once the data is transformed, it is inserted. I don't have stats, but the programs visual feedback does not spend a lot of time on the "inserting data" message. Then again, if there is an asynchronous component to an insert, perhaps I am not seein

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Merlin Moncure
On 9/28/06, Carlo Stonebanks <[EMAIL PROTECTED]> wrote: We urgently need a major performance improvement. We are running the PostgreSQL 8.1.4 on a Windows 2003 x64 Server on a dual processor, dual core 3.2Ghz Xeon box with 4gb RAM and a RAID (sorry, I don't know what type) disc subsystem. Sorry a

Re: [PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Joshua D. Drake
> The import is slow - and degrades as the tables grow. With even more > millions of rows in dozens of import tables to come, the imports will take > forever. My ability to analyse the queries is limited; because of the nature > of the import process, the SQL queries are mutable, every imported

[PERFORM] Performace Optimization for Dummies

2006-09-28 Thread Carlo Stonebanks
I am a software developer who is acting in a (temporary) dba role for a project. I had recommended PostgreSQL be brought in to replace the proposed MySQL DB - I chose PostgreSQL because of its reputation as a more stable solution than MySQL. At this early stage in the project, we are initializi