Shawn (and anyone else who will listen):
I'm already running out of RAM (actually, virtual memory/page file space) just trying to display all the job titles, without even joining them with anything. I have to use a LIKE clause just to get a portion of them. So, I could:

(1) Have multiple database connections open concurrently, where the first one joins everything except the keywords. As I iterate through the first result set with the fetch function, I could retrieve a list of keyword foreign keys for each row over a second database connection and store this in a second result set. Is this a common approach? Are secondary database connections cheap?

(2) Store the first join in a hashmap and then iterate over it, but I've already demonstrated that the hashmap takes too much memory.

(3) Create a new string column for each job title, containing a comma-separated list of integer foreign keys for the keywords. This is the non-normalized option, and you discouraged this approach.

(4) Use a very wide result set. Let's assume I have a job title (joined with a job posting and company) with 26 keywords. That means 26 rows in the result set are identical except for the keyword foreign key (FK) column. I then have to add the logic to detect that everything except the keyword FK column is identical. Are you advocating this approach? It seems to require a lot of computer space, computer time, and (worst of all) my time. I believe this is the classical approach, however.

Which would you choose?

Thanks,
Siegfried

_____

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, July 25, 2005 9:31 AM
To: Siegfried Heintze
Cc: mysql@lists.mysql.com
Subject: Re: Alternatives to performing join on normalized joins?

"Siegfried Heintze" <[EMAIL PROTECTED]> wrote on 07/24/2005 11:35:36 AM:

> I have a large number of job titles (40K). Each job title has multiple
> keywords, making a one-to-many parent-child relationship.
> If I join job title with company name, address, company url, company city,
> job name, job location, job url (etc...), I have a mighty wide result set
> that will be repeated for each keyword.
>
> What I have done in the past (in a different, much smaller, application) is
> perform a join of everything except the keyword and store everything in a
> hashmap.
>
> Then I iterate through each wide row in the hashmap and perform a separate
> SELECT statement for each row in this hashmap to fetch the multiple keywords.
>
> Whew! That would be a lot of RAM (and paging) for this application.
>
> Are there any other more efficient approaches?
>
> Thanks,
> Siegfried

There are two major classes of efficiency when dealing with any RDBMS: time efficiency (faster results) and space efficiency (stored data takes less room on the disk). Which one are you worried about?

If it were me, I would start with all of the data normalized:

* a Companies table (name, address, url, city, etc.)
* a Job Titles table (a list of names)
* a Keywords table (a list of words used to describe Job Titles)
* a JobPosting table (relates Companies to Job Titles; should also be used to track things like date posted, date closed, salary offered, etc.)
* a Postings_Keywords table (matches a Posting to multiple Keywords)

I would only denormalize if testing showed a dramatic improvement in performance by doing so.

I would think that the Job Title to Keyword relationship would be different between Companies. One company posting for a "Programmer" may want VB while another wants PHP and Perl. By associating the Keywords with a Posting (and not just the Job Title), you can make that list Company-specific.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine