Shawn (and anyone else who will listen):
I'm already running out of RAM (actually, virtual memory/page file space) just trying to display all the job titles, without even joining them with anything. I have to use a LIKE clause just to get a portion of them. So, I could:

(1) Have multiple database connections open concurrently, where the first one joins everything except the keywords. As I iterate through the first result set with the fetch function, I could retrieve a list of keyword foreign keys for each row over a second database connection and store this in a second result set. Is this a common approach? Are secondary database connections cheap?

(2) Store the first join in a hashmap and then iterate over it, but I've already demonstrated that the hashmap takes too much memory.

(3) Create a new string column for each job title, containing a comma-separated list of integer foreign keys for the keywords. This is the non-normalized option, and you discouraged this approach.

(4) Use a very wide result set. Let's assume I have a job title (joined with a job posting and company) with 26 keywords. That means 26 rows in the result set are identical except for the keyword foreign key (FK) column. I then have to add the logic to detect that everything except the keyword FK column is identical. Are you advocating this approach? It seems to require a lot of computer space, computer time, and (worst of all) my time. I believe this is the classical approach, however.

Which would you choose?

Thanks,
Siegfried

_____

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, July 25, 2005 9:31 AM
To: Siegfried Heintze
Cc: mysql@lists.mysql.com
Subject: Re: Alternatives to performing join on normalized joins?

"Siegfried Heintze" <[EMAIL PROTECTED]> wrote on 07/24/2005 11:35:36 AM:

> I have a large number of job titles (40K). Each job title has multiple
> keywords, making a one-to-many parent-child relationship.
> If I join job title with company name, address, company url, company city,
> job name, job location, job url (etc...), I have a mighty wide result set
> that will be repeated for each keyword.
>
> What I have done in the past (in a different, much smaller, application) is
> perform a join of everything except the keyword and store everything in a
> hashmap.
>
> Then I iterate through each wide row in the hashmap and perform a separate
> SELECT statement for each row in this hashmap to fetch the multiple keywords.
>
> Whew! That would be a lot of RAM (and paging) for this application.
>
> Are there any other more efficient approaches?
>
> Thanks,
> Siegfried

There are two major classes of efficiency when dealing with any RDBMS: time efficiency (faster results) and space efficiency (stored data takes less room on the disk). Which one are you worried about?

If it were me, I would start with all of the data normalized:

* a Companies table (name, address, url, city, etc.)
* a Job Titles table (a list of names)
* a Keywords table (a list of words used to describe Job Titles)
* a JobPosting table (relates Companies to Job Titles; should also be used to track things like date posted, date closed, salary offered, etc.)
* a Postings_Keywords table (matches a Posting to multiple Keywords)

I would only denormalize if testing showed a dramatic improvement in performance by doing so.

I would think that the Job Title to Keyword relationship would be different between Companies. One company posting for a "Programmer" may want VB while another wants PHP and Perl. By associating the Keywords with a Posting (and not just the Job Title), you can make that list Company-specific.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine