Re: Query to select...

SGreen Tue, 24 May 2005 08:03:22 -0700

Michael Stassen <[EMAIL PROTECTED]> wrote on 05/24/2005 10:26:14 
AM:


> [EMAIL PROTECTED] wrote:
> 
> > Dan Bolser <[EMAIL PROTECTED]> wrote on 05/24/2005 06:08:32 AM:
> > 
> >>Hello,
> >>
> >>I have data like this
> >>
> >>PK   GRP_COL
> >>1   A
> >>2   A
> >>3   A
> >>4   B
> >>5   B
> >>6   B
> >>7   C
> >>8   C
> >>9   C
> >>
> >>And I want to write a query to select data like this...
> >>
> >>PK   FK   GRP_COL
> >>1   1   A
> >>2   1   A
> >>3   1   A
> >>4   4   B
> >>5   4   B
> >>6   4   B
> >>7   7   C
> >>8   7   C
> >>9   7   C
> >>
> >>Where FK is a random (or otherwise) member of PK from within the
> >>appropriate group given by GRP_COL. FK recreates the grouping from
> >>GRP_COL, but in terms of PK. I want to do this because GRP_COL is
> >>difficult to handle and I want to re-represent the grouping in terms 
of 
> > 
> > PK
> > 
> >>(this allows me to link data into the grouping more easily).
> >>
> >>Is there a simple way to do this?
> > 
> > I don't understand your choice of column name for your new column. PK 
> > generally means PRIMARY KEY and FK generally means FOREIGN KEY. 
FOREIGN 
> > KEYs are used to enforce relational data integrity between tables. 
What it 
> > looks like you want to do is to tag every row in a group with the 
lowest 
> > (minimum) PK value for that group. To me, that is not a FK.
> > 
> > To do what you want will either take a subquery or a separate table. I 

> > think the separate table will perform faster so I will show you that 
> > method.
> > 
> > CREATE TEMPORARY TABLE tmpPK (KEY GRP_COL)
> > SELECT GRP_COL,MIN(PK) as minpk
> > FROM datatable
> > GROUP BY GRP_COL;
> > 
> > ALTER TABLE datatable ADD COLUMN FK INT;
> > 
> > UPDATE datatable
> > INNER JOIN tmpPK
> >         ON tmpPK.GRP_COL = datatable.GRP_COL
> > SET datatable.FK = tmpPK.minpk;
> > 
> > DROP TEMPORARY TABLE tmpPK;
> > 
> > The slowest part of all of this will be adding the column to your 
table.
> > 
> > Shawn Green
> > Database Administrator
> > Unimin Corporation - Spruce Pine
> 
> I was writing a similar answer, but you beat me to it.
> 
> This is the right idea, but I don't think you go far enough.  This 
> solution does what Dan asks, but I don't think it's what he really 
> needs.  Surely, the problem is that the data isn't normalized.  GRP_COL 
> contains repeated strings and is "difficult to handle".  I'd suggest a 
> permanent, rather than temporary, fix.  Something like:
> 
>    CREATE TABLE groups
>    (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY)
>    SELECT DISTINCT GRP_COL AS group_name
>    FROM datatable;
> 
>    ALTER TABLE datatable ADD grp_id INT UNSIGNED NOT NULL;
> 
>    UPDATE datatable
>    JOIN groups ON datatable.GRP_COL = groups.group_name
>    SET datatable.grp_id = groups.id;
> 
> Check first, then
> 
>    ALTER TABLE datatable DROP COLUMN GRP_COL;
> 
> Now you join to the new groups table when you need the group name.
> 
> Michael
> 

I agree, especially with the additional information the OP provided about 
his REAL table structure. A separate groups table makes better sense. 

Let this be an object lesson to others looking for assistance: If you want 
timely and useful assistance, provide real and complete information 
whenever possible. SHOW CREATE TABLE gives much better information than 
DESCRIBE or SHOW COLUMNS. If you do chose to obfuscate your real table and 
data layout then be prepared to translate whatever advice you receive. 

Sorry it took so long to get to the bottom of this design issue.

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Re: Query to select...

Reply via email to