On 13/02/15 01:06, andy van wrote:
Hi, I'm trying to compare two CSV files (and many more like the ones below). I
tried many ways, using lists, DictReader and more, but nothing gave me the
output I require. I want to compare all those rows that have the same
!Sample_title and !Sample_geo_accession values (whose positions vary). I've
been struggling with this for three days now and couldn't come to a
solution. I'd highly appreciate any help.

I'm afraid your sample data was insufficient for me to figure out your
criteria for doing additions/deletions etc. However, given that the
fields have the same names but varying positions, it sounds like
csv.DictReader would be the best starting point.
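
As a rough illustration of why DictReader helps here, the sketch below
reads a file into a dictionary keyed on the two matching fields, so the
column order no longer matters. The helper name load_rows and the
KEY_FIELDS constant are my own inventions for the example, and the
inline data is just your CSV2 pasted into a string; treat it as a
starting point, not a finished solution.

```python
import csv
import io

# Assumption: these two fields together identify a row.
KEY_FIELDS = ("!Sample_title", "!Sample_geo_accession")

def load_rows(fileobj, key_fields=KEY_FIELDS):
    """Return {key tuple: row dict} for one CSV file (hypothetical helper)."""
    reader = csv.DictReader(fileobj)
    # Fields are looked up by name, so their position in the file is irrelevant.
    return {tuple(row[f] for f in key_fields): row for row in reader}

# CSV2 from the original post, held in a string for the demonstration.
csv2_text = """\
!Sample_title,!Sample_type,!Sample_source_name_ch1,!Sample_geo_accession
AsDW,ribonucleic acid,FB_0,GSM501444
foreign,ribonucleic acid,FB,GSM501449
HJCENV,RNA,12wk,GSM501446
"""

rows2 = load_rows(io.StringIO(csv2_text))
print(rows2[("AsDW", "GSM501444")]["!Sample_type"])
```

With real files you would pass an open file object (opened with
newline="") instead of the io.StringIO wrapper.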

Can you show us what you tried using the DictReader approach?
Also, can you write a function that compares two lines and returns
an action code (say: 0 = nothing, 1 = add, 2 = delete, 3 = change)?
That would encapsulate your "business logic" and leave only the issue
of comparing the two files.
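
To show the shape of function I have in mind, here is a minimal sketch.
The rules in it are placeholders of my own, since I could not work out
your real criteria: a missing line is represented by None, and two
lines count as "changed" if any field they have in common differs. The
name compare_rows and the constant names are likewise hypothetical.

```python
# Action codes as suggested above.
NOTHING, ADD, DELETE, CHANGE = 0, 1, 2, 3

def compare_rows(first, second):
    """Compare two row dicts and return an action code.

    Placeholder rules (assumptions, not the poster's real criteria):
    None stands for "no such line in that file", and at least one
    argument is expected to be a dict.
    """
    if first is None:
        return ADD       # line exists only in the second file
    if second is None:
        return DELETE    # line exists only in the first file
    # Only fields present in both rows are compared.
    shared = first.keys() & second.keys()
    if any(first[name] != second[name] for name in shared):
        return CHANGE
    return NOTHING

row_a = {"!Sample_title": "AsDW", "!Sample_source_name_ch1": "FB_0_12wk"}
row_b = {"!Sample_title": "AsDW", "!Sample_source_name_ch1": "FB_0"}
print(compare_rows(row_a, row_b))
```

Once your own rules live in a function like this, the remaining work is
just pairing up lines from the two files and calling it.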

CSV1:

!Sample_title,!Sample_geo_accession,!Sample_status,!Sample_type,!Sample_source_name_ch1
body,GSM501443,Public on july 22 2010,ribonucleic acid,FB_50_12wk
foreign,GSM501445,Public on july 22 2010,ribonucleic acid,FB_0_12wk
HJCENV,GSM501446,Public on july 22 2010,ribonucleic acid,FB_50_12wk
AsDW,GSM501444,Public on july 22 2010,ribonucleic acid,FB_0_12wk

CSV2:

!Sample_title,!Sample_type,!Sample_source_name_ch1,!Sample_geo_accession
AsDW,ribonucleic acid,FB_0,GSM501444
foreign,ribonucleic acid,FB,GSM501449
HJCENV,RNA,12wk,GSM501446

Desired output (with respect to CSV2):

Added:
{!Sample_status:{HJCENV:Public on july 22 2010,AsDW:Public on july 22 2010}} #Added columns, not rows.

Deleted:
{} #Since nothing's deleted with respect to CSV2

Changed:

{!Sample_title:AsDW,!Sample_source_name_ch1:(FB_0_12wk,FB_0),!Sample_geo_accession:GSM501444
!Sample_title:HJCENV,!Sample_type:(ribonucleic acid,RNA),!Sample_source_name_ch1:(FB_50_12wk,12wk),!Sample_geo_accession:GSM501446}
#foreign,ribonucleic acid,FB,GSM501449 doesn't come here since the !Sample_geo_accession column value didn't match.


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
