On Friday 15 April 2011 02:13:51 christian wrote:
> Hello,
>
> I'm not very experienced in Python. Is there a way of doing the
> below more memory efficiently and maybe faster?
> I import a 2-column file and then concatenate, for every unique
> value in the first column (key), the values from the second
> column.
>
> So the output is something like this:
> A,1,2,3
> B,3,4
> C,9,10,11,12,90,34,322,21
>
>
> Thanks in advance & regards,
> Christian
>
>
> import csv
> import random
> import sys
> from itertools import groupby
> from operator import itemgetter
>
> f=csv.reader(open(sys.argv[1]),delimiter=';')
> z=[[i[0],i[1]] for i in f]
> z.sort(key=itemgetter(0))
> mydict = dict((k,','.join(map(itemgetter(1), it)))
>                for k, it in groupby(z, itemgetter(0)))
> del(z)
>
> f = open(sys.argv[2], 'w')
> for k,v in mydict.iteritems():
>     f.write(v + "\n")
>
> f.close()

Here are two alternative solutions. The second one, with generators, is
probably the most economical as far as RAM usage is concerned.
For your example, data1.txt is taken as follows:

A, 1
B, 3
C, 9
A, 2
B, 4
C, 10
A, 3
C, 11
C, 12
C, 90
C, 34
C, 322
C, 21

The "two in one" program is:

#!/usr/bin/env python3
'''generate.py - Example of reading long two column csv list and sorting.
Thread "memory usage multi value hash"
'''

# Determine a set of unique column 1 values
unique_set = set()
with open('data1.txt') as f:
    for line in f:
        unique_set.add(line.split(',')[0])

print(unique_set)

# First solution: one list of values per key, re-reading the file for each key
with open('data1.txt') as f:
    for x in unique_set:
        ls = [line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x]
        print(x.rstrip(), ','.join(ls))
        f.seek(0)  # rewind the file for the next key

print('\n Alternative solution with generators')
with open('data1.txt') as f:
    for x in unique_set:
        # The generator yields one value at a time instead of building a list
        gs = (line.split(',')[1].rstrip() for line in f
              if line.split(',')[0].rstrip() == x)
        s = ''
        for ds in gs:
            s = s + ds
        print(x.rstrip(), s)
        f.seek(0)  # rewind the file for the next key

The output is:

{'A', 'C', 'B'}
A 1, 2, 3
C 9, 10, 11, 12, 90, 34, 322, 21
B 3, 4

 Alternative solution with generators
A 1 2 3
C 9 10 11 12 90 34 322 21
B 3 4

Notice that the data lines could come in a different sequence without
changing how the values are grouped. Both solutions read the file once per
unique key, so they trade running time for lower memory use.

OldAl.
-- 
Algis
http://akabaila.pcug.org.au/StructuralAnalysis.pdf
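
For comparison, here is a minimal sketch of a single-pass variant. It is not one of the two programs above: it swaps in collections.defaultdict so the file is read only once, at the cost of holding all grouped values in memory. The function names group_values and format_groups are made up for this illustration, and the sample rows are the data1.txt example typed in as a list so the sketch runs without a file.

```python
import csv
from collections import defaultdict


def group_values(rows):
    """Collect second-column values under each first-column key in one pass."""
    groups = defaultdict(list)
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        key, value = row[0].strip(), row[1].strip()
        groups[key].append(value)
    return groups


def format_groups(groups):
    """Return lines such as 'A,1,2,3', sorted by key for a stable order."""
    return [','.join([key] + values) for key, values in sorted(groups.items())]


# The example data from data1.txt, as an in-memory list of lines
# (csv.reader accepts any iterable of strings, not only file objects).
sample = [
    "A, 1", "B, 3", "C, 9", "A, 2", "B, 4", "C, 10", "A, 3",
    "C, 11", "C, 12", "C, 90", "C, 34", "C, 322", "C, 21",
]

for line in format_groups(group_values(csv.reader(sample))):
    print(line)
```

To use it on a real file, pass csv.reader(f) for an open file f instead of the sample list, adding delimiter=';' if the file is semicolon-separated as in the original question. Whether this beats the generator version depends on the data: with many unique keys the single pass should be much faster, but it keeps every value in memory until the output is written.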