In <[EMAIL PROTECTED]>, Farraige wrote:

> Let's say we have a table T1:
>
> A B C D E
> ---------------
> 1 4 5 7 7
> 3 4 0 0 0
>
> and we call a method mergeTable(T1, T2, [0,1], [2,4])
>
> It means that we would like to update columns C and E of table T1 with
> data from table T2 but only in case the key columns A and B are equal
> in both tables.... I grant that the given key is unique in both tables
> so if I find a row with the same key in table T2 I do merging, stop and
> go to next row in table T1...
>
> Let's say T2 looks following:
>
> A B C D E
> ---------------
> 2 2 8 8 8
> 1 4 9 9 9
>
> So after execution of our mergeTable method, the table T1 should look
> like :
>
> A B C D E
> 1 4 9 7 9
> 3 4 0 0 0
>
> The 2nd row ['3', '4', '0' ,'0', '0'] didn't change because there was
> no row in table T2 with key = 3 ,4
>
> The main part of my algorithm now looks something like ...
>
> merge(t1, t2, keyColumns, columnsToBeUpdated)
>
> .......
>
>     for row_t1 in t1:
>         for row_t2 in t2:
>             if [row_t1[i] for i in keyColumns] == [row_t2[j] for j in keyColumns]:
>                 # the keys are the same
>                 for colName in columnsToBeUpdated:
>                     row_t1[colName] = row_t2[colName]
>
>                 # go outside the inner loop - we found a row with
>                 # the same key in the table
>                 break
>
> In my algorithm I have 2 for loops and I have no idea how to optimise
> it (maybe with map? )
> I call this method for very large data and the performance is a
> critical issue for me :(

Just go through the first table once and build a mapping key->row, then
go through the second table once and check for each row whether its key
is in the mapping.  If yes: update the columns.  This runs in O(2*rows),
i.e. linear time, instead of O(rows**2).

    def update_table(table_a, table_b, key_columns, columns_to_be_updated):
        def get_key(row):
            # Tuples are hashable, so they can serve as dictionary keys.
            return tuple(row[x] for x in key_columns)

        # One pass over `table_a`: map each key to its row.
        key2row = dict((get_key(row), row) for row in table_a)

        # One pass over `table_b`: update the matching row, if there is one.
        for row in table_b:
            row_to_be_updated = key2row.get(get_key(row))
            if row_to_be_updated is not None:
                for column in columns_to_be_updated:
                    row_to_be_updated[column] = row[column]


    def main():
        table_a = [[1, 4, 5, 7, 7],
                   [3, 4, 0, 0, 0]]
        table_b = [[2, 2, 8, 8, 8],
                   [1, 4, 9, 9, 9]]
        update_table(table_a, table_b, (0, 1), (2, 4))
        for row in table_a:
            print(row)


    if __name__ == '__main__':
        main()

Ciao,
	Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list