Peter Otten wrote, on Friday, April 14, 2017 2:16 PM
>
> > def complete(group, label):
> >     values = {row[label] for row in group}
> >     # get "TypeError: tuple indices must be integers, not str"
>
> Yes, the function expects row to be dict-like. However when you change
>
> row[label]
>
> to
>
> getattr(row, label)
>
> this part of the code will work...
>
> >     has_empty = not min(values, key=len)
> >     if len(values) - has_empty != 1:
> >         # no value or multiple values; manual intervention needed
> >         return False
> >     elif has_empty:
> >         for row in group:
> >             row[label] = max(values, key=len)
>
> but here you'll get an error. I made the experiment to change
> everything necessary to make it work with namedtuples, but you'll
> probably find the result a bit hard to follow:
>
> import csv
> from collections import namedtuple, defaultdict
>
> INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
> OUTFILE = "tmp.csv"
>
> def get_title(row):
>     return row.title
>
> def complete(group, label):
>     values = {getattr(row, label) for row in group}
>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         # replace namedtuples in the group. Yes, it's ugly
>         fix = {label: max(values, key=len)}
>         group[:] = [record._replace(**fix) for record in group]
>     return True
>
> with open(INFILE) as infile:
>     rows = csv.reader(infile)
>     fieldnames = next(rows)
>     Record = namedtuple("Record", fieldnames)
>     groups = defaultdict(list)
>     for row in rows:
>         record = Record._make(row)
>         groups[get_title(record)].append(record)
>
> LABELS = ['Location', 'Kind', 'Notes']
>
> # add missing values
> for group in groups.values():
>     for label in LABELS:
>         complete(group, label)
>
> # dump data (as a demo that you do not need the list of all records)
> with open(OUTFILE, "w") as outfile:
>     writer = csv.writer(outfile)
>     writer.writerow(fieldnames)
>     writer.writerows(
>         record for group in groups.values() for record in group
>     )
>
> One alternative is to keep the original and try to replace the
> namedtuple with the class suggested by Gregory Ewing. Then it should
> suffice to also change
>
> >     elif has_empty:
> >         for row in group:
> >             row[label] = max(values, key=len)
>
> to
>
> >     elif has_empty:
> >         for row in group:
>               setattr(row, label, max(values, key=len))
>
> PS: Personally I would probably take the opposite direction and use
> dicts throughout...

Ok, thank you. I haven't run it on a real input file yet, but this seems
to work with the test file.

Because the earlier incarnation defined 'values' as

    values = {row[label] for row in group}

I'd incorrectly guessed what was going on in

    has_empty = not min(values, key=len).

Now that

    values = {getattr(row, label) for row in group}

works properly as you intended it to, I see you get the set of unique
values for that label in that group, which makes the rest of it make
sense.

I know it's your "ugly" answer, but can I ask what the '**' in

    fix = {label: max(values, key=len)}
    group[:] = [record._replace(**fix) for record in group]

means?
I haven't seen it before, and I imagine it's one of the possible
'kwargs' in 'somenamedtuple._replace(kwargs)', but I have no idea where
to look up the possible 'kwargs'. (probably short for keyword args)

Also, I don't see how you get a set for values with the notation you
used. Looks like if anything you've got a comprehension that should give
you a dict. (But I haven't worked a lot with sets either.)

Thanks

-- 
https://mail.python.org/mailman/listinfo/python-list
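
For reference, both questions above can be checked in a few lines. This is only a minimal sketch: the field names and the two rows are made up for illustration and are not taken from the real CSV. It suggests that '**' is ordinary call syntax that unpacks a dict into keyword arguments (so the keywords '_replace' accepts are simply the namedtuple's own field names), and that braces around a comprehension give a set unless each item is written as 'key: value'.

```python
from collections import namedtuple

# Hypothetical field names and rows, chosen only for illustration.
Record = namedtuple("Record", ["title", "Location", "Kind"])
group = [
    Record("sofa", "", "furniture"),
    Record("sofa", "garage", "furniture"),
]
label = "Location"

# Braces with no "key: value" pair: a set comprehension (unique values).
values = {getattr(row, label) for row in group}
assert values == {"", "garage"}

# Braces with "key: value": a dict comprehension.
lengths = {value: len(value) for value in values}
assert lengths == {"": 0, "garage": 6}

# "**" unpacks a dict into keyword arguments at the call site,
# so the two calls below are equivalent.
fix = {label: max(values, key=len)}
by_unpacking = group[0]._replace(**fix)
by_keyword = group[0]._replace(Location="garage")
assert by_unpacking == by_keyword == Record("sofa", "garage", "furniture")

# Slice assignment swaps the list's contents in place, so any other
# reference to the same list sees the replaced records.
group[:] = [record._replace(**fix) for record in group]
```

The dict form is needed here because the field name is only known at run time (it is held in 'label'); with a fixed field name the plain keyword form would do.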