On Tue, 20 Jun 2023 at 12:39, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> Hi All,
>
> This thread is a follow-up to
> https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258
>
> Bruno says:
> "With Pandas it automatically deduplicates the column names. Maybe
> that's a feature that we could have in Commons CSV too?"
>
> What does that mean and actually do? Say I have column A with row 1
> value of "X" and 2nd column A with row 1 value of 2. What do I get
> when I ask for column A row 1?
>
> Seth says:
> "HeaderStrategy Interface
> Contains two functions:
>
> #normalizeHeaders(headings) - With given heading, output a list that
> fits with whatever the strategy is going for.
> #get(record, header) - Fetch value(s) based on given column name."
>
> I would see perhaps two interfaces so that lambdas might be used more
> simply. Maybe, needs an example.
>
> "I'm also wary that this may screw up existing projects that depend on
> allowing/disallowing duplicates. i.e. want to allow duplicates and
> handle things through indexes / iteration, so this didn't cause a
> problem for them and want to preserve header names, and so don't need
> the headers deduplicated."
>
> As a first cut whatever we do could/should maintain the existing
> behavior. We can change the default later by popular demand.
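
Since an example was asked for, here is a minimal sketch of how the
two-interface idea might look. None of these names exist in Commons CSV
today: HeaderNormalizer, HeaderLookup, DEDUPLICATE, KEEP and ALL_MATCHES
are made up for illustration, and a record is a plain list rather than a
CSVRecord. The renaming assumes the pandas convention for duplicate
columns (A, A.1, A.2, ...).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: every type and constant here is hypothetical,
// not part of the Commons CSV API.
public class HeaderStrategySketch {

    /** Rewrites the raw header row into the names used for lookup. */
    @FunctionalInterface
    interface HeaderNormalizer {
        List<String> normalize(List<String> headers);
    }

    /** Fetches the value(s) of one record for a given column name. */
    @FunctionalInterface
    interface HeaderLookup {
        List<String> get(List<String> headers, List<String> record, String name);
    }

    /**
     * Pandas-style renaming: A, A, A becomes A, A.1, A.2.
     * Simplification: does not guard against a literal "A.1" header.
     */
    static final HeaderNormalizer DEDUPLICATE = headers -> {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>(headers.size());
        for (String h : headers) {
            int n = seen.merge(h, 1, Integer::sum); // occurrences of h so far
            out.add(n == 1 ? h : h + "." + (n - 1));
        }
        return out;
    };

    /** Leaves the names untouched, so duplicates stay duplicates. */
    static final HeaderNormalizer KEEP = headers -> new ArrayList<>(headers);

    /** Returns every value whose column name matches, in column order. */
    static final HeaderLookup ALL_MATCHES = (headers, record, name) -> {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < headers.size() && i < record.size(); i++) {
            if (headers.get(i).equals(name)) {
                values.add(record.get(i));
            }
        }
        return values;
    };

    public static void main(String[] args) {
        List<String> raw = List.of("A", "A", "B");
        List<String> row1 = List.of("X", "2", "y");

        List<String> dedup = DEDUPLICATE.normalize(raw);
        System.out.println(dedup);                               // [A, A.1, B]
        System.out.println(ALL_MATCHES.get(dedup, row1, "A"));   // [X]
        System.out.println(ALL_MATCHES.get(dedup, row1, "A.1")); // [2]

        // Pass-through names: a lookup by name sees both columns.
        System.out.println(ALL_MATCHES.get(KEEP.normalize(raw), row1, "A")); // [X, 2]
    }
}
```

Under these assumptions the question about column A has a definite
answer: after deduplication "A" gives X and "A.1" gives 2, while the
pass-through normalizer keeps the original names and leaves duplicate
handling to the caller.
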

Changing the default later would be a breaking change, so I would be
against that.

> Gary

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org