That's clever. So we could implement a new enum value DuplicateHeaderMode.DEDUPLICATE...

Gary

On Tue, Jun 20, 2023, 14:09 Bruno Kinoshita <ki...@apache.org> wrote:

> Hi,
>
> > Bruno says:
> > "With Pandas it automatically deduplicates the column names. Maybe
> > that's a feature that we could have in Commons CSV too?"
> >
> > What does that mean and actually do? Say I have column A with row 1
> > value of "X" and 2nd column A with row 1 value of 2. What do I get
> > when I ask for column A row 1?
> >
>
> When you ask for column A, you get the first column A with row 1 value of
> "X". Then Pandas renames the other A column as "A.1". If you want to access
> rows in the second A column, then you will use "A.1" as index.
>
> This is useful when you work with CSV's with many headers so that you still
> have a valid name to use as index to access data, instead of having to rely
> on the column index, for instance (or if you are using other libraries that
> work with the column names, etc.)
>
> > As a first cut whatever we do could/should maintain the existing
> > behavior. We can change the default later by popular demand.
> >
>
> +1
>
> Cheers
> Bruno
>
> On Tue, 20 Jun 2023 at 13:39, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> > Hi All,
> >
> > This thread is a follow-up to
> > https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258
> >
> > Bruno says:
> > "With Pandas it automatically deduplicates the column names. Maybe
> > that's a feature that we could have in Commons CSV too?"
> >
> > What does that mean and actually do? Say I have column A with row 1
> > value of "X" and 2nd column A with row 1 value of 2. What do I get
> > when I ask for column A row 1?
> >
> > Seth says:
> > "HeaderStrategy Interface
> > Contains two functions:
> >
> > #normalizeHeaders(headings) - With given heading, output a list that
> > fits with whatever the strategy is going for.
> > #get(record, header) - Fetch value(s) based on given column name."
> >
> > I would see perhaps two interfaces so that lambdas might be used more
> > simply. Maybe, needs an example.
> >
> > "I'm also wary that this may screw up existing projects that depend on
> > allowing/disallowing duplicates. i.e. want to allow duplicates and
> > handle things through indexes / iteration, so this didn't cause a
> > problem for them and want to preserve header names, and so don't need
> > the headers deduplicated."
> >
> > As a first cut whatever we do could/should maintain the existing
> > behavior. We can change the default later by popular demand.
> >
> > Gary
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
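
For illustration only, here is a minimal sketch of the Pandas-style renaming described in this thread, where the first "A" stays "A" and the next one becomes "A.1". It is plain Java with no Commons CSV dependency. The class name HeaderDeduplicator and the method deduplicate are hypothetical and not part of any existing API, and DuplicateHeaderMode.DEDUPLICATE is only a proposal at this point. How to treat a generated name that collides with a header already present in the file (here: keep counting until the name is free) is an assumption, not something the thread settles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Commons CSV.
public final class HeaderDeduplicator {

    private HeaderDeduplicator() {
    }

    /**
     * Returns a copy of the headers in which repeated names get a ".N" suffix:
     * the first occurrence keeps its name, later ones become name.1, name.2, ...
     */
    public static List<String> deduplicate(List<String> headers) {
        Set<String> used = new HashSet<>();               // names already handed out
        Map<String, Integer> nextSuffix = new HashMap<>(); // next suffix to try per original name
        List<String> result = new ArrayList<>(headers.size());
        for (String header : headers) {
            String candidate = header;
            int n = nextSuffix.getOrDefault(header, 1);
            // Assumption: if a candidate is already taken, keep incrementing the suffix.
            while (!used.add(candidate)) {
                candidate = header + "." + n++;
            }
            nextSuffix.put(header, n);
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [A, B, A.1, A.2]
        System.out.println(deduplicate(List.of("A", "B", "A", "A")));
    }
}
```

With names rewritten this way, a lookup for "A" would return the first column and "A.1" the second, which matches Bruno's description. Existing behavior is untouched unless a caller opts in, in line with the "first cut" constraint above.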