That's clever. So we could implement a new enum value
DuplicateHeaderMode.DEDUPLICATE...
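
A minimal sketch of what such a mode might do, assuming the Pandas-style
renaming Bruno describes below. HeaderDeduplicator and deduplicate are
hypothetical names, not Commons CSV API, and DEDUPLICATE is only the value
proposed in this thread:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Commons CSV.
final class HeaderDeduplicator {

    private HeaderDeduplicator() {
    }

    /**
     * Returns a copy of the header row in which repeated names get a
     * ".N" suffix: the first "A" stays "A", the next becomes "A.1", etc.
     */
    public static List<String> deduplicate(List<String> headers) {
        Set<String> used = new HashSet<>();               // names already emitted
        Map<String, Integer> lastSuffix = new HashMap<>(); // last suffix tried per name
        List<String> result = new ArrayList<>(headers.size());
        for (String header : headers) {
            String candidate = header;
            int suffix = lastSuffix.getOrDefault(header, 0);
            // Keep incrementing until the name is unused, which also
            // avoids clashing with a literal "A.1" already in the file.
            while (!used.add(candidate)) {
                suffix++;
                candidate = header + "." + suffix;
            }
            lastSuffix.put(header, suffix);
            result.add(candidate);
        }
        return result;
    }
}
```

With the headers A, B, A, A this returns A, B, A.1, A.2, so the existing
lookup by name would keep working and the first A is unchanged.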

Gary

On Tue, Jun 20, 2023, 14:09 Bruno Kinoshita <ki...@apache.org> wrote:

> Hi,
>
> > Bruno says:
> > "With Pandas it automatically deduplicates the column names. Maybe
> > that's a feature that we could have in Commons CSV too?"
> >
> > What does that mean and actually do? Say I have column A with row 1
> > value of "X" and 2nd column A with row 1 value of 2. What do I get
> > when I ask for column A row 1?
> >
>
> When you ask for column A, you get the first column A with row 1 value of
> "X". Then Pandas renames the other A column as "A.1". If you want to access
> rows in the second A column, then you will use "A.1" as index.
>
> This is useful when you work with CSVs with many headers so that you still
> have a valid name to use as index to access data, instead of having to rely
> on the column index, for instance (or if you are using other libraries that
> work with the column names, etc.)
>
> > As a first cut whatever we do could/should maintain the existing
> > behavior. We can change the default later by popular demand.
> >
>
> +1
>
> Cheers
>
> Bruno
>
> On Tue, 20 Jun 2023 at 13:39, Gary Gregory <garydgreg...@gmail.com> wrote:
>
> > Hi All,
> >
> > This thread is a follow-up to
> > https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258
> >
> > Bruno says:
> > "With Pandas it automatically deduplicates the column names. Maybe
> > that's a feature that we could have in Commons CSV too?"
> >
> > What does that mean and actually do? Say I have column A with row 1
> > value of "X" and 2nd column A with row 1 value of 2. What do I get
> > when I ask for column A row 1?
> >
> > Seth says:
> > "HeaderStrategy Interface
> > Contains two functions:
> >
> > #normalizeHeaders(headings) - With given heading, output a list that
> > fits with whatever the strategy is going for.
> > #get(record, header) - Fetch value(s) based on given column name."
> >
> > I would see perhaps two interfaces so that lambdas might be used more
> > simply. Maybe, needs an example.
> >
> > "I'm also wary that this may screw up existing projects that depend on
> > allowing/disallowing duplicates. i.e. want to allow duplicates and
> > handle things through indexes / iteration, so this didn't cause a
> > problem for them and want to preserve header names, and so don't need
> > the headers deduplicated."
> >
> > As a first cut whatever we do could/should maintain the existing
> > behavior. We can change the default later by popular demand.
> >
> > Gary
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
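
On Seth's HeaderStrategy idea and the suggestion of two interfaces quoted
above, a rough sketch of how the split might look so that each piece can be
a lambda. All names here are hypothetical, and the extra headers parameter
on get is an assumption, since a plain list of values does not carry its
own header row:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Commons CSV.
final class HeaderStrategySketch {

    private HeaderStrategySketch() {
    }

    /** Maps the raw header row to the names the parser would expose. */
    @FunctionalInterface
    interface HeaderNormalizer {
        List<String> normalizeHeaders(List<String> headers);
    }

    /** Fetches the value(s) of one record for a given column name. */
    @FunctionalInterface
    interface HeaderLookup {
        List<String> get(List<String> headers, List<String> record, String header);
    }

    /** Leaves the header row untouched, duplicates included. */
    static final HeaderNormalizer KEEP_AS_IS = headers -> headers;

    /** Returns the values of every column whose name matches. */
    static final HeaderLookup ALL_MATCHES = (headers, record, header) -> {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < headers.size() && i < record.size(); i++) {
            if (headers.get(i).equals(header)) {
                values.add(record.get(i));
            }
        }
        return values;
    };
}
```

For the headers A, B, A and the record X, y, 2, ALL_MATCHES returns both X
and 2 for column A. A deduplicating strategy would instead pair a renaming
normalizer with a lookup that returns a single value.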
