Hi,

Bruno says:
> "With Pandas it automatically deduplicates the column names. Maybe
> that's a feature that we could have in Commons CSV too?"
>
> What does that mean and actually do? Say I have column A with row 1
> value of "X" and 2nd column A with row 1 value of 2. What do I get
> when I ask for column A row 1?
>

When you ask for column A, you get the first column A, whose row 1 value is
"X". Pandas renames the second A column to "A.1", so to access rows in the
second A column you use "A.1" as the column name.
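
Here is a quick sketch of that behavior, using the two-column example from
the question. It assumes the default settings of pandas' read_csv; the
variable names are only for illustration.

```python
import io

import pandas as pd

# Two columns share the header "A", as in the example from the question.
csv_text = "A,A\nX,2\n"

# With default settings, read_csv renames the repeated header on read.
df = pd.read_csv(io.StringIO(csv_text))

columns = list(df.columns)    # header names after deduplication
first_a = df.loc[0, "A"]      # row 1 of the first A column
second_a = df.loc[0, "A.1"]   # row 1 of the renamed second A column

print(columns)
print(first_a, second_a)
```

The first column keeps its original name, so code that only knows about "A"
keeps working, and the second column stays reachable by name.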

This is useful when you work with CSV files that have many headers: every
column still has a valid name you can use to access its data, so you do not
have to rely on the column index. It also helps if you are using other
libraries that work with column names.

> As a first cut whatever we do could/should maintain the existing
> behavior. We can change the default later by popular demand.
>

+1

Cheers

Bruno

On Tue, 20 Jun 2023 at 13:39, Gary Gregory <garydgreg...@gmail.com> wrote:

> Hi All,
>
> This thread is a follow-up to
> https://github.com/apache/commons-csv/pull/309#issuecomment-1441456258
>
> Bruno says:
> "With Pandas it automatically deduplicates the column names. Maybe
> that's a feature that we could have in Commons CSV too?"
>
> What does that mean and actually do? Say I have column A with row 1
> value of "X" and 2nd column A with row 1 value of 2. What do I get
> when I ask for column A row 1?
>
> Seth says:
> "HeaderStrategy Interface
> Contains two functions:
>
> #normalizeHeaders(headings) - With given heading, output a list that
> fits with whatever the strategy is going for.
> #get(record, header) - Fetch value(s) based on given column name."
>
> I would see perhaps two interfaces so that lambdas might be used more
> simply. Maybe, needs an example.
>
> "I'm also wary that this may screw up existing projects that depend on
> allowing/disallowing duplicates. i.e. want to allow duplicates and
> handle things through indexes / iteration, so this didn't cause a
> problem for them and want to preserve header names, and so don't need
> the headers deduplicated."
>
> As a first cut whatever we do could/should maintain the existing
> behavior. We can change the default later by popular demand.
>
> Gary
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>
