[
https://issues.apache.org/jira/browse/SPARK-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413038#comment-15413038
]
Hyukjin Kwon commented on SPARK-16896:
--------------------------------------
I don't mind if you go ahead (though I was looking at this problem too).
One thing I want to say is that we had better match the behaviour to
[read.csv|https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html]
in R if possible in this case.
In addition, we already handle {{nullValue}} when processing the header by
generating numbered column names. I think we should clarify the behaviour,
including how R handles these cases, and write it up in the PR description.
Also, do not forget to follow
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark when
making a contribution.
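Since matching R's behaviour is proposed above, here is a rough Python sketch of what read.csv-style deduplication does: the first occurrence of a name is kept unchanged and later duplicates get a numeric suffix. This is an illustration of R's {{make.unique}} semantics, not actual Spark or R code, and it deliberately ignores the corner case where a suffixed name (e.g. "a.1") already exists in the input.

```python
def dedupe_columns(names):
    """Append numeric suffixes to duplicate column names, keeping the
    first occurrence unchanged -- roughly what R's make.unique does
    inside read.csv (first duplicate gets ".1", the next ".2", ...).

    Simplified sketch: does not guard against collisions with
    pre-existing suffixed names in the input header.
    """
    counts = {}
    result = []
    for name in names:
        if name not in counts:
            counts[name] = 0        # first occurrence: keep as-is
            result.append(name)
        else:
            counts[name] += 1       # later occurrences: suffix with count
            result.append(f"{name}.{counts[name]}")
    return result


# Example: a header with three "a" columns
print(dedupe_columns(["a", "a", "b", "a"]))  # -> ['a', 'a.1', 'b', 'a.2']
```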
> Loading csv with duplicate column names
> ---------------------------------------
>
> Key: SPARK-16896
> URL: https://issues.apache.org/jira/browse/SPARK-16896
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Aseem Bansal
>
> It would be great if the library allowed us to load a CSV with duplicate
> column names. I understand that having duplicate columns in the data is
> odd, but upstream data sometimes arrives that way. We may choose to ignore
> such columns, but currently there is no way to drop them because we cannot
> load the file at all. Currently, as a pre-processing step, I load the data
> into R, change the column names, and then write a fixed version that the
> Spark Java API can work with.
> As for other options: R's read.csv automatically handles this situation by
> appending a number to each duplicate column name.
> Case sensitivity in column names can also cause problems. If we have
> columns like
> ColumnName, columnName
> I may want to keep them as separate columns, but the option to do this is
> not documented.
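To illustrate the case-sensitivity concern above, a minimal Python sketch of duplicate detection under both modes follows. The {{case_sensitive}} flag here is purely illustrative, loosely analogous to Spark's spark.sql.caseSensitive setting; it is not an actual parameter of Spark's CSV reader.

```python
def find_duplicates(names, case_sensitive=True):
    """Return the header names that collide with an earlier name.
    With case_sensitive=False, 'ColumnName' and 'columnName' count as
    the same column (illustrative flag, not a real Spark CSV option).
    """
    seen = set()
    duplicates = []
    for name in names:
        key = name if case_sensitive else name.lower()
        if key in seen:
            duplicates.append(name)
        else:
            seen.add(key)
    return duplicates


header = ["ColumnName", "columnName"]
print(find_duplicates(header))                        # -> [] (distinct)
print(find_duplicates(header, case_sensitive=False))  # -> ['columnName']
```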
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]