Re: duplicate names in sql allowed?

Koert Kuipers Fri, 03 Jul 2015 11:34:34 -0700

https://issues.apache.org/jira/browse/SPARK-8817


On Fri, Jul 3, 2015 at 11:43 AM, Koert Kuipers <ko...@tresata.com> wrote:

> i see the relaxation to allow duplicate field names was done on purpose,
> since some data sources can have dupes due to case insensitive resolution.
>
> apparently the issue is now dealt with during query analysis.
>
> although this might work for sql it does not seem a good thing for
> DataFrame to me. it seems desirable that a DataFrame should have unique
> column names. not having this guarantee will complicate building other DSLs
> on top of DataFrame (this is how i ran into this issue). it's also
> counterintuitive... do R dataframes and pandas allow dupes in column names?
>
> On Jul 3, 2015 3:27 AM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:
>
>> I think you can open up a jira, not sure if this PR
>> <https://github.com/apache/spark/pull/2209/files> (SPARK-2890
>> <https://issues.apache.org/jira/browse/SPARK-2890>) broke the validation
>> piece.
>>
>> Thanks
>> Best Regards
>>
>> On Fri, Jul 3, 2015 at 4:29 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i am surprised this is allowed...
>>>
>>> scala> sqlContext.sql("select name as boo, score as boo from candidates").schema
>>>
>>> res7: org.apache.spark.sql.types.StructType =
>>> StructType(StructField(boo,StringType,true),
>>> StructField(boo,IntegerType,true))
>>>
>>>
>>> should StructType check for duplicate field names?
>>>
>>
>>
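
On the question in the original post of whether StructType should check for
duplicate field names: the kind of check being asked about could look roughly
like the sketch below. It is plain Scala with no Spark dependency. `Field`,
`DuplicateFieldCheck`, `duplicateNames` and `requireUnique` are hypothetical
names made up for illustration and are not part of Spark's API. The
`caseSensitive` flag is also an assumption, included because the thread
mentions dupes that arise from case insensitive resolution.

```scala
// Hypothetical stand-in for a schema field; NOT Spark's StructField.
final case class Field(name: String, dataType: String)

object DuplicateFieldCheck {

  // Returns every name that occurs more than once, in first-seen order.
  // With caseSensitive = false, names are compared after lower-casing,
  // and the lower-cased form is what gets reported.
  def duplicateNames(fields: Seq[Field], caseSensitive: Boolean = true): Seq[String] = {
    val keys = fields.map(f => if (caseSensitive) f.name else f.name.toLowerCase)
    val counts = keys.groupBy(identity).map { case (k, v) => k -> v.size }
    keys.distinct.filter(k => counts(k) > 1)
  }

  // Throws IllegalArgumentException if any field name is duplicated.
  def requireUnique(fields: Seq[Field], caseSensitive: Boolean = true): Unit = {
    val dupes = duplicateNames(fields, caseSensitive)
    require(dupes.isEmpty, s"duplicate field names: ${dupes.mkString(", ")}")
  }
}
```

A DSL built on top of DataFrame could run a check like `requireUnique` on the
field names of a schema before relying on name-based column lookup, rather
than assuming the schema itself guarantees uniqueness.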
