Re: duplicate names in sql allowed?

Koert Kuipers Fri, 03 Jul 2015 08:45:07 -0700

i see the relaxation to allow duplicate field names was done on purpose,
since some data sources can have dupes due to case insensitive resolution.

apparently the issue is now dealt with during query analysis.

although this might work for sql, it does not seem like a good thing for
DataFrame to me. it seems desirable that a DataFrame should have unique
column names. not having this guarantee will complicate building other DSLs
on top of DataFrame (this is how i ran into this issue). it's also
counterintuitive... do R dataframes and pandas allow dupes in column names?
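
to make the concern concrete, here is a minimal sketch of the kind of check a
layer built on top of DataFrame could run itself, given that StructType does
not enforce uniqueness. DuplicateColumns, duplicateNames and requireUnique are
hypothetical names made up for this illustration, not part of Spark, and the
caseSensitive flag is an assumption about how one might want to handle
case-insensitive sources.

```scala
// Hypothetical helper, not part of Spark: detects repeated column names.
object DuplicateColumns {

  // Returns the names that occur more than once, sorted.
  // With caseSensitive = false, names are compared (and reported) in lower case.
  def duplicateNames(names: Seq[String], caseSensitive: Boolean = true): Seq[String] = {
    val keys = if (caseSensitive) names else names.map(_.toLowerCase)
    keys
      .groupBy(identity)
      .collect { case (name, occurrences) if occurrences.size > 1 => name }
      .toSeq
      .sorted
  }

  // Throws IllegalArgumentException if any name is repeated.
  def requireUnique(names: Seq[String], caseSensitive: Boolean = true): Unit = {
    val dupes = duplicateNames(names, caseSensitive)
    require(dupes.isEmpty, s"duplicate column names: ${dupes.mkString(", ")}")
  }
}
```

with a DataFrame df in scope, something like
DuplicateColumns.requireUnique(df.schema.fieldNames) should reject the schema
from the query quoted below, since both fields are named boo.
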
 On Jul 3, 2015 3:27 AM, "Akhil Das" <[email protected]> wrote:

> I think you can open up a jira, not sure if this PR
> <https://github.com/apache/spark/pull/2209/files> (SPARK-2890
> <https://issues.apache.org/jira/browse/SPARK-2890>) broke the validation
> piece.
>
> Thanks
> Best Regards
>
> On Fri, Jul 3, 2015 at 4:29 AM, Koert Kuipers <[email protected]> wrote:
>
>> i am surprised this is allowed...
>>
>> scala> sqlContext.sql("select name as boo, score as boo from candidates").schema
>>
>> res7: org.apache.spark.sql.types.StructType =
>> StructType(StructField(boo,StringType,true),
>> StructField(boo,IntegerType,true))
>>
>>
>> should StructType check for duplicate field names?
>>
>
>
