i see the relaxation to allow duplicate field names was done on purpose,
since some data sources can have dupes due to case-insensitive resolution.

apparently the issue is now dealt with during query analysis.

although this might work for sql, it does not seem like a good thing for
DataFrame to me. it seems desirable that a DataFrame should have unique
column names. not having this guarantee will complicate building other DSLs
on top of DataFrame (this is how i ran into this issue). it's also
counterintuitive... do R data frames and pandas allow dupes in column names?
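
to make the guarantee concrete, here is a rough sketch of the kind of check a
DSL built on top of DataFrame could run itself. DuplicateFields and both of its
methods are hypothetical names of mine, not anything in Spark, and the
caseSensitive flag is only my guess at how case-insensitive sources would need
to be handled.

```scala
// Hypothetical helper, not part of Spark: finds column names that occur more than once.
object DuplicateFields {
  // Returns the duplicated names, sorted. With caseSensitive = false the names
  // are lower-cased first, so "Boo" and "boo" count as the same column.
  def duplicates(names: Seq[String], caseSensitive: Boolean = true): Seq[String] = {
    val keys = if (caseSensitive) names else names.map(_.toLowerCase)
    keys.groupBy(identity).collect { case (k, vs) if vs.size > 1 => k }.toSeq.sorted
  }

  // Throws IllegalArgumentException if any name is duplicated.
  def requireUnique(names: Seq[String], caseSensitive: Boolean = true): Unit = {
    val dupes = duplicates(names, caseSensitive)
    require(dupes.isEmpty, s"duplicate column names: ${dupes.mkString(", ")}")
  }
}
```

with a DataFrame df, something like DuplicateFields.requireUnique(df.schema.fieldNames)
should reject the boo/boo schema from my earlier mail.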
On Jul 3, 2015 3:27 AM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:

> I think you can open up a jira, not sure if this PR
> <https://github.com/apache/spark/pull/2209/files> (SPARK-2890
> <https://issues.apache.org/jira/browse/SPARK-2890>) broke the validation
> piece.
>
> Thanks
> Best Regards
>
> On Fri, Jul 3, 2015 at 4:29 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i am surprised this is allowed...
>>
>> scala> sqlContext.sql("select name as boo, score as boo from candidates").schema
>>
>> res7: org.apache.spark.sql.types.StructType =
>> StructType(StructField(boo,StringType,true),
>> StructField(boo,IntegerType,true))
>>
>>
>> should StructType check for duplicate field names?
>>
>
>
