https://issues.apache.org/jira/browse/SPARK-8817
On Fri, Jul 3, 2015 at 11:43 AM, Koert Kuipers <ko...@tresata.com> wrote:

> i see the relaxation to allow duplicate field names was done on purpose,
> since some data sources can have dupes due to case insensitive resolution.
>
> apparently the issue is now dealt with during query analysis.
>
> although this might work for sql it does not seem a good thing for
> DataFrame to me. it seems desirable that a DataFrame should have unique
> column names. not having this guarantee will complicate building other DSLs
> on top of DataFrame (this is how i ran into this issue). its also
> counterintuitive... do R dataframes and pandas allow dupes in column names?
>
> On Jul 3, 2015 3:27 AM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:
>
>> I think you can open up a jira, not sure if this PR
>> <https://github.com/apache/spark/pull/2209/files> (SPARK-2890
>> <https://issues.apache.org/jira/browse/SPARK-2890>) broke the validation
>> piece.
>>
>> Thanks
>> Best Regards
>>
>> On Fri, Jul 3, 2015 at 4:29 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i am surprised this is allowed...
>>>
>>> scala> sqlContext.sql("select name as boo, score as boo from candidates").schema
>>>
>>> res7: org.apache.spark.sql.types.StructType =
>>> StructType(StructField(boo,StringType,true), StructField(boo,IntegerType,true))
>>>
>>> should StructType check for duplicate field names?
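
For anyone building on top of DataFrame in the meantime, here is a minimal sketch of the kind of duplicate-name check asked about above. It is not part of Spark: DuplicateFields and duplicateNames are hypothetical names, and the caseSensitive flag is only an assumption about how one might handle the case-insensitive sources mentioned in the thread. It works on plain field names, so it has no Spark dependency.

```scala
// Hypothetical helper, not a Spark API: reports field names that occur more than once.
object DuplicateFields {
  // names: the schema's field names, in any order.
  // caseSensitive = false treats "Boo" and "boo" as the same name (compared lower-cased).
  def duplicateNames(names: Seq[String], caseSensitive: Boolean = true): Seq[String] = {
    val keys = if (caseSensitive) names else names.map(_.toLowerCase)
    keys
      .groupBy(identity)                                      // name -> all occurrences
      .collect { case (name, hits) if hits.size > 1 => name } // keep only repeated names
      .toSeq
      .sorted                                                 // stable order for reporting
  }
}
```

In the shell this could be called as DuplicateFields.duplicateNames(df.schema.fieldNames.toSeq), where df is whatever DataFrame you want to validate. A non-empty result means the schema has repeated column names, and the caller can decide whether to fail or rename.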