I have the perfect counterexample: some of the data scientists prototype
in Python, while the production code is written in Scala.
But I get your point; as a matter of fact, I realised a little while after
posting this that the toDF method takes parameters.
However, toDF still requires you either to go from a List to an RDD, or to
create a throwaway DataFrame and call toDF on it, re-creating a complete
data structure. I just feel that createDataFrame(_: Seq) is not really
useful as it is, because I think there are practically no circumstances
where you'd want to create a DataFrame without column names.

I'm not suggesting that an n-th overloaded method should be created, but
rather that the signature of the existing method be changed to take an
optional Seq of column names.
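
A minimal sketch of the intended behaviour, written as a hypothetical
helper rather than as a change to Spark itself. NamedDataFrames and its
columnNames parameter are names made up for illustration; the only Spark
calls assumed are the existing createDataFrame(Seq) and toDF(colNames: String*).
It shows the desired call shape only, since internally it still goes
through the intermediate DataFrame mentioned above.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import scala.reflect.runtime.universe.TypeTag

// Hypothetical helper, not part of Spark: wraps the existing
// createDataFrame(Seq) and renames the columns when names are given.
object NamedDataFrames {
  def createDataFrame[A <: Product : TypeTag](
      sqlContext: SQLContext,
      data: Seq[A],
      columnNames: Seq[String] = Nil): DataFrame = {
    val df = sqlContext.createDataFrame(data)
    // With no names supplied, keep the default _1, _2, ... columns.
    if (columnNames.isEmpty) df else df.toDF(columnNames: _*)
  }
}

// Usage, assuming an existing sqlContext as in the shell:
// val data = List(("Alice", 1), ("Wonderland", 0))
// NamedDataFrames.createDataFrame(sqlContext, data, List("name", "score"))
```

With a default of Nil, existing callers that pass only the data would keep
today's behaviour, which is the point of an optional parameter over yet
another overload.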

Regards,

Olivier.

On Sun, May 3, 2015 at 07:44, Reynold Xin <r...@databricks.com> wrote:

> Part of the reason is that it is really easy to just call toDF in Scala,
> and we already have a lot of createDataFrame functions.
>
> (You might find some of the cross-language differences confusing, but I'd
> argue most real users just stick to one language, and developers or
> trainers are the only ones that need to constantly switch between
> languages).
>
> On Sat, May 2, 2015 at 11:05 AM, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>> SQLContext.createDataFrame behaves differently in Scala and Python:
>>
>> >>> l = [('Alice', 1)]
>> >>> sqlContext.createDataFrame(l).collect()
>> [Row(_1=u'Alice', _2=1)]
>> >>> sqlContext.createDataFrame(l, ['name', 'age']).collect()
>> [Row(name=u'Alice', age=1)]
>>
>> and in Scala:
>>
>> scala> val data = List(("Alice", 1), ("Wonderland", 0))
>> scala> sqlContext.createDataFrame(data, List("name", "score"))
>> <console>:28: error: overloaded method value createDataFrame with
>> alternatives: ... cannot be applied to ...
>>
>> What do you think about also allowing a Seq of column names in Scala,
>> for the sake of consistency?
>>
>> Regards,
>>
>> Olivier.
>>
>
>
