Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-03 Thread Joshua Sorrell
>>> Dataframes are essentially structured tables with schemas. So where does
>>> the non typed data sit before it becomes structured if not in a traditional
>>> RDD?
>>>
>>> For us almost all the processing comes before there is structure to it.

Does pyspark still lag far behind the Scala API in terms of features

2016-03-01 Thread Joshua Sorrell
I haven't used Spark in the last year and a half. I am about to start a project with a new team, and we need to decide whether to use PySpark or Scala. We are NOT a Java shop, so some of the build tools and procedures will require some learning overhead if we go the Scala route. What I want to know is