Re: coalesce on SchemaRDD in pyspark

2014-09-12 Thread Davies Liu
On Fri, Sep 12, 2014 at 8:55 AM, Brad Miller wrote:
> Hi Davies,
>
> Thanks for the quick fix. I'm sorry to send out a bug report on release day
> - 1.1.0 really is a great release. I've been running the 1.1 branch for a
> while and there's definitely lots of good stuff.
>
> For the workaround, I

Re: coalesce on SchemaRDD in pyspark

2014-09-12 Thread Brad Miller
Hi Davies,

Thanks for the quick fix. I'm sorry to send out a bug report on release day - 1.1.0 really is a great release. I've been running the 1.1 branch for a while and there's definitely lots of good stuff.

For the workaround, I think you may have meant:

    srdd2 = SchemaRDD(srdd._jschema_rdd.c
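Brad's corrected line is cut off in the archive. One way the corrected pattern could be packaged is a small helper so callers never touch pyspark internals directly; this is only a sketch: the _jschema_rdd attribute comes from this message, while the SchemaRDD constructor arguments and the extra py4j arguments to coalesce are assumptions.

    # Hypothetical helper around the corrected workaround (untested, Spark 1.1.0).
    from pyspark.sql import SchemaRDD

    def coalesce_schema_rdd(srdd, num_partitions, sql_ctx, shuffle=False):
        """Coalesce a SchemaRDD by delegating to the wrapped Java SchemaRDD."""
        # `_jschema_rdd` follows the correction in the message above; passing
        # (shuffle, None) for the Scala default/implicit parameters is an assumption.
        jrdd = srdd._jschema_rdd.coalesce(num_partitions, shuffle, None)
        return SchemaRDD(jrdd, sql_ctx)

    # Usage sketch:
    # srdd2 = coalesce_schema_rdd(sqlCtx.jsonRDD(rdd), 1, sqlCtx)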

Re: coalesce on SchemaRDD in pyspark

2014-09-11 Thread Davies Liu
This is a bug; I have created an issue to track it:
https://issues.apache.org/jira/browse/SPARK-3500

There is also a PR to fix it:
https://github.com/apache/spark/pull/2369

Until the next bugfix release, you can work around it with:

    srdd = sqlCtx.jsonRDD(rdd)
    srdd2 = SchemaRDD(srdd._schema_rdd.coal
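The workaround snippet above is truncated in the archive. The intended pattern appears to be: call coalesce on the wrapped JVM SchemaRDD directly and re-wrap the result in a Python SchemaRDD. A sketch follows, assuming the attribute is _jschema_rdd (as Brad's follow-up above suggests) and that Scala's default shuffle flag and implicit ordering must be passed explicitly through py4j; both are assumptions, not confirmed by the thread.

    # Untested workaround sketch for Spark 1.1.0: bypass the broken Python-level
    # coalesce by going through the underlying JVM SchemaRDD.
    from pyspark.sql import SQLContext, SchemaRDD

    sqlCtx = SQLContext(sc)        # assumes an existing SparkContext `sc`
    srdd = sqlCtx.jsonRDD(rdd)     # `rdd` is an RDD of JSON strings
    # Attribute name per Brad's follow-up; the (False, None) arguments stand in
    # for the shuffle flag and ordering that py4j cannot fill in automatically
    # (an assumption about the Java-side signature).
    srdd2 = SchemaRDD(srdd._jschema_rdd.coalesce(1, False, None), sqlCtx)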

coalesce on SchemaRDD in pyspark

2014-09-11 Thread Brad Miller
Hi All,

I'm having some trouble with the coalesce and repartition functions for SchemaRDD objects in pyspark. When I run:

    sqlCtx.jsonRDD(sc.parallelize(['{"foo":"bar"}', '{"foo":"baz"}'])).coalesce(1)

I get this error:

    Py4JError: An error occurred while calling o94.coalesce. Trace:
    py4j.Py4JEx
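For reference, a self-contained reproduction along these lines should trigger the error on 1.1.0; the jsonRDD(...).coalesce(1) call is taken from the report above, while the surrounding setup and names are illustrative.

    # Minimal reproduction sketch of the reported SchemaRDD.coalesce failure.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="schemardd-coalesce-repro")
    sqlCtx = SQLContext(sc)

    # On Spark 1.1.0 this raises:
    #   Py4JError: An error occurred while calling o94.coalesce.
    srdd = sqlCtx.jsonRDD(sc.parallelize(['{"foo":"bar"}', '{"foo":"baz"}']))
    srdd.coalesce(1)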