Re: createDataframe from s3 results in error

2015-06-07 Thread Igor Costa
Hey there Ignacio, Like Reynold said, it's related to your build of Spark; try not compiling with Thrift. Also, try this command to see what the error is, and link to it here: sc.wholeTextFiles("s3://my-directory/2015*/ignacio/*") PS: Are you using boto to connect? Which version? Igor On

Re: createDataframe from s3 results in error

2015-06-02 Thread Reynold Xin
Maybe an incompatible Hive package or Hive metastore? On Tue, Jun 2, 2015 at 3:25 PM, Ignacio Zendejas wrote: > From RELEASE: > > "Spark 1.3.1 built for Hadoop 2.4.0 > > Build flags: -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests > -Pkinesis-asl -Pspark-ganglia-lgpl -Phadoop-provided -Ph

Re: createDataframe from s3 results in error

2015-06-02 Thread Ignacio Zendejas
From RELEASE: "Spark 1.3.1 built for Hadoop 2.4.0 Build flags: -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests -Pkinesis-asl -Pspark-ganglia-lgpl -Phadoop-provided -Phive -Phive-thriftserver" And this stacktrace may be more useful: http://pastebin.ca/3016483 On Tue, Jun 2, 2015 at 3:13

Re: createDataframe from s3 results in error

2015-06-02 Thread Reynold Xin
What version of Spark is this? On Tue, Jun 2, 2015 at 3:13 PM, Ignacio Zendejas wrote: > I've run into an error when trying to create a dataframe. Here's the code: > > -- > from pyspark import StorageLevel > from pyspark.sql import Row > > table = 'blah' > ssc = HiveContext(sc) > > data = sc.tex

createDataframe from s3 results in error

2015-06-02 Thread Ignacio Zendejas
I've run into an error when trying to create a dataframe. Here's the code:
--
from pyspark import StorageLevel
from pyspark.sql import Row
table = 'blah'
ssc = HiveContext(sc)
data = sc.textFile('s3://bucket/some.tsv')
def deserialize(s):
    p = s.strip().split('\t')
    p[-1] = float(p[-1])
    ret
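For anyone reproducing this, it can help to rule out bad input rows before looking at the Hive build. The snippet above is cut off, so what follows is only a minimal sketch of the parsing step, not the original function: `Record` is a hypothetical stand-in for `pyspark.sql.Row` so the code runs without Spark, and `safe_deserialize` is an illustrative helper that is not part of the original post.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row so this sketch runs without Spark.
Record = namedtuple("Record", ["fields", "score"])


def deserialize(line):
    """Split a tab-separated line and convert the last column to float."""
    parts = line.strip().split("\t")
    return Record(fields=tuple(parts[:-1]), score=float(parts[-1]))


def safe_deserialize(line):
    """Return a Record, or None when the last column is not a number."""
    try:
        return deserialize(line)
    except ValueError:
        return None
```

In a Spark job, something along the lines of `data.map(safe_deserialize).filter(lambda r: r is not None)` would drop unparseable rows before the DataFrame is created. This only checks the input data; it does not address a Hive or Thrift build mismatch.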