Hey there, Ignacio,
Like Reynold said, it's related to your build of Spark; try building without
the Thrift server (the -Phive-thriftserver profile).
Also, try running this command to see what the error is, and link it here:
sc.wholeTextFiles("s3://my-directory/2015*/ignacio/*")
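For instance, a quick check along those lines might look like this (the path is
just the example above, so adjust it to the actual bucket):

pairs = sc.wholeTextFiles("s3://my-directory/2015*/ignacio/*")
print(pairs.count())           # forces the read, so any S3/Hadoop error surfaces here
print(pairs.keys().take(5))    # shows which files the glob actually matched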
PS: Are you using boto to connect? Which version?
Igor
Maybe an incompatible Hive package or Hive metastore?
On Tue, Jun 2, 2015 at 3:25 PM, Ignacio Zendejas wrote:
From RELEASE:
"Spark 1.3.1 built for Hadoop 2.4.0
Build flags: -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests
-Pkinesis-asl -Pspark-ganglia-lgpl -Phadoop-provided -Phive
-Phive-thriftserver
"
And this stacktrace may be more useful:
http://pastebin.ca/3016483
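For what it's worth, a quick sanity check from the running shell that the
assembly on the classpath matches that RELEASE file (sc._jvm is an internal
handle, so treat this as a diagnostic sketch only):

print(sc.version)                                                 # expect 1.3.1
print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())    # Hadoop version, expect 2.4.0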
What version of Spark is this?
On Tue, Jun 2, 2015 at 3:13 PM, Ignacio Zendejas wrote:
I've run into an error when trying to create a dataframe. Here's the code:
--
from pyspark import StorageLevel
from pyspark.sql import Row, HiveContext

table = 'blah'
ssc = HiveContext(sc)

data = sc.textFile('s3://bucket/some.tsv')

def deserialize(s):
    p = s.strip().split('\t')
    p[-1] = float(p[-1])
    ret
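The message is cut off here. A minimal sketch of how it presumably continues,
using the sc/ssc/data/table objects defined above, just to show where the
HiveContext gets involved; the Row field names are made up for illustration:

def deserialize(s):
    # split the TSV line and coerce the last field to float, as above;
    # the field names below are hypothetical, not from the original message
    p = s.strip().split('\t')
    p[-1] = float(p[-1])
    return Row(key=p[0], value=p[-1])

rows = data.map(deserialize)
# createDataFrame infers the schema from the Row fields; the Hive-related
# error in the pastebin presumably surfaces at this point
df = ssc.createDataFrame(rows)
df.registerTempTable(table)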