In Spark 1.2 I used to be able to do this:

scala> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>")
res30: org.apache.spark.sql.catalyst.types.DataType =
StructType(List(StructField(int,LongType,true)))
That is, the name of a column can be a keyword like "int". This is no
longer the case.
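For reference, here is a minimal sketch of building that same schema programmatically with a column literally named "int". It uses the public org.apache.spark.sql.types API (in 1.2 these classes lived under org.apache.spark.sql.catalyst.types, as the res30 output shows), so treat it as an illustration rather than the exact 1.2 code path:

    import org.apache.spark.sql.types.{LongType, StructField, StructType}

    // "int" is a keyword in HiveQL, but the programmatic API accepts it as a field name.
    val schema = StructType(Seq(StructField("int", LongType, nullable = true)))

    schema.printTreeString()
    // root
    //  |-- int: long (nullable = true)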
>> ...this in your logs, which indicates that it is a read that
>> starts from an offset and reads one split size (64MB) worth of data:
>>
>> 14/11/20 15:39:45 [Executor task launch worker-1] INFO HadoopRDD: Input
>> split: s3n://mybucket/myfile:335544320+67108864
>> On Nov 22, 2
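(A rough sketch of the arithmetic behind that log line, assuming the 64MB split size it mentions and the ~1.2GB file size from the original message further down; the trailing partitions check is the standard RDD API with the placeholder path from this thread:)

    // 335544320 = 5 * 67108864, so the log line above is the sixth 64MB split of the file.
    val splitSize = 64L * 1024 * 1024                 // 67108864 bytes
    val fileSize  = 1200L * 1024 * 1024               // ~1.2GB, per the original message
    val numSplits = math.ceil(fileSize.toDouble / splitSize).toInt
    println(s"Expected input splits: $numSplits")     // about 19 splits of up to 64MB each

    // In the Spark shell, the actual partition count can be confirmed with:
    // sc.textFile("s3n://mybucket/myfile").partitions.size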
Err I meant #1 :)
- Nitay
Founder & CTO
On Sat, Nov 22, 2014 at 10:20 AM, Nitay Joffe wrote:
> Anyone have any thoughts on this? Trying to understand especially #2 if
> it's a legit bug or something I'm doing wrong.
>
> - Nitay
> Founder & CTO
>
>
>
Anyone have any thoughts on this? Trying to understand especially #2 if
it's a legit bug or something I'm doing wrong.
- Nitay
Founder & CTO
On Thu, Nov 20, 2014 at 11:54 AM, Nitay Joffe wrote:
> I have a simple S3 job to read a text file and do a line count.
> Sp
I have a simple S3 job to read a text file and do a line count.
Specifically I'm doing *sc.textFile("s3n://mybucket/myfile").count*. The
file is about 1.2GB. My setup is a standalone Spark cluster with 4 workers,
each with 2 cores / 16GB RAM. I'm using branch-1.2 code built against
Hadoop 2.4 (though I
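(For completeness, a minimal self-contained sketch of the job described above as a standalone app; the credential property names are the standard Hadoop ones for the s3n:// filesystem, and the app/object name is made up:)

    import org.apache.spark.{SparkConf, SparkContext}

    object S3LineCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("S3LineCount"))

        // s3n:// credentials are typically supplied through the Hadoop configuration.
        sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
        sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

        val count = sc.textFile("s3n://mybucket/myfile").count()
        println(s"Line count: $count")

        sc.stop()
      }
    }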