Not sure what's happening here, but I guess it's probably a dependency
version issue. Could you please give vanilla Apache Spark a try to see
whether it's a CDH-specific issue or not?
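Since a dependency version clash is suspected (and the jets3t jar is singled out later in the thread), one quick check is whether more than one jar on the classpath ships the same package. The following is a minimal Python sketch that assumes the jars sit in a local directory. `find_providers` and `jars_in` are hypothetical helper names, not part of Spark, CDH, or Hadoop:

```python
import zipfile
from pathlib import Path
from typing import Dict, Iterable, List


def find_providers(jar_paths: Iterable[str], prefix: str) -> Dict[str, List[str]]:
    """Map each jar to the .class entries under `prefix` that it contains.

    Jars with no matching entries are left out, so more than one key in the
    result means the same package ships in several jars.
    """
    hits: Dict[str, List[str]] = {}
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as zf:
                names = [n for n in zf.namelist()
                         if n.startswith(prefix) and n.endswith(".class")]
        except (OSError, zipfile.BadZipFile):
            continue  # skip missing, unreadable, or non-zip classpath entries
        if names:
            hits[jar] = names
    return hits


def jars_in(directory: str) -> List[str]:
    """List the *.jar files directly inside `directory`, sorted by path."""
    return sorted(str(p) for p in Path(directory).glob("*.jar"))
```

For example, if `find_providers(jars_in("/path/to/lib"), "org/jets3t/")` returns more than one key, two jets3t copies (possibly different versions) sit side by side. The check only inspects jar contents; it does not tell you which copy the JVM actually loads first.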
Cheng
On 9/17/15 11:44 PM, Chengi Liu wrote:
Hi,
I did some digging.
I believe the error is caused by jets3t jar.
Essentially, these lines:
locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore',
'java/net/URI', 'org/apache/hadoop/conf/Configuration',
'org/apache/hadoop/fs/s3/S3Credentials',
'org/jets3t/service/security/AWSCr
Thanks for the detailed information!
Now I can confirm that this is a backwards-compatibility issue. The data
written by parquet 1.6rc7 follows the standard LIST structure. However,
Spark SQL still uses the old parquet-avro-style two-level structure, which
causes the problem.
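To make the two layouts concrete, here is a rough Python sketch of the backward-compatibility rules that the Parquet format spec (LogicalTypes, LIST section) describes for deciding which node of a LIST-annotated group is the element. The `Field` class and the `list_layout`/`element_of` helpers are hypothetical stand-ins, not parquet-mr or Spark SQL APIs, and the rules are paraphrased rather than copied from any library:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """Tiny stand-in for a Parquet schema node (not a real parquet-mr type)."""
    name: str
    repetition: str                           # "required", "optional" or "repeated"
    children: Optional[List["Field"]] = None  # None means a primitive column

    @property
    def is_group(self) -> bool:
        return self.children is not None


def list_layout(list_group: Field) -> str:
    """Classify a LIST-annotated group as 'two-level' or 'three-level'."""
    repeated = list_group.children[0]  # a LIST group has one repeated child
    if not repeated.is_group:
        return "two-level"             # repeated primitive: it is the element
    if len(repeated.children) > 1:
        return "two-level"             # multi-field repeated group is the element
    if repeated.name in ("array", list_group.name + "_tuple"):
        return "two-level"             # legacy names (old parquet-avro / thrift)
    return "three-level"               # the repeated group's single child is the element


def element_of(list_group: Field) -> Field:
    """Return the schema node that holds one list element."""
    repeated = list_group.children[0]
    if list_layout(list_group) == "two-level":
        return repeated
    return repeated.children[0]


# The ip_list field from the Pig schema in this thread.
ip_list = Field("ip_list", "optional", [
    Field("ip_t", "repeated", [Field("ip", "optional")]),
])
print(list_layout(ip_list))      # three-level
print(element_of(ip_list).name)  # ip
```

Under these rules the Pig-written `ip_list` reads as a three-level list whose elements are the optional `ip` strings, while a reader that always assumes the two-level layout would treat the `ip_t` group itself as the element.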
Cheng
On 4/27/15
FYI,
Parquet schema output:
message pig_schema {
  optional binary cust_id (UTF8);
  optional int32 part_num;
  optional group ip_list (LIST) {
    repeated group ip_t {
      optional binary ip (UTF8);
    }
  }
  optional group vid_list (LIST) {
    repeated group vid_t {
      optional binary
Had an offline discussion with Jianshi, the dataset was generated by Pig.
Jianshi, could you please attach the output of "parquet-schema
"? I guess this is a Parquet format
backwards-compatibility issue. Parquet hadn't standardized the
representation of LIST and MAP until recently, thus many syst
Hi Huai,
I'm using Spark 1.3.1.
You're right. The dataset is not generated by Spark. It's generated by Pig
using Parquet 1.6.0rc7 jars.
Let me see if I can send a testing dataset to you...
Jianshi
On Sat, Apr 25, 2015 at 2:22 AM, Yin Huai wrote:
oh, I missed that. It is fixed in 1.3.0.
Also, Jianshi, the dataset was not generated by Spark SQL, right?
On Fri, Apr 24, 2015 at 11:09 AM, Ted Yu wrote:
Yin:
Fix Version of SPARK-4520 is not set.
I assume it was fixed in 1.3.0.
Cheers
On Fri, Apr 24, 2015 at 11:00 AM, Yin Huai wrote:
The exception looks like the one mentioned in
https://issues.apache.org/jira/browse/SPARK-4520. What is the version of
Spark?
On Fri, Apr 24, 2015 at 2:40 AM, Jianshi Huang wrote:
> Hi,
>
> My data looks like this:
>
> +---++--+
> | col_name |