Hi Josh,
At this stage I don't know whether there's something wrong with Hive or
whether it's just user error.
Perhaps if I go through what I have done you can see where the error lies.
Unfortunately this is going to be wordy. Apologies in advance for the long
email.
So I created a "normal" table in HDFS with a variety of column types like this:
CREATE TABLE employees4 (
rowid STRING,
flag BOOLEAN,
number INT,
bignum BIGINT,
name STRING,
salary FLOAT,
bigsalary DOUBLE,
numbers ARRAY<INT>,
floats ARRAY<DOUBLE>,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
namedNumbers MAP<STRING, INT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>);
I loaded some data into it, and I can see the data:
hive> SELECT * FROM employees4;
OK
row1 true 100 7 John Doe 100000.0 100000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0] ["Mary Smith","Todd Jones"] {"Federal Taxes":0.2,"State
Taxes":0.05,"Insurance":0.1} {"nameOne":123,"Name Two":49,"The Third Man":-1} {"street":"1 Michigan
Ave.","city":"Chicago","state":"IL","zip":60600}
row2 false 7 100 Mary Smith 100000.0 80000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,1001.0] ["Bill King"] {"Federal Taxes":0.2,"State
Taxes":0.05,"Insurance":0.1}{"nameOne":123,"Name Two":49,"The Third Man":-1} {"street":"100 Ontario
St.","city":"Chicago","state":"IL","zip":60601}
row3 false 3245 877878 Todd Jones 100000.0 70000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,2.0] [] {"Federal Taxes":0.15,"State
Taxes":0.03,"Insurance":0.1} {"nameOne":123,"Name Two":49,"The Third Man":-1} {"street":"200 Chicago Ave.","city":"Oak
Park","state":"IL","zip":60700}
row4 true 877878 3245 Bill King 100000.0 60000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,1001.0,1001.0,1001.0] [] {"Federal Taxes":0.15,"State
Taxes":0.03,"Insurance":0.1} {"nameOne":123,"Name Two":49,"The Third Man":-1} {"street":"300 Obscure
Dr.","city":"Obscuria","state":"IL","zip":60100}
Time taken: 0.535 seconds, Fetched: 4 row(s)
Everything looks fine.
Now I create a Hive table stored in Accumulo:
DROP TABLE IF EXISTS accumulo_table4;
CREATE TABLE accumulo_table4 (
rowid STRING,
flag BOOLEAN,
number INT,
bignum BIGINT,
name STRING,
salary FLOAT,
bigsalary DOUBLE,
numbers ARRAY<INT>,
floats ARRAY<DOUBLE>,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
namednumbers MAP<STRING, INT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.columns.mapping' =
':rowid,person:flag#binary,person:number#binary,person:bignum#binary,person:name,person:salary#binary,person:bigsalary#binary,person:numbers#binary,person:floats,person:subordinates,deductions:*,namednumbers:*,person:address');
(Note that I am only really interested in storing the values in "binary".)
Now I can load the Accumulo table from the normal table:
INSERT OVERWRITE TABLE accumulo_table4 SELECT * FROM employees4;
And I can query the data from the Accumulo table.
hive> SELECT * FROM accumulo_table4;
OK
row1 true 100 7 John Doe 100000.0 100000.0 [null] [null] ["Mary Smith\u0003Todd Jones"] {"Federal
Taxes":0.2,"Insurance":0.1,"State Taxes":0.05} {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"1 Michigan
Ave.\u0003Chicago\u0003IL\u000360600","city":null,"state":null,"zip":null}
row2 false 7 100 Mary Smith 100000.0 80000.0 [null] [null] ["Bill King"] {"Federal Taxes":0.2,"Insurance":0.1,"State
Taxes":0.05} {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"100 Ontario
St.\u0003Chicago\u0003IL\u000360601","city":null,"state":null,"zip":null}
row3 false 3245 877878 Todd Jones 100000.0 70000.0 [null] [null] [] {"Federal Taxes":0.15,"Insurance":0.1,"State Taxes":0.03}
{"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"200 Chicago Ave.\u0003Oak
Park\u0003IL\u000360700","city":null,"state":null,"zip":null}
row4 true 877878 3245 Bill King 100000.0 60000.0 [null] [null] [] {"Federal Taxes":0.15,"Insurance":0.1,"State Taxes":0.03}
{"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"300 Obscure
Dr.\u0003Obscuria\u0003IL\u000360100","city":null,"state":null,"zip":null}
Time taken: 0.109 seconds, Fetched: 4 row(s)
Notice that the columns of type ARRAY<INT> and ARRAY<DOUBLE> come back empty.
I assume this means something is going wrong and the Hive storage handler is
returning nulls?
When I use the Accumulo shell to look at the data stored in Accumulo:
root@accumulo> scan -t accumulo_table4
row1 deductions:Federal Taxes [] 0.2
row1 deductions:Insurance [] 0.1
row1 deductions:State Taxes [] 0.05
row1 namednumbers:Name Two [] 49
row1 namednumbers:The Third Man [] -1
row1 namednumbers:nameOne [] 123
row1 person:address [] 1 Michigan Ave.\x03Chicago\x03IL\x0360600
row1 person:bignum [] \x00\x00\x00\x00\x00\x00\x00\x07
row1 person:bigsalary [] @\xF8j\x00\x00\x00\x00\x00
row1 person:flag [] \x01
row1 person:floats [] 3.14159\x032.71828\x03-1.1\x031001.0
row1 person:name [] John Doe
row1 person:number [] \x00\x00\x00d
row1 person:numbers []
\x00\x00\x00\x0D\x03\x00\x00\x00\x17\x03\xFF\xFF\xFF\xFF\x03\x00\x00\x03\xE9
row1 person:salary [] G\xC3P\x00
row1 person:subordinates [] Mary Smith\x03Todd Jones
This shows that the columns of type INT and FLOAT have been converted to
binary, which is great.
However, the column of type ARRAY<INT> has had its individual values
converted, but still has the field separator (0x03) between them.
I thought this might just be a conversion problem, so I hacked the Accumulo
table to have the "correct" value:
row1 person:numbers []
\x00\x00\x00\x0D\x00\x00\x00\x17\xFF\xFF\xFF\xFF\x00\x00\x03\xE9
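As a sanity check on those byte values: they all match plain big-endian encodings. Here's a quick Python sketch of my own (nothing from the storage handler itself) that reproduces both the scan output and my hand-edited value:

```python
import struct

# Big-endian primitive encodings, matching what the scan shows for row1.
assert struct.pack(">i", 100) == b"\x00\x00\x00d"                    # person:number
assert struct.pack(">q", 7) == b"\x00" * 7 + b"\x07"                 # person:bignum
assert struct.pack(">f", 100000.0) == b"G\xc3P\x00"                  # person:salary
assert struct.pack(">d", 100000.0) == b"@\xf8j\x00\x00\x00\x00\x00"  # person:bigsalary

# ARRAY<INT> as actually stored: each element binary-encoded, but the
# elements still joined with the 0x03 field separator.
as_stored = b"\x03".join(struct.pack(">i", v) for v in [13, 23, -1, 1001])
assert as_stored == (b"\x00\x00\x00\x0d\x03\x00\x00\x00\x17\x03"
                     b"\xff\xff\xff\xff\x03\x00\x00\x03\xe9")

# The "corrected" value I hacked in: the same elements, no separator.
hacked = b"".join(struct.pack(">i", v) for v in [13, 23, -1, 1001])
assert hacked == (b"\x00\x00\x00\x0d\x00\x00\x00\x17"
                  b"\xff\xff\xff\xff\x00\x00\x03\xe9")
```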
However, when I run the query, the numbers field is still "[null]".
I'm happy to arrange to store whatever is needed in Accumulo to make this
work; I just need to know what that is.
The second issue concerns the MAP<STRING, INT> column, in this case called
namednumbers.
As you can see, so far it works fine and I am very happy :)
However, as I said before, I really want everything stored in binary, and
when I change the table definition to add a #binary I get an error:
hive> CREATE TABLE accumulo_table4 (
> rowid STRING,
> flag BOOLEAN,
> number INT,
> bignum BIGINT,
> name STRING,
> salary FLOAT,
> bigsalary DOUBLE,
> numbers ARRAY<INT>,
> floats ARRAY<DOUBLE>,
> subordinates ARRAY<STRING>,
> deductions MAP<STRING, FLOAT>,
> namednumbers MAP<STRING, INT>,
> address STRUCT<street:STRING, city:STRING, state:STRING,
zip:INT>)
> STORED BY
'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
> WITH SERDEPROPERTIES('accumulo.columns.mapping' =
':rowid,person:flag#binary,person:number#binary,person:bignum#binary,person:name,person:salary#binary,person:bigsalary#binary,person:numbers#binary,person:floats,person:subordinates,deductions:*,namednumbers:*#binary,person:address');
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException:
Expected map encoding for a map specification, namednumbers:* with encoding
binary
I thought that maybe the syntax "column_family:*#binary" is asking too much,
so I tried using a default instead:
DROP TABLE IF EXISTS accumulo_table4;
CREATE TABLE accumulo_table4 (
rowid STRING,
flag BOOLEAN,
number INT,
bignum BIGINT,
name STRING,
salary FLOAT,
bigsalary DOUBLE,
numbers ARRAY<INT>,
floats ARRAY<DOUBLE>,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
namednumbers MAP<STRING, INT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.columns.mapping' =
':rowid,person:flag,person:number,person:bignum,person:name,person:salary,person:bigsalary,person:numbers,person:floats,person:subordinates,deductions:*,namednumbers:*,person:address',
"accumulo.default.storage" = "binary");
This table creation works; however, when I try to insert the data I get a
long error message, which follows.
Before that, I just want to say that I'm happy to look at the source if I
have to; I would appreciate a pointer to the file name/path for the Accumulo
storage handler code.
Many thanks in advance for any help.
Z
PS. I hadn't thought of using a column family with sequence numbers in the
qualifiers for an array.
I will try that and get back to you.
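To make sure I understand the idea, here's a purely hypothetical sketch of my own (the names and layout are guesses, not what the storage handler currently does): one Accumulo entry per array element, with a sequence number appended to the column qualifier and the value big-endian encoded:

```python
import struct

# Hypothetical layout for ARRAY<INT>: one cell per element, with the
# element index as a suffix on the column qualifier. All names made up.
def array_to_cells(family, base_qualifier, values):
    return {(family, "%s_%d" % (base_qualifier, i)): struct.pack(">i", v)
            for i, v in enumerate(values)}

cells = array_to_cells("person", "numbers", [13, 23, -1, 1001])
# yields cells keyed ("person", "numbers_0") .. ("person", "numbers_3")
```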
Here's the conversion error:
hive> INSERT OVERWRITE TABLE accumulo_table4 SELECT * FROM employees4;
Query ID = hive_20150910125252_f6fb143e-13df-4e81-98d0-fe8391025dc7
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id
application_1441875240043_0005)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED
KILLED
--------------------------------------------------------------------------------
Map 1 FAILED 1 0 0 1 4 0
--------------------------------------------------------------------------------
VERTICES: 00/01 [>>--------------------------] 0% ELAPSED TIME: 24.54 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1441875240043_0005_1_00, diagnostics=[Task failed, taskId=task_1441875240043_0005_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
{"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd
Jones"],"deductions":{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan
Ave.","city":"Chicago","state":"IL","zip":60600}}
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John
Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd Jones"],"deductions":{"Federal Taxes":0.2,"State
Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}}
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John
Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd Jones"],"deductions":{"Federal Taxes":0.2,"State
Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}}
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 16 more
Caused by: java.lang.RuntimeException: Hive internal error.
at
org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:327)
at
org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeBinary(AccumuloRowSerializer.java:368)
at
org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeWithLevel(AccumuloRowSerializer.java:270)
at
org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeWithLevel(AccumuloRowSerializer.java:288)
at
org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.getSerializedValue(AccumuloRowSerializer.java:249)
at
org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.serializeColumnMapping(AccumuloRowSerializer.java:148)
at
org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.serialize(AccumuloRowSerializer.java:130)
at
org.apache.hadoop.hive.accumulo.serde.AccumuloSerDe.serialize(AccumuloSerDe.java:119)
at
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:660)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 17 more
], TaskAttempt 1 failed, TaskAttempt 2 failed, TaskAttempt 3 failed [snip -- identical stack traces to TaskAttempt 0]
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex
vertex_1441875240043_0005_1_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask
hive> DROP TABLE IF EXISTS accumulo_table4;
OK
Time taken: 1.101 seconds
-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 08 September 2015 22:15
To: user@hive.apache.org
Subject: Re: Accumulo Storage Manager
For the Array support: it might have just been a missed test case and is just a
bug. I don't recall off the top of my head how Arrays are intended to be
serialized (whether it's some numeric counter in the Accumulo CQ or just
serializing all the elements in the array into the Accumulo Value). If it isn't
working for you, feel free to open up a JIRA issue with the details and mention
me so I notice it :). I can try to help figure out what's busted and, if
necessary, a fix.
For the Map support, what are you trying to do differently? Going from memory,
I believe the support is for a fixed column family and an optional column
qualifier prefix. This limits the entries in a Map to that column family, and
allows you to place multiple maps into a given family for locality purposes
(identifying the maps by qualifier-prefix, and getting Key uniqueness from the
qualifier-suffix). There isn't much flexibility in this regard for alternate
serialization approaches -- the considerations at the time were for a
general-purpose schema that you don't really have to think about (you just
think SQL).
- Josh