Hi Josh,

At this stage I don't know whether there's anything wrong with Hive or whether it's just user error. Perhaps if I go through what I have done you can see where the error lies. Unfortunately this is going to be wordy; apologies in advance for the long email.
So I created a "normal" table in HDFS with a variety of column types like this:

CREATE TABLE employees4 (
  rowid STRING,
  flag BOOLEAN,
  number INT,
  bignum BIGINT,
  name STRING,
  salary FLOAT,
  bigsalary DOUBLE,
  numbers ARRAY<INT>,
  floats ARRAY<DOUBLE>,
  subordinates ARRAY<STRING>,
  deductions MAP<STRING, FLOAT>,
  namedNumbers MAP<STRING, INT>,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>);

I put some data into it and I can see the data:

hive> SELECT * FROM employees4;
OK
row1 true 100 7 John Doe 100000.0 100000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0] ["Mary Smith","Todd Jones"] {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1} {"nameOne":123,"Name Two":49,"The Third Man":-1} {"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}
row2 false 7 100 Mary Smith 100000.0 80000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,1001.0] ["Bill King"] {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1} {"nameOne":123,"Name Two":49,"The Third Man":-1} {"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}
row3 false 3245 877878 Todd Jones 100000.0 70000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,2.0] [] {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1} {"nameOne":123,"Name Two":49,"The Third Man":-1} {"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}
row4 true 877878 3245 Bill King 100000.0 60000.0 [13,23,-1,1001] [3.14159,2.71828,-1.1,1001.0,1001.0,1001.0,1001.0] [] {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1} {"nameOne":123,"Name Two":49,"The Third Man":-1} {"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}
Time taken: 0.535 seconds, Fetched: 4 row(s)

Everything looks fine.
Now I create a Hive table stored in Accumulo:

DROP TABLE IF EXISTS accumulo_table4;
CREATE TABLE accumulo_table4 (
  rowid STRING,
  flag BOOLEAN,
  number INT,
  bignum BIGINT,
  name STRING,
  salary FLOAT,
  bigsalary DOUBLE,
  numbers ARRAY<INT>,
  floats ARRAY<DOUBLE>,
  subordinates ARRAY<STRING>,
  deductions MAP<STRING, FLOAT>,
  namednumbers MAP<STRING, INT>,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.columns.mapping' =
  ':rowid,person:flag#binary,person:number#binary,person:bignum#binary,person:name,person:salary#binary,person:bigsalary#binary,person:numbers#binary,person:floats,person:subordinates,deductions:*,namednumbers:*,person:address');

(Note that I am only really interested in storing the values in binary.)

Now I can load the Accumulo table from the normal table:

INSERT OVERWRITE TABLE accumulo_table4 SELECT * FROM employees4;

And I can query the data from the Accumulo table.
hive> SELECT * FROM accumulo_table4;
OK
row1 true 100 7 John Doe 100000.0 100000.0 [null] [null] ["Mary Smith\u0003Todd Jones"] {"Federal Taxes":0.2,"Insurance":0.1,"State Taxes":0.05} {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"1 Michigan Ave.\u0003Chicago\u0003IL\u000360600","city":null,"state":null,"zip":null}
row2 false 7 100 Mary Smith 100000.0 80000.0 [null] [null] ["Bill King"] {"Federal Taxes":0.2,"Insurance":0.1,"State Taxes":0.05} {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"100 Ontario St.\u0003Chicago\u0003IL\u000360601","city":null,"state":null,"zip":null}
row3 false 3245 877878 Todd Jones 100000.0 70000.0 [null] [null] [] {"Federal Taxes":0.15,"Insurance":0.1,"State Taxes":0.03} {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"200 Chicago Ave.\u0003Oak Park\u0003IL\u000360700","city":null,"state":null,"zip":null}
row4 true 877878 3245 Bill King 100000.0 60000.0 [null] [null] [] {"Federal Taxes":0.15,"Insurance":0.1,"State Taxes":0.03} {"Name Two":49,"The Third Man":-1,"nameOne":123} {"street":"300 Obscure Dr.\u0003Obscuria\u0003IL\u000360100","city":null,"state":null,"zip":null}
Time taken: 0.109 seconds, Fetched: 4 row(s)

Notice that the columns with type ARRAY<INT> and ARRAY<DOUBLE> are empty. I assume this means that something is wrong and the Hive storage handler is returning a null?
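Incidentally, the STRUCT output looks to me like the deserializer simply isn't splitting the stored value on the 0x03 separator: the whole string lands in the street field and the remaining fields come back null. A quick sketch in plain Python, just to illustrate the split I would have expected (not the handler's actual code):

```python
# Raw value stored in Accumulo for person:address (from the accumulo shell scan)
raw = "1 Michigan Ave.\x03Chicago\x03IL\x0360600"

# If the value were split on the 0x03 separator, the four struct fields
# (street, city, state, zip) would each come back cleanly:
parts = raw.split("\x03")
print(parts)  # ['1 Michigan Ave.', 'Chicago', 'IL', '60600']
```

Instead, what comes back is the unsplit string as "street" plus three nulls.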
When I use the accumulo shell to look at the data stored in Accumulo:

root@accumulo> scan -t accumulo_table4
row1 deductions:Federal Taxes [] 0.2
row1 deductions:Insurance [] 0.1
row1 deductions:State Taxes [] 0.05
row1 namednumbers:Name Two [] 49
row1 namednumbers:The Third Man [] -1
row1 namednumbers:nameOne [] 123
row1 person:address [] 1 Michigan Ave.\x03Chicago\x03IL\x0360600
row1 person:bignum [] \x00\x00\x00\x00\x00\x00\x00\x07
row1 person:bigsalary [] @\xF8j\x00\x00\x00\x00\x00
row1 person:flag [] \x01
row1 person:floats [] 3.14159\x032.71828\x03-1.1\x031001.0
row1 person:name [] John Doe
row1 person:number [] \x00\x00\x00d
row1 person:numbers [] \x00\x00\x00\x0D\x03\x00\x00\x00\x17\x03\xFF\xFF\xFF\xFF\x03\x00\x00\x03\xE9
row1 person:salary [] G\xC3P\x00
row1 person:subordinates [] Mary Smith\x03Todd Jones

This shows that the columns of type INT and FLOAT have been converted to binary, which is great. However, the column with type ARRAY<INT> has had the individual values converted but still has the field separator (0x03) present. I thought that this might just be a conversion problem, so I hacked the Accumulo table to have the "correct" value:

row1 person:numbers [] \x00\x00\x00\x0D\x00\x00\x00\x17\xFF\xFF\xFF\xFF\x00\x00\x03\xE9

However, when I run the query the numbers field is still "[null]". I'm happy to arrange to store whatever is needed in Accumulo to make it work; I just need to know what that is.

The second issue concerns the MAP<STRING,INT> column, in this case called namednumbers. As you can see, so far it works fine and I am very happy :) However, as I stated before, I really want everything stored in binary.
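To double-check my reading of the stored bytes, I worked through the encodings in plain Python (struct with big-endian formats; the byte strings are the ones from the scan above):

```python
import struct

# Scalar columns match plain big-endian encodings exactly:
assert struct.pack(">i", 100) == b"\x00\x00\x00d"                     # person:number
assert struct.pack(">q", 7) == b"\x00\x00\x00\x00\x00\x00\x00\x07"    # person:bignum
assert struct.pack(">f", 100000.0) == b"G\xc3P\x00"                   # person:salary
assert struct.pack(">d", 100000.0) == b"@\xf8j\x00\x00\x00\x00\x00"   # person:bigsalary

# The stored ARRAY<INT> value is binary-encoded elements with the 0x03
# separator still between them:
stored = b"\x03".join(struct.pack(">i", n) for n in [13, 23, -1, 1001])
assert stored == b"\x00\x00\x00\x0d\x03\x00\x00\x00\x17\x03\xff\xff\xff\xff\x03\x00\x00\x03\xe9"

# My hand-"corrected" value is just the four ints concatenated:
packed = struct.pack(">4i", 13, 23, -1, 1001)
assert packed == b"\x00\x00\x00\x0d\x00\x00\x00\x17\xff\xff\xff\xff\x00\x00\x03\xe9"

# Note that a separator-delimited layout is ambiguous for binary ints:
# 1001 encodes to bytes that themselves contain 0x03.
assert b"\x03" in struct.pack(">i", 1001)
```

That last point is what worries me: if the intended format really is separator-delimited binary, an element like 1001 (whose encoding contains a 0x03 byte) would collide with the separator, which may or may not be related to the "[null]" results.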
However, when I change the table definition to have a #binary on the map column I get an error:

hive> CREATE TABLE accumulo_table4 (
    > rowid STRING,
    > flag BOOLEAN,
    > number INT,
    > bignum BIGINT,
    > name STRING,
    > salary FLOAT,
    > bigsalary DOUBLE,
    > numbers ARRAY<INT>,
    > floats ARRAY<DOUBLE>,
    > subordinates ARRAY<STRING>,
    > deductions MAP<STRING, FLOAT>,
    > namednumbers MAP<STRING, INT>,
    > address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
    > STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
    > WITH SERDEPROPERTIES('accumulo.columns.mapping' = ':rowid,person:flag#binary,person:number#binary,person:bignum#binary,person:name,person:salary#binary,person:bigsalary#binary,person:numbers#binary,person:floats,person:subordinates,deductions:*,namednumbers:*#binary,person:address');
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: Expected map encoding for a map specification, namednumbers:* with encoding binary

I thought that maybe this was because the syntax "column_family:*#binary" is too much, so I tried using a default:

DROP TABLE IF EXISTS accumulo_table4;
CREATE TABLE accumulo_table4 (
  rowid STRING,
  flag BOOLEAN,
  number INT,
  bignum BIGINT,
  name STRING,
  salary FLOAT,
  bigsalary DOUBLE,
  numbers ARRAY<INT>,
  floats ARRAY<DOUBLE>,
  subordinates ARRAY<STRING>,
  deductions MAP<STRING, FLOAT>,
  namednumbers MAP<STRING, INT>,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.columns.mapping' =
  ':rowid,person:flag,person:number,person:bignum,person:name,person:salary,person:bigsalary,person:numbers,person:floats,person:subordinates,deductions:*,namednumbers:*,person:address',
  "accumulo.default.storage" = "binary");

This table creation works; however, when I try to insert the data I get a long error message, which follows.
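(As an aside on the map encoding: here is what I would expect a binary-encoded MAP<STRING,INT> to look like in Accumulo, i.e. map keys as column qualifiers and values as 4-byte big-endian ints. This is my assumption about the intended layout, not something I've confirmed in the handler; the sketch is plain Python.)

```python
import struct

namednumbers = {"nameOne": 123, "Name Two": 49, "The Third Man": -1}

# One Accumulo entry per map key: qualifier = key, value = big-endian int
# (assumed layout -- the same entries the string version produces under
# namednumbers:*, just with packed values)
cells = {k: struct.pack(">i", v) for k, v in namednumbers.items()}

assert cells["nameOne"] == b"\x00\x00\x00{"       # 123 == 0x7B
assert cells["Name Two"] == b"\x00\x00\x001"      # 49 == 0x31
assert cells["The Third Man"] == b"\xff\xff\xff\xff"
```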
However, before that I just want to say that I'm happy to look at the source if I have to; I would appreciate a pointer to the file name/path for the Hive storage handler code. Many thanks in advance for any help.

Z

PS. I never thought of using a column family with sequence numbers in the qualifiers for an array. I will try that and get back to you.

Here's the conversion error:

hive> INSERT OVERWRITE TABLE accumulo_table4 SELECT * FROM employees4;
Query ID = hive_20150910125252_f6fb143e-13df-4e81-98d0-fe8391025dc7
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1441875240043_0005)
--------------------------------------------------------------------------------
VERTICES  STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1     FAILED      1          0        0        1       4       0
--------------------------------------------------------------------------------
VERTICES: 00/01 [>>--------------------------] 0% ELAPSED TIME: 24.54 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1441875240043_0005_1_00, diagnostics=[Task failed, taskId=task_1441875240043_0005_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd Jones"],"deductions":{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}}
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
 at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
 at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd Jones"],"deductions":{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}}
 at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
 at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
 at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
 at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
 ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"rowid":"row1","flag":true,"number":100,"bignum":7,"name":"John Doe","salary":100000.0,"bigsalary":100000.0,"numbers":[13,23,-1,1001],"floats":[3.14159,2.71828,-1.1,1001.0],"subordinates":["Mary Smith","Todd Jones"],"deductions":{"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1},"namednumbers":{"nameOne":123,"Name Two":49,"The Third Man":-1},"address":{"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}}
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
 at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
 ... 16 more
Caused by: java.lang.RuntimeException: Hive internal error.
 at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:327)
 at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeBinary(AccumuloRowSerializer.java:368)
 at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeWithLevel(AccumuloRowSerializer.java:270)
 at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.writeWithLevel(AccumuloRowSerializer.java:288)
 at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.getSerializedValue(AccumuloRowSerializer.java:249)
 at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.serializeColumnMapping(AccumuloRowSerializer.java:148)
 at org.apache.hadoop.hive.accumulo.serde.AccumuloRowSerializer.serialize(AccumuloRowSerializer.java:130)
 at org.apache.hadoop.hive.accumulo.serde.AccumuloSerDe.serialize(AccumuloSerDe.java:119)
 at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:660)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
 at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
 at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
 at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
 ... 17 more
], TaskAttempts 1-3 failed with identical stack traces [elided for brevity]
]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1441875240043_0005_1_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure.
failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

hive> DROP TABLE IF EXISTS accumulo_table4;
OK
Time taken: 1.101 seconds

-----Original Message-----
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: 08 September 2015 22:15
To: user@hive.apache.org
Subject: Re: Accumulo Storage Manager

For the Array support: it might have just been a missed test case and is just a bug. I don't recall off the top of my head how Arrays are intended to be serialized (whether it's some numeric counter in the Accumulo CQ or just serializing all the elements in the array into the Accumulo Value). If it isn't working for you, feel free to open up a JIRA issue with the details and mention me so I notice it :). I can try to help figure out what's busted and, if necessary, a fix.

For the Map support, what are you trying to do differently? Going from memory, I believe the support is for a fixed column family and an optional column qualifier prefix. This limits the entries in a Map to that column family, and allows you to place multiple maps into a given family for locality purposes (identifying the maps by qualifier-prefix, and getting Key uniqueness from the qualifier-suffix). There isn't much flexibility in this regard for alternate serialization approaches -- the considerations at the time were for a general-purpose schema that you don't really have to think about (you just think SQL).

- Josh

Please consider the environment before printing this email. This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory. The contents of this email may relate to dealings with other companies under the control of BAE Systems Applied Intelligence Limited, details of which can be found at http://www.baesystems.com/Businesses/index.htm.