Hi,

There is  a storage handler that allows a table in Hive created to load
data into Google BigQuery table. There is relevant GitHub link here

GoogleCloudDataproc/hive-bigquery-storage-handler: Hive Storage Handler for
interoperability between BigQuery and Apache Hive (github.com)
<https://github.com/GoogleCloudDataproc/hive-bigquery-storage-handler>

Ok so basically create a table in Hive with available jar file as below

DROP TABLE IF EXISTS test.testmebq;
CREATE TABLE test.testmebq(
       ID BIGINT
    )
STORED BY
 'com.google.cloud.hadoop.io.bigquery.hive.HiveBigQueryStorageHandler'
 TBLPROPERTIES (
 'mapred.bq.input.dataset.id'='test',
 'mapred.bq.input.table.id'='testme',
 'mapred.bq.project.id'='xxx',
 'mapred.bq.temp.gcs.path'='gs://tmp_storage_bucket/tmp',
 'mapred.bq.gcs.bucket'='etcbucket/hive/test'
)
;

Very simple table with one column called ID

In Google BigQuery I have also created a table called testme in test
dataset.

The specific jar file added to hive is ADD JAR
hive-bigquery-storage-handler-1.0-shaded.jar;

as per document in that link

Now if I start by adding the row in igQuery table test.testme, I can see
the row in Hive table test.tesmebq

insert into test.testme values(99)

and let us check that Hive table (running Hive in DEBUG mode to console)

hive> select * from test.testmebq; 2022-03-08 15:49:16,846 INFO [main]
conf.HiveConf: Using the default value passed in for log id:
ec1f7b38-7286-481a-b48b-44f7ed54c20a 2022-03-08 15:49:16,846 INFO [main]
session.SessionState: Updating thread name to
ec1f7b38-7286-481a-b48b-44f7ed54c20a main 2022-03-08 15:49:16,846 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ql.Driver: Compiling
command(queryId=hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b):
select * from test.testmebq 2022-03-08 15:49:16,892 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Starting
Semantic Analysis 2022-03-08 15:49:16,892 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Completed
phase 1 of Semantic Analysis 2022-03-08 15:49:16,892 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Get
metadata for source tables 2022-03-08 15:49:16,909 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Get
metadata for subqueries 2022-03-08 15:49:16,909 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Get
metadata for destination tables 2022-03-08 15:49:16,913 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Completed
getting MetaData in Semantic Analysis 2022-03-08 15:49:16,914 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] avro.AvroSerDe:
AvroSerde::initialize(): Preset value of avro.schema.literal == null
2022-03-08 15:49:16,969 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
parse.CalcitePlanner: Get metadata for source tables 2022-03-08
15:49:16,986 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
parse.CalcitePlanner: Get metadata for subqueries 2022-03-08 15:49:16,986
INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Get
metadata for destination tables 2022-03-08 15:49:16,988 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] avro.AvroSerDe:
AvroSerde::initialize(): Preset value of avro.schema.literal == null
2022-03-08 15:49:16,988 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
common.FileUtils: Creating directory if it doesn't exist:
file:/tmp/hive/ec1f7b38-7286-481a-b48b-44f7ed54c20a/hive_2022-03-08_15-49-16_847_7187167996261551164-1/-mr-10001/.hive-staging_hive_2022-03-08_15-49-16_847_7187167996261551164-1
2022-03-08 15:49:16,988 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
parse.CalcitePlanner: CBO Succeeded; optimized logical plan. 2022-03-08
15:49:16,988 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
ppd.OpProcFactory: Processing for FS(2) 2022-03-08 15:49:16,988 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ppd.OpProcFactory: Processing
for SEL(1) 2022-03-08 15:49:16,988 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ppd.OpProcFactory: Processing
for TS(0) 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
hive.HiveBigQueryStorageHandler: Configuring MapReduce Job Input Properties
(if not provided) from Hive Table properties 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Completed
plan generation 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] parse.CalcitePlanner: Not
eligible for results caching - no mr/tez/spark jobs 2022-03-08 15:49:16,989
INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ql.Driver: Semantic
Analysis Completed (retrial = false) 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ql.Driver: Returning Hive
schema: Schema(fieldSchemas:[FieldSchema(name:testmebq.id, type:bigint,
comment:null)], properties:null) 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] avro.AvroSerDe:
AvroSerde::initialize(): Preset value of avro.schema.literal == null
2022-03-08 15:49:16,989 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
exec.TableScanOperator: Initializing operator TS[0] 2022-03-08 15:49:16,989
INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main] exec.SelectOperator:
Initializing operator SEL[1] 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] exec.SelectOperator: SELECT
struct<id:bigint> 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] exec.ListSinkOperator:
Initializing operator LIST_SINK[3] 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ql.Driver: Completed compiling
command(queryId=hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b);
Time taken: 0.143 seconds 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] reexec.ReExecDriver: Execution
#1 of query 2022-03-08 15:49:16,989 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] lockmgr.DbTxnManager: Setting
lock request transaction to txnid:18396 for
queryId=hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b
2022-03-08 15:49:16,990 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
lockmgr.DbLockManager: Requesting:
queryId=hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b
LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE,
dbname:test, tablename:testmebq, operationType:SELECT,
isTransactional:false)], txnid:18396, user:hduser, hostname:rhes76,
agentInfo:hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b)
2022-03-08 15:49:17,019 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
lockmgr.DbLockManager: Response to
queryId=hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b
LockResponse(lockid:116445, state:ACQUIRED) 2022-03-08 15:49:17,046 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ql.Driver: Executing
command(queryId=hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b):
select * from test.testmebq 2022-03-08 15:49:17,046 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ql.Driver: Completed executing
command(queryId=hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b);
Time taken: 0.0 seconds OK 2022-03-08 15:49:17,046 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] ql.Driver: OK 2022-03-08
15:49:17,046 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
lockmgr.DbTxnManager: Stopped heartbeat for query:
hduser_20220308154916_a573a00c-9de7-49d9-9e4d-689f3e81559b 2022-03-08
15:49:17,061 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
hive.WrappedBigQueryAvroInputFormat: Column projection:id and filter
text:null and BQ Filter text: {} 2022-03-08 15:49:19,026 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
hive.WrappedBigQueryAvroInputFormat: Split[0] =
(name='projects/axial-glow-224522/locations/us/streams/EgxfT1MwU0FJMGtvUUoaAmpkGgJpcigB',
schema='{ "type": "record", "name": "__root__", "fields": [ { "name": "id",
"type": [ "null", "long" ] } ] }', limit='2') 2022-03-08 15:49:19,026 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
hive.WrappedBigQueryAvroInputFormat: mapreduceInputSplit is
(name='projects/axial-glow-224522/locations/us/streams/EgxfT1MwU0FJMGtvUUoaAmpkGgJpcigB',
schema='{ "type": "record", "name": "__root__", "fields": [ { "name": "id",
"type": [ "null", "long" ] } ] }', limit='2'), class is
com.google.cloud.hadoop.io.bigquery.DirectBigQueryInputFormat$DirectBigQueryInputSplit
2022-03-08 15:49:19,434 WARN [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
avro.AvroDeserializer: Received different schemas. Have to re-encode:
{"type":"record","name":"__root__","fields":[{"name":"id","type":["null","long"]}]}
SIZE{null=org.apache.hadoop.hive.serde2.avro.AvroDeserializer$SchemaReEncoder@3077e4aa}
ID null 2022-03-08 15:49:19,692 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a
main] exec.TableScanOperator: RECORDS_OUT_INTERMEDIATE:0,
RECORDS_OUT_OPERATOR_TS_0:1, 2022-03-08 15:49:19,692 INFO
[ec1f7b38-7286-481a-b48b-44f7ed54c20a main] exec.SelectOperator:
RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_SEL_1:1, 2022-03-08
15:49:19,692 INFO [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
exec.ListSinkOperator: RECORDS_OUT_INTERMEDIATE:0,
RECORDS_OUT_OPERATOR_LIST_SINK_3:1, 99 Time taken: 0.215 seconds, Fetched:
1 row(s)

OK I can see that id = 99 there


Now let me insert a row to Hive table


    >  insert into test.testmebq values(11);

exec.MapOperator: Initializing operator MAP[0]

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
mr.ExecMapper:

<MAP>Id =0

  <Children>null

  <\Children>

  <Parent><\Parent>

<\MAP>

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
exec.TableScanOperator: Initializing operator TS[0]

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
exec.SelectOperator: Initializing operator SEL[1]

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
exec.SelectOperator: SELECT null

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
exec.UDTFOperator: Initializing operator UDTF[2]

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
exec.SelectOperator: Initializing operator SEL[3]

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
exec.SelectOperator: SELECT struct<col1:int>

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
exec.FileSinkOperator: Initializing operator FS[5]

2022-03-08 15:51:56,348 INFO  [LocalJobRunner Map Task Executor #0]
avro.AvroSerDe: AvroSerde::initialize(): Preset value of avro.schema.literal
==
{"type":"record","name":"testmebq","namespace":"test","fields":[{"name":"id","type":["null","long"],"default":null}]}

2022-03-08 15:51:56,349 INFO  [LocalJobRunner Map Task Executor #0]
exec.FileSinkOperator: Using serializer :
org.apache.hadoop.hive.serde2.avro.AvroSerDe@619f4105 and formatter :
com.google.cloud.hadoop.io.bigquery.hive.WrappedBigQueryAvroOutputFormat@2274f95f

2022-03-08 15:51:56,349 INFO  [LocalJobRunner Map Task Executor #0]
exec.Utilities: PLAN PATH =
file:/tmp/hive/ec1f7b38-7286-481a-b48b-44f7ed54c20a/hive_2022-03-08_15-51-55_732_3190209878982735306-1/-mr-10002/160d3b10-e96a-4b60-bbf6-f2fbdef3936d/map.xml

2022-03-08 15:51:56,349 INFO  [LocalJobRunner Map Task Executor #0]
exec.FileSinkOperator: New Final Path: FS
hdfs://rhes75:9000/user/hive/warehouse/test.db/testmebq

2022-03-08 15:51:56,349 INFO  [LocalJobRunner Map Task Executor #0]
hive.WrappedBigQueryAvroOutputFormat: Set temporary output file to
test_testmebq_a85ef2ba8644

2022-03-08 15:51:56,919 INFO  [LocalJobRunner Map Task Executor #0]
exec.FileSinkOperator: FS[5]: records written - 1

2022-03-08 15:51:56,919 INFO  [LocalJobRunner Map Task Executor #0]
exec.MapOperator: MAP[0]: records read - 1

2022-03-08 15:51:56,919 INFO  [LocalJobRunner Map Task Executor #0]
exec.MapOperator: DESERIALIZE_ERRORS:0, RECORDS_OUT_INTERMEDIATE:0,
RECORDS_IN:3, RECORDS_OUT_OPERATOR_MAP_0:0,

2022-03-08 15:51:56,919 INFO  [LocalJobRunner Map Task Executor #0]
exec.TableScanOperator: RECORDS_OUT_INTERMEDIATE:0,
RECORDS_OUT_OPERATOR_TS_0:1,

2022-03-08 15:51:56,919 INFO  [LocalJobRunner Map Task Executor #0]
exec.SelectOperator: RECORDS_OUT_INTERMEDIATE:0,
RECORDS_OUT_OPERATOR_SEL_1:1,

2022-03-08 15:51:56,919 INFO  [LocalJobRunner Map Task Executor #0]
exec.UDTFOperator: RECORDS_OUT_OPERATOR_UDTF_2:1,
RECORDS_OUT_INTERMEDIATE:0,

2022-03-08 15:51:56,919 INFO  [LocalJobRunner Map Task Executor #0]
exec.SelectOperator: RECORDS_OUT_OPERATOR_SEL_3:1,
RECORDS_OUT_INTERMEDIATE:0,

2022-03-08 15:51:56,919 INFO  [LocalJobRunner Map Task Executor #0]
exec.FileSinkOperator: FS[5]: records written - 1

2022-03-08 15:51:57,272 WARN  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is
deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead

2022-03-08 15:51:57,272 Stage-2 map = 0%,  reduce = 0%

2022-03-08 15:51:57,272 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
exec.Task: 2022-03-08 15:51:57,272 Stage-2 map = 0%,  reduce = 0%

2022-03-08 15:51:57,646 INFO  [LocalJobRunner Map Task Executor #0]
exec.FileSinkOperator: RECORDS_OUT_1_test.testmebq:1,
RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_FS_5:1,

2022-03-08 15:51:57,647 INFO  [LocalJobRunner Map Task Executor #0]
mapred.LocalJobRunner:

2022-03-08 15:51:57,647 INFO  [LocalJobRunner Map Task Executor #0]
mapred.Task:
Task:attempt_local1263701903_0004_m_000000_0 is done. And is in the process
of committing

2022-03-08 15:51:57,649 INFO  [LocalJobRunner Map Task Executor #0]
mapred.LocalJobRunner: map

2022-03-08 15:51:57,649 INFO  [LocalJobRunner Map Task Executor #0]
mapred.Task: Task 'attempt_local1263701903_0004_m_000000_0' done.

2022-03-08 15:51:57,649 INFO  [LocalJobRunner Map Task Executor #0]
mapred.LocalJobRunner: Finishing task:
attempt_local1263701903_0004_m_000000_0

2022-03-08 15:51:57,649 INFO  [Thread-385] mapred.LocalJobRunner: map task
executor complete.

2022-03-08 15:51:58,276 Stage-2 map = 100%,  reduce = 0%

2022-03-08 15:51:58,276 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
exec.Task: 2022-03-08 15:51:58,276 Stage-2 map = 100%,  reduce = 0%

Ended Job = job_local1263701903_0004

2022-03-08 15:51:58,276 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
exec.Task: Ended Job = job_local1263701903_0004

MapReduce Jobs Launched:

2022-03-08 15:51:58,277 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
ql.Driver: MapReduce Jobs Launched:

Stage-Stage-2:  HDFS Read: 1030549 HDFS Write: 0 SUCCESS

2022-03-08 15:51:58,277 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
ql.Driver: Stage-Stage-2:  HDFS Read: 1030549 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 0 msec

2022-03-08 15:51:58,277 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
ql.Driver: Total MapReduce CPU Time Spent: 0 msec

2022-03-08 15:51:58,277 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a
main] ql.Driver:
Completed executing
command(queryId=hduser_20220308155155_9c3bc173-6893-4a5b-a061-226e2ac9b42a);
Time taken: 2.297 seconds

OK

2022-03-08 15:51:58,277 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
ql.Driver: OK

2022-03-08 15:51:58,277 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
lockmgr.DbTxnManager: Stopped heartbeat for query:
hduser_20220308155155_9c3bc173-6893-4a5b-a061-226e2ac9b42a

Time taken: 2.558 seconds

2022-03-08 15:51:58,296 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
CliDriver: Time taken: 2.558 seconds

2022-03-08 15:51:58,296 INFO  [ec1f7b38-7286-481a-b48b-44f7ed54c20a main]
conf.HiveConf: Using the default value passed in for log id:
ec1f7b38-7286-481a-


So it shows OK and no errors


But the record is NOT added to the local Hive table!


So in summary we can read rows added to the BigQuery table through Hive but
cannot add any rows to Hive table itself?


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Reply via email to