Yes, it is very strange, and quite contrary to my understanding of Spark SQL on
Hive tables.
I am facing this issue on an HDP setup, on which COMPACTION is required only
once.
On the other hand, the Apache setup doesn't require compaction even once.
Maybe something got triggered on the metastore after compaction.
Try this,
hive> create table default.foo(id int) clustered by (id) into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');
hive> insert into default.foo values(10);
scala> sqlContext.table("default.foo").count // Gives 0, which is wrong because the data is still in delta files
Now run
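(The command after "Now run" is cut off above; presumably the next step is to compact the table and query it again. A sketch of what that would look like, assuming the same default.foo table:)

hive> ALTER TABLE default.foo COMPACT 'major';
hive> SHOW COMPACTIONS;
(wait until the compaction request for default.foo is reported as finished)
scala> sqlContext.table("default.foo").count // should now give 1, once the compacted base files exist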
Hi Varadharajan,
That is the point: Spark SQL is able to recognize delta files. See the
directory structure below, ONE BASE (43 records) and one DELTA (created after
the last insert). And I am able to see the last insert through Spark SQL.
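For reference, an ACID table directory typically holds one base_N directory (from the last compaction) plus delta_M_N directories for transactions since then. One way to inspect it from the spark-shell (the warehouse path below is only an assumption for my setup):

scala> import org.apache.hadoop.fs.{FileSystem, Path}
scala> val fs = FileSystem.get(sc.hadoopConfiguration)
scala> fs.listStatus(new Path("/apps/hive/warehouse/mytable")).foreach(s => println(s.getPath.getName))
// e.g. base_0000043 and delta_0000044_0000044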
*See the complete scenario below:*
*Steps:*
- Inserted 43 records in
Hi Varadharajan,
Can you elaborate on this (which you quoted in a previous mail):
"I observed that hive transaction storage structure do not work with spark
yet"
If it is related to the delta files created after each transaction, and Spark
not being able to recognize them, then I have a table *mytable *(ORC ,
BU
Compaction should have been triggered automatically, as the following properties
are already set in *hive-site.xml*, and the *NO_AUTO_COMPACTION* property has not
been set for these tables.
hive.compactor.initiator.on = true
hive.compactor.worker.threads = 1
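If it helps, a quick way to check whether the initiator has actually queued any compactions, and whether NO_AUTO_COMPACTION is set on the table (using the *mytable* table mentioned earlier as an example):

hive> SHOW COMPACTIONS;
hive> SHOW TBLPROPERTIES mytable;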
Do
Hi Varadharajan,
Thanks for your response.
Yes, it is a transactional table; see below *show create table*.
The table hardly has 3 records, and after triggering a minor compaction on the
tables, it starts showing results in Spark SQL.
> *ALTER TABLE hivespark COMPACT 'major';*
> *show create table hiv