Yes, it is very strange, and also quite contrary to my belief about Spark SQL on
Hive tables.
I am facing this issue on the HDP setup, on which COMPACTION is required only
once.
On the other hand, the Apache setup doesn't require compaction even once.
Maybe something got triggered in the metastore after compaction.
That's interesting. I'm not sure why the first compaction is needed but not on
the subsequent inserts. Maybe it's just to create some metadata. Thanks for
clarifying this :)
On Tue, Feb 23, 2016 at 2:15 PM, @Sanjiv Singh
wrote:
> Try this,
>
>
> hive> create table default.foo(id int) clustered by (id
Try this,
hive> create table default.foo(id int) clustered by (id) into 2 buckets
STORED AS ORC TBLPROPERTIES ('transactional'='true');
hive> insert into default.foo values(10);
scala> sqlContext.table("default.foo").count // Gives 0, which is wrong
because data is still in delta files
Now run
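The rest of that message is cut off above, but given the rest of the thread the
next step is presumably a compaction followed by a re-check from the Spark
shell. A minimal sketch, assuming sqlContext is a HiveContext and that
refreshTable is needed to drop any cached file listing:

hive> ALTER TABLE default.foo COMPACT 'major';
hive> SHOW COMPACTIONS; -- wait until the request is no longer initiated/working
scala> sqlContext.refreshTable("default.foo")
scala> sqlContext.table("default.foo").count // should now return 1, read from the compacted base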
This is the scenario I'm mentioning. I'm not using Spark JDBC; not sure if
it's different.
Please walk through the commands below in the same order to understand the
sequence.
hive> create table default.foo(id int) clustered by (id) into 2 buckets
STORED AS ORC TBLPROPERTIES ('transactional'='true');
Hi Varadharajan,
That is the point: Spark SQL is able to recognize delta files. See the
directory structure below, ONE BASE (43 records) and one DELTA (created
after the last insert). And I am able to see the last insert through Spark SQL.
*See the complete scenario below:*
*Steps:*
- Inserted 43 records in
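The listing itself did not make it into this excerpt; an illustrative sketch of
the usual Hive ACID layout being described, with hypothetical paths and
transaction ids rather than the actual output:

$ hdfs dfs -ls /apps/hive/warehouse/mytable
/apps/hive/warehouse/mytable/base_0000012            <- compacted base holding the 43 records
/apps/hive/warehouse/mytable/delta_0000013_0000013   <- delta written by the last insert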
Hi Sanjiv,
Yes. If we make use of Hive JDBC we should be able to retrieve all the
rows, since it is Hive which processes the query. But I think the problem
with Hive JDBC is that there are two layers of processing, Hive and then
Spark with the result set. And another one is that performance is limited
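For reference, a rough, untested sketch of what pulling a transactional table
through Spark's JDBC data source might look like; the HiveServer2 URL is
hypothetical and the Hive JDBC driver has to be on the Spark classpath:

scala> val df = sqlContext.read.format("jdbc").
     |   option("url", "jdbc:hive2://hiveserver2-host:10000/default").
     |   option("driver", "org.apache.hive.jdbc.HiveDriver").
     |   option("dbtable", "default.foo").
     |   load()
scala> df.count // HiveServer2 runs the query, so base and delta files are merged on the Hive side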
Hi Varadharajan,
Can you elaborate on this (which you quoted in a previous mail):
"I observed that hive transaction storage structure do not work with spark
yet"
If it is related to the delta files created after each transaction, and Spark
not being able to recognize them, then I have a table *mytable* (ORC,
BU
Actually, auto compaction, if enabled, is triggered based on the volume of
changes. It doesn't automatically run after every insert. I think it's
possible to reduce the thresholds, but that might reduce performance by a
big margin. As of now, we do compaction after the batch insert completes.
The o
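For context, the thresholds mentioned above are presumably the compactor
settings in hive-site.xml; to the best of my knowledge the relevant ones and
their usual defaults are:

hive.compactor.delta.num.threshold
10 (number of delta directories that triggers a minor compaction)
hive.compactor.delta.pct.threshold
0.1 (delta-to-base size ratio that triggers a major compaction)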
Compaction would have been triggered automatically, as the following properties
are already set in *hive-site.xml*, and the *NO_AUTO_COMPACTION* property has
not been set for these tables.
hive.compactor.initiator.on
true
hive.compactor.worker.threads
1
Do
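(The message is truncated above.) A quick way to check whether those automatic
compactions actually ran is the SHOW COMPACTIONS command in the hive console:

hive> SHOW COMPACTIONS;
-- lists each compaction request with its database, table, partition, type (MAJOR/MINOR) and state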
Yes, I was burned by this issue a couple of weeks back. This also means
that after every insert job, a compaction should be run to access the new rows
from Spark. Sad that this issue is not documented / mentioned anywhere.
On Mon, Feb 22, 2016 at 9:27 AM, @Sanjiv Singh
wrote:
> Hi Varadharajan,
>
>
Hi Varadharajan,
Thanks for your response.
Yes, it is a transactional table; see *show create table* below.
The table has hardly 3 records, and after triggering a minor compaction on the
table, it started showing results in Spark SQL.
> *ALTER TABLE hivespark COMPACT 'major';*
> *show create table hiv
Hi,
Is the transactional attribute set on your table? I observed that the Hive
transactional storage structure does not work with Spark yet. You can confirm
this by looking at the transactional attribute in the output of "desc
extended <table name>" in the hive console.
If you need to access a transactional table, conside
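A minimal sketch of that check from the hive console, using the table name that
appears later in this thread; the second command is just an alternative way to
read the same flag:

hive> desc extended mytable;
-- look for transactional=true among the table parameters in 'Detailed Table Information'
hive> show tblproperties mytable("transactional");
-- prints true for an ACID table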
Hi,
I have observed that Spark SQL is not returning records for Hive bucketed
ORC tables on HDP.
In Spark SQL, I am able to list all tables, but queries on Hive bucketed
tables are not returning records.
I have also tried the same for non-bucketed Hive tables; it is working fine.
Same is
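A sketch of the observation from the Spark shell; the table names are
placeholders for a bucketed transactional table and a plain non-bucketed one:

scala> sqlContext.tables("default").show // listing tables works fine
scala> sqlContext.table("default.bucketed_txn_table").count // 0 until the table is compacted
scala> sqlContext.table("default.plain_table").count // non-bucketed table returns the expected rows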