Actually, auto compaction (if enabled) is triggered based on the volume of
changes; it doesn't automatically run after every insert. I think it's
possible to lower the thresholds, but that might hurt performance by a
big margin. As of now, we run compaction after the batch insert completes.
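
If you want to experiment, these are the initiator thresholds that, to my
knowledge, control when a compaction gets queued (property names from
Hive 1.x; please verify against your distribution's docs):

    <property>
      <!-- minor compaction runs once this many delta directories accumulate -->
      <name>hive.compactor.delta.num.threshold</name>
      <value>10</value>
    </property>
    <property>
      <!-- major compaction runs once deltas exceed this fraction of base size -->
      <name>hive.compactor.delta.pct.threshold</name>
      <value>0.1</value>
    </property>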

The only other way to solve this problem as of now is to use the Hive JDBC API.
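
Something along these lines, going through HiveServer2, should work (a
minimal Scala sketch; the host, port, database and credentials below are
placeholders):

    import java.sql.DriverManager

    object HiveJdbcRead {
      def main(args: Array[String]): Unit = {
        // register the HiveServer2 JDBC driver (hive-jdbc on the classpath)
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection(
          "jdbc:hive2://myhost:10000/mydb", "user", "")
        val stmt = conn.createStatement()
        // HiveServer2 merges ACID delta files at read time, so rows from
        // uncompacted transactions are visible here, unlike in Spark SQL
        val rs = stmt.executeQuery("SELECT id, name FROM hivespark")
        while (rs.next()) println(s"${rs.getInt(1)}  ${rs.getString(2)}")
        rs.close(); stmt.close(); conn.close()
      }
    }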

On Mon, Feb 22, 2016 at 11:39 AM, @Sanjiv Singh <sanjiv.is...@gmail.com>
wrote:

> Compaction should have been triggered automatically, as the following
> properties are already set in *hive-site.xml*, and the *NO_AUTO_COMPACTION*
> property has not been set on these tables:
>
>
>     <property>
>       <name>hive.compactor.initiator.on</name>
>       <value>true</value>
>     </property>
>     <property>
>       <name>hive.compactor.worker.threads</name>
>       <value>1</value>
>     </property>
>
>
> The documentation is misleading sometimes.
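>
> If *NO_AUTO_COMPACTION* had been set, it would show up in the table
> properties. To double-check, and to re-enable auto compaction in case it
> was disabled, something like this should work (HiveQL sketch; the table
> name is from my setup):
>
>     SHOW TBLPROPERTIES hivespark;
>     -- re-enable auto compaction if it was turned off
>     ALTER TABLE hivespark SET TBLPROPERTIES ('NO_AUTO_COMPACTION'='false');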
>
>
>
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Mon, Feb 22, 2016 at 9:49 AM, Varadharajan Mukundan <
> srinath...@gmail.com> wrote:
>
>> Yes, I was burned by this issue a couple of weeks back. It also means
>> that after every insert job, a compaction has to run before the new rows
>> are visible from Spark. Sad that this issue is not documented / mentioned
>> anywhere.
>>
>> On Mon, Feb 22, 2016 at 9:27 AM, @Sanjiv Singh <sanjiv.is...@gmail.com>
>> wrote:
>>
>>> Hi Varadharajan,
>>>
>>> Thanks for your response.
>>>
>>> Yes, it is a transactional table; see the *show create table* output below.
>>>
>>> The table has hardly 3 records, and after triggering a major compaction on
>>> the table, it started showing results in Spark SQL.
>>>
>>>
>>> > *ALTER TABLE hivespark COMPACT 'major';*
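>>>
>>> Compaction runs asynchronously, so its progress can be checked before
>>> re-querying:
>>>
>>> > *SHOW COMPACTIONS;*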
>>>
>>>
>>> > *show create table hivespark;*
>>>
>>>   CREATE TABLE `hivespark`(
>>>     `id` int,
>>>     `name` string)
>>>   CLUSTERED BY (
>>>     id)
>>>   INTO 32 BUCKETS
>>>   ROW FORMAT SERDE
>>>     'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
>>>   STORED AS INPUTFORMAT
>>>     'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
>>>   OUTPUTFORMAT
>>>     'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
>>>   LOCATION
>>>     'hdfs://myhost:8020/apps/hive/warehouse/mydb.db/hivespark'
>>>   TBLPROPERTIES (
>>>     'COLUMN_STATS_ACCURATE'='true',
>>>     'last_modified_by'='root',
>>>     'last_modified_time'='1455859079',
>>>     'numFiles'='37',
>>>     'numRows'='3',
>>>     'rawDataSize'='0',
>>>     'totalSize'='11383',
>>>     'transactional'='true',
>>>     'transient_lastDdlTime'='1455864121');
>>>
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob :  +091 9990-447-339
>>>
>>> On Mon, Feb 22, 2016 at 9:01 AM, Varadharajan Mukundan <
>>> srinath...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is the transactional attribute set on your table? I have observed that
>>>> Hive's transactional storage structure does not work with Spark yet. You
>>>> can confirm this by looking at the transactional attribute in the output
>>>> of "desc extended <tablename>" in the Hive console.
>>>>
>>>> If you need to access a transactional table, consider running a major
>>>> compaction first and then try accessing the table again.
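>>>>
>>>> For example (HiveQL sketch; substitute your table name):
>>>>
>>>>     -- look for 'transactional'='true' among the table parameters
>>>>     DESC EXTENDED hivespark;
>>>>     -- rewrite the ACID delta files into base files that Spark can read
>>>>     ALTER TABLE hivespark COMPACT 'major';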
>>>>
>>>> On Mon, Feb 22, 2016 at 8:57 AM, @Sanjiv Singh <sanjiv.is...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> I have observed that Spark SQL is not returning records for Hive
>>>>> bucketed ORC tables on HDP.
>>>>>
>>>>>
>>>>>
>>>>> In Spark SQL, I am able to list all tables, but queries on Hive bucketed
>>>>> tables are not returning records; a minimal repro is below.
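>>>>>
>>>>> From spark-shell, for instance (Spark 1.6-style sketch; the table name
>>>>> is from my setup):
>>>>>
>>>>>     import org.apache.spark.sql.hive.HiveContext
>>>>>     val hc = new HiveContext(sc)
>>>>>     hc.sql("show tables").show()               // table is listed fine
>>>>>     hc.sql("select * from hivespark").show()   // comes back empty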
>>>>>
>>>>> I have also tried the same with non-bucketed Hive tables; they work
>>>>> fine.
>>>>>
>>>>>
>>>>>
>>>>> The same works on a plain Apache setup.
>>>>>
>>>>> Let me know if you need other details.
>>>>>
>>>>> Regards
>>>>> Sanjiv Singh
>>>>> Mob :  +091 9990-447-339
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> M. Varadharajan
>>>>
>>>> ------------------------------------------------
>>>>
>>>> "Experience is what you get when you didn't get what you wanted"
>>>>                -By Prof. Randy Pausch in "The Last Lecture"
>>>>
>>>> My Journal :- http://varadharajan.in
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks,
>> M. Varadharajan
>>
>> ------------------------------------------------
>>
>> "Experience is what you get when you didn't get what you wanted"
>>                -By Prof. Randy Pausch in "The Last Lecture"
>>
>> My Journal :- http://varadharajan.in
>>
>
>


-- 
Thanks,
M. Varadharajan

------------------------------------------------

"Experience is what you get when you didn't get what you wanted"
               -By Prof. Randy Pausch in "The Last Lecture"

My Journal :- http://varadharajan.in
