Hi Deepak,
Thanks for your response. The table is not bucketed or clustered; its DDL is 
below.

DROP TABLE IF EXISTS ${SCHEMA_NM}.daily_summary;
CREATE EXTERNAL TABLE ${SCHEMA_NM}.daily_summary
(
  bouncer VARCHAR(12),
  device_type VARCHAR(52),
  visitor_type VARCHAR(10),
  visit_origination_type VARCHAR(65),
  visit_origination_name VARCHAR(260),
  pg_domain_name VARCHAR(215),
  class1_id VARCHAR(650),
  class2_id VARCHAR(650),
  bouncers INT,
  rv_revenue DECIMAL(17,2),
  visits INT,
  active_page_view_time INT,
  total_page_view_time BIGINT,
  average_visit_duration INT,
  co_conversions INT,
  page_views INT,
  landing_page_url VARCHAR(1332),
  dt DATE
)
PARTITIONED BY (datelocal DATE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
LOCATION '${OUTPUT_PATH}/daily_summary/'
TBLPROPERTIES ('serialization.null.format'='');

MSCK REPAIR TABLE ${SCHEMA_NM}.daily_summary;

Regards,
Sujeet Singh Pardeshi
Software Specialist
SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar, Pune, Maharashtra, 411 013
off: +91-20-30418810  

 "When the solution is simple, God is answering…" 

-----Original Message-----
From: Deepak Jaiswal <djais...@hortonworks.com> 
Sent: 07 August 2018 PM 11:19
To: user@hive.apache.org
Subject: Re: Hive output file 000000_0

Hi Sujeet,

I am assuming that the table is bucketed. If so, the file name indicates which 
bucket the file belongs to, because Hive creates one file per bucket for each 
operation.

In this case, the file 000003_0 belongs to bucket 3.
To always have files named 000000_0, the table must be unbucketed.
I hope it helps.
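
To make the naming convention above concrete, here is a minimal Python sketch that splits a name such as 000003_0 into its two numeric parts. The helper name parse_hive_output_name is hypothetical, and reading the part after the underscore as an attempt counter is an assumption, not something confirmed in this thread.

```python
import re

# Assumed layout: <zero-padded id>_<attempt>, for example 000003_0.
_HIVE_FILE = re.compile(r"^(\d+)_(\d+)$")


def parse_hive_output_name(name):
    """Return (file_id, attempt) for a name like '000003_0', else None.

    file_id is the bucket number when the table is bucketed, as described
    above. 'attempt' is an assumed label for the trailing number.
    """
    match = _HIVE_FILE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for name in ["000000_0", "000003_0"]:
        print(name, parse_hive_output_name(name))
```

A check like this can help a cleanup script recognise which files in the output location came from which bucket before deciding what to overwrite.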

Regards,
Deepak

On 8/7/18, 1:33 AM, "Sujeet Pardeshi" <sujeet.parde...@sas.com> wrote:

    Hi All,
    I am doing an INSERT OVERWRITE operation through a Hive external table onto
    AWS S3. Hive creates an output file 000000_0 on S3. However, at times I
    notice that it creates files with other names, such as 000003_0. I always
    need to overwrite the existing file, but with inconsistent file names I am
    unable to do so. How do I force Hive to always create a consistent file name
    like 000000_0? Below is an example of what my code looks like, where
    tab_content is a Hive external table.

    INSERT OVERWRITE TABLE tab_content
    PARTITION(datekey)
    select * from source

    Regards,
    Sujeet Singh Pardeshi
    Software Specialist
    SAS Research and Development (India) Pvt. Ltd.
    Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar, Pune, Maharashtra, 411 013
    off: +91-20-30418810

     "When the solution is simple, God is answering…"

