Assuming the special characters were added by the Windows platform, as Shakti 
Singh mentioned, one easy way to clean up the file is the command 
“dos2unix filename”.
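
For example (the file name here is a placeholder; by default dos2unix converts 
the file in place, while the -n option writes the cleaned output to a new file 
instead):

dos2unix yourfilename

dos2unix -n yourfilename yourfilename_clean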

 

From: shakti singh Shekhawat [mailto:shaktisingh.shekhawa...@gmail.com] 
Sent: 30 May 2017 10:02
To: user@hive.apache.org
Subject: Re: Table count is more than File count after loading in hive

 

Hi Balajee,

 

The best approach is to clean the file in Unix (or with Perl or Python) before 
loading it into Hive. The root cause is most probably carriage returns: files 
generated on the Microsoft platform often contain ^M characters. To check 
whether carriage returns are the problem, try the steps below:

1. The `file` command will list all the line terminators (\n, etc.) present in 
your file, reporting them by their ASCII abbreviations.

Ex: file yourfilename

yourfilename: UTF-8 Unicode text, with CRLF, CR, LF line terminators

2. To find out what CR (\r), LF (\n) and CRLF (\r\n) mean, try:

man ascii

At this point you will know whether there are carriage returns (\r) in your 
file that break records in Hive.
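
This also matches the symptom in the subject line: every stray \r that splits a 
record adds an extra row on the Hive side. As a rough sanity check (the file 
and table names below are placeholders), you can compare the two counts:

wc -l yourfilename

and, from Hive:

select count(*) from your_table;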

3. To find the exact place where a carriage return occurs, open the file in the 
vi editor:

Press Esc

Type   :set list

This should display the ^M characters highlighted. Find a record that has ^M in 
the middle of it, then select that record from the Hive table: you will see 
that the Hive record is broken exactly where the ^M appears in the file.
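
If you would rather check from the shell than in vi, a quick alternative 
(assuming a bash shell and GNU grep; the file name is a placeholder) is to 
print the line numbers of lines that contain a carriage return:

grep -n $'\r' yourfilename | head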

 

Please let us know if this helps identify the issue. If carriage returns turn 
out to be the problem, the next step is to remove them from your file (you can 
easily find commands on Stack Overflow; let me know if nothing works).
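
For instance, one common way (a sketch; the file names are placeholders) is to 
delete every \r with tr:

tr -d '\r' < yourfilename > yourfilename_clean

or, editing in place with GNU sed (other sed implementations may not recognize 
\r):

sed -i 's/\r$//' yourfilename

Note that the tr version removes every \r, including the ones in the middle of 
a record, while the sed version only strips a \r at the end of each line.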

 

Thanks,

Shakti
