Assuming that the special characters were added by the Windows platform, as Shakti Singh mentioned, one easy way to clean up the file is the command “dos2unix filename”.
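For example, a minimal sketch (assuming GNU dos2unix is installed and the file is named yourfilename; dos2unix converts the file in place by default):

    file yourfilename          # before: reports "... with CRLF line terminators"
    dos2unix yourfilename      # strip the carriage returns
    file yourfilename          # after: no CRLF terminators should be reported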
From: shakti singh Shekhawat [mailto:shaktisingh.shekhawa...@gmail.com]
Sent: 30 May 2017 10:02
To: user@hive.apache.org
Subject: Re: Table count is more than File count after loading in hive

Hi Balajee,

The best approach is to clean the file in Unix (or with Perl or Python) before loading it into Hive. The root cause is most probably carriage returns: files generated on Microsoft platforms often contain ^M characters. To confirm whether carriage returns are the problem, try the following steps:

1. The `file` command reports all the line terminators (\n, etc.) present in your file, but only by their names (CR, LF, CRLF). Ex:
   file yourfilename
   yourfilename: UTF-8 Unicode text, with CRLF, CR, LF line terminators
2. To see which characters CR (\r), LF (\n) and CRLF (\r\n) correspond to, try: man ascii. At this point you will know whether there are carriage returns (\r) in your file, which is what breaks the records in Hive.
3. To find exactly where the carriage returns are, open the file in the vi editor, press Esc and type :set list. This should display all the ^M characters highlighted. Find a record with a ^M in the middle of it, then do a select on that record in the Hive table; you will see that the Hive record is broken exactly where the ^M appears in the file.

Please let us know if this helps identify the issue. If carriage returns are the problem, the next step is to remove them from your file (you can easily find commands on Stack Overflow; let me know if nothing works).

Thanks,
Shakti
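For reference, a short sketch of the detection and removal commands described above (file names are placeholders; assuming a bash shell for the $'\r' quoting and GNU sed for in-place editing):

    file yourfilename                           # report the line terminators present
    grep -c $'\r' yourfilename                  # count lines containing a carriage return
    tr -d '\r' < yourfilename > cleanfilename   # write a copy with all \r characters removed
    sed -i 's/\r$//' yourfilename               # or strip trailing \r in place (GNU sed)

Note that tr -d '\r' removes every carriage return, including ones in the middle of a record, while the sed form only removes a \r that appears at the end of a line.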