Hi,

Now when I try to load a CSV file into any table I created, it is not working.
I created a table:

CREATE EXTERNAL TABLE someidtable ( someid STRING )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION 's3://location/';

Then:

LOAD DATA INPATH 's3://location/someidexcel.csv' INTO TABLE someidtable;

It gives this error:

"Error in semantic analysis: Line 1:17 Invalid path ''s3n://location/someidexcel.csv'': only "file" or "hdfs" file systems accepted"

Please help me in resolving this issue.

Thanks,
Chunky.

On Wed, Nov 7, 2012 at 6:43 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:

> Okay Mark, I will be looking into this JIRA regularly.
> Thanks again for helping.
> Chunky.
>
> On Wed, Nov 7, 2012 at 12:22 PM, Mark Grover <grover.markgro...@gmail.com> wrote:
>
>> Chunky,
>> I just tried it myself. It turns out that the directory you are adding as
>> a partition has to be empty for msck repair to work. This is obviously
>> sub-optimal, and there is a JIRA in place
>> (https://issues.apache.org/jira/browse/HIVE-3231) to fix it.
>>
>> So, I'd suggest you keep an eye out for the next version for that fix to
>> come in. In the meanwhile, run msck after you create your partition
>> directory but before you populate the directory with data.
>>
>> Mark
>>
>> On Tue, Nov 6, 2012 at 10:33 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>
>>> Hi Mark,
>>> Sorry, I forgot to mention that I have also tried
>>> msck repair table <Table name>;
>>> and got the same output that I got from msck alone.
>>> Do I need any other settings for this to work? I have set up
>>> Hadoop and Hive from scratch on EC2.
>>>
>>> Thanks,
>>> Chunky.
>>>
>>> On Wed, Nov 7, 2012 at 11:58 AM, Mark Grover <grover.markgro...@gmail.com> wrote:
>>>
>>>> Chunky,
>>>> You should have run:
>>>> msck repair table <Table name>;
>>>>
>>>> Sorry, I should have made it clear in my last reply.
>>>> I have added an
>>>> entry to the Hive wiki for the benefit of others:
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>>>>
>>>> Mark
>>>>
>>>> On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>>
>>>>> Hi Mark,
>>>>> I didn't get any error.
>>>>> I ran this on the Hive console:
>>>>> "msck table Table_Name;"
>>>>> It said OK and showed an execution time of 1.050 sec.
>>>>> But when I checked the partitions for the table using
>>>>> "show partitions Table_Name;"
>>>>> it didn't show me any partitions.
>>>>>
>>>>> Thanks,
>>>>> Chunky.
>>>>>
>>>>> On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover <grover.markgro...@gmail.com> wrote:
>>>>>
>>>>>> Glad to hear, Chunky.
>>>>>>
>>>>>> Out of curiosity, what errors did you get when using msck?
>>>>>>
>>>>>> On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>> I tried msck, but it is not working for me. I have written a Python
>>>>>>> script to partition the data individually.
>>>>>>>
>>>>>>> Thank you Edward, Mark and Dean.
>>>>>>> Chunky.
>>>>>>>
>>>>>>> On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover <grover.markgro...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Chunky,
>>>>>>>> I have used the "recover partitions" command on EMR, and that worked
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> However, take a look at
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-874. It seems the msck
>>>>>>>> command in Apache Hive does the same thing. Try it out and let us know
>>>>>>>> how it goes.
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Recover partitions should work the same way for different file
>>>>>>>>> systems.
>>>>>>>>>
>>>>>>>>> Edward
>>>>>>>>>
>>>>>>>>> On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>>>> > Writing a script to add the external partitions individually is the
>>>>>>>>> > only way I know of.
>>>>>>>>> >
>>>>>>>>> > Sent from my rotary phone.
>>>>>>>>> >
>>>>>>>>> > On Nov 5, 2012, at 8:19 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi Dean,
>>>>>>>>> >
>>>>>>>>> > Actually, I was running a Hadoop and Hive cluster on EMR, with S3
>>>>>>>>> > storage containing logs that update daily and are partitioned by
>>>>>>>>> > date (dt), and I was using this recover partitions command.
>>>>>>>>> > Now I want to shift to EC2 and run my own Hadoop and Hive cluster.
>>>>>>>>> > So what is the alternative to recover partitions in this case, if
>>>>>>>>> > you have any idea?
>>>>>>>>> > I found one way, partitioning all dates individually, but I would
>>>>>>>>> > have to write a script to do that for every date. Is there any
>>>>>>>>> > easier way than this?
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> > Chunky
>>>>>>>>> >
>>>>>>>>> > On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> RECOVER PARTITIONS is an enhancement added by Amazon to their
>>>>>>>>> >> version of Hive.
>>>>>>>>> >>
>>>>>>>>> >> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
>>>>>>>>> >>
>>>>>>>>> >> <shameless-plug>
>>>>>>>>> >> Chapter 21 of Programming Hive discusses this feature and other
>>>>>>>>> >> aspects of using Hive in EMR.
>>>>>>>>> >> </shameless-plug>
>>>>>>>>> >>
>>>>>>>>> >> dean
>>>>>>>>> >>
>>>>>>>>> >> On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hi,
>>>>>>>>> >>>
>>>>>>>>> >>> I have a cluster set up on EC2 with Hadoop version 0.20.2 and
>>>>>>>>> >>> Hive version 0.8.1 (I configured everything myself). I created a
>>>>>>>>> >>> table using:
>>>>>>>>> >>>
>>>>>>>>> >>> CREATE EXTERNAL TABLE XXX ( YYY ) PARTITIONED BY ( ZZZ ) ROW FORMAT
>>>>>>>>> >>> DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION 's3://my-location/data/';
>>>>>>>>> >>>
>>>>>>>>> >>> Now I am trying to recover partitions using:
>>>>>>>>> >>>
>>>>>>>>> >>> ALTER TABLE XXX RECOVER PARTITIONS;
>>>>>>>>> >>>
>>>>>>>>> >>> but I am getting this error: "FAILED: Parse Error: line 1:12
>>>>>>>>> >>> cannot recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter
>>>>>>>>> >>> table statement"
>>>>>>>>> >>>
>>>>>>>>> >>> Doing the same steps on a cluster set up on EMR with Hadoop
>>>>>>>>> >>> version 1.0.3 and Hive version 0.8.1 (configured by EMR) works
>>>>>>>>> >>> fine.
>>>>>>>>> >>>
>>>>>>>>> >>> So is this a version issue, or am I missing some configuration
>>>>>>>>> >>> changes in the EC2 setup?
>>>>>>>>> >>> I am not able to find an exact solution for this problem on the
>>>>>>>>> >>> internet. Please help me.
>>>>>>>>> >>>
>>>>>>>>> >>> Thanks,
>>>>>>>>> >>> Chunky.
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Dean Wampler, Ph.D.
>>>>>>>>> >> thinkbiganalytics.com
>>>>>>>>> >> +1-312-339-1330
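A note on the LOAD DATA error reported at the top of the thread: the error message itself says that LOAD DATA INPATH only accepts "file" or "hdfs" paths in this Hive version. Since the table is EXTERNAL with an s3:// LOCATION, one possible workaround (a sketch, not a verified fix, assuming the CSV really sits under that location and matches the declared delimiter) is to skip LOAD DATA entirely and let the table read whatever files are under its LOCATION:

```sql
-- Sketch: an external table reads every file under its LOCATION, so
-- placing someidexcel.csv in s3://location/ is enough; no LOAD DATA
-- statement is needed. (With only one column, the tab delimiter in the
-- DDL is harmless even though the file is comma-separated.)
CREATE EXTERNAL TABLE someidtable ( someid STRING )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION 's3://location/';

-- If the file is already at s3://location/someidexcel.csv, it should be
-- queryable immediately:
SELECT * FROM someidtable LIMIT 10;
```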
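On plain Apache Hive, where EMR's ALTER TABLE ... RECOVER PARTITIONS is unavailable and msck has the empty-directory limitation discussed above, the per-date script Chunky mentions can be sketched as below. This is a hypothetical sketch, not code from the thread: the table name `logs`, the `dt` partition column, and the `dt=YYYY-MM-DD` S3 directory layout are all assumptions.

```python
# Hypothetical sketch: generate one ALTER TABLE ... ADD PARTITION statement
# per day so the statements can be fed to the Hive CLI (e.g. hive -f).
from datetime import date, timedelta

def add_partition_statements(table, location, start, end):
    """Return ADD PARTITION statements for each date in [start, end]."""
    stmts = []
    d = start
    while d <= end:
        dt = d.isoformat()  # YYYY-MM-DD, matching an assumed dt=... layout
        stmts.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION (dt='{dt}') "
            "LOCATION '{loc}/dt={dt}/';".format(
                t=table, dt=dt, loc=location.rstrip('/')))
        d += timedelta(days=1)
    return stmts

if __name__ == '__main__':
    for stmt in add_partition_statements('logs', 's3://my-location/data',
                                         date(2012, 11, 1), date(2012, 11, 3)):
        print(stmt)
```

Writing the statements to a file and running `hive -f partitions.hql` once would then register every date in one pass.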