Chunky,
You have an external table that points at the location s3://location/.

There is no need to load the data: all files (and partition folders) under
s3://location/ should already be available via the table, so you can just
run your queries against it. LOAD DATA would move the data from one HDFS
location to another, which is also why Hive rejects the s3 path with "only
'file' or 'hdfs' file systems accepted". You don't need, or want, to do
that in this case.
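For example, with the table from your mail below, a plain query over the
table should be enough; something like this (the query itself is just an
illustration):

    SELECT someid FROM someidtable LIMIT 10;

Any file added under s3://location/ later is picked up by subsequent
queries automatically, with no LOAD step. (One thing that may be worth
double-checking separately: the table is declared with tab-delimited
fields, while the .csv file name suggests comma-separated values.)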
Mark

On Tue, Nov 27, 2012 at 12:18 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:

> Hi,
>
> Now when I try to load a csv file into any table I created, it is not
> working.
>
> I created a table:
>
> CREATE EXTERNAL TABLE someidtable (
>   someid STRING
> )
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\t'
>   LINES TERMINATED BY '\n'
> LOCATION 's3://location/';
>
> Then:
>
> LOAD DATA INPATH 's3://location/someidexcel.csv' INTO TABLE someidtable;
>
> It gives this error:
>
> "Error in semantic analysis: Line 1:17 Invalid path
> ''s3n://location/someidexcel.csv'': only "file" or "hdfs" file systems
> accepted"
>
> Please help me in resolving this issue.
>
> Thanks,
> Chunky.
>
>
> On Wed, Nov 7, 2012 at 6:43 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>
>> Okay Mark, I will keep an eye on this JIRA.
>> Thanks again for helping.
>> Chunky.
>>
>>
>> On Wed, Nov 7, 2012 at 12:22 PM, Mark Grover <grover.markgro...@gmail.com> wrote:
>>
>>> Chunky,
>>> I just tried it myself. It turns out that the directory you are adding
>>> as a partition has to be empty for msck repair to work. This is
>>> obviously sub-optimal, and there is a JIRA in place
>>> (https://issues.apache.org/jira/browse/HIVE-3231) to fix it.
>>>
>>> So I'd suggest you keep an eye out for the next version for that fix
>>> to come in. In the meanwhile, run msck after you create your partition
>>> directory but before you populate the directory with data.
>>>
>>> Mark
>>>
>>>
>>> On Tue, Nov 6, 2012 at 10:33 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>
>>>> Hi Mark,
>>>> Sorry, I forgot to mention: I have also tried
>>>> msck repair table <Table name>;
>>>> and got the same output as from msck alone.
>>>> Do I need any other settings for this to work? I set up Hadoop and
>>>> Hive from scratch on EC2.
>>>>
>>>> Thanks,
>>>> Chunky.
>>>>
>>>>
>>>> On Wed, Nov 7, 2012 at 11:58 AM, Mark Grover <grover.markgro...@gmail.com> wrote:
>>>>
>>>>> Chunky,
>>>>> You should have run:
>>>>> msck repair table <Table name>;
>>>>>
>>>>> Sorry, I should have made that clear in my last reply. I have added
>>>>> an entry to the Hive wiki for the benefit of others:
>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>> I didn't get any error.
>>>>>> I ran this on the hive console:
>>>>>> "msck table Table_Name;"
>>>>>> It said OK and showed an execution time of 1.050 sec.
>>>>>> But when I checked the partitions of the table using
>>>>>> "show partitions Table_Name;"
>>>>>> it didn't show me any partitions.
>>>>>>
>>>>>> Thanks,
>>>>>> Chunky.
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover <grover.markgro...@gmail.com> wrote:
>>>>>>
>>>>>>> Glad to hear, Chunky.
>>>>>>>
>>>>>>> Out of curiosity, what errors did you get when using msck?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>>>>>
>>>>>>>> Hi Mark,
>>>>>>>> I tried msck, but it is not working for me. I have written a
>>>>>>>> python script to add the partitions individually.
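>>>>>>>> The script just generates one statement per date directory, along
>>>>>>>> these lines (the table name, layout and dt value from my earlier
>>>>>>>> mails, purely as an illustration):
>>>>>>>>
>>>>>>>> ALTER TABLE XXX ADD PARTITION (dt='2012-11-05')
>>>>>>>> LOCATION 's3://my-location/data/dt=2012-11-05/';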
>>>>>>>>
>>>>>>>> Thank you Edward, Mark and Dean.
>>>>>>>> Chunky.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover <grover.markgro...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Chunky,
>>>>>>>>> I have used the "recover partitions" command on EMR, and that
>>>>>>>>> worked fine.
>>>>>>>>>
>>>>>>>>> However, take a look at
>>>>>>>>> https://issues.apache.org/jira/browse/HIVE-874. It seems the msck
>>>>>>>>> command in Apache Hive does the same thing. Try it out and let us
>>>>>>>>> know how it goes.
>>>>>>>>>
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>> On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Recover partitions should work the same way for different file
>>>>>>>>>> systems.
>>>>>>>>>>
>>>>>>>>>> Edward
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
>>>>>>>>>> <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>>>>> > Writing a script to add the external partitions individually
>>>>>>>>>> > is the only way I know of.
>>>>>>>>>> >
>>>>>>>>>> > Sent from my rotary phone.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Nov 5, 2012, at 8:19 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hi Dean,
>>>>>>>>>> >
>>>>>>>>>> > Actually, I had a Hadoop and Hive cluster on EMR, with S3
>>>>>>>>>> > storage containing logs that are updated daily and partitioned
>>>>>>>>>> > by date (dt), and I was using this recover partitions command.
>>>>>>>>>> > Now I want to move to EC2 and run my own Hadoop and Hive
>>>>>>>>>> > cluster. What is the alternative to recover partitions in this
>>>>>>>>>> > case, if you have any idea?
>>>>>>>>>> > I found one way, adding the partition for each date
>>>>>>>>>> > individually, but I would have to write a script to do that
>>>>>>>>>> > for all dates. Is there any easier way than this?
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Chunky
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
>>>>>>>>>> > <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> RECOVER PARTITIONS is an enhancement Amazon added to their
>>>>>>>>>> >> version of Hive:
>>>>>>>>>> >>
>>>>>>>>>> >> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
>>>>>>>>>> >>
>>>>>>>>>> >> <shameless-plug>
>>>>>>>>>> >> Chapter 21 of Programming Hive discusses this feature and
>>>>>>>>>> >> other aspects of using Hive in EMR.
>>>>>>>>>> >> </shameless-plug>
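>>>>>>>>>> >>
>>>>>>>>>> >> Side by side, the EMR-only command and what appears to be its
>>>>>>>>>> >> closest stock-Hive counterpart look like this (XXX standing
>>>>>>>>>> >> in for your table, as in your original mail):
>>>>>>>>>> >>
>>>>>>>>>> >> ALTER TABLE XXX RECOVER PARTITIONS;  -- Amazon EMR Hive only
>>>>>>>>>> >> MSCK REPAIR TABLE XXX;               -- Apache Hive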
>>>>>>>>>> >>
>>>>>>>>>> >> dean
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Hi,
>>>>>>>>>> >>>
>>>>>>>>>> >>> I have a cluster set up on EC2 with Hadoop version 0.20.2
>>>>>>>>>> >>> and Hive version 0.8.1 (I configured everything myself). I
>>>>>>>>>> >>> have created a table using:
>>>>>>>>>> >>>
>>>>>>>>>> >>> CREATE EXTERNAL TABLE XXX ( YYY )
>>>>>>>>>> >>> PARTITIONED BY ( ZZZ )
>>>>>>>>>> >>> ROW FORMAT DELIMITED FIELDS TERMINATED BY 'WWW'
>>>>>>>>>> >>> LOCATION 's3://my-location/data/';
>>>>>>>>>> >>>
>>>>>>>>>> >>> Now I am trying to recover the partitions using:
>>>>>>>>>> >>>
>>>>>>>>>> >>> ALTER TABLE XXX RECOVER PARTITIONS;
>>>>>>>>>> >>>
>>>>>>>>>> >>> but I am getting this error: "FAILED: Parse Error: line 1:12
>>>>>>>>>> >>> cannot recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in
>>>>>>>>>> >>> alter table statement"
>>>>>>>>>> >>>
>>>>>>>>>> >>> Doing the same steps on a cluster set up on EMR with Hadoop
>>>>>>>>>> >>> version 1.0.3 and Hive version 0.8.1 (configured by EMR)
>>>>>>>>>> >>> works fine.
>>>>>>>>>> >>>
>>>>>>>>>> >>> So is this a version issue, or am I missing some
>>>>>>>>>> >>> configuration changes in the EC2 setup?
>>>>>>>>>> >>> I am not able to find an exact solution for this problem on
>>>>>>>>>> >>> the internet. Please help me.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thanks,
>>>>>>>>>> >>> Chunky.
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Dean Wampler, Ph.D.
>>>>>>>>>> >> thinkbiganalytics.com
>>>>>>>>>> >> +1-312-339-1330