Chunky,
You have an external table that points at the location s3://location/

No need to load the data. All files (or partition folders) under
s3://location/ should be available via the table.
Just run your queries on it.

LOAD DATA will move the data from one HDFS location to another. You don't
need/want to do that in this case.
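
For example, you should be able to query it directly, along these lines
(using the someidtable from your message below):

    SELECT someid FROM someidtable LIMIT 10;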

Mark

On Tue, Nov 27, 2012 at 12:18 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:

> Hi,
>
> Now, when I try to load a CSV file into any table I created, it's not
> working.
>
> I created a table :-
> CREATE EXTERNAL TABLE someidtable (
> someid STRING
> )
> ROW FORMAT
> DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n'
> LOCATION 's3://location/';
>
> Then
>
> LOAD DATA INPATH 's3://location/someidexcel.csv' INTO TABLE someidtable;
>
> It gives this error:-
> "Error in semantic analysis: Line 1:17 Invalid path
> ''s3n://location/someidexcel.csv'': only "file" or "hdfs" file systems
> accepted"
>
> Please help me in resolving this issue.
> Thanks,
> Chunky.
>
>
> On Wed, Nov 7, 2012 at 6:43 PM, Chunky Gupta <chunky.gu...@vizury.com> wrote:
>
>> Okay Mark, I will be looking into this JIRA regularly.
>> Thanks again for helping.
>> Chunky.
>>
>>
>> On Wed, Nov 7, 2012 at 12:22 PM, Mark Grover
>> <grover.markgro...@gmail.com> wrote:
>>
>>> Chunky,
>>> I just tried it myself. It turns out that the directory you are adding
>>> as a partition has to be empty for msck repair to work. This is obviously
>>> sub-optimal, and there is a JIRA in place (
>>> https://issues.apache.org/jira/browse/HIVE-3231) to fix it.
>>>
>>> So, I'd suggest you keep an eye out for the next version for that fix to
>>> come in. In the meantime, run msck after you create your partition
>>> directory but before you populate the directory with data.
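>>>
>>> For example, the order would be (table and path names here are just
>>> placeholders):
>>>
>>> 1. hadoop fs -mkdir s3://mybucket/logs/dt=2012-11-07/
>>> 2. In Hive: msck repair table mytable;
>>> 3. Only then copy that day's data files into the new directory.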
>>>
>>> Mark
>>>
>>>
>>> On Tue, Nov 6, 2012 at 10:33 PM, Chunky Gupta 
>>> <chunky.gu...@vizury.com> wrote:
>>>
>>>> Hi Mark,
>>>> Sorry, I forgot to mention. I have also tried
>>>>                 msck repair table <Table name>;
>>>> and I got the same output as I got from msck alone.
>>>> Do I need to change any other settings for this to work? I have
>>>> prepared the Hadoop and Hive setup from scratch on EC2.
>>>>
>>>> Thanks,
>>>> Chunky.
>>>>
>>>>
>>>>
>>>> On Wed, Nov 7, 2012 at 11:58 AM, Mark Grover <
>>>> grover.markgro...@gmail.com> wrote:
>>>>
>>>>> Chunky,
>>>>> You should have run:
>>>>> msck repair table <Table name>;
>>>>>
>>>>> Sorry, I should have made it clear in my last reply. I have added an
>>>>> entry to the Hive wiki for the benefit of others:
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta 
>>>>> <chunky.gu...@vizury.com> wrote:
>>>>>
>>>>>> Hi Mark,
>>>>>> I didn't get any error.
>>>>>> I ran this on the Hive console:-
>>>>>>          "msck table Table_Name;"
>>>>>> It said OK and showed an execution time of 1.050 sec.
>>>>>> But when I checked the partitions for the table using
>>>>>>           "show partitions Table_Name;"
>>>>>> it didn't show me any partitions.
>>>>>>
>>>>>> Thanks,
>>>>>> Chunky.
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover <
>>>>>> grover.markgro...@gmail.com> wrote:
>>>>>>
>>>>>>> Glad to hear, Chunky.
>>>>>>>
>>>>>>> Out of curiosity, what errors did you get when using msck?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta <
>>>>>>> chunky.gu...@vizury.com> wrote:
>>>>>>>
>>>>>>>> Hi Mark,
>>>>>>>> I tried msck, but it did not work for me. I have written a Python
>>>>>>>> script to add the partitions individually.
>>>>>>>>
>>>>>>>> Thank you Edward, Mark and Dean.
>>>>>>>> Chunky.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover <
>>>>>>>> grover.markgro...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Chunky,
>>>>>>>>> I have used the "recover partitions" command on EMR, and that
>>>>>>>>> worked fine.
>>>>>>>>>
>>>>>>>>> However, take a look at
>>>>>>>>> https://issues.apache.org/jira/browse/HIVE-874. It seems the msck
>>>>>>>>> command in Apache Hive does the same thing. Try it out and let us
>>>>>>>>> know how it goes.
>>>>>>>>>
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>> On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo <
>>>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Recover partitions should work the same way for different file
>>>>>>>>>> systems.
>>>>>>>>>>
>>>>>>>>>> Edward
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
>>>>>>>>>> <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>>>>> > Writing a script to add the external partitions individually
>>>>>>>>>> > is the only way I know of.
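>>>>>>>>>> >
>>>>>>>>>> > That is, one statement per partition directory, something like
>>>>>>>>>> > this (table and path names here are just placeholders):
>>>>>>>>>> >
>>>>>>>>>> >     ALTER TABLE logs ADD PARTITION (dt='2012-11-01')
>>>>>>>>>> >       LOCATION 's3://my-location/data/dt=2012-11-01/';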
>>>>>>>>>> >
>>>>>>>>>> > Sent from my rotary phone.
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Nov 5, 2012, at 8:19 AM, Chunky Gupta
>>>>>>>>>> > <chunky.gu...@vizury.com> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hi Dean,
>>>>>>>>>> >
>>>>>>>>>> > Actually, I had a Hadoop and Hive cluster on EMR, with logs in
>>>>>>>>>> > S3 that are updated daily and partitioned by date (dt), and I
>>>>>>>>>> > was using recover partitions there.
>>>>>>>>>> > Now I want to shift to EC2 and run my own Hadoop and Hive
>>>>>>>>>> > cluster. What is the alternative to recover partitions in this
>>>>>>>>>> > case, if you have any idea?
>>>>>>>>>> > I found one way: partitioning all dates individually, so I have
>>>>>>>>>> > to write a script to do that for all dates. Is there any easier
>>>>>>>>>> > way than this?
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> > Chunky
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
>>>>>>>>>> > <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> RECOVER PARTITIONS is an enhancement added by Amazon to their
>>>>>>>>>> >> version of Hive:
>>>>>>>>>> >>
>>>>>>>>>> >> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
>>>>>>>>>> >>
>>>>>>>>>> >> <shameless-plug>
>>>>>>>>>> >>   Chapter 21 of Programming Hive discusses this feature and
>>>>>>>>>> >> other aspects of using Hive in EMR.
>>>>>>>>>> >> </shameless-plug>
>>>>>>>>>> >>
>>>>>>>>>> >> dean
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta
>>>>>>>>>> >> <chunky.gu...@vizury.com> wrote:
>>>>>>>>>> >>>
>>>>>>>>>> >>> Hi,
>>>>>>>>>> >>>
>>>>>>>>>> >>> I have a cluster set up on EC2 with Hadoop version 0.20.2
>>>>>>>>>> >>> and Hive version 0.8.1 (I configured everything). I have
>>>>>>>>>> >>> created a table using:-
>>>>>>>>>> >>>
>>>>>>>>>> >>> CREATE EXTERNAL TABLE XXX ( YYY )
>>>>>>>>>> >>> PARTITIONED BY ( ZZZ )
>>>>>>>>>> >>> ROW FORMAT DELIMITED FIELDS TERMINATED BY 'WWW'
>>>>>>>>>> >>> LOCATION 's3://my-location/data/';
>>>>>>>>>> >>>
>>>>>>>>>> >>> Now I am trying to recover the partitions using:-
>>>>>>>>>> >>>
>>>>>>>>>> >>> ALTER TABLE XXX RECOVER PARTITIONS;
>>>>>>>>>> >>>
>>>>>>>>>> >>> but I am getting this error:- "FAILED: Parse Error: line 1:12
>>>>>>>>>> >>> cannot recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in
>>>>>>>>>> >>> alter table statement"
>>>>>>>>>> >>>
>>>>>>>>>> >>> Doing the same steps on a cluster set up on EMR with Hadoop
>>>>>>>>>> >>> version 1.0.3 and Hive version 0.8.1 (configured by EMR)
>>>>>>>>>> >>> works fine.
>>>>>>>>>> >>>
>>>>>>>>>> >>> So is this a version issue, or am I missing some
>>>>>>>>>> >>> configuration changes in the EC2 setup? I am not able to find
>>>>>>>>>> >>> an exact solution for this problem on the internet. Please
>>>>>>>>>> >>> help me.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thanks,
>>>>>>>>>> >>> Chunky.
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> --
>>>>>>>>>> >> Dean Wampler, Ph.D.
>>>>>>>>>> >> thinkbiganalytics.com
>>>>>>>>>> >> +1-312-339-1330
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
