Hi,

Now when I try to load a CSV file into any table I created, it's not
working.

I created a table:-
CREATE EXTERNAL TABLE someidtable (
someid STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION 's3://location/';

Then

LOAD DATA INPATH 's3://location/someidexcel.csv' INTO TABLE someidtable;

It gives this error:-
"Error in semantic analysis: Line 1:17 Invalid path
''s3n://location/someidexcel.csv'': only "file" or "hdfs" file systems
accepted"
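
One thought I had: since the table is external and already points at
's3://location/', maybe LOAD DATA is not needed at all and the file only has
to sit under that location? Something like this (untested, reusing the table
from above):

         -- untested idea: skip LOAD DATA entirely; an external table reads
         -- whatever files are under its LOCATION, and someidexcel.csv is
         -- already at s3://location/, so a plain query should see it:
         SELECT * FROM someidtable LIMIT 10;

But I am not sure if this is the right approach.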

Please help me in resolving this issue.
Thanks,
Chunky.

On Wed, Nov 7, 2012 at 6:43 PM, Chunky Gupta <chunky.gu...@vizury.com>wrote:

> Okay Mark, I will be looking into this JIRA regularly.
> Thanks again for helping.
> Chunky.
>
>
> On Wed, Nov 7, 2012 at 12:22 PM, Mark Grover 
> <grover.markgro...@gmail.com>wrote:
>
>> Chunky,
>> I just tried it myself. It turns out that the directory you are adding as
>> partition has to be empty for msck repair to work. This is obviously
>> sub-optimal and there is a JIRA in place (
>> https://issues.apache.org/jira/browse/HIVE-3231) to fix it.
>>
>> So, I'd suggest you keep an eye out for the next version for that fix to
>> come in. In the meanwhile, run msck after you create your partition
>> directory but before you populate your directory with data.
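>>
>> For example (bucket, table, and partition names below are just
>> placeholders):
>>
>>   # 1. create the partition directory, still empty
>>   hadoop fs -mkdir s3://my-bucket/mytable/dt=2012-11-07
>>   # 2. run msck while the directory is still empty
>>   hive -e "msck repair table mytable;"
>>   # 3. only then copy the data files in
>>   hadoop fs -put day.log s3://my-bucket/mytable/dt=2012-11-07/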
>>
>> Mark
>>
>>
>> On Tue, Nov 6, 2012 at 10:33 PM, Chunky Gupta <chunky.gu...@vizury.com>wrote:
>>
>>> Hi Mark,
>>> Sorry, I forgot to mention. I have also tried
>>>                 msck repair table <Table name>;
>>> and I got the same output that I got from plain msck.
>>> Do I need any other settings for this to work? I have set up Hadoop and
>>> Hive from scratch on EC2.
>>>
>>> Thanks,
>>> Chunky.
>>>
>>>
>>>
>>> On Wed, Nov 7, 2012 at 11:58 AM, Mark Grover <
>>> grover.markgro...@gmail.com> wrote:
>>>
>>>> Chunky,
>>>> You should have run:
>>>> msck repair table <Table name>;
>>>>
>>>> Sorry, I should have made it clear in my last reply. I have added an
>>>> entry to Hive wiki for benefit of others:
>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta 
>>>> <chunky.gu...@vizury.com>wrote:
>>>>
>>>>> Hi Mark,
>>>>> I didn't get any error.
>>>>> I ran this on hive console:-
>>>>>          "msck table Table_Name;"
>>>>> It says Ok and showed the execution time as 1.050 sec.
>>>>> But when I checked partitions for table using
>>>>>           "show partitions Table_Name;"
>>>>> It didn't show me any partitions.
>>>>>
>>>>> Thanks,
>>>>> Chunky.
>>>>>
>>>>>
>>>>> On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover <
>>>>> grover.markgro...@gmail.com> wrote:
>>>>>
>>>>>> Glad to hear, Chunky.
>>>>>>
>>>>>> Out of curiosity, what errors did you get when using msck?
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta <chunky.gu...@vizury.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>> I tried msck, but it is not working for me. I have written a Python
>>>>>>> script to partition the data individually.
>>>>>>>
>>>>>>> Thank you Edward, Mark and Dean.
>>>>>>> Chunky.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover <
>>>>>>> grover.markgro...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Chunky,
>>>>>>>> I have used "recover partitions" command on EMR, and that worked
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> However, take a look at
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-874. Seems like msck
>>>>>>>> command in Apache Hive does the same thing. Try it out and let us
>>>>>>>> know how it goes.
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>> On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo <
>>>>>>>> edlinuxg...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Recover partitions should work the same way for different file
>>>>>>>>> systems.
>>>>>>>>>
>>>>>>>>> Edward
>>>>>>>>>
>>>>>>>>> On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
>>>>>>>>> <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>>>> > Writing a script to add the external partitions individually is
>>>>>>>>> the only way
>>>>>>>>> > I know of.
>>>>>>>>> >
>>>>>>>>> > Sent from my rotary phone.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Nov 5, 2012, at 8:19 AM, Chunky Gupta <
>>>>>>>>> chunky.gu...@vizury.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi Dean,
>>>>>>>>> >
>>>>>>>>> > Actually I had a Hadoop and Hive cluster on EMR, and I have
>>>>>>>>> S3 storage
>>>>>>>>> > containing logs that update daily and are partitioned by
>>>>>>>>> date (dt). And
>>>>>>>>> > I was using this recover partition command.
>>>>>>>>> > Now I want to shift to EC2 and have my own Hadoop and Hive
>>>>>>>>> cluster. So
>>>>>>>>> > what is the alternative to recover partition in this case,
>>>>>>>>> if you have
>>>>>>>>> > any idea?
>>>>>>>>> > I found one way, partitioning all dates individually, but I
>>>>>>>>> would have to
>>>>>>>>> > write a script to do that for every date. Is there any
>>>>>>>>> easier way than
>>>>>>>>> > this?
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> > Chunky
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
>>>>>>>>> > <dean.wamp...@thinkbiganalytics.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> The RECOVER PARTITIONS is an enhancement added by Amazon to
>>>>>>>>> their version
>>>>>>>>> >> of Hive.
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
>>>>>>>>> >>
>>>>>>>>> >> <shameless-plug>
>>>>>>>>> >>   Chapter 21 of Programming Hive discusses this feature and
>>>>>>>>> other aspects
>>>>>>>>> >> of using Hive in EMR.
>>>>>>>>> >> </shameless-plug>
>>>>>>>>> >>
>>>>>>>>> >> dean
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta <
>>>>>>>>> chunky.gu...@vizury.com>
>>>>>>>>> >> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>> Hi,
>>>>>>>>> >>>
>>>>>>>>> >>> I have a cluster set up on EC2 with Hadoop version 0.20.2
>>>>>>>>> and Hive
>>>>>>>>> >>> version 0.8.1 (I configured everything). I have created a
>>>>>>>>> table using:-
>>>>>>>>> >>>
>>>>>>>>> >>> CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW
>>>>>>>>> FORMAT
>>>>>>>>> >>> DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION
>>>>>>>>> 's3://my-location/data/';
>>>>>>>>> >>>
>>>>>>>>> >>> Now I am trying to recover partition using :-
>>>>>>>>> >>>
>>>>>>>>> >>> ALTER TABLE XXX RECOVER PARTITIONS;
>>>>>>>>> >>>
>>>>>>>>> >>> but I am getting this error :- "FAILED: Parse Error: line 1:12
>>>>>>>>> cannot
>>>>>>>>> >>> recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter
>>>>>>>>> table statement"
>>>>>>>>> >>>
>>>>>>>>> >>> Doing the same steps on a cluster set up on EMR with
>>>>>>>>> Hadoop version 1.0.3 and
>>>>>>>>> >>> Hive version 0.8.1 (configured by EMR) works fine.
>>>>>>>>> >>>
>>>>>>>>> >>> So is this a version issue, or am I missing some
>>>>>>>>> configuration changes in the
>>>>>>>>> >>> EC2 setup?
>>>>>>>>> >>> I am not able to find an exact solution for this problem
>>>>>>>>> on the internet. Please
>>>>>>>>> >>> help me.
>>>>>>>>> >>>
>>>>>>>>> >>> Thanks,
>>>>>>>>> >>> Chunky.
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Dean Wampler, Ph.D.
>>>>>>>>> >> thinkbiganalytics.com
>>>>>>>>> >> +1-312-339-1330
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
