Re: One information about the Hive

Nitin Pawar Mon, 13 Jan 2014 03:05:37 -0800

The best way to answer your queries is to try it out yourself:

1) set up a single node hadoop VM (there are readily available images from
hortonworks and cloudera)
2) try to load data and see where it is stored (hive is a data access
framework .. it does not store any data itself; information related to the
data is stored in the metastore, and hcatalog is built on top of that)
3) with hive it's just writing queries and crunching numbers; there are a lot
of file formats, and each does better with a different kind of workload.
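
To make point 2 concrete, here is a toy sketch in plain Python (not hive code,
and not how hive is implemented) of the split between metadata and data. The
`metastore` dict, the `create_external_table` and `select` helpers and the temp
file path are all made up for illustration; the real metastore is a separate
service and the real files would sit in hdfs.

```python
import csv
import os
import tempfile

# Toy "metastore": table name -> column names and data location.
# It holds information ABOUT the data, never the rows themselves.
metastore = {}

def create_external_table(name, columns, location):
    """Register a schema and a location; no data is copied or moved."""
    metastore[name] = {"columns": columns, "location": location}

def select(name, where=lambda row: True):
    """Apply the schema at read time to the delimited file at the table's location."""
    meta = metastore[name]
    with open(meta["location"], newline="") as f:
        for fields in csv.reader(f):
            row = dict(zip(meta["columns"], fields))
            if where(row):
                yield row

# The data file exists independently of the table definition.
data_dir = tempfile.mkdtemp()
path = os.path.join(data_dir, "songs.txt")
with open(path, "w") as f:
    f.write("1,2,lennon,john,nowhere man\n")
    f.write("2,1,mccartney,paul,penny lane\n")

create_external_table(
    "songs", ["id", "seqid", "lastname", "firstname", "songname"], path
)
lennon_songs = [
    r["songname"] for r in select("songs", lambda r: r["lastname"] == "lennon")
]
print(lennon_songs)
```

Dropping the entry from `metastore` would leave `songs.txt` untouched, which is
roughly the behaviour you should look for when you experiment with an external
table on the VM.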


If you have a basic understanding of hive and have tried a few queries, you
will find that hive is not a standalone system (for now). It sits on a stack:
hadoop (mapreduce1 and hdfs), then the metastore, then the hive framework.

You will need to understand a bit more of hdfs as well.

To answer your queries:

how the hive will connect with hadoop cluster,

.. when you set up hive you can point it to a hadoop cluster, or you can
change these properties at table level.


how the hive will get the request,

.. not sure what you mean by request .. if you mean the query, then there
is the hive cli (as far as I am aware, development on this is slowing down),
then there are clients like beeline, and then you have options like jdbc
connections etc.


how the hive will process the request,
.. hive converts your query into an optimized mapreduce program and processes
the data using that mapreduce program. To see how a sql query can be converted
to a mapreduce program, you can look at the ysmart framework from ohio state
university.
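
As a rough illustration of that conversion, here is a toy sketch in plain
Python of how a query like `SELECT lastname, COUNT(*) FROM songs GROUP BY
lastname` breaks down into map, shuffle and reduce steps. This is not what hive
generates; the function names and the in-memory `LINES` list are assumptions
for the example, and the rows follow the songs.txt layout quoted further down
in this thread.

```python
from collections import defaultdict

# Sample rows in the songs.txt layout: id,seq,lastname,firstname,songname
LINES = [
    "1,2,lennon,john,nowhere man",
    "1,3,lennon,john,strawberry fields forever",
    "2,1,mccartney,paul,penny lane",
    "3,4,harrison,george,something",
]

def map_phase(line):
    """Mapper: parse one input line and emit (group key, 1)."""
    fields = line.split(",")
    yield fields[2], 1  # key on lastname

def shuffle(pairs):
    """Shuffle: collect all emitted values per key, as happens between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: aggregate the values for one key, here COUNT(*)."""
    return key, sum(values)

mapped = [pair for line in LINES for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the mappers and reducers run in parallel on different nodes
and the shuffle moves data over the network; the sketch only shows the shape of
the computation.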

after analysis, where the analyzed data will be stored for further decision
making
.. hive does not store any data automatically. You have to specifically mention
where you want to save the results: a table, a file, or something like that.


On Mon, Jan 13, 2014 at 4:14 PM, Vikas Parashar <para.vi...@gmail.com> wrote:

> Thanks Prashant. I shall definitely go through that if needed. But from
> my experience, what I have faced is that users will have some integration
> problems with HADOOP 2.
>
>
>  Hi Vikas
>>
>>  Welcome to the world of Hive !
>>
>>  The first book u should read is by Capriolo , Wampler, Rutherglen
>> Programming Hive
>> http://www.amazon.com/Programming-Hive-Edward-Capriolo/dp/1449319335
>>
>>  This is a must read. I have immensely benefited from this book and the
>> hive user group (the group is kickass).
>>
>>  If u r not sure of the details of HDFS/Hadoop then the Hadoop
>> Definitive Guide (Tom White) is a must read.
>> My view would be u should know both very well eventually...
>>
>
>
>
>>  I have setup Hadoop and Hive cluster in three ways
>> [1] manually thru tarballs (lightweight but u need to know what u r
>> installing and where)
>> [2] CDH & Cloudera manager (heavyweight but it does things in the
>> background....easy to install and quick to setup on a sandbox and
>> learn)...Plus Beeswax is a great starter UI for Hive queries
>> [3] Using Amazon EMR Hive (I realize this is the easiest and the fastest
>> to setup to learn Hive)
>>
>>  My suggestion , Don't go for option [1] - u learn a lot there but it
>> could take time and u might feel frustrated as well
>>
>> If using option [2] above, then I suggest
>> - 1 or 2 boxes - i7 quad core (or u can use a 8 core AMD FX 8300) with
>> 16-32GB RAM
>> - download and install Cloudera manager
>>
>>  If u don't have access to box(es) to install hadoop/hive then the
>> cheapest way  to learn is by using Amazon EMR
>> - First create a S3 bucket and a folder to store a data file called
>> songs.txt
>>
>>   1,2,lennon,john,nowhere man
>>   1,3,lennon,john,strawberry fields forever
>>   2,1,mccartney,paul,penny lane
>>   2,2,mccartney,paul,michelle
>>   2,3,mccartney,paul,yesterday
>>   3,1,harrison,george,while my guitar gently weeps
>>   3,2,harrison,george,i want to tell you
>>   3,3,harrison,george,think for yourself
>>   3,4,harrison,george,something
>>   4,1,starr,ringo,octopus's garden
>>   4,2,starr,ringo,with a little help from my friends
>>
>>  - Create a key pair from the AWS console and save the private key on
>> your local desktop
>>
>>  - Create a EMR cluster with Hive installed
>>
>>  - ssh -i /path/on/your/desktop/to/amazonkeypair.pem hadoop@<some-ec2-instance-name>.compute.amazonaws.com
>>
>>  - On the linux prompt
>>    --> hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS songs(id INT,
>> SEQID INT, LASTNAME STRING, FIRSTNAME STRING, SONGNAME STRING) ROW FORMAT
>> DELIMITED FIELDS TERMINATED BY ',' LOCATION
>> 's3://<your-bucket>/<your-folder>/'"
>>    --> hive -e "select songname from songs where lastname='lennon' OR
>> lastname = 'harrison'"
>>
>>  Hope this helps
>>
>>  Hive on !!!
>>
>>  sanjay
>>
>>
>>
>>
>>
>>
>>    id,seq,lastname,firstname,songname
>>
>>
>>
>>
>>   ------------------------------
>> *From:* Vikas Parashar [para.vi...@gmail.com]
>> *Sent:* Sunday, January 12, 2014 10:50 PM
>> *To:* Prashant Kumar - ERS, HCL Tech
>> *Cc:* user@hive.apache.org
>>
>> *Subject:* Re: One information about the Hive
>>
>>   Prashant,
>>
>>
>>    Actually I have just started reading about and understanding Hive. Could
>>> you please tell me how you learnt Hive? Did you take any training? Is there
>>> any institute which is reliable specifically for Hive training? I have read
>>> a lot of tutorials on the net, but I am still not able to correlate the
>>> files stored on the hadoop cluster with how Hive actually works: the
>>> complete end to end transaction and its storage. Can you take some classes
>>> on a paid basis and clear up my questions? Please help me.
>>>
>>
>> I have learnt from the community and my personal experience. What I can do
>> is forward your request to some known members of Big Data.
>>
>>
>>>
>>>
>>> Note: One imp thing, can I post the question directly to you, if you do
>>> not mind and if I am not disturbing you.
>>>
>>
>>  Please put all questions to the community only.
>>
>>
>>>
>>>
>>> Thanks
>>>
>>> Prashant
>>>
>>>
>>>
>>> *From:* Vikas Parashar [mailto:para.vi...@gmail.com]
>>> *Sent:* Monday, January 13, 2014 11:07 AM
>>> *To:* user@hive.apache.org
>>> *Subject:* Re: One information about the Hive
>>>
>>>
>>>
>>> Prashant,
>>>
>>>
>>>
>>>
>>>
>>> I am new to Hive. I am reading the docs available on the Apache site
>>> and trying to work out the correlation between hadoop and Hive, so please
>>> help me to understand this:
>>>
>>> As per my understanding, all the files holding unstructured data are
>>> stored in HDFS across the hadoop cluster. When we have to analyze that
>>> data, we use Hive.
>>>
>>> Now I have some questions which I am not able to answer:
>>>
>>>
>>>
>>> 1. When an engineer/business user wants to analyze data which is
>>> available in a file on the HDFS cluster, what are the steps to get
>>> the desired file and analyze it using hive?
>>>
>>>
>>>
>>> You need to map it to hdfs. Initially you need to create some metadata
>>> in hcatalog; the processing is then done with the help of map-reduce.
>>>
>>>
>>>
>>> Maybe this will help you..
>>> http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/
>>>
>>>
>>>
>>>  2. Does Hive store all the data in its tables permanently after the
>>> analysis?
>>>
>>>
>>>
>>> Hive never stores any data.
>>>
>>>
>>>
>>>  3. Is Hive itself a database?
>>>
>>>
>>>
>>> It is just a data-access framework.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>>
>>> Prashant
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>


-- 
Nitin Pawar
