Sorry, where is the contributor information coming from?
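
For context, the plain-text output described below keeps only a page title and body, so contributor details would have to come from the raw XML. As far as I know, each <revision> in a MediaWiki XML export carries a <contributor> element holding either a <username> and <id> or, for anonymous edits, an <ip>, and the pages-articles dump includes only the latest revision of each page. The dump does not record a country, so mapping contributors to India or China would need some outside source such as IP geolocation, which is an assumption and not shown here. A minimal sketch, assuming a tiny hand-written sample (the page titles, user name, and IP address are hypothetical):

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-written sample in the shape of a MediaWiki XML export.
# Page titles, user name, and IP are hypothetical; the XML namespace
# that real dumps declare is omitted here for brevity.
SAMPLE = """<mediawiki>
  <page>
    <title>Page A</title>
    <revision>
      <contributor><username>ExampleUser</username><id>1</id></contributor>
      <text>body a</text>
    </revision>
  </page>
  <page>
    <title>Page B</title>
    <revision>
      <contributor><ip>192.0.2.10</ip></contributor>
      <text>body b</text>
    </revision>
  </page>
  <page>
    <title>Page C</title>
    <revision>
      <contributor><username>ExampleUser</username><id>1</id></contributor>
      <text>body c</text>
    </revision>
  </page>
</mediawiki>"""


def local(tag):
    """Drop any '{namespace}' prefix so namespaced tags also match."""
    return tag.rsplit("}", 1)[-1]


def count_contributors(xml_text):
    """Count contributors, keyed by ('user', name) or ('ip', address)."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        if local(elem.tag) != "contributor":
            continue
        for child in elem:
            name = local(child.tag)
            if name == "username":
                counts[("user", child.text)] += 1
            elif name == "ip":
                counts[("ip", child.text)] += 1
    return counts


counts = count_contributors(SAMPLE)
for (kind, who), n in sorted(counts.items()):
    print(kind, who, n)
```

This loads the whole document into memory, which is fine for a sample but not for the full dump; a streaming parser such as ET.iterparse, or a MapReduce job over the XML, would be the more realistic route there.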

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Thu, Oct 3, 2013 at 11:57 AM, Ajeet S Raina <ajeetra...@gmail.com> wrote:

> > Hello,
> >
> > I have Hadoop running on HDFS with Hive installed. I am able to import the
> > Wikipedia dump into HDFS with the command below:
> >
> > http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> >
> > $ hadoop jar out.jar \
> >     edu.umd.cloud9.collection.wikipedia.DumpWikipediaToPlainText \
> >     -input /home/wikimedia/input/enwiki-latest-pages-articles.xml \
> >     -output /home/wikimedia/output/3
> >
> > I am able to run Hive on the Wikipedia dump. I have created a sample Hive
> > table based on a small amount of data I converted:
> >
> > CREATE EXTERNAL TABLE wiki_page(page_title string, page_body string)
> > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> > STORED AS TEXTFILE
> > LOCATION '/home/wikimedia/output/3';
> >
> > It gave me a record like the one shown below:
> >
> > Davy Jones (musician) Davy Jones (musician)           David Thomas
> > "Davy" Jones (30 December 1945 – 29 February 2012) was an English
> > recording artist and actor, best known as a member of The Monkees. Early
> > lifeDavy Jones was born at 20 Leamington Street, Openshaw, Manchester,
> > England, on 30 December 1945. At age 11, he began his acting career…
> >
> > My overall objective is to find out how many contributors are from India
> > and China.
> >
> > Any suggestions on how to achieve that?
