Sorry, where is the contributor information coming from?

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>

On Thu, Oct 3, 2013 at 11:57 AM, Ajeet S Raina <ajeetra...@gmail.com> wrote:

> Hello,
>
> I have Hadoop running on HDFS with Hive installed. I am able to import the
> Wikipedia dump
> (http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2)
> into HDFS with the command below:
>
> $ hadoop jar out.jar edu.umd.cloud9.collection.wikipedia.DumpWikipediaToPlainText \
>     -input /home/wikimedia/input/enwiki-latest-pages-articles.xml \
>     -output /home/wikimedia/output/3
>
> I am able to run Hive on the Wikipedia dump. I have created a sample Hive
> table based on a small sample of the data I converted:
>
> CREATE EXTERNAL TABLE wiki_page(page_title string, page_body string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE
> LOCATION '/home/wikimedia/output/3';
>
> It produces records like the one shown below:
>
> Davy Jones (musician) Davy Jones (musician) David Thomas "Davy" Jones
> (30 December 1945 – 29 February 2012) was an English recording artist and
> actor, best known as a member of The Monkees. Early lifeDavy Jones was born
> at 20 Leamington Street, Openshaw, Manchester, England, on 30 December 1945.
> At age 11, he began his acting career…
>
> My overall objective is to find out how many contributors are from India
> and China.
>
> Any suggestions on how to achieve that?
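For reference, a minimal sketch, assuming Python 3 and access to the raw XML dump before the DumpWikipediaToPlainText conversion. In the MediaWiki export format, each <revision> carries a <contributor> element holding either a <username> (plus <id>) or, for anonymous edits, an <ip>. It does not hold a country. Also, as far as I know, the pages-articles dump keeps only the latest revision of each page, so this counts only the last editor per page. The plain-text table (page_title, page_body) drops these fields entirely. The function name count_contributors is hypothetical, not part of Cloud9 or Hive.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    # MediaWiki exports put a "{namespace}" prefix on every tag; strip it.
    return tag.rsplit("}", 1)[-1]


def count_contributors(source):
    """Count revision contributors in a MediaWiki XML export (hypothetical helper).

    `source` is a filename or a binary file-like object. Returns a Counter
    keyed by ("user", username) or ("ip", address). No country is available
    in the dump, so any India/China breakdown would need another data source.
    """
    counts = Counter()
    # iterparse streams the file, so the whole dump never sits in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = local_name(elem.tag)
        if tag == "contributor":
            fields = {local_name(child.tag): (child.text or "") for child in elem}
            if "username" in fields:
                counts[("user", fields["username"])] += 1
            elif "ip" in fields:
                counts[("ip", fields["ip"])] += 1
        elif tag == "page":
            # Release the finished page's subtree to keep memory bounded.
            elem.clear()
    return counts
```

Usage on the compressed dump, again as a sketch: `with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as f: counts = count_contributors(f)`. Only anonymous edits expose an IP address, and mapping those to a country would need a separate geolocation source that the dump does not provide.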