I can see that the revision history is probably what I need to look at, but I have no idea how to go about it. Any suggestions?

On 7 Oct 2013 10:34, "Sonal Goyal" <sonalgoy...@gmail.com> wrote:
> Sorry, where is the contributor information coming from?
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
> <http://in.linkedin.com/in/sonalgoyal>
>
> On Thu, Oct 3, 2013 at 11:57 AM, Ajeet S Raina <ajeetra...@gmail.com> wrote:
>
>> Hello,
>>
>> I have Hadoop running on HDFS with Hive installed. I am able to import
>> Wikipedia dump into HDFS through the below command:
>>
>> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
>>
>> $ hadoop jar out.jar edu.umd.cloud9.collection.wikipedia.DumpWikipediaToPlainText \
>>     -input /home/wikimedia/input/enwiki-latest-pages-articles.xml \
>>     -output /home/wikimedia/output/3
>>
>> I am able to run Hive for the Wikipedia dump through this command:
>>
>> I have created one sample hive table based on small data I converted:
>>
>> CREATE EXTERNAL TABLE wiki_page(page_title string, page_body string)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>> STORED AS TEXTFILE
>> LOCATION '/home/wikimedia/output/3';
>>
>> It created for me a record as shown below:
>>
>> Davy Jones (musician) Davy Jones (musician) David Thomas "Davy" Jones
>> (30 December 1945 – 29 February 2012) was an English recording artist
>> and actor, best known as a member of The Monkees. Early lifeDavy Jones
>> was born at 20 Leamington Street, Openshaw, Manchester, England, on
>> 30 December 1945. At age 11, he began his acting career…
>>
>> My overall objective is to know how many contributors are from India
>> and China.
>>
>> Any suggestion how to achieve that?
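
One possible direction, offered as a rough sketch and not a tested pipeline: the plain-text conversion above keeps only the page title and body, so the contributor information would have to come from the raw XML. As far as I know, each <revision> in the MediaWiki export XML carries a <contributor> element holding either <username> and <id> (registered user) or <ip> (anonymous edit), and the dump gives no location for registered users. The Python below therefore counts only distinct anonymous IPs per country. The function names, the sample XML, and the IP-to-country mapping are all made up for illustration; a real run would need an actual GeoIP database and a dump that contains the full revision history.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}ip' -> 'ip'."""
    return tag.rsplit("}", 1)[-1]


def count_contributors_by_country(xml_stream, ip_to_country):
    """Count distinct anonymous contributors (IP addresses) per country.

    ip_to_country is a caller-supplied function (hypothetical here) that
    maps an IP string to a country code, or returns None when unknown.
    """
    seen_ips = set()
    counts = Counter()
    # iterparse streams the file, so a large dump is not loaded at once.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "contributor":
            for child in elem:
                if local_name(child.tag) == "ip" and child.text:
                    ip = child.text.strip()
                    if ip not in seen_ips:
                        seen_ips.add(ip)
                        country = ip_to_country(ip)
                        if country:
                            counts[country] += 1
        elif name == "page":
            elem.clear()  # drop the finished page's children to save memory
    return counts


# Tiny made-up sample in the export format; the IPs are from the
# documentation-only ranges, and the namespace version is an assumption.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example</title>
    <revision>
      <contributor><username>SomeUser</username><id>42</id></contributor>
    </revision>
    <revision>
      <contributor><ip>203.0.113.7</ip></contributor>
    </revision>
    <revision>
      <contributor><ip>198.51.100.9</ip></contributor>
    </revision>
    <revision>
      <contributor><ip>203.0.113.7</ip></contributor>
    </revision>
  </page>
</mediawiki>"""

# Stand-in for a real GeoIP lookup; this mapping is invented for the demo.
FAKE_GEO = {"203.0.113.7": "IN", "198.51.100.9": "CN"}

counts = count_contributors_by_country(
    io.BytesIO(SAMPLE.encode("utf-8")), FAKE_GEO.get
)
print(dict(counts))
```

Note that this would undercount by design, since registered contributors are skipped, and IP geolocation is only approximate. The same extraction could presumably be moved into a Hadoop job or a Hive table once the contributor IPs are written out as delimited text.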