I can see that the revision history is probably the data I need, but I have no
idea how to go about it. Any suggestions?
On 7 Oct 2013 10:34, "Sonal Goyal" <sonalgoy...@gmail.com> wrote:

> Sorry, where is the contributor information coming from?
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
> On Thu, Oct 3, 2013 at 11:57 AM, Ajeet S Raina <ajeetra...@gmail.com> wrote:
>
>>  > Hello,
>> >
>> >
>> >
>> > I have Hadoop running on HDFS with Hive installed. I am able to import
>> the Wikipedia dump into HDFS with the command below:
>> >
>> >
>> >
>> >
>> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
>> >
>> >
>> >
>> > $ hadoop jar out.jar
>> edu.umd.cloud9.collection.wikipedia.DumpWikipediaToPlainText -input
>> /home/wikimedia/input/enwiki-latest-pages-articles.xml -output
>> /home/wikimedia/output/3
>> >
>> >
>> >
>> > I am able to run Hive on the Wikipedia dump through this command:
>> >
>> >
>> >
>> > I have created a sample Hive table from a small portion of the data I converted:
>> >
>> >
>> >
>> > CREATE EXTERNAL TABLE wiki_page(page_title string, page_body string)
>> >
>> > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>> >
>> > STORED AS TEXTFILE
>> >
>> > LOCATION '/home/wikimedia/output/3';
>> >
>> >
>> >
>> > This gave me a record like the one shown below:
>> >
>> >
>> >
>> > Davy Jones (musician) Davy Jones (musician)           David Thomas
>> "Davy" Jones (30 December 1945 – 29 February 2012) was an English
>> recording artist and actor, best known as a member of The Monkees. Early
>> lifeDavy Jones was born at 20 Leamington Street, Openshaw, Manchester,
>> England, on 30 December 1945. At age 11, he began his acting career…
>> >
>> >
>> >
>> > My overall objective is to find out how many contributors are from India
>> and China.
>> >
>> > Any suggestions on how to achieve that?
>> >
>>
>
>
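
On the revision-history idea: the plain-text conversion above keeps only the
page title and body, so contributor data has to come from the XML itself. In
the MediaWiki export format each <revision> carries a <contributor> element
holding either <username> and <id> or, for anonymous edits, <ip>. The dump
does not record a contributor's country, so a country count would need
something extra, such as mapping the anonymous IPs through an external GeoIP
database (not shown here). Also note that the pages-articles dump holds only
the latest revision of each page. The following is only a rough sketch of the
first step, counting revisions per contributor. The sample XML, the user name
and the function names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-written sample in the shape of a MediaWiki XML export.
# Titles, the user name and the IP addresses are invented for illustration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example A</title>
    <revision>
      <contributor><username>HypotheticalUser</username><id>1</id></contributor>
      <text>sample text</text>
    </revision>
  </page>
  <page>
    <title>Example B</title>
    <revision>
      <contributor><ip>192.0.2.10</ip></contributor>
      <text>sample text</text>
    </revision>
  </page>
  <page>
    <title>Example C</title>
    <revision>
      <contributor><ip>192.0.2.10</ip></contributor>
      <text>sample text</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_contributors(stream):
    """Stream through an export and count revisions per contributor.

    Returns two Counters: one keyed by user name, one keyed by IP address.
    """
    users, ips = Counter(), Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "contributor":
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "username" and child.text:
                    users[child.text] += 1
                elif child_name == "ip" and child.text:
                    ips[child.text] += 1
        elif name == "page":
            # Release the finished page; real dumps are far too big to keep.
            elem.clear()
    return users, ips


users, ips = count_contributors(io.BytesIO(SAMPLE))
print(users)
print(ips)
```

For the real file, the same function should accept a handle from
bz2.open(path, "rb") in place of the in-memory sample. The per-IP counts could
then be written out as tab-separated text and loaded into a Hive table like
the one above.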
