Re: Wikipedia Dump Analysis..

2013-10-08 Thread Ajeet S Raina
I am not restricted to finding contributor location. That was just one thought that came to mind. I would like to know what analysis could be done with Wikipedia. The Wikipedia data is an XML dump that has been loaded into HDFS, and Hive created two columns for it. On 8 Oct 2013 13:57, "Sonal

Re: Wikipedia Dump Analysis..

2013-10-08 Thread Sonal Goyal
Hi Ajeet, Unfortunately, many of us are not familiar with the Wikipedia format or with where the contributor information is coming from. If you could please highlight that and let us know where you are stuck with Hive, we could throw out some ideas. Best Regards, Sonal Nube Technologies
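On where the contributor information comes from: in the MediaWiki XML export format, each <revision> element carries a <contributor> child holding either a <username> (registered editor) or an <ip> (anonymous editor). A minimal Python sketch of counting revisions per contributor follows. It is an illustration only, not the poster's Hive setup: the SAMPLE data is made up, and the names local_name and count_contributors are hypothetical helpers. As far as I know, the pages-articles dump keeps only the most recent revision of each page, so on that file the counts would reflect last editors rather than full history.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in the shape of a MediaWiki XML export (assumption:
# the real dump uses the same element names, with a versioned namespace).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example</title>
    <revision>
      <id>1</id>
      <contributor><username>Alice</username><id>10</id></contributor>
    </revision>
  </page>
  <page>
    <title>Another</title>
    <revision>
      <id>2</id>
      <contributor><ip>127.0.0.1</ip></contributor>
    </revision>
  </page>
  <page>
    <title>Third</title>
    <revision>
      <id>3</id>
      <contributor><username>Alice</username><id>10</id></contributor>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_contributors(stream):
    """Stream through the XML and count revisions per username or IP."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "contributor":
            for child in elem:
                # Registered editors have <username>, anonymous ones <ip>.
                if local_name(child.tag) in ("username", "ip") and child.text:
                    counts[child.text] += 1
        elif name == "page":
            # Release the finished page so memory stays bounded on big dumps.
            elem.clear()
    return counts


if __name__ == "__main__":
    result = count_contributors(io.BytesIO(SAMPLE.encode("utf-8")))
    for who, n in result.most_common():
        print(who, n)
```

For the real file, the same function could be given the stream returned by bz2.open(path, "rb") instead of the in-memory sample.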

Re: Wikipedia Dump Analysis..

2013-10-07 Thread Ajeet S Raina
Any suggestions?
On 7 Oct 2013 11:24, "Ajeet S Raina" wrote:
> I was just trying to see whether some interesting analysis is possible or
> not. One thing that came to mind was tracking contributors.
>
> Is it really possible?
> On 7 Oct 2013 11:13, "Ajeet S Raina" wrote:

Re: Wikipedia Dump Analysis..

2013-10-06 Thread Ajeet S Raina
I was just trying to see whether some interesting analysis is possible or not. One thing that came to mind was tracking contributors. Is it really possible?
On 7 Oct 2013 11:13, "Ajeet S Raina" wrote:
> I could see that revision history could be the target factor but no id

Re: Wikipedia Dump Analysis..

2013-10-06 Thread Ajeet S Raina
I could see that the revision history could be the thing to target, but I have no idea how to go about it. Any suggestions?
On 7 Oct 2013 10:34, "Sonal Goyal" wrote:
> Sorry, where is the contributor information coming from?
>
> Best Regards,
> Sonal
> Nube Technologies

Re: Wikipedia Dump Analysis..

2013-10-06 Thread Sonal Goyal
Sorry, where is the contributor information coming from? Best Regards, Sonal Nube Technologies
On Thu, Oct 3, 2013 at 11:57 AM, Ajeet S Raina wrote:
> > Hello,
> >
> > I have Hadoop running on HDFS with Hive installed.

Fwd: Wikipedia Dump Analysis..

2013-10-02 Thread Ajeet S Raina
> Hello,
>
> I have Hadoop running on HDFS with Hive installed. I am able to import the Wikipedia dump into HDFS through the command below:
>
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
>
> $ hadoop jar out.jar edu.umd.cloud9.collection.wikipedia.DumpWiki