Hello Robin, hello all,
@Randal - I have read your complaints, and I gather you think I want to do something harmful. I do not. I am working on my PhD and need to collect some more data. I work in the field of social research, especially online research; see:

http://opensource.mit.edu/online_papers.php
http://opensource.mit.edu/

My current investigation includes analysis of online discussions. First, let me explain the setting: I have to extract data from a phpBB board in order to do some field research. The board is run by a user community, and I need its data to analyze the discussions.

To give an example, take this forum here: how can I grab all the data out of it, store it locally, and afterwards load it into a local phpBB database? Is that possible? Nothing harmful is intended - nothing bad, nothing dangerous. But I do need the data. Concretely, I need to extract the forum messages and related data (forum topics, users) into a database, in order to create a copy of the forum for text analysis. Does anyone have an approximate solution? The data has to be fetched over HTTP and written to CSV, so that the dump can fill a local database for a phpBB board. I need the data in an almost complete form - username, forum, thread, topic, text of the posting, and so on. See http://www.phpbbdoctor.com/doc_tables.php for a full overview of the phpBB tables.

How do I do that? I need some kind of grabbing tool - can it be done with such a tool? And how do I solve the storing issue, getting the data into the local MySQL database? You see, it is tricky work, and I am pretty sure I will get help here.
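To make the question concrete, here is a rough sketch of what I imagine - only a sketch: the board URL, the topic-id range, and the HTML-extraction regex below are placeholder assumptions that would have to be adapted to the real board's templates, and of course it should only be run with the admin's permission. I use LWP::RobotUA rather than plain LWP::UserAgent so the bot honours robots.txt and waits between requests:

```perl
#!/usr/bin/perl
# Sketch: fetch phpBB topic pages over HTTP and dump username + post
# text to CSV. Base URL, topic-id range and the extraction regex are
# placeholders -- adapt them to the real board and its template.
use strict;
use warnings;
use LWP::RobotUA;   # like LWP::UserAgent, but honours robots.txt and a delay
use Text::CSV;

my $base = 'http://www.example.org/phpBB2';            # hypothetical board URL
my $ua   = LWP::RobotUA->new('phd-research-bot/0.1', 'me@example.org');
$ua->delay(1/30);                                      # ~2 seconds between requests

my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die Text::CSV->error_diag;
open my $out, '>', 'posts.csv' or die "posts.csv: $!";
$csv->print($out, [ 'topic_id', 'username', 'post_text' ]);

for my $topic (1 .. 700) {                             # the 700 threads
    my $res = $ua->get("$base/viewtopic.php?t=$topic");
    next unless $res->is_success;
    my $html = $res->decoded_content;

    # phpBB2's default template marks authors with class="name" and post
    # bodies with class="postbody" -- verify against the real page source.
    while ($html =~ m{class="name"[^>]*>(?:<b>)?([^<]+).*?
                      class="postbody"[^>]*>(.*?)</(?:td|span|div)>}sgx) {
        my ($user, $text) = ($1, $2);
        $text =~ s/<[^>]+>//g;                         # crude tag strip
        $csv->print($out, [ $topic, $user, $text ]);
    }
}
close $out;
```

The resulting posts.csv (one row per post) could then be loaded into the local MySQL database in a second step.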
So I am very thankful for any and all help - many thanks in advance. And now, Robin and Randal: I am willing to discuss the implications of my idea, but believe me, I could run this investigation with a browser as well - I could load the 700 threads by hand. They are online; everything is online. I do not really see the difference. But I am open to the discussion and look forward to your ideas and suggestions - and yes, after the legal (and ethical) discussion, I am looking forward to a technical one.

jobst - an ethno-researcher

> -----Original Message-----
> From: Robin Norwood <[EMAIL PROTECTED]>
> Sent: 26.08.06 16:18:07
> To: merlyn@stonehenge.com (Randal L. Schwartz)
> CC: beginners@perl.org
> Subject: Re: subroutine in LWP - in order to get 700 forum threads
>
> merlyn@stonehenge.com (Randal L. Schwartz) writes:
>
> >>>>>> "jobst" == jobst müller <[EMAIL PROTECTED]> writes:
> >
> > jobst> to admit - i am a Perl-novice and i have not so much experience in
> > jobst> perl. But i am willing to learn. i want to learn perl. As for now i
> > jobst> have to solve some tasks for the college. I have to do some
> > jobst> investigations on a board where i have no access to the db.
> >
> > If you don't have access to the database, what makes you think you have
> > permission to run a robot against the web API?
> >
> > Where are the ethics trainings when we need them? Sheesh.
> >
> > In other words, to make it very clear:
> >
> > DO NOT ATTEMPT TO DO THIS
>
> Really? If I understood the OP correctly, all he wants to do is 'screen
> scrape' the (public) board in question. In other words, nothing
> significantly different from what Google does when it indexes. I don't
> really see an ethical (as opposed to legal - IANAL!) problem with that.
> Of course, I would first email the admin for permission, and make *sure*
> that such a bot is 'well behaved' - such as adding calls to sleep inside
> some of those loops. After he gets the data, he could do something
> unethical with it - like republish it. But just getting the data
> doesn't seem wrong to me.
>
> As I said above, I am not a lawyer! The above should not be taken to
> mean I think it is legal to do this. But it does sound ethical to me.
>
> -RN
>
> --
> Robin Norwood
> Red Hat, Inc.
>
> "The Sage does nothing, yet nothing remains undone."
> -Lao Tzu, Te Tao Ching
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> <http://learn.perl.org/> <http://learn.perl.org/first-response>

> -----Original Message-----
> From: merlyn@stonehenge.com (Randal L. Schwartz)
> Sent: 26.08.06 16:19:44
> To: Robin Norwood <[EMAIL PROTECTED]>
> CC: beginners@perl.org
> Subject: Re: subroutine in LWP - in order to get 700 forum threads
>
> >>>>> "Robin" == Robin Norwood <[EMAIL PROTECTED]> writes:
>
> >> DO NOT ATTEMPT TO DO THIS
>
> Robin> Really? If I understood the OP correctly, all he wants to do is 'screen
> Robin> scrape' the (public) board in question. In other words, nothing
> Robin> significantly different from what Google does when it indexes. I don't
> Robin> really see an ethical (as opposed to legal - IANAL!) problem with that.
>
> Robin> Of course, I would first email the admin for permission, and make *sure*
> Robin> that such a bot is 'well behaved' - such as adding calls to sleep inside
> Robin> some of those loops. After he gets the data, he could do something
> Robin> unethical with it - like republish it. But just getting the data
> Robin> doesn't seem wrong to me.
>
> It's one thing to be Google, and index all the pages for public use.
>
> It's entirely another to do it for your own personal gain (knowledge
> or commerce, doesn't matter).
>
> If you can't see the difference, you need to retune your ethics.
>
> --
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
> <merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
> See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
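P.S. To anticipate the technical discussion I am hoping for: once the CSV dump exists, I imagine loading it into the local board roughly like this. This is only a sketch - the table and column names follow the phpBB2 schema from the phpbbdoctor link above (phpbb_posts, phpbb_posts_text), and the DSN, user and password are placeholders; the real tables have more NOT NULL columns that would need filling:

```perl
#!/usr/bin/perl
# Sketch: load posts.csv into a local phpBB2 MySQL database via DBI.
# DSN/user/password are placeholders; the column lists are trimmed to
# the essentials and must be completed against the real schema.
use strict;
use warnings;
use DBI;
use Text::CSV;

my $dbh = DBI->connect('dbi:mysql:database=phpbb_local', 'user', 'secret',
                       { RaiseError => 1, AutoCommit => 0 });

my $ins_post = $dbh->prepare(
    'INSERT INTO phpbb_posts (topic_id, poster_id, post_time) VALUES (?, ?, ?)');
my $ins_text = $dbh->prepare(
    'INSERT INTO phpbb_posts_text (post_id, post_subject, post_text) VALUES (?, ?, ?)');

my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag;
open my $in, '<', 'posts.csv' or die "posts.csv: $!";
$csv->getline($in);                            # skip the header row
while (my $row = $csv->getline($in)) {
    my ($topic_id, $username, $post_text) = @$row;
    $ins_post->execute($topic_id, 0, time);    # poster_id 0 = placeholder
    my $post_id = $dbh->last_insert_id(undef, undef, 'phpbb_posts', 'post_id');
    $ins_text->execute($post_id, '', $post_text);
}
close $in;
$dbh->commit;
```

Would that be a reasonable way to solve the storing issue, or is there a better approach?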