jobst müller wrote:
> hello list, hello Rob
>
> many thanks for the reply.
>
> To avoid confusion, I will try a first reply to your address, not to
> the list. I am aware that I have to explain the issue, the problem,
> and the needs more clearly, and I will try to do so.
>
> Rob, please give me feedback on that; if you need more input then
> please let me know. I will try to do all I can!
>
> So I start here by describing the problems.
>
> I need to collect some of the data out of a site. Here is an example:
> http://www.bamaclubgp.org/forum/sitemap.php
> This is very similar to the site I am interested in.
>
> Why do I need to harvest and collect some data, you may ask: I am a
> researcher and I want to do some socio-ethnographic research (see the
> research field described at http://opensourec.mit.edu and
> http://opensource.mit.edu/online_papers.php). Therefore I need the
> data: I want to harvest it.
>
> Harvest is an integrated set of tools to gather, extract, organize,
> search, cache, and replicate relevant information. I need to gather
> information out of a phpBB2 board. The question is: can we tailor
> HTTrack to harvest and digest information in some different formats?
> I need to fetch data out of an online forum (a phpBB board) and store
> it locally in a MySQL database. Is this possible with Perl?
>
> Some first snippets to solve it were available here:
> http://forums.devshed.com/perl-programming-6/data-grabbing-and-mining-need-scripthelp-370550.html
> http://forums.devshed.com/perl-programming-6/minor-change-in-lwp-need-ideas-how-to-accomplish-388061.html
>
> You already reviewed them at a first glance. Now the problem is: I
> have to get the site above as an almost full and complete data set.
>
> In my view the problem is twofold; it has two major issues:
>
> 1. grabbing the data out of the site and then parsing it; and finally
> 2. storing the data in the new (local) database.
>
> Well, the question of restoring is not too hard if I can pull an
> almost full thread data set out of the site. The tables are shown
> here: http://www.phpbbdoctor.com/doc_columns.php?id=24
>
> If we are able to do the first job very well:
>
> 1. grabbing the data out of the site and then parsing it;
>
> then the second job would not be too hard. As a result I would have a
> large CSV data file, wouldn't I? The final question was: how can the
> job of restoring be done? Then I would have a full set of data. Well,
> I guess that it can be done with some help from the people of the
> http://www.phpBB.com team.
>
> The question is: how should I get the data with the robot (the
> USER-AGENT)? Does the agent give me back most of the data, so that I
> can use it for an investigation? By the way, the investigation needs
> to be done with some retrieval operations. Therefore I need to store
> the gathered data in a MySQL database.
>
> Well, that's it. I need to build up an almost 100 per cent copy of
> the original site, and I need to store it locally, here on my
> machine. I need to collect some of the data out of the site which I
> am interested in: http://www.karakas-online.de/forum/sitemap.php
>
> Once the data is gained with a script, I have to set up some Perl DBI
> code and try to store the data in a phpBB database.
>
> Rob, what do you think about it? Are we able to do this?
>
> Rob, perhaps with a good converter, or at least part of a converter,
> I can restore the whole CSV dump with ease. What do you think? If we
> do the first job, then I think the second part can be done as well.
>
> Rob, I look forward to hearing from you.
> best regards
>
> martin aka jobst
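[Editor's note: step 1 above (grabbing and parsing) can be sketched in Perl with LWP::UserAgent, which the devshed snippets already use. This is only a minimal sketch, not a finished harvester: the user-agent string is made up, the target URL is taken from the command line, and it assumes the phpBB2 link format viewtopic.php?t=NNN. As Rob notes below, check the site's robot rules and get the owners' permission before pointing it at a real board.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the unique topic ids from a page of phpBB2 HTML.
# phpBB2 topic links look like: viewtopic.php?t=123
sub extract_topic_ids {
    my ($html) = @_;
    my %seen;
    $seen{$1} = 1 while $html =~ /viewtopic\.php\?t=(\d+)/g;
    return sort { $a <=> $b } keys %seen;
}

# Fetch only when a URL is given on the command line, so the parsing
# logic above can be exercised without network access.
if (my $url = shift @ARGV) {
    # LWP::UserAgent is loaded here, not at the top, so the parser
    # still works on machines without LWP installed.
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(
        agent => 'research-harvester/0.1',  # identify the robot honestly
    );
    my $res = $ua->get($url);
    die 'Fetch failed: ' . $res->status_line . "\n"
        unless $res->is_success;
    print "$_\n" for extract_topic_ids($res->decoded_content);
}
```

Run as e.g. `perl harvest.pl http://www.example.org/forum/sitemap.php` to print the topic ids found on a sitemap page; each id can then be fetched as a viewtopic.php page and parsed the same way.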
Hi Martin

I'm sorry, but your question is still unapproachable. We are happy to
help you, but please summarize your problem in a brief message that
doesn't include external links.

Perhaps it would help to take your design further, and approach step 1
first? Have you been able to download the data you need from the site?
Have you made sure that the owners of the site are happy with what you
are doing? It is important that you get permission to copy other
people's data, and in particular there are site rules for robot access
that you must adhere to.

If you show us a code fragment that simply downloads the site you are
interested in, and also say what problems you have, then we can help.

As

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/
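[Editor's note: step 2 of martin's plan (turning harvested records into a CSV dump and restoring it into a local phpBB database) could look roughly like the sketch below. The CSV quoting is minimal RFC 4180 style; the DBI part is only outlined in comments, because while the table and column names (phpbb_topics, topic_id, topic_title) follow the schema page linked in the thread, the database name and credentials here are placeholders.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote one value as a CSV field (minimal RFC 4180 style quoting:
# wrap in double quotes when the value contains a comma, a quote,
# or a newline, doubling any embedded quotes).
sub csv_field {
    my ($v) = @_;
    return $v unless $v =~ /[",\n]/;
    $v =~ s/"/""/g;
    return qq{"$v"};
}

# Join one harvested record into a CSV line.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

# Example: print one topic record as CSV.
print csv_line(42, 'An example topic title');

# Restoring the dump into the local copy would then be a matter of
# DBI with placeholders, along these lines (connection details are
# placeholders for your own setup):
#
#   use DBI;
#   my $dbh = DBI->connect('dbi:mysql:database=forum_copy', $user, $pass,
#                          { RaiseError => 1 });
#   my $sth = $dbh->prepare(
#       'INSERT INTO phpbb_topics (topic_id, topic_title) VALUES (?, ?)');
#   $sth->execute($topic_id, $title);
```

The DBI placeholders (`?`) keep forum text with quotes or commas from breaking the SQL, which matters when restoring scraped post bodies.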