Re: Parsing HTML in clojure

2011-06-06 Thread Mukul
Hi, I have worked on a similar project before and found the following link useful: http://blog.prashanthellina.com/2009/07/27/extracting-relevant-text-from-html-pages/ Best regards, Mukul Joshi, Director & CEO, SpotOn Software Pvt. Ltd. _SpotOn : One stop spot for your mobile development

Re: Parsing HTML in clojure

2011-06-06 Thread Rasmus Svensson
2011/6/6 Base :
> hi all,
>
> I am working on an app that will parse web pages to do some NLP and statistics. I am able to parse the HTML using several different tools (enlive, HTML parser, etc.). However, I would like to discard all the rest of the junk in the web page that is not pertinent

Re: Parsing HTML in clojure

2011-06-06 Thread Base
Hi All - Thanks for your help! I found this last night and it looks pretty promising. It is apparently part of Apache Tika (which I had never heard of until now), which has a lot of interesting functionality: https://boilerpipe-web.appspot.com/ Thanks! On Jun 5, 11:14 pm, Bruce Williams wrot

Re: Parsing HTML in clojure

2011-06-06 Thread Bruce Williams
I looked at HtmlCleaner and it pretty much cleans up the 'syntax' of the HTML but does nothing with the 'semantics' (ads, etc.). Bruce Williams Concepts, like individuals, have their histories and are just as incapable of withstanding the ravages of time as are individuals. But in and through all this
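
The syntax-versus-semantics distinction above can be illustrated with a small sketch. It is not how HtmlCleaner, boilerpipe, or any tool named in this thread works internally; it is a generic link-density heuristic, written in Python with only the standard library, and every name in it (`BlockCollector`, `extract_main_text`, `min_words`, `max_link_density`) and both threshold values are assumptions made up for illustration. The idea is to split a page into text blocks, then drop blocks that are very short or consist mostly of link text, which is typical of menus and ads.

```python
from html.parser import HTMLParser

# Tags whose content is never visible page text.
SKIP_TAGS = {"script", "style", "noscript"}
# Tags treated as block boundaries (a simplifying assumption).
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "table", "ul", "ol", "br",
              "h1", "h2", "h3", "h4", "h5", "h6", "article", "section", "body"}


class BlockCollector(HTMLParser):
    """Collects (text, link_chars) pairs, one per text block."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._words = []
        self._link_chars = 0   # non-space characters seen inside <a>
        self._skip_depth = 0
        self._link_depth = 0

    def _flush(self):
        text = " ".join(self._words)
        if text:
            self.blocks.append((text, self._link_chars))
        self._words = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return
        words = data.split()
        self._words.extend(words)
        if self._link_depth:
            self._link_chars += sum(len(w) for w in words)

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_words=10, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        words = text.split()
        total_chars = sum(len(w) for w in words)
        if len(words) >= min_words and link_chars / total_chars <= max_link_density:
            kept.append(text)
    return "\n\n".join(kept)
```

Calling `extract_main_text(page_source)` returns the surviving blocks joined by blank lines. The thresholds would need tuning per site, and a dedicated library is likely to do much better on real pages; the sketch only shows why a pure syntax cleaner is not enough on its own.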

Re: Parsing HTML in clojure

2011-06-05 Thread Myriam Abramson
Me too, starting in October. I still need to get up to speed with Clojure, however.

On Sun, Jun 5, 2011 at 11:04 PM, Andreas Kostler <andreas.koestler.le...@gmail.com> wrote:
> There's a Java library called HtmlCleaner. You might wanna give that a shot.
> Btw, I'm working on quite a similar pro

Re: Parsing HTML in clojure

2011-06-05 Thread Andreas Kostler
There's a Java library called HtmlCleaner. You might wanna give that a shot. Btw, I'm working on quite a similar project, so if you like, email me and we can maybe join forces. Andreas

On 06/06/2011, at 11:01 AM, Base wrote:
> hi all,
>
> I am working on an app that will parse web pages to do so