from:"Nick Burch"

Re: Jar packaging issue

2013-02-04 Thread Nick Burch

On Mon, 4 Feb 2013, karl.wri...@nokia.com wrote: We recently ran into something people might not be fully aware of. Specifically, because codec jars require META-INF/services files in order to be discovered, and each codec has the same files, it's not a straightforward operation to glom all the

RE: Cannot instantiate SPI class

2013-01-09 Thread Nick Burch

On Wed, 9 Jan 2013, Igal Sapir wrote: The syntax is CFML / CFScript (ColdFusion Script). Railo is an open source, high performance, ColdFusion server. http://getrailo.org/ I will re-download the Lucene jars and try again. I'll let you know what I find. It may be worth double-checking that

RE: SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-15 Thread Nick Burch

On Mon, 14 Dec 2009, Uwe Schindler wrote: Can you open an issue? This is a problem in SnowballAnalyzer missing to add the set ctor. Sure, I have done - http://issues.apache.org/jira/browse/LUCENE-2165 Nick

SnowballAnalyzer and StopAnalyzer.ENGLISH_STOP_WORDS_SET ?

2009-12-14 Thread Nick Burch

Hi All I'm upgrading my code from 2.4 to 2.9, and I've hit an issue with deprecations. My old code was: new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS); Looking at the JavaDocs, I'd expected that the new format would be: new SnowballAnalyzer(Version.LUCENE_CUR

Re: What does "out of order" mean?

2009-11-30 Thread Nick Burch

On Mon, Nov 30, 2009 at 12:22 PM, Stefan Trcek wrote: I'd do, but was not successful to get the svn repo some months ago. I have to claim the sys admin for any svn repo to open a door through the firewall. Gave up due to $ nmap -p3690 svn.apache.org PORT STATE SERVICE 3690/tcp fi

Re: Does Lucene Java 2.3.2 supports parsing of Microsoft office 2007 documents...

2008-06-28 Thread Nick Burch

On Fri, 27 Jun 2008, Hasan Diwan wrote: The new ODF-compatible Office 2007 is not supported by POI. Actually, it is, just not the version in trunk. You can download nightly builds of the ooxml branch from http://encore.torchbox.com/poi-svn-build/OOXML-Branch/ And there ought to be a

Re: Can POI provide reliable text extraction results for production search engine for Word, Excel and PowerPoint formats?

2008-05-12 Thread Nick Burch

On Mon, 12 May 2008, Lukas Vlcek wrote: I need to find a reliable way how to extract content out of Word, Excel and PowerPoint formats prior to indexing and I am not sure if POI is the best way to go. Can anybody share experience with POI and/or other [commercial] Java library for text extracti

Re: PowerPoint Extraction

2007-09-12 Thread Nick Burch

On Wed, 12 Sep 2007, Krista Leopold wrote: I realize that I am asking a "just barely Lucene" question, but I am certain someone on this list knows the answer to what I am on a quest for. I want to use the HSLF portion of apache's POI to do text extraction for my index, but I am having a really

Re: Exchange/PST/Mail parsing

2007-07-02 Thread Nick Burch

On Sun, 1 Jul 2007, Grant Ingersoll wrote: Anyone have any recommendations on a decent, open (doesn't have to be Apache license, but would prefer non-GPL if possible), extractor for MS Exchange and/or PST files? There has been an offer to contribute a PST parser to Apache POI. We're hoping th

Re: Indexing MS Powerpoint files with Lucene

2006-09-07 Thread Nick Burch

On Thu, 7 Sep 2006, Tomi NA wrote: On 9/7/06, Venkateshprasanna <[EMAIL PROTECTED]> wrote: Is there any filter available for extracting text from MS Powerpoint files and indexing them? The lucene website suggests the POI project, which, it seems does not support PPT files as of now. http://jak

Re: Indexing PPT classes hslf

2006-07-03 Thread Nick Burch

On Mon, 3 Jul 2006, mcarcelen wrote: I´ve used the classes "org.apache.poi.hslf.extractor.PowerPointExtractor" and "org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor" with lucene2.0 to extract text but when I try to use the other classes such as "org.apache.poi.hslf.HSLFSlideShow", "org.a

Re: How to index chm and ppt files?

2006-07-02 Thread Nick Burch

On Sun, 2 Jul 2006, wu fox wrote: > Has anyone indexed chm and ppt files? For powerpoint, use hslf from POI: http://jakarta.apache.org/poi/hslf/quick-guide.html > LIA doesn't have an answer :( Try the lucene wiki FAQ (esp section 3): http://wiki.apache.org/jakarta-lucene/LuceneFA

Re: Lucene indexing PPT

2006-06-30 Thread Nick Burch

On Fri, 30 Jun 2006, mcarcelen wrote: > I´m trying to build a index with PPT files. I have downloaded the api > POI, "poi.bin.3.0" and "poi.src.3.0", but I don´t know where may I have > to unzip them. I´d like to build the index by the command line, the same > way as I don't know about the lucene

Re: Word files & Build vs. Buy?

2006-02-14 Thread Nick Burch

On Thu, 9 Feb 2006, Christiaan Fluit wrote: Yes, that's exactly what I'm doing. Having this in POI would benefit me a lot though, as I hardly understand the POI basics to be honest (my fault, not POI's). OK, that's now in POI (you'll need a scratchpad build from late yesterday or today, see h

Re: Word files & Build vs. Buy?

2006-02-09 Thread Nick Burch

On Thu, 9 Feb 2006, Christiaan Fluit wrote: My experience is that the WordDocument class crashes on about 25% of the documents, i.e. it throws some sort of Exception. I've tested POI 2.5.1-final as well as the current code in CVS, but both produce this result. I even suspect the output to be 10

Re: http://www.textmining.org/ is "hacked"

2005-11-25 Thread Nick Burch

On Thu, 24 Nov 2005, Guilherme Barile wrote: The project seems somehow abandoned Ryan (the guy behind it) has gone to work for a firm that has the full word format documentation from Microsoft, so he's no longer able to contribute to open source projects working with word documents. Also if

Re: hslf ppt files

2005-08-23 Thread Nick Burch

On Tue, 23 Aug 2005, Derya Kasapoglu wrote: is there anybody who have the poi hslf classes to extract text from Power Point files. I know the classes are on the poi sites but they are not packaged in a jar! You'll need to either download it yourself from CVS and compile with ant, or grab a ni

How do lucene IDs work with MultiReader and MultiSearcher?

2005-06-21 Thread Nick Burch

Hi I've been scanning the JavaDocs, but I can't see anything on this. Let's say I have two indexes. Both have just been optimised, so each have lucene IDs 1-10 in them. Now, I build a MultiSearcher over these two indexes. I search, get back a Hits object, and from that a lucene ID From my

18 matches
