Re: New tool: LuSql

2009-04-14 Thread Greg Shackles
This could be very useful. I see you include Lucene v2.3 in your code...does it work correctly with indexes created on v2.4 as well? - Greg On Mon, Apr 13, 2009 at 6:49 PM, Glen Newton wrote: > As the creator of LuSql > [http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql] > I

Re: Indexing and Searching Web Application

2009-01-19 Thread Greg Shackles
} catch (Exception e) { >throw new IllegalStateException(e); >} >stopWatch.stop(); > >LOGGER.debug("total time taken for seach: " + > stopWatch.getTotalTimeMillis() + " ms"); >

Re: Indexing and Searching Web Application

2009-01-19 Thread Greg Shackles
After you make the commit to the index, are you reloading the index in the searchers? - Greg On Mon, Jan 19, 2009 at 3:29 PM, Amin Mohammed-Coleman wrote: > Hi > > I have recently worked on developing an application which allows you to > upload a file (which is indexed so you can search later).

Re: Determining index term count

2009-01-07 Thread Greg Shackles
I'm not sure offhand how to write the code to do it, but I know when you open an index in Luke, that is one of the numbers it gives you. If you want to just get the number once that would be an easy way to do it. If you want the code for it, Luke is open source so you could see how they do it. (

Re: Help with installing Lucene

2009-01-07 Thread Greg Shackles
You don't really "install" it as it is not its own standalone application. You write the software that interfaces with the Lucene API. The src zip you mentioned has all the Lucene source, so you can use that if you want to compile the library yourself. If you want to use the precompiled binary of

Re: Extract the text that was indexed

2008-12-30 Thread Greg Shackles
That is my understanding of it too. Terms in the index will point to the position of the tokens they map to. Since one index term can point at any number of tokens, this isn't a sequence map, but just a search map. If you still have the text that was indexed you could run it through an analyzer

Re: Filtering accents

2008-12-30 Thread Greg Shackles
Just thought I'd comment since I had to do word processing before indexing in my application as well. Matt's method is pretty similar to what I did. I wrote a filter that transforms the tokens as they get indexed (and also use that for searching). Since I am indexing a block of words, rather than

Re: Re: Re: Payloads

2008-12-29 Thread Greg Shackles
That sounds pretty cool Karl, and I also dig your use of Motorhead as an example : ) I recently built an application where payloads were a lifesaver, but my usage of them is pretty basic. I am indexing pages of text, so I use payloads to store metadata about each word on the page - size, color, r

Re: Payload Question

2008-12-15 Thread Greg Shackles
Hey Todd, If you look for a thread I started a month or two ago, there was a pretty good discussion of payloads (it is where I initially learned about them). In that thread should also be an explanation of the solution I ended up using for implementing payloads, so maybe that would be helpful for

Re: How to search for "-2" in field?

2008-12-12 Thread Greg Shackles
I admit I only read through this thread quickly so maybe I missed something, but it sounds like you're trying different Analyzers for searching, when what you really need is to use the right analyzer during indexing. Generally you want to use the same analyzer for both indexing and searching so tha

Re: Looking for a way to customize how StandardAnalyzer handles punctuation

2008-12-11 Thread Greg Shackles
id, hacking up the grammar isn't as bad as you might think. > There are actually two examples of the "grammar" in Lucene, one is the > StdTokenizer and the other is the WikipediaTokenizer. They are similar, but > maybe by looking at two examples it might also help. >

Looking for a way to customize how StandardAnalyzer handles punctuation

2008-12-09 Thread Greg Shackles
Hey everyone, I'm running into a problem where some punctuation that I would actually want to keep gets thrown out because they don't get tokenized. By far the most common case for this is ampersand, but it does happen with others as well. My concern isn't even so much in that I need to be able t

Re: Lucene implementation/performance question

2008-11-27 Thread Greg Shackles
The queries I'm doing really aren't anything clever...just searching for phrases on pages of text, sometimes narrowing results by other words that must appear on the page, or words that cannot appear on the same page. I don't have experience with those span queries so I can't say much about them.

Re: Lucene implementation/performance question

2008-11-26 Thread Greg Shackles
nually use the PayloadSpanUtil for each document separately? > How did you solve the problem with phrase results? > Thanks in advance for your time, > Eran. > On Tue, Nov 25, 2008 at 10:30 PM, Greg Shackles <[EMAIL PROTECTED]> > wrote: > > > Just wanted to post a littl

Re: Lucene implementation/performance question

2008-11-25 Thread Greg Shackles
Just wanted to post a little follow-up here now that I've gotten through implementing the system using payloads. Execution times are phenomenal! Things that took over a minute to run in my old system take fractions of a second to run now. I would also like to thank Mark for being very responsive

Re: Lucene implementation/performance question

2008-11-20 Thread Greg Shackles
Thanks for the update, Mark. I guess that means I'll have to do the sorting myself - that shouldn't be too hard, but the annoying part would just be knowing where one result ends and the next begins since there's no guarantee that they'll always be the same. Let me know if you find any information

Re: Lucene implementation/performance question

2008-11-20 Thread Greg Shackles
On Wed, Nov 19, 2008 at 12:33 PM, Greg Shackles <[EMAIL PROTECTED]> wrote: > In the searching phase, I would run the search across all page documents, > and then for each of those pages, do a search with > PayloadSpanUtil.getPayloadsForQuery that made it so it only got payloads for

Re: Lucene implementation/performance question

2008-11-19 Thread Greg Shackles
I have a couple quick questions...it might just be because I haven't looked at this in a week now (got pulled away onto some other stuff that had to take priority). In the searching phase, I would run the search across all page documents, and then for each of those pages, do a search with PayloadS

Re: Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
> > Right, sounds like you have it spot on. That second * from 3 looks like a > possible tricky part. I agree that it will be the tricky part but I think as long as I'm careful with counting as I iterate through it should be ok (I probably just doomed myself by saying that...) Right...you'd do i

Re: Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
, be sure to isolate the query down > to the exact doc you want the payloads from (the Span scoring mode of the > highlighter actually puts the doc in a fast MemoryIndex which only holds one > doc, and uses an IndexReader from the MemoryIndex). > > > Greg Shackles wrote: > >>

Re: Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
loads to the terms that match). The > PayloadSpanUtil class is a bit experimental, but I'll fix anything you run > into with it. > > - Mark > > > Greg Shackles wrote: > >> Hi Erick, >> >> Thanks for the response, sorry that I was somewhat vague in the re

Re: Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
or you if we had > a better idea of what it is you're trying to accomplish... > > Best > Erick > > On Wed, Nov 12, 2008 at 10:47 AM, Greg Shackles <[EMAIL PROTECTED]> > wrote: > > > I hope this isn't a dumb question or anything, I'm fairly new to

Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
I hope this isn't a dumb question or anything, I'm fairly new to Lucene so I've been picking it up as I go pretty much. Without going into too much detail, I need to store pages of text, and for each word on each page, store detailed information about it. To do this, I have 2 indexes: 1) pages: