Re: Lucene implementation/performance question

2008-11-27 Thread Greg Shackles
The queries I'm doing really aren't anything clever...just searching for phrases on pages of text, sometimes narrowing results by other words that must appear on the page, or words that cannot appear on the same page. I don't have experience with those span queries so I can't say much about them.
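
The query shape described above (a phrase, optionally narrowed by words that must or must not appear on the same page) can be sketched without Lucene at all. The Python below is only a toy model under stated assumptions: `page_matches` and its parameters are hypothetical names, tokenization is plain lowercase whitespace splitting, and no analyzer or index is involved. In Lucene itself, a query like this is typically expressed as a BooleanQuery with MUST and MUST_NOT clauses.

```python
def page_matches(text, phrase, must=(), must_not=()):
    """Return True if `text` contains `phrase` as consecutive words,
    contains every word in `must`, and contains no word in `must_not`.

    Toy model only: lowercase whitespace tokenization, no stemming.
    """
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    n = len(wanted)

    # Phrase check: the words must appear consecutively and in order.
    has_phrase = n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )

    present = set(tokens)
    has_required = all(w.lower() in present for w in must)
    has_forbidden = any(w.lower() in present for w in must_not)
    return has_phrase and has_required and not has_forbidden


page = "The quick brown fox jumps"
print(page_matches(page, "quick brown", must=["fox"], must_not=["dog"]))
```

A real index avoids scanning every page this way; the sketch only pins down what "match" means for this kind of query.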

Re: Lucene implementation/performance question

2008-11-27 Thread Eran Sevi
Hi Greg, Thanks for the quick and detailed answer. What kind of queries do you run? Is it going to work for SpanNearQueries/SpanNotQueries as well? Do you also get the word itself at each position? It would be great if I could search on the content of each payload as well, but since the payload cont

Re: Lucene implementation/performance question

2008-11-26 Thread Greg Shackles
Sure, I'm happy to give some insight into this. My index itself has a few fields - one that uniquely identifies the page, one that stores all the text on the page, and then some others to store characteristics. At indexing time, the text field for each document is manually created by concatenatin

Re: Lucene implementation/performance question

2008-11-26 Thread Eran Sevi
Hi, Can you please shed some light on what your final architecture looks like? Do you manually use the PayloadSpanUtil for each document separately? How did you solve the problem with phrase results? Thanks in advance for your time, Eran. On Tue, Nov 25, 2008 at 10:30 PM, Greg Shackles <[EMAIL PROTE

Re: Lucene implementation/performance question

2008-11-25 Thread Greg Shackles
Just wanted to post a little follow-up here now that I've gotten through implementing the system using payloads. Execution times are phenomenal! Things that took over a minute to run in my old system take fractions of a second to run now. I would also like to thank Mark for being very responsive

Re: Lucene implementation/performance question

2008-11-20 Thread Greg Shackles
Thanks for the update, Mark. I guess that means I'll have to do the sorting myself - that shouldn't be too hard, but the annoying part would just be knowing where one result ends and the next begins since there's no guarantee that they'll always be the same. Let me know if you find any information

Re: Lucene implementation/performance question

2008-11-20 Thread Mark Miller
Yeah, discussion came up on order and I believe we punted - it's up to you to track order and sort at the moment. I think that was to prevent those that didn't need it from paying the sort cost, but I have to go find that discussion again (maybe it's in the issue?) I'll look at the whole idea agai

Re: Lucene implementation/performance question

2008-11-20 Thread Greg Shackles
On Wed, Nov 19, 2008 at 12:33 PM, Greg Shackles <[EMAIL PROTECTED]> wrote: > In the searching phase, I would run the search across all page documents, > and then for each of those pages, do a search with > PayloadSpanUtil.getPayloadsForQuery that made it so it only got payloads for > each page at

Re: Lucene implementation/performance question

2008-11-19 Thread Greg Shackles
I have a couple quick questions...it might just be because I haven't looked at this in a week now (got pulled away onto some other stuff that had to take priority). In the searching phase, I would run the search across all page documents, and then for each of those pages, do a search with PayloadS

Re: Lucene implementation/performance question

2008-11-13 Thread Eran Sevi
Hi, I have the same need - to obtain "attributes" for terms stored in some field. I also need all the results and can't take just the first few docs. I'm using an older version of Lucene and the method I'm using right now is this: 1. Store the words as usual in some field. 2. Store the attributeso

Re: Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
> > Right, sounds like you have it spot on. That second * from 3 looks like a > possible tricky part. I agree that it will be the tricky part but I think as long as I'm careful with counting as I iterate through it should be ok (I probably just doomed myself by saying that...) Right...you'd do i

Re: Lucene implementation/performance question

2008-11-12 Thread Mark Miller
Greg Shackles wrote: Thanks! This all actually sounds promising, I just want to make sure I'm thinking about this correctly. Does this make sense? Indexing process: 1) Get list of all words for a page and their attributes, stored in some sort of data structure 2) Concatenate the text from tho

Re: Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
Thanks! This all actually sounds promising, I just want to make sure I'm thinking about this correctly. Does this make sense? Indexing process: 1) Get list of all words for a page and their attributes, stored in some sort of data structure 2) Concatenate the text from those words (space separat
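
The first two indexing steps listed above can be sketched as follows. This is a minimal illustration, not the code from the thread: the `Word` type, the `build_page` helper, and the byte-string attribute blobs are all hypothetical, and the sketch assumes no word contains a space, so that splitting the concatenated text recovers the original token positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    attrs: bytes  # hypothetical opaque attribute blob for this word


def build_page(words):
    """Concatenate word texts (space separated) and keep a parallel list
    of attribute blobs, one per token position."""
    text = " ".join(w.text for w in words)
    payloads = [w.attrs for w in words]
    return text, payloads


words = [Word("hello", b"\x01"), Word("big", b"\x02"), Word("world", b"\x03")]
text, payloads = build_page(words)

# Token i of the page text lines up with payloads[i].
tokens = text.split(" ")
print(text, payloads[tokens.index("world")])
```

The point of the parallel list is that a match at token position i can be mapped straight back to the attributes of the word at that position.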

Re: Lucene implementation/performance question

2008-11-12 Thread Mark Miller
Here is a great PowerPoint on payloads from Michael Busch: www.us.apachecon.com/us2007/downloads/AdvancedIndexing*Lucene*.ppt. Essentially, you can store metadata at each term position, so it's an excellent place to store attributes of the term - they are very fast to load, efficient, etc. Yo

Re: Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
Hey Mark, This sounds very interesting. Is there any documentation or examples I could see? I did a quick search but didn't really find much. It might just be that I don't know how payloads work in Lucene, but I'm not sure how I would see this actually doing what I need. My reasoning is this..

Re: Lucene implementation/performance question

2008-11-12 Thread Mark Miller
If you're new to Lucene, this might be a little much (and maybe I don't fully understand the problem), but you might try: Add the attributes to the words in a payload with a PayloadAnalyzer. Do searching as normal. Use the new PayloadSpanUtil class to get the payloads for the matching words. (T
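
As a rough picture of what "get the payloads for the matching words" means, here is a toy model in Python. It does not use Lucene and does not reproduce the PayloadSpanUtil API: `find_phrase` and `payloads_for_phrase` are hypothetical helpers, and the index is just a token list with a parallel list of payload blobs, one per position.

```python
def find_phrase(tokens, phrase):
    """Return the start position of every occurrence of `phrase`
    (a list of words) in `tokens`."""
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def payloads_for_phrase(tokens, payloads, phrase):
    """Return one list of payloads per phrase match, in document order."""
    n = len(phrase)
    return [payloads[i:i + n] for i in find_phrase(tokens, phrase)]


tokens = ["the", "quick", "fox", "the", "quick", "dog"]
payloads = [b"a", b"b", b"c", b"d", b"e", b"f"]

for match in payloads_for_phrase(tokens, payloads, ["the", "quick"]):
    print(match)
```

Grouping the payloads per match, as this sketch does, keeps the boundary between one result and the next explicit in the return value.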

Re: Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
Hi Erick, Thanks for the response, sorry that I was somewhat vague in the reasoning for my implementation in the first post. I should have mentioned that the word details are not details of the Lucene document, but are attributes about the word that I am storing. Some examples are position on th

Re: Lucene implementation/performance question

2008-11-12 Thread Erick Erickson
If I may suggest, could you expand upon what you're trying to accomplish? Why do you care about the detailed information about each word? The reason I'm suggesting this is "the XY problem". That is, people often ask for details about a specific approach when what they really need is a different app

Lucene implementation/performance question

2008-11-12 Thread Greg Shackles
I hope this isn't a dumb question or anything, I'm fairly new to Lucene so I've been picking it up as I go pretty much. Without going into too much detail, I need to store pages of text, and for each word on each page, store detailed information about it. To do this, I have 2 indexes: 1) pages: