Related Article question

2007-07-06 Thread sdeck
Hello all, I have been trying out the MoreLikeThis and many other similarity types of queries, but still run into problems with content not being matched up. Let me give an example, as well as some questions that hopefully someone can answer, to help me refine my work. Example: 1) Document A m

Find related question

2007-03-09 Thread sdeck
Hello, I run Nutch and get a whole slew of articles. When I display search results, there may be 5-6 articles that have different titles but mostly the same body text, and I want to group them all under one result. These are usually AP articles that all newspapers repurpose. When usi

similarity and delete duplicates

2007-02-13 Thread sdeck
Hey everyone. I have been trying to get a certain kind of delete duplicates working, but I need a little help. Here is my problem. After a web crawl I have many documents, and many different sites could have documents with similar titles. I want to remove all of those documents except for

Re: Speed of grouped queries

2007-01-11 Thread sdeck
their searches and store that in a file. That way, I can at least show the user how many results are available inside the interface without actually having to query for it. Hopefully in a month or so I will be able to give a link to the public website I am working on. Fun stuff. Scott sdeck

Re: Speed of grouped queries

2007-01-10 Thread sdeck
Me wrote: > > On 1/2/07, sdeck <[EMAIL PROTECTED]> wrote: >> >> >> Thanks in advance for any insight on this one. >> >> I have a fairly large query to run, and it takes roughly 20-40 seconds to >> complete the way that I have it. >> here i

Re: Speed of grouped queries

2007-01-03 Thread sdeck
somewhere. Thanks for being the sounding board. Scott Steven Rowe wrote: > > Hi Scott, > > sdeck wrote: >> I guess, any ideas why I would run out of heap memory by combining all of >> those boolean queries together and then running the query? What is >> happening &

Re: Speed of grouped queries

2007-01-03 Thread sdeck
prebuilt search indexes, which is no fun. I may just have to step through the Lucene code to see if it is creating large arrays somewhere that it doesn't need to, or could just cache. Not sure. Will let you know more as I work on it tonight. Sdeck Steven Rowe wrote: > > Hi Scott, >

Re: Speed of grouped queries

2007-01-03 Thread sdeck
combining all of those boolean queries together and then running the query? What is happening in the background that would make that occur? Is it storing something in memory, like all of the common terms or something, to cause that to occur? Sdeck Steven Rowe wrote: > > Hi Scott, >

Re: Speed of grouped queries

2007-01-03 Thread sdeck
a memory error because of how many clauses there are (setting the clause higher did not help). Does this help refine the problem? Thanks for your help! Scott Steven Rowe wrote: > > Hi Sdeck, > > sdeck wrote: >> The query for collecting a specific actor is around 200-300 m

Speed of grouped queries

2007-01-02 Thread sdeck
wish there was an easier way to aggregate all of those documents together from all of those searches. After it is done, I cache the results, but the initial hit is bad. Any help would be much appreciated. Sdeck