Hello all,
I have been trying out the MoreLikeThis and many other similarity types of
queries, but still run into problems with content not being matched up.
Let me give an example, as well as some questions that, hopefully, someone can
answer to help me refine my work.
Example:
1) Document A m
Hello,
I run Nutch and get a whole slew of articles and when I display search
results, there may be 5-6 articles that have different titles, and most of
the body text is the same, but I want to group them all under one result.
These are usually AP articles that all newspapers repurpose.
When usi
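One common way to group articles whose bodies are mostly the same is to compare overlapping word n-grams ("shingles") with Jaccard similarity. The following is only a minimal sketch in plain Java, not Nutch or Lucene API. The class name `NearDuplicateGrouper`, the shingle size, and the similarity threshold are all assumptions chosen for illustration, and the greedy grouping compares each body only against the first member of each group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Nutch or Lucene.
class NearDuplicateGrouper {

    // Break text into overlapping word n-grams ("shingles").
    public static Set<String> shingles(String text, int n) {
        String[] words = text.toLowerCase().split("\\W+");
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    // Jaccard similarity: |intersection| / |union|; 0 if either set is empty.
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    // Greedy grouping: a body joins the first group whose representative
    // (its first member) is similar enough, otherwise it starts a new group.
    // Returns groups of indexes into the input list.
    public static List<List<Integer>> group(List<String> bodies, int n, double threshold) {
        List<Set<String>> representatives = new ArrayList<>();
        List<List<Integer>> groups = new ArrayList<>();
        for (int i = 0; i < bodies.size(); i++) {
            Set<String> current = shingles(bodies.get(i), n);
            boolean placed = false;
            for (int g = 0; g < groups.size() && !placed; g++) {
                if (jaccard(current, representatives.get(g)) >= threshold) {
                    groups.get(g).add(i);
                    placed = true;
                }
            }
            if (!placed) {
                representatives.add(current);
                List<Integer> fresh = new ArrayList<>();
                fresh.add(i);
                groups.add(fresh);
            }
        }
        return groups;
    }
}
```

Each resulting group could then be shown as a single search result, with the other titles listed beneath it. Comparing bodies rather than titles matters here because the titles differ while the text is largely shared.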
Hey everyone.
I have been trying to get a certain kind of delete duplicates working, but
I need a little help.
Here is my problem.
After a web crawl I have many documents, and many different sites could
have documents with similar titles. I want to remove all of those
documents except for
their
searches and store that in a file. That way, I can at least show the user
how many results are available inside the interface without actually
having to query for it.
Hopefully in a month or so I will be able to give a link to the public
website I am working on.
Fun stuff.
Scott
sdeck
Me wrote:
>
> On 1/2/07, sdeck <[EMAIL PROTECTED]> wrote:
>>
>>
>> Thanks in advance for any insight on this one.
>>
>> I have a fairly large query to run, and it takes roughly 20-40 seconds to
>> complete the way that I have it.
>> here i
somewhere. Thanks for being
the sounding board.
Scott
Steven Rowe wrote:
>
> Hi Scott,
>
> sdeck wrote:
>> I guess, any ideas why I would run out of heap memory by combining all of
>> those boolean queries together and then running the query? What is
>> happening
prebuilt search indexes, which is no
fun.
I may just have to step through the Lucene code to see if it is creating
large arrays somewhere that it doesn't need to, or could just cache. Not
sure.
Will let you know more as I work on it tonight.
Sdeck
Steven Rowe wrote:
>
> Hi Scott,
>
combining all of
those boolean queries together and then running the query? What is happening
in the background that would make that occur? Is it storing something in
memory, like all of the common terms or something, to cause that to occur?
Sdeck
Steven Rowe wrote:
>
> Hi Scott,
>
a
memory error because of how many clauses there are (setting the clause
limit higher did not help).
Does this help refine the problem?
Thanks for your help!
Scott
Steven Rowe wrote:
>
> Hi Sdeck,
>
> sdeck wrote:
>> The query for collecting a specific actor is around 200-300 m
wish there was an easier way to aggregate all of those
documents together from all of those searches. After it is done, I cache
the results, but the initial hit is bad.
Any help would be much appreciated.
Sdeck
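To make the "aggregate, then cache" step concrete, here is a rough sketch in plain Java. It is not Lucene API: `GroupedResultCache` and the `search` function (which stands in for running one query and returning matching document ids) are hypothetical names used only for illustration. It merges the ids from several searches into one de-duplicated set and remembers that set per group key, so only the first request for a key runs the searches.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper, not part of Lucene.
class GroupedResultCache {

    // Merged document ids, keyed by group (for example, an actor name).
    private final Map<String, Set<Integer>> cache = new ConcurrentHashMap<>();

    // Assumption: runs a single query and returns the matching document ids.
    private final Function<String, List<Integer>> search;

    public GroupedResultCache(Function<String, List<Integer>> search) {
        this.search = search;
    }

    // Runs every query for the group once, unions the ids in first-seen
    // order, and serves later calls for the same key from the cache.
    public Set<Integer> resultsFor(String groupKey, List<String> queries) {
        return cache.computeIfAbsent(groupKey, key -> {
            Set<Integer> merged = new LinkedHashSet<>();
            for (String query : queries) {
                merged.addAll(search.apply(query));
            }
            return merged;
        });
    }
}
```

This does nothing for the cost of the first request. It only keeps that cost from being paid twice for the same group.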
--
View this message in context:
http://www.nabble.com/Speed-of-grouped-queries