Hi Vaijanath,
That would be great, thank you. And will this implement the following formula?
RRFscore(d ∈ D) = Σ_{r ∈ R} 1 / (k + r(d))
where r(d) is the rank of d in ranking r, i.e. 1, 2, ..., n
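For illustration only, the formula can be sketched in plain Java as below. This is not the sort function discussed in this thread; the class name RrfSketch, the method rrfScores and the choice to let documents missing from a ranking contribute nothing are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class RrfSketch {
    /**
     * Reciprocal rank fusion over several ranked lists of document ids.
     * Each list is ordered best-first, so r(d) is the 1-based position of d.
     * Assumption: a document absent from a ranking adds nothing to its score.
     */
    public static Map<String, Double> rrfScores(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // r(d) = 1, 2, ..., n
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<List<String>> rankings = List.of(
                List.of("a", "b", "c"),
                List.of("b", "a", "d"));
        // Prints the fused score per document id (map order is unspecified).
        System.out.println(rrfScores(rankings, 60));
    }
}
```

Sorting the map entries by descending score would then give the fused ranking; k is a free constant here and 60 is only a placeholder value.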
Thanks again and kind regards,
Dan
On Sat, 1 Apr 2023 at 17:41, Vaijanath Rao wrote:
> Hi Dan,
>
> This was a long
Hi Vaijanath,
Was it a custom rank function, or a custom RRF function for the sorting? Are
you able to provide an example?
Many thanks,
Dan
On Fri, 31 Mar 2023 at 17:42, Vaijanath Rao wrote:
> We implemented it as a sort function instead of Q-parser. It was easier and
> speed wis
would this need to be a custom QParser ?
Cheers,
Dan
Aha! My version of Lucene was out of date. That should work perfectly.
Thanks,
-Dan
Original message
From: Michael McCandless
Date:08/31/2015 12:57 PM (GMT-08:00)
To: Lucene Users , dsm...@pivotal.io
Cc:
Subject: Re: Indexing a binary field
StringField now also
. Is there a better way?
-Dan
I have tried multiple times to unsubscribe, and it never works. Could you
unsubscribe me?
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Monday, August 27, 2012 11:12 AM
To: Lucene Users
Cc: java-user-ow...@lucene.apache.org
Subject: Seeking more modera
rs
Is this possible with a custom fragmenter?
Or does anyone know of any contrib fragmenter that might do this?
Many thanks
Dan
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail:
Also, I believe Solr calls addDocument with payloads turned off. I'm
not sure why the size is much larger.
Cheers,
Dan
On Tue, Jan 18, 2011 at 12:41 PM, Shai Erera wrote:
> If I understand correctly, you compare the size of the .frq when
> WhitespaceTokenizer is used, vs the CJK ones?
" for the text
fields I'd have thought this should be : "If omitTf were true it would
be this sequence of VInts instead:"
(http://lucene.apache.org/java/2_9_1/fileformats.html#Frequencies)
Can anyone suggest how I can reduce the size of this file?
2)^5.0 +(body:term1 body:term2)^10.0;
Which is something different ... does anyone know how I can get it as the
former?
Cheers,
Dan
ek and a multi-index searcher that is aware of the date. When you get
to the end of "this" week, you would delete "last" week's index and create a
new "next" week index.
Regards
Dan
- Original Message -
From: Jeff Zhang
To: java-user@lucene.apache
each index
- when a search comes in, allocate a Searcher from each index to the search.
- perform the search in parallel across all indices.
- merge the results in your own code using an efficient merging algorithm.
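The merge step in the last bullet could be sketched as follows. This is only an illustration in plain Java, not Lucene API: the Hit class, MergeSketch and mergeTopN are hypothetical names, each per-index list is assumed to be already sorted by descending score, and scores are assumed to be comparable across indices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public final class MergeSketch {
    /** Hypothetical hit type: a document id plus its score. */
    public static final class Hit {
        public final String docId;
        public final float score;

        public Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** K-way merge of per-index hit lists, each sorted by descending score. */
    public static List<Hit> mergeTopN(List<List<Hit>> perIndexHits, int n) {
        // Queue entries are {listIndex, position}; the highest current score comes first.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> Float.compare(
                perIndexHits.get(b[0]).get(b[1]).score,
                perIndexHits.get(a[0]).get(a[1]).score));
        for (int i = 0; i < perIndexHits.size(); i++) {
            if (!perIndexHits.get(i).isEmpty()) {
                queue.add(new int[] {i, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !queue.isEmpty()) {
            int[] head = queue.poll();
            List<Hit> source = perIndexHits.get(head[0]);
            merged.add(source.get(head[1]));
            if (head[1] + 1 < source.size()) {
                // Advance the cursor of the list we just consumed from.
                queue.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }
}
```

The queue never holds more than one entry per index, so collecting the top n hits costs roughly n log(number of indices) comparisons instead of sorting all hits together.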
Regards,
Dan
-Original Message-
From: Shelly_Singh [mailto:shel
I've been running some tests with Lucene 2.9.1 on a Linux box with a
Sun JVM and getting a sun.nio.ch.NativeThreadSet assertion error (see
below for stacktrace).
Does anyone know what this error means? Any suggestions for a workaround?
We used the following to open the index.
FSDirectory
iments. An index of only 200,000 documents will be
relatively small and is a great candidate for local disks.
Regards,
Dan
- Original Message -
From: Uwe Schindler
To: java-user@lucene.apache.org
Sent: Sat Sep 26 08:41:13 2009
Subject: RE: Storing a Lucene Index on a SAN Storage:
?
If this is a large index, have you optimized it recently?
Are there any searches going on while you are indexing?
Regards,
Dan
-Original Message-
From: paul_murd...@emainc.com [mailto:paul_murd...@emainc.com]
Sent: Friday, September 11, 2009 7:57 AM
To: java-user@lucene.apache.org
Hi Jamie,
I would appreciate it if you could provide details on the hardware/OS you are
running this system on, what kind of search response time you are getting,
and how you add email data to your index.
Thanks,
Dan
-Original Message-
From: Jamie [mailto:ja
bad.
Regards,
Dan
- Original Message -
From: Jason Rutherglen
To: java-user@lucene.apache.org
Sent: Fri May 15 16:48:54 2009
Subject: Re: is there a way to control when merges happen?
Hi Dan,
You are looking to throttle the merging? I'd recommend se
Mike,
Thanks for the reply.
A follow-up question:
How can I tell the big merges from the small ones?
Regards,
Dan
- Original Message -
From: Michael McCandless
To: java-user@lucene.apache.org
Sent: Fri May 15 16:50:27 2009
Subject: Re: is there a way to control when merges happen
Thanks for the feedback, Chris.
Can you (or someone else on the list) tell me about the IndexMerge tool?
Thanks
Dan
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Thursday, April 09, 2009 6:46 PM
To: java-user@lucene.apache.org
Subject: Re: Help to
So I have a requirement where I have a directory filled with xml files.
I wrote a parser to parse these files, and index all of the xml
attributes and properties into documents. An example of one of these
documents is below. I'm parsing sentences into words, and tagging the
sentences based on certa
priority)?
For warm-ups, since we sort on a couple of date fields within the document (in
addition to the straight relevance sort), I'm reading your suggestion that it
is important to issue warm up queries that date sort as well?
Thanks again for your time and effort.
Regards,
Dan
-O
.
Regards,
Dan
Dan O'Connor
SVP, Engineering
Acquire Media<http://www.acquiremedia.com/>
77 South Bedford Street, Suite
350<http://maps.google.com/maps?f=q&hl=en&geocode=&q=77+S+Bedford+St,+Burlington,+MA+01803&sll=37.0625,-95.677068&sspn=32.472848,8
complete
the optimization. Is there a reason why a 22 day index would be 10x the size of
an 8 day index when the document indexing rate is fairly constant? Also, is
there a way to shrink the index without regenerating it?
Any help/pointers would be greatly appreciated.
Thanks and Regards,
Dan
According to "Lucene in Action", running optimization is
not recommended in my case. So would you please advise?
Thanks,
Dan
Michael McCandless-2 wrote:
>
>
> Either optimize() or expungeDeletes() will reclaim the disk space used
> by deleted documents.
>
>
() after indexWriter.deleteDocuments() does not free the
disk space either. So do they only mark the documents as deleted?
If so, which method really deletes the documents (from the disk)?
Thanks,
Dan
--
View this message in context:
http://www.nabble.com/How-did-Lucene-cle
1.6.0_03?
Thanks,
Dan
Michael McCandless-2 wrote:
>
>
> Yes, downgrading is really the only option now, unfortunately.
>
> That, and voting for this bug at Sun (note you can vote for the same
> bug 3 times), which seems to be the root cause of the corruption:
>
>
I can tell, you really
> can't be sure so downgrading is the safest course of action.
>
> Mike
>
> dan at gmail wrote:
>
>>
>> Hello,
>>
>> I don't have a good understanding of what options for avoid this
>> corrupted
>> in
http://issues.apache.org/jira/browse/LUCENE-1282
Do I have any other options?
Thanks,
Dan
--
View this message in context:
http://www.nabble.com/LUCENE-1282-tp18224180p18224180.html
Sent from the Lucene - Java Users mailing list archive at Nabbl
After upgrading to version 2.3.x from 2.2.0, we started experiencing
issues with our index searches. Some searches produced false positives,
while others produced no hits for terms known to be in specific documents
that were digested. After setting up tests that created indexes
containing single
Erick Erickson wrote:
Are you using NumberTools both at index and query time? Because
this works exactly as I expect
Yes, the code I posted showed the usage of NumberTools -- here it is
from my 2nd reply:
Taking your advice I'm now indexing using:
document.add( new Field(RateUtils.SF_F
" ... but that meant that the searching
didn't pick up on that field _at all_.
Surely "find me results where numeric field x is higher than y" can't be
an uncommon request? I can think of many areas where you want to do that
(age filtering for exampl
her than numerically.
Oddly enough, if I sort on that field ... it works as I expect.
Am I missing something?
--
Dan Hardiker
PS: I've been googling for well over an hour, if I'm not searching with
the right terms - please advise me! I tried to find a way to search the
a
Hi,
I have a custom Query class that provides a long list of lucene docIds (not for
filtering purposes), which is one clause in a standard BooleanQuery (which also
contains TermQuery instances).
I have a custom Scorer that goes along with the custom Query class.
What (if any) document orderi
a.
> Take a look at some of the Luke code. That tries to reconstruct
> document fields from the index, but it's lossy. So it depends
> upon what kind of fidelity you need.
>
> Erick
>
> On 9/12/07, Dan Luria <[EMAIL PROTECTED]> wrote:
> >
> > If I have
If I have a tokenized unstored field in a document, and I want to
transfer the document to another index, is it possible to carry over the
tokenization along with the terms?
, and questions are most welcome!
P.S. A big thanks to the Lucene contributors for their hard work in
building a great piece of software.
--
Dan Callaghan <[EMAIL PROTECTED]>
Amazon.com's Darwin team is looking for exceptional software engineers
to develop algorithms and build systems to automatically detect
duplicate products for sale in the Amazon.com catalog.
Merchants on Amazon.com provide information about the products they want
to sell. Amazon attempts to match
I only read this list occasionally, but it is a very active and useful list;
thanks to all.
Very often, though, a developer searching for how to work with Lucene finds
the same code for the same problems.
I think it would be useful to create something like "Best practices with Lucene"
or something similar.
My
Is there a way to tell which format an index is in? The file
formats documentation
http://lucene.apache.org/java/docs/fileformats.html#Segments%20File indicates
that the segments file stores a Format value that can be used to determine the
type.
Format is -1 as of Lucene 1.4 and
Hi Luceners, I don't have a question today.
I simply want to know how you test the efficiency of your systems that
run on top of Lucene.
I think some advice on this point would be very interesting for all newbies
(like me) in the Lucene world.
Thanks in advance.
It turns out, this is somehow related to an interaction between SWT and
the java Decompresser class - certainly not lucene related.
FYI:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=169484
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.arm
how do I track it down?
Any suggestions welcome,
Thanks,
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
ux machine for the indexing process. I have been thinking that it could be
the temporary files of something, maybe PDFBox?
Could you help me, please?
Greetings
It would be helpful if you knew what was filling your harddisk. What
files are filling the 120 GB? Where are they loca
more control at query time whether or not you wanted to match on
abbreviations.
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
that leaves me wondering - has anyone written (or
modified) a QueryParser that can automatically use this?
Or do I have to manually create the query whenever I encounter a quoted
phrase with a wildcard? My hunch is that it's not really easy, otherwise
it would already have been done...
the norm?
Now that omitNorms is part of the core, the impact of allowing a 2 byte (or
even 4 byte norm) is not nearly as severe on memory. Any suggestions for how
to create a multi-byte norm as an option, but enable the same code to
read existing 1-byte norms?
Dan
own...
In our experience, JVM crashes are usually caused by a bad JVM.
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
--
can you write it
out. More spinning disks will help more than extra RAM.
At least in my experience.
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informa
cause this problem doesn't appear if I close my
application and restart it without doing a rebuild.
2006/6/8, Daniel Naber <[EMAIL PROTECTED]>:
On Donnerstag 08 Juni 2006 19:59, Dan Wiggin wrote:
> java.lang.ArrayIndexOutOfBoundsException: -1
You get this whe
Hello everybody, I have a new problem that I want to share with you :D
The problem occurs when searching with a MultiSearcher.
Normally this MultiSearcher works well. When I start my Tomcat and my two
indexes are empty, and I do a search with these indexes, I get no exception
and obviously no hits, but afte
caching before
it writes to the Directory.
I do batches of documents to FSDirectories - and then merge all of the
FSDirectories into a new master index at the end - so I never have to
optimize during the indexing process.
Dan
--
Daniel Armbrust
Biomedical
I read about concurrency in Lucene but I'm not sure I understand it well.
I can't do delete and add operations simultaneously. If I have a writer
that I'm using to add new docs, I can't delete anything in the Lucene index
until I close my open writer. Or should I perhaps not close my writer?
Everytime tha
StackTrace
java.io.IOException: read past EOF
at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:57)
at org.apache.l
Thanks for the reply, but I don't know how to make this change in
ISOLatin1AccentFilter.
Can you give me some advice on this?
2006/5/25, Chris Hostetter <[EMAIL PROTECTED]>:
I think I'm missing something here. the whole point of the
ISOLatin1AccentFilter is to replace accented characters wit
My own solution, until I have a better one, is to use FuzzyQuery for every
term in the phrase.
For example "My work is the worst" ->> My~ work~ is~ the~ worst~
What do you think about this ugly solution? I don't have any more
ideas.
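As a rough sketch of that workaround, the rewrite from a phrase to a fuzzy query string could look like this in plain Java. The class and method names are made up for the example, and it assumes the resulting string is then handed to a query parser that treats a trailing ~ as the fuzzy operator; it does not escape any other query syntax characters.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public final class FuzzyPhraseSketch {
    /** Appends "~" to every whitespace-separated term of the phrase. */
    public static String toFuzzyQuery(String phrase) {
        return Arrays.stream(phrase.trim().split("\\s+"))
                .filter(term -> !term.isEmpty()) // an empty phrase yields no terms
                .map(term -> term + "~")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toFuzzyQuery("My work is the worst"));
    }
}
```

Note that this gives up the phrase semantics entirely: the terms are matched independently and in any order, which is part of why the approach feels like a workaround.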
2006/5/24, Dan Wiggin <[EMAIL PR
Rahil wrote:
No, I have around 50GB free on my external disk on which I'm creating the
indexes. So hopefully that shouldn't be the problem.
How is the external disk mounted? Samba from unix? NTFS? I wonder if
there isn't something strange going on here.
Have you tried building the index on a
I need some functionality and I don't know how to implement it.
The problem is special Latin characters like à, ä, ç or ñ in the
text.
Right now I use the ISO Latin filter, but the problem comes when I want to obtain the most
used terms. These terms are stored without ` ´ ^ or any other "character
attribute".
For ex
in" java.io.IOException: Access is denied
To me, that really seems like you have an issue with the location that
you are writing the index to. I would make sure you have full write
permissions to the location, and make sure there aren't some old /
invalid files sitting in there.
Da
If I work with groups, what's the best option? Use a separate Lucene
index for every group, or is a single index better?
For example:
I'm working with groups of people, and the add or delete actions happen at the
group level, but the search is across all groups.
What do you think is the best implementa
Excuse me,
ha ha, I looked it up and "petition" isn't used that way in English; it is a silly
translation of a Spanish word that means "request".
EXCUSE ME.
Hi Luceners, I'm reading "Lucene in Action" and trying out the examples.
I have some questions:
If I have to index and I'm using MultiSearcher to search my index, what do I
have to do for every search?
Do I need a new MultiSearcher for every search petition, or can I keep my
MultiSearcher object f
et to work)
Amazing.
Thanks again,
Dan
pper to take advantage
of the lucene updates :)
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
://issues.apache.org/jira/browse/LUCENE-540
A JUnit test case is attached (although it may not be in the proper format
for Lucene, I think it's pretty straightforward)
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http
Yonik Seeley wrote:
On 4/5/06, Dan Armbrust <[EMAIL PROTECTED]> wrote:
I'll continue to try to generate a test case that gets the docs out of
order... but if someone in the know could answer authoritatively whether
I browsed the code for IndexWriter.addIndexes(Dir[]), and it lo
ether
or not lucene is supposed to maintain document order when you merge
multiple indexes together, that would be great.
Thanks,
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)m
ocument
that I added. This continues until Document 910, where it suddenly
jumps to the 99720th document.
Is this a bug, or am I misusing something in the API?
Thanks,
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(a
itten in Java, and
using the same jvm with the same -Xmx settings as your lucene program?
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.may
aster for querying - but that was quite a while ago.
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
ay not be quite as clean, but I doubt that there
will be any performance impact.
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
plicate copy of my index.
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/
er work on
windows? As far as I can tell, nothing should be holding a lock on
those files. Yet, even when I shut down the only jvm that is using
these indexes, and then open a new one, and perform a search, they don't
go away.
Thanks,
Dan
--
Daniel Armb
keep it whenever the @ is not present. Does anyone
know how I would write this query?
Again, I apologize if I am sending
this to the wrong place and would be thankful for any help I can
get.
Dan Katz
Cymfony
Koji Sekiguchi wrote:
Hi Dan,
I've experienced same error you are facing. Check out:
http://www.gossamer-threads.com/lists/lucene/java-user/28554?search_string=TermVectorOffsetInfo;#28554
Hope this helps.
Koji
Thanks! That helped! I put all this in the FAQ entry:
3? If
it compiles for you, where do you get TermVectorOffsetInfo, and is that
in 1.4.3?
Thanks in advance.
Dan
Chris Hostetter wrote:
: Thanks for this tip! This should be in the FAQ. Is there a way to get it
: in there? I can't edit the wiki, I think.
1) People who create a Wiki a
Thanks for this tip! This should be in the FAQ. Is there a way to get it
in there? I can't edit the wiki, I think.
Dan
Otis Gospodnetic wrote:
Look for the Highlighter in contrib/ to get this effect:
http://www.lucenebook.com/search?query=highlighter+fragment
Otis
- Original Me
'd have to reimplement the query matcher if I wanted to
know which things matched.
Thanks in advance for any help.
Dan
Are there specific queries that cause the out of memory problem? Or will any
query do it?
How large is the index?
MultiSearcher allows you to search over multiple indexes, and is well
supported throughout the API. How you split your indexes depends on what
you want to achieve. There are many
The latest binary "stable" release is 1.4.3. Though not officially
released, Lucene 1.9 is available from the source code repository, and,
IMHO, is more than ready for day to day use. You will need to check the
code out with Subversion or CVS via the Apache code repository and build it
yourself.
That is certainly the behaviour I would expect. The "+" means the term or
phrase is required - you are requiring words that are not stored in your
index.
Why not remove the "+"? Alternately you could run the search, and if no
matches are found, run it again without the second argument. I've fo
that
Highlighter class can make use of them if present.
Dan
-Original Message-
From: Jeff Rodenburg [mailto:[EMAIL PROTECTED]
Sent: Monday, December 12, 2005 9:08 PM
To: java-user@lucene.apache.org
Subject: Re: ApacheCon next week
Well done, Grant. Very informative.
Question on Term
e, I
would greatly appreciate it.
Thank you in advance!
Dan
for one reason or another, you will end up having the lucene
index outside the web-app directory - i think your best bet would be to go
ahead and start out that way.
On 12/10/05, Raul Raja Martinez <[EMAIL PROTECTED]> wrote:
>
> first thanks for your response Dan,
>
> It I
First, thank you Chris, Yonik, and Dan for your ideas as to what might be
causing this problem.
I tried moving things around so that the IndexReader is still open when it
calls TermFreqVector.getTerms()/TermFreqVector.getTermFrequencies(). It
didn't seem to make any difference.
I
If this is a small index and it won't change after install (you are just
using it to search, not to index), place it in a sub-directory of WEB-INF.
If it is a larger index (something you don't want to copy frequently), or it
will change after install, then you shouldn't keep it inside your web
app
't want in the new index?
Thanks,
Dan
um.com/gcviewer-vmflags.html
http://java.sun.com/docs/hotspot/gc/
http://www.unixville.com/~moazam/stories/2004/05/17/maxpermsizeAndHowItRelatesToTheOverallHeap.html
hth
Dan
-Original Message-
From: Dan Gould [mailto:[EMAIL PROTECTED]
Sent: 09 December 2005 01:49
To: java-user@lucene.apache.org
Sub
ly up
total occurs)
String[] termstrings = tv.getTerms();
int[] freqs = tv.getTermFrequencies();
Thank you for your help,
Dan
There IS a difference between something being marked as deleted and
something actually being deleted, as documents marked as deleted can be
undeleted.
The document is marked as deleted even before the reader is closed.
There is an example in "Lucene in Action". /dan
-Original Message-----
I also had to close the Directory, but that may not be
true.
-Original Message-
From: Dan Liu [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 08, 2005 3:09 PM
To: java-user@lucene.apache.org
Subject: RE: delete and optimize
The document is marked as "deleted" when rea
nts, thus reduces the size
(in bytes) of the index. Until you optimize, documents stay in the index,
only marked as deleted.
-----Original Message-
From: Dan Liu [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 08, 2005 2:00 PM
To: java-user@lucene.apache.org
Subject: RE: delete and optimize
ration
HTH
Aviran
http://www.aviransplace.com
-Original Message-
From: Dan Liu [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 08, 2005 11:24 AM
To: java-user@lucene.apache.org
Subject: delete and optimize
Hi,
What is the difference between following approaches?
Ap
affects the search speed, will I have any
performance difference between these approaches while searching? (note:
the documents are actually deleted when IndexReader is closed)
Thanks /dan
In the sandbox at
http://lucene.apache.org/java/docs/lucene-sandbox/
There is a link to the WordNet repository:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/WordNet
it should be:
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/wordnet
Where "wordnet" is not capitalized.
J
pwords, by definition meant that tokens were not
contiguous? Is this still true if the query uses the same analyzer and
filters out the same stopwords?
Thanks,
Dan
t the "current" code, then you should compile the code that is
in subversion right now.
Dan
--
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://infor
I've run through exactly the same train of thought. PHP is an efficient and
effective web development language - Java provides excellent libraries for
developing a powerful business logic layer. Wouldn't it be nice to couple the
two together? The answer is no, it would suck. You end up with some clus
ile cannot be removed because Windows won't let you remove a file that is
> open. When you run in the debugger, the pause because of the breakpoint
> gives the JVM a chance to call the finalize method.
>
>
>
> : Date: Tue, 01 Nov 2005 14:39:09 -0500
> : From: Dan Adams &
I get a "couldnt delete the lock file" exception from the
second test. If I run it again and set a breakpoint at the beginning of
the second test, wait a second, and then let it continue it runs fine.
What is causing this?
--
Dan Adams
Software Engineer
Interacti