ombination than just emailing the user group, or is this our
best bet in the future as well?
Thanks again!
Kevin
On Tue, Oct 19, 2021 at 5:07 AM Michael Sokolov wrote:
> > I would be a bit careful: On our Jenkins server running with AMD Ryzen CPU
> it happens quite often that JDK 16, JDK 1
any other orgs using Java 17 with Lucene?
- Any other considerations we should be aware of?
Best,
Kevin Rosendahl
true' in this case?)
I've read the Javadocs as well as multiple other questions on this topic on
this channel, but it's still confusing to me.
Appreciate your time and help.
Thanks,
Kevin
A1, A2, D (binding)
Kevin Risden
On Thu, Sep 3, 2020 at 4:44 PM jim ferenczi wrote:
> A1 (binding)
>
> Le jeu. 3 sept. 2020 à 07:09, Noble Paul a écrit :
>
>> A1, A2, D binding
>>
>> On Thu, Sep 3, 2020 at 7:22 AM Jason Gerlowski
>> wrote:
>> &
urce or do I need one for
reading from each of the indexed docValue fields and then use a combination
of MultiFloatFunctions and SumFloatFunctions to achieve this?
Appreciate your time and help.
Thanks,
Kevin
I see, thank you Adrien! I'll look into it and get back to you if I have
any questions.
On Fri, Feb 21, 2020 at 1:45 AM Adrien Grand wrote:
> Hi Kevin,
>
> FunctionScoreQuery can also work with dynamically-computed values, you just
> need to provide it with a DoubleValuesSource.
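The shape of what Adrien describes can be sketched without Lucene at all. Below is a plain-Java model (field names and values are hypothetical, not from the thread): each field behaves like a docId-to-value function, and a combined source just sums them. In real code you would build one DoubleValuesSource per field and hand the combined source to FunctionScoreQuery.

```java
import java.util.function.IntToDoubleFunction;

public class SumValuesDemo {
    // Stand-in for a per-field DoubleValuesSource: maps docId -> value.
    static IntToDoubleFunction fromArray(double[] values) {
        return doc -> values[doc];
    }

    // Combine several sources by summing their per-document values, the
    // same shape as feeding several field sources into one sum function.
    static IntToDoubleFunction sum(IntToDoubleFunction... sources) {
        return doc -> {
            double total = 0;
            for (IntToDoubleFunction s : sources) total += s.applyAsDouble(doc);
            return total;
        };
    }

    public static void main(String[] args) {
        IntToDoubleFunction popularity = fromArray(new double[] {1.0, 2.5});
        IntToDoubleFunction freshness  = fromArray(new double[] {0.5, 1.0});
        IntToDoubleFunction combined = sum(popularity, freshness);
        System.out.println(combined.applyAsDouble(1)); // 3.5
    }
}
```

The point of the abstraction is that one source per field is enough; you never need a separate query per field, only a combinator over the sources.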
t
the above use case probably needs something more dynamic due to the
distance calculation.
Was wondering if you had any suggestions on how to achieve this or if maybe
I'm misunderstanding something?
Thanks,
Kevin
Hi,
I was just wondering is there an upper limit to the score that can be
generated for a non-constant score query?
Thanks,
Kevin
Hi Vadim,
Thank you so much for your reply. I think you were right.
So if a field is 'analyzed' how can I get both terms 'hey' and 'tom'?
Thanks,
Kevin
On Thu, Aug 23, 2018, 20:26 Vadim Gindin wrote:
> Hi Kevin!
>
> I think that your field is "analyzed"
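To see both terms, run the text through the same analysis the field uses. A stdlib-only sketch of what a typical analyzed field produces (in real Lucene you would iterate Analyzer.tokenStream(...) via CharTermAttribute; the splitting rule below only approximates StandardTokenizer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzeDemo {
    // Rough stand-in for what a typical analyzer does:
    // split on non-alphanumeric characters and lowercase each token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String tok : text.split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) terms.add(tok.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hey Tom!")); // [hey, tom]
    }
}
```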
wondering if there's something wrong with the way I'm accessing it or if it
was an issue in these versions.
Thanks,
Kevin
What's the current status of the sort merge strategy?
I want to sort an index by a given field and keep it in that order on disk.
It seems to have evolved over the years and I can't easily figure out the
current status via the Javadoc in 6.x
--
We’re hiring if you know of any awesome Java Devo
hen you are really concerned
> with something else.
>
500GB per day... additionally, disk is cheap, but IOPS are not. The more we
can keep in ram and on SSD the better.
And we're trying to get as much in RAM then SSD as possible... plus we have
about 2 years of content. It adds up ;)
Kevin
I have a large index (say 500GB) with a large percentage of near
duplicate documents.
I have to keep the documents there (can't delete them) as the metadata is
important.
Is it possible to get the documents to be contiguous somehow?
Once they are contiguous then they will compress very well
Currently I'm using StandardTokenizerFactory, which tokenizes the words
based on spaces. For Toy Story it will create the tokens toy and story.
Ideally, I would want to extend the functionality of StandardTokenizerFactory to
create the tokens toy, story, and toy story. How do I do that?
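In a real analysis chain this is exactly what ShingleFilter (ShingleFilterFactory in a Solr schema) adds after the tokenizer, rather than something you extend StandardTokenizerFactory for. A stdlib-only sketch of the output it would produce:

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleDemo {
    // Emit each word plus each adjacent word pair ("shingle"), which is
    // what Lucene's ShingleFilter adds to a token stream.
    static List<String> shingles(String[] words) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            if (i + 1 < words.length) out.add(words[i] + " " + words[i + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(new String[] {"toy", "story"}));
        // [toy, toy story, story]
    }
}
```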
Hi, experts
I had a program running for 2 days to build an index for around 160 million
text files, and after the program ended, I tried searching the index and found
the index was not correctly built: *indexReader.numDocs()* returned 0. I
checked the index directory, it looked good, all the index data
From: Kevin Daly (kedaly)
Sent: Friday, May 16, 2008 1:34 PM
To: 'java-user@lucene.apache.org'
Subject: CLucene and Lucene
I have a question concerning interop between CLucene and Lucene. It
is possible to have a C++ Application using CLucene
test where I can write/read to/from index using
Clucene and Lucene.
- Kevin.
Kevin Daly
Software Engineer
IP Communications Business Unit
[EMAIL PROTECTED]
Phone :+35391384651
Block 10
Parkmore
Galway
Ireland
www.cisco.com/
)
at de.gesichterparty.LuceneServlet.run(LuceneServlet.java:140)
at java.lang.Thread.run(Thread.java:595)
On Mac OS X Leopard this code works fine.
Thanks
Kevin
I can see that TermPositions gives an enum with all positions of a term in a
document. I want to do the opposite. Given a position, can I query the
document for the term at that position?
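There is no direct position-to-term lookup, but you can invert the term-to-positions data yourself once per document. A stdlib-only sketch (the input map stands in for what TermPositions, or a term vector with positions, would give you):

```java
import java.util.HashMap;
import java.util.Map;

public class PositionLookup {
    // TermPositions gives term -> positions; invert it once per document
    // to answer "what term is at position p?" queries in O(1).
    static Map<Integer, String> invert(Map<String, int[]> termPositions) {
        Map<Integer, String> byPosition = new HashMap<>();
        termPositions.forEach((term, positions) -> {
            for (int p : positions) byPosition.put(p, term);
        });
        return byPosition;
    }

    public static void main(String[] args) {
        Map<String, int[]> tp = new HashMap<>();
        tp.put("hello", new int[] {0, 2});
        tp.put("world", new int[] {1});
        System.out.println(invert(tp).get(1)); // world
    }
}
```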
I need to use getTermFreqVector on a subset of docs that belong to the hits for
a query. I understand I need to pass the docNumber as an argument in this case.
How do I access that?
For example:
doc = hits.doc(0);
TermFreqVector vector = reader.getTermFreqVector(docId, "field");
How do I get docId?
Thanks,
I found it. I wasn't aware of both source trees.
Kévin.
- Original message
From: Doron Cohen <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 20 June 2007, 23:42:17
Subject: Re: The localized Languages.
Hi Kevin, are you looking
Hi,
It seems that all the localized-language Analyzers are absent from
org.apache.lucene.analysis.* in the latest 2.2
source release of Lucene. Is this normal or not?
regards,
Kévin.
Hi,
how do I highlight the search keyword in Lucene's search results? Please
give advice, thanks!
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Hi,
how do I highlight the keyword in the search result summary? Can I use
the /highlight/ package?
Thanks!
ical RAM on the box.
-Kevin
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 28, 2006 12:47 PM
To: java-user@lucene.apache.org
Subject: Commercial vendors monitoring this ML? was: Lucene Performance
Issues
Weird, I was just about to comment on the
Hello,
I recently came across this email in the Lucene user list and am
interested in this article. I tried to access it from the link you
provided, but couldn't find any link to access it. Do you still have an
electronic copy?
Thanks,
Kevin Runde
-Original Message-
From: Ma
Thanks Hoss... You're absolutely right!
Kevin
On 2/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> : I need all the documents returned from the search and am manipulating
> the
> : results with a custom HitCollector, therefore I can't use filters.
>
>
> One more thing: in case these queries are generated, you might
> consider building the corresponding (nested) BooleanQuery yourself
> instead of using the QueryParser.
>
> Regards,
> Paul Elschot
I'll give that a try. Thanks Paul.
I need all the documents returned from the search and am manipulating the
results with a custom HitCollector, therefore I can't use filters.
Kevin
st
scenario also. Is there any way around this error?
As a side note, it is very unlikely that this will be encountered in the
real world, but b/c we are dealing with content categorization it is still
possible.
Thanks in advance,
Kevin
Does anyone have examples of using Carrot2? I've been looking into it
lately and am not finding good documentation.
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 23, 2005 2:23 PM
To: java-user@lucene.apache.org
Subject: Re: Search clusteri
e discerning, so that the term "cat" returns a
hit but less than 100%. The term "big green cat" should return 100%, the
term "big green" or "green big" should return something less than 100%
and then term "big" or "green" or "cat"
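The behavior described resembles classic Lucene's coord factor: score by the fraction of query terms a document matches. A stdlib-only sketch of that fraction (term lists are hypothetical; a real implementation would hook into the Similarity):

```java
import java.util.List;

public class CoverageScore {
    // Score a document by the fraction of query terms it contains, so
    // "big green cat" matches 100% and "cat" alone matches one third.
    static double coverage(List<String> queryTerms, List<String> docTerms) {
        long matched = queryTerms.stream().filter(docTerms::contains).count();
        return (double) matched / queryTerms.size();
    }

    public static void main(String[] args) {
        List<String> query = List.of("big", "green", "cat");
        System.out.println(coverage(query, List.of("big", "green", "cat"))); // 1.0
        System.out.println(coverage(query, List.of("cat")));
    }
}
```

Note that term order is ignored here; "green big" scores the same as "big green", so an order-sensitive variant would need positional information as well.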
er to build the query for the keywordField (only one
field to search)
4. Can I combine these separate queries together into one?
-Kevin
-Original Message-
From: Jeff Rodenburg [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 26, 2005 1:04 PM
To: java-user@lucene.apache.org
Subject
the keyword field.
At this point, I'm thinking that I'll need to do two distinct searches,
one using the search term in what I'm calling my searchable fields, and
the other using the other term in the keyword field. Then join the two
HIT lists together.
Looking for some advice.
Thanks,
Kevin
ow grease on my part.
Thanks very much for the advice.
Cheers,
Kevin
Jeff Rodenburg wrote:
Kevin -
You've come to the right list to get information to help you make a
decision. That said, the responsible answer to your question will be
"it depends". The supporter in me s
t direction. Either way I would be very
grateful for any advice.
Cheers,
Kevin
ot be
searchable.
Thanks,
-Kevin
n optimizing REALLY huge
indexes like 300G or so. Then you run out of available system memory
(4G on 32bit machines) and you hit disk. Then it starts to take weeks
to optimize :-)
Of course you could use multiple machines or get more memory.
Kevin
--
Kevin A. Burton, Location - San Francisco, CA
it
lacking. I started off just trying to find a library to use in our
crawler but never found anything. Which is why I ended up writing my
own.
> Of these, the Nutch one is certainly under active development, the
> others don't seem to be as far as I can tell.
They should just use ngramcat
Yes. We don't handle the mixed language case very well. The chunking
method is something I wanted to approach.
> So, there is still a lot to do in this area, if you come up with some
> unique way of improving LI performance...
Maybe I'm being dense but what is LI performance?
hat's a good place to find out about multilingual
> corpora.
Yeah. That was my biggest problem. This area had never really been
solved in the OSS world.
ld use language
categorization to help deal with the chaos of tagging and full-text
search. Google has done this for a long time now and Technorati has it
in beta.
http://www.feedblog.org/2005/08/ngram_language_.html
Open Source C/C++ only? When are you going to include Open Source Java?
We demand fair treatment ;)
-Original Message-
From: Robert Schultz [mailto:[EMAIL PROTECTED]
Sent: Sunday, August 07, 2005 6:18 PM
To: java-user@lucene.apache.org
Subject: New Site Live Using Lucene
Not sure if
y creating a 5G file and then cating that to
/dev/null but I have no way to verify that this actually works.
I just made the BUFFER_SIZE variables non-final so that I can set them at any time.
Kevin
--
Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
See irc.freenode.net #rojo if you want to chat.
f anyone has successfully increased the FSOutputStream and FSInputStream
buffers and got it not to blow up on array copies I would love to know the
short cut
Maybe that was my problem...
Kevin
indexing. I'm more interested in merging multiple
indexes...
Kevin
Bill Au wrote:
Optimize is disk I/O bound. So I am not sure what multiple CPUs will buy you.
Now on my system with large indexes... I often have the CPU at 100%...
Kevin
Is it possible to get Lucene to do an index optimize on multiple
processors?
It's a single-threaded algorithm currently, right?
It's a shame since I have a quad machine but I'm only using 1/4th of the
capacity. That's a heck of a performance hit.
Kevin
Andrew Boyd wrote:
Kevin,
Those results are awesome. Could you please give those of us that were
following but not quite understanding everything some pseudo code or some more
explanation?
Ug.. I hate to say this, but ignore these numbers. Turns out that I was
hitting a cache ... I
e to the filesystem buffer cache but I
can't imagine why they'd be faster in the second round. It might be
that Linux is deciding not to buffer the document blocks.
Kevin
searches on 20 TermQueries.
Actually.. it wasn't... :-/
It was about 4x slower.
Ug...
Kevin
this should be
fast ... maybe we're calling it too often?
I didn't have much time to look at it but I wanted to illuminate the issue.
Kevin
No one responded.
So it seems like my bottleneck is in seek() so it would make sense to
figure out how to limit this.
Is this O(log(N)) btw or is it O(N) ?
Kevin
s? I just assumed that termDocs was already sorted...
I don't see any mention of this in the API...
Kevin
as I'm on a SCSI RAID array at RAID0 on FAST scsi disks... I
also tried tweaking InputStream.BUFFER_SIZE with no visible change in
performance.
Kevin
other filesystems? I
know that XFS is 4096. What about ext2? ext3? JFS? ReiserFS? NTFS? UFS?
etc....
Kevin
ing fun to do tomorrow! w00t!
Kevin
String maxDateString = maxDoc.get("dateField");
This certainly is an interesting solution. How would lucene score this
result set? The first and last will depend on the score...
I guess I can build up a quick test
Kevin
ideas?
Kevin
Is it possible to find the minimum and maximum values for a date field
with a given reader?
I guess I could use TermEnum to do a binary search until I get a hit but
this seems a bit kludgy.
Thoughts?
I don't see any APIs for doing this and a google/grep of the source
doesn't help.
API
for doing this and that I'd have to dive into SegmentReader stuff.
Any idea?
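One answer that avoids SegmentReader internals: a field's terms are enumerated in sorted order (that is what TermEnum walks), and dates indexed the DateTools way sort lexicographically in chronological order, so the first and last terms of the field are the min and max dates. A stdlib-only sketch of that observation (date strings are hypothetical):

```java
import java.util.Arrays;
import java.util.TreeSet;

public class MinMaxTermDemo {
    // A field's terms are stored in sorted order, like a TreeSet. With
    // dates indexed as lexicographically sortable strings, the first and
    // last terms for the field are the min and max dates - no binary
    // search needed.
    static String[] minMax(String... terms) {
        TreeSet<String> sorted = new TreeSet<>(Arrays.asList(terms));
        return new String[] { sorted.first(), sorted.last() };
    }

    public static void main(String[] args) {
        String[] mm = minMax("20050312", "20040101", "20051130");
        System.out.println(mm[0] + " .. " + mm[1]); // 20040101 .. 20051130
    }
}
```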
My policy on this type of exception handling is to only bite off what
you can chew. If you catch an IOException, then you simply report to the
user that an unexpected error has occurred and the search engine is
unobtainable at the moment. Errors should be logged and developers
should look at the sp
I think your bottleneck is most likely the DB hit. I assume by 2
products you mean 2 distinct entries into the Lucene Index, i.e.
2 rows in the DB to select from.
I index about 1.5 million rows from a SQL Server 2000 database with
several fields for each entry and it finishes in about
I worked on a website that had the same issue. We made a "search engine"
page that listed all the documents that we wanted to index as links to
documents that contained summaries of those documents with links to the
entire document on the limited access site - Google won't be able to
follow these l