Re: Facet Count strategies and common errors

2024-10-02 Thread Stefan Vodita
ation engine [2] in Lucene 9.12, in the sandbox module for now, if you're willing to consider it. It facets at match-time and is generally faster than the faceting we had before 9.12. Stefan [1] https://github.com/apache/lucene/tree/main/lucene/demo/src/java/org/apache/lucene/demo/facet [2

Re: Help running the demo program

2024-04-22 Thread Stefan Vodita
Hi Siddharth, If you happen to be using IntelliJ, you can run a demo class from the IDE. It probably works with other IDEs too, though I haven't tried it. Stefan On Sun, 21 Apr 2024 at 23:59, Siddharth Jain wrote: > Hello, > > I am a new user to Lucene. I checked out the Lucene

Re: Faceting Queries NON-Taxonomy-based

2023-11-16 Thread Stefan Vodita
, making for a path like `Publish Date/2023/11/16`. When faceting, you could get counts with respect to each of the labels in the path (e.g. counts per year or counts per month of given year). Stefan [1] https://github.com/apache/lucene/pull/12817 On Wed, 15 Nov 2023 at 01:01, Tony Schwartz wrote
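The per-level counting described here can be sketched in plain Java. This sketch does not use Lucene's facet API. It counts documents per child label below a path prefix such as `Publish Date` (years) or `Publish Date/2023` (months of that year). The class `PathFacetCounts`, its method, and the `/` separator are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: counts documents per child label
// directly below a given parent path, e.g. years below "Publish Date".
final class PathFacetCounts {

    static Map<String, Integer> childCounts(List<String> docPaths, String parent) {
        String prefix = parent + "/";
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : docPaths) {
            if (!path.startsWith(prefix)) {
                continue; // document is not under this parent
            }
            String rest = path.substring(prefix.length());
            int slash = rest.indexOf('/');
            String child = slash < 0 ? rest : rest.substring(0, slash);
            if (!child.isEmpty()) {
                counts.merge(child, 1, Integer::sum); // one count per document
            }
        }
        return counts;
    }
}
```

Take the paths `Publish Date/2023/11/16`, `Publish Date/2023/11/17`, `Publish Date/2023/10/01` and `Publish Date/2022/05/30`. Called with parent `Publish Date`, `childCounts` gives `{2022=1, 2023=3}`. Called with parent `Publish Date/2023`, it gives `{10=1, 11=2}`.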

Re: Faceting Queries NON-Taxonomy-based

2023-11-14 Thread Stefan Vodita
ying to do. If you've already checked these resources, is there a specific question they didn't help answer? Stefan [1] https://github.com/apache/lucene/tree/main/lucene/demo/src/java/org/apache/lucene/demo/facet [2] https://github.com/apache/lucene/blob/main/lucene/demo/src/java/or

Re: Computing multiple different aggregations over a match-set in one pass

2023-09-09 Thread Stefan Vodita
Hi everyone, I ended up using the idea of doing multiple aggregations in one go and it was a nice improvement. Maybe we can reconsider introducing this? I've opened an issue [1] and published a PR [2] based on the code I had previously shared, with some extra tests and a few improvements. S
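The idea of doing several aggregations in one go can be shown without any Lucene classes. The sketch loops over the matching doc IDs once and updates every aggregate from the same per-document value, instead of re-iterating the hits for each aggregation. `OnePassAggregations`, the `long[]` stand-in for per-document values, and the choice of count/sum/max are assumptions for illustration. None of it is the API from the linked issue or PR.

```java
// Hypothetical illustration: several aggregations computed in a single pass
// over the match set. valuePerDoc stands in for a per-document numeric value
// (e.g. what a doc-values field would provide).
final class OnePassAggregations {

    static final class Result {
        final int count;
        final long sum;
        final long max; // Long.MIN_VALUE when there are no hits

        Result(int count, long sum, long max) {
            this.count = count;
            this.sum = sum;
            this.max = max;
        }
    }

    static Result aggregate(int[] matchingDocs, long[] valuePerDoc) {
        int count = 0;
        long sum = 0;
        long max = Long.MIN_VALUE;
        for (int doc : matchingDocs) {  // the hits are iterated exactly once
            long value = valuePerDoc[doc];
            count++;                    // aggregation 1: hit count
            sum += value;               // aggregation 2: sum
            max = Math.max(max, value); // aggregation 3: maximum
        }
        return new Result(count, sum, max);
    }
}
```

The saving comes from sharing the hit iteration and the value lookup across all aggregations. The more aggregations share one match set, the more work a single pass avoids.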

Re: Reindexing leaving behind 0 live doc segments

2023-08-30 Thread Stefan Vodita
help? Stefan [1] https://github.com/apache/lucene/blob/d1c353116157d0375de9d673ae5e9c90524ffe2f/lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java On Wed, 30 Aug 2023 at 15:19, Rahul Goswami wrote: > Thanks for the response Mikhail. I don't think I am looking for > f

Re: Computing multiple different aggregations over a match-set in one pass

2023-03-06 Thread Stefan Vodita
Hi Greg, The PR looks great. I think it's a useful feature to have and it helps with the use-case we were discussing. I left a comment with some other ideas that I'd like to explore. Thank you for coding this up, Stefan On Sun, 5 Mar 2023 at 19:33, Greg Miller wrote: > >

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-24 Thread Stefan Vodita
ng call. Has anyone worked on something similar? Best, Stefan On Thu, 23 Feb 2023 at 16:53, Greg Miller wrote: > > Thanks for the detailed benchmarking Stefan! I think you've demonstrated > here that looping over the collected hits multiple times does in fact add > meaningful over

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-17 Thread Stefan Vodita
for my use-case. I'd like to know if the test and results seem reasonable. If so, maybe we can think about providing this functionality. Thanks, Stefan [1] https://github.com/stefanvodita/lucene/commit/3536546cd9f833150db001e8eede093723cf7663 [2] https://download.geonames.org/export/dump/a

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-16 Thread Stefan Vodita
asted work as I had expected. I'll try to do a performance comparison to quantify precisely how much time we could save, just in case. Thank you for the help! Stefan [1] https://github.com/stefanvodita/lucene/commit/3227dabe746858fc81b9f6e4d2ac9b66e8c32684 On Wed, 15 Feb 2023 at 15:48, Greg M

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-14 Thread Stefan Vodita
, iterating twice duplicates most of the work, correct? Stefan [1] https://github.com/apache/lucene/blob/7f8b7ffbcad2265b047a5e2195f76cc924028063/lucene/demo/src/java/org/apache/lucene/demo/facet/ExpressionAggregationFacetsExample.java#L91 On Mon, 13 Feb 2023 at 22:46, Greg Miller wrote: > > Hi

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-11 Thread Stefan Vodita
present in Mark Twain’s writing, and those of other American authors, and in sci-fi novels too. Does that make the example clearer? Stefan On Sat, 11 Feb 2023 at 00:16, Greg Miller wrote: > > Hi Stefan- > > Can you clarify your example a little bit? It sounds like you want to facet

Computing multiple different aggregations over a match-set in one pass

2023-02-10 Thread Stefan Vodita
pass. Is there a way to do that? Stefan [1] https://javadoc.io/doc/org.apache.lucene/lucene-demo/latest/org/apache/lucene/demo/facet/package-summary.html

Question about current situation of good first issues in GitHub

2023-01-10 Thread Stefan Vodita
Hello Shunya, As far as I know, GitHub issues are not marked for new developers yet. The project migrated a few months ago from Jira to GitHub issues, so you can still search the old labels in Jira. In particular, there is `newdev` for good starter issues [1]. Hope this helps, Stefan [1

Re: Retrieving query-time join fromQuery hits

2020-06-08 Thread Stefan Onofrei
actually hadn't thought of that. Could you please provide more details on how we could approach the problem from this angle? Thanks, Stefan Onofrei On Wed, Jun 3, 2020 at 9:59 PM Mikhail Khludnev wrote: > Hi, Stefan. > Have you considered faceting/aggregation over `from` field?

Retrieving query-time join fromQuery hits

2020-05-12 Thread Stefan Onofrei
, it would then have to be exposed somehow. Thanks, Stefan Onofrei [1] https://lucene.apache.org/core/8_5_1/join/org/apache/lucene/search/join/JoinUtil.html [2] https://lucene.472066.n3.nabble.com/access-to-joined-documents-td4412376.html [3] https://issues.apache.org/jira/browse/LUCENE-3602

Indexing performance on HDFS

2016-04-26 Thread KORTMANN Stefan (MORPHO)
Hi, can indexing on HDFS somehow be tuned up using pluggable codecs / some customized PostingsFormat? What settings would you recommend for using Lucene 5.5 on HDFS? Regards, Stefan

Store a query in a database for later use

2012-05-17 Thread Stefan Undorf
Hi, I want to store a query for later use in a database, like: 1. queryToPersist = booleanQuery.toString(); 2. store it to the db, go fishing, retrieve it 3. Query query = Query.parseString(queryToPersist) The method Query.parseString does not exist. Is there a way to do something similar?

Re: Memory issues

2011-09-05 Thread Stefan Trcek
and size of allocated >> objects. Just looking for the installed plugins. I guess this plugin does the job. http://www.eclipse.org/mat/ Stefan

Re: Memory issues

2011-09-05 Thread Stefan Trcek
DumpPath=. -XX:+HeapDumpOnOutOfMemoryError and drag'n'drop the resulting java_pid*.hprof into eclipse. You will get an outline by class for the number and size of allocated objects. Note that you need somewhat more main memory for the post mortem diagnosis, so you may want to st
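A running JVM can report whether it actually picked up these flags. The sketch below reads them back through `com.sun.management.HotSpotDiagnosticMXBean`. It assumes a HotSpot-based JVM such as OpenJDK; other VMs may not provide that bean. `HeapDumpFlags` is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads the current value of a HotSpot VM option, e.g. to
// log at startup whether heap dumps on OutOfMemoryError are enabled.
// Assumes a HotSpot-based JVM that exposes HotSpotDiagnosticMXBean.
final class HeapDumpFlags {

    static String option(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }
}
```

`option("HeapDumpOnOutOfMemoryError")` returns `"true"` only when the JVM was started with `-XX:+HeapDumpOnOutOfMemoryError`. `option("HeapDumpPath")` shows where the `.hprof` file will be written (empty when unset).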

Re: Index size and performance degradation

2011-06-14 Thread Stefan Trcek
strategy. > I think it's better to spend time during reopen so that searches > aren't slower. Absolutely, if you build an internet search engine. For our closed world with numbered clients search speed doesn't have that impact. It must scale for one client

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-20 Thread Stefan Trcek
On Tuesday 18 January 2011 22:04:01 Grant Ingersoll wrote: Where do you get your Lucene/Solr downloads from? [x] ASF Mirrors (linked in our release announcements or via the Lucene website) [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an

Re: Using multiple drives and non-CFS format to improve search performance

2010-08-27 Thread Stefan Nikolic
in IndexReader, but I don't know how to repair/rebuild the segments_N file afterwards. Is this even possible? Also, I found this: http://lucene.472066.n3.nabble.com/Distributing-index-over-N-disks-td577483.html Gives me hope! Thanks again, Stefan On Fri, Aug 27, 2010 at 2:35 AM, Sanne Gr

Using multiple drives and non-CFS format to improve search performance

2010-08-26 Thread Stefan Nikolic
, the .fdx/.fdt stored-fields-related files onto a standard rotational drive, and using symlinks to hide all of this from Lucene. Any ideas on what the performance effects of such a setup would be? Which files would you recommend putting on the slower media, and which on the faster media? Thanks! -Stefan

Re: [ANN] Free technical webinar: Mastering the Lucene Index: Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET

2010-08-13 Thread Stefan Trcek
that this event has ended, not really unexpected as it was one hour late now. Plenty of data required for registration, and the account seems to be undeletable. (*) sending xhtml claiming to be text/plain regards, Stefan

Re: Get info whether a field is multivalued

2010-03-17 Thread Stefan Trcek
multiValued=true; > break; > } > docsRead.set(td.doc()); > } > }while(te.next()&&multiValued==false); Nice
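The quoted loop works like this: walk every term of the field, walk the documents of each term, and stop as soon as a document shows up under a second term. Below is a plain-Java sketch of that logic. An in-memory `Map<String, int[]>` stands in for the term dictionary and postings; it is an assumption, not a Lucene API. For a tokenized field this flags any document with more than one term. It therefore only identifies multivalued fields reliably when the field is untokenized.

```java
import java.util.BitSet;
import java.util.Map;

// Hypothetical stand-in for the TermEnum/TermDocs loop: postings maps each term
// of one field to the IDs of the documents containing it.
final class MultiValuedCheck {

    static boolean hasMultipleTermsPerDoc(Map<String, int[]> postings) {
        BitSet docsSeen = new BitSet();
        for (int[] docs : postings.values()) {
            for (int doc : docs) {
                if (docsSeen.get(doc)) {
                    return true; // this doc already appeared under another term
                }
                docsSeen.set(doc);
            }
        }
        return false;
    }
}
```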

Get info whether a field is multivalued

2010-03-17 Thread Stefan Trcek
Hello, is there an API that indicates whether a field is multivalued, just like IndexReader.getFieldNames(IndexReader.FieldOption fldOption) does for fields being indexed/stored/termvector? Of course I could track it at index time. Stefan

NGramTokenizer stops working after about 1000 terms

2009-12-14 Thread Stefan Trcek
tokenStream.end(); tokenStream.close(); return output.toArray(new String[0]); } The complete example is attached. "in.txt" must be in "." and is plain ASCII. Stefan import java.io.BufferedReader; import java.io.File; import java.io.FileReader; import java.io.IOExcep

Re: About Lucene ...

2009-12-02 Thread Stefan Trcek
On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote: > On Wednesday 02 December 2009 15:50:45 archibal wrote: > > -optionnally i want to have a central server which index all data > > (name of files, folders and file content) on network and i would > > like to connect

Re: About Lucene ...

2009-12-02 Thread Stefan Trcek
is or something like ? You may have a look at regain http://regain.murfman.de/wiki/doku.php?id=start hounder http://www.hounder.org/index.html I never used these products, so I can't tell you anything about 'em. Stefan --

Re: What does "out of order" mean?

2009-12-01 Thread Stefan Trcek
On Monday 30 November 2009 18:51:34 Nick Burch wrote: > On Mon, Nov 30, 2009 at 12:22 PM, Stefan Trcek wrote: > > I'd do, but was not successful to get the svn repo some months ago. > > I have to claim the sys admin for any svn repo to open a door > > through th

Re: What does "out of order" mean?

2009-12-01 Thread Stefan Trcek
opFieldDocs Searcher.search(Query query, Filter filter, int n, Sort sort) Stefan

Re: What does "out of order" mean?

2009-12-01 Thread Stefan Trcek
On Monday 30 November 2009 18:42:50 Michael McCandless wrote: > I was able to apply that git patch just fine -- so I think it'll > work? Good to hear it works that simply. This patch completes the task. It is a "two file" patch, so if this works too, I'm confident.

Re: What does "out of order" mean?

2009-11-30 Thread Stefan Trcek
I'd try to setup the git-svn bridge locally, just to create the patches. Stefan diff --git a/src/java/org/apache/lucene/search/TopDocs.java b/src/java/org/apache/lucene/search/TopDocs.java index 7e53662..0f098e1 100644 --- a/src/java/org/apache/lucene/search/TopDocs.java +++ b/src/j

Re: What does "out of order" mean?

2009-11-30 Thread Stefan Trcek
", "Expert", "Expert + low level" or return a TopDocs/TopFieldDocs object, which itself claims to be "Expert". I appreciate the labeling but I guess the road to go is somewhat hidden. Stefan

Re: What does "out of order" mean?

2009-11-27 Thread Stefan Trcek
ll into the "search(query, filter, collector)" method. I see that I can do that simpler with "search(Query, Filter, int, Sort)". Stefan

Re: What does "out of order" mean?

2009-11-27 Thread Stefan Trcek
sed it, but I stumbled upon "out of order" and "in order" several times and wasn't sure what the consequence of the decision would be. Not even sure what the "don't care" case would be. I like "don't care" options like "Version.LUCENE_CURRENT

Re: Proposal for changing Lucene's backwards-compatibility policy

2009-10-16 Thread Stefan Trcek
aller and maybe even grouped by topic. I prefer advancing step by step. A bunch of deprecated API parts even hinders reading and understanding the API. So, the sooner they are gone, the better. Regards, Stefan

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
give up on tracking down those binary references, I spent too much time on this already. Thanks a lot for your insights. Stefan -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Thu 25.06.2009 15:57 To: java-user@lucene.apache.org B

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
it is similar to creating a new IndexWriter. HTH, Stefan -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Thu 25.06.2009 13:13 To: java-user@lucene.apache.org Subject: Re: OutOfMemoryError using IndexWriter Can you post your test code? If yo

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
term/freq vector fields per doc] No problems were detected with this index. Stefan

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
[Ljava.util.HashMap$Entry;11329 2494312 class java.util.HashMap$Entry 132578 2121248 class [I51862097300 So far I had no success in pinpointing those binary arrays, I will need some more time for this. Stefan -Original Message- From: Michael McCandless [mailto:luc

Re: OutOfMemoryError using IndexWriter

2009-06-25 Thread stefan
eneric. I shall post my results. Stefan

Re: OutOfMemoryError using IndexWriter

2009-06-24 Thread stefan
some hint, whether this is the case, from the programming side would be appreciated ... Stefan -Original Message- From: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov] Sent: Wed 24.06.2009 16:18 To: java-user@lucene.apache.org Subject: RE: OutOfMemoryError using

Re: OutOfMemoryError using IndexWriter

2009-06-24 Thread stefan
. > Why is it, that creating a new Index Writer will let the indexing run fine > with 80MB, but keeping it will create an > OutOfMemoryException running with 100MB heap size ? Please explain those buffered deletes in a bit more detail. Thanks, Stefan

Re: OutOfMemoryError using IndexWriter

2009-06-24 Thread stefan
Hi, I do use Win32. What do you mean by "the index file before optimizations crosses your jvm memory usage settings (if say 512MB)"? Could you please further explain this? Stefan -Original Message- From: Sudarsan, Sithu D. [mailto:sithu.sudar...@fda.hhs.gov] Ge

Re: OutOfMemoryError using IndexWriter

2009-06-24 Thread stefan
IndexWriter for the complete indexing operation, I do not call optimize but get an OOMError. Stefan -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wed 24.06.2009 14:22 To: java-user@lucene.apache.org Subject: Re: OutOfMemoryError using

Re: OutOfMemoryError using IndexWriter

2009-06-24 Thread stefan
org.apache.lucene.index.BufferedDeletes$Num 117303 469212 The --- as well as the reflect.Method are part of the app's data. Why is it that creating a new Index Writer will let the indexing run fine with 80MB, but keeping it will create an OutOfMemoryException running with 100MB heap size? S

Re: OutOfMemoryError using IndexWriter

2009-06-24 Thread stefan
org.apache.lucene.index.FreqProxTermsWriter$PostingList 116736 (instances) 3268608 (size) Well, is there something I should do differently? Stefan -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wed 24.06.2009 10:48 To: java-user@lucene.apache.org Subject: Re

OutOfMemoryError using IndexWriter

2009-06-24 Thread stefan
oblem related to Lucene. Any ideas, or should I post this to Jira? Jira has quite a few OutOfMemory postings but they all seem closed in Version 2.4.1. Thanks, Stefan

Re: 1:n queries again

2008-11-12 Thread Stefan Trcek
On Wednesday 12 November 2008 14:58:53 Christian Reuschling wrote: > In order to offer some simple 1:n matching, currently we create > several, counted attributes and expand our queries that we search > inside each attribute, e.g.: I use one attribute (Field) multiple times

Re: Boosting results

2008-11-11 Thread Stefan Trcek
f course even a second load and then a search is much > slower than just a warmed search though). Was hoping to see some > advantage with a payload implementation with LUCENE-831, but really > didn't seem to... Currently I have 50 mil docs maximum, but usually 5 mil

Re: Boosting results

2008-11-11 Thread Stefan Trcek
But a servlet engine > can be mighty handy when you need to go to distributed search, > replication, etc. But one can use Solr very much like using Lucene, > API-only (but with config files). Yes - for additional tasks you may use additional software or services, but I do not

Re: Boosting results

2008-11-10 Thread Stefan Trcek
use a ready to use lucene index in Solr? Stefan

Re: Boosting results

2008-11-10 Thread Stefan Trcek
1: category=A text=john Doc 2: category=B text=mary Doc 3: category=B text=john Doc 4: category=C text=mary This is intended for search refinement (I use about 200 categories). Sorry for hijacking this thread. Stefan
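Counting categories over the hits, as this kind of refinement needs, can be sketched in plain Java using the four example documents above. The `Doc` class, the whitespace split, and the single-term match are simplifying assumptions. In Lucene the counting would run over the real result set instead of a list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: count the categories of the documents whose text contains a term.
final class CategoryRefinement {

    static final class Doc {
        final String category;
        final String text;

        Doc(String category, String text) {
            this.category = category;
            this.text = text;
        }
    }

    static Map<String, Integer> countCategories(List<Doc> docs, String term) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : docs) {
            boolean matches = Arrays.asList(doc.text.split("\\s+")).contains(term);
            if (matches) {
                counts.merge(doc.category, 1, Integer::sum); // one count per matching doc
            }
        }
        return counts;
    }
}
```

With Doc 1 to Doc 4 from the message, searching for `john` yields `{A=1, B=1}` and searching for `mary` yields `{B=1, C=1}`. These are the counts a refinement UI would offer next to each category.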

Re: Can Lucene tells which field matched ?

2008-11-06 Thread Stefan Trcek
I think org.apache.lucene.search.highlight.Highlighter will do the job. Stefan

[ANN] katta-0.1.0 release - distribute lucene indexes in a grid

2008-09-18 Thread Stefan Groschupf
After 5 months of work we are happy to announce the first developer preview release of katta. This release contains all functionality to serve a large, sharded lucene index on many servers. Katta is standing on the shoulders of the giants lucene, hadoop and zookeeper. Main features: + Plays wel

How to get the error position in QueryParser/ParseException

2008-05-23 Thread Stefan Trcek
entToken.beginColumn=" + e.currentToken.beginColumn); } System.out.println("message=" + e.getMessage()); } } } == Stefan

Re: search result problem

2007-05-22 Thread Stefan Colella
Hello, I used setMaxFieldLength() and it works now, thx all. Doron Cohen wrote: Stefan Colella wrote: I tried to only add the content of the page where that expression can be found (instead of the whole document) and then the search works. Do I have to split my pdf text into more
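This explains why `setMaxFieldLength()` fixed the problem. Older IndexWriter versions indexed only the first 10,000 terms of a field by default (`IndexWriter.DEFAULT_MAX_FIELD_LENGTH`). A phrase occurring later in a long PDF was therefore silently dropped from the index. The plain-Java simulation below mimics that truncation. `MaxFieldLengthDemo` and its methods are hypothetical and do not call Lucene.

```java
import java.util.List;

// Hypothetical simulation of the old maxFieldLength behaviour: only the first
// maxFieldLength tokens of a field were indexed, the rest were ignored.
final class MaxFieldLengthDemo {

    static List<String> indexedTokens(List<String> tokens, int maxFieldLength) {
        return tokens.subList(0, Math.min(tokens.size(), maxFieldLength));
    }

    // True if 'first' is immediately followed by 'second' (a two-term phrase match).
    static boolean containsPhrase(List<String> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }
}
```

Suppose a document has 10,000 filler tokens followed by "stock option". Truncating at 10,000 loses the phrase. A larger limit, the effect of calling `setMaxFieldLength()`, keeps it.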

Re: search result problem

2007-05-21 Thread Stefan Colella
aren't actually in your index? Can you elaborate on what your analysis process is? Are you searching using the same Analyzer as you are indexing with? I would try to isolate the problem down to some unit tests, if possible. Cheers, Grant On May 18, 2007, at 8:12 AM, Stefan Colella

search result problem

2007-05-18 Thread Stefan Colella
Hello, My application is working with PDF files so I use lucene with PdfBox to create a little search engine. I am new to lucene. All seemed to work fine but after some tests I saw that some expressions like "stock option" were never found (or return the wrong documents) even if it exist i

Re: Doubt in FuzzyQuery

2007-05-03 Thread Stefan Will
It seems to me like a French stemmer is what you need instead of a fuzzy query. What analyzer are you using for your documents and queries? -- Stefan [EMAIL PROTECTED] wrote: Hi! I have a problem in dealing whith a fuzzy query in Lucene 2.1.0. In order to explain my problem, I illustrate

Stefan Raspl/Germany/IBM is out of the office.

2007-02-02 Thread Stefan Raspl
I will be out of the office starting 02/03/2007 and will not return until 02/12/2007. I will respond to your message when I return.

Stefan Raspl/Germany/IBM is out of the office.

2006-12-20 Thread Stefan Raspl
I will be out of the office starting 12/21/2006 and will not return until 01/02/2007. I will respond to your message when I return.

Problems with Queries which contain '_' and wildcards

2006-12-13 Thread Stefan Schütz
underscores in filenames is not a solution because we have many other fields which can contain underscores. :( Is there anybody who knows this problem or knows a solution for this? thx in advance. Stefan

Stefan Raspl/Germany/IBM is out of the office.

2006-09-29 Thread Stefan Raspl
I will be out of the office starting 09/30/2006 and will not return until 10/09/2006. I will respond to your message when I return.

Stefan Raspl/Germany/IBM is out of the office.

2006-06-02 Thread Stefan Raspl
I will be out of the office starting 06/03/2006 and will not return until 06/26/2006. I will respond to your message when I return.

Stefan Raspl/Germany/IBM is out of the office.

2006-05-08 Thread Stefan Raspl
I will be out of the office starting 05/09/2006 and will not return until 05/15/2006. I will respond to your message when I return.

Re: Dealing with acronyms

2006-04-26 Thread Stefan Will
This makes perfect sense to me. Of course the hard part will be how to extract the acronyms. -- Stefan Hannes Carl Meyer wrote: Hi All, I would like enable users to do an acronym search on my index. My idea is the following: 1.) Extract acronyms (ABS, ESP, VCG etc.) from the given document
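A deliberately naive starting point for the extraction step is a regular expression. It treats any standalone run of two to six capital letters as an acronym. The pattern and the `AcronymExtractor` name are assumptions. The pattern also picks up ordinary words written in capitals, so real text would need extra filtering, such as a stop list or a dictionary check.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, naive acronym extractor: 2 to 6 capital letters forming a whole word.
final class AcronymExtractor {

    private static final Pattern ACRONYM = Pattern.compile("\\b[A-Z]{2,6}\\b");

    static Set<String> extract(String text) {
        Set<String> found = new LinkedHashSet<>(); // first-seen order, no duplicates
        Matcher m = ACRONYM.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

For example, `extract("Cars with ABS and ESP; the VCG unit. ABS again.")` returns ABS, ESP and VCG. These could then go into a separate acronym field as described in the quoted proposal.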

Re: Stemming german words

2006-01-31 Thread Stefan Gusenbauer
rd is used as a noun and don't stem it therefore. But it would be interesting if a POS tagger can distinguish between "Sucht" as noun and "sucht" as verb. But you could give this a try. stefan

Re: Finding similar documents

2006-01-16 Thread Stefan Gusenbauer
Grant Ingersoll wrote: I believe there is a MoreLikeThis class floating around somewhere (I think it is in the contrib/similarity package). The Lucene book also has a good example, and I have some examples at http://www.cnlp.org/apachecon2005 that demonstrate using term vectors to do this

Re: Scoring by number of terms in field

2006-01-10 Thread Stefan Gusenbauer
result if the body isn't taken into account. But in the body the words are usually weighted lower because it contains more words. Hope this helps! stefan -Original Message- From: Eric Jain [mailto:[EMAIL PROTECTED] Sent: Tuesday, 10 January 2006 07:32 To: java-user@lucene.apach
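The lower weight of body words comes from length normalization. The sketch assumes the classic Lucene DefaultSimilarity formula lengthNorm = 1 / sqrt(numTerms) and ignores field and document boosts. Under that formula a 4-term title gets 0.5 and a 400-term body gets 0.05, so a match in the short field counts for more. Older Lucene versions also stored norms in a single byte, so the indexed values were coarser than these floats. `LengthNorm` is a hypothetical name.

```java
// Sketch of classic length normalization: the more terms a field has,
// the smaller the norm applied to matches in that field.
final class LengthNorm {

    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}
```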

Re: Re: Determine the index of a hit after using MultiSearcher

2005-11-29 Thread Stefan Gusenbauer
This is another good reason for finally buying the book! Thx stefan -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Tuesday, 29 November 2005 15:57 To: java-user@lucene.apache.org Subject: Re: Re: Determine the index of a hit after using

Re: Caching Results

2005-11-29 Thread Stefan Groschupf
Well, this depends: if you have a small index of just some million documents, it makes no sense. But if you have some hundred million documents and may use distributed searching, it makes a lot of sense. Just check ehcache.sf.net; I found it very useful. HTH Stefan On 29.11.2005 at
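If ehcache is more than a small deployment needs, the JDK alone can provide a bounded least-recently-used cache for query results. The sketch uses `LinkedHashMap` in access order. This is a swapped-in technique, not ehcache's API. The class name and the idea of keying on the query string are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache, e.g. query string -> cached result page.
final class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }
}
```

With a capacity of 2, putting `q1` and `q2`, reading `q1`, then putting `q3` evicts `q2`, because `q1` was used more recently. The cache is not thread-safe. A searcher shared across threads would need to wrap it, e.g. with `Collections.synchronizedMap`.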

Re: Determine the index of a hit after using MultiSearcher

2005-11-29 Thread Stefan Gusenbauer
I've done it the same way: every document contains a field with the corresponding index. I fear there is no other way to do this. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesday, 29 November 2005 14:48 To: java-user@lucene.apache.org Be

Re: Wordnet JWLN

2005-11-17 Thread Stefan Gusenbauer
Complutense de Madrid - Original Message - From: Stefan Gusenbauer <[EMAIL PROTECTED]> Date: Thursday, November 17, 2005 6:08 pm Subject: Wordnet JWLN For my index I want to check if a word is a noun, is this possible with the wordnet package which can be found under lucene contrib

Wordnet JWLN

2005-11-17 Thread Stefan Gusenbauer
For my index I want to check if a word is a noun. Is this possible with the wordnet package which can be found under lucene contributions, or does anyone know a good tutorial or documentation for http://jwordnet.sourceforge.net/ ? Thanks Stefan

Re: Is There Other Ports of Nutch?

2005-11-06 Thread Stefan Groschupf
No! Porting nutch in general makes no sense, since nutch is not a library like lucene but a complete, ready-to-use application you can download and start. There is a kind of 'webservice' (open search rss) to be able to integrate nutch search results in third party applications. Stef

Vector Model and Relevance Feedback

2005-11-02 Thread Stefan Gusenbauer
n the computation. So my questions are: Is this the only way to do so? (I hope not.) Is there an add-on for lucene to get a real vector representation? Does anyone have experience with this issue? Thanks Stefan

Re: Document number

2005-10-26 Thread Gusenbauer Stefan
Gusenbauer Stefan wrote: >I've been searching through the archives, but is there a way to get the >document number for a specific document? I need it for the method >getTermFreqVector of IndexReader. For deleting I've saved a unique ID >Field to delete the documents but

Document number

2005-10-26 Thread Gusenbauer Stefan
I've been searching through the archives, but is there a way to get the document number for a specific document? I need it for the method getTermFreqVector of IndexReader. For deleting I've saved a unique ID Field to delete the documents, but how do I get the document number? tha

Re: Re: Java heap space ...after index process

2005-10-26 Thread Gusenbauer Stefan
Patricio Galeas wrote: >Hello Ben, >It happens when one of the documents [4.95 MB] is indexed. >I use the framework to index office documents from the book "Lucene In >Action". I think the PDDocument objects are closed correctly. > >I'll look for more information about increasing the heap size. >

Re: How to Integrate Lucene/Nutch with Mysql?

2005-10-25 Thread Stefan Groschupf
BTW, there are some cool free ad servers available as open source... On 25.10.2005 at 09:14, Sam Lee wrote: Hi, My network is designed to have a bunch of advertisers to enter their ads with keywords. I think of using mysql to store those, and then use lucene and part of nutch to index them fr

Re: Can I Do Reverse Search?

2005-10-23 Thread Stefan Groschupf
hen # of query is 1 only. Huge difference! Any idea how to accomplish this? --- Stefan Groschupf <[EMAIL PROTECTED]> wrote: Index the keywords of your ads with lucene. Extract all words from your page (ajax), remove stop words, build a query from the page words by connecting the words with
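The steps in the quoted answer can be sketched in plain Java: extract the page words, drop stop words, and combine the rest into one query against the index of ad keywords. Several details are assumptions for illustration: joining the terms with `OR` in Lucene query-parser syntax, the tiny stop-word list, and the `PageToQuery` name. The resulting string would then be parsed and run against that index.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical sketch: turn page text into a single OR query over its words.
final class PageToQuery {

    // Tiny illustrative stop-word list; a real application would use a fuller one.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "for", "in", "on", "with");

    static String buildQuery(String pageText) {
        Set<String> terms = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String token : pageText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        StringJoiner query = new StringJoiner(" OR ");
        for (String term : terms) {
            query.add(term);
        }
        return query.toString();
    }
}
```

For the page text "The red Nike shoe, and the red shirt" this produces `red OR nike OR shoe OR shirt`. One query per page then matches every ad whose keywords overlap with the page.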

Re: Can I Do Reverse Search?

2005-10-23 Thread Stefan Groschupf
number. HTH Stefan On 23.10.2005 at 18:39, Sam Lee wrote: ok, I am implementing a google adsense/adwords-like system. For examples, the website has keywords "nike red shoe", so it can match text ad with keywords "nike shoe -blue". Of course, I can always use the text ad k

Relevance Feedback

2005-09-19 Thread Gusenbauer Stefan
Does anyone have experience with relevance feedback and lucene, or know some good websites? thx stefan

Re: cancel search

2005-09-08 Thread Gusenbauer Stefan
>>> http://www.altheim.com/lit/robnworm.html I've had such a long-lasting search too. It sounds good to start the search in another thread. I've done this for the indexing procedure: it is started in another thread and the GUI is informed when indexing has been performed. If the user wants to stop it, he has to click on a stop button, and then an event is sent to the indexer thread. The indexer thread stops when it reaches a safe point. Surely this is for indexing, but I think this would work for searching also. stefan
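The pattern described here can be sketched with `java.util.concurrent`. The work runs on a background thread, and the GUI requests a stop that the worker honours only at safe points. `CancellableWorker`, `runThenStop`, and the `Thread.sleep(1)` stand-in for one unit of work are hypothetical. With a real search, the safe points would have to be places where the searching code can check the flag, e.g. between documents in a collector.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a background task that stops cooperatively at safe points.
final class CancellableWorker {

    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    // Called from another thread, e.g. the GUI's "stop" button handler.
    void requestStop() {
        stopRequested.set(true);
    }

    // Processes work units and checks the flag only between units (the safe points).
    // Returns the number of units completed.
    int run(int totalUnits) throws InterruptedException {
        int done = 0;
        for (int i = 0; i < totalUnits; i++) {
            if (stopRequested.get()) {
                break; // safe point: no unit is left half-finished
            }
            Thread.sleep(1); // stand-in for one unit of work (e.g. one document)
            done++;
        }
        return done;
    }

    // Usage demo: start the worker on a background thread, then ask it to stop.
    static int runThenStop(int totalUnits, long stopAfterMillis) {
        CancellableWorker worker = new CancellableWorker();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> worker.run(totalUnits);
            Future<Integer> result = executor.submit(task);
            Thread.sleep(stopAfterMillis);
            worker.requestStop();
            return result.get(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```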

aslib cranfield test collection

2005-09-07 Thread Gusenbauer Stefan
Sorry for that off-topic message, but does anyone have experience with the aslib cranfield test collection or know where I can get it? thanks in advance stefan

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Gusenbauer Stefan
Olivier Jaquemet wrote: > Gusenbauer Stefan wrote: > >> I think nutch uses ngramj for language classification but i don't know >> what type of saving language information they use. In our application >> for example i save the language in an extra field in the

Re: Multiple Language Indexing and Searching

2005-09-06 Thread Gusenbauer Stefan
annot find any discussion about multiple language on the ML archive. I even did Google! :-) Or maybe I was giving the keywords in the wrong language? :-)

Re: setBoost(float) in org.apache.lucene.document.Field cannot be applied to (double)???

2005-08-08 Thread Stefan Groschupf
this field was indexed." HTH Stefan On 07.08.2005 at 19:05, Riccardo Daviddi wrote: I don't know where I am wrong... I just do this: IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), !IndexReader.indexEx

Re: OutOfMemory when indexing

2005-06-13 Thread Gusenbauer Stefan
he garbage collector. Then I do some reinitializing and continue indexing. Looks easy but it wasn't. How do I check if I will run out of memory? The Runtime class and its methods for getting information about the free memory were very unreliable. Therefore I changed to Java 1.5 and implemented a memory notification listener, which is supported by the java.lang.management package. There you can adjust a threshold at which you should be informed. After the notification I perform a "save". Hope this will help you Stefan
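The `java.lang.management` approach described above can be sketched as follows. Set a usage threshold on each heap memory pool that supports one, then register a listener on the platform MemoryMXBean. The MemoryMXBean emits a `MEMORY_THRESHOLD_EXCEEDED` notification when a pool crosses its threshold. The 0.8 fraction, the `LowMemoryWatcher` name and the `onLowMemory` callback are assumptions. What the callback does, e.g. save the current state and reinitialize, is up to the application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

// Hypothetical sketch: get notified when heap usage crosses a fraction of the maximum.
final class LowMemoryWatcher {

    // Returns how many heap pools received a usage threshold.
    static int install(double fraction, Runnable onLowMemory) {
        int configured = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()
                    || usage == null || usage.getMax() <= 0) {
                continue; // skip non-heap pools and pools without a defined maximum
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            configured++;
        }
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                onLowMemory.run(); // e.g. save the current state, then reinitialize
            }
        };
        emitter.addNotificationListener(listener, null, null);
        return configured;
    }
}
```

The listener is invoked on a JMX notification thread rather than the indexing thread. The callback should therefore only signal the indexer, e.g. by setting a flag, rather than touch the IndexWriter directly.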

Re: Confused about non-tokenized fields

2005-05-27 Thread Gusenbauer Stefan
Erik Hatcher wrote: > > On May 27, 2005, at 12:14 PM, Gusenbauer Stefan wrote: > >> Max Pfingsthorn wrote: >> >> >>> Hi! >>> >>> Thanks for the reply. I figured already that fields are actually >>> not tokenized... I lost trac

Re: Confused about non-tokenized fields

2005-05-27 Thread Gusenbauer Stefan
>in the field are not... I guess I have to do that manually during indexing? Or >is there some nicer way? > > I think this is not a problem. This should be done automatically when you make a case-insensitive search so that you don't have to think about it. If it should become a

Re: Confused about non-tokenized fields

2005-05-27 Thread Gusenbauer Stefan
Max Pfingsthorn wrote: >Hi! > >In my application, I index some strings (like filenames) untokenized, meaning >via > >doc.add(new Field(FIELD,VALUE,false,true,false)); > >When I later take a look at it with Luke, I still get tokens of the filenames >(like "news" instead of "news-item.xml") in the

Re: URL search causes BooleanQuery TooManyClauses Excp

2005-05-23 Thread Stefan Groschupf
Andrew, the solution for RangeQueries will work for WildcardQueries as well. See: http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-06fafb5d19e786a50fb3dfb8821a6af9f37aa831 HTH Stefan On 23.05.2005 at 21:26, Andrew Boyd wrote: Hi All, I have an index with 4811 documents each of which

Re: Compass 0.4 Released

2005-05-03 Thread Gusenbauer Stefan
It would be very interesting to hear/read more about how compass::semantics will be realized. I don't mean the code, I mean the models behind it (like rdf ont

Re: Multiple field search problem

2005-04-25 Thread Gusenbauer Stefan
erharps you've only put them into quotation marks like this " If you instantiate the QueryParser you can specify which Boolean operator should be used if no operator is given. With query.toString() you get the query in a human-readable format, with + for AND and so on. stefan

Re: finding all docs with field.

2005-04-17 Thread Gusenbauer Stefan
Peter Veentjer - Anchor Men wrote: >How can I find all documents with a field (the value doesn`t matter). > >I have tried: >Query query = new TermQuery(new Term(AbstractBaseDoc.FIELD_INDEX_ERROR,"")); > > >But this never finds results. The field with name FIELD_INDEX_ERROR has been >of type U

Re: NoSuchMethodError

2005-04-10 Thread Gusenbauer Stefan
Try to check the modifiers of the main method and of the Document1 method; they should be static, I think. By the way, a better approach would be to rename the Document1 method to Document (over
