On line 114, in the init() method, a ClassicTokenizerImpl object is created,
but the constructor is passed a parameter called input. Where does this
variable come from? It doesn't seem to be declared anywhere in the Java file.
Hi all,
I am trying to get the demo for Lucene to run but I am running into a
problem. When I try to run the IndexFiles command through the command prompt
I get an ArrayIndexOutOfBoundsException at
org.apache.lucene.demo.IndexFiles.main. The line that I put into the prompt
is
java -classpath
C:\Use
Let's say I have the query
(nacho OR foo OR bar)
and some documents (single field with norms off)
doc a: nacho nacho nacho nacho
doc b: foo bar bar
doc c: nacho foo bar
I'm interested in all of these documents but I would like c to score the
highest since it contains all of the search terms, b to
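A minimal sketch of that query in Lucene's 3.x API (field name "f" assumed). With norms off, the coord() factor of DefaultSimilarity is what rewards a document for matching more of the optional clauses, though raw tf can still favor doc a, so a custom Similarity may be needed to guarantee this ordering:

    BooleanQuery q = new BooleanQuery();
    q.add(new TermQuery(new Term("f", "nacho")), BooleanClause.Occur.SHOULD);
    q.add(new TermQuery(new Term("f", "foo")), BooleanClause.Occur.SHOULD);
    q.add(new TermQuery(new Term("f", "bar")), BooleanClause.Occur.SHOULD);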
On 4/11/2011 1:47 AM, Chris Mantle wrote:
> Hi, I’m having some trouble with Lucene at the moment. I have a number of
> unique identifiers that I need to search through. They’re in many different
> forms, eg. “M”, “MO”, “:MOFB”, “FH..L-O”, etc. All I need to do is an exact
> prefix search: at th
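A sketch of the usual approach (Lucene 3.x; field name assumed): index the identifiers untokenized so the punctuation survives analysis, then use a PrefixQuery, whose term text is never analyzed:

    // index time: NOT_ANALYZED keeps ":" and "." intact as one term
    doc.add(new Field("id", ":MOFB", Field.Store.YES, Field.Index.NOT_ANALYZED));

    // search time: exact prefix match against the raw term
    Query q = new PrefixQuery(new Term("id", ":MO"));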
I know that it's best practice to reuse the Document object when
indexing, but I'm curious how multi-valued fields affect this. I tried
this before indexing each document:
doc.removeFields(myMultiValuedField);
for (String fieldName: fieldNames) {
Field field = doc.getField(fieldName);
if (null != f
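A minimal sketch of the reuse pattern for a multi-valued field (field and variable names assumed):

    // drop every value added on the previous pass, then add this round's values
    doc.removeFields("keywords");
    for (String value : currentValues) {
        doc.add(new Field("keywords", value, Field.Store.YES, Field.Index.ANALYZED));
    }
    writer.addDocument(doc);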
I need to add synonyms to an index depending on the field being indexed.
I know that TokenFilter is not "field aware", but is there a good way to
get at the field or do I need to add something to allow my Analyzer to
tell the TokenFilter which field is currently being examined?
Thanks,
-Chris
---
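One way around TokenFilter not being field aware is to make the decision in the Analyzer, which does receive the field name — a sketch (Lucene 3.x; MySynonymFilter is a hypothetical stand-in for the synonym filter):

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.util.Version;

    public class PerFieldSynonymAnalyzer extends Analyzer {
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream ts = new StandardTokenizer(Version.LUCENE_30, reader);
            // only the fields that need synonyms get the extra filter
            if ("body".equals(fieldName)) {
                ts = new MySynonymFilter(ts); // hypothetical synonym filter
            }
            return ts;
        }
    }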
I see in the JavaDoc for IndexWriterConfig that:
"Note that IndexWriter makes a private clone; if you need to
subsequently change settings use IndexWriter.getConfig()."
However when I attempt to use the same IndexWriterConfig to create
multiple IndexWriters the following exception is thrown:
org.
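A sketch of the workaround, assuming Lucene 3.1+: since IndexWriter clones its config anyway, just build a fresh IndexWriterConfig per writer — construction is cheap:

    IndexWriter w1 = new IndexWriter(dir1,
        new IndexWriterConfig(Version.LUCENE_31, analyzer));
    IndexWriter w2 = new IndexWriter(dir2,
        new IndexWriterConfig(Version.LUCENE_31, analyzer));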
>> Ideally I'd like to have the parser use the
>> custom analyzer for everything unless it's going to parse a clause into
>> a PhraseQuery or a MultiPhraseQuery, in which case it uses the
>> SimpleAnalyzer and looks in the _exact field - but I can't figure out
>> the best way to accomplish this.
>
I have Lucene indexes built using a shingled, stemmed custom analyzer.
I have a new requirement that exact searches match correctly.
ie: bar AND "nachos"
will only fetch results with plural nachos. Right now, with the
stemming, singular nacho results are returned as well. I realize that
I'm going t
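One common setup, sketched with assumed field names: index the text twice — once through the shingled/stemmed analyzer, once into an _exact field with a minimal analyzer — and route quoted queries to the exact field:

    // customStemmingAnalyzer is the existing analyzer; both names assumed
    PerFieldAnalyzerWrapper analyzer =
        new PerFieldAnalyzerWrapper(customStemmingAnalyzer);
    analyzer.addAnalyzer("content_exact", new SimpleAnalyzer());
    // at index time add the same text to "content" and "content_exact";
    // quoted/phrase queries then go against content_exact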
I'm trying to use the QueryParser in 3.0.2 to make "foo and bar" (with
the quotes) return documents with the exact phrase "foo and bar". When I
run it through the QueryParser (with a StandardAnalyzer) I end up with
"foo ? bar", which doesn't match the documents in the index. I know that
"and" is a
On Tue, Jan 18, 2011 at 3:04 PM, Grant Ingersoll wrote:
>
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from sourc
't have an equals or hashCode method.
Suggestions? I've worked around it by registering a class
based builder, checking for the field name and either
delegating to the original builder or doing my custom
processing, but it's a little awkward.
-cks
--
Chr
I'm curious about embedding extra information in an index (and being able to
search the extra information as well). In this case certain tokens correspond
to recognized entities with ids. I'd like to get the ids into the index so that
searching for the id of the entity will also return that docu
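One low-tech sketch (field names and id scheme assumed): carry the ids in a parallel field on the same document, so searching for an id returns the doc. Injecting the id as a position-increment-0 token inside the body stream, synonym style, is the richer option — see the filter sketch further down:

    doc.add(new Field("body", text, Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("entity_ids", "E1234 E5678",
        Field.Store.NO, Field.Index.ANALYZED));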
> built into lucene. There is in solr, and lucene can use directories
> on networked file systems.
>
>
> --
> Ian.
>
>
> On Fri, Sep 17, 2010 at 6:08 PM, Christopher Gross wrote:
>> I'm trying to connect to a Lucene index on a test server. All of the
> >
I'm trying to connect to a Lucene index on a test server. All of the
examples that I've found use a local directory to connect into the
Lucene index, but I can't find one that will remotely hook into it.
Can someone please point me in the right direction? I'm fairly
certain that someone has run
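Lucene itself has no remote protocol; the usual sketch is either a network mount opened like any local directory (path assumed), or a server such as Solr in front of the index:

    // works if the test server exports the index directory over NFS/SMB
    Directory dir = FSDirectory.open(new File("/mnt/testserver/index"));
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir, true));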
> > > I heard work is being done on re-writing MultiPassIndexSplitter so it
> > > will be a single pass and work quicker.
> > Because that was so slow I just wrote a utility class to create a list of N
> > IndexWriters and round robin documents to them as the index is created.
> > Then we use a Pa
> I heard work is being done on re-writing MultiPassIndexSplitter so it will be
> a
> single pass and work quicker.
Because that was so slow I just wrote a utility class to create a list of N
IndexWriters and round robin documents to them as the index is created. Then we
use a ParallelMultiSear
> [Toke: No frequent updates]
>
> So everything is rebuilt from scratch each time? Or do you mean that you're
> only adding new documents, not changing old ones?
Everything is reindexed from scratch - indexing speed is not essential to us...
> Either way, optimizing to a single 140GB segment is
Hi Toke-
> > * 20 million documents [...]
> > * 140GB total index size
> > * Optimized into a single segment
>
> I take it that you do not have frequent updates? Have you tried to see if you
> can get by with more segments without significant slowdown?
Correct - in fact there are no updates and n
We're getting up there in terms of corpus size for our Lucene indexing
application:
* 20 million documents
* all fields need to be stored
* 10 short fields / document
* 1 long free text field / document (analyzed with a custom shingle-based
analyzer)
* 140GB total index size
* Optimized into a s
Hi Larry-
> Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having
> problems with stemming. Does anyone have a recommendation for other
> text analyzers that handle stemming and also keep capitalization, stop words,
> and punctuation?
Have you tried the SnowballFilter? You co
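A sketch of what that chain could look like inside a custom Analyzer (contrib snowball jar; stemmer name assumed). One caveat: most stemmers expect lowercased input, so keeping capitalization may degrade stemming of capitalized words:

    public TokenStream tokenStream(String fieldName, Reader reader) {
        // no lowercasing, no stop removal: split on whitespace, then stem
        return new SnowballFilter(new WhitespaceTokenizer(reader), "English");
    }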
> It looks good to me, but I did not test, when testing, we may print out both
>
> initialQuery.toString() // query produced by QueryParser
> finalQuery.toString() // query after your new function
>
> as comparison, besides testing the query result.
Yes - it's exactly what I wanted:
Test Input
> 2) if I have to accept whole input string with all logic (AND, OR, ..) inside,
> I think it is easier to change TermQuery afterwards than parsing the string,
> since final result from query parser should be BooleanQuery (in your example),
> then we iterate through each BooleanClause
Hi Lisheng-
>> On a small index that I have I'd like to query certain fields by adding
>> wildcards
>> on either side of the term: foo -> *foo*. I realize the performance
>> implications but there are some cases where these terms are crammed
>> together in the indexed content (ie foonacho) and I
On a small index that I have I'd like to query certain fields by adding
wildcards on either side of the term: foo -> *foo*. I realize the performance
implications but there are some cases where these terms are crammed together in
the indexed content (ie foonacho) and I need to be able to return
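A sketch of both routes (Lucene 3.x; field name assumed): build the WildcardQuery directly, or enable leading wildcards on the parser, which rejects them by default because of the cost:

    // direct construction: the term text is not analyzed
    Query q = new WildcardQuery(new Term("content", "*foo*"));

    // or via QueryParser, which refuses leading wildcards unless told otherwise
    QueryParser qp = new QueryParser(Version.LUCENE_30, "content", analyzer);
    qp.setAllowLeadingWildcard(true);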
This code is in fact working. I had an error in my test case. Things seem
to work as advertised.
sorry / thanks -
C>T>
On Fri, Apr 2, 2010 at 10:20 AM, Christopher Tignor wrote:
> Hello,
>
> I'm having a hard time implementing / understanding a very simple custom
> s
public float coord(int overlap, int maxOverlap) {
    return 1f / (float) maxOverlap;
}
@Override
public float idf(int docFreq, int numDocs) {
    return 1f;
}
@Override
public float sloppyFreq(int distance) {
    return 0f;
}
}
--
TH!NKMAP
Christopher Tignor | Senior
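For a custom Similarity like the one above to take effect, it has to be installed on both the indexing and the searching side — a usage sketch (3.0-era setters; class name assumed; newer releases move the writer side onto IndexWriterConfig):

    Similarity sim = new MySimilarity();
    indexWriter.setSimilarity(sim);   // consulted when norms are written
    indexSearcher.setSimilarity(sim); // consulted when scoring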
On Mon, Mar 8, 2010 at 7:52 PM, Michael McCandless
wrote:
> This was done for performance (to remove alloc/init/GC load).
>
> There are two parts to it -- first, consolidating what used to be lots
> of little objects into shared byte[]/int[] blocks. Second, reusing
> those blocks.
Thanks, just o
Hi all,
I'm not sure if this is the right list, as it's sort of a development
question too, but I don't want to bother them over there. Anyway, I'm
curious as to the reason for using "manual memory management" a la
ByteBlockPool and consorts in Java. Is it for performance reasons
alone, to avoid t
rm
> vectors.
>
> Mike
>
> On Fri, Feb 26, 2010 at 7:42 PM, Christopher Condit
> wrote:
> >> Payload Data is accessed through PayloadSpans so using SpanQueries is the
> >> entry point it seems. There are tools like PayloadSpanUtil that convert other
>
> It sounds like you need to iterate through all terms sequentially in a given
> field in the doc, accessing offset & payload? In which case reanalyzing at
> search time may be the best way to go.
If it matters it doesn't need to be sequential. I just need access to all the
payloads for a given
> Payload Data is accessed through PayloadSpans so using SpanQueries is the
> entry point it seems. There are tools like PayloadSpanUtil that convert other
> queries into SpanQueries for this purpose if needed, but the bottom line is
> that the API for Payloads looks like it goes through Spans.
So t
Hi Chris-
> To my knowledge, the character position of the tokens is not preserved by
> Lucene - only the ordinal position of tokens within a document / field is
> preserved. Thus you need to store this character offset information
> separately, say, as Payload data.
Thanks for the information. S
, 2010 at 3:41 PM, Christopher Condit wrote:
> I'm trying to store semantic information in payloads at index time. I
> believe this part is successful - but I'm having trouble getting access to
> the payload locations after the index is created. I'd like to know the
>
I'm trying to store semantic information in payloads at index time. I believe
this part is successful - but I'm having trouble getting access to the payload
locations after the index is created. I'd like to know the offset in the
original text for the token with the payload - and get this inform
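The suggestion elsewhere in the thread is to stash the character offset in the payload at index time. A sketch of such a filter (Lucene 2.9+ attributes API):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
    import org.apache.lucene.index.Payload;

    public final class OffsetPayloadFilter extends TokenFilter {
        private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
        private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

        public OffsetPayloadFilter(TokenStream input) { super(input); }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) return false;
            // 4-byte start offset, recoverable from the payload at search time
            byte[] bytes = ByteBuffer.allocate(4).putInt(offsetAtt.startOffset()).array();
            payloadAtt.setPayload(new Payload(bytes));
            return true;
        }
    }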
clude the shorter one and get weeded out.
thanks -
C>T>
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
The Snowball Analyzer works well for certain constructs but not others. In
particular I'm having a problem with things like "colossal" vs "colossus" and
"hippocampus" vs "hippocampal".
Is there a way to customize the analyzer to include these rules?
Thanks,
-Chris
---
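If the goal is just to stop Snowball from conflating those pairs, one sketch (Lucene 3.1+; word list assumed) is to mark them as keywords so the stemmer leaves them untouched:

    Set<String> protectedWords = new HashSet<String>(Arrays.asList(
        "colossal", "colossus", "hippocampus", "hippocampal"));
    TokenStream ts = new SnowballFilter(
        new KeywordMarkerFilter(
            new LowerCaseTokenizer(Version.LUCENE_31, reader), protectedWords),
        "English");

This prevents the wrong conflation rather than adding the right one; mapping colossal and colossus to a shared stem would need a custom filter or a synonym layer.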
ck Erickson wrote:
> Hmmm, are they unit tests? Or would you be willing to create stand-alone
> unit tests demonstrating this and submit it as a patch?
>
> Best
> er...@alwaystrollingforworkfromothers.opportunistic.
>
> On Wed, Nov 25, 2009 at 5:38 PM, Christopher Tignor wrote:
my own tests with my own data show you are correct and the 1-n slop works
for matching terms at the same ordinal position.
thanks!
C>T>
On Wed, Nov 25, 2009 at 4:25 PM, Paul Elschot wrote:
> On Wednesday 25 November 2009 21:20:33, Christopher Tignor wrote:
> > It's worth
It's worth noting however that this -1 slop doesn't seem to work for cases
where you want to discover instances of more than two terms at the same
position. Would be nice to be able to explicitly set this in the query
construction.
thanks,
C>T>
On Tue, Nov 24, 2009 at 9:17
Collections.emptyList();
}
};
}
}
thanks,
C>T>
On Wed, Nov 25, 2009 at 8:10 AM, Grant Ingersoll wrote:
>
> On Nov 24, 2009, at 9:56 AM, Christophe
glance I'm not sure how to correlate the payload with
> the span match using NSU, nor why they're different.
>
sure why and am tracing through the code, looking at NearSpansUnordered.
Any thoughts?
thanks so much,
C>T>
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
yes that indeed works for me.
thanks,
C>T>
On Mon, Nov 23, 2009 at 5:50 PM, Paul Elschot wrote:
> On Monday 23 November 2009 20:07:58, Christopher Tignor wrote:
> > Also, I noticed that with the above edit to NearSpansOrdered I am getting
> > erroneous results fo no
as a valid in-order result now that the equal-to clause
has been added to the inequality.
C>T>
On Mon, Nov 23, 2009 at 12:26 PM, Christopher Tignor
wrote:
> Thanks so much for this.
>
> Using an un-ordered query, the -1 slop indeed returns the correct results,
> matching t
iling, returning no results.
C>T>
On Mon, Nov 23, 2009 at 11:59 AM, Mark Miller wrote:
> You're trying -1 with ordered, right? Try it with non-ordered.
>
> Christopher Tignor wrote:
> > A slop of -1 doesn't work either. I get no results returned.
> >
> > this wo
p of -1, but one could try
> that for both the ordered and unordered cases.
> One way to do that is to start from the existing test cases.
>
> Regards,
> Paul Elschot
>
> >
> > Regards,
> > Adriano Crestani
> >
> > On Thu, Nov 19, 2009 at 7:28 PM, Chr
er
way.
C>T>
On Sat, Nov 21, 2009 at 10:47 PM, Adriano Crestani <
adrianocrest...@gmail.com> wrote:
> Hi,
>
> I didn't test, but you might want to try SpanNearQuery and set slop to
> zero.
> Give it a try and let me know if it worked.
>
> Regards,
t into Spans first
which do not support searching for Terms at the same document position?
Any help appreciated.
thanks,
C>T>
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
ssing your intent is simply tacking on the part of speech
> marker to the words you care about (e.g. report_n when you wanted
> report as a noun). No phrases or slop required, at the expense of
> more terms.
>
> H, if you wanted to, say, "find all the nouns in the index", y
of synonyms. That is, index
> report and report_n (note no space) at the same location. Then, when
> you wanted to create a part-of-speech-aware query, you'd attach the
> various markers to your terms (_n, _v, _adj, _adv etc.) and not have to
> worry about unexpected side-effects.
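A sketch of a filter that does this same-position injection (Lucene 3.1+ attributes API; the tag lookup is a stand-in for a real POS tagger):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
    import org.apache.lucene.util.AttributeSource;

    public final class TagSynonymFilter extends TokenFilter {
        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final PositionIncrementAttribute posIncrAtt =
            addAttribute(PositionIncrementAttribute.class);
        private AttributeSource.State pending;
        private String pendingTag;

        public TagSynonymFilter(TokenStream input) { super(input); }

        @Override
        public boolean incrementToken() throws IOException {
            if (pending != null) {
                restoreState(pending);              // replay "report"...
                termAtt.append(pendingTag);         // ...as "report_n"
                posIncrAtt.setPositionIncrement(0); // at the same position
                pending = null;
                return true;
            }
            if (!input.incrementToken()) return false;
            pendingTag = tagFor(termAtt.toString()); // hypothetical lookup
            if (pendingTag != null) pending = captureState();
            return true;
        }

        private String tagFor(String term) { return null; } // stub: plug in a tagger
    }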
t. This will allow you to enumerate all terms that match
> your wildcard term.
> Is that what you are asking for?
>
> simon
>
> On Wed, Nov 18, 2009 at 10:39 PM, Christopher Tignor
> wrote:
> > Hello,
> >
> > Firstly, thanks for all the good answers and suppo
e sort of identifier that describes
the words as having to be at the same location - like a null slop or
something.
Any thoughts on how to do this?
thanks so much,
C>T>
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
"my*" and get a collection
of document ids that match this search,
is there a good way to determine whether this query found "myopic", "mylar"
or some other term without loading/searching the returned documents?
thanks!
C>T>
--
TH!NKMAP
Christopher Tignor | Senior
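One sketch that avoids loading documents (Lucene 3.x TermEnum; field name assumed): walk the term dictionary under the prefix — those are exactly the terms the wildcard expanded to. Tying a specific term to a specific hit would still need TermDocs or a re-check per document:

    TermEnum te = reader.terms(new Term("body", "my"));
    try {
        do {
            Term t = te.term();
            if (t == null || !"body".equals(t.field()) || !t.text().startsWith("my"))
                break;
            System.out.println(t.text()); // e.g. "mylar", "myopic"
        } while (te.next());
    } finally {
        te.close();
    }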
cter offset info used for?
thanks so much,
C>T>
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
et info used for?
thanks so much,
C>T>
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
ng this:
>
>http://issues.apache.org/jira/browse/LUCENE-1573
>
> (which is fixed in 2.9).
>
> Mike
>
> On Fri, Oct 16, 2009 at 6:01 PM, Christopher Tignor
> wrote:
> > I discovered the problem and fixed its effect on my code:
> >
> > Using the sou
e it, i.e. Future.cancel(false)
thanks,
C>T>
On Fri, Oct 16, 2009 at 4:44 PM, Christopher Tignor wrote:
> Indeed it looks like the thread running MergerThread started (after passing
> off to ConcurrentMergeScheduler) by the thread calling IndexWriter.optimize()
> is indeed waiting o
final int size = mergeExceptions.size();
for(int i=0;i<size;i++)
C>T>
On Fri, Oct 16, 2009 at 4:11 PM, Christopher Tignor wrote:
> After tracing through the lucene source more it seems that what is
> happening is after I call Future.cancel(true) on my parent thread,
> optimize() is called and this
looks like a normal "optimize is waiting for the
> background merges to complete". Is it possible your background merges
> are hitting exceptions? You should see them on your error console if
> so...
>
> Mike
>
> On Fri, Oct 16, 2009 at 3:17 PM, Christopher
t possible to get the stack trace of the thrown exception when the
> thread was interrupted? Maybe indeed something in IW isn't cleaning
> up its state on being interrupted.
>
> Mike
>
> On Fri, Oct 16, 2009 at 1:43 PM, Christopher Tignor
> wrote:
> > thanks for getting bac
e 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Christopher Tignor [mailto:ctig...@thinkmap.com]
> > Sent: Friday, October 16, 2009 6:50 PM
> > To: java-user
> > Subject: IndexWriter optimize() deadloc
issing here?
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
>
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
mple.
But alas, I cannot seem to get access to any TermPositions from my above
BooleanQuery.
I have looked into the contributed SpanExtractorClass but
ConstantScoreRangeQuery seems unsupported
and I am at a loss as to how best to use Spans here.
Any help appreciated,
C>T>
--
TH!NKMAP
Hi Anshum-
> You might want to look at writing a custom analyzer or something and
> add a
> document boost (while indexing) for documents containing those terms.
Do you know how to access the document from an analyzer? It seems to only have
access to the field...
Thanks,
-Chris
---
e-
> From: Christopher Condit [mailto:con...@sdsc.edu]
> Sent: Tuesday, July 21, 2009 2:48 PM
> To: java-user@lucene.apache.org
> Subject: Analysis Question
>
> I'm trying to implement an analyzer that will compute a score based on
> vocabulary terms in the indexed content
I'm trying to implement an analyzer that will compute a score based on
vocabulary terms in the indexed content (ie a document field with more terms in
the vocabulary will score higher). Although I can see the tokens I can't seem
to access the document from the analyzer to set a new field on it a
Thanks!
Chris
______
Christopher Collins \ http://www.cs.utoronto.ca/~ccollins
Department of Computer Science \ University of Toronto
Collaborative User Experience Group \ IBM Research
Please accept my sincere apologies: I was reading the Javadoc of an old
version.
Christopher
__
Christopher Collins \ http://www.cs.utoronto.ca/~ccollins
Department of Computer Science \ University of Toronto
Collaborative User
clarify it.
Thanks,
Chris
______
Christopher Collins \ http://www.cs.utoronto.ca/~ccollins
Department of Computer Science \ University of Toronto
Collaborative User Experience Group \ IBM Research
"Lacing", etc. It's as if the regex is treated as a
"find()" and not a "match()" in Java. Is there a way to make it behave
like a full match, and not a prefix regex?
Thanks!
Christopher
__
Christop
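The find-versus-match distinction in plain java.util.regex, for reference — a full match must consume the whole input, so if the query layer evaluates with find()/lookingAt() semantics, anchoring the expression (e.g. a trailing $) may approximate a full match:

    Pattern p = Pattern.compile("Lac");
    p.matcher("Lacing").find();    // true  - substring/prefix semantics
    p.matcher("Lacing").matches(); // false - whole-input semantics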
ly it will
happen this fall and we won't have to do it this way anymore.
Appreciate all the help Mike, we got an OK from the customer to wait until
the fall and hopefully a move to linux. So I'll leave it be for now, though
not perfect, at least it works the way it did before I started to
cene");
> }
>
> ctx.setAttribute(FelleskatalogenStartupServlet.SEARCH, new
> Search(getRoot(request) + "/lucene"));
> ctx.setAttribute(FelleskatalogenStartupServlet.SEARCHACTIVE,
> new Boolean(true));
>
BR,
Christopher
On Thu, Jul 10, 2008 at 4:2
ctory
from subversion without having to disable the IndexSearcher?
BR
Christopher
--
Regards,
Christopher Kolstad
=
|100 little bugs in the code, debug one, |
|recompile, 101 little bugs in the code |
=
E-mail: [EMAIL PROTECTED] (University)
[
have to set up? Can you give info on
what commands you ran? I have never used GData, but this error looks
like it is trying to configure something and it is not getting the
class it expects.
-Grant
On Nov 15, 2007, at 7:46 PM, Lyth, Christopher [USA] wrote:
> Nov 15, 2007 7:40:39
Nov 15, 2007 7:40:39 PM
org.apache.lucene.gdata.server.registry.GDataServerRegistry
registerScopeVisitor
INFO: Register scope visitor -- class
org.apache.lucene.gdata.server.registry.ProvidedServiceConfig
Nov 15, 2007 7:40:39 PM org.apache.commons.digester.Digester endElement
SEVERE: End event thre
Is anyone on this list using the gdata server? I have been trying to get
it working and have been running into some problems.
I've encountered a very vexing problem with Lucene 2.0.0. I am able
to create and search an index, but if I attempt to get a document out of
the index, an IO exception is thrown. The type of exception depends
on the size of the index. If the index is very small, say fewer than
10 documents I do no
kberry
Date: Wed, 14 Sep 2005 16:59:21 +0200
christopher may wrote:
MOBIC has an article listing as follows: "Finally a search engine is
available, thanks to the Apache Lucene team." So I am to assume that this
does include blackberry devices. If you're not familiar with mobic chec
913
From: Andrzej Bialecki <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Blackberry
Date: Wed, 14 Sep 2005 16:45:13 +0200
christopher may wrote:
Can anyone tell me how I can load lucene on to a blackberry. I would
really like to load i
Can anyone tell me how I can load lucene onto a blackberry. I would really
like to load it into their JDE and run it on the simulator, so as much help as
you can provide would be greatly appreciated. Thanks
Hey all I am new to this so bear with me. I am looking to put the open
source app Lucene onto a Blackberry. How can I load the code into
Blackberry's jde so I can run it on its simulator. As many steps as
someone could provide would be greatly appreciated. Thanks all
--
a-user@lucene.apache.org
Subject: Re: Excel Spreadsheet
Date: Mon, 8 Aug 2005 10:58:55 -0400
On Aug 8, 2005, at 10:30 AM, christopher may wrote:
I have a spreadsheet with the first cell being the term and the next
cells hold the description. Is there a way I can build this into the
index easily or i
Term
sided by its definition. Thanks, hope you can help.
From: Erik Hatcher <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Excel Spreadsheet
Date: Mon, 8 Aug 2005 10:58:55 -0400
On Aug 8, 2005, at 10:30 AM, christopher may wrote:
I
I have a spreadsheet with the first cell being the term and the next cells
hold the description. Is there a way I can build this into the index easily
or is this going to take a custom analyzer ? Any help or ideas would be
greatly appreciated. Thanks
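The structure itself needs no custom analyzer — a sketch with assumed field and variable names (cells read via a spreadsheet library such as POI): one row becomes one Document:

    // first cell is the term, remaining cells concatenated as the description
    Document doc = new Document();
    doc.add(new Field("term", termCell, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("description", descriptionText,
        Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);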
-
a-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: New line
Date: Tue, 19 Jul 2005 10:15:15 -0700 (PDT)
I may be misunderstanding you, but \n is the "newline" character.
http://www.google.com/search?q=newline%20character%20java
Otis
--- christopher may <[EMAIL PROTECTED
I am using text files in my index. What can be used as the new line
character? Say I have
A batch of apples Apples . So the doc is returned as Apples and the
summary is A batch of apples. If I want to then on the next line of the file
put A state out west Arizona. This all blends together. W
How can I personalize the summary results? Where and how does Lucene
retrieve this data.
The source location would be great to know but any help would be
appreciated. Thanks
Is there a simple way for me to add a browse by letter setup on lucene's
main page. If anybody knows of any documents on this I would greatly
appreciate it, Thanks
I was able to use the current
lucene without modification.
christopher may wrote:
Hey all I am working on a project that requires a search engine on a
embedded linux that is also bluetooth capable. Is there a lucene mobile or
can I recompile the code in the J2me wireless toolkit ? Any help wo
Hey all I am working on a project that requires a search engine on a
embedded linux that is also bluetooth capable. Is there a lucene mobile or
can I recompile the code in the J2me wireless toolkit ? Any help would be
appreciated, Thanks
--