Hi,
This is a kind of followup to a thread a couple of weeks ago.
In my indexer, I want to prepend a string to certain terms to make them easier
to search. So, for example, if I have a string "XXX", I want to add, say,
"field1" to it, to get "field1XXX", before I index it.
To make it easier to s
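Here is a minimal sketch of the prefixing step. It assumes the terms in the String are separated by whitespace. PrefixTerms and prefixAll() are hypothetical names. The resulting terms could be joined back with String.join(" ", ...) before being passed to doc.add() as usual.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixTerms {
    // Hypothetical helper: prepends a field-specific prefix (e.g. "field1")
    // to every whitespace-separated term before the text is indexed.
    public static List<String> prefixAll(String prefix, String text) {
        List<String> out = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {          // a blank input yields one empty token; skip it
                out.add(prefix + term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [field1XXX, field1YYY]
        System.out.println(prefixAll("field1", "XXX YYY"));
    }
}
```

The same prefix has to be added to the query terms at search time. Also, if the field is analyzed with StandardAnalyzer, the stored terms end up lowercased (e.g., "field1xxx"), so the query side needs to go through the same analyzer.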
Paul Cowan wrote:
> oh...@cox.net wrote:
> > - I'd have to create a (very small) index, for each sub-document, where I
> > do the Document.add() with just the (for example) two terms, then
> > - Run a query against the 1-entry index, which
> > - Would either give me a "yes" or "no" (for th
Paul Cowan wrote:
> oh...@cox.net wrote:
> > Document1 subdoc1 term1 term2
> > subdoc2 term1a term2a
> > subdoc3 term1b term2b
> >
> > However, I've now been asked to implement the ability to query t
Hi,
I guess that, in short, what I'm really trying to find out is:
If I construct a Lucene query, can I (somehow) use that to query a String
object that I have, rather than querying against a Lucene index?
Thanks,
Jim
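For a yes/no answer against a single String, Lucene's contrib MemoryIndex class (contrib/memory) is intended for this kind of one-document matching. It may avoid building a tiny index per sub-document. The plain-Java sketch below shows the idea without that dependency, for the simple case of an AND of terms. The whitespace tokenizer is a crude stand-in, not StandardAnalyzer, and StringMatcher, tokens() and matchesAll() are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class StringMatcher {
    // Crude stand-in for an analyzer (NOT StandardAnalyzer): lowercase, split on whitespace.
    public static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>(Arrays.asList(
                text.trim().toLowerCase(Locale.ROOT).split("\\s+")));
        result.remove("");                  // a blank input yields one empty token
        return result;
    }

    // True if every required term occurs in the text: roughly what a BooleanQuery
    // made only of required (MUST) TermQuery clauses asks of a single document.
    public static boolean matchesAll(String text, String... requiredTerms) {
        Set<String> toks = tokens(text);
        for (String term : requiredTerms) {
            if (!toks.contains(term.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("term1a term2a", "term1a", "term2a")); // true
        System.out.println(matchesAll("term1b term2b", "term1a"));           // false
    }
}
```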
oh...@cox.net wrote:
> Hi,
>
> This question is going to be a littl
Hi,
This question is going to be a little complicated to explain, but let me try.
I have implemented an indexer app based on the demo IndexFiles app, and a web
app based on the luceneweb web app for the searching.
In my case, the "Documents" that I'm indexing are a proprietary file type, and
e
Hi,
I've been developing my indexer app, which uses StandardAnalyzer, on a
Windows machine, and today I deployed an initial version onto a Red Hat Linux (RHEL)
machine.
On my development machine, I have the files that are being indexed in something
like:
C:\lucene-devel\files\dir1\xxx.d
Hi Matt,
Good catch! As I just posted, I *just* noticed that too (Luke uses
KeywordAnalyzer by default) :)!!!
Once I switched Luke to StandardAnalyzer, the Luke search results
matched my web query results.
Thanks!
Jim
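The mismatch comes from the two analyzers behaving differently. The indexer stored the "path" field as many small terms. A KeywordAnalyzer-style analyzer leaves each query term exactly as typed, with no splitting and no lowercasing, so it often won't line up with the stored terms. The sketch below contrasts the two behaviors in plain Java. splitTerms() is only a rough approximation that lowercases and splits on non-alphanumeric characters. It is NOT the real StandardAnalyzer grammar, which has extra rules for things like hostnames, e-mail addresses, and acronyms. AnalyzerContrast and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class AnalyzerContrast {
    // The idea behind KeywordAnalyzer: the whole input is a single, unchanged term.
    public static List<String> keywordTerms(String text) {
        return Collections.singletonList(text);
    }

    // Rough approximation of a tokenizing analyzer (NOT StandardAnalyzer's grammar):
    // lowercase, then split on anything that is not a letter or digit.
    public static List<String> splitTerms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {             // a leading separator produces an empty piece
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String path = "C:\\lucene-devel\\data";
        System.out.println(keywordTerms(path)); // [C:\lucene-devel\data]
        System.out.println(splitTerms(path));   // [c, lucene, devel, data]
    }
}
```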
Matthew Hall wrote:
> Luke defaults to KeywordAnalyzer when you do a sea
Andrzej,
Hah!
I tried using Luke, as you suggested, and I found at least part of my problem.
Luke was defaulting to KeywordAnalyzer.
I changed that to StandardAnalyzer, and did queries for:
path:x
and
path:xx.dat
For the first, the Rewritten was:
Ian,
I just re-confirmed that StandardAnalyzer is used in both my indexer app and in
the query/search web app.
The actual file paths look like:
C:\lucene-devel\dat\.dat
or
C:\lucene-devel\data\testdir\\.dat
For field "path", Luke shows:
lucene
data
c
devel
dat
Hi Phil,
Well, kind of... but...
Then, why, when I do the search in Luke, do I get the results I cited:
==> succeeds
.yyy ==> fails (no results)
I guess that I've been assuming that the search in Luke is "correct" and I've
been using that to "test my understanding", but maybe that'
Phil,
I need to be more precise...
The files that I have are at, say:
C:\dir1\dir2\
so, for example, I have
C:\dir1\dir2\file-1-1.dat
C:\dir1\dir2\file-1-2.dat
C:\dir1\dir2\file-1-3.dat
C:\dir1\dir2\file-1-4.dat
C:\dir1\dir2\file-1-5.dat
After indexing, and, using Luke, I look at the "path" f
Phil,
Both my indexer and the webapp are basically from the Lucene demos, the indexer
starting with the IndexFiles.java demo code, so I think they're both using the
StandardAnalyzer.
What appears in Luke when I select "path" is just the filename part, without
the extension, i.e., the "" p
Hi,
In my indexer app (based on the IndexFiles.java demo), I am adding the "path"
field:
doc.add(new Field("path", f.getPath(), Field.Store.YES,
Field.Index.ANALYZED));
Per Luke, the full path (e.g., "c:\\.yyy") gets parsed, and one of the
terms (again, per Luke) is "", i.e.,
Hi Ian,
Ok, thanks for the additional info.
I've implemented a check on both file.lastModified() and file.length(), and it
seems to work in my dev environment (Windows), so I'll have to test on a "real"
system.
Thanks again,
Jim
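The lastModified()/length() check amounts to taking a snapshot of both values and comparing it with a later one. If either value changed, the producer is presumably still writing, and the file should be skipped for now. Below is a stdlib sketch that assumes those two values are the only signals used. ChangeCheck and Snapshot are hypothetical names. Comparing both values helps because lastModified() granularity depends on the file system and can be coarse, so a file may grow without its timestamp changing.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class ChangeCheck {
    // Snapshot of a file's modification time and size at one moment.
    public static final class Snapshot {
        final long lastModified;
        final long length;

        public Snapshot(long lastModified, long length) {
            this.lastModified = lastModified;
            this.length = length;
        }

        // For a missing file, both File methods return 0L.
        public static Snapshot of(File f) {
            return new Snapshot(f.lastModified(), f.length());
        }

        // Unchanged only if neither the timestamp nor the size moved.
        public boolean sameAs(Snapshot other) {
            return lastModified == other.lastModified && length == other.length;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("changecheck", ".dat");
        f.deleteOnExit();
        Snapshot before = Snapshot.of(f);
        try (FileWriter w = new FileWriter(f, true)) {
            w.write("more data");               // grows the file, so length changes
        }
        System.out.println(before.sameAs(Snapshot.of(f))); // false
    }
}
```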
Ian Lea wrote:
> Jim
>
>
> The sleep is simply
>
>
Ian,
One question about the 4th alternative: I was wondering how you implemented
the sleep() in Java, esp. in such a way as not to mess up any of the Lucene
stuff (in case there's threading)?
Right now, my indexer/inserter app doesn't explicitly do any threading stuff.
Thanks,
Jim
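On the sleep() question: in plain Java, Thread.sleep() pauses only the thread that calls it. In a single-threaded indexer, it just delays the next step. Any other threads keep running, such as background merge threads that an IndexWriter may have started. Below is a minimal sketch of a pause helper that handles InterruptedException by restoring the thread's interrupt flag. PauseHelper and pause() are hypothetical names, and the wait length is left to the caller.

```java
public class PauseHelper {
    // Sleeps the *current* thread only; returns false if the sleep was interrupted.
    public static boolean pause(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still see it, then stop waiting.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean completed = pause(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(completed + " after about " + elapsedMs + " ms");
    }
}
```

A typical use would be to take a snapshot of the file, call pause(), take another snapshot, and index the file only if nothing changed in between.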
oh..
Hi Ian,
Thanks for the quick response.
I forgot to mention that, in our case, the "producers" are part of a commercial
package, so we have no way to get them to change anything, and I think the
first three suggestions are not feasible for us.
I have considered something like the 4th suggestion (ch
Hi,
I have an app to initially create a Lucene index, and to populate it with
documents. I'm now working on that app to insert new documents into that
Lucene index.
In general, this new app, which is based loosely on the demo apps (e.g.,
IndexFiles.java), is working, i.e., I can run it with a
I posted, there was a close() in the
> > finally?
> >
> > Or, are you saying that when an IndexReader is opened, it somehow
> > persists in the system, even after my Java app terminates?
> >
> > FYI, I'm doing this testing on Windows, under Eclipse...
>
g2011 wrote:
>
> hi, as per the error messages you listed below, please put the 'reader.close()'
> block at the bottom of the method.
> I think if you invoke it first, the underlying stream is closed, so
> exceptions are encountered.
>
>
> ohaya wrote:
> >
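g2011's point is about ordering. An IndexReader, like any java.io stream, has to stay open until you've finished reading from it. So close() belongs at the end, typically in a finally block, not before the reads. Below is a plain-Java sketch of the same pattern, using a BufferedReader as a stand-in for the IndexReader. CloseLast and its methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CloseLast {
    // Reads everything first and closes only afterwards, in a finally block,
    // so close() runs even if reading throws.
    public static int countLines(String text) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            int n = 0;
            while (reader.readLine() != null) {
                n++;
            }
            return n;
        } finally {
            reader.close();                 // last step, not first
        }
    }

    // The failure mode described above: closing first, then trying to read.
    public static boolean closeFirstFails(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            reader.close();
            reader.readLine();              // the stream is already closed
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines("a\nb\nc"));   // 3
        System.out.println(closeFirstFails("a\nb")); // true
    }
}
```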
Hi Phil,
As for the problem with my app, it wasn't what you suggested (about the tokens, etc.).
For some later things, my indexer creates both a "path" field that is analyzed
(and thus tokenized, etc.) and another field, "fullpath", which is not analyzed
(and thus, not tokenized).
The problem with my
Hi,
BTW, my indexer app is basically the same as the demo IndexFiles.java. Here's
part of the main:
try {
IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(),
true, IndexWriter.MaxFieldLength.LIMITED);
System.out.println("Indexing to directory '" +INDEX_DIR+
Hi,
I've noticed a kind of strange problem with term counts and actual terms.
Some background: I wrote an app that creates an index, including a "path"
field.
I am now working on an app (code was in the previous thread) that, as part of
what it does, needs to get a list of all of the "path"
Hi,
I don't know what happened, but all of a sudden, it started working :(...
Jim
oh...@cox.net wrote:
> Hi,
>
> I changed the beginning of the try to:
>
> try {
> System.out.println("About to call .next()...");
> boolean foo = t
Hi,
I changed the beginning of the try to:
try {
System.out.println("About to call .next()...");
boolean foo = termsEnumerator.next();
System.out.println("Finished calling first .next()");
Hi,
BTW, the next() method is an abstract method in the Javadocs. Does that mean
that I'm supposed to provide my own implementation?
Jim
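On the abstract next() question: no implementation should be needed on your side. The concrete reader returned by IndexReader.open() hands back a concrete TermEnum subclass from terms(), so calling next() runs the library's own code. As far as I know, the 2.x Javadoc for terms() also says that next() must be called before term(). The plain-Java sketch below shows the same pattern as an analogy, not Lucene's actual classes. Enumerator and open() are hypothetical names.

```java
public class AbstractDispatch {
    // Hypothetical stand-in for an abstract enumerator such as TermEnum:
    // the abstract class only declares the methods.
    public abstract static class Enumerator {
        public abstract boolean next();   // advance; false when exhausted
        public abstract String term();    // valid only after next() returned true
    }

    // The "library" supplies the concrete subclass; callers never implement next().
    public static Enumerator open(String... terms) {
        return new Enumerator() {
            private int pos = -1;         // positioned before the first term

            @Override
            public boolean next() {
                if (pos < terms.length) {
                    pos++;
                }
                return pos < terms.length;
            }

            @Override
            public String term() {
                return terms[pos];
            }
        };
    }

    public static void main(String[] args) {
        Enumerator e = open("c", "data", "devel", "lucene");
        StringBuilder seen = new StringBuilder();
        while (e.next()) {                // dispatches to the anonymous subclass
            seen.append(e.term()).append(' ');
        }
        System.out.println(seen.toString().trim()); // c data devel lucene
    }
}
```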
oh...@cox.net wrote:
> Phil,
>
> I posted in haste. Actually, from the output that I posted, doesn't it
> look like the .next() itself is throwing t
Phil,
I posted in haste. Actually, from the output that I posted, doesn't it look
like the .next() itself is throwing the exception?
That is what has been puzzling me. It looks like it got through the open() and
terms() with no problem, then it blew up when calling the next()?
Jim
Phil,
Yes, that exception is not very helpful :)!!
I'll try your suggestions and post back.
Thanks,
Jim
Phil Whelan wrote:
> Hi Jim,
>
> I cannot see anything obvious, but both open() and terms() throw
> IOExceptions. You could try putting these in separate try..catch
> blocks to see
Hi,
I'm starting to work on an app to list all of the terms in the "path" field.
I'm including the beginning of my code below.
When I run this, pointing it to a directory named "index" containing the Lucene
indexes, I am getting a java.io.IOException.
Here's the output when I run:
Index in d
Hi,
I don't know the answer to your questions, but I'm guessing that the answer to
#3 probably follows from the answers to #1 and #2.
Did you try to look at the indexes using Luke? That shows the top 50 terms
when it starts, so it might be obvious what the differences are, and that might
give
Hi,
Sorry to jump in, but I've been following this thread with interest
:)...
Am I misunderstanding your original observation, that
ThreadedIndexWriter produced smaller index? Did the ThreadedIndexWriter
also finish faster (I'm assuming that it should)?
If the index is smaller, and everyt
Hi,
Phil and Ian,
Thanks for the responses and confirmations about this.
Assuming that our requirements (as I described earlier) don't change, it looks
like this updating/inserting thing should be pretty easy :)!
Later, and have a great weekend!
Jim
Phil Whelan wrote:
> Hi Jim,
>
Hi,
I'm still new to Lucene, but I think I have an initial indexer app (based on
the demo IndexFiles app) working, and also have a web app, based on the demo
luceneweb web app working.
I'm still busy tweaking both, but am starting to think ahead, about operational
type issues, esp. updating
Hi Ahmet,
Thanks for the clarification and information! That was exactly what I was
looking for.
Jim
AHMET ARSLAN wrote:
>
> > I guess that the obvious question is "Which characters are
> > considered 'punctuation characters'?".
>
> Punctuation = ("_"|"-"|"/"|"."|",")
>
> > In part
Phil Whelan wrote:
> On Thu, Jul 30, 2009 at 7:12 PM, wrote:
> > I was wondering if there is a list of special characters for the standard
> > analyzer?
> >
> > What I mean by "special" is characters that the analyzer considers break
> > characters.
> > For example, if I have something like
Hi,
I was wondering if there is a list of special characters for the standard
analyzer?
What I mean by "special" is characters that the analyzer considers break
characters. For example, if I have something like "foo=something", apparently
the analyzer considers this as two terms, "foo" and "so
prashant ullegaddi wrote:
> How to get the number of times a term occurs in the Lucene index?
>
> Regards,
> Prashant.
Hi,
You didn't mention if you were looking for something programmatic or not, but
there's a tool called "Luke", and when you start that up and point it to your
index
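There are two different counts here. One is the number of documents that contain the term. I believe that is what Luke's top-terms list and IndexReader.docFreq(Term) report. The other is the total number of occurrences. Programmatically, that means summing TermDocs.freq() over the documents returned by IndexReader.termDocs(Term). Below is a plain-Java sketch of the difference, on hypothetical pre-tokenized documents. TermCounts and its methods are illustrative names only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TermCounts {
    // Document frequency: how many documents contain the term at least once.
    public static Map<String, Integer> docFreq(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {   // count each term once per document
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Total frequency: every occurrence of the term across all documents.
    public static Map<String, Integer> totalFreq(List<List<String>> docs) {
        Map<String, Integer> tf = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("lucene", "index", "lucene"),
                List.of("lucene", "search"));
        System.out.println(docFreq(docs).get("lucene"));   // 2
        System.out.println(totalFreq(docs).get("lucene")); // 3
    }
}
```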
Hi Matthew and Narcis,
I think that I found the (original) problem.
It looks like all those other terms that I was getting, which looked to me
like the octets, weren't the octets :)...
When I was doing the doc.add(), there were some other numbers (not IP
addresses) in the String tha
Ian,
I'll respond to this msg, re. searching "path".
I made the change you suggested, to "Field.Index.ANALYZED", and that fixed the
problem I was having with searching for components of the "path" field.
Thanks!
Jim
Ian Lea wrote:
> In contrast to your last question and reply, if you u
Hi,
Oh. Ok, thanks! I'll give that a try.
Jim
"Armasu wrote:
> Keyword: Field.Index.NOT_ANALYZED
>
> -----Original Message-----
> From: oh...@cox.net [mailto:oh...@cox.net]
> Sent: Thursday, July 30, 2009 4:36 PM
> To: java-user@lucene.apache.org
> Subject: How to index IP addresses?
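Following the Field.Index.NOT_ANALYZED suggestion, one approach is to pull the addresses out of the extracted String yourself. Each one can then be added as a NOT_ANALYZED value, so it is kept as a single term instead of being run through StandardAnalyzer. Below is a stdlib sketch of the extraction step, assuming IPv4 dotted-quad notation. IpExtractor is a hypothetical name. Each returned string would go into its own doc.add(new Field(..., Field.Index.NOT_ANALYZED)) call, with a field name of your choosing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpExtractor {
    // Dotted-quad with 1-3 digits per octet; does NOT check the 0-255 range.
    private static final Pattern IPV4 =
            Pattern.compile("\\b\\d{1,3}(?:\\.\\d{1,3}){3}\\b");

    // Returns every IPv4-looking substring, in order of appearance.
    public static List<String> extract(String info) {
        List<String> ips = new ArrayList<>();
        Matcher m = IPV4.matcher(info);
        while (m.find()) {
            ips.add(m.group());
        }
        return ips;
    }

    public static void main(String[] args) {
        String info = "src=10.0.0.1 dst=192.168.1.20 port=8080";
        System.out.println(extract(info)); // [10.0.0.1, 192.168.1.20]
    }
}
```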
Hi,
I am working with a modified version of the demo IndexFiles.
In that code, when it builds the index, it has:
doc.add(new Field("path", f.getPath(), Field.Store.YES,
Field.Index.NOT_ANALYZED));
In Luke, I can see all the file paths in the "path" field.
I am also using the demo lucenewe
Hi,
I am trying to index information in some proprietary-formatted files.
In particular, these files contain some IP addresses in dotted notation, e.g.,
aa.bb.cc.dd.
For my initial test, I have a Document implementation, and after I extract what
I need into a String named "Info", I do:
doc.
Matthew,
Ok, thanks for the clarifications.
When I have some quiet time, I'll try to re-do the tests I did earlier and post
back if I have any questions.
Thanks again,
Jim
Matthew Hall wrote:
> Oh.. no.
>
> If you specifically include a fieldname: blah in your clause, you don't
> need a Mult
> >>>> Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't
> >>>> think that's the problem either :(...
> >>>>
> >>>> I looked at the SearchFiles.java code, and it looks like it's literally
> >>>> u
> >> searching on the fields other than the "contents" field (recall, I'm
> >> pretty sure that all those other fields are in the index, via Luke)?
> >>
> >> Jim
> >>
> >>
> >>
> >> Ian Lea wrote:
t match. A
> search for "FooFoo" would, assuming that your search terms are not
> being lowercased.
>
>
>
> --
> Ian.
>
>
> On Tue, Jul 28, 2009 at 1:56 PM, Ohaya wrote:
> > Hi,
> >
> > I'm just starting to work with Lucene, and I gues
Hi,
I'm just starting to work with Lucene, and I guess that I learn best by
working with code, so I've started with the demos in the Lucene
distribution.
I got the IndexFiles.java and IndexHTML.java working, and also the
luceneweb.war is deployed to Tomcat.
I used IndexFiles.java to index