so that their translations have the same order in
the output.
Can I accomplish this using Lucene components? I presume I'd have to start by
creating an analyzer for the foreign language, but then what? How do I (i)
tokenize, (ii) access words in the correct order, (iii) also access non
can I instead get pointers to these fragments in the original contents? In
other words, I need to know where these fragments start and, if possible, end.
Thanks,
Ilya Zavorin
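Lucene's analysis chain reports each token's start and end character offsets (through OffsetAttribute), and those offsets are what let you copy the untouched text between tokens. The sketch below shows the same idea in plain Java; it does not use Lucene's API. The classes are hypothetical. They tokenize with a regular expression (assuming a word is a run of letters or digits), keep each token's offsets, and rebuild the output by copying the gaps verbatim:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** A token plus its [start, end) character offsets in the original text. */
final class OffsetToken {
    final String text;
    final int start; // inclusive
    final int end;   // exclusive

    OffsetToken(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }
}

final class OffsetTranslator {
    // Assumption: a "word" is a run of Unicode letters or digits; everything
    // else (whitespace, punctuation) is a gap that is copied unchanged.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Tokenizes text, recording where each token starts and ends. */
    static List<OffsetToken> tokenize(String text) {
        List<OffsetToken> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(new OffsetToken(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    /**
     * Replaces each token by its dictionary entry (keeping it if absent) and
     * copies the text between tokens verbatim, so whitespace and punctuation
     * stay in place and tokens come out in input order.
     */
    static String translate(String text, Map<String, String> dictionary) {
        StringBuilder out = new StringBuilder();
        int last = 0; // end offset of the previous token
        for (OffsetToken t : tokenize(text)) {
            out.append(text, last, t.start);                    // gap before token
            out.append(dictionary.getOrDefault(t.text, t.text)); // translated token
            last = t.end;
        }
        out.append(text, last, text.length());                  // trailing gap
        return out.toString();
    }
}
```

Words missing from the dictionary (stop words, for instance) pass through unchanged. The start/end offsets on each token answer the "where do these fragments start and end" part of the question.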
I am trying to perform a "translation" of sorts of a stream of text. More
specifically, I need to tokenize the input stream, look up every term in a
specialized dictionary and output the corresponding "translation" of the token.
However, I also want to preserve all the original whitespace, stop
o:torin...@gmail.com]
Sent: Monday, January 16, 2012 5:50 AM
To: java-user@lucene.apache.org
Subject: Re: how to preserve whitespaces etc when tokenizing stream?
Maybe you could simply use String.replace()?
Or does the text actually need to be tokenized?
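One caveat with String.replace() is that it matches raw substrings. A dictionary entry can therefore also rewrite part of a longer word. The hypothetical helper below compares it with a whole-word regular-expression replacement that uses only java.util.regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceDemo {
    /** Naive approach: replaces every occurrence, even inside longer words. */
    static String naive(String text, String from, String to) {
        return text.replace(from, to);
    }

    /**
     * Whole-word replacement: \b anchors the match at word boundaries, and
     * Pattern.quote / Matcher.quoteReplacement keep special characters literal.
     */
    static String wholeWord(String text, String from, String to) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(from) + "\\b");
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(to));
    }
}
```

For example, replacing "cat" with "chat" in "cat category" gives "chat chategory" with String.replace(), but "chat category" with the whole-word version. If you apply one replacement per dictionary entry in sequence, a later entry can still rewrite text that an earlier entry produced. That is one argument for tokenizing once and translating each token in a single pass.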
On Fri, Jan 13, 2012 at 18:44, Ilya Zavorin w
rch only the part of index that corresponds to doc X". Or can I?
Is there any way to make this incremental index/search more efficient? For
instance, is it at all possible to restrict where in the index a search for
hits is performed? Or any other optimization?
Thanks much
Ilya Zavorin
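This excerpt does not say whether Lucene can restrict a search to the part of the index that belongs to one document. Conceptually, an inverted index maps each term to the documents that contain it. Checking a single document then reduces to a membership test on that term's posting list. In Lucene, one approach worth verifying against the 3.4 API is to store an id field and combine the query with a filter on that field. The toy version below is plain Java, with hypothetical names and a simplistic ASCII-only tokenizer:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Tiny in-memory inverted index: term -> sorted set of document ids. */
final class TinyIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Indexes a document; \W+ splitting is ASCII-only (a simplification). */
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) continue; // leading separators yield ""
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Unrestricted search: every document containing the term. */
    Set<Integer> search(String term) {
        TreeSet<Integer> docs = postings.get(term.toLowerCase());
        return docs == null ? Collections.<Integer>emptySet() : Collections.unmodifiableSet(docs);
    }

    /** Restricted search: only asks whether one given document contains the term. */
    boolean searchInDoc(String term, int docId) {
        TreeSet<Integer> docs = postings.get(term.toLowerCase());
        return docs != null && docs.contains(docId);
    }
}
```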
languages using different
scripts, e.g. Latin vs Cyrillic vs Arabic vs Chinese etc.
Thanks much
Ilya Zavorin
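The question itself is cut off in this excerpt. Suppose the goal is to handle text differently depending on its writing system, for example to pick an analyzer per language (an assumption). The JDK can report the Unicode script of each character through Character.UnicodeScript, available since Java 7. The small sketch below has a hypothetical class name. It counts letters per script and picks the dominant script:

```java
import java.util.EnumMap;
import java.util.Map;

final class ScriptCounter {
    /**
     * Counts letters per Unicode script (LATIN, CYRILLIC, ARABIC, HAN, ...).
     * Non-letters such as digits, spaces and punctuation are skipped, since
     * they are shared across scripts.
     */
    static Map<Character.UnicodeScript, Integer> count(String text) {
        Map<Character.UnicodeScript, Integer> counts = new EnumMap<>(Character.UnicodeScript.class);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // handles characters outside the BMP
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) continue;
            counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the script with the most letters, or null if there are none. */
    static Character.UnicodeScript dominant(String text) {
        Character.UnicodeScript best = null;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> e : count(text).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Script is only a rough proxy for language. Russian and Ukrainian both use Cyrillic, and Chinese and Japanese both use Han characters.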
a way to do it faster using
Lucene's core or Highlighter machinery?
Thanks
Ilya Zavorin
I am writing a Lucene-based indexing/search app and testing it using some
simple docs and queries. I have 3 simple docs, shown at the bottom of this
email between pairs of "==="s, and about a dozen query terms.
One of them is "electricity". As you can see, it appears in al
can tell you what's in your index: <http://code.google.com/p/luke/>
Steve
-Original Message-
From: Ilya Zavorin [mailto:izavo...@caci.com]
Sent: Monday, March 26, 2012 10:11 AM
To: java-user@lucene.apache.org
Subject: can't find common words -- using Lucene 3.4.0
I am w
original text.
Are you sure that these files were analyzed with StandardAnalyzer, and not some
other language-specific analyzer, as a result of language misidentification?
Steve
-Original Message-
From: Ilya Zavorin [mailto:izavo...@caci.com]
Sent: Monday, March 26, 2012 11:21 AM
To: j
));
IndexWriter writer = new IndexWriter(dir, iwc);
Anything suspicious here?
Thanks
Ilya Zavorin
-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu]
Sent: Monday, March 26, 2012 1:48 PM
To: java-user@lucene.apache.org
Subject: RE: can't find common
Hello All,
I am using 3.4. I need to find locations of query hits in a document. What I've
implemented works fine for textual queries but does not work for phone numbers.
Here's how I index my docs:
String oc = "Joe dialed 800-555-1212 but got a busy signal";
doc.add(new Field("contents",
numbers
Try putting the phone number in quotes in the query:
String qstr = "\"800-555-1212\"";
And check query.toString() to see how the query parser analyzed the term, both
with and without quotes.
And make sure you initialized the query parser with "contents" as the default
field.
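A likely cause, still to be confirmed with query.toString() as suggested above, is that the analyzer splits "800-555-1212" at the hyphens into the tokens 800, 555 and 1212. An unquoted query then becomes three separate terms, while a quoted phrase requires the tokens at consecutive positions. The hypothetical class below imitates that positional phrase check in plain Java (it is not Lucene's PhraseQuery) and returns the character span of each match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PhraseMatcher {
    // Assumption: tokens are runs of letters/digits, so "800-555-1212" becomes
    // three tokens (800, 555, 1212) and the hyphens are not indexed.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Returns {start, end} character spans of tokens, in position order. */
    private static List<int[]> spans(String text) {
        List<int[]> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) out.add(new int[] {m.start(), m.end()});
        return out;
    }

    /**
     * Finds every place where the phrase's tokens occur at consecutive
     * positions (case-insensitive); returns each hit's span [start, end).
     */
    static List<int[]> findPhrase(String text, String phrase) {
        List<int[]> doc = spans(text);
        List<String> words = new ArrayList<>();
        for (int[] s : spans(phrase)) words.add(phrase.substring(s[0], s[1]).toLowerCase());
        List<int[]> hits = new ArrayList<>();
        if (words.isEmpty()) return hits;
        for (int pos = 0; pos + words.size() <= doc.size(); pos++) {
            boolean match = true;
            for (int k = 0; k < words.size() && match; k++) {
                int[] s = doc.get(pos + k);
                match = text.substring(s[0], s[1]).toLowerCase().equals(words.get(k));
            }
            if (match) {
                // Span runs from the first token's start to the last token's end.
                hits.add(new int[] {doc.get(pos)[0], doc.get(pos + words.size() - 1)[1]});
            }
        }
        return hits;
    }
}
```

With the example document from the question, the phrase "800-555-1212" matches once. Its span covers the full number, hyphens included, because it runs from the start offset of 800 to the end offset of 1212.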
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Ilya Zavorin [mailto:izavo...@caci.com]
> Sent: Thursday, June 14, 2012 6:49 PM
> To: java-user@lucene.apache.org
> Subject: RE: need to find locations of quer
numbers
Look at this code: QueryTermExtractor.getTerms(Query query)
http://lucene.apache.org/core/3_6_0/api/contrib-highlighter/org/apache/lucene/search/highlight/QueryTermExtractor.html
-- Jack Krupansky
-Original Message-
From: Ilya Zavorin
Sent: Thursday, June 14, 2012 2:36 PM
To: java-user
Hi,
I am using 3.4.0 and just discovered a weird issue. I have a set of simple
English one-word queries and two target files that I want to search. One has
all these queries on one line, i.e. something like this:
Query1 Query2 Query3 Query4
The other has them one per line, i.e.:
Query1
Query2
Q
But why then does it find all the queries in the 1st file? I use exactly the
same code.
IZ
-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, July 13, 2012 12:32 PM
To: java-user@lucene.apache.org
Subject: RE: can't find queries when they are one per line
ou are
doing we cannot answer your request.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-
> From: Ilya Zavorin [mailto:izavo...@caci.com]
> Sent: Friday, July 13, 2012 6:39 PM
> To: java-user@l
Ian,
Turns out you were very close to the truth. The problem was in how I was
ingesting the original file into memory before indexing.
Thanks,
Mr. Ilya Zavorin
Applied Research and Consulting
CACI Advanced Knowledge Solutions Division
4831 Walden Lane, Lanham, MD 20706
ph: 1-301-306-2859
fx
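The thread does not say exactly what went wrong in the ingestion step. One common way to get exactly this symptom (an assumption, not a confirmed diagnosis) is to read the file with BufferedReader.readLine(), which strips line terminators, and append the lines back to back. A file with one query per line then collapses into a single token such as "Query1Query2Query3". A file with all queries on one line is unaffected. The hypothetical sketch below uses a StringReader in place of the real file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class Ingest {
    /**
     * Reads contents line by line and appends separator after each line.
     * StringReader stands in for a FileReader here.
     */
    static String read(String contents, String separatorAfterEachLine) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(contents))) {
            String line;
            while ((line = in.readLine()) != null) { // readLine() drops the '\n'
                sb.append(line).append(separatorAfterEachLine);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Buggy pattern: no separator, so words at line ends and starts get glued. */
    static String readGlued(String contents) {
        return read(contents, "");
    }

    /** Fixed pattern: restore a line break after every line. */
    static String readPreservingBreaks(String contents) {
        return read(contents, "\n");
    }
}
```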
nce rather than tokenizing and looping over tokens?
Thanks much,
Ilya Zavorin
t.
Essentially, what I am trying to do is implement substring matching more
efficiently than with Java's standard substring matching methods.
Thanks!
Ilya Zavorin
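An n-gram index is a standard way to speed up substring lookup. Every word is indexed under each of its character n-grams, the query's n-grams narrow the candidate set, and a final check removes false positives. Lucene ships n-gram tokenizers for this kind of indexing. The sketch below illustrates the technique in plain Java instead, with hypothetical names and n = 3:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Substring lookup over a word list via a character trigram index. */
final class TrigramIndex {
    private static final int N = 3;
    private final List<String> words = new ArrayList<>();
    private final Map<String, Set<Integer>> grams = new HashMap<>();

    /** Indexes the word under every 3-character substring it contains. */
    void add(String word) {
        int id = words.size();
        words.add(word);
        for (int i = 0; i + N <= word.length(); i++) {
            grams.computeIfAbsent(word.substring(i, i + N), k -> new HashSet<>()).add(id);
        }
    }

    /** Returns all indexed words that contain query as a substring. */
    List<String> containing(String query) {
        List<String> result = new ArrayList<>();
        if (query.length() < N) {               // too short for trigrams: plain scan
            for (String w : words) if (w.contains(query)) result.add(w);
            return result;
        }
        Set<Integer> candidates = null;
        for (int i = 0; i + N <= query.length(); i++) {
            Set<Integer> ids = grams.get(query.substring(i, i + N));
            if (ids == null) return result;     // a trigram never occurs: no match
            if (candidates == null) candidates = new HashSet<>(ids);
            else candidates.retainAll(ids);     // intersect posting sets
        }
        for (int id : candidates) {
            String w = words.get(id);
            if (w.contains(query)) result.add(w); // verify: trigrams alone can over-match
        }
        return result;
    }
}
```

The cost is size. A word of length L is stored under up to L - 2 trigrams, so this index grows roughly with the total length of the indexed text rather than with the number of distinct words.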
Does it mean that the resulting index will be very large?
Thanks,
Ilya
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, August 24, 2012 4:59 PM
To: java-user@lucene.apache.org
Subject: Re: Efficient string lookup using Lucene
> search for a string "run", I
Does Lucene support this type of structure, or do I need to somehow implement
it outside Lucene?
By the way, I need this to run on an Android phone, so memory size might be
an issue...
Thanks,
Ilya Zavorin
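The structure under discussion is not visible in this excerpt. A suffix array is one example of a structure that answers substring queries directly (an assumption, not necessarily what the thread proposed). It stores the start offset of every suffix of the text in sorted order, so all occurrences of a pattern form one contiguous block that binary search can locate. A naive, hypothetical sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Naive suffix array over one text. Construction compares whole suffixes,
 * so it is slow on large inputs; this is for illustration only.
 */
final class SuffixArray {
    private final String text;
    private final Integer[] sa; // suffix start offsets, sorted lexicographically

    SuffixArray(String text) {
        this.text = text;
        sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> text.substring(a).compareTo(text.substring(b)));
    }

    /** Compares the suffix at pos with pattern over at most pattern.length() chars. */
    private int comparePrefix(int pos, String pattern) {
        int len = Math.min(pattern.length(), text.length() - pos);
        int c = text.substring(pos, pos + len).compareTo(pattern.substring(0, len));
        if (c != 0) return c;
        return len < pattern.length() ? -1 : 0; // a proper prefix of pattern sorts first
    }

    /** Returns the sorted start offsets of every occurrence of pattern. */
    List<Integer> occurrences(String pattern) {
        int lo = 0, hi = sa.length;
        while (lo < hi) {                       // binary search: first suffix >= pattern
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(sa[mid], pattern) < 0) lo = mid + 1; else hi = mid;
        }
        List<Integer> hits = new ArrayList<>();
        for (int i = lo; i < sa.length && comparePrefix(sa[i], pattern) == 0; i++) {
            hits.add(sa[i]);                    // matching suffixes are contiguous
        }
        hits.sort(null);                        // report in text order
        return hits;
    }
}
```

Memory use is roughly one integer per character of text, on top of the text itself. This sketch uses more than that, because it boxes offsets as Integer for sorting. A version aimed at a phone would likely use an int[] and a faster construction algorithm.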
-Original Message-
From: Dawid Weiss [mailto:dawid.we...@gmail.com]
Sent
The user uploads a set of text files, either all of them at once or one at a
time, and then they will be searched locally on the phone against a set of
"hotlist" words. This assumes no connection to any sort of server so everything
must be done locally.
I already have Lucene integrated so I mig
ust like the
tilde is removed above. What is the complete set of such characters? Do I need
to do any other preprocessing?
Thanks,
Ilya Zavorin
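The classic Lucene query syntax treats these characters as operators: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ . Later versions add / for regular expressions. The classic QueryParser class also has a static escape(String) method that neutralizes them. The plain-Java sketch below, with a hypothetical class name, shows the two usual options, escaping or stripping:

```java
final class QueryEscaper {
    // Characters with special meaning in the classic query syntax
    // (& and | cover the && and || operators); '/' is included for
    // versions that use it to delimit regular expressions.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every special character with a backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    /** Alternative: drop special characters instead of escaping them. */
    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) < 0) sb.append(c);
        }
        return sb.toString();
    }
}
```

Escaping keeps the characters as literal text for the analyzer to handle, while stripping removes them before parsing.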
dd the fuzzy
query. Note: In 4.0 the fuzzy query is limited to an editing distance of 2.
-- Jack Krupansky
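The textbook Levenshtein distance (insertions, deletions and substitutions) shows which terms an edit-distance limit of 2 admits. The version below is plain Java with hypothetical names. Lucene's FuzzyQuery matches through automata and may count transpositions differently, so treat this as an approximation of its behavior:

```java
final class EditDistance {
    /** Dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from ""
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1,   // insertion
                                           prev[j] + 1),     // deletion
                                  prev[j - 1] + cost);       // substitution
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** True if the terms are within maxEdits edits; maxEdits is limited to 0..2 here. */
    static boolean fuzzyMatch(String query, String term, int maxEdits) {
        if (maxEdits < 0 || maxEdits > 2) throw new IllegalArgumentException("maxEdits must be 0..2");
        return levenshtein(query, term) <= maxEdits;
    }
}
```

For example, "kitten" and "sitting" are 3 edits apart, outside a limit of 2, while "electrcity" is 1 edit from "electricity".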
-Original Message-
From: Ilya Zavorin
Sent: Monday, September 17, 2012 10:41 AM
To: java-user@lucene.apache.org
Subject: how to fully preprocess query before fuzzy search?
I am proces
e indexing/searching/highlighting steps? Can I use the lucene and
highlighting jars (lucene-core-3.4.0.jar and lucene-highlighter-3.4.0.jar) "out
of the box"?
Also, is there any sample code that would show how Lucene components should be
invoked on Android?
Thank you,
Ilya Zavorin
text that was quite far
from the original query. For instance, I was looking for a 3-word term and it
highlighted a sequence of only 2 of these 3 words. How can I control how close
highlighted fragments should be to the original query?
Thanks much,
Ilya Zavorin
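One way to define "close enough to the query" is to accept only spans in which all of the query's words appear within a fixed number of consecutive tokens, and to reject spans that contain only some of them. The hypothetical plain-Java sketch below does that. It assumes the query words are lowercase, and it illustrates the idea rather than the Highlighter's API:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProximityFinder {
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    /**
     * Returns character spans [start, end) of windows that start on a query
     * word and contain ALL query words within at most maxWidth consecutive
     * tokens. queryWords must be lowercase (an assumption of this sketch).
     */
    static List<int[]> find(String text, Set<String> queryWords, int maxWidth) {
        List<int[]> tokens = new ArrayList<>(); // {start, end} per token
        List<String> terms = new ArrayList<>(); // lowercased token text
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(new int[] {m.start(), m.end()});
            terms.add(m.group().toLowerCase());
        }
        List<int[]> hits = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            if (!queryWords.contains(terms.get(i))) continue;
            Set<String> seen = new HashSet<>();
            for (int j = i; j < terms.size() && j < i + maxWidth; j++) {
                if (queryWords.contains(terms.get(j))) seen.add(terms.get(j));
                if (seen.size() == queryWords.size()) { // every query word found
                    hits.add(new int[] {tokens.get(i)[0], tokens.get(j)[1]});
                    break;
                }
            }
        }
        return hits;
    }
}
```

Lowering maxWidth tightens the match. When maxWidth equals the number of query words, the words must be adjacent, in any order.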