Yes, just add the field to the Document twice (with the same name) to
achieve this.
Using the same name is no problem, as there is no relation between stored and
inverted fields. Internally, Lucene has always created "two fields" with the
same name. You can still do this, but if you want to compress, yo
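For illustration, a minimal sketch of the two-fields approach under the 2.9/3.0 API
(field name and text are made up, and this assumes it runs inside your indexing code):

  import java.util.zip.Deflater;
  import org.apache.lucene.document.CompressionTools;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  String text = "the full body text";   // hypothetical content
  Document doc = new Document();

  // Field #1: analyzed but not stored -- used only for searching.
  doc.add(new Field("body", text, Field.Store.NO, Field.Index.ANALYZED));

  // Field #2: same name, stored as compressed binary -- used only for retrieval.
  byte[] compressed = CompressionTools.compressString(text, Deflater.BEST_COMPRESSION);
  doc.add(new Field("body", compressed, Field.Store.YES));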
Well, I think some people will be for hiding complexity, while others will be
for being in control and having transparency. Think how surprised someone would be
to find one extra field in their index, say when looking at it with Luke.
:)
Otis
--
Sematext is hiring -- http://sematext.com/about
Hi,
Thanks for your feedback. I have checked it again and found that this
behavior is rather consistent, so maybe the OS cache and Lucene warm-up have
a big impact.
Regards,
Dinh
I understand the reasons, but - if I may ask so late in the game - was
this the best way to do this?
From a user (developer) perspective, this is an implementation issue.
Couldn't this have been done behind the scenes, so that when I asked
for Field.Index.ANALYZED && Field.Store.COMPRESS, instea
So will I need to use two fields, one analyzed and the other binary, to
replace what was previously one compressed field?
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucen
Here is some of the history:
https://issues.apache.org/jira/browse/LUCENE-652
https://issues.apache.org/jira/browse/LUCENE-1960
Glen Newton wrote:
> Could someone send me where the rationale for the removal of
> COMPRESSED fields is? I've looked at
> http://people.apache.org/~uschindler/staging-a
Because you can do the compression yourself by just adding a binary stored
field with the compressed content. And then you can use any algorithm, even
bz2 or whatever.
The problem is that the compressed fields caused lots of problems and special
cases during merging, because they were always decomp
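To illustrate the do-it-yourself retrieval side, a small sketch assuming the field was
stored as compressed binary with CompressionTools (field name is hypothetical, and
decompressString throws DataFormatException):

  import org.apache.lucene.document.CompressionTools;
  import org.apache.lucene.document.Document;

  Document doc = searcher.doc(scoreDoc.doc);        // from a normal search
  byte[] compressed = doc.getBinaryValue("body");
  if (compressed != null) {
      String text = CompressionTools.decompressString(compressed);
      // work with the original text...
  }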
Could someone send me where the rationale for the removal of
COMPRESSED fields is? I've looked at
http://people.apache.org/~uschindler/staging-area/lucene-3.0.0-rc1/changes/Changes.html#3.0.0.changes_in_runtime_behavior
but it is a little light on the 'why' of this change.
My fault - of course - f
Hello Lucene users,
On behalf of the Lucene dev community (a growing community far larger than
just the committers) I would like to announce the first release candidate
for Lucene Java 3.0.
Please download and check it out - take it for a spin and kick the tires. If
all goes well, we hope
The "usual" recommendation is just to fire up a series of warmup
queries at startup if you really require the first queries to be fast.
Best
Erick
On Tue, Nov 17, 2009 at 2:43 PM, Scott Ribe wrote:
> > Most likely due to the operating system caching the relevant portions of
> > the index after
> Most likely due to the operating system caching the relevant portions of the
> index after the first set of queries.
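A minimal warm-up sketch along the lines Erick recommends above (field name and query
terms are made up; assumes an already-open IndexSearcher and the 2.9 QueryParser):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.util.Version;

  void warmUp(IndexSearcher searcher) throws Exception {
      QueryParser parser = new QueryParser(Version.LUCENE_29, "body",
              new StandardAnalyzer(Version.LUCENE_29));
      // Fire a few representative queries so the OS cache and JIT are warm
      // before real traffic arrives; the results are simply discarded.
      String[] warmupQueries = {"java", "lucene", "server", "index"};
      for (String q : warmupQueries) {
          Query query = parser.parse(q);
          searcher.search(query, 10);
      }
  }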
I have enough RAM to keep the Lucene indexes in memory all the time, so I
"dd ... > /dev/null" the files at boot. And also perform a single query to
force JIT of the query code. T
Hello,
Most likely due to the operating system caching the relevant portions of the
index after the first set of queries.
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
- Original Message
> From: Din
The character offset info is only stored if you enable
Field.TermVector.WITH_OFFSETS or WITH_POSITIONS_OFFSETS on the field.
Then, it can only be retrieved if you get the term vectors for that
document, and locate the term & specific occurrence that you're
interested in.
This is likely quite a bi
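A sketch of both sides under the 2.9/3.0 term vector API (the field name "body" and the
term are hypothetical; assumes a Document being built at index time and an open
IndexReader at search time):

  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.TermFreqVector;
  import org.apache.lucene.index.TermPositionVector;
  import org.apache.lucene.index.TermVectorOffsetInfo;

  // Indexing: enable offsets on the field.
  doc.add(new Field("body", text, Field.Store.YES, Field.Index.ANALYZED,
                    Field.TermVector.WITH_POSITIONS_OFFSETS));

  // Retrieval: pull the term vector for a matching doc and look up the term.
  TermFreqVector tfv = reader.getTermFreqVector(docId, "body");
  if (tfv instanceof TermPositionVector) {
      TermPositionVector tpv = (TermPositionVector) tfv;
      int idx = tpv.indexOf("word");                    // term in its analyzed form
      if (idx != -1) {
          for (TermVectorOffsetInfo oi : tpv.getOffsets(idx)) {
              System.out.println(oi.getStartOffset() + "-" + oi.getEndOffset());
          }
      }
  }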
Hello,
Hoping someone might clear up a question for me:
When tokenizing, we provide the start and end character offsets for each
token, locating it within the source text.
If I tokenize the text "word" and then search for the term "word" in the
same field, how can I recover this character offset i
Hi all,
I made a list of four simple, single-term queries and ran four searches via Lucene,
and found that when a term is searched for the first time, Lucene takes
quite a bit of time to handle it.
- Query A
00:27:28,781 INFO LuceneSearchService:151 - Internal search took
328.21463ms
00:27:28,781 INFO
> But if re-creating the entire file on each reopen isn't a problem for
> you then there's no need to change this :)
It's actually created after IndexWriter.commit(), but same idea. If we
needed real-time indexing, or if disk I/O gets excessive, I'd go with
separate files per segment.
>Hmm -- if
On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan wrote:
> The external data is just an array of fixed-length records, one for each
> Lucene document. Indexes are updated at regular intervals in one jvm. A
> searcher jvm opens the index and reads all the fixed-length records into
> RAM. Given an index
On Tue, Nov 17, 2009 at 10:23 AM, Peter Keegan wrote:
>>This is a generic solution, but just make sure you don't do the
>>map lookup for every doc collected, if you can help it, else that'll
>>slow down your search.
>
> What I just learned is that a Scorer is created for each segment (lights
> on!
>This is a generic solution, but just make sure you don't do the
>map lookup for every doc collected, if you can help it, else that'll
>slow down your search.
What I just learned is that a Scorer is created for each segment (lights
on!).
So, couldn't I just do the subreader->docBase map lookup onc
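A sketch of that per-segment approach with the 2.9 Collector API (the class name and
the external data array are hypothetical):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.Collector;
  import org.apache.lucene.search.Scorer;

  class ExternalDataCollector extends Collector {
      private final float[] external;   // one entry per index-wide docId (hypothetical)
      private int docBase;              // offset of the current segment
      private Scorer scorer;

      ExternalDataCollector(float[] external) { this.external = external; }

      public void setScorer(Scorer scorer) { this.scorer = scorer; }

      // Called once per (sub)reader: resolve the docBase here, not per collected doc.
      public void setNextReader(IndexReader reader, int docBase) {
          this.docBase = docBase;
      }

      public void collect(int doc) throws IOException {
          float extra = external[docBase + doc];   // segment docId -> index-wide docId
          float score = scorer.score();
          // ... combine score and extra, keep the top hits, etc.
      }

      public boolean acceptsDocsOutOfOrder() { return true; }
  }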
The external data is just an array of fixed-length records, one for each
Lucene document. Indexes are updated at regular intervals in one jvm. A
searcher jvm opens the index and reads all the fixed-length records into
RAM. Given an index-wide docId, the custom scorer can quickly access the
correspo
right!
this just emphasized the word 'ironic' :-)
2009/11/17 Michael McCandless :
> Remember that, like Lucene, if you give this query to google:
>
> java -server
>
> It means "find all docs that contain java and do not contain server".
> I'm sure this has messed up a great many people trying t
On Mon, Nov 16, 2009 at 6:38 PM, Peter Keegan wrote:
>>Can you remap your external data to be per segment?
>
> That would provide the tightest integration but would require a major
> redesign. Currently, the external data is in a single file created by
> reading a stored field after the Lucene in
Remember that, like Lucene, if you give this query to google:
java -server
It means "find all docs that contain java and do not contain server".
I'm sure this has messed up a great many people trying to figure out
command line options ;)
The fix is to put the -server in double quotes:
ja
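To see what Lucene's own QueryParser does with each form, a small sketch (field name
and analyzer are just for illustration; parse() throws ParseException):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.util.Version;

  QueryParser qp = new QueryParser(Version.LUCENE_29, "body",
          new StandardAnalyzer(Version.LUCENE_29));
  System.out.println(qp.parse("java -server"));
  // body:java -body:server   ("server" becomes a prohibited clause)
  System.out.println(qp.parse("java \"-server\""));
  // body:java body:server    (StandardAnalyzer strips the hyphen inside the quotes)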
Hello all,
I have millions of records in the database, and 75% of those records
need to be sorted. Does 2.9 provide a facility to do custom sorting (avoiding
loading all records)?
Regards
Ganesh
The PriorityQueue is fixed size; it cannot grow (please note, it is *not*
Java's PQ, it is Lucene's own one!).
TopDocs will contain only n documents in its scoreDocs array, but the reported
total hit count will reflect all matches!
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.
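In code, with an already-open Searcher (the numbers are just illustrative):

  import org.apache.lucene.search.TopDocs;

  TopDocs top = searcher.search(query, 100);
  System.out.println("total matches: " + top.totalHits);        // e.g. 1000
  System.out.println("returned hits: " + top.scoreDocs.length); // at most 100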
Hello all,
Sorry if this is off-topic or already discussed/documented somewhere.
Regarding the Lucene 2.9.1 javadoc:
In Searcher the method "TopDocs search(Query query, int n)" says "Finds the
top n hits for query."
However, if I do a search(someQuery, 100) which gets me 1000 results, all
results are a