s far from perfect, since in
the worst case field values could be unique for each document, leading to
high memory consumption.
- using the Facets API by adding a FacetField for each document field and
using FastTaxonomyFacetCounts to query the top N values. With this
approach I am able to both f
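That Facets approach, roughly sketched (requires lucene-core and lucene-facet on the classpath; the `color` dimension and its values are made up for illustration, and `ByteBuffersDirectory` is the 8.x replacement for the older `RAMDirectory`):

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.facet.*;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.directory.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class FacetTopValues {
  /** Index a few docs with one FacetField each and return the top values of "color". */
  public static FacetResult topColors() throws IOException {
    Directory indexDir = new ByteBuffersDirectory();
    Directory taxoDir = new ByteBuffersDirectory();
    FacetsConfig config = new FacetsConfig();

    try (IndexWriter writer = new IndexWriter(indexDir, new IndexWriterConfig(new StandardAnalyzer()));
         DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir)) {
      for (String value : new String[] {"red", "red", "blue"}) {
        Document doc = new Document();
        doc.add(new FacetField("color", value));      // one FacetField per field value
        writer.addDocument(config.build(taxoWriter, doc));
      }
    }

    try (DirectoryReader reader = DirectoryReader.open(indexDir);
         DirectoryTaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir)) {
      FacetsCollector fc = new FacetsCollector();
      new IndexSearcher(reader).search(new MatchAllDocsQuery(), fc);
      Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
      return facets.getTopChildren(10, "color");      // top-N values with their counts
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(topColors());
  }
}
```

The taxonomy keeps counts out of the main index, so memory use is bounded by the number of distinct facet values rather than by the documents matched.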
Hi all!
A little bit more exploration :)
After indexing with multiple atomic field values, here is what I get:
indexSearcher.doc(0).getFields("gramm")
stored,indexed,tokenized,termVector,omitNorms
stored,indexed,tokenized,termVector,omitNorms
stored,indexed,tokenized,termVector
tokens concatenated eventually?
--
Igor
27.09.2013, 18:12, "Igor Shalyminov" :
> Hello!
>
> I have really long document field values. Tokens of these fields are of the
> form: word|payload|position_increment. (I need to control position increments
> and payload manually.)
&
Hello!
I have really long document field values. Tokens of these fields are of the
form: word|payload|position_increment. (I need to control position increments
and payload manually.)
I collect these compound tokens for the entire document, then join them with a
'\t', and then pass t
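A custom TokenStream along these lines can split the '\t'-joined string back into compound tokens and set the term, payload, and position increment attributes by hand. This is a sketch, not the poster's actual code; the `word|payload|posInc` layout is taken from the message above:

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.BytesRef;

/** Emits tokens of the form word|payload|posInc from a '\t'-joined string. */
public final class CompoundTokenStream extends TokenStream {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
  private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
  private final String[] tokens;
  private int index = 0;

  public CompoundTokenStream(String joined) {
    this.tokens = joined.split("\t");
  }

  @Override
  public boolean incrementToken() {
    if (index >= tokens.length) return false;
    clearAttributes();
    String[] parts = tokens[index++].split("\\|");        // word|payload|posInc
    termAtt.setEmpty().append(parts[0]);
    payloadAtt.setPayload(new BytesRef(parts[1]));        // manual payload
    posIncAtt.setPositionIncrement(Integer.parseInt(parts[2]));  // manual increment
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    index = 0;
  }
}
```

It can then be indexed without any analyzer involvement, e.g. `doc.add(new TextField("gramm", new CompoundTokenStream(joined)))` (the `gramm` field name comes from the follow-up message).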
Hello!
I'm thinking on a way of implementing the search with word ambiguity in Lucene.
Say, a word "duck" appears in a document at the position 10.
It has 2 Part-of-Speech tags: "Noun" and "Verb". And I want to recover this
position both for POS:Noun and POS:Verb queries.
So can you please point
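One common trick for this kind of ambiguity is to emit the POS tags as extra tokens with a position increment of 0, so they stack on the word's position, the same way synonyms are indexed. The arithmetic can be shown with plain Java (no Lucene needed; the `POS:` token naming is purely illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PosIncDemo {
  /** Map each token to its absolute position given its position increment. */
  public static Map<String, Integer> positions(String[][] tokens) {
    Map<String, Integer> result = new LinkedHashMap<>();
    int position = -1;                          // before the first increment
    for (String[] t : tokens) {
      position += Integer.parseInt(t[1]);       // posInc 0 stacks tokens on one position
      result.put(t[0], position);
    }
    return result;
  }

  public static void main(String[] args) {
    // the word plus two POS tags at the same position
    String[][] tokens = {
      {"the", "1"}, {"duck", "1"}, {"POS:Noun", "0"}, {"POS:Verb", "0"}, {"swims", "1"}
    };
    System.out.println(positions(tokens));
    // "duck", "POS:Noun" and "POS:Verb" all share one position, so a span or
    // phrase query against either tag recovers the same position as the word
  }
}
```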
On Thu, Jan 31, 2013 at 7:07 AM, Gili Nachum wrote:
> So, when loading the results I want to return (say 10 documents), if not
> all docs fit in RAM, I would incur up to 10 individual disk seek
> operations. Which will kill my performance. Is that correct?
Yes, 10 seeks, and that may or may not
Hi Mike,
So, when loading the results I want to return (say 10 documents), if not
all docs fit in RAM, I would incur up to 10 individual disk seek
operations. Which will kill my performance. Is that correct?
Considering what are my alternatives:
1. Create another separate lean index that would f
Hi Gili,
I responded last time you asked this:
http://lucene.markmail.org/thread/svun5cdtgiy4hnjg
Maybe you are not subscribed to the list?
Mike McCandless
http://blog.mikemccandless.com
On Sat, Jan 26, 2013 at 7:45 AM, Gili Nachum wrote:
> Hi,
>
> I have a search workload that focuses o
Hi,
I have a search workload that focuses on two fields in my 1GB index. I get
very good performance when loading the index via MMapDirectory. I attribute
this performance to the operating system's file-system (OS FS) cache, which
keeps the most recently used FS blocks resident in RAM.
*I would like to
Are the additional rarely used 48 fields used for searching? Or, for
looking up stored fields?
If it's for searching then you should see good locality (efficient use
of the OS's IO cache) from the posting lists: each field's postings
are stored in a single chunk of the files, then the next field'
I have an indexer that already collapses field values into a Map of
(value, count) before indexing, and I would like to specify an increment
to frequency (docFreq?) when adding a field value to a Lucene Document.
Should I just add the same value multiple times?
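To the question above: adding the same value N times raises that term's within-document frequency (tf); docFreq counts documents, so it stays at one per document regardless. A sketch, assuming the Map of (value, count) described (requires lucene-core; the `tags` field name is illustrative):

```java
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

public class CountedFields {
  /** Add each value to the document as many times as its pre-computed count. */
  public static Document build(Map<String, Integer> valueCounts) {
    Document doc = new Document();
    for (Map.Entry<String, Integer> e : valueCounts.entrySet()) {
      for (int i = 0; i < e.getValue(); i++) {
        // repeating the value inflates its term frequency (tf) in this document;
        // docFreq still counts this document only once
        doc.add(new TextField("tags", e.getKey(), Field.Store.NO));
      }
    }
    return doc;
  }
}
```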
-Mike
On Wed, Mar 28, 2012 at 2:30 PM, Tim Eck wrote:
> Excuse my ignorance of lucene internals, but is the problem any easier if
> the requirement is just to allow the addition/removal of stored only fields
> (as opposed to indexed)?
It would substantially simplify the problem... but even this
simplif
index with a new set of fields.
-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, March 28, 2012 5:31 AM
To: java-user@lucene.apache.org
Subject: Re: Can I add new field values to a existing lucene index ?
Alas, no, not yet. This is a
, Anupam Bhattacharya
wrote:
> Does the Lucene API allow adding new field values to an existing doc?
>
> For example,
> Initially
>
> Doc 1 FieldName Value id 1 schoolname xyz zipcode pqr
>
>
> After update can I keep the existing field values and add 2 more fields.
>
&
[mailto:simon.willna...@googlemail.com]
Sent: Friday, February 24, 2012 1:29 PM
To: java-user@lucene.apache.org
Subject: Re: retrieved doc field values being cached?
Hey Stuart,
Lucene solely relies on the FS cache with some exceptions for the
term-dictionary and FieldCache which is pulled
e:
>
> Lucene (using 3.5) seems to be caching field values for documents (after they
> have been retrieved) and I am hoping someone can provide more information on
> how and where exactly the field values are stored.
>
> The table below lists the times (in milliseconds) associated
Lucene (using 3.5) seems to be caching field values for documents (after they
have been retrieved) and I am hoping someone can provide more information on
how and where exactly the field values are stored.
The table below lists the times (in milliseconds) associated with retrieving
for a set
Hi,
thanks for your answer. I know about lazy loading fields, but my
question is whether fields are always loaded as a whole or if it is
possible in some way to stream a field's contents.
Regards,
Günter
--
Dipl.-Inform. Günter Ladwig
Institute AIFB, University of Karlsruhe, D-76128 Karlsru
Have a look at the FieldSelector and the Lazy load capability. See http://www.lucidimagination.com/search/?q=FieldSelector
for some pointers.
-Grant
On Jul 14, 2009, at 11:12 AM, Günter Ladwig wrote:
Hi,
I have a situation, where stored, un-indexed fields can contain
potentially large amo
Hi,
I have a situation, where stored, un-indexed fields can contain
potentially large amounts of data. Is it possibly to read the contents
of a field incrementally? That is, do not load the complete contents
from disk, but read X bytes at a time. Does the Reader returned by
Field.readerVa
Yes, see the Payload functionality and the BoostingTermQuery:
http://www.lucidimagination.com/search/?q=Payload
On Jul 9, 2009, at 6:42 PM, Eric Chu wrote:
Hi all,
I was wondering if there is any way to do a boost on the document
based on
which value is in a field matched by a query.
ie,
Hi all,
I was wondering if there is any way to do a boost on the document based on
which value is in a field matched by a query.
ie, (Sample code below)
- You have a document that contains 1 field with multiple values.
- Field has value ABC boosted by 2.0
- Field has value XYZ boosted by 3.0
- I
site, (anonymous per request) got
2.6 Million Euro funding!
On Sun, Feb 15, 2009 at 5:02 AM, Joel Halbert wrote:
> Hi,
>
> I'm looking for an optimal solution for extracting unique field values.
> The rub is that I want to be able to perform this for a unique subset of
> documents...
Hi,
I'm looking for an optimal solution for extracting unique field values.
The rub is that I want to be able to perform this for a unique subset of
documents...as per the example:
I have an index with Field1 and Field2.
I want "all unique values of Field1 where Field2=X".
Othe
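A straightforward sketch of the "unique Field1 where Field2=X" query, assuming Field1 is stored (field names taken from the example; requires lucene-core; for large subsets a custom Collector over doc values would avoid loading stored fields per hit):

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class UniqueValues {
  /** Distinct stored Field1 values across all docs where Field2 == x. */
  public static Set<String> uniqueField1Where(IndexReader reader, String x) throws IOException {
    IndexSearcher searcher = new IndexSearcher(reader);
    int n = Math.max(1, reader.maxDoc());                // collect every match
    TopDocs hits = searcher.search(new TermQuery(new Term("Field2", x)), n);
    Set<String> unique = new HashSet<>();
    for (ScoreDoc sd : hits.scoreDocs) {
      unique.add(searcher.doc(sd.doc).get("Field1"));    // requires Field1 to be stored
    }
    return unique;
  }

  /** Small in-memory demo: three docs with Field2=X, one with Field2=Y. */
  public static Set<String> demo() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new KeywordAnalyzer()))) {
      for (String[] pair : new String[][] {{"a", "X"}, {"b", "X"}, {"a", "X"}, {"c", "Y"}}) {
        Document doc = new Document();
        doc.add(new StoredField("Field1", pair[0]));
        doc.add(new StringField("Field2", pair[1], Field.Store.NO));  // not analyzed
        w.addDocument(doc);
      }
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      return uniqueField1Where(reader, "X");
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(demo());
  }
}
```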
Akanksha Baid wrote:
I have indexed multiple documents - each of them have 3 fields ( id, tag
, text). Is there an easy way to determine the set of tags for a given
query without iterating through all the hits?
For example if I have 100 documents in my index and my set of tag = {A,
B, C}. Query
You could go through this implementation; I have been using it (improvised)
for a while now. There might be better ways to do it too, so you could
check:
http://www.gossamer-threads.com/lists/lucene/java-user/35704?search_string=categorycounts;#35704
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blo
Hi, You may also want to take a look at Carrot2:
http://demo.carrot2.org/demo-stable/main
Lucene documentation references them, but I was disappointed to see that
they have a really old open-source version and one that you can buy. It
may work for you.
Also, take a look at SOLR's implementatio
: For example if I have 100 documents in my index and my set of tag = {A, B, C}.
: Query Q on the text field returns 15 docs with tag A , 10 with tag B and none
: with tag C (total of 25 hits). Is there a way to determine that the set of
: tags for query Q = {A, B} without iterating through all 25
Is there something I could do to Index the documents differently to
accomplish this? Currently I am looking at all the hits to generate the
set of tags for the query.
If I need to implement the same thing within Lucene, I am not sure if I
will gain anything performance wise. Or am I wrong about
Hi,
You could try changing (or extending) TopFieldDocCollector and doing your
processing there (that is what I tried, and it worked fine). But that
would mean changing the Lucene code a little bit.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody,
I have indexed multiple documents - each of them have 3 fields ( id, tag
, text). Is there an easy way to determine the set of tags for a given
query without iterating through all the hits?
For example if I have 100 documents in my index and my set of tag = {A,
B, C}. Query Q on the text field r
, and I need to
offset all dates based on change to stored time zone (subtract 12
hours from each value). The sort order of field values would not
change, and the postings should not need to change, only values of
fields. I do not believe there is any API to do it, but is there
some lower lev
D'oh!
Of course - I'm using StandardAnalyzer. Changing to a
PerFieldAnalyzerWrapper with a KeywordAnalyzer for
that field fixes the issue.
Thanks so much for fast response.
Ulf
--- Ian Lea <[EMAIL PROTECTED]> wrote:
> Hi
>
>
> I bet you are using an analyzer that is downcasing
> isbn:00714
Hi
I bet you are using an analyzer that is downcasing isbn:007149216X to
isbn:007149216x. I've been there! Options include creating the query
programmatically, using PerFieldAnalyzerWrapper, downcasing everything
yourself in advance. Or convert to ISBN-13.
--
Ian.
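Ian's PerFieldAnalyzerWrapper suggestion might look like this (Lucene 4+ classic QueryParser, which in 3.x also took a Version argument; the `contents` default field is illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class IsbnQuery {
  public static Query parse(String input) throws ParseException {
    Map<String, Analyzer> perField = new HashMap<>();
    perField.put("isbn", new KeywordAnalyzer());   // no tokenizing, no downcasing
    Analyzer analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
    QueryParser parser = new QueryParser("contents", analyzer);
    return parser.parse(input);
  }

  public static void main(String[] args) throws ParseException {
    // the trailing X survives instead of being downcased to x
    System.out.println(parse("isbn:007149216X"));
  }
}
```

The same wrapper must of course be used at index time, so the untokenized ISBN terms match what the parser produces.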
On Fri, May 16, 2008 at 10
Hello-
I'm experiencing a weird issue searching an index. The
index has information about books, and one of the
fields is the ISBN number. It is stored in the index
in untokenized form to enable searches by ISBN. So a
query like "isbn:0071490833" would return the Document
for that book. But it doe
Thanks.
> Date: Mon, 24 Mar 2008 21:03:13 -0700
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: RE: Field values ...
>
>
> : The Id and Phone fields are stored. So I can just do a MatchAllQuery as
> : you suggested. I have read about field s
: The Id and Phone fields are stored. So I can just do a MatchAllQuery as
: you suggested. I have read about field selectors on this mailing list
: but have never used it. Does anyone know where I can find some sample
: code? Thank you.
there's a couple of reusable implementations in subver
[EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: RE: Field values ...
>
>
> : I want to do something like:
> :
> : List infoList = new ArrayList ();
> : foreach (Document doc in LuceneIndex)
> : {
> :String id = doc.get ("Id&quo
: I want to do something like:
:
: List infoList = new ArrayList ();
: foreach (Document doc in LuceneIndex)
: {
:String id = doc.get ("Id");
:String phone = doc.get ("Phone");
:infoList.add (new Info (id, phone));
: }
If "Id" and "Phone" are stored value
);
}
Thank you.
> Date: Thu, 20 Mar 2008 10:05:17 -0400
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Re: Field values ...
>
> See TermDocs/TermEnum. The trick is to start one of your enumerations
> with "" (I forget exactly which), and
See TermDocs/TermEnum. The trick is to start one of your enumerations
with "" (I forget exactly which), and that'll iterate them all.
Best
Erick
On Thu, Mar 20, 2008 at 9:55 AM, Dragon Fly <[EMAIL PROTECTED]>
wrote:
> What's the easiest way to extract the values of 2 fields from each
> document
What's the easiest way to extract the values of 2 fields from each document in
the index. For example, each document has 5 fields:
Id Name Address Phone Preference
I'd like to extract the values for the Id and Phone fields for each document in
the index. Thank you.
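A sketch of the stored-fields route for the question above, iterating every live document (`MultiBits.getLiveDocs` is the Lucene 8.x name; older versions use `MultiFields.getLiveDocs`; a StoredFieldVisitor could restrict loading to just these two fields):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiBits;
import org.apache.lucene.util.Bits;

public class ExtractFields {
  /** Collect the stored Id and Phone values of every live document. */
  public static List<String[]> idAndPhone(IndexReader reader) throws IOException {
    Bits liveDocs = MultiBits.getLiveDocs(reader);   // null when nothing is deleted
    List<String[]> result = new ArrayList<>();
    for (int i = 0; i < reader.maxDoc(); i++) {
      if (liveDocs != null && !liveDocs.get(i)) continue;  // skip deleted docs
      Document d = reader.document(i);               // works only for stored fields
      result.add(new String[] {d.get("Id"), d.get("Phone")});
    }
    return result;
  }
}
```

The TermDocs/TermEnum approach Erick mentions reads the values out of the terms dictionary instead, which also works for indexed-but-not-stored fields.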
On Tuesday 27 February 2007 19:21, Michael Barbarelli wrote:
> GB821628930 (+VAT_reg:GB* doesn't work)
What about VAT_reg:gb*? Also see QueryParser.setLowercaseExpandedTerms()
Regards
Daniel
--
http://www.danielnaber.de
-
T
Hello. I'm using Lucene.NET, but would like to pose the question here in
the Java group since I think the collective expertise here is still valid.
Hope you don't mind.
After indexing data from an Oracle DB using the standard analyzer, I am
using Luke (standardanalyzer) to query at the moment.
Two things I would check:
1) converting pubDate to a String during indexing for later date-range
filtering might not work well because, string-wise, "9" > "100". You could
use Lucene's DateTools - there's an example in TestDateFilter -
http://svn.apache.org/viewvc/lucene/ja
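The DateTools fix, sketched (DateTools lives in lucene-core; the `pubDate` field name and the range bounds are illustrative):

```java
import java.util.Date;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.search.TermRangeQuery;

public class DateIndexing {
  public static void main(String[] args) {
    // Fixed-width, UTC-based form ("yyyyMMdd" at DAY resolution), so
    // lexicographic order equals chronological order, unlike raw "9" vs "100"
    String day = DateTools.dateToString(new Date(0L), DateTools.Resolution.DAY);
    System.out.println(day);   // epoch -> "19700101"

    // range query over the string form at search time
    TermRangeQuery range = TermRangeQuery.newStringRange(
        "pubDate", "20070101", "20071231", true, true);
    System.out.println(range);
  }
}
```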
Hi Adrian,
I don't see anything obviously wrong with your code.
Can you give more details about which field values are different from
what you expect? I'm guessing it's the id field you're worried about,
but it's not clear from what you have written whether it's t
Hello,
I have a problem with my search code - I try to index some data while
searching simultaneously. Everything goes fine until some amount of data
has been indexed; then my fields come back corrupted.
E.g. I have a field with the title indexed as "Nowitzki führt "Mavs" zum
ersten Heimsieg" and inner id "15" (not doc id,
Subject: Re: Null field values
When you indexed the fields you can't get back, did you use
Field.Store.YES?
I've been confused by the fact that Luke can "reconstruct" fields that
aren't stored, but are indexed
If that isn't the problem, perhaps you could post some
When you indexed the fields you can't get back, did you use Field.Store.YES?
I've been confused by the fact that Luke can "reconstruct" fields that
aren't stored, but are indexed
If that isn't the problem, perhaps you could post some code snippets.
Best
Erick
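Erick's point in a runnable form: only values added with Field.Store.YES come back from a retrieved document; indexed-but-unstored fields return null (field names and text are illustrative):

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class StoredVsIndexed {
  /** Returns the retrieved values of a stored and an unstored field. */
  public static String[] demo() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      doc.add(new TextField("title", "Lucene in Action", Field.Store.YES)); // searchable and retrievable
      doc.add(new TextField("body", "some long text", Field.Store.NO));     // searchable only
      w.addDocument(doc);
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      Document hit = reader.document(0);
      return new String[] {hit.get("title"), hit.get("body")};
    }
  }

  public static void main(String[] args) throws IOException {
    String[] r = demo();
    System.out.println(r[0]);  // "Lucene in Action"
    System.out.println(r[1]);  // null - indexed but never stored
  }
}
```

Luke's "reconstruct" feature rebuilds an approximation of unstored fields from the inverted index, which is why it can show text that get() cannot return.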
values no matter what I'm searching for.
Thanks,
Seeta
-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Friday, June 30, 2006 6:55 PM
To: java-user@lucene.apache.org
Subject: Re: Null field values
There is no requirement that every document contain values for
There is no requirement that every document contain values for every field.
Doc A could have fields z, y, x, and Doc B could have fields x, w, v. So,
when you say "some of the values are being returned as null", do you mean
that you *never* get any values for some field or you get values for a fie
Hi,
I indexed some XML files using Lucene. When I open up the index using
Luke, I can see that all the fields are stored correctly in the index.
But, when I try to grab the fields from the hits, after searching, some
of the values are being returned as null. Any suggestions about what
might be