Getting most frequent terms from single-token field values in a subset of Lucene documents

2017-08-28 Thread wilqor
s far from perfect, since in the worst case field values could be unique for each document, leading to high memory consumption. - using Facets API by adding a FacetField for each document field and utilizing FastTaxonomyFacetCounts for querying top N values. With this approach I am able to both f

Re: Indexing documents with multiple field values

2013-10-04 Thread Igor Shalyminov
Hi all! A little bit more of exploration:) After indexing with multiple atomic field values, here is what I get: indexSearcher.doc(0).getFields("gramm") stored,indexed,tokenized,termVector,omitNorms stored,indexed,tokenized,termVector,omitNorms stored,indexed,tokenized,termVector

Re: Indexing documents with multiple field values

2013-10-02 Thread Igor Shalyminov
tokens concatenated eventually? -- Igor 27.09.2013, 18:12, "Igor Shalyminov" : > Hello! > > I have really long document field values. Tokens of these fields are of the > form: word|payload|position_increment. (I need to control position increments > and payload manually.)

Indexing documents with multiple field values

2013-09-27 Thread Igor Shalyminov
Hello! I have really long document field values. Tokens of these fields are of the form: word|payload|position_increment. (I need to control position increments and payload manually.) I collect these compound tokens for the entire document, then join them with a '\t', and then pass t

Multiple field values with the same position in the index

2013-02-28 Thread Igor Shalyminov
Hello! I'm thinking on a way of implementing the search with word ambiguity in Lucene. Say, a word "duck" appears in a document at the position 10. It has 2 Part-of-Speech tags: "Noun" and "Verb". And I want to recover this position both for POS:Noun and POS:Verb queries. So can you please point

Re: MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-31 Thread Michael McCandless
On Thu, Jan 31, 2013 at 7:07 AM, Gili Nachum wrote: > So, when loading the results I want to return (say 10 documents), if not > all docs fit in RAM, I would incur up to 10 individual disk seek > operations. Which will kill my performance. Is that correct? Yes, 10 seeks, and that may or may not

Re: MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-31 Thread Gili Nachum
Hi Mike, So, when loading the results I want to return (say 10 documents), if not all docs fit in RAM, I would incur up to 10 individual disk seek operations. Which will kill my performance. Is that correct? Considering what are my alternatives: 1. Create another separate lean index that would f

Re: MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-26 Thread Michael McCandless
Hi Gili, I responded last time you asked this: http://lucene.markmail.org/thread/svun5cdtgiy4hnjg Maybe you are not subscribed to the list? Mike McCandless http://blog.mikemccandless.com On Sat, Jan 26, 2013 at 7:45 AM, Gili Nachum wrote: > Hi, > > I have a search workload that focuses o

MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-26 Thread Gili Nachum
Hi, I have a search workload that focuses on two fields in my 1GB index. I get very good performance when loading the index via MMapDirectory. I attribute this performance to the Operating System File System (FS OS) cache, that keeps the most recently used FS blocks RAM resident. *I would like to

Re: MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-23 Thread Michael McCandless
Are the additional rarely used 48 fields used for searching? Or, for looking up stored fields? If it's for searching then you should see good locality (efficient use of the OS's IO cache) from the posting lists: each field's postings are stored in a single chunk of the files, then the next field'

MMapDirectory performance - Are searchable field values contiguously stored in FS block?

2013-01-23 Thread Gili Nachum
Hi, I have a search workload that focuses on two fields in my 1GB index. I get very good performance when loading the index via MmapDirectory. I attribute this performance to the Operating System File System (FS OS) cache, that keeps the most recently used FS blocks RAM resident. I would like to

adding field values with count

2013-01-04 Thread Michael Sokolov
I have an indexer that already collapses field values into a Map of (value, count) before indexing, and I would like to specify an increment to frequency (docFreq?) when adding a field value to a Lucene Document. Should I just add the same value multiple times? -Mike
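
The option raised in the question, adding the same value several times, can be pictured without any Lucene code. The following is only a sketch of that expansion step in plain Java; the class and method names are hypothetical, and the thread excerpt does not confirm that repeating values is the recommended way to influence term frequency.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns a (value, count) map back into repeated values,
// which is what "add the same value multiple times" would require.
final class RepeatValuesDemo {
    static List<String> expand(Map<String, Integer> counts) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // nCopies rejects negative counts with IllegalArgumentException.
            values.addAll(Collections.nCopies(entry.getValue(), entry.getKey()));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("red", 3);
        counts.put("blue", 1);
        // Each element would become one field value on the document.
        System.out.println(expand(counts)); // [red, red, red, blue]
    }
}
```

Each element of the returned list would then be added as a separate field value; whether that is preferable to another mechanism is left open by the excerpt.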

Re: Can I add new field values to a existing lucene index ?

2012-03-29 Thread Michael McCandless
On Wed, Mar 28, 2012 at 2:30 PM, Tim Eck wrote: > Excuse my ignorance of lucene internals, but is the problem any easier if > the requirement is just to allow the addition/removal of stored only fields > (as opposed to indexed)? It would substantially simplify the problem... but even this simplif

RE: Can I add new field values to a existing lucene index ?

2012-03-28 Thread Tim Eck
index with a new set of fields. -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, March 28, 2012 5:31 AM To: java-user@lucene.apache.org Subject: Re: Can I add new field values to a existing lucene index ? Alas, no, not yet. This is a

Re: Can I add new field values to a existing lucene index ?

2012-03-28 Thread Michael McCandless
, Anupam Bhattacharya wrote: > Does Lucene API allows to add new field values to a existing doc. > > For example, > Initially > > Doc 1 FieldName Value id 1 schoolname xyz zipcode pqr > > > After update can I keep the existing field values and add 2 more fields. >

RE: retrieved doc field values being cached?

2012-02-24 Thread Rose, Stuart J
[mailto:simon.willna...@googlemail.com] Sent: Friday, February 24, 2012 1:29 PM To: java-user@lucene.apache.org Subject: Re: retrieved doc field values being cached? Hey Stuart, Lucene solely relies on the FS cache with some exceptions for the term-dictionary and FieldCache which is pulled

Re: retrieved doc field values being cached?

2012-02-24 Thread Simon Willnauer
e: > > Lucene (using 3.5) seems to be caching field values for documents (after they > have been retrieved) and I am hoping someone can provide more information on > how and where exactly the field values are stored. > > The table below lists the times (in milliseconds) associated

retrieved doc field values being cached?

2012-02-24 Thread Rose, Stuart J
Lucene (using 3.5) seems to be caching field values for documents (after they have been retrieved) and I am hoping someone can provide more information on how and where exactly the field values are stored. The table below lists the times (in milliseconds) associated with retrieving for a set

Re: Stream field values

2009-07-15 Thread Günter Ladwig
Hi, thanks for your answer. I know about lazy loading fields, but my question is whether fields are always loaded as a whole or if it is possible in some way to stream a field's contents. Regards, Günter -- Dipl.-Inform. Günter Ladwig Institute AIFB, University of Karlsruhe, D-76128 Karlsru

Re: Stream field values

2009-07-14 Thread Grant Ingersoll
Have a look at the FieldSelector and the Lazy load capability. See http://www.lucidimagination.com/search/?q=FieldSelector for some pointers. -Grant On Jul 14, 2009, at 11:12 AM, Günter Ladwig wrote: Hi, I have a situation, where stored, un-indexed fields can contain potentially large amo

Stream field values

2009-07-14 Thread Günter Ladwig
Hi, I have a situation, where stored, un-indexed fields can contain potentially large amounts of data. Is it possible to read the contents of a field incrementally? That is, do not load the complete contents from disk, but read X bytes at a time. Does the Reader returned by Field.readerVa

Re: Lucene boosting only on matching field values

2009-07-10 Thread Grant Ingersoll
Yes, see the Payload functionality and the BoostingTermQuery: http://www.lucidimagination.com/search/?q=Payload On Jul 9, 2009, at 6:42 PM, Eric Chu wrote: Hi all, I was wondering if there is any way to do a boost on the document based on which value is in a field matched by a query. ie,

Lucene boosting only on matching field values

2009-07-09 Thread Eric Chu
Hi all, I was wondering if there is any way to do a boost on the document based on which value is in a field matched by a query. ie, (Sample code below) - You have a document that contains 1 field with multiple values. - Field has value ABC boosted by 2.0 - Field has value XYZ boosted by 3.0 - I

Re: Optimal Solution for Unique Field Values

2009-02-15 Thread Chris Lu
site, (anonymous per request) got 2.6 Million Euro funding! On Sun, Feb 15, 2009 at 5:02 AM, Joel Halbert wrote: > Hi, > > I'm looking for an optimal solution for extracting unique field values. > The rub is that I want to be able to perform this for a unique subset of > documents...

Optimal Solution for Unique Field Values

2009-02-15 Thread Joel Halbert
Hi, I'm looking for an optimal solution for extracting unique field values. The rub is that I want to be able to perform this for a unique subset of documents...as per the example: I have an index with Field1 and Field2. I want "all unique values of Field1 where Field2=X". Othe

Re: distinct field values

2008-10-14 Thread Antony Bowesman
Akanksha Baid wrote: I have indexed multiple documents - each of them have 3 fields ( id, tag , text). Is there an easy way to determine the set of tags for a given query without iterating through all the hits? For example if I have 100 documents in my index and my set of tag = {A, B, C}. Query

Re: distinct field values

2008-10-14 Thread Anshum
You could go through this implementation. Have been using this (improvised) for a while now. There might be better ways to do so too. so you could check! http://www.gossamer-threads.com/lists/lucene/java-user/35704?search_string=categorycounts;#35704 -- Anshum Gupta Naukri Labs! http://ai-cafe.blo

Re: distinct field values

2008-10-14 Thread Khawaja Shams
Hi, You may also want to take a look at Carrot2: http://demo.carrot2.org/demo-stable/main Lucene documentation references them, but I was disappointed to see that they had an open source version (really old) and one that you can buy. It may work for you. Also, take a look at SOLR's implementatio

Re: distinct field values

2008-10-14 Thread Chris Hostetter
: For example if I have 100 documents in my index and my set of tag = {A, B, C}. : Query Q on the text field returns 15 docs with tag A , 10 with tag B and none : with tag C (total of 25 hits). Is there a way to determine that the set of : tags for query Q = {A, B} without iterating through all 25

Re: distinct field values

2008-10-14 Thread Akanksha Baid
Is there something I could do to Index the documents differently to accomplish this? Currently I am looking at all the hits to generate the set of tags for the query. If I need to implement the same thing within Lucene, I am not sure if I will gain anything performance wise. Or am I wrong about

Re: distinct field values

2008-10-14 Thread Anshum
Hi, You could try changing (or extending) TopFieldDocCollector and do your processing there (that is what I tried... and it worked fine). But that would mean changing lucene code a little bit. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody,

distinct field values

2008-10-14 Thread Akanksha Baid
I have indexed multiple documents - each of them have 3 fields ( id, tag , text). Is there an easy way to determine the set of tags for a given query without iterating through all the hits? For example if I have 100 documents in my index and my set of tag = {A, B, C}. Query Q on the text field r

Re: updating existing field values

2008-08-07 Thread Michael McCandless
, and I need to offset all dates based on change to stored time zone (subtract 12 hours from each value). The sort order of field values would not change, and the postings should not need to change, only values of fields. I do not believe there is any API to do it, but is there some lower lev

Re: updating existing field values

2008-08-07 Thread Andrzej Bialecki
from each value). The sort order of field values would not change, and the postings should not need to change, only values of fields. I do not believe there is any API to do it, but is there some lower level way to do it (modifying files manually)? I only ask because I have a large index and I don'

updating existing field values

2008-08-07 Thread Robert Stewart
sort order of field values would not change, and the postings should not need to change, only values of fields. I do not believe there is any API to do it, but is there some lower level way to do it (modifying files manually)? I only ask because I have a large index and I don't want to re-

Re: search problem - not finding field values ending in "X"

2008-05-16 Thread Ulf Dittmer
D'oh! Of course - I'm using StandardAnalyzer. Changing to a PerFieldAnalyzerWrapper with a KeywordAnalyzer for that field fixes the issue. Thanks so much for fast response. Ulf --- Ian Lea <[EMAIL PROTECTED]> wrote: > Hi > > > I bet you are using an analyzer that is downcasing > isbn:00714

Re: search problem - not finding field values ending in "X"

2008-05-16 Thread Ian Lea
Hi I bet you are using an analyzer that is downcasing isbn:007149216X to isbn:007149216x. I've been there! Options include creating the query programmatically, using PerFieldAnalyzerWrapper, downcasing everything yourself in advance. Or convert to ISBN-13. -- Ian. On Fri, May 16, 2008 at 10
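
The mismatch Ian describes can be reproduced without Lucene. The sketch below is a plain-Java illustration, assuming the ISBN was indexed verbatim (untokenized) while the query text is lowercased before lookup; the class and method names are hypothetical and no Lucene API is involved.

```java
import java.util.Locale;
import java.util.Set;

final class IsbnCaseDemo {
    // Hypothetical stand-in for an analyzer that downcases query text.
    static String downcase(String queryText) {
        return queryText.toLowerCase(Locale.ROOT);
    }

    // Exact-match lookup against terms that were stored as-is.
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> indexed = Set.of("007149216X", "0071490833");
        // Digits only: lowercasing changes nothing, so the lookup succeeds.
        System.out.println(matches(indexed, downcase("0071490833"))); // true
        // Trailing "X" becomes "x" and no longer equals the indexed term.
        System.out.println(matches(indexed, downcase("007149216X"))); // false
        // Leaving the query text untouched restores the match.
        System.out.println(matches(indexed, "007149216X")); // true
    }
}
```

This is why ISBNs made only of digits were found while those ending in "X" were not: only the latter are altered by lowercasing.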

search problem - not finding field values ending in "X"

2008-05-16 Thread Ulf Dittmer
Hello- I'm experiencing a weird issue searching an index. The index has information about books, and one of the fields is the ISBN number. It is stored in the index in untokenized form to enable searches by ISBN. So a query like "isbn:0071490833" would return the Document for that book. But it doe

RE: Field values ...

2008-03-25 Thread Dragon Fly
Thanks. > Date: Mon, 24 Mar 2008 21:03:13 -0700 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: RE: Field values ... > > > : The Id and Phone fields are stored. So I can just do a MatchAllQuery as > : you suggested. I have read about field s

RE: Field values ...

2008-03-24 Thread Chris Hostetter
: The Id and Phone fields are stored. So I can just do a MatchAllQuery as : you suggested. I have read about field selectors on this mailing list : but have never used it. Does anyone know where I can find some sample : code? Thank you. there's a couple of reusable implementations in subver

RE: Field values ...

2008-03-24 Thread Dragon Fly
[EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: RE: Field values ... > > > : I want to do something like: > : > : List infoList = new ArrayList (); > : foreach (Document doc in LuceneIndex) > : { > :String id = doc.get ("Id"

RE: Field values ...

2008-03-22 Thread Chris Hostetter
: I want to do something like: : : List infoList = new ArrayList (); : foreach (Document doc in LuceneIndex) : { :String id = doc.get ("Id"); :String phone = doc.get ("Phone"); :infoList.add (new Info (id, phone)); : } If "Id" and "Phone" are stored value

RE: Field values ...

2008-03-20 Thread Dragon Fly
); } Thank you. > Date: Thu, 20 Mar 2008 10:05:17 -0400 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Field values ... > > See TermDocs/TermEnum. The trick is to start one of your enumerations > with "" (I forget exactly which), and

RE: Field values ...

2008-03-20 Thread Dragon Fly
)); } Thank you. > Date: Thu, 20 Mar 2008 10:05:17 -0400 > From: [EMAIL PROTECTED] > To: java-user@lucene.apache.org > Subject: Re: Field values ... > > See TermDocs/TermEnum. The trick is to start one of your enumerations > with "" (I forget exactly which), and that

Re: Field values ...

2008-03-20 Thread Erick Erickson
See TermDocs/TermEnum. The trick is to start one of your enumerations with "" (I forget exactly which), and that'll iterate them all. Best Erick On Thu, Mar 20, 2008 at 9:55 AM, Dragon Fly <[EMAIL PROTECTED]> wrote: > What's the easiest way to extract the values of 2 fields from each > document

Field values ...

2008-03-20 Thread Dragon Fly
What's the easiest way to extract the values of 2 fields from each document in the index. For example, each document has 5 fields: Id Name Address Phone Preference I'd like to extract the values for the Id and Phone fields for each document in the index. Thank you.

Re: Fwd: Unable to retreive 2/13 field values

2007-02-27 Thread Daniel Naber
On Tuesday 27 February 2007 19:21, Michael Barbarelli wrote: > GB821628930 (+VAT_reg:GB* doesn't work) What about VAT_reg:gb*? Also see QueryParser.setLowercaseExpandedTerms() Regards Daniel -- http://www.danielnaber.de - T

Fwd: Unable to retreive 2/13 field values

2007-02-27 Thread Michael Barbarelli
Hello. I'm using Lucene.NET, but would like to pose the question here in the Java group since I think the collective expertise here is still valid. Hope you don't mind. After indexing data from an Oracle DB using the standard analyzer, I am using Luke (standardanalyzer) to query at the moment.

Re: Lucene change field values to wrong ones when indexing

2006-12-14 Thread Doron Cohen
Two things I would check: 1) converting pubDate to String during indexing for later date-range-filtering search results might not work well, because, e.g., string wise, "9" > "100". You could use Lucene's DateTools - there's an example in TestDateFilter - http://svn.apache.org/viewvc/lucene/ja
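
Doron's first point, that "9" sorts after "100" when compared as strings, is easy to verify in plain Java. The sketch below shows the problem and one generic remedy, fixed-width zero padding; it is an illustration only and does not reproduce what Lucene's DateTools emits. The class and method names are hypothetical.

```java
final class StringOrderDemo {
    // True when a sorts after b under plain lexicographic comparison.
    static boolean sortsAfter(String a, String b) {
        return a.compareTo(b) > 0;
    }

    // Left-pads a non-negative number with zeros to a fixed width.
    static String pad(long value, int width) {
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Character by character, '9' is greater than '1'.
        System.out.println(sortsAfter("9", "100")); // true
        // With equal widths, string order agrees with numeric order.
        System.out.println(sortsAfter(pad(9, 3), pad(100, 3))); // false
    }
}
```

Any range filter built on string terms depends on this property, which is why a fixed-width, most-significant-first encoding matters for dates.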

Re: Lucene change field values to wrong ones when indexing

2006-12-14 Thread Steven Rowe
Hi Adrian, I don't see anything obviously wrong with your code. Can you give more details about which field values are different from what you expect? I'm guessing it's the id field you're worried about, but it's not clear from what you have written whether it's t

Lucene change field values to wrong ones when indexing

2006-12-14 Thread Java Programmer
Hello, I have a problem with my search code: I try to index some data while searching simultaneously. Everything goes fine until some amount of data is indexed, then my fields are bugged. E.g. I have a field with title indexed as "Nowitzki führt "Mavs" zum ersten Heimsieg" and inner id "15" (not doc id,

RE: Null field values

2006-07-05 Thread Seeta Somagani
Subject: Re: Null field values When you indexed the fields you can't get back, did you use Field.Store.YES? I've been confused by the fact that Luke can "reconstruct" fields that aren't stored, but are indexed. If that isn't the problem, perhaps you could post some

Re: Null field values

2006-07-01 Thread Erick Erickson
When you indexed the fields you can't get back, did you use Field.Store.YES? I've been confused by the fact that Luke can "reconstruct" fields that aren't stored, but are indexed. If that isn't the problem, perhaps you could post some code snippets. Best Erick

RE: Null field values

2006-06-30 Thread Seeta Somagani
values no matter what I'm searching for. Thanks, Seeta -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Friday, June 30, 2006 6:55 PM To: java-user@lucene.apache.org Subject: Re: Null field values There is no requirement that every document contain values for

Re: Null field values

2006-06-30 Thread Erick Erickson
There is no requirement that every document contain values for every field. Doc A could have fields z, y, x, and Doc B could have fields x, w, v. So, when you say "some of the values are being returned as null", do you mean that you *never* get any values for some field or you get values for a fie
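
Erick's Doc A / Doc B example can be modelled with ordinary maps to show where a null comes from. This is a plain-Java sketch, assuming each document is just a field-name-to-value map; the class name, method name, and the field values are hypothetical, and Lucene's own Document class is not used.

```java
import java.util.Map;

final class SparseFieldsDemo {
    // Returns the value for the field, or null when the document lacks it.
    static String fieldValue(Map<String, String> doc, String field) {
        return doc.get(field);
    }

    public static void main(String[] args) {
        // Doc A has fields z, y, x; Doc B has fields x, w, v.
        Map<String, String> docA = Map.of("z", "1", "y", "2", "x", "3");
        Map<String, String> docB = Map.of("x", "4", "w", "5", "v", "6");

        System.out.println(fieldValue(docA, "x")); // 3
        System.out.println(fieldValue(docB, "x")); // 4
        // Field z exists only on Doc A, so Doc B yields null for it.
        System.out.println(fieldValue(docB, "z")); // null
    }
}
```

A null for a field on some hits is therefore expected when those documents never had the field; a null on every hit points to a different cause, such as the field not being stored.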

Null field values

2006-06-30 Thread Seeta Somagani
Hi, I indexed some XML files using Lucene. When I open up the index using Luke, I can see that all the fields are stored correctly in the index. But, when I try to grab the fields from the hits, after searching, some of the values are being returned as null. Any suggestions about what might be