Re: Facet DrillDown Exclusion

2016-12-06 Thread Shai Erera
Hey Matt, You basically don't need to use DDQ in that case. You can construct a BooleanQuery with a MUST_NOT clause to filter out the facet path. Here's a short code snippet: String indexedField = config.getDimConfig("Author").indexFieldName; // Find the field of the "Author" facet Query q = new

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-30 Thread Shai Erera
This feature is not available in Lucene currently, but it shouldn't be hard to add it. See Mike's comment here: http://blog.mikemccandless.com/2013/05/dynamic-faceting-with-lucene.html?showComment=1412777154420#c363162440067733144 One more tricky (yet nicer) feature would be to have it all in one

Re: Lucene 6.3 faceting documentation

2016-11-10 Thread Shai Erera
We've removed the userguide a long time ago. We have a set of example files under lucene-demo, e.g. here https://lucene.apache.org/core/6_3_0/demo/src-html/org/apache/lucene/demo/facet/ . Also, you can read some blog posts, start here: http://shaierera.blogspot.com/2012/11/lucene-facets-part-1.htm

Re: Faceting : what are the limitations of Taxonomy (Separate index and hierarchical facets) and SortedSetDocValuesFacetField ( flat facets and no sidecar index) ?

2016-11-10 Thread Shai Erera
Hi The reason IMO is historic - ES and Solr had faceting solutions before Lucene had it. There were discussions in the past about using the Lucene faceting module in Solr (can't tell for ES) but, sadly, I can't say I see it happening at this point. Regarding your other question, IMO the Lucene fa

Re: IndexWriter, DirectoryTaxonomyWriter & SearcherTaxonomyManager synchronization

2016-09-28 Thread Shai Erera
*> However, that should not lead to NSFE. At worst it should lead to> "ordinal is not known" (maybe as an AIOOBE) from the taxonomy reader.* That is correct, this interleaving indexing case can potentially result in an AIOOBE like exception during faceted search, when the facets that are in the "

Re: IndexWriter, DirectoryTaxonomyWriter & SearcherTaxonomyManager synchronization

2016-09-26 Thread Shai Erera
Hmm ... the commit part of the two indexes is always tricky. The javadocs are correct because the order of indexing is as follows: when you index a document with facets, the facets are first added to the taxonomy index and only then the document is indexed in IW. Therefore if you concurrently inde

Re: Clarification on LUCENE 4795 discussions ( Add FacetsCollector based on SortedSetDocValues )

2016-09-26 Thread Shai Erera
Hey, Here's a blog I wrote a couple years ago about using facet associations: http://shaierera.blogspot.com/2013/01/facet-associations.html. Note that the examples in the blog were written against a very old Lucene version (4.7 maybe). We have a couple of demo files that are maintained with the co

Re: Lucene Facets performance problems (version 4.7.2)

2016-02-26 Thread Shai Erera
True, but Erick's questions are still valid :-). We need more info to answer these questions. So Simona, the more info you can give us the better we'll be able to answer. On Fri, Feb 26, 2016, 10:54 Uwe Schindler wrote: > Hi Erick, > > this was a question about Lucene so "&debug=true" won't help

Re: how to backup lucene index file

2016-01-13 Thread Shai Erera
You should use Lucene's replicator module, which helps you take backups from live snapshots of your index, even while indexing happens. You can read about how to use it here: http://shaierera.blogspot.co.il/2013/05/the-replicator.html Shai On Wed, Jan 13, 2016, 19:14 Erick Erickson wrote: > Jus

Re: SOLR/LUCENE 5.2.1: Solution of CharTermAtt, StartOffset, EndOffset, Position

2015-08-07 Thread Shai Erera
I think you can just write a TokenFilter which sets the PositionIncrementAttribute of every other token to 0. Then you can use StandardTokenizer and wrap it with that filter. Shai On Aug 8, 2015 6:33 AM, "Văn Châu" wrote: > Hi, > > I'm looking a solution for the following format in solr/lucene 5

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Shai Erera
It deal with possible out of memory issue? > > > > I am thinking of using the same Database to store the merged indices. But > > the problem is the original sharded indices can be updated, when new > > entries come in. So the merged final indices also needs to be updated >

Re: How to merge several Taxonomy indexes

2015-04-02 Thread Shai Erera
In some cases, MMapDirectory offers even better performance, since the JVM doesn't need to manage that RAM when it's doing GC. Also, using only RAMDirectory is not safe in that if the JVM crashes, your index is lost. On Thu, Apr 2, 2015 at 12:54 PM, Christoph Kaser wrote: > Hi Gimantha, > > why

Re: Filtering question

2015-03-11 Thread Shai Erera
I don't see that you use acceptDocs in your MyNDVFilter. I think it would return false for all userB docs, but you should confirm that. Anyway, because you use an NDV field, you can't automatically skip unrelated documents, but rather your code would look something like: for (int i = 0; i < reade

Re: Sampled Hit counts using Lucene Facets.

2015-03-11 Thread Shai Erera
prepare the Ranges > manually and pass them to LongRangeFacetsCounts. > > On Tue, Mar 10, 2015 at 4:54 PM, Shai Erera wrote: > > > I am not sure that splitting the ranges into smaller ranges is the same > as > > sampling. > > > > Take a look RandomSamplingFa

Re: Sampled Hit counts using Lucene Facets.

2015-03-10 Thread Shai Erera
I am not sure that splitting the ranges into smaller ranges is the same as sampling. Take a look at RandomSamplingFacetsCollector - it implements sampling by sampling the document space, not the facet values space. So if for instance you use a LongRangeFacetCounts in conjunction with a RandomSamplin
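The difference between sampling documents and splitting ranges can be sketched in plain Java. This is only an illustration of the idea described above, not the RandomSamplingFacetsCollector implementation: the class name, the method, the fixed seed, and the scale-up by 1/rate are all assumptions made for the example.

```java
import java.util.Random;

// Illustrative sketch only: plain Java, not the Lucene facet API.
class DocSamplingSketch {

    // Estimates how many documents have a value in [min, max] by looking at
    // a random sample of the documents and scaling the sampled count back up.
    // The range itself is never split; only the document space is sampled.
    static long estimateRangeCount(long[] docValues, long min, long max,
                                   double rate, long seed) {
        Random rnd = new Random(seed);
        long sampledHits = 0;
        for (long value : docValues) {
            if (rnd.nextDouble() >= rate) {
                continue; // this document is not part of the sample
            }
            if (value >= min && value <= max) {
                sampledHits++;
            }
        }
        // Scale the sampled count to an estimate for the full document set.
        return Math.round(sampledHits / rate);
    }

    public static void main(String[] args) {
        long[] values = new long[100000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i % 100;
        }
        // With rate 1.0 every document is inspected, so the count is exact.
        System.out.println(estimateRangeCount(values, 0, 29, 1.0, 42L));
        // With rate 0.1 roughly a tenth of the documents are inspected.
        System.out.println(estimateRangeCount(values, 0, 29, 0.1, 42L));
    }
}
```

The estimate is approximate by construction; a lower rate trades accuracy for less counting work.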

Re: Faceted Search Hierarchy

2015-01-08 Thread Shai Erera
) > > Can Lucene internally index like above, as 'India' value already exist as > path of some other document ? > Or some other ways that can be explored within Lucene. > > > > On Thu, Jan 8, 2015 at 5:26 PM, Shai Erera wrote: > > > Lucene does not underst

Re: Faceted Search Hierarchy

2015-01-08 Thread Shai Erera
Lucene does not understand the word "India", therefore the facets that are actually indexed are: Doc1: Asia + Asia/India Doc2: India + India/Gujarat When you ask for top children, you will get Asia + India, both with a count of 1. Shai On Thu, Jan 8, 2015 at 1:48 PM, Jigar Shah wrote: > Very
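The Asia/India example above can be reproduced with a small plain-Java sketch. It is not the Lucene facet API: the class and method names are hypothetical, and paths are modelled as simple string arrays. It shows that each path is indexed together with its ancestors, and that the top-level children are just the first path components, with no notion that "India" in one path is the same place as "India" in another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: plain Java, not the Lucene facet API.
class FacetPathSketch {

    // Expands a path into itself plus all of its ancestors,
    // e.g. [Asia, India] -> "Asia", "Asia/India".
    static List<String> expand(String[] path) {
        List<String> expanded = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String component : path) {
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(component);
            expanded.add(prefix.toString());
        }
        return expanded;
    }

    // Counts documents per top-level label (the first path component).
    static Map<String, Integer> topLevelCounts(List<String[]> docPaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] path : docPaths) {
            if (path.length > 0) {
                counts.merge(path[0], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> docs = List.of(
                new String[] {"Asia", "India"},     // Doc1
                new String[] {"India", "Gujarat"}); // Doc2
        for (String[] doc : docs) {
            System.out.println(expand(doc));
        }
        System.out.println(topLevelCounts(docs)); // {Asia=1, India=1}
    }
}
```

To get India counted under Asia for both documents, the application would have to index Doc2 with the full path Asia/India/Gujarat.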

Re: Facet Result Order

2014-12-14 Thread Shai Erera
Hi Mrugesh, This is strange indeed, as the facets are ordered by count, and we use a facet ordinal (integer code) as a tie breaker. What do you mean by "refreshed"? Do you have a sample test that shows this behavior? Shai On Fri, Dec 12, 2014 at 8:37 AM, patel mrugesh wrote: > > > Hi All, > I a
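The ordering rule mentioned above (count first, ordinal as tie breaker) can be sketched with a plain Java comparator. This is not Lucene code: the Entry class is hypothetical, and the direction of the tie break (lower ordinal first) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: plain Java, not the Lucene facet API.
class FacetOrderSketch {

    // A facet label with its ordinal (integer code) and its hit count.
    static final class Entry {
        final int ordinal;
        final String label;
        final int count;

        Entry(int ordinal, String label, int count) {
            this.ordinal = ordinal;
            this.label = label;
            this.count = count;
        }
    }

    // Orders labels by descending count; equal counts fall back to the
    // ordinal (assumed here: lower ordinal first), so the order is stable
    // for a given set of ordinals and counts.
    static List<String> orderedLabels(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingInt((Entry e) -> e.count).reversed()
                .thenComparingInt(e -> e.ordinal));
        List<String> labels = new ArrayList<>();
        for (Entry e : sorted) {
            labels.add(e.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry(5, "b", 3),
                new Entry(2, "a", 3),
                new Entry(9, "c", 7));
        System.out.println(orderedLabels(entries)); // [c, a, b]
    }
}
```

Under such a rule the order only changes when counts or ordinals change, which is why a reproducible test case helps narrow down an unexpected reordering.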

Re: Index replication strategy

2014-12-04 Thread Shai Erera
post, we use Lucene 4.2.1. > > On Thu, Dec 4, 2014 at 9:29 AM, Shai Erera wrote: > > > Do you use Lucene or Solr? Lucene also has a replication module, which > will > > allow you to replicate index changes. > > > > On Thu, Dec 4, 2014 at 4:19 PM, Vijay B wrote:

Re: Index replication strategy

2014-12-04 Thread Shai Erera
Do you use Lucene or Solr? Lucene also has a replication module, which will allow you to replicate index changes. On Thu, Dec 4, 2014 at 4:19 PM, Vijay B wrote: > Hello, > > We index docs coming from database nightly. Current index is sitting on > NFS. Due to obvious performance reasons, we are

Re: hierarchical facets

2014-11-25 Thread Shai Erera
Yes, hierarchical faceting in Lucene is only supported by the taxonomy index, at least currently. Shai On Tue, Nov 25, 2014 at 3:46 PM, Vincent Sevel wrote: > hi, > I saw that SortedSetDocValuesFacetCounts does not support hierarchical > facets. > Is that to say that hierarchical facets are onl

Re: Lucene not showing Low Score Doc

2014-10-27 Thread Shai Erera
e not matched. > > And I have set hitpage =10 . > > > Thanks > Priyanka > > > On Mon, Oct 27, 2014 at 6:14 AM, Shai Erera wrote: > > > Hi > > > > Your question is a bit fuzzy -- what do you mean by not showing "low > > scores"? Are you

Re: Lucene not showing Low Score Doc

2014-10-27 Thread Shai Erera
Hi Your question is a bit fuzzy -- what do you mean by not showing "low scores"? Are you sure that these 2 documents are matched by the query? Can you boil it down to a short test case that demonstrates the problem? In general though, when you search through IndexSearcher.search(Query, int), you wo

Re: Exception from FastTaxonomyFacetCounts

2014-10-15 Thread Shai Erera
lyAllDeletes=false) > > Will "IndexSearcher" and "TaxonomyReader" be in sync, in both > SearcherTaxonomyManager ? > > On Fri, Oct 10, 2014 at 12:08 AM, Shai Erera wrote: > > > This usually means that your IndexReader and TaxonomyReader are out of > >

Re: Exception from FastTaxonomyFacetCounts

2014-10-09 Thread Shai Erera
This usually means that your IndexReader and TaxonomyReader are out of sync. That is, the IndexReader sees category ordinals that the TaxonomyReader does not yet see. Do you use SearcherTaxonomyManager in your application? It ensures that the two are always in sync, i.e. reopened together and that

Re: topdocs per facet

2014-10-09 Thread Shai Erera
The facets translation should be done at the application level. So if you index the dimension A w/ two facets A/A1 and A/A2, where A1 should also be translated to B1 and A2 translated to B2, there are several options: Index the dimensions A and B with their respective facets, and count the relevan

Re: Delete / Update facets from taxonomy index

2014-10-09 Thread Shai Erera
Hi You cannot remove facets from the taxonomy index, but you can reindex a single document and update its facets. This will add new facets to the taxonomy index (if they do not already exist). You do that just like you reindex any document, by calling IndexWriter.updateDocument(). Just make sure t

Re: FacetsConfig usage

2014-10-05 Thread Shai Erera
Hi The FacetsConfig object is the one that you use to index facets, and at search time it is consulted about the facets attributes (multi-valued, hierarchical etc.). You can make changes to the FacetsConfig, as long as they don't contradict the indexed data in a problematic manner. Usually the fa

Re: confused facet example

2014-09-30 Thread Shai Erera
Thanks Yonghui, I will commit a fix - need to initialize the example class before each example is run ! Shai On Tue, Sep 30, 2014 at 1:26 PM, Yonghui Zhao wrote: > > https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFac

Re: sortedset vs taxonomy

2014-09-27 Thread Shai Erera
Hi The taxonomy faceting approach maintains a sidecar index where it keeps the taxonomy and assigns an integer (ordinal) to each category. Those integers are encoded in a BinaryDocValues field for each document. It supports hierarchical faceting as well as assigning additional metadata to each fac

Re: document boost at lucene 4.8.1

2014-09-21 Thread Shai Erera
You can read some discussion here: http://search-lucene.com/m/Z2GP220szmS&subj=RE+What+is+equivalent+to+Document+setBoost+from+Lucene+3+6+inLucene+4+1+ . I wrote a post on how to achieve that with the new API: http://shaierera.blogspot.com/2013/09/boosting-documents-in-lucene.html. Shai On Sun,

Re: improve indexing speed with nomergepolicy

2014-08-14 Thread Shai Erera
> forceMerge(). > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Shai Erera [mailto:ser...@gmail.com] > > Sent: Thursday, August 07, 2014

Re: Questions for facets search

2014-08-13 Thread Shai Erera
similar to how we store the payload :) We use an > integer as payload for each token, and store more complicated information > in another Lucene index with the integer payload as the key for each > document. > > Sheng > > On Wednesday, August 13, 2014, Shai Erera wrote: >

Re: Questions for facets search

2014-08-13 Thread Shai Erera
Sheng, I assume that you're using the Lucene faceting module, so I'll answer on that basis: (1) A document can be associated with many facet labels, e.g. Tags/lucene and Author/Shai. The way to extract all facet labels for a particular document is this: OrdinalsReader ordinals = new DocValuesOrd

Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Shai Erera
looks like the MergePolicy is set > through IndexWriterConfig but I don't see a way to update an IWC on an > IW. > > Thanks, > > Jon > > > On Thu, Aug 7, 2014 at 7:37 AM, Shai Erera wrote: > > Using NoMergePolicy for online indexes is usually not recommende

Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Shai Erera
Using NoMergePolicy for online indexes is usually not recommended. You want to use NoMP in cases where you build an index in a batch job and then, at the end, before the index is "published", run a forceMerge or maybeMerge (with a real MergePolicy). For online indexes, i.e. indexes that are being sea

Re: Sort, Search & Facets

2014-07-10 Thread Shai Erera
Hi Currently we do not provide the means to use a single SortedSetDVField for both faceting and sorting. You can add a SortedSetDVFacetField to a Document, then use FacetsConfig.build(), but that encodes all your dimensions under a single SSDV field. It's done for efficiency, since at search time,

Re: Incremental Field Updates

2014-07-02 Thread Shai Erera
------ > Thanks n Regards, > Sandeep Ramesh Khanzode > > > On Tuesday, July 1, 2014 9:53 PM, Shai Erera wrote: > > > > Except that Lucene now offers efficient numeric and binary DocValues > updates. See IndexWriter.updateNumeric/Binary... > > On

Re: Incremental Field Updates

2014-07-01 Thread Shai Erera
Except that Lucene now offers efficient numeric and binary DocValues updates. See IndexWriter.updateNumeric/Binary... On Jul 1, 2014 5:51 PM, "Erick Erickson" wrote: > This JIRA is "complicated", don't really expect it in 4.9 as it's > been hanging around for quite a while. Everyone would like th

Re: Lucene Facets Module 4.8.1

2014-06-23 Thread Shai Erera
ere any advantage of indexing some facets as not providing any > indexFieldName ? > > Thanks > > > > > On Mon, Jun 23, 2014 at 12:55 PM, Shai Erera wrote: > > > There is no sample code for doing that but it's quite straightforward - > if > > you know y

Re: Lucene Facets Module 4.8.1

2014-06-23 Thread Shai Erera
There is no sample code for doing that but it's quite straightforward - if you know you indexed some dimensions under different indexFieldNames, initialize a FacetCounts per such field name, e.g.: FastTaxoFacetCounts defaultCounts = new FastTaxoFacetCounts(...); // for your regular facets FastTaxo

Re: A question about FacetField constructor

2014-06-22 Thread Shai Erera
Reply wasn't sent to the list. On Jun 22, 2014 8:15 PM, "Shai Erera" wrote: > Can you post an example which demonstrates the problem? It's also > interesting how you count the facets, eg do you use a TaxonomyFacets object > or something else? > > Have yo

Re: Lucene Facets Module 4.8.1

2014-06-22 Thread Shai Erera
on 'CITY'. > > FastTaxonomyFacetCounts(String indexFieldName, TaxonomyReader taxoReader, > FacetsConfig config, FacetsCollector fc) throws IOException { > super(indexFieldName, taxoReader, config); > ... > } > > Thanks > Jigar Shah. > > > > On Sat, Ju

Re: A question about FacetField constructor

2014-06-22 Thread Shai Erera
What do you mean by does not index anything? Do you get an exception when you add a String[] with more than one element? You should probably call conf.setHierarchical(dimension), but if you don't do that you should receive an IllegalArgumentException telling you to do that... Shai On Sun, Jun 2

Re: Lucene Facets Module 4.8.1

2014-06-21 Thread Shai Erera
If you can, while in debug mode try to note the instance ID of the FacetsConfig, and assert it is indeed the same (i.e. indexConfig == searchConfig). Shai On Sat, Jun 21, 2014 at 8:26 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Are you sure it's the same FacetsConfig at search

Re: Lucene Facets Module 4.8.1

2014-06-20 Thread Shai Erera
How do you add facets to your documents? Did you play with the FacetsConfig, such as alter the field under which the CITY dimension is indexed? If you can reproduce this failure in a simple program, I guess it will be easy to spot the error. Looks like a configuration error to me... Shai On Fri

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Shai Erera
shows this count. > > I will check on a Linux box to make sure. Thanks, > > --- > Thanks n Regards, > Sandeep Ramesh Khanzode > > > On Tuesday, June 17, 2014 11:28 PM, Shai Erera wrote: > > > > Nothing suspicious ... code looks fine. The c

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Shai Erera
(1000, "F5")); > results.add(facets.getTopChildren(1000, "F6")); > results.add(facets.getTopChildren(1000, "F7")); > System.out.println("3. End Date: " + new Date()); > // Above part takes approx less than 1 second > ===

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Shai Erera
that way ... I look at e.g how doc-values are merged .. not sure it will improve performance. But if you want to cons up a patch, that'd be awesome! Shai On Tue, Jun 17, 2014 at 8:01 PM, Shai Erera wrote: > OK I think I now understand what you're asking :). It's unrelated thoug

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Shai Erera
't need any memory > > I was trying to get a heads-up on these 2 approaches. Please do let me know > if I have understood correctly > > -- > Ravi > > > > > On Tue, Jun 17, 2014 at 5:42 PM, Shai Erera wrote: > > > > > > > I am afraid the DocMap sti

Re: Facets in Lucene 4.7.2

2014-06-17 Thread Shai Erera
Execution: 11 seconds >Facet counts execution: < 1 second > >With 4.9M hits (1 different value for the 1 term): (Without > Flushing > Windows File Cache on Next run) > Query Execution: 2 seconds >Facet counts execu

Re: Facet migration 4.6.1 to > 4.7.0

2014-06-17 Thread Shai Erera
> > - we are extending FacetResultsHandler to change the order of the facet > results (i.e. date facets ordered by date instead of count). How can I > achieve this now? > Now everything is a Facets. In your case, since you use the taxonomy, it's TaxonomyFacets. You can check the class-hierarchy, w

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Shai Erera
I think lucene itself has a MergeIterator in o.a.l.util package. > > A MergePolicy can wrap a simple MergeIterator for iterating docs across > different AtomicReaders in correct sort-order for a given field/term > > That should be fine right? > > -- > Ravi > > -- > Ravi

Re: SortingMergePolicy for already sorted segments

2014-06-17 Thread Shai Erera
sorted. > > I find this "loadSortTerm(compositeReader)" to be a bit heavy where it > tries to all load the doc-to-term mappings eagerly... > > Are there some alternatives for this? > > -- > Ravi > > > On Tue, Jun 17, 2014 at 10:58 AM, Shai Erera wrote: >

Re: SortingMergePolicy for already sorted segments

2014-06-16 Thread Shai Erera
I'm not sure that I follow ... where do you see DocMap being loaded up front? Specifically, Sorter.sort may return null if the readers are already sorted ... I think we already optimized for the case where the readers are sorted. Shai On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan < rav

Re: Faceted Search User's Guide for Lucene 4.8.1

2014-06-16 Thread Shai Erera
'll help as much as I can with that too. Shai On Mon, Jun 16, 2014 at 7:15 PM, Nicola Buso wrote: > Hi Shai, > > I'm going to update from 4.6.1 to 4.8.1 :-( > > On Wed, 2014-06-11 at 14:05 +0300, Shai Erera wrote: > > Hi > > > > We remove

Re: Facets in Lucene 4.7.2

2014-06-16 Thread Shai Erera
rstand it, the > state is persisted to the disk. But this time, there are additional file > extensions like doc/pos/tim/tip/dvd/dvm, etc. I am not sure about this > difference and its cause. > > 5.] Does the RAMBufferSizeMB() control the commit intervals, so that when > the limit i

Re: Lucene 4.8.1 - Taxonomy

2014-06-16 Thread Shai Erera
Err ... are you sure there's an index in the directory that you point Luke at? I see that the exception points to "." which suggests the local directory from where Luke was run. There's nothing special about the taxonomy index, as far as Luke is concerned. However, note that I do not recommend t

Re: Facets in Lucene 4.7.2

2014-06-14 Thread Shai Erera
use case? > > Please let me know. And, thanks! > > --- > Thanks n Regards, > Sandeep Ramesh Khanzode > > > On Friday, June 13, 2014 9:51 PM, Shai Erera wrote: > > > > Hi > > You can check the demo code here: > > https://svn.apache.org

Re: Facets in Lucene 4.7.2

2014-06-13 Thread Shai Erera
Hi You can check the demo code here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/. This code is updated with each release, so you always get working code examples, even when the API changes. If you don't mind managing th

Re: Faceted Search User's Guide for Lucene 4.8.1

2014-06-11 Thread Shai Erera
Hi We removed the userguide a long time ago, and replaced it with better documentation on the classes and package.html, as well as demo code that you can find here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_8/lucene/demo/src/java/org/apache/lucene/demo/facet/ You can also l

Re: Multi-thread indexing, should the commit be called from each thread?

2014-05-21 Thread Shai Erera
You don't need to commit from each thread, you can definitely commit when all threads are done. In general, you should commit only when you want to ensure the data is "safe" on disk. Shai On Wed, May 21, 2014 at 2:58 PM, andi rexha wrote: > Hi! > I have a question about multi-thread indexing.

Re: best choice for ramBufferSizeMB

2014-05-14 Thread Shai Erera
Well, first make sure that you set ramBufferSizeMB to well below the max Java heap size, otherwise you could run into OOMs. While a larger RAM buffer may speed up indexing (since it flushes less often to disk), it's not the only factor that affects indexing speed. For instance, if a big portion o

Re: Fields, Index segments and docIds (second Try)

2014-05-02 Thread Shai Erera
You don't need to do that in parallel to all indexes, unless it's more convenient for you. Shai On Fri, May 2, 2014 at 9:28 AM, Olivier Binda wrote: > On 05/02/2014 06:05 AM, Shai Erera wrote: > >> If you're always rebuilding, let alone forceMerge, you shouldn'

Re: Fields, Index segments and docIds (second Try)

2014-05-01 Thread Shai Erera
n May 2, 2014 1:57 AM, "Olivier Binda" wrote: > On 05/01/2014 10:28 AM, Shai Erera wrote: > >> I'm glad it helped you. Good luck with the implementation. >> > > Thanks. First I started looking at the lucene internal code. To understand > when/where and why

Re: Fields, Index segments and docIds (second Try)

2014-05-01 Thread Shai Erera
index. Or, if rebuilding all indexes won't take long, you can always rebuild all of them. Shai On Thu, May 1, 2014 at 12:00 AM, Olivier Binda wrote: > On 04/30/2014 10:48 AM, Shai Erera wrote: > >> I hope I got all the details right, if I didn't then please clarify. A

Re: Fields, Index segments and docIds (second Try)

2014-04-30 Thread Shai Erera
I hope I got all the details right, if I didn't then please clarify. Also, I haven't read the entire thread, so if someone already suggested this ... well, it probably means it's the right solution :) It sounds like you could use Lucene's ParallelCompositeReader, which already handles multiple Ind

Re: No Compound Files

2014-04-29 Thread Shai Erera
NoMP means no merges, and indeed it seems silly that NoMP distinguishes between compound/non-compound settings. Perhaps it's rooted somewhere in the past, I don't remember. I checked and IndexWriter.addIndexes consults MP.useCompoundFile(segmentInfo) when it adds the segments. But maybe NoMP.useCo

Re: No Compound Files

2014-04-29 Thread Shai Erera
The problem is that compound files settings are split between MergePolicy and IndexWriterConfig. As documented on IWC.setUseCompoundFile, this setting controls how new segments are flushed, while the MP setting controls how merged segments are written. If we only offer NoMP.INSTANCE, what would it

Re: Getting multi-values to use in filter?

2014-04-29 Thread Shai Erera
s. I think the best way to solve this is to encode > the number of values as first entry in the BDV. This is not that hard so I > will take this road. > > -Rob > > > > On 27 Apr 2014, at 21:27, Shai Erera wrote: > > > > Hi Rob, > >

Re: Getting multi-values to use in filter?

2014-04-27 Thread Shai Erera
2014 at 1:20 PM, Shai Erera wrote: > I don't think that you should use the facet module. If all you want is to > encode a bunch of numbers under a 'foo' field, you can encode them into a > byte[] and index them as a BDV. Then at search time you get the BDV and > deco
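The byte[] encoding idea from this thread, including the suggestion to store the number of values as the first entry, can be sketched as follows. This is a plain-Java illustration only: the class name is hypothetical, the fixed-width int/long layout is an assumption (a real index might prefer a variable-length encoding), and writing the bytes into a BinaryDocValues field is left out.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative sketch only: plain Java, no Lucene classes involved.
class MultiValueCodecSketch {

    // Layout (assumed): 4-byte value count, followed by 8 bytes per value.
    static byte[] encode(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.length * Long.BYTES);
        buf.putInt(values.length);
        for (long value : values) {
            buf.putLong(value);
        }
        return buf.array();
    }

    // Reads the count first, then exactly that many values.
    static long[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }

    // Example of what a filter could compute from the decoded values.
    static long sum(byte[] bytes) {
        long total = 0;
        for (long value : decode(bytes)) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] encoded = encode(new long[] {3L, 5L, 42L});
        System.out.println(Arrays.toString(decode(encoded))); // [3, 5, 42]
        System.out.println(sum(encoded));                     // 50
    }
}
```

The count prefix lets the decoder stop at the right place even if the stored byte array is longer than the encoded payload.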

Re: Getting multi-values to use in filter?

2014-04-24 Thread Shai Erera
etSum*Associations > would need to do this for all fields that I need facet counts/sums for. > > What do you think? > > -Rob > > > On Wed, Apr 23, 2014 at 5:13 PM, Shai Erera wrote: > > > A NumericDocValues field can only hold one value. Have you thought about >

Re: Getting multi-values to use in filter?

2014-04-23 Thread Shai Erera
ache.LongParser. These parsers only seem to parse one field. > > Is there an efficient way to get -all- of the (numeric) values for a field > in a document? > > > On Wed, Apr 23, 2014 at 4:38 PM, Shai Erera wrote: > > > You can do that by writing a Filter which returns matchin

Re: Getting multi-values to use in filter?

2014-04-23 Thread Shai Erera
You can do that by writing a Filter which returns matching documents based on a sum of the field's value. However I suspect that is going to be slow, unless you know that you will need several such filters and can cache them. Another approach would be to write a Collector which serves as a Filter,

Re: IndexReplication Client and IndexWriter

2014-04-15 Thread Shai Erera
>> (LUCENE-5438), InfosRefCounts (weird name), whose purpose is to do >> what IndexFileDeleter does for IndexWriter, ie keep track of which >> files are still referenced, delete them when they are done, etc. This >> could used on the client side to hold a lease for another client. >>

Re: NRT facet issue (bug?), hard to reproduce, please advise

2014-04-11 Thread Shai Erera
Hi I am not sure how more than one client_no field ends up w/ a document, and I'm not sure it's related to the taxonomy at all. However, looking at the code example you pasted above, and since you mention that you index+commit in one thread, while another thread does the reopen, I wonder if that'

Re: IndexReplication Client and IndexWriter

2014-04-08 Thread Shai Erera
IndexRevision uses the IndexWriter for deleting unused files when the revision is released, as well as to obtain the SnapshotDeletionPolicy. I think that you will need to implement two things on the "client" side: * Revision, which doesn't use IndexWriter. * Replicator which keeps track of how ma

Re: Replicator: how to use it?

2014-03-20 Thread Shai Erera
, then close() should not create a new commit point. Do you see that it does? Shai On Wed, Mar 19, 2014 at 11:09 PM, Roberto Franchini wrote: > On Sat, Mar 15, 2014 at 12:56 PM, Roberto Franchini > wrote: > > On Sat, Mar 15, 2014 at 12:47 PM, Shai Erera wrote: > >> If you

Re: Replicator: how to use it?

2014-03-15 Thread Shai Erera
If you use LocalReplicator on both sides, you have to use the same instance on both sides. Otherwise the replicas will never see the published revisions, which are published on a separate instance. Can you try that? Shai On Mar 15, 2014 1:10 PM, "Roberto Franchini" wrote: > On Sat, Mar 15, 2014 at

Re: Few questions on updatable DocValues

2014-03-15 Thread Shai Erera
Double fields can be implemented today over NumericDVField and therefore already support updates. String fields can be implemented on Sorted/SortedSetDVField, but updates are not supported for them yet. I hope that once I'm done w/ LUCENE-5513, adding update support for Sorted/SortedSet will be even easier. Shai On

Re: Few questions on updatable DocValues

2014-03-14 Thread Shai Erera
Hi 1. Is it possible to provide updateNumericDocValue(Term term, > Map), in case I wish to update multiple fields and their > doc-values? > For now you can call updateNDV multiple times, each time w/ a new field. Under the covers, we currently process each update separately anyway. I think in order

Re: Adding custom weights to individual terms

2014-02-13 Thread Shai Erera
I often prefer to manage such weights outside the index. Usually managing them inside the index leads to problems in the future when e.g. the weights change. If they are encoded in the index, it means re-indexing. Also, if the weight changes then in some segments the weight will be different than ot

Re: Actual min and max-value of NumericField during codec flush

2014-02-12 Thread Shai Erera
"adjacency" by "size", whereas it would be better > if "timestamp" is used in my case > > Sure, I need to wrap this in an SMP to make sure that the newly-created > segment is also in sorted-order > > -- > Ravi > > > > On Wed, Feb 12,

Re: Actual min and max-value of NumericField during codec flush

2014-02-12 Thread Shai Erera
Why not use LogByteSizeMP in conjunction w/ SortingMP? LogMP picks adjacent segments and SortingMP ensures the merged segment is also sorted. Shai On Wed, Feb 12, 2014 at 3:16 PM, Ravikumar Govindarajan < ravikumar.govindara...@gmail.com> wrote: > Yes exactly as you have described. > > Ex: Cons

Re: Regarding DrillDown search

2014-02-10 Thread Shai Erera
not about lack of creativity, I might have not explained you in the > proper way :) > > Thank you for all the support :) > > > On Tue, Feb 11, 2014 at 12:23 AM, Shai Erera wrote: > > > What you want sounds like grouping more like faceting? > > > > So

Re: Regarding DrillDown search

2014-02-10 Thread Shai Erera
documents result first and then category wise, > suppose 2 documents by the same Author etc > > As per my requirement, I am doing DrillDown Search by asking the user to > provide such as title of the docment, author of the document, etc... as > advanced search option. > > --

Re: Regarding DrillDown search

2014-02-10 Thread Shai Erera
e same category > from the FacetResult Object also. > > I hope you will understand my question :) > > Thank you :) > > -- > Jebarlin > > > > On Mon, Feb 10, 2014 at 9:09 PM, Shai Erera wrote: > > > Hi > > > > You will need to build a BooleanQue

Re: Regarding DrillDown search

2014-02-10 Thread Shai Erera
indly Guide me :) > > Thank you for All your Support. > > Regards, > Jebarlin.R > > > On Mon, Feb 10, 2014 at 1:28 PM, Shai Erera wrote: > > > Hi > > > > If you want to drill-down on first name only, then you have several > > options: > > >

Re: Regarding DrillDown search

2014-02-09 Thread Shai Erera
Hi If you want to drill-down on first name only, then you have several options: 1) Index Author/First, Author/Last, Author/First_Last as facets on the document. This is the faster approach, but bloats the index. Also, if you index the author Author/Jebarlin, Author/Robertson and Author/Jebarlin_R

Re: Regarding CorruptedIndexException in using Lucene Facet Search

2014-02-07 Thread Shai Erera
r.java:2034) > > 02-07 12:38:11.006: W/System.err(5411): at > > > com.example.lucene.threads.AsyncIndexWriter.addDocumentSynchronous(AsyncIndexWriter.java:343) > > 02-07 12:38:11.006: W/System.err(5411): at > > > com.example.lucene.threads.AsyncIndexWriter.addDocume

Re: Regarding CorruptedIndexException in using Lucene Facet Search

2014-02-06 Thread Shai Erera
It looks like something's wrong with the index indeed. Are you sure you committed both the IndexWriter and TaxoWriter? Do you have some sort of testcase / short program which demonstrates the problem? I know there were a few issues running Lucene on Android, so I cannot guarantee it works fully .. w

Re: updating docs when using SortedSetDocValuesFacetFields

2014-01-22 Thread Shai Erera
Note that Lucene doesn't support general in-place document updates, and updating a document means first deleting it and adding it back. Therefore if you only intend to add/change a few categories of an existing document, you have to fully re-index the document. This is not specific to categories but

Re: Issue with FacetFields.addFields() throwing ArrayIndexOutOfBoundsException

2014-01-17 Thread Shai Erera
ave > reproduces it very quickly, Only have to index ~330K docs. > > > On Fri, Jan 17, 2014 at 3:27 PM, Shai Erera wrote: > > > Do you have a test which reproduces the error? Are you adding categories > > with very deep hierarchies? > > > > Shai > > > &

Re: Issue with FacetFields.addFields() throwing ArrayIndexOutOfBoundsException

2014-01-17 Thread Shai Erera
Do you have a test which reproduces the error? Are you adding categories with very deep hierarchies? Shai On Fri, Jan 17, 2014 at 11:59 PM, Matthew Petersen wrote: > I've confirmed that using the LruTaxonomyWriterCache solves the issue for > me. It would appear there is in fact a bug in the Cl

Re: Index + Taxonomy Replication

2013-11-01 Thread Shai Erera
Opened https://issues.apache.org/jira/browse/LUCENE-5320. Shai On Fri, Nov 1, 2013 at 4:59 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Fri, Nov 1, 2013 at 3:12 AM, Shai Erera wrote: > > > Maybe we should offer such a ReferenceManager (ma

Re: Index + Taxonomy Replication

2013-11-01 Thread Shai Erera
SearcherTaxonomyManager can be used only for NRT, as it only takes an IndexWriter and DirectoryTaxonomyWriter. And I don't think you want to keep those writers open on the slaves side. I think that a ReferenceManager, which returns a SearcherAndTaxonomy, is the right thing to do. The reason why we

Re: Merging ordered segments without re-sorting.

2013-10-23 Thread Shai Erera
is that SortingMergePolicy performs sorting after > wrapping the 2 segments, correct? > > As I mentioned in my original email I would like to avoid the re-sorting > and exploit the fact that the input segments are already sorted. > > > > On Wed, Oct 23, 2013 at 11:02

Re: Merging ordered segments without re-sorting.

2013-10-23 Thread Shai Erera
Hi You can use SortingMergePolicy and SortingAtomicReader to achieve that. You can read more about index sorting here: http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html Shai On Wed, Oct 23, 2013 at 8:13 PM, Arvind Kalyan wrote: > Hi there, I'm looking for pointers, suggesti

Re: external file stored field codec

2013-10-17 Thread Shai Erera
> > The codec intercepts merges in order to clean up files that are no longer > referenced > What happens if a document is deleted while there's a reader open on the index, and the segments are merged? Maybe I misunderstand what you meant by this statement, but if the external file is deleted, sin

Re: Huge FacetArrays while using SortedSetDocValuesAccumulator

2013-08-28 Thread Shai Erera
Oops you're right, it was committed in LUCENE-4985 which will be released in Lucene 4.5. Shai On Wed, Aug 28, 2013 at 6:16 PM, Krishnamurthy, Kannan < kannan.krishnamur...@contractor.cengage.com> wrote: > Thanks for the response. I double checked that > SortedSetDocValuesAccumulator doesn't tak
