Using Lucene to model ownership of documents

2016-06-15 Thread Geebee Coder
Hi there,
I would like to use Lucene to solve the following problem:

1. We have about 100k customers and 25 million documents.

2. When a customer performs a text search on the document space, we want to
return only the documents that the customer has access to.

3. The number of documents a customer owns varies a lot: some own close to 23
million, some close to 10k, some a third of the documents, etc.

What is an efficient way to use Lucene in this scenario, in terms of both
search performance and indexing?
We have tried a number of solutions, such as:

a) 100k boolean fields per document, each indicating whether a customer has
access to the document;
b) a single text field that lists the customers who own the document,
e.g. (customers field: "abc abd cfx...");
c) option (b) with the index sharded by customer.
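Option (b) maps onto Lucene's inverted index: each customer ID becomes a term with its own postings list of document IDs, and restricting a search to one customer amounts to intersecting that postings list with the documents matched by the text query (in Lucene, for example, by adding a TermQuery on the customers field as a BooleanClause.Occur.FILTER clause). The toy Java model below only illustrates that intersection with in-memory sorted sets; OwnerPostings, addDocument and filter are hypothetical names, and this is not Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Toy model of option (b): each owner ID acts like a term whose postings list
 * is the sorted set of document IDs that owner can see. Restricting a text
 * search to one customer is then an intersection of two sorted ID sets.
 * Not Lucene code: Lucene keeps postings on disk and does this intersection itself.
 */
public class OwnerPostings {
    // owner ID -> sorted doc IDs (hypothetical in-memory stand-in for an index)
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Index one document under every owner listed for it. */
    public void addDocument(int docId, String... owners) {
        for (String owner : owners) {
            postings.computeIfAbsent(owner, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Keep only the text-search hits that the given customer owns, in doc ID order. */
    public List<Integer> filter(String customer, SortedSet<Integer> textHits) {
        SortedSet<Integer> owned = postings.getOrDefault(customer, new TreeSet<>());
        // Iterate the smaller set and probe the larger one.
        SortedSet<Integer> small = owned.size() < textHits.size() ? owned : textHits;
        SortedSet<Integer> large = (small == owned) ? textHits : owned;
        List<Integer> result = new ArrayList<>();
        for (int doc : small) {
            if (large.contains(doc)) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        OwnerPostings index = new OwnerPostings();
        index.addDocument(0, "abc", "abd");
        index.addDocument(1, "cfx");
        index.addDocument(2, "abc", "cfx");
        // Pretend the text query matched every document.
        SortedSet<Integer> hits = new TreeSet<>(Arrays.asList(0, 1, 2));
        System.out.println(index.filter("abc", hits)); // [0, 2]
        System.out.println(index.filter("cfx", hits)); // [1, 2]
    }
}
```

Walking the smaller of the two sets keeps the cost proportional to whichever side is more selective, which matters here because owner lists range from 10k to 23 million documents.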

The search and indexing performance of (a) was bad. (b) and (c) performed
better for search, but increased indexing time and index size.
We are also considering a custom filter, but we are concerned about its
memory requirements.
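To put the memory concern into numbers: a custom filter that caches one dense bitset per customer costs one bit per document, regardless of how many documents that customer owns. The sketch below only does that arithmetic for the figures in the question (25 million documents, 100k customers), using java.util.BitSet as a stand-in for Lucene's own bitset classes such as FixedBitSet; FilterMemory and bytesPerCustomer are hypothetical names.

```java
import java.util.BitSet;

/**
 * Back-of-envelope cost of a custom filter that caches one dense bitset per
 * customer: one bit per document, rounded up to whole 64-bit words.
 * java.util.BitSet stands in for Lucene's own bitsets (e.g. FixedBitSet).
 */
public class FilterMemory {
    static final long DOCS = 25_000_000L;   // documents, from the question
    static final long CUSTOMERS = 100_000L; // customers, from the question

    /** Bytes needed for one dense bitset covering {@code docs} documents. */
    static long bytesPerCustomer(long docs) {
        long words = (docs + 63) / 64; // 64 bits per long word
        return words * Long.BYTES;
    }

    public static void main(String[] args) {
        long perCustomer = bytesPerCustomer(DOCS);
        System.out.println(perCustomer);             // 3125000
        System.out.println(perCustomer * CUSTOMERS); // 312500000000

        // A customer owning only 10k documents still pays for all 25M bits.
        BitSet owned = new BitSet((int) DOCS);
        owned.set(0, 10_000);
        System.out.println(owned.size() / 8); // 3125000
    }
}
```

At roughly 3 MB per customer and about 312 GB for all 100k customers, caching a dense bitset for every customer is unlikely to fit in memory; under these assumptions a dense bitset only pays off for the few customers who own a large share of the collection.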

Any ideas/suggestions would be really appreciated.


Facet

2016-06-15 Thread Marcio Napoli
Hey!

The Lucene facets module encodes integers with the method
"FacetsConfig.dedupAndEncode". Would it be convenient to use IntPoint instead?
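For context on the question: as far as I can tell from the facets module, dedupAndEncode packs a document's facet ordinals compactly by sorting them, dropping duplicates, and storing each value as a variable-length gap from the previous one; the result is stored as binary doc values so facet counting can read each hit's ordinals. IntPoint, by contrast, indexes a fixed 4 bytes per value into a structure meant for exact and range filtering. The Java model below compares the two sizes on a small example; it is an illustrative re-implementation of the idea, not Lucene's source, the exact byte layout FacetsConfig.dedupAndEncode writes may differ, and OrdinalEncoding and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Illustrative model (not Lucene's source) of compact ordinal encoding:
 * sort, drop duplicates, store gaps between consecutive ordinals, and write
 * each gap as a variable-length integer (7 bits per byte).
 */
public class OrdinalEncoding {
    static byte[] dedupAndDeltaEncode(int[] ordinals) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int last = -1;
        for (int ord : sorted) {
            if (ord == last) {
                continue; // duplicate ordinal: skip
            }
            int gap = (last == -1) ? ord : ord - last;
            // Low 7 bits first; high bit set means "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
            last = ord;
        }
        return out.toByteArray();
    }

    /** Fixed-width cost for comparison: 4 bytes per distinct value, like one IntPoint dimension. */
    static int fixedWidthBytes(int[] ordinals) {
        return (int) Arrays.stream(ordinals).distinct().count() * Integer.BYTES;
    }

    public static void main(String[] args) {
        int[] ords = {200, 3, 7, 7};
        System.out.println(dedupAndDeltaEncode(ords).length); // 4
        System.out.println(fixedWidthBytes(ords));            // 12
    }
}
```

Small gaps between sorted ordinals fit in a single byte each, which is where the variable-length encoding wins over a fixed 4-byte layout.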

Thanks!
Marcio Napoli


Getting exception while initializing FSDirectory

2016-06-15 Thread Mukul Ranjan
Hi,

I'm getting the following exception while initializing FSDirectory:

Caused by: java.lang.IllegalAccessError: tried to access method org.apache.lucene.store.MMapDirectory.unmapHackImpl()Ljava/lang/Object; from class org.apache.lucene.store.MMapDirectory$$dtt
    at java.lang.invoke.MethodHandleNatives.resolve(Native Method)
    at java.lang.invoke.MemberName$Factory.resolve(MemberName.java:962)
    at java.lang.invoke.MemberName$Factory.resolveOrFail(MemberName.java:987)
    at java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:1390)
    at java.lang.invoke.MethodHandles$Lookup.linkMethodHandleConstant(MethodHandles.java:1746)
    at java.lang.invoke.MethodHandleNatives.linkMethodHandleConstant(MethodHandleNatives.java:477)


Can anybody help me resolve this issue?
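The caller named in the trace, MMapDirectory$$dtt, is not a regular class from the Lucene jar, which suggests (an assumption, not a diagnosis) that something is generating or rewriting classes at load time, or that jars from different Lucene builds are mixed on the classpath. A reasonable first check is where the Lucene classes are actually loaded from and which JVM is running. The small stdlib-only Java helper below does that; ClassOrigin and locate are hypothetical names, and inside an application server you may need to use that container's class loader instead of the one shown.

```java
import java.security.CodeSource;

/**
 * Diagnostic sketch: report which jar (or directory) a class is loaded from,
 * without initializing it.
 */
public class ClassOrigin {
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static
            // initializers, so the lookup cannot trigger the error being diagnosed.
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself report no code source.
            return src == null ? "bootstrap class path (JDK)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(locate("org.apache.lucene.store.FSDirectory"));
        System.out.println(locate("org.apache.lucene.store.MMapDirectory"));
    }
}
```

If the two Lucene classes report different jar locations, or the output differs from the Lucene version you expect, that would be worth fixing before looking further.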

Thanks,
Mukul Ranjan



Re: How to get the index for a document after a search over multiple indexes

2016-06-15 Thread Mark Shapiro
Thanks, I appreciate the useful info.  I can go with option 1.

Mark


How to prevent WordDelimiterFilter from tokenizing strings with underscores?

2016-06-15 Thread Xiaolong Zheng
Hi,

How can I prevent WordDelimiterFilter from tokenizing strings with underscores,
e.g. word_with_underscore?

I am using WordDelimiterFilter to create my own camel-case analyzer, with the
following configuration flags:

flags |= GENERATE_WORD_PARTS;
flags |= SPLIT_ON_CASE_CHANGE;
flags |= PRESERVE_ORIGINAL;


But I realized that one side effect of using SPLIT_ON_CASE_CHANGE is that it
also splits strings on underscores.

I am wondering how I can prevent it from splitting on underscores.

Sincerely,

--Xiaolong


Re: How to prevent WordDelimiterFilter from tokenizing strings with underscores?

2016-06-15 Thread Ahmet Arslan
Hi,

You can supply custom character types.
Please see WordDelimiterFilterFactory and wdfftypes.txt for an example.
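To illustrate why custom types help: the filter assigns every character a type (lowercase, uppercase, digit, or subword delimiter), and by default an underscore is a delimiter, so in the model below the underscore split comes from its delimiter type rather than from case-change splitting. Remapping "_" to a letter type keeps it inside the word while case-change splitting still applies. With WordDelimiterFilterFactory that would presumably be a line such as "_ => ALPHA" in the types file (assuming the same "char => TYPE" format as wdfftypes.txt). The Java sketch is a simplified, stdlib-only model of the idea, not Lucene's actual splitting rules; DelimiterTypes and its constants are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model of a word-delimiter splitter driven by a per-character
 * type table. NOT Lucene's WordDelimiterFilter: it only mimics two rules,
 * "a delimiter ends the current part" and "lowercase followed by uppercase
 * starts a new part" (when case splitting is on).
 */
public class DelimiterTypes {
    static final int LOWER = 1, UPPER = 2, DIGIT = 4, DELIM = 8;
    static final int ALPHA = LOWER | UPPER; // "some letter", case not tracked

    /** Default ASCII typing: letters by case, digits, everything else a delimiter. */
    static int[] defaultTable() {
        int[] table = new int[128];
        for (char c = 0; c < 128; c++) {
            if (Character.isLowerCase(c)) table[c] = LOWER;
            else if (Character.isUpperCase(c)) table[c] = UPPER;
            else if (Character.isDigit(c)) table[c] = DIGIT;
            else table[c] = DELIM;
        }
        return table;
    }

    static List<String> split(String s, int[] table, boolean splitOnCaseChange) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int lastType = DELIM;
        for (char c : s.toCharArray()) {
            int type = c < table.length ? table[c] : ALPHA; // toy: non-ASCII counts as a letter
            boolean caseBreak = splitOnCaseChange && lastType == LOWER && type == UPPER;
            if ((type == DELIM || caseBreak) && current.length() > 0) {
                parts.add(current.toString()); // close the current word part
                current.setLength(0);
            }
            if (type != DELIM) {
                current.append(c);
            }
            lastType = type;
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        int[] table = defaultTable();
        System.out.println(split("word_with_underscore", table, true)); // [word, with, underscore]
        table['_'] = ALPHA; // remap underscore from delimiter to letter
        System.out.println(split("word_with_underscore", table, true)); // [word_with_underscore]
        System.out.println(split("myCamelCase_value", table, true));    // [my, Camel, Case_value]
    }
}
```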

ahmet



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org