Re: How to store and retrieve latest utf8mb4 emoji / smiley characters in lucene index

2016-08-02 Thread Cristian Lorenzetto
Hi, I need some help. I want to create a Lucene query parser that adds evaluation of expressions such as value1:>= value2. How can I do it? For me it would also be sufficient to work programmatically, creating a new Query object making it 2016-08-01 19:36 GMT+02:00 Kumaran Ramasubramanian : > Hi All, > >

how to introduce a evaluator for comparing not only document with fixed values

2016-08-03 Thread Cristian Lorenzetto
Hi, I need some help. I want to create a Lucene query parser that adds evaluation of expressions such as value1:>= value2, where value1 and value2 are two field names in the document. How can I do it? For me it would also be sufficient to work programmatically, creating a new Query object making it

Re: how to introduce a evaluator for comparing not only document with fixed values

2016-08-03 Thread Cristian Lorenzetto
ompare the values of both fields, this is why we do > not have a query for it. The recommended approach in such cases is to index > a third field where the difference between value1 and value2 is computed at > index time. > > On Wed, 3 Aug 2016 at 14:42, Cristian Lorenzetto
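The "third field" recommendation quoted above can be sketched without any Lucene classes. This is only an illustration under assumptions: the class name, the field name "diff" and the use of a plain Map as a stand-in for a document are all hypothetical. The point is that value1 >= value2 turns into a range condition diff >= 0 on a single precomputed field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for building a document: a map of field name to value.
final class DiffFieldSketch {

    // At index time, store both values plus their difference.
    // The subtraction is done in long so that two ints can never overflow.
    static Map<String, Long> toIndexedFields(int value1, int value2) {
        Map<String, Long> fields = new LinkedHashMap<>();
        fields.put("value1", (long) value1);
        fields.put("value2", (long) value2);
        fields.put("diff", (long) value1 - (long) value2);
        return fields;
    }

    // At query time, "value1 >= value2" is just "diff >= 0" on one field.
    static boolean value1AtLeastValue2(Map<String, Long> fields) {
        return fields.get("diff") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(value1AtLeastValue2(toIndexedFields(7, 3)));
        System.out.println(value1AtLeastValue2(toIndexedFields(3, 7)));
    }
}
```

In a real index the "diff" field would presumably be a numeric point field queried with a range, but that part is not shown here.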

Re: how to introduce a evaluator for comparing not only document with fixed values

2016-08-03 Thread Cristian Lorenzetto
I'm thinking of another solution: it could also be possible to build additional info in relation to the query to use :). 2016-08-03 19:27 GMT+02:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: > yes internally lucene could make a lot of things for reaching target: > - it co

encoding in byteref?

2016-08-09 Thread Cristian Lorenzetto
How do I encode a short or a byte type in a BytesRef in Lucene 6.1?
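One possible encoding, sketched here with the JDK only: write the short big-endian and flip its sign bit, so that comparing the bytes as unsigned values gives the same order as comparing the original shorts. The class and method names are hypothetical, and this is not taken from the thread; the resulting byte[] is what one would then wrap in a BytesRef.

```java
// Hypothetical helper: order-preserving 2-byte encoding of a short.
final class ShortBytes {

    // Flip the sign bit, then write the two bytes big-endian.
    static byte[] shortToSortableBytes(short value) {
        int flipped = value ^ 0x8000;
        return new byte[] { (byte) (flipped >>> 8), (byte) flipped };
    }

    // Inverse of shortToSortableBytes.
    static short sortableBytesToShort(byte[] bytes) {
        int flipped = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        return (short) (flipped ^ 0x8000);
    }

    // Compare two equal-length arrays byte by byte as unsigned values.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] encoded = shortToSortableBytes((short) -1234);
        System.out.println(sortableBytesToShort(encoded));
    }
}
```

A single byte can be handled the same way with a one-element array and a flip of 0x80.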

Re: encoding in byteref?

2016-08-10 Thread Cristian Lorenzetto
dless.com > > On Tue, Aug 9, 2016 at 10:12 AM, Cristian Lorenzetto < > cristian.lorenze...@gmail.com> wrote: > > > how to encode a short or a byte type in byteRef in lucene 6.1? > > >

Re: encoding in byteref?

2016-08-10 Thread Cristian Lorenzetto
> throughput with a ShortPoint and BytePoint. > > But index size will be the same, because Lucene's default codec does a good > job compressing these values. > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, Aug 10, 2016 at 5:19 AM, Cristian Lorenzetto

Re: encoding in byteref?

2016-08-10 Thread Cristian Lorenzetto
? 2016-08-10 11:35 GMT+02:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: > ok thanks so i can do them. > but for boolean type? i could compress using bit. Is there pack function > for boolean arrays? > > 2016-08-10 11:25 GMT+02:00 Michael McCandless : > >> I
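For the boolean-array question quoted above, a minimal JDK-only sketch of bit packing follows. The names are hypothetical and nothing here is a Lucene API; it just shows eight flags stored per byte, with flag i in bit (i % 8) of byte (i / 8).

```java
// Hypothetical helper: pack boolean flags into bytes, 8 flags per byte.
final class BooleanPacking {

    static byte[] pack(boolean[] flags) {
        byte[] out = new byte[(flags.length + 7) / 8];
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                out[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return out;
    }

    // The flag count must be supplied because padding bits are not recorded.
    static boolean[] unpack(byte[] packed, int count) {
        boolean[] flags = new boolean[count];
        for (int i = 0; i < count; i++) {
            flags[i] = (packed[i / 8] & (1 << (i % 8))) != 0;
        }
        return flags;
    }

    public static void main(String[] args) {
        byte[] packed = pack(new boolean[] { true, false, true });
        System.out.println(packed.length + " byte(s), first = " + packed[0]);
    }
}
```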

Re: encoding in byteref?

2016-08-10 Thread Cristian Lorenzetto
ere are only 8 > possible combinations. > > > > On Wed, 10 Aug 2016 at 11:41, Cristian Lorenzetto < > cristian.lorenze...@gmail.com> wrote: > > > in addition in the previous version of my code i used > > TYPE.setNumericPrecisionStep for setting the pr

lucene6 - search - confirmation

2016-08-10 Thread Cristian Lorenzetto
I need a clarification about building a Query. Correct me if I'm wrong: for searching a number I use IntPoint/LongPoint; for ordering the results I use a doc values field; for retrieving the stored value I use StoredField. So if I want to index, order and store an integer, I need to add at least 3 fields: 1 IntPoint, 1

Re: MultiFields#getTerms docs clarification

2016-08-12 Thread Cristian Lorenzetto
For the null type I added a field with a BytesRef holding an empty byte array. Maybe that will resolve it? On 12 Aug 2016 11:57, "Trejkaz" wrote: > Hi all. > > The docs on MultiFields#getTerms state: > > > This method may return null if the field does not exist. > > Does this mean: > > (a) The method *w

ThreadLocal Transaction

2016-08-18 Thread Cristian Lorenzetto
I'd like to create a class implementing a classical transaction. Looking over the Lucene API, I can see that commit/rollback/prepareCommit apply to the entire index only, not to partial modifications. So I thought I could use the writer.addIndexes API as support: when I open a transaction I could create a te

Re: encoding in byteref?

2016-08-18 Thread Cristian Lorenzetto
eld("true"). > > See BigIntegerPoint in lucene's sandbox module. > > Mike McCandless > > http://blog.mikemccandless.com > > On Wed, Aug 10, 2016 at 6:16 AM, Cristian Lorenzetto < > cristian.lorenze...@gmail.com> wrote: > > > thanks for suggestion

MultiReader question

2016-08-18 Thread Cristian Lorenzetto
Hypothesis: I want to split the universe set to index into different subtopics/entities/subsets. For simplicity, assume entity X is indexed in the Index_X folder. Now suppose I have a query that searches X1, X2, X3; for simplicity, N is the number of sub-readers. I could use the MultiReader Lucene API for

docid is just a signed int32

2016-08-18 Thread Cristian Lorenzetto
docid is a signed int32, so it is not so big, but docid really seems to be not an unmodifiable primary key but a temporary id for the view related to a specific search. So a repository can contain more than 2^31 documents. Is my deduction correct? Is there a maximum size for a Lucene index?

Re: ThreadLocal Transaction

2016-08-18 Thread Cristian Lorenzetto
uments? (note the final "s") This API ensures that all > provided documents will become visible at the same time (and with adjacent > doc ids moreover). > > On Thu, 18 Aug 2016 at 10:52, Cristian Lorenzetto < > cristian.lorenze...@gmail.com> wrote: > > > I d l

Re: docid is just a signed int32

2016-08-18 Thread Cristian Lorenzetto
ments in a long variable > and > > ensures it is less than 2^31, so you cannot have indexes that contain > more > > than 2^31 documents. > > > > Larger collections should be written to multiple shards and use > > TopDocs.merge to merge results. >
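The sharding advice quoted above relies on merging per-shard results. The sketch below shows only the general idea with the JDK, under assumptions: the Hit class and mergeTopN method are hypothetical and do not reproduce Lucene's TopDocs.merge. Each shard contributes its own best hits, and the global top N is taken from the combined list, ordered by score and then by shard and doc id to break ties.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of merging per-shard top hits into a global top N.
final class ShardMergeSketch {

    // A hit is identified by (shard, doc): doc ids are only unique per shard.
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    static List<Hit> mergeTopN(int n, List<List<Hit>> perShardHits) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShardHits) {
            all.addAll(hits);
        }
        // Highest score first; ties broken by shard index, then by doc id.
        all.sort((a, b) -> {
            int c = Float.compare(b.score, a.score);
            if (c != 0) return c;
            c = Integer.compare(a.shard, b.shard);
            return c != 0 ? c : Integer.compare(a.doc, b.doc);
        });
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = new ArrayList<>();
        shards.add(List.of(new Hit(0, 12, 2.5f), new Hit(0, 3, 1.0f)));
        shards.add(List.of(new Hit(1, 7, 3.0f)));
        for (Hit h : mergeTopN(2, shards)) {
            System.out.println("shard=" + h.shard + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

Because every hit carries its shard index, the total number of documents across shards is no longer bounded by a single int doc id space.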

help for a migration error to 6.1 version

2016-08-18 Thread Cristian Lorenzetto
in my old code i created public class BinDocValuesField extends Field { /** * Type for numeric DocValues. */ public static final FieldType TYPE = new FieldType(); static { TYPE.setTokenized(false); TYPE.setOmitNorms(true); TYPE.setIndexOptions(IndexOptions.DOCS); TYPE.setStored(true); TYPE.set

Re: docid is just a signed int32

2016-08-18 Thread Cristian Lorenzetto
ts." it is just a suggestion anyway for my loved lucene :):) 2016-08-18 17:43 GMT+02:00 Greg Bowyer : > What are you trying to index that has more than 3 billion documents per > shard / index and can not be split as Adrien suggests? > > > > On Thu, Aug 18, 2016, at 0

Re: help for a migration error to 6.1 version

2016-08-18 Thread Cristian Lorenzetto
Using TYPE.setDocValuesType(DocValuesType.SORTED); it works. I didn't understand the reason. Maybe sorting is necessary for fast grouping, so the algorithm can find distinct groups. 2016-08-18 17:40 GMT+02:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: > in my old c

BinaryLink & delete hook

2016-08-19 Thread Cristian Lorenzetto
I'd like to add a new field type in Lucene that saves a link to an InputStream (on file). Adding or updating a document with this field is very simple. There may be a problem with delete, though: if I deleteOnQuery or deleteAll, the external files are not deleted. Is there a callback or a hook for d

Re: docid is just a signed int32

2016-08-19 Thread Cristian Lorenzetto
ah :) "with 3TB of ram (we have these running), int64 for >2^32 documents in a single index should not be a problem" Maybe I'm reasoning the wrong way, but normally the size of storage is not the size of memory. I don't know Lucene in depth, but I would expect the Lucene index is scanning a block step

Re: docid is just a signed int32

2016-08-20 Thread Cristian Lorenzetto
In my opinion this study doesn't tell us anything more than before. Obviously, if you try to retrieve all stored data in a single query, the performance will not be good. Lucene is fantastic, but it is no magic. The laws of physics still apply to Lucene. The query is designed for retrieving a small pa

sorting biginteger

2016-08-21 Thread Cristian Lorenzetto
I took a look at BigIntegerPoint but I didn't see any reference to sorting, and SortedNumericDocValuesField accepts long, not BigInteger. I thought of sorting like this: BigInteger bi = (BigInteger) o; byte[] b = bi.toByteArray(); NumericUtils.bigIntToSortableBytes(bi, BigIntegerPoint.BYTES, b, 0); doc.ad
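One detail worth illustrating: BigInteger.toByteArray() returns a variable-length array, whereas a sortable encoding needs a fixed width so that byte-wise comparison lines up. The JDK-only sketch below shows the usual idea (sign-extend to a fixed width, then flip the sign bit). The class and method names are hypothetical, and this is not claimed to be byte-for-byte what Lucene's own helper produces.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper: fixed-width, order-preserving bytes for a BigInteger.
final class BigIntSortable {

    static byte[] toSortableBytes(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // big-endian two's complement, minimal length
        if (raw.length > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " bytes");
        }
        byte[] out = new byte[width];
        // Sign-extend: pad with 0xFF for negative values, 0x00 otherwise.
        byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, width - raw.length, pad);
        System.arraycopy(raw, 0, out, width - raw.length, raw.length);
        // Flip the sign bit so negatives sort before positives as unsigned bytes.
        out[0] ^= 0x80;
        return out;
    }

    // Compare two equal-length arrays byte by byte as unsigned values.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] small = toSortableBytes(BigInteger.valueOf(-5), 16);
        byte[] large = toSortableBytes(BigInteger.ONE.shiftLeft(70), 16);
        System.out.println(compareUnsigned(small, large) < 0);
    }
}
```

A byte array like this could then be stored in a sorted binary doc values field, so the sort happens on bytes rather than on a long.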

Re: docid is just a signed int32

2016-08-21 Thread Cristian Lorenzetto
I'm looking over TopDocs.merge. What is the difference between using multiple IndexSearcher instances and then TopDocs.merge, and using a MultiReader? 2016-08-21 2:28 GMT+02:00 Cristian Lorenzetto : > For my opinion this study dont tell any thing more than before. Obviously > if you try to retrieve all data

Re: docid is just a signed int32

2016-08-21 Thread Cristian Lorenzetto
Maybe with TopDocs.merge you can run the same query on multiple indexes, while with MultiReader you can also perform join operations across different indexes. 2016-08-21 19:31 GMT+02:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: > i m overviewing TopDocs.merge. > > What is the di

info about private int processField(IndexableField field, long fieldGen, int fieldCount)

2016-08-22 Thread Cristian Lorenzetto
I'm investigating the update method to understand whether it is possible to update a single field rather than the whole document. The code try { for (IndexableField field : docState.doc) { fieldCount = processField(field, fieldGen, fieldCount); } seems to say that it is not so d

inverted table

2016-08-22 Thread Cristian Lorenzetto
I'm curious: when a document is updated or deleted, Lucene must update the inverted table. Considering that the size of the record for a specific (previously used) term changes, how does Lucene re-save the term record? Maybe it flags the old term row as a tombstone?

Searching in a bitMask

2016-08-26 Thread Cristian Lorenzetto
How is it possible to search a bitmask field to satisfy a condition such as (bitmask & 0xf) == 0xf?
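One alternative that is sometimes used for this kind of condition, offered here as an assumption and not as something proposed in the thread, is to index one term per set bit and then require all the wanted bit terms. The JDK-only sketch below shows the predicate and the decomposition; the class name and the "bitN" term naming are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: turn "all required bits set" into "all terms present".
final class BitmaskTerms {

    // The condition itself; note the parentheses around the & expression.
    static boolean hasAllBits(int bitmask, int required) {
        return (bitmask & required) == required;
    }

    // One term per set bit, e.g. 0x5 -> {"bit0", "bit2"}.
    static Set<String> bitTerms(int bitmask) {
        Set<String> terms = new TreeSet<>();
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if ((bitmask & (1 << bit)) != 0) {
                terms.add("bit" + bit);
            }
        }
        return terms;
    }

    // Equivalent check on the indexed terms: every required bit term must be present.
    static boolean matchesByTerms(Set<String> indexedTerms, int required) {
        return indexedTerms.containsAll(bitTerms(required));
    }

    public static void main(String[] args) {
        Set<String> indexed = bitTerms(0x1f);
        System.out.println(matchesByTerms(indexed, 0xf));
    }
}
```

With such terms in the index, the condition becomes a conjunction of ordinary term lookups, one per required bit.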

Re: Searching in a bitMask

2016-08-27 Thread Cristian Lorenzetto
you need to implement own MultiTermQuery, and I guess it's > gonna be slow. > > On Sat, Aug 27, 2016 at 8:41 AM, Cristian Lorenzetto < > cristian.lorenze...@gmail.com> wrote: > > > How it is possible to search in a bitmask for soddisfying a re

Fields stored and instances

2016-09-07 Thread Cristian Lorenzetto
I have a doubt. I created a class storing a special value. When I store the document with this field, all is OK. When I read the document back, the field is found but it has a different class (StoredField instead of MyStoredField). Is that OK? Or did I see something different in other examples?

Re: Fields stored and instances

2016-09-07 Thread Cristian Lorenzetto
6-09-07 16:48 GMT+02:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: > I have a doubt. > > I created a class storing a special value. > When i store the document saving this field all ok. > when i read the document the field is found but it is a different class >

Re: How can I list all the terms from a document?

2016-09-16 Thread Cristian Lorenzetto
interesting question :) 2016-09-16 13:30 GMT+02:00 szzoli : > Thank you. > > I am afraid that Lucene code is so poorly documented that only reading the > API I will not get closer to the solution. > > Could you please give me some more hints how to achieve that? Maybe some > code samples? > I do

suggestion for Lucene/Apache team

2016-09-25 Thread Cristian Lorenzetto
Lucene has changed web applications in recent years. It created a bit of a revolution in many real commercial services. A lot of applications, services and servers based on Lucene have been born in recent years. The reason is simple: modularity is the key to the fast evolution of computer science. It permitted

PriorityQueue clarification

2017-02-17 Thread Cristian Lorenzetto
I want to implement an unbounded, persistent priority queue (not all in memory) using Lucene. I found the class PriorityQueue in the documentation, so I ask for clarification: 1) Does PriorityQueue work entirely in memory or not? 2) If I develop on my own a class using Lucene storage where I search by priority and

how to search a tree node in lucene?

2017-03-04 Thread Cristian Lorenzetto
Hi, I would like to implement a performant solution for searching a tree node, the parent of a node, and the children of a node. A simple idea is to save the path to each specific node and search by prefix/suffix... Is there a specific solution in the Lucene libraries for doing this?
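The path idea can be sketched with plain strings, independent of any Lucene class. The helper names are hypothetical and the sketch assumes paths that start with "/" and have no trailing slash. Indexing every ancestor prefix of a node's path as its own term makes "all descendants of X" an exact-term lookup on X, while the parent is simply the path minus its last segment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for tree nodes addressed by paths such as "/a/b/c".
final class TreePaths {

    // "/a/b/c" -> ["/a", "/a/b", "/a/b/c"]: the terms to index for that node.
    static List<String> ancestorsAndSelf(String path) {
        List<String> out = new ArrayList<>();
        int idx = path.indexOf('/', 1);
        while (idx != -1) {
            out.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        out.add(path);
        return out;
    }

    // Parent path, or null for a top-level node such as "/a".
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? null : path.substring(0, idx);
    }

    // True if candidate lies strictly below ancestor in the tree.
    static boolean isDescendant(String candidate, String ancestor) {
        return candidate.startsWith(ancestor + "/");
    }

    public static void main(String[] args) {
        System.out.println(ancestorsAndSelf("/a/b/c"));
        System.out.println(parentOf("/a/b/c"));
    }
}
```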

Re: how to search a tree node in lucene?

2017-03-04 Thread Cristian Lorenzetto
Ah right... facets, isn't it? 2017-03-05 1:32 GMT+01:00 Cristian Lorenzetto : > Hi > i might implement a performant solution for searching a tree node, parent > of a node , children of a node. > > a simple idea is > > save the path to specific node > and searching p

Re: how to search a tree node in lucene?

2017-03-05 Thread Cristian Lorenzetto
Is it Lucene or Solr? I'm using Lucene. 2017-03-05 18:06 GMT+01:00 Erick Erickson : > PathHierarchyTokenizer? > > On Sat, Mar 4, 2017 at 5:00 PM, Cristian Lorenzetto > wrote: > > ah right ... facet ? it isnt? > > > > 2017-03-05 1:32 GMT+01:00 Cristian L

join in lucene

2017-03-16 Thread Cristian Lorenzetto
I want to implement multiple joins in Lucene. Considering the complexity of joining, I will skip index-time joins. First strategy: query-time joins. String fromField = "from"; // Name of the from field boolean multipleValuesPerDocument = false; // Set only to true in the case when your fromField has m
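Conceptually, a query-time join is a two-pass operation: first collect the join values found in the "from" field of the documents that match the from-side query, then select the documents whose "to" field holds one of those values. The JDK-only sketch below illustrates that idea with maps standing in for documents; all names are hypothetical and this is not Lucene's join API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical two-pass join over documents modelled as field -> value maps.
final class QueryTimeJoinSketch {

    static List<Map<String, String>> join(List<Map<String, String>> fromDocs, String fromField,
                                          List<Map<String, String>> toDocs, String toField) {
        // Pass 1: collect the join keys from the already-filtered "from" side.
        Set<String> joinValues = new HashSet<>();
        for (Map<String, String> doc : fromDocs) {
            String value = doc.get(fromField);
            if (value != null) {
                joinValues.add(value);
            }
        }
        // Pass 2: keep the "to" side documents whose key was collected.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> doc : toDocs) {
            String value = doc.get(toField);
            if (value != null && joinValues.contains(value)) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> from = List.of(Map.of("from", "42"));
        List<Map<String, String>> to = List.of(Map.of("to", "42"), Map.of("to", "7"));
        System.out.println(join(from, "from", to, "to"));
    }
}
```

Chaining several joins means feeding the result of one call in as the "from" side of the next, which is one reason the cost grows quickly with the number of joins.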

Re: join in lucene

2017-03-16 Thread Cristian Lorenzetto
er. but it just > usually slow. > > > On Thu, Mar 16, 2017 at 8:59 PM, Cristian Lorenzetto < > cristian.lorenze...@gmail.com> wrote: > > > I want realize multiple joins in lucene. > > Considering the complexy of joining i want i will skip index join. > > >

search any field name having a specific value

2017-03-17 Thread Cristian Lorenzetto
Is it possible to create a query that finds every document containing any field whose value == X?

Re: search any field name having a specific value

2017-03-17 Thread Cristian Lorenzetto
> > Pearson > > Always Learning > Learn more at www.pearson.com <http://www.pearsonk12.com/> > > On Fri, Mar 17, 2017 at 11:05 AM, Cristian Lorenzetto < > cristian.lorenze...@gmail.com> wrote: > > > it is possible create a query searching any document containing any field > > having value == X? > > >

is there a event before /post commit to file

2017-03-20 Thread Cristian Lorenzetto
Does Lucene have a strategy for detecting whether index files were not closed correctly? Is there a way to save a counter when a commit is done, so I can check whether the maximum counter equals the commit counter? I could insert this code after the commit line.

how to rebuild a index corrupted?

2017-03-20 Thread Cristian Lorenzetto
Can Lucene rebuild an index using its internal info, and how? Or do I have to reinsert everything in another way?

how to rebuild a index corrupted / async commit problem

2017-03-22 Thread Cristian Lorenzetto
afraid it's not possible to rebuild index. It's important to >>>> maintain a >>>> backup policy because of that. >>>> >>>> >>>> On Mon, Mar 20, 2017 at 5:12 PM Cristian Lorenzetto < >>>> cristian.lorenze...@gmail.com

Re: how to rebuild a index corrupted?

2017-03-23 Thread Cristian Lorenzetto
>> 2017-03-21 0:58 GMT+01:00 Michael McCandless : >> >>> You can use Lucene's CheckIndex tool with the -exorcise option but this >>> is quite brutal: it simply drops any segment that has corruption it detects. >>> >>> Mike McCandless >>>

Re: how to rebuild a index corrupted?

2017-03-23 Thread Cristian Lorenzetto
index I can't find segment 5, by searching segments 4 and 6 I can work out the range of foreign keys (transaction ids) to reload into Lucene. So I can load into Lucene all the missing documents, reloading them for example from a database. 2017-03-23 10:53 GMT+01:00 Cristian Lorenzetto

Re: how to rebuild a index corrupted?

2017-03-23 Thread Cristian Lorenzetto
6 I can find the minimum transaction id B, so I can deduce the hole: the range is [A+1,B-1]. By querying the db I reload the corresponding documents and add these missing documents to Lucene again. 2017-03-23 15:28 GMT+01:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: >
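The hole computation described here can be sketched as follows. It assumes, since the start of the message is cut off, that A is the maximum transaction id found in the segment before the hole; B is the minimum id in the segment after it, as stated. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: derive the missing id range [A+1, B-1] around a lost segment.
final class TransactionGap {

    // Returns {first, last} of the missing range, or an empty array if there is no gap.
    // Both input arrays must be non-empty.
    static long[] missingRange(long[] idsBeforeHole, long[] idsAfterHole) {
        long a = Arrays.stream(idsBeforeHole).max().getAsLong(); // A: assumed max id before the hole
        long b = Arrays.stream(idsAfterHole).min().getAsLong();  // B: min id after the hole
        return a + 1 <= b - 1 ? new long[] { a + 1, b - 1 } : new long[0];
    }

    public static void main(String[] args) {
        long[] range = missingRange(new long[] { 1, 2, 3 }, new long[] { 7, 8 });
        System.out.println(Arrays.toString(range));
    }
}
```

The returned range would then drive the database query that reloads the missing documents. Note that this only covers additions; it does not recover deletions that were recorded in the lost segment.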

Re: how to rebuild a index corrupted?

2017-03-23 Thread Cristian Lorenzetto
ons to "know" which operations made it into the commit and which did > not, and then on disaster recovery replay only those operations that didn't > make it? > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Mar 23, 2017 at 5:53 AM, Cristian

Re: how to rebuild a index corrupted?

2017-03-23 Thread Cristian Lorenzetto
the corrupted segment will mean you don't drop the deletions. > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Mar 23, 2017 at 10:29 AM, Cristian Lorenzetto < > cristian.lorenze...@gmail.com> wrote: > >> I deduce the transaction range not us

a smart solution for calculating aggregate functions in lucene in a huge repository

2017-04-04 Thread Cristian Lorenzetto
Are there new or particular classes/solutions in Lucene for calculating aggregate functions over a huge repository?

Binary Automaton

2017-09-29 Thread Cristian Lorenzetto
Hi, is it possible to create an Automaton in Lucene that parses a byte array rather than a string?

Prefix field name search

2017-09-30 Thread Cristian Lorenzetto
Hi, is there a way to search for all the documents where the field name starts with "ABC" and the value is Y?

Re: Binary Automaton

2017-09-30 Thread Cristian Lorenzetto
*to @Uwe Schindler * thanks, it is very interesting :) *to @Dawid* Preface: I don't know how the automaton is implemented deep inside Lucene, but (considering the automaton is built on the fly when the index is already present) I imagine that the automaton is scanning the lexicons/tokens present in th

Re: Binary Automaton

2017-10-02 Thread Cristian Lorenzetto
It sounds like a good way :) Maybe the code needed to develop it is not so huge. Thanks for the suggestions :) 2017-10-02 12:27 GMT+02:00 Michael McCandless : > I'm not sure this is exactly what you are asking, but Lucene's terms are > already byte[] (default UTF-8 encoded from char[] terms), and the automat

typed IntPoint.RangeQuery & LongPoint.rangeQuery

2017-12-31 Thread Cristian Lorenzetto
Hi, I have a doubt. Suppose I want to search a document field 'a' in a range. The same problem also arises if I search for an exact point. I know that a document may contain a property 'a' with either type Integer or type Long. For example document 1 contains {a:5} where 5 is int, d