ThreadLocal Transaction

2016-08-18 Thread Cristian Lorenzetto
I'd like to create a class implementing a classical transaction. Looking over the Lucene API, I can see commit/rollback/prepareCommit apply only to the entire index, not to partial modifications. So I thought I could use the writer.addIndexes API as support: when I open a transaction I could create a te
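The preview above is cut off, but a minimal sketch of the idea it describes (buffering each transaction in its own temporary index and merging it into the main index on commit via writer.addIndexes) might look like the following. The TempIndexTransaction class and its names are hypothetical, not an existing Lucene API; it assumes Lucene 6.x.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    // Hypothetical sketch: one temporary index per "transaction".
    class TempIndexTransaction {
        private final Directory tempDir = new RAMDirectory();
        private final IndexWriter tempWriter;
        private final IndexWriter mainWriter;

        TempIndexTransaction(IndexWriter mainWriter) throws Exception {
            this.mainWriter = mainWriter;
            this.tempWriter = new IndexWriter(tempDir, new IndexWriterConfig(new StandardAnalyzer()));
        }

        void add(Document doc) throws Exception {
            tempWriter.addDocument(doc); // buffered in the per-transaction index only
        }

        void commit() throws Exception {
            tempWriter.close();             // flush the temporary index
            mainWriter.addIndexes(tempDir); // merge it into the main index in one call
            mainWriter.commit();
        }

        void rollback() throws Exception {
            tempWriter.rollback(); // discard the temporary index entirely
        }
    }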

Re: encoding in byteref?

2016-08-18 Thread Cristian Lorenzetto
In version 6.1.0 BigIntegerPoint seems to have been moved into the main module (no longer in the sandbox). However, 1) BigIntegerPoint seems to be a class for searching a 128-bit integer, not for sorting. NumericDocValuesField supports long, not BigInteger, so for sorting I used SortedDocValuesField. 2) BigIntegerPoint name
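A minimal sketch of the combination described above, assuming Lucene 6.x with BigIntegerPoint for searching and SortedDocValuesField for sorting; encoding the value with NumericUtils.bigIntToSortableBytes so the BytesRef order matches the numeric order is my assumption, not something stated in the thread.

    import java.math.BigInteger;
    import org.apache.lucene.document.BigIntegerPoint;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.SortedDocValuesField;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.NumericUtils;

    public class BigIntFieldsExample {
        // Index the same 128-bit value twice: as a point (for exact/range queries)
        // and as sorted doc values (for sorting).
        public static Document buildDoc(BigInteger value) {
            Document doc = new Document();
            doc.add(new BigIntegerPoint("id", value)); // searching, not sorting

            byte[] bytes = new byte[16];
            NumericUtils.bigIntToSortableBytes(value, 16, bytes, 0);
            doc.add(new SortedDocValuesField("id_sort", new BytesRef(bytes))); // sorting
            return doc;
        }
    }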

MultiReader question

2016-08-18 Thread Cristian Lorenzetto
Hypothesis: I want to split the universe set to index into different subtopics/entities/subsets. For simplicity, assume entity X is indexed in the Index_X folder. Now suppose I have a query that searches X1, X2, X3, and for simplicity N is the number of sub-readers. I could use the MultiReader Lucene API for
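A minimal sketch of the set-up described above, assuming Lucene 6.x and one FSDirectory per entity/subset (Index_X1, Index_X2, ...): open one reader per sub-index and search them together through a single MultiReader.

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class MultiReaderExample {
        // Opens N sub-readers (one per entity folder) and exposes them as one logical index.
        public static IndexSearcher openSearcher(String... indexDirs) throws Exception {
            IndexReader[] subReaders = new IndexReader[indexDirs.length];
            for (int i = 0; i < indexDirs.length; i++) {
                subReaders[i] = DirectoryReader.open(FSDirectory.open(Paths.get(indexDirs[i])));
            }
            return new IndexSearcher(new MultiReader(subReaders));
        }
    }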

Re: MultiReader question

2016-08-18 Thread Adrien Grand
I don't think there would be significant disadvantages to using MultiReader, but depending on what the data looks like, there might not be benefits either. If the data is homogeneous per entity but not across entities, then the MultiReader approach might have potential for making things more efficient, but

Re: ThreadLocal Transaction

2016-08-18 Thread Adrien Grand
What you are suggesting sounds like something that can already be done with IndexWriter.addDocuments? (note the final "s") This API ensures that all provided documents will become visible at the same time (and, moreover, with adjacent doc ids). On Thu, 18 Aug 2016 at 10:52, Cristian Lorenzetto < cri
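A short sketch of the suggestion, assuming Lucene 6.x: IndexWriter.addDocuments takes a block of documents and writes them as one unit, so they become searchable together and get adjacent doc ids. The field names here are just for illustration.

    import java.util.Arrays;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;

    public class AtomicBlockExample {
        public static void addBlock(IndexWriter writer) throws Exception {
            Document child = new Document();
            child.add(new StringField("type", "child", Field.Store.YES));
            Document parent = new Document();
            parent.add(new StringField("type", "parent", Field.Store.YES));
            // The whole block is added atomically: both documents become visible
            // at the same time and receive adjacent doc ids.
            writer.addDocuments(Arrays.asList(child, parent));
        }
    }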

docid is just a signed int32

2016-08-18 Thread Cristian Lorenzetto
docid is a signed int32, so it is not that big, but really a docid seems to be not an unmodifiable primary key but a temporary id for the view related to a specific search. So a repository can contain more than 2^31 documents. Is my deduction correct? Is there a maximum size for a Lucene index?

Re: docid is just a signed int32

2016-08-18 Thread Adrien Grand
No, IndexWriter enforces that the number of documents cannot go over IndexWriter.MAX_DOCS (which is a bit less than 2^31) and BaseCompositeReader computes the number of documents in a long variable and ensures it is less than 2^31, so you cannot have indexes that contain more than 2^31 documents.
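For reference, the cap is exposed as a constant, so it can be inspected directly (a tiny illustration, assuming a Lucene 5.x/6.x version where IndexWriter.MAX_DOCS is public):

    import org.apache.lucene.index.IndexWriter;

    public class MaxDocsExample {
        public static void main(String[] args) {
            // MAX_DOCS is the hard per-index cap, slightly below Integer.MAX_VALUE (2^31 - 1).
            System.out.println("Integer.MAX_VALUE    = " + Integer.MAX_VALUE);
            System.out.println("IndexWriter.MAX_DOCS = " + IndexWriter.MAX_DOCS);
        }
    }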

Re: docid is just a signed int32

2016-08-18 Thread Glen Newton
Or maybe it is time Lucene re-examined this limit. There are use cases out there where >2^31 does make sense in a single index (huge number of tiny docs). Also, I think the underlying hardware and the JDK have advanced to make this more defendable. Constructively, Glen On Thu, Aug 18, 2016 at

Re: ThreadLocal Transaction

2016-08-18 Thread Cristian Lorenzetto
The problem seems more complex. I'm trying to reason aloud :) If I create a transaction using addDocuments... I suppose all the docs to persist inside the transaction must already be in memory. That is not always so. In addition, for complete isolation inside a transaction, there is an aspect that does not work.

Re: docid is just a signed int32

2016-08-18 Thread Cristian Lorenzetto
Maybe Lucene has a max size of 2^31 because result sets are Java arrays, where length is an int. A suggestion for possible future changes is to use an Iterator instead of a Java array. An Iterator is a more scalable ADT that does not consume memory just to return documents. 2016-08-18 16:03 GMT+02:00 Glen Newton : >

help for a migration error to 6.1 version

2016-08-18 Thread Cristian Lorenzetto
In my old code I created public class BinDocValuesField extends Field { /** * Type for numeric DocValues. */ public static final FieldType TYPE = new FieldType(); static { TYPE.setTokenized(false); TYPE.setOmitNorms(true); TYPE.setIndexOptions(IndexOptions.DOCS); TYPE.setStored(true); TYPE.set

Re: docid is just a signed int32

2016-08-18 Thread Greg Bowyer
What are you trying to index that has more than 3 billion documents per shard / index and cannot be split as Adrien suggests? On Thu, Aug 18, 2016, at 07:35 AM, Cristian Lorenzetto wrote: > Maybe lucene has maxsize 2^31 because result set are java array where > length is a int type. > A suggest

Re: docid is just a signed int32

2016-08-18 Thread Cristian Lorenzetto
Normally databases support at least a long primary key. Try asking the Twitter application, for example, which grows by more than 4 petabytes every year :) Maybe they use storage devices bigger than a PC's storage :) However, if you offer the possibility to use shards... it is a possibility anyway :) For

Re: help for a migration error to 6.1 version

2016-08-18 Thread Cristian Lorenzetto
Using TYPE.setDocValuesType(DocValuesType.SORTED); it works. I didn't understand the reason. Maybe sorting is necessary for fast grouping, so the algorithm can find distinct groups. 2016-08-18 17:40 GMT+02:00 Cristian Lorenzetto < cristian.lorenze...@gmail.com>: > in my old code > > i create
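Putting the two messages together, a sketch of the field class with the line that made the 6.1 migration work; since the original snippet is truncated, the constructor and the freeze() call are my reconstruction, not necessarily the poster's exact code.

    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.DocValuesType;
    import org.apache.lucene.index.IndexOptions;
    import org.apache.lucene.util.BytesRef;

    public class BinDocValuesField extends Field {
        /** Type for binary DocValues. */
        public static final FieldType TYPE = new FieldType();
        static {
            TYPE.setTokenized(false);
            TYPE.setOmitNorms(true);
            TYPE.setIndexOptions(IndexOptions.DOCS);
            TYPE.setStored(true);
            TYPE.setDocValuesType(DocValuesType.SORTED); // the change that fixed the 6.1 migration error
            TYPE.freeze();
        }

        public BinDocValuesField(String name, BytesRef value) {
            super(name, TYPE);
            fieldsData = value;
        }
    }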

Re: docid is just a signed int32

2016-08-18 Thread Trejkaz
On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand wrote: > No, IndexWriter enforces that the number of documents cannot go over > IndexWriter.MAX_DOCS (which is a bit less than 2^31) and > BaseCompositeReader computes the number of documents in a long variable and > ensures it is less than 2^31, so y

How to recreate the Segments_ files on Lucene 3.2.0?

2016-08-18 Thread 郑文兴
Dear all, we suffered a power loss and found that the segments files are all 0 bytes now; the worst part is that we have no way to fix the index with the CheckIndex utility. Does anyone know of any way to fix or re-create the segments files? By the way, I did a comparison with other index p

Re: docid is just a signed int32

2016-08-18 Thread Erick Erickson
OK, I'm a little out of my league here, but I'll plow on anyway. bq: There are use cases out there where >2^31 does make sense in a single index. OK, let's put some definition to this and define the use case specifically rather than being vague. I've just run an experiment, for instance, where I had