quick survey on schema less database usage

2009-09-10 Thread rr04
I am an MIT student doing a project on schema-less database usage and would greatly appreciate it if you could fill out a quick survey on this (should take < 5 mins): http://bit.ly/nosqldb

Re: Index docstore flush problem

2009-09-10 Thread Jason Rutherglen
Indexing locking was off, there was a bug higher up clobbering the index. Sorry and thanks! On Thu, Sep 10, 2009 at 4:49 PM, Michael McCandless wrote: > That's an odd exception.  It means IndexWriter thinks 468 docs have > been written to the stored fields file, which should mean the fdx file >

Re: Index docstore flush problem

2009-09-10 Thread Michael McCandless
That's an odd exception. It means IndexWriter thinks 468 docs have been written to the stored fields file, which should mean the fdx file size is 3748 (= 4 + 468*8), yet the file size is far larger than that (298404). How repeatable is it? Can you turn on infoStream, get the exception to happen,
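Michael's arithmetic can be sanity-checked directly: in the stored-fields index format he describes, the .fdx file is a 4-byte header plus one 8-byte pointer per document (4 + 468*8 = 3748). A minimal stand-alone Java sketch of that invariant — the class and method names here are illustrative, not Lucene API:

```java
public class FdxSizeCheck {
    // Expected .fdx length per Michael's description: a 4-byte header
    // followed by one 8-byte pointer per stored document.
    static long expectedFdxLength(int numDocs) {
        return 4L + 8L * numDocs;
    }

    public static void main(String[] args) {
        System.out.println(expectedFdxLength(468));           // 3748
        // The reported file was 298404 bytes; the mismatch is:
        System.out.println(298404 - expectedFdxLength(468));  // 294656
    }
}
```

Comparing the expected length against the actual .fdx file size is, in essence, the consistency check behind the "fdx size mismatch" message in Jason's stack trace.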

Index docstore flush problem

2009-09-10 Thread Jason Rutherglen
I'm seeing a strange exception when indexing using the latest Solr rev on EC2. org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: after flush: fdx size mismatch: 468 docs vs 298404 length in bytes of _0.fdx at or

Re: How to avoid huge index files

2009-09-10 Thread Ted Stockwell
Not at the moment. Actually, I'm already working on a remote copy utility for gaevfs that will upload large files and folders but the first cut is about a week away. - Original Message > From: Dvora > To: java-user@lucene.apache.org > Sent: Thursday, September 10, 2009 2:18:35 PM > Su

Re: How to avoid huge index files

2009-09-10 Thread Dvora
Is it possible to upload an already existing index to GAE? My index is data I've been collecting for a long time, and I'd prefer not to give it up. ted stockwell wrote: > > Another alternative is storing the indexes in the Google Datastore, I > think Compass already supports that (though I have not used it

Re: Chinese Japanese Korean Indexing issue Version 2.4

2009-09-10 Thread asitag
To add some more context - I am able to index English and Western European languages. asitag wrote: > > Hi, > > We are trying to index html files which have japanese / korean / chinese > content using the CJK analyser. But while indexing we are getting Lexical > parse error. Encountered unko

Chinese Japanese Korean Indexing issue Version 2.4

2009-09-10 Thread asitag
Hi, We are trying to index HTML files which have Japanese / Korean / Chinese content using the CJK analyser. But while indexing we are getting a lexical parse error: "Encountered unknown character". We tried setting the string encoding to UTF-8 but it does not help. Can anyone please help? Any point

Re: Extending Sort/FieldCache

2009-09-10 Thread Jason Rutherglen
I think CSF hasn't been implemented because it's only marginally useful yet requires fairly significant rewrites of core code (i.e. SegmentMerger) so no one's picked it up including myself. An interim solution that fulfills the same function (quickly loading field cache values) using what works rel

Re: How to avoid huge index files

2009-09-10 Thread Ted Stockwell
Another alternative is storing the indexes in the Google Datastore, I think Compass already supports that (though I have not used it). Also, I have successfully run Lucene on GAE using GaeVFS (http://code.google.com/p/gaevfs/) to store the index in the Datastore. (I developed a Lucene Directory

Re: IndexReader.isCurrent for cached indexes

2009-09-10 Thread Nick Bailey
Our commit code will close the IndexWriter after adding the documents and before we see the log message indicating the documents have been added and deleted, so I don't believe that is the problem. Thanks for the tip about reopen. I actually noticed that when researching this problem but didn'

Re: Problem in lucene query

2009-09-10 Thread Erick Erickson
Also, get a copy of Luke and examine your index; that'll tell you what is actually in there *and* it will let you see how queries parse under various analyzers. Best Erick On Thu, Sep 10, 2009 at 6:47 AM, vibhuti wrote: > Hello > > > > I am new to Lucene and facing a problem while performing

MultiSearcherThread.hits(ParallelMultiSearcher.java:280) nullPointerException

2009-09-10 Thread maryam ma'danipour
Hello everyone. I have a problem with MultiSearcherThread.hits in ParallelMultiSearcher.java. Sometimes when I search via ParallelMultiSearcher, the method MultiSearcherThread.hits() throws a NullPointerException. This happens because docs has somehow become null, but why is this field null? I've c

Re: Problem in lucene query

2009-09-10 Thread Anshum
Hi Vibhuti, Not in sync with your query, but I'd advise you to graduate to a more recent Lucene release, something like 2.4.1 or at least 2.3.1 [considering it's already time for 2.9]. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the

Re: Problem in lucene query

2009-09-10 Thread AHMET ARSLAN
> I am new to Lucene and facing a problem while performing > searches. I am using lucene 2.2.0. > > My application indexes documents on "keyword" field which > contains integer values. Which analyzer/tokenizer are you using on that field? I am assuming it is a tokenized field. >If the value is

RE: How to avoid huge index files

2009-09-10 Thread Dvora
Me again :-) I'm looking at the code of FSDirectory and MMapDirectory, and I find it somewhat difficult to understand how I should subclass FSDirectory and adjust it to my needs. If I understand correctly, MMapDirectory overrides the openInput() method and returns MultiMMapIndexInput if

Re: TooManyClauses by wildcard queries

2009-09-10 Thread Patricio Galeas
Hi Uwe, But if I don't use Lucene 2.9, is this procedure (items 1-4) the right way to avoid the TooManyClauses exception? Or is there a more efficient procedure? Thanks Patricio Uwe Schindler schrieb: Or use Lucene 2.9, it automatically uses constant score mode in wild card queri

RE: September 2009 Hadoop/Lucene/Solr/UIMA/katta/Mahout Get Together Berlin

2009-09-10 Thread Uwe Schindler
Hi again, By the way, if somebody of the other involved developers want to provide me some PPT Slides about the other new features in Lucene 2.9 (NRT, future Flexible Indexing), I would be happy! Uwe > Uwe Schindler, Lucene 2.9 Developments: Numeric Search, Per-Segment- and > Near-Real-Time Sear

September 2009 Hadoop/Lucene/Solr/UIMA/katta/Mahout Get Together Berlin

2009-09-10 Thread Uwe Schindler
Hi, I'm cross-posting this here; Isabel Drost is managing the meetup. This time it is more about Hadoop, but there is also a talk about the new Lucene 2.9 release (presented by me). As far as I know, Simon Willnauer will also be there: --

RE: How to avoid huge index files

2009-09-10 Thread Uwe Schindler
The idea is just to put a layer on top of the abstract file system function supplied by directory. Whenever somebody wants to create a file and write data to it, the methods create more than one file and switch e.g. after 10 Megabytes to another file. E.g. look into MMapDirectory that uses MMap to
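The rollover scheme Uwe describes can be illustrated with plain java.nio, independent of Lucene: whenever the data to write would exceed a per-file cap, the layer starts a new chunk file. This is only a sketch of the idea under assumed names — a real Directory subclass would implement Lucene's IndexOutput/IndexInput instead, and the chunk-naming convention here ("base.0", "base.1", ...) is made up:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SplitFileWriter {
    // Writes data across multiple files, each at most maxBytes long.
    // Chunk names ("base.0", "base.1", ...) are hypothetical.
    static void writeChunked(Path dir, String base, byte[] data, int maxBytes)
            throws IOException {
        int chunk = 0;
        for (int off = 0; off < data.length; off += maxBytes) {
            int len = Math.min(maxBytes, data.length - off);
            Path p = dir.resolve(base + "." + chunk++);
            try (OutputStream out = Files.newOutputStream(p)) {
                out.write(data, off, len);  // one bounded slice per file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split");
        writeChunked(dir, "seg", new byte[25], 10);
        // 25 bytes with a 10-byte cap -> three chunks of 10, 10, 5 bytes
        System.out.println(Files.size(dir.resolve("seg.0")));  // 10
        System.out.println(Files.size(dir.resolve("seg.1")));  // 10
        System.out.println(Files.size(dir.resolve("seg.2")));  // 5
    }
}
```

Reading would be the mirror image: map a logical offset to (chunk index, offset within chunk) by dividing by the cap, which is essentially what MMapDirectory's multi-buffer input does.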

Re: How to avoid huge index files

2009-09-10 Thread Dvora
Hi again, Can you add some details and guidelines on how to implement that? Different file types have different structures; is such splitting doable without knowing Lucene internals? Michael McCandless-2 wrote: > > You're welcome! > > Another, bottoms-up option would be to make a custom Directory

Re: support for PayloadTermQuery in MoreLikeThis

2009-09-10 Thread Grant Ingersoll
On Sep 9, 2009, at 4:39 PM, Bill Au wrote: Has anyone done anything regarding the support of PayloadTermQuery in MoreLikeThis? Not yet! Sounds interesting I took a quick look at the code and it seems to be simply a matter of swapping TermQuery with PayloadTermQuery. I guess a generic s

Problem in lucene query

2009-09-10 Thread vibhuti
Hello I am new to Lucene and facing a problem while performing searches. I am using lucene 2.2.0. My application indexes documents on "keyword" field which contains integer values. If the value is negative the query does not return correct results. Following is my lucene query: (keywo

Re: How to avoid huge index files

2009-09-10 Thread Michael McCandless
You're welcome! Another, bottoms-up option would be to make a custom Directory impl that simply splits up files above a certain size. That'd be more generic and more reliable... Mike On Thu, Sep 10, 2009 at 5:26 AM, Dvora wrote: > > Hi, > > Thanks a lot for that, will peforms the experiments a

Re: How to avoid huge index files

2009-09-10 Thread Dvora
Hi, Thanks a lot for that, I will perform the experiments and publish the results. I'm aware of the risk of performance degradation, but for the pilot I'm trying to run I think it's acceptable. Thanks again! Michael McCandless-2 wrote: > > First, you need to limit the size of segments initially

RE: New "Stream closed" exception with Java 6

2009-09-10 Thread Chris Bamford
Hi Hoss, I have been thinking more about what you said (below) - could you please expand on the indented part of this sentence: "it's possibly you just have a simple bug where you are closing the reader before you pass it to Lucene, or maybe you are mistakenly adding the same field twi

RE: TooManyClauses by wildcard queries

2009-09-10 Thread Uwe Schindler
Or use Lucene 2.9, it automatically uses constant score mode in wild card queries, if needed. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Patricio Galeas [mailto:gal...@prometa.de] > Sent: Thursday, S

Re: How to avoid huge index files

2009-09-10 Thread Michael McCandless
First, you need to limit the size of segments initially created by IndexWriter due to newly added documents. Probably the simplest way is to call IndexWriter.commit() frequently enough. You might want to use IndexWriter.ramSizeInBytes() to gauge how much RAM is currently consumed by IndexWriter's

Re: IndexReader.isCurrent for cached indexes

2009-09-10 Thread Ian Lea
isCurrent() will only return true if there have been committed changes to the index. Maybe for some reason your index update job hasn't committed or closed the index. Probably not relevant to this problem, but your reopen code snippet doesn't close the old reader. It should. See the javadocs.
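Ian's point about the reopen snippet can be shown with a tiny stand-in class (MockReader below is hypothetical, not Lucene API): reopen() returns the same instance when the index is unchanged and a new reader otherwise, so the old reader must be closed only in the second case.

```java
import java.io.Closeable;

// Hypothetical stand-in mimicking IndexReader.reopen() semantics:
// same instance back when nothing changed, a fresh reader otherwise.
class MockReader implements Closeable {
    final long version;
    boolean closed = false;

    MockReader(long version) { this.version = version; }

    MockReader reopen(long latestVersion) {
        return latestVersion == version ? this : new MockReader(latestVersion);
    }

    @Override public void close() { closed = true; }
}

public class ReopenIdiom {
    // Returns the reader to keep using; closes the old one only when
    // reopen handed back a different instance -- the step that is easy
    // to forget, per Ian's remark.
    static MockReader refresh(MockReader reader, long latestVersion) {
        MockReader newReader = reader.reopen(latestVersion);
        if (newReader != reader) {
            reader.close();
        }
        return newReader;
    }

    public static void main(String[] args) {
        MockReader r1 = new MockReader(1);
        MockReader r2 = refresh(r1, 1);  // unchanged: same instance, stays open
        System.out.println(r1 == r2 && !r1.closed);  // true
        MockReader r3 = refresh(r2, 2);  // changed: new instance, old closed
        System.out.println(r3 != r2 && r2.closed);   // true
    }
}
```

With the real Lucene API the comparison is the same shape: `IndexReader n = r.reopen(); if (n != r) r.close();` as the javadocs Ian points to describe.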

TooManyClauses by wildcard queries

2009-09-10 Thread Patricio Galeas
Hi all, I get the TooManyClauses exception with some wildcard queries like: (a) de* (b) country AND de* (c) ma?s* AND de* I'm not sure how to apply the solution proposed in the LuceneFAQ for the case of WildcardQueries like the examples above. Can you confirm it is the right procedure? 1. Over
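For background on why a query like de* triggers the exception: a wildcard query is rewritten into one boolean clause per matching indexed term, and BooleanQuery caps the clause count (1024 by default in Lucene 2.x). The sketch below uses only the standard library to mimic that expansion; countMatchingTerms is an illustrative helper, not Lucene code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class WildcardExpansion {
    // Counts indexed terms a prefix query would expand into -- each one
    // becomes a clause when the wildcard query is rewritten.
    static int countMatchingTerms(Collection<String> terms, String prefix) {
        int n = 0;
        for (String t : terms) {
            if (t.startsWith(prefix)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 2000; i++) terms.add("de" + i);  // 2000 terms match "de*"
        int clauses = countMatchingTerms(terms, "de");
        int maxClauseCount = 1024;  // BooleanQuery's default cap
        System.out.println(clauses > maxClauseCount);  // true -> TooManyClauses
    }
}
```

This is why Uwe's suggestion of 2.9's constant-score rewrite avoids the exception entirely: it matches the expanded terms with a filter instead of enumerating them as boolean clauses.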