Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-22 Thread findbestopensource
I just updated my view in the article. Feel free to add your comments. http://www.findbestopensource.com/article-detail/lucene-solr-as-nosql-db Regards Aditya www.findbestopensource.com On Mon, May 21, 2012 at 2:25 PM, Shashi Kant wrote: > A related thread on Stackoverflow: > > http://stackov

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-20 Thread findbestopensource
Hi, Lucene is not a data store. You should store the data in a file system or DB, and keep only the reference key and the data needed to display summary results in Lucene. In most applications, once the search is performed, a list of search results with just a few details is displayed. O
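
The split described above can be sketched roughly as follows. This is plain Python, not the Lucene API: `database`, `search_index`, `search` and `load_full_record` are hypothetical stand-ins, with the dict playing the DB / file system and the list playing the index.

```python
# Hypothetical in-memory stand-ins: `database` plays the DB / file system,
# `search_index` plays the Lucene index. Neither is a real Lucene API.
database = {
    "doc-1": {"title": "Lucene in Action", "body": "Full text of the first document ..."},
    "doc-2": {"title": "Solr Cookbook", "body": "Full text of the second document ..."},
}

# The index keeps only the reference key and the fields needed for the result list.
search_index = [
    {"key": key, "title": record["title"]} for key, record in database.items()
]


def search(term):
    """Return lightweight hits (key + title) whose title contains the term."""
    term = term.lower()
    return [hit for hit in search_index if term in hit["title"].lower()]


def load_full_record(key):
    """Fetch the complete record from the primary store using the reference key."""
    return database[key]


hits = search("lucene")                   # cheap: summary fields only
full = load_full_record(hits[0]["key"])   # full data is loaded only on demand
```

The result list is built from the small index entries alone; the primary store is touched only when someone opens a specific hit.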

Re: old fashioned....."Too many open files"!

2012-05-17 Thread findbestopensource
Please post the complete code. You are not closing the objects (IndexWriter / IndexSearcher) properly. Regards Aditya www.findbestopensource.com On Fri, May 18, 2012 at 6:51 AM, Michel Blase wrote: > Hi all, > > I have few problems Indexing. I keep hitting "Too many open files". It > seems like Lucene i

Re: date issues

2012-02-23 Thread findbestopensource
Yes. If you store the date as a String, you should still be able to do a range search. I am not sure which is better, storing as String or Integer. Regards Aditya www.findbestopensource.com On Thu, Feb 23, 2012 at 1:25 PM, Jason Toy wrote: > Can I still do range searches on a string? It seems like it would be m
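
A small sketch of why a range search on strings can work. It assumes the dates are encoded as fixed-width, zero-padded strings; the `YYYYMMDD` layout and the helper names `to_key` and `in_range` are my assumptions for illustration, not something the thread confirms. With such an encoding, plain string comparison orders the keys the same way as the dates.

```python
from datetime import date


def to_key(d):
    """Encode a date as a fixed-width, zero-padded string (assumed YYYYMMDD)."""
    return d.strftime("%Y%m%d")


def in_range(key, lower, upper):
    """Inclusive range check using plain string comparison."""
    return lower <= key <= upper


dates = [date(2011, 12, 31), date(2012, 1, 15), date(2012, 2, 23), date(2012, 3, 1)]
keys = [to_key(d) for d in dates]

# Everything from 1 Jan 2012 to 29 Feb 2012, compared as strings.
matches = [k for k in keys if in_range(k, "20120101", "20120229")]
```

This only holds when every key has the same width; unpadded values such as "2012223" would sort incorrectly.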

Re: date issues

2012-02-22 Thread findbestopensource
Hi, You could consider storing the date field as a String in "MMDD" format. This will save space and it will perform better. Regards Aditya www.findbestopensource.com On Thu, Feb 23, 2012 at 11:55 AM, Jason Toy wrote: > I have a solr instance with about 400m docs. For text searches it is > per

Re: Is Lucene a good candidate for a Google-like search engine?

2012-01-16 Thread findbestopensource
Check out the presentation. http://java.dzone.com/videos/archive-it-scaling-beyond Web archive uses Lucene to index billions of pages. Regards Aditya www.findbestopensource.com On Fri, Jan 13, 2012 at 4:31 PM, Peter K wrote: > yes and no! > google is not only the search engine ... > > > Just c

Large data set or data corpus

2012-01-11 Thread findbestopensource
Hello all, Recently I saw a couple of discussions in a LinkedIn group about generating a large data set or data corpus. I have compiled them into an article. Hope it will be helpful. If you have any other links where we could get large data sets for free, please reply to this mail thread, I will up

Re: Remoting Lucene

2012-01-09 Thread findbestopensource
Hi, One good option is to consider using Solr, as it lets you access the index remotely. If you want to use Lucene and you are ready to build your own API, then you could have a web application which receives the user query, searches the index and returns the result set in the format the user expects. Y

Re: Indexing product keys with and without spaces in them

2012-01-03 Thread findbestopensource
Hi Christoph, My opinion is that you should not normalize or modify the product keys in any way. They should be unique and should be used as they are. Ideally you would have used only "-" instead of spaces, but since the product is already out in the market, that cannot be changed. In your UI, you could provide multiple

Re: Searching for Empty Field

2011-07-14 Thread findbestopensource
Hi Jason, The easiest way would be to set a default value for a field which is empty, say EMPTY, and search for this string to find the records that have an empty field. Regards Aditya www.findbestopensource.com On Fri, Jul 15, 2011 at 5:32 AM, Trieu, Jason T wrote: > Hi all, > > I read pos
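
The default-value idea could look something like this. It is a plain Python sketch rather than Lucene code; the `prepare` helper and the sample records are made up for illustration, and the sentinel has to be a value that can never occur as real data.

```python
EMPTY = "EMPTY"  # hypothetical sentinel; pick a value that can never be real data


def prepare(record, field):
    """Return a copy of the record with the sentinel in place of a blank or missing field."""
    value = (record.get(field) or "").strip()
    return {**record, field: value or EMPTY}


records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3},  # field missing altogether
]

# Substitute the sentinel at index time ...
indexed = [prepare(r, "email") for r in records]

# ... so that "field is empty" becomes an ordinary term search.
empty_ids = [r["id"] for r in indexed if r["email"] == EMPTY]
```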

Re: Concurrent Issue

2011-04-06 Thread findbestopensource
You are trying to access a reader which has already been closed by some other thread. 1. Keep a reference count for the reader you create. 2. Have a common function through which all functions retrieve Reader objects. 3. Once the index has changed, create a new reader and warm it up. 4. When the new re
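
A rough sketch of the reference-counting scheme from the steps above, in plain Python. `Reader` and `ReaderManager` are hypothetical stand-ins, not Lucene classes. The fourth step is cut off in the archive, so closing the old reader once its count drops to zero is my assumption about the intent.

```python
import threading


class Reader:
    """Hypothetical stand-in for an index reader; only tracks open/closed state."""

    def __init__(self, version):
        self.version = version
        self.closed = False
        self.ref_count = 1  # the manager itself holds one reference

    def close(self):
        self.closed = True


class ReaderManager:
    """Single access point through which all callers obtain readers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = Reader(version=1)

    def acquire(self):
        with self._lock:
            self._current.ref_count += 1
            return self._current

    def release(self, reader):
        with self._lock:
            reader.ref_count -= 1
            if reader.ref_count == 0:
                reader.close()  # assumption: close only when nobody uses it

    def refresh(self):
        """Swap in a new reader after the index changed (assumes one refreshing thread)."""
        new_reader = Reader(self._current.version + 1)  # warm-up would happen here
        with self._lock:
            old, self._current = self._current, new_reader
        self.release(old)  # drop the manager's own reference to the old reader


manager = ReaderManager()
in_use = manager.acquire()      # a search thread starts using reader 1
manager.refresh()               # index changed; reader 2 becomes current
still_open = not in_use.closed  # reader 1 stays open while the search runs
manager.release(in_use)         # search finished; reader 1 is closed now
```

Every caller pairs `acquire` with `release`, so a reader is never closed underneath a search that is still running.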

Re: Concurrent Issue

2011-04-06 Thread findbestopensource
You might have closed the IndexReader object while still trying to access the search results. Regards Aditya www.findbestopensource.com On Tue, Apr 5, 2011 at 5:26 PM, Yogesh Dabhi wrote: > Hi > > > > My application is cluster in jobss application servers & lucene > directory was shared. > > > > Conc

Re: Indexation takes a lot of time :(

2011-04-06 Thread findbestopensource
Hello Daniel, The code seems to be fine. I think you are measuring the time for the entire program, which may also read the data from the external source and prepare the array list. Measure the time for indexing only. Regards Aditya www.findbestopensource.com On Wed, Apr 6, 2011 at 2:38 PM, ZYWALEWSKI,
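
To separate the two phases, the timing could be taken roughly like this. `load_documents` and `index_documents` are hypothetical placeholders for the real data loading and the real indexing code.

```python
import time


def load_documents():
    """Stand-in for reading from the external source and building the list."""
    return [f"document number {i}" for i in range(1000)]


def index_documents(docs):
    """Stand-in for the indexing step: builds a tiny term -> doc ids map."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in text.split():
            index.setdefault(term, []).append(doc_id)
    return index


start = time.perf_counter()
docs = load_documents()
loaded = time.perf_counter()
index = index_documents(docs)
done = time.perf_counter()

load_seconds = loaded - start    # time spent preparing the data
index_seconds = done - loaded    # time spent indexing only
print(f"load: {load_seconds:.4f}s, index: {index_seconds:.4f}s")
```

If most of the time turns out to be in the load phase, tuning the indexer will not help much.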

Re: Converting an existing index format to Lucene Index

2011-02-24 Thread findbestopensource
Hello Lokendra, You could update frequently. Anyway, I think it is a one-time job. My advice would be to do insertions and updates in batches. 1. Parse your file and read 1000 lines. 2. Do some aggregation and insert / update with Lucene. Regards Aditya www.findbestopensource.com On Fri, Feb 25, 2011
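
The two steps above might be sketched like this. The "key,value" line format and the `apply_batch` helper are assumptions for illustration; the real file format and the real insert / update calls will differ.

```python
from itertools import islice


def batches(lines, size=1000):
    """Yield successive lists of up to `size` lines from any iterable."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def apply_batch(index, batch):
    """Hypothetical insert-or-update step: aggregate per key, then write once per key."""
    totals = {}
    for line in batch:
        key, value = line.split(",")
        totals[key] = totals.get(key, 0) + int(value)
    for key, value in totals.items():
        index[key] = index.get(key, 0) + value


# 2500 made-up input lines in an assumed "key,value" format.
lines = (f"key{i % 3},1" for i in range(2500))

index = {}
batch_sizes = []
for batch in batches(lines):
    batch_sizes.append(len(batch))
    apply_batch(index, batch)
```

Aggregating inside each batch means each key is written at most once per 1000 lines instead of once per line.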

Re: Multi Index Search Query

2011-02-15 Thread findbestopensource
I don't think you can combine the queries. You are first searching Index A, and the results are given as input to Index B. You cannot combine the queries, and you cannot use a multi searcher or parallel multi searcher. You need to search the two indexes independently and sequentially. Regards Aditya
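
The sequential flow could look roughly like this. The two dicts are hypothetical stand-ins for Index A and Index B, and the data in them is invented.

```python
# Hypothetical stand-ins for the two indexes.
index_a = {"lucene": ["u1", "u2"], "solr": ["u3"]}                   # term -> ids
index_b = {"u1": "order-100", "u2": "order-200", "u3": "order-300"}  # id -> record


def search_a(term):
    """First search: find the ids that match the term in Index A."""
    return index_a.get(term, [])


def search_b(ids):
    """Second search: look up each id from the first result in Index B."""
    return [index_b[i] for i in ids if i in index_b]


ids = search_a("lucene")   # step 1, against Index A only
records = search_b(ids)    # step 2, fed with the output of step 1
```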

Re: Search requires too long search term

2011-02-13 Thread findbestopensource
You may need to use ngrams. http://lucene.apache.org/java/3_0_3/api/all/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html Another option would be to do a wildcard query without enabling leading wildcard search. Search for cr* and not *cr*, as the auto suggest feature should give suggestion f
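
To illustrate what edge n-grams buy you for auto suggest, here is a plain Python sketch. It does not use the EdgeNGramTokenFilter itself; `edge_ngrams`, `build_index` and their parameters are names I made up, not the filter's actual signature.

```python
def edge_ngrams(term, min_gram=1, max_gram=10):
    """Return the leading substrings of `term` with lengths min_gram..max_gram."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def build_index(terms, min_gram=2):
    """Map every edge n-gram to the set of terms that start with it."""
    index = {}
    for term in terms:
        for gram in edge_ngrams(term.lower(), min_gram=min_gram):
            index.setdefault(gram, set()).add(term)
    return index


index = build_index(["Cricket", "Crane", "Apache"])

# A short prefix typed by the user becomes an exact lookup, no wildcard needed.
suggestions = sorted(index.get("cr", set()))
```

The cost moves to index time: each term is stored under several prefixes, so the index grows, but a two-letter input no longer needs a wildcard scan.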

Re: outlook MSG file text extraction tool?

2011-02-03 Thread findbestopensource
Here are a few projects tagged text-extraction. http://www.findbestopensource.com/tagged/text-extraction I am not sure if any of these products actually extracts content from msg files, but take a look. On Fri, Feb 4, 2011 at 5:33 AM, Zhang, Lisheng < lisheng.zh...@broadvision.com> wrote: > > Hi, > > Do you

Re: How to parse & index different portions of an HTML page using Tika & Lucene ?

2011-01-10 Thread findbestopensource
Your problem is more with Tika. Please post in the Tika user group. If you want to deal with only HTML, then it is better to use an HTML parser. http://www.findbestopensource.com/search/?query=%22html+parser%22 On Tue, Jan 11, 2011 at 7:24 AM, amg qas wrote: > I have been trying to parse & index different portion

Re: Re: Scale up design

2010-12-22 Thread findbestopensource
>>Do I need to compile the Lucene and analyzer code in 64 bit JVM? You don't need to compile. Just run your jars on a 64-bit JVM on a 64-bit OS. Regards Aditya www.findbestopensource.com On Wed, Dec 22, 2010 at 1:07 PM, Ganesh wrote: > Thanks. I going to try in 64 bit. I will post some update in

Re: Simple search question

2010-11-02 Thread findbestopensource
r searched words. The user should not be thinking about...just doing > it. > > Dirk > > On Tue, 2 Nov 2010 20:00:08 +0530, findbestopensource > wrote: > > Yes. Correct. It would be good, If User inputs the search string with *. > > > > My Idea is to index two fiel

Re: Simple search question

2010-11-02 Thread findbestopensource
Yes. Correct. It would be good if the user inputs the search string with *. My idea is to index two fields separately, first name and last name. Provide two text boxes for first name and last name. Leave the rest to the user. Regards Aditya www.findbestopensource.com On Tue, Nov 2, 2010 at 7:44 P

Re: filtering results per field?

2010-11-02 Thread findbestopensource
Hello, Doing a single search with multiple filters will give faster results. Doing a search per field (multiple searches) and combining the results is a bad idea. Regards Aditya www.findbestopensource.com On Mon, Nov 1, 2010 at 11:02 PM, Francisco Borges < francisco.bor...@gmail.com> wrote: > Hello,

Re: Can we just update one field of a document in a lucene index, and leave other fields along?

2010-09-09 Thread findbestopensource
Hi fulin, It is not possible. You need to add / update the document as a whole. Even if you modify a single field, you need to add all the fields. An update is nothing but a delete followed by an add. If you don't have the rest of the fields, then you may need to search and retrieve the document, m
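
The "update is delete plus add" point can be sketched as below. `TinyIndex` is a hypothetical in-memory stand-in, not the Lucene API, and the modify step after retrieval is my reading of the sentence that is cut off above.

```python
class TinyIndex:
    """Hypothetical document store keyed by id; stands in for the real index."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc["id"]] = dict(doc)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def update_field(self, doc_id, field, value):
        """Retrieve the whole document, change one field, then delete and re-add it."""
        doc = self.get(doc_id)   # all existing fields are needed for the re-add
        doc[field] = value
        self.delete(doc_id)
        self.add(doc)


index = TinyIndex()
index.add({"id": "42", "title": "Old title", "body": "Unchanged body"})
index.update_field("42", "title", "New title")
updated = index.get("42")
```

In a real index this only works for fields whose original values were stored; a field that was indexed but not stored cannot be read back for the re-add.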

Re: asking about incremental update

2010-08-19 Thread findbestopensource
Hi jacobian, Lucene will not do incremental updates by itself. Lucene is just a library. Your app should periodically add the content to the index and, once done, reopen the reader to get your changes reflected. Regards Aditya www.findbestopensource.com On Thu, Aug 19, 2010 at 12:13 PM, Yakob w

Re: Sorting a Lucene index

2010-08-19 Thread findbestopensource
Hi Shelly, Have you tried sorting in your queries? Is it creating any issues? Once you open a reader and warm your search with sorting, the FieldCache will be loaded for that field. You could see more RAM usage. You can run as many sorted queries as you like until you reopen the reader. If you ad

Re: Scaling Lucene to 1bln docs

2010-08-10 Thread findbestopensource
Hi Shelly, You need to reduce your maxMergeDocs. Set ramBufferSizeMB to 100, which will help you use less RAM during indexing. >>>search time is 15 secs.. How are you calculating this time? Are you just taking the time difference before and after the search method, or does this involve the time to parse the document o

Re: Using categories with Lucene

2010-08-08 Thread findbestopensource
Hello Daniel & Luan 1. Carrot is not required for your purpose. Carrot helps to consolidate the results from multiple search results. 2. You need to add a category to the pages at the index time and filter out the results during search time. If you want to use Lucene, then you could store the cat

Re: Fast way to get all Terms in a matching query

2010-07-26 Thread findbestopensource
If you know the extension during Index time then you could create a separate field and store all its related content. E.G: TITLE_EXTN: Lucene Apache Manning .. Search on this field will give you faster results. Regards Aditya www.findbestopensource.com On Tue, Jul 27, 2010 at 1:04 AM, Philippe

Re: Holding and changing index wide information

2010-07-22 Thread findbestopensource
Hi Jan, I think you require a version number for each commit or update. Say you added 10 docs, then it is update 1; then you modified or added some more, then it is update 2. If so, my advice would be to have fields named field-type, version-number and version-date-time as part of the field in

Re: Out of memory problem in search

2010-07-14 Thread findbestopensource
Certainly it will. Either you need to increase your memory or refine your query. Even though you display paginated results, the first couple of pages will display fine, but going towards the last pages you may face problems. This is because 200,000 objects are created and iterated, 190,900 objects are skipped and la
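
A toy illustration of why the last pages are the expensive ones. The `fetch_page` helper and the page size of 100 are assumptions chosen for the sketch, so the skipped count differs from the figure quoted above.

```python
def fetch_page(results, page, page_size):
    """Naive paging: walk everything up to the end of the requested page."""
    end = page * page_size
    start = end - page_size
    touched = results[:end]   # objects created and iterated
    return touched[start:], len(touched), start


results = list(range(200_000))  # stand-in for 200,000 matching documents

first_page, touched_first, skipped_first = fetch_page(results, page=1, page_size=100)
last_page, touched_last, skipped_last = fetch_page(results, page=2000, page_size=100)
```

Both calls return 100 items, but the last page touches all 200,000 results to get there, which is where the memory goes.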

Best open source

2010-07-14 Thread findbestopensource
Hello all, We have launched a new site, which provides the best open source products and libraries across all categories. This site is powered by Solr search. There are many open source products available in all categories and it is sometimes difficult to identify which is the best. The main probl

Re: Cache full text into memory

2010-07-14 Thread findbestopensource
You have two options: 1. Store the compressed text as part of a stored field in Solr. 2. Use external caching. http://www.findbestopensource.com/tagged/distributed-caching You could use ehcache / Memcache / Membase. The problem with external caching is that you need to synchronize the deletions and