I just updated my views in the article. Feel free to add your comments.
http://www.findbestopensource.com/article-detail/lucene-solr-as-nosql-db
Regards
Aditya
www.findbestopensource.com
On Mon, May 21, 2012 at 2:25 PM, Shashi Kant wrote:
> A related thread on Stackoverflow:
>
> http://stackov
Hi,
Lucene is not a data store. You should keep the data in the file system or a DB, and store in Lucene only the reference key plus the fields needed to display summary results.
Usually, once the search is performed, a list of search results with just a few fields is displayed. O
Please post the complete code. You are probably not closing the objects (IndexWriter / IndexSearcher) properly.
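"Too many open files" usually means writers or searchers are opened per request and never closed, especially on an exception path. The sketch below shows the close-in-finally pattern with a hypothetical `Resource` class standing in for IndexWriter / IndexSearcher (the real classes need the Lucene jars); the point is only the `try/finally`.

```java
// Sketch of the close-in-finally pattern that avoids leaking file handles.
// "Resource" is a stand-in for IndexWriter / IndexSearcher.
public class CloseDemo {
    static int openHandles = 0;

    static class Resource {
        Resource() { openHandles++; }   // opening consumes a file handle
        void use() { /* index or search here */ }
        void close() { openHandles--; } // always return the handle
    }

    // Leaky version: if use() throws, close() is never reached.
    static void leaky() {
        Resource r = new Resource();
        r.use();
        r.close();
    }

    // Safe version: close() runs even when use() throws.
    static void safe() {
        Resource r = new Resource();
        try {
            r.use();
        } finally {
            r.close();
        }
    }
}
```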
Regards
Aditya
www.findbestopensource.com
On Fri, May 18, 2012 at 6:51 AM, Michel Blase wrote:
> Hi all,
>
> I have few problems Indexing. I keep hitting "Too many open files". It
> seems like Lucene i
Yes. By storing the value as a String, you should be able to do a range search. I am not sure which is better, storing as a String or as an Integer.
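One caveat worth adding: a string range compares lexicographically, so numeric values must be zero-padded to a fixed width or "9" will sort after "10". A quick check in plain Java (no Lucene needed):

```java
public class RangeKeys {
    // Zero-pad an int to a fixed width so lexicographic order on the
    // resulting strings matches numeric order.
    static String key(int n) {
        return String.format("%010d", n);
    }
}
```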
Regards
Aditya
www.findbestopensource.com
On Thu, Feb 23, 2012 at 1:25 PM, Jason Toy wrote:
> Can I still do range searches on a string? It seems like it would be m
Hi,
You could consider storing the date field as a String in "YYYYMMDD" format. This will save space and perform better.
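For range queries and sorting to work, the key must start with the year (yyyyMMdd); lexicographic order on such strings then equals chronological order. A small plain-Java illustration:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class DateKeys {
    // Render a date as a sortable "yyyyMMdd" key. String comparison on
    // these keys gives the same order as comparing the dates themselves.
    static String key(int year, int month, int day) {
        Calendar c = Calendar.getInstance();
        c.set(year, month - 1, day); // Calendar months are 0-based
        return new SimpleDateFormat("yyyyMMdd").format(c.getTime());
    }
}
```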
Regards
Aditya
www.findbestopensource.com
On Thu, Feb 23, 2012 at 11:55 AM, Jason Toy wrote:
> I have a solr instance with about 400m docs. For text searches it is
> per
Check out this presentation:
http://java.dzone.com/videos/archive-it-scaling-beyond
The Internet Archive's Archive-It service uses Lucene to index billions of pages.
Regards
Aditya
www.findbestopensource.com
On Fri, Jan 13, 2012 at 4:31 PM, Peter K wrote:
> yes and no!
> google is not only the search engine ...
>
> > Just c
Hello all,
Recently I saw a couple of discussions in a LinkedIn group about generating a large data set or data corpus. I have compiled them into an article.
I hope it is helpful. If you have any other links where we can get large data sets for free, please reply to this thread; I will up
Hi,
One good option is to use Solr, as it lets you access the index remotely. If you want to use Lucene and you are ready to build your own API, then you could have a web application that receives the user query, searches the index, and returns the result set in the way the user expects.
Y
Hi Christoph
My opinion is that you should not normalize or otherwise modify the product keys. They should be unique and used as-is. Instead of spaces you should have used only "-", but since the product is already out in the market, that cannot be helped.
In your UI, you could provide multiple
Hi Jason,
The easiest way would be to set a default value, say "EMPTY", for any field that is empty, and then search for this string to find the records with an empty field.
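The sentinel idea sketched in plain Java (the `EMPTY` value and field handling are illustrative, not from Lucene itself): substitute the sentinel at index time, then a plain term match finds the "empty" records.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptySentinel {
    static final String EMPTY = "EMPTY";

    // At index time: substitute the sentinel for blank values so the
    // field is always searchable.
    static String fieldValue(String raw) {
        return (raw == null || raw.trim().isEmpty()) ? EMPTY : raw;
    }

    // At search time this reduces to an equality match on the sentinel,
    // i.e. what a TermQuery for "EMPTY" would do.
    static List<String> findEmpty(List<String> indexedValues) {
        List<String> hits = new ArrayList<String>();
        for (String v : indexedValues) if (EMPTY.equals(v)) hits.add(v);
        return hits;
    }
}
```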
Regards
Aditya
www.findbestopensource.com
On Fri, Jul 15, 2011 at 5:32 AM, Trieu, Jason T wrote:
> Hi all,
>
> I read pos
You are trying to access a reader that has already been closed by some other thread.
1. Keep a reference count for each reader you create.
2. Have a common function through which all threads retrieve Reader objects.
3. Once the index has changed, create a new reader and do a warmup.
4. When the new re
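The reference-counting step can be sketched as below. Lucene's own IndexReader exposes incRef()/decRef() for this purpose; the class here is a simplified stand-in so the idea is visible without the Lucene jars: a search borrows a reference, and the underlying files are released only when the count reaches zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a reference-counted IndexReader: each search takes a
// reference, and the resource is closed only when the count hits zero.
public class RefCountedReader {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one ref
    boolean closed = false;

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // real code would close the index files here
        }
    }
}
```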
You might have closed the IndexReader object while still trying to access the search results.
Regards
Aditya
www.findbestopensource.com
On Tue, Apr 5, 2011 at 5:26 PM, Yogesh Dabhi wrote:
> Hi
>
>
>
> My application is clustered in JBoss application servers & the Lucene
> directory is shared.
>
>
>
> Conc
Hello Daniel,
The code seems fine. I think you are measuring the time for the entire program, which may include reading the data from an external source and preparing the array list. Measure the time for the indexing alone.
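A minimal sketch of what is meant, using a hypothetical helper (the loop body stands in for the addDocument calls): start the clock after the data is already in memory.

```java
import java.util.List;

public class TimeIndexingOnly {
    // Time just the indexing loop, not the data loading that precedes it.
    static long timeIndexingNanos(List<String> docs) {
        long start = System.nanoTime(); // clock starts AFTER loading
        for (String doc : docs) {
            // writer.addDocument(...) would go here
        }
        return System.nanoTime() - start;
    }
}
```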
Regards
Aditya
www.findbestopensource.com
On Wed, Apr 6, 2011 at 2:38 PM, ZYWALEWSKI,
Hello Lokendra,
You could update frequently; anyway, I think it is a one-time job.
My advice would be to do insertions and updates in batches:
1. Parse your file and read 1000 lines.
2. Do some aggregation and insert / update with Lucene.
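The two steps above can be sketched as follows; the `commit()` method is a stand-in for IndexWriter.commit(), since committing once per batch rather than once per line is where the savings come from.

```java
import java.util.List;

public class BatchIndexer {
    static final int BATCH_SIZE = 1000;
    int commits = 0;

    // Feed lines in batches of 1000 and commit once per batch instead
    // of once per line; the commit is the expensive part.
    void indexAll(List<String> lines) {
        int inBatch = 0;
        for (String line : lines) {
            // writer.addDocument / updateDocument would go here
            if (++inBatch == BATCH_SIZE) {
                commit();
                inBatch = 0;
            }
        }
        if (inBatch > 0) commit(); // flush the final partial batch
    }

    void commit() { commits++; }   // stand-in for IndexWriter.commit()
}
```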
Regards
Aditya
www.findbestopensource.com
On Fri, Feb 25, 2011
I don't think you can combine the queries. You are first searching index A, and its results are given as input to index B. You cannot combine the queries, and you cannot use MultiSearcher or ParallelMultiSearcher. You need to search the two indexes independently and sequentially.
Regards
Aditya
You may need to use n-grams.
http://lucene.apache.org/java/3_0_3/api/all/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html
Another option would be a wildcard query without enabling leading wildcards: search for cr* and not *cr*, as the auto-suggest feature should give suggestion f
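What an edge n-gram filter produces, sketched in plain Java: every prefix of the token between a minimum and maximum length. Indexing these lets auto-suggest match "cr" with a cheap term lookup instead of a wildcard query.

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGrams {
    // Emit the leading-edge n-grams of a token, e.g. "create" with
    // minGram=1, maxGram=3 yields [c, cr, cre].
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }
}
```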
Here are a few projects tagged text-extraction:
http://www.findbestopensource.com/tagged/text-extraction
I am not sure if any of them actually extracts content from .msg files, but take a look.
On Fri, Feb 4, 2011 at 5:33 AM, Zhang, Lisheng <
lisheng.zh...@broadvision.com> wrote:
>
> Hi,
>
> Do you
Your problem is more with Tika. Please post in the Tika user group.
If you want to deal only with HTML, then it is better to use an HTML parser.
http://www.findbestopensource.com/search/?query=%22html+parser%22
On Tue, Jan 11, 2011 at 7:24 AM, amg qas wrote:
> I have been trying to parse & index different portion
>> Do I need to compile the Lucene and analyzer code in 64 bit JVM?
You don't need to recompile. Just use the same jars with a 64-bit JVM on a 64-bit OS.
Regards
Aditya
www.findbestopensource.com
On Wed, Dec 22, 2010 at 1:07 PM, Ganesh wrote:
> Thanks. I going to try in 64 bit. I will post some update in
Yes, correct. It would be good if the user inputs the search string with *.
My idea is to index two separate fields, first name and last name. Provide a text box for each, and leave the rest to the user.
Regards
Aditya
www.findbestopensource.com
On Tue, Nov 2, 2010 at 7:44 P
Hello
Doing a single search with multiple filters will give faster results.
Doing a search per field (multiple searches) and combining the results is a bad idea.
Regards
Aditya
www.findbestopensource.com
On Mon, Nov 1, 2010 at 11:02 PM, Francisco Borges <
francisco.bor...@gmail.com> wrote:
> Hello,
Hi fulin,
It is not possible. You need to add / update the whole document. Even if you modify a single field, you need to add all the fields again; an update is nothing but a delete followed by an add.
If you don't have the information for the rest of the fields, then you may need to search and retrieve the document, m
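The delete-then-add semantics (what IndexWriter.updateDocument(Term, Document) does under the hood) can be modeled in plain Java with a map standing in for the index; the key point is that the replacement carries ALL fields, so any field omitted from the new document is lost.

```java
import java.util.HashMap;
import java.util.Map;

// Models Lucene's update semantics: an update is a delete of every
// document matching the key followed by an add of the whole new
// document -- partial field updates are not possible.
public class UpdateSemantics {
    final Map<String, Map<String, String>> index =
            new HashMap<String, Map<String, String>>();

    void updateDocument(String key, Map<String, String> fullDoc) {
        index.remove(key);       // delete the old document
        index.put(key, fullDoc); // add the replacement with ALL fields
    }
}
```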
Hi jacobian,
Lucene will not do incremental updates by itself. Lucene is just a library. Your app should periodically add the content to the index and, once done, reopen the reader to get your changes reflected.
Regards
Aditya
www.findbestopensource.com
On Thu, Aug 19, 2010 at 12:13 PM, Yakob w
Hi Shelly,
Have you tried sorting in your queries? Is it causing any issues?
Once you open a reader and warm your searches with sorting, the FieldCache will be loaded for that field, and you will see higher RAM usage. You can run as many sorted queries as you like until you reopen the reader.
If you ad
Hi Shelly,
You need to reduce your maxMergeDocs and set ramBufferSizeMB to 100, which will help you use less RAM while indexing.
>>> search time is 15 secs..
How are you calculating this time? Is it just the time difference before and after the search method, or does it include the time to parse the document o
Hello Daniel & Luan
1. Carrot2 is not required for your purpose. Carrot2 helps to consolidate the results from multiple searches.
2. You need to add a category to the pages at index time and filter the results at search time. If you want to use Lucene, then you could store the cat
If you know the extension at index time, then you could create a separate field and store all the related content in it.
E.g.: TITLE_EXTN: Lucene Apache Manning ..
Searching on this field will give you faster results.
Regards
Aditya
www.findbestopensource.com
On Tue, Jul 27, 2010 at 1:04 AM, Philippe
Hi Jan,
I think you require a version number for each commit or update. Say you added 10 docs; that is update 1. Then you modified or added some more; that is update 2, and so on. If so, my advice would be to have fields named field-type, version-number, and version-date-time as part of the field in
Certainly it will. You either need to increase your memory or refine your query. Even though you display paginated results, the first couple of pages will display fine, but pages toward the end may cause problems. This is because 200,000 objects are created and iterated: 190,900 objects are skipped and la
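The arithmetic behind this: to render page N with a collector that starts from the top, everything before that page is materialized and then discarded, so the work grows linearly with the page number. (Later Lucene releases added a searchAfter variant on IndexSearcher to avoid re-collecting earlier pages.)

```java
// Why deep pages hurt: to render page N the searcher must collect and
// skip every hit before it.
public class PagingCost {
    static int objectsSkipped(int page, int pageSize) {
        return (page - 1) * pageSize; // hits materialized then discarded
    }
}
```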
Hello all,
We have launched a new site, which presents the best open source products and libraries across all categories. The site is powered by Solr search.
There are many open source products available in every category, and it is sometimes difficult to identify which is the best. The main probl
You have two options:
1. Store the compressed text as part of a stored field in Solr.
2. Use external caching:
http://www.findbestopensource.com/tagged/distributed-caching
You could use Ehcache / Memcached / Membase.
The problem with external caching is that you need to synchronize deletions
and