Design Question related with Lucene
Hello;

I have a design question while developing my project. If you have time, please read my problem and, if you have a solution, please let me know.

Project: Our system produces a txt file every hour (e.g. at 13:00, 14:00). These files contain logs from the network, e.g. TCP logs. I use FreeBSD and cron. Five minutes after each hour (e.g. 13:05, 14:05), a process indexes this txt file with Lucene. Then I have a web app which runs textual searches against the index Lucene produces.

Problem: Our customers want to know which client uses the internet most, which site is visited most, and similar things which would be done in SQL like this:

select site, count(site) from log_table group by site

My solution is: a second process which inserts the logs into a temporary table, then runs some queries on this temporary table and writes the results into the main statistics tables (e.g. a most-visited-sites table). The temporary table updates the related statistics tables.

I need a recommendation about this problem. Is there any solution in Lucene (does a "get most ranked" kind of query exist, and if yes, is it good for performance)? If there is no solution in Lucene, what would you use in this situation?

Thanks for your help.

ilkay POLAT
Research & Development Software Engineer, TURKEY
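For what it's worth, the SQL group-by above is just term counting, and the aggregation step itself is simple. Below is a minimal plain-Java sketch (the class and site values are invented for illustration); in a Lucene-based setup you could feed it the stored site field of each log document, and if each document holds exactly one site, IndexReader.docFreq on each site term gives the same per-site count without any table at all:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Computes "select site, count(site) from log_table group by site"
// over a list of site values, one entry per log record.
public class SiteStats {
    public static Map<String, Integer> countBySite(List<String> sites) {
        Map<String, Integer> counts = new HashMap<>();
        for (String site : sites) {
            Integer n = counts.get(site);
            counts.put(site, n == null ? 1 : n + 1);
        }
        return counts;
    }
}
```

Sorting the resulting map entries by value then answers "which site is most visited" directly, which may be enough to avoid the temporary-table round trip.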
Removing old data from index file
Hello; I need to learn whether there is a way to remove some records from an indexed file, and whether removing indexed records is fast (for example, deleting old records whose creation dates are earlier than a given day). Thanks
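Lucene does support deleting documents that match a query (IndexWriter.deleteDocuments), so a range query on an indexed creation-time field is the usual approach; the deleted documents are marked immediately and reclaimed later during merges. The sketch below only simulates the selection criterion in plain Java, since the actual Lucene call depends on how the date field was indexed (field name and types here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the "created before cutoff" criterion you would express in
// Lucene as a range query passed to IndexWriter.deleteDocuments(Query).
public class OldRecordFilter {
    // Returns the creation timestamps at or after the cutoff,
    // i.e. the records that would survive the deletion.
    public static List<Long> keepRecent(List<Long> createdMillis, long cutoffMillis) {
        List<Long> kept = new ArrayList<>();
        for (long t : createdMillis) {
            if (t >= cutoffMillis) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```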
Re: Design Question related with Lucene
Is it better to analyze logs with Lucene, or are other solutions better for performance?

On Thu, May 20, 2010 at 9:51 AM, ilkay polat wrote:
> [original question quoted above]

--
ilkay POLAT
Research & Development Software Engineer, TURKEY
Out of memory problem in search
Hello Friends;

Recently I have had a problem with Lucene search: a memory problem, arising because the index is so big (I have indexed some kinds of information and the index's size is more than 40 gigabytes).

I search the Lucene index with

org.apache.lucene.search.Searcher.search(query, null, offset + limit, new Sort(new SortField("time", SortField.LONG, true)));

(This returns the top (offset + limit) records.)

I page through the results by range. For example, in the web page I first fetch the records in the [0, 100] range, then on the second page [100, 200]. I have nearly 200,000 records in all. When I go to the last page, which means the records between [199,900, 200,000], there is a memory problem in the JVM (an out-of-memory error); the machine has 4 GB of RAM.

Is there a way to overcome this memory problem?

Thanks

--
ilkay POLAT
Software Engineer, TURKEY
Gsm : (+90) 532 542 36 71
E-mail : ilkay_po...@yahoo.com
RE: Out of memory problem in search
Indeed, this is a good solution to that kind of problem. But the same problem can occur again in the future as logs are added to the index. For example, here 200,000 records cause the problem (these logs were collected over 13 days). With the reverse trick, the maximum search depth becomes 100,000; but once there are 400,000 records the same problem will occur again (the maximum search depth is back to 200,000). Is there another way which does not consume so much memory, or which consumes a bounded amount of memory and spends time instead of memory? This restriction comes from our project's hardware limits (the hardware memory is 8 GB at maximum).

--- On Wed, 7/14/10, Uwe Schindler wrote:

From: Uwe Schindler
Subject: RE: Out of memory problem in search
To: java-user@lucene.apache.org
Date: Wednesday, July 14, 2010, 3:25 PM

Reverse the query sorting to display the last page.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> [original message quoted above]
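Uwe's reverse-sort trick can be illustrated in plain Java (the list of time values and the page size here are invented for illustration): to show the last page of an ascending-by-time listing, sort descending, take only the first `limit` hits, and reverse them locally, so you never materialize offset + limit entries:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrates fetching the last page via reversed sorting: instead of
// asking for the top (offset + limit) hits ascending and skipping to
// the end, ask for the top `limit` hits descending and reverse them.
public class LastPage {
    public static List<Long> lastPage(List<Long> times, int limit) {
        List<Long> sorted = new ArrayList<>(times);
        // Descending sort stands in for a reversed SortField on "time".
        sorted.sort(Collections.reverseOrder());
        List<Long> page = new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
        Collections.reverse(page); // restore ascending order for display
        return page;
    }
}
```

As the follow-up notes, this only halves the worst-case depth (the middle pages are still expensive), but it is a one-line change on the query side.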
Re: Out of memory problem in search
Hi,

We have hardware restrictions (max RAM can be 8 GB), so unfortunately increasing memory is not an option for us in today's situation.

Yes, as you said, the problem appears when going to the last pages of the search screen, because the search method finds the top n records; in a way, "searching everything returns everything". I am now researching whether there is a way which consumes time instead of memory in this search mechanism in Lucene. Any other ideas?

Thanks

--- On Wed, 7/14/10, findbestopensource wrote:

From: findbestopensource
Subject: Re: Out of memory problem in search
To: java-user@lucene.apache.org
Date: Wednesday, July 14, 2010, 2:59 PM

Certainly it will. Either you need to increase your memory or refine your query. Even though you display a paginated result, the first couple of pages will display fine, and going towards the last may cause a problem. This is because 200,000 objects are created and iterated: 199,900 objects are skipped and the last 100 objects are returned. The memory is consumed in creating these objects.

Regards
Aditya
www.findbestopensource.com

On Wed, Jul 14, 2010 at 4:14 PM, ilkay polat wrote:
> [original message quoted above]
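One way to spend time instead of memory is cursor ("keyset") pagination: instead of fetching the top (offset + limit) hits and discarding the first offset of them, remember the last time value of the previous page and fetch only the `limit` hits that come after it. Later Lucene releases expose this pattern as IndexSearcher.searchAfter; the plain-Java sketch below (class and values invented for illustration) shows the bounded-memory selection it relies on, using a max-heap that never holds more than `limit` entries no matter how many records exist:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Cursor pagination: each call scans once and keeps at most `limit`
// values in memory, returning the `limit` smallest time values that
// are strictly greater than the previous page's last value.
public class CursorPager {
    public static List<Long> nextPage(Iterable<Long> times, long afterTime, int limit) {
        // Max-heap holding the `limit` smallest values seen so far.
        PriorityQueue<Long> heap = new PriorityQueue<>(limit, Collections.reverseOrder());
        for (long t : times) {
            if (t <= afterTime) continue;          // before the cursor: skip
            if (heap.size() < limit) {
                heap.add(t);
            } else if (t < heap.peek()) {
                heap.poll();                        // evict the largest kept value
                heap.add(t);
            }
        }
        List<Long> page = new ArrayList<>(heap);
        Collections.sort(page);                     // ascending order for display
        return page;
    }
}
```

The cost is one full pass per page request, but the heap size is bounded by the page size, which matches the "bounded memory, more time" trade-off asked about above.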
Re: Out of memory problem in search
I am also confused about the memory management of Lucene. Where does this out-of-memory problem mainly arise from, Reason-1 or Reason-2?

Reason-1: The problem is caused by searching a big index (nearly 40 GB). Even if only 100 records (a small number) were returned from a search in a 60 GB index, the problem would arise again.

OR

Reason-2: The problem is caused by finding so many records (nearly 200,000), hence 200,000 Java objects in the heap. Even if the index were only 10 GB (a small size), with that many returned records the problem would arise again.

Is there any document which describes the general memory-management issues of searching in Lucene?

Thanks

ilkay POLAT
Software Engineer
Gsm : (+90) 532 542 36 71
E-mail : ilkay_po...@yahoo.com

--- On Wed, 7/14/10, ilkay polat wrote:
> [earlier messages in this thread quoted above]