Guy, ulimit -n is your friend. As is the compound index format.
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ----- Original Message ---- > From: Guy Gavriely <[EMAIL PROTECTED]> > To: "java-user@lucene.apache.org" <java-user@lucene.apache.org> > Sent: Thursday, September 11, 2008 10:28:34 AM > Subject: Terms with different boosts > > Hi, > > I have to index terms with different boosts, meaning that if the word A > appears > in two documents one document will be ranked higher. > > I've tried to index them by putting them in different fields and give the > fields > different boost but i ran into too many files (caused by too many fields I > guess) problem. > Below is the stack trace: > > java.io.FileNotFoundException: /tmp/nutch/crawl/indexes/part-00000/_0.nrm > (Too > many open files) > > at java.io.FileInputStream.open(Native Method) > at java.io.FileInputStream.(FileInputStream.java:106) > at > org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.(RawLocalFileSystem.java:87) > at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:142) > at > org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.(ChecksumFileSystem.java:110) > at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:330) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:245) > at > org.apache.nutch.indexer.FsDirectory$DfsIndexInput$Descriptor.(FsDirectory.java:160) > at > org.apache.nutch.indexer.FsDirectory$DfsIndexInput.(FsDirectory.java:169) > at org.apache.nutch.indexer.FsDirectory.openInput(FsDirectory.java:115) > at org.apache.lucene.index.SegmentReader.openNorms(SegmentReader.java:522) > at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:185) > at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:140) > at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:126) > at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:155) > at > org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:579) > at org.apache.lucene.index.IndexReader.open(IndexReader.java:179) > at org.apache.lucene.index.IndexReader.open(IndexReader.java:142) > at > org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.(DeleteDuplicates.java:167) > at > org.apache.nutch.indexer.DeleteDuplicates$InputFormat.getRecordReader(DeleteDuplicates.java:240) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:139) > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126) > Exception in thread "main" java.io.IOException: Job failed! > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604) > at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439) > at com.sightix.nutch.crawl.Crawl.main(Crawl.java:139) > > Can you suggest another way to index terms in different boosts? > Thanks, > Guy > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]