After a week away, I'm back and still working to get to the bottom of
this issue. We run Lucene from the binaries, so making changes to the
source code is not something we are really set up to do right now.
I have, however, created a trivial Java app that just opens an
IndexReader for our problematic index and then closes it:

    try {
        IndexReader indexReader = IndexReader.open(getIndexDirectory(indexPath));
        System.out.println("Successfully opened index at " + indexPath);
        indexReader.close();
        System.out.println("Successfully closed index at " + indexPath);
    } catch (Exception ex) {
        System.out.println("Exception while opening index: " + ex.getMessage());
    }

I've run this simple app with the hprof options suggested below, and it
appears that the bulk of the CPU time, roughly 80% of the samples, is
spent in java.lang.String.intern. Below is the summary from the end of
the java.hprof.txt. I'm happy to attach the whole file, but I wasn't
sure whether that was appropriate for this mailing list.

Thanks,
Mark

CPU SAMPLES BEGIN (total = 5295) Wed Nov 17 11:54:15 2010
rank   self   accum count  trace method
   1 80.40%  80.40%  4257 300165 java.lang.String.intern
   2  1.83%  82.23%    97 300189 sun.nio.ch.FileDispatcher.pread0
   3  0.83%  83.06%    44 300232 java.util.HashMap.transfer
   4  0.72%  83.78%    38 300201 sun.nio.ch.FileDispatcher.pread0
   5  0.70%  84.48%    37 300252 org.apache.lucene.util.SimpleStringInterner.intern
   6  0.60%  85.08%    32 300191 java.lang.StringCoding$StringDecoder.decode
   7  0.59%  85.67%    31 300202 java.lang.System.arraycopy
   8  0.38%  86.04%    20 300098 java.util.zip.ZipFile.read
   9  0.36%  86.40%    19 300203 java.util.Arrays.copyOfRange
  10  0.36%  86.76%    19 300224 sun.nio.ch.FileDispatcher.pread0
  11  0.32%  87.08%    17 300089 java.lang.Class.forName0
  12  0.32%  87.40%    17 300237 java.lang.Thread.currentThread
  13  0.28%  87.69%    15 300049 java.lang.ClassLoader.findBootstrapClass
  14  0.28%  87.97%    15 300102 java.util.zip.ZipFile.read
  15  0.26%  88.23%    14 300180 java.util.zip.ZipFile.read
  16  0.26%  88.50%    14 300255 java.lang.Thread.currentThread
  17  0.26%  88.76%    14 300335 sun.nio.ch.FileDispatcher.pread0
  18  0.25%  89.01%    13 300164 java.lang.System.arraycopy
  19  0.25%  89.25%    13 300286 sun.nio.ch.NativeThread.current
  20  0.23%  89.48%    12 300240 sun.nio.ch.FileDispatcher.pread0
  21  0.23%  89.71%    12 300242 java.lang.System.arraycopy
  22  0.21%  89.92%    11 300207 java.lang.Thread.currentThread
  23  0.21%  90.12%    11 300231 java.lang.System.getSecurityManager
  24  0.19%  90.31%    10 300155 java.util.zip.ZipFile.read
  25  0.19%  90.50%    10 300216 java.lang.ClassLoader.findBootstrapClass
  26  0.19%  90.69%    10 300239 java.nio.Bits.copyToByteArray
  27  0.19%  90.88%    10 300350 java.util.HashMap.values
  28  0.17%  91.05%     9 300034 sun.net.www.protocol.file.Handler.createFileURLConnection
  29  0.17%  91.22%     9 300283 sun.nio.ch.FileDispatcher.pread0
  30  0.15%  91.37%     8 300006 java.util.jar.JarFile.getBytes
  31  0.15%  91.52%     8 300008 java.util.zip.ZipFile.getInputStream
  32  0.15%  91.67%     8 300166 java.util.zip.ZipFile.read
  33  0.15%  91.82%     8 300179 java.lang.ClassLoader.findBootstrapClass
  34  0.15%  91.97%     8 300209 sun.nio.ch.FileDispatcher.pread0
  35  0.13%  92.11%     7 300123 java.lang.ClassLoader$NativeLibrary.load
  36  0.13%  92.24%     7 300140 sun.nio.ch.FileDispatcher.pread0
  37  0.13%  92.37%     7 300225 sun.nio.ch.FileDispatcher.pread0
  38  0.13%  92.50%     7 300246 java.nio.Bits.copyToByteArray
  39  0.11%  92.62%     6 300031 java.util.zip.ZipFile.read
  40  0.11%  92.73%     6 300059 java.io.FileInputStream.readBytes
  41  0.11%  92.84%     6 300101 java.lang.ClassLoader.findBootstrapClass
  42  0.11%  92.96%     6 300138 java.lang.ClassLoader.findBootstrapClass
  43  0.11%  93.07%     6 300241 sun.nio.ch.FileDispatcher.pread0
  44  0.11%  93.18%     6 300282 java.lang.Thread.currentThread
  45  0.11%  93.30%     6 300290 org.apache.lucene.index.TermInfosReader.<init>
  46  0.11%  93.41%     6 300311 org.apache.lucene.util.UnicodeUtil.UTF8toUTF16
  47  0.09%  93.50%     5 300047 java.util.zip.ZipFile.read
  48  0.09%  93.60%     5 300057 java.io.UnixFileSystem.getBooleanAttributes0
  49  0.09%  93.69%     5 300064 sun.security.jca.Providers.<clinit>
  50  0.09%  93.79%     5 300254 sun.nio.ch.NativeThread.current
  51  0.09%  93.88%     5 300324 org.apache.lucene.index.SegmentTermEnum.next
  52  0.09%  93.98%     5 300340 java.util.HashMap.put
  53  0.08%  94.05%     4 300007 java.util.zip.ZipFile.getInputStream
  54  0.08%  94.13%     4 300009 java.util.zip.ZipFile.getInflater
  55  0.08%  94.20%     4 300010 java.util.jar.JarFile.getManifestFromReference
  56  0.08%  94.28%     4 300051 java.lang.ClassLoader.findBootstrapClass
  57  0.08%  94.35%     4 300054 java.lang.ClassLoader.findBootstrapClass
  58  0.08%  94.43%     4 300083 java.util.HashMap.entrySet0
  59  0.08%  94.50%     4 300108 java.util.zip.ZipFile.read
  60  0.08%  94.58%     4 300135 java.util.zip.ZipFile.read
  61  0.08%  94.66%     4 300142 java.util.zip.ZipFile.read
  62  0.08%  94.73%     4 300238 java.lang.Thread.currentThread
  63  0.08%  94.81%     4 300247 sun.nio.ch.FileDispatcher.pread0
  64  0.08%  94.88%     4 300253 java.lang.Thread.currentThread
  65  0.08%  94.96%     4 300257 java.util.HashMap.resize
  66  0.08%  95.03%     4 300275 sun.nio.ch.FileDispatcher.pread0
  67  0.08%  95.11%     4 300295 org.apache.lucene.index.TermBuffer.read
  68  0.08%  95.18%     4 300299 org.apache.lucene.index.SegmentTermEnum.next
  69  0.06%  95.24%     3 300004 java.util.zip.ZipFile.getEntry
  70  0.06%  95.30%     3 300021 sun.misc.URLClassPath$3.run
  71  0.06%  95.35%     3 300050 java.util.zip.ZipFile.read
  72  0.06%  95.41%     3 300055 java.security.MessageDigest.getInstance
  73  0.06%  95.47%     3 300124 java.lang.ClassLoader$NativeLibrary.load
  74  0.06%  95.52%     3 300249 java.util.HashMap.getEntry
  75  0.06%  95.58%     3 300250 java.util.HashMap.getEntry
  76  0.06%  95.64%     3 300261 java.lang.System.arraycopy
  77  0.06%  95.69%     3 300267 java.util.Arrays.copyOf
  78  0.06%  95.75%     3 300276 org.apache.lucene.index.TermInfosReader.<init>
  79  0.06%  95.81%     3 300277 org.apache.lucene.index.SegmentTermEnum.next
  80  0.06%  95.86%     3 300300 org.apache.lucene.index.TermInfosReader.<init>
  81  0.06%  95.92%     3 300304 org.apache.lucene.store.IndexInput.readVLong
  82  0.06%  95.98%     3 300318 sun.nio.ch.NativeThread.current
  83  0.06%  96.03%     3 300338 sun.nio.cs.UTF_8.updatePositions
  84  0.06%  96.09%     3 300339 org.apache.lucene.util.SimpleStringInterner.intern
  85  0.04%  96.13%     2 300001 java.lang.ClassLoader.findBootstrapClass
  86  0.04%  96.17%     2 300079 java.lang.Math.floor
  87  0.04%  96.20%     2 300085 java.security.Provider.parseLegacyPut
  88  0.04%  96.24%     2 300107 org.apache.lucene.index.IndexReader.open
  89  0.04%  96.28%     2 300119 java.io.RandomAccessFile.getChannel
  90  0.04%  96.32%     2 300190 java.lang.System.arraycopy
  91  0.04%  96.36%     2 300197 java.nio.ByteBuffer.hasArray
  92  0.04%  96.39%     2 300198 java.util.HashMap.put
  93  0.04%  96.43%     2 300199 java.util.HashMap.hash
  94  0.04%  96.47%     2 300210 java.util.HashMap.addEntry
  95  0.04%  96.51%     2 300217 java.util.HashMap.hash
  96  0.04%  96.54%     2 300220 java.nio.Buffer.position
  97  0.04%  96.58%     2 300222 org.apache.lucene.index.FieldInfos.read
  98  0.04%  96.62%     2 300235 java.util.Arrays.copyOf
  99  0.04%  96.66%     2 300243 java.lang.System.arraycopy
 100  0.04%  96.69%     2 300248 org.apache.lucene.index.FieldInfos.hasVectors
 101  0.04%  96.73%     2 300258 java.lang.Thread.currentThread
 102  0.04%  96.77%     2 300262 org.apache.lucene.util.StringHelper.intern
 103  0.04%  96.81%     2 300264 java.lang.Thread.isInterrupted
 104  0.04%  96.85%     2 300292 org.apache.lucene.index.TermInfosReader.<init>
 105  0.04%  96.88%     2 300297 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
 106  0.04%  96.92%     2 300309 sun.nio.ch.FileDispatcher.pread0
 107  0.04%  96.96%     2 300319 org.apache.lucene.store.IndexInput.readVLong
 108  0.04%  97.00%     2 300321 org.apache.lucene.store.IndexInput.readVLong
 109  0.04%  97.03%     2 300323 org.apache.lucene.store.BufferedIndexInput.refill
 110  0.04%  97.07%     2 300326 sun.nio.ch.NativeThread.current
 111  0.04%  97.11%     2 300332 org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores
 112  0.04%  97.15%     2 300334 org.apache.lucene.index.SegmentReader.get
 113  0.04%  97.19%     2 300341 java.util.HashMap.put
 114  0.04%  97.22%     2 300344 org.apache.lucene.util.SimpleStringInterner.intern
 115  0.04%  97.26%     2 300346 org.apache.lucene.store.IndexInput.readString
 116  0.02%  97.28%     1 300003 java.util.zip.ZipFile.open
 117  0.02%  97.30%     1 300014 java.util.jar.Attributes.putValue
 118  0.02%  97.32%     1 300019 sun.misc.URLClassPath.getLoader
 119  0.02%  97.34%     1 300023 sun.misc.URLClassPath$JarLoader.ensureOpen
 120  0.02%  97.36%     1 300027 sun.misc.URLClassPath$JarLoader.checkResource
 121  0.02%  97.37%     1 300029 sun.security.util.ManifestEntryVerifier.<init>
 122  0.02%  97.39%     1 300036 sun.net.www.URLConnection.<init>
 123  0.02%  97.41%     1 300039 java.io.FilePermission$1.run
 124  0.02%  97.43%     1 300060 java.util.Properties$LineReader.readLine
 125  0.02%  97.45%     1 300066 sun.security.jca.ProviderList.<clinit>
 126  0.02%  97.47%     1 300071 java.security.Provider.<init>
 127  0.02%  97.49%     1 300075 sun.security.jca.ProviderConfig.getLock
 128  0.02%  97.51%     1 300081 sun.security.provider.NativePRNG.initIO
 129  0.02%  97.53%     1 300086 java.lang.Character.toUpperCaseEx
 130  0.02%  97.54%     1 300088 java.util.HashMap.put
 131  0.02%  97.56%     1 300093 org.apache.lucene.store.FSDirectory.<clinit>
 132  0.02%  97.58%     1 300095 java.lang.ClassLoader.defineClass1
 133  0.02%  97.60%     1 300096 java.util.zip.Inflater.inflateBytes
 134  0.02%  97.62%     1 300097 sun.security.provider.MD5.implDigest
 135  0.02%  97.64%     1 300099 java.util.zip.Inflater.inflateBytes
 136  0.02%  97.66%     1 300103 java.lang.ClassLoader.defineClass1
 137  0.02%  97.68%     1 300104 java.util.Arrays.copyOf
 138  0.02%  97.70%     1 300105 java.lang.String.indexOf
 139  0.02%  97.71%     1 300106 java.util.zip.InflaterInputStream.<init>
 140  0.02%  97.73%     1 300109 java.util.zip.ZipFile.read
 141  0.02%  97.75%     1 300111 java.util.Arrays.copyOfRange
 142  0.02%  97.77%     1 300113 java.util.zip.Inflater.inflateBytes
 143  0.02%  97.79%     1 300115 java.util.Arrays.copyOf
 144  0.02%  97.81%     1 300116 java.lang.ClassLoader.findBootstrapClass
 145  0.02%  97.83%     1 300117 java.lang.String.lastIndexOf
 146  0.02%  97.85%     1 300122 sun.security.action.LoadLibraryAction.<init>
 147  0.02%  97.87%     1 300127 sun.nio.ch.FileChannelImpl.<init>
 148  0.02%  97.88%     1 300132 java.nio.DirectByteBuffer.<init>
 149  0.02%  97.90%     1 300136 java.util.Arrays.copyOfRange
 150  0.02%  97.92%     1 300141 java.lang.ref.SoftReference.get
 151  0.02%  97.94%     1 300143 java.util.zip.Inflater.inflateBytes
 152  0.02%  97.96%     1 300145 org.apache.lucene.index.SegmentInfos.read
 153  0.02%  97.98%     1 300146 java.util.zip.CRC32.update
 154  0.02%  98.00%     1 300147 java.nio.CharBuffer.hasArray
 155  0.02%  98.02%     1 300148 java.lang.ClassLoader.defineClass1
 156  0.02%  98.04%     1 300149 org.apache.lucene.index.DirectoryReader.<init>
 157  0.02%  98.05%     1 300150 java.lang.ClassLoader.defineClass1
 158  0.02%  98.07%     1 300151 java.lang.AbstractStringBuilder.<init>
 159  0.02%  98.09%     1 300154 java.lang.ClassLoader.defineClass1
 160  0.02%  98.11%     1 300157 org.apache.lucene.index.SegmentReader$CoreReaders.<init>
 161  0.02%  98.13%     1 300159 org.apache.lucene.util.StringHelper.<clinit>
 162  0.02%  98.15%     1 300161 org.apache.lucene.util.SimpleStringInterner.<init>
 163  0.02%  98.17%     1 300167 java.lang.ClassLoader.defineClass1
 164  0.02%  98.19%     1 300168 org.apache.lucene.index.SegmentReader$CoreReaders.<init>
 165  0.02%  98.21%     1 300170 org.apache.lucene.index.SegmentTermEnum.<init>
 166  0.02%  98.22%     1 300174 java.util.zip.Inflater.inflateBytes
 167  0.02%  98.24%     1 300177 java.security.AccessController.doPrivileged
 168  0.02%  98.26%     1 300181 java.util.zip.Inflater.inflateBytes
 169  0.02%  98.28%     1 300182 java.util.Arrays.copyOf
 170  0.02%  98.30%     1 300183 java.util.Arrays.copyOf
 171  0.02%  98.32%     1 300184 java.lang.ClassLoader.defineClass1
 172  0.02%  98.34%     1 300185 java.lang.ref.SoftReference.get
 173  0.02%  98.36%     1 300186 java.lang.String.replace
 174  0.02%  98.38%     1 300187 org.apache.lucene.index.SegmentReader.openNorms
 175  0.02%  98.39%     1 300188 java.nio.charset.CharsetDecoder.flush
 176  0.02%  98.41%     1 300192 sun.nio.cs.UTF_8$Decoder.decodeLoop
 177  0.02%  98.43%     1 300193 java.lang.StringCoding.decode
 178  0.02%  98.45%     1 300194 java.lang.System.arraycopy
 179  0.02%  98.47%     1 300195 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop
 180  0.02%  98.49%     1 300196 java.util.HashMap.hash
 181  0.02%  98.51%     1 300200 sun.nio.cs.UTF_8$Decoder.isMalformed2
 182  0.02%  98.53%     1 300204 java.lang.System.arraycopy
 183  0.02%  98.55%     1 300205 java.io.RandomAccessFile.open
 184  0.02%  98.56%     1 300206 org.apache.lucene.util.SimpleStringInterner.intern
 185  0.02%  98.58%     1 300208 java.nio.charset.CoderResult.isUnderflow
 186  0.02%  98.60%     1 300211 org.apache.lucene.util.SimpleStringInterner.intern
 187  0.02%  98.62%     1 300212 java.util.HashMap.addEntry
 188  0.02%  98.64%     1 300213 org.apache.lucene.store.IndexInput.readVInt
 189  0.02%  98.66%     1 300214 java.util.ArrayList.RangeCheck
 190  0.02%  98.68%     1 300218 org.apache.lucene.index.FieldInfos.read
 191  0.02%  98.70%     1 300219 java.lang.Thread.currentThread
 192  0.02%  98.72%     1 300221 java.util.HashMap.transfer
 193  0.02%  98.73%     1 300223 java.lang.System.arraycopy
 194  0.02%  98.75%     1 300226 org.apache.lucene.store.IndexInput.readString
 195  0.02%  98.77%     1 300227 org.apache.lucene.store.BufferedIndexInput.readBytes
 196  0.02%  98.79%     1 300228 java.util.ArrayList.size
 197  0.02%  98.81%     1 300229 java.lang.StringCoding.access$100
 198  0.02%  98.83%     1 300230 java.lang.Thread.currentThread
 199  0.02%  98.85%     1 300233 java.lang.Throwable.fillInStackTrace
 200  0.02%  98.87%     1 300236 java.lang.StringCoding.decode
 201  0.02%  98.89%     1 300244 sun.nio.ch.FileDispatcher.pread0
 202  0.02%  98.90%     1 300245 sun.nio.ch.NativeThread.current
 203  0.02%  98.92%     1 300251 java.util.HashMap.getEntry
 204  0.02%  98.94%     1 300259 java.nio.Bits.copyToByteArray
 205  0.02%  98.96%     1 300260 java.nio.DirectByteBuffer.get
 206  0.02%  98.98%     1 300263 java.lang.Thread.currentThread
 207  0.02%  99.00%     1 300265 sun.nio.ch.FileChannelImpl.read
 208  0.02%  99.02%     1 300266 java.nio.channels.spi.AbstractInterruptibleChannel.begin
 209  0.02%  99.04%     1 300268 sun.nio.ch.NativeThread.current
 210  0.02%  99.06%     1 300269 sun.nio.ch.FileChannelImpl.read
 211  0.02%  99.07%     1 300270 java.nio.Bits.copyToByteArray
 212  0.02%  99.09%     1 300271 java.lang.Thread.currentThread
 213  0.02%  99.11%     1 300272 java.lang.Object.clone
 214  0.02%  99.13%     1 300273 org.apache.lucene.index.TermInfosReader.<init>
 215  0.02%  99.15%     1 300274 org.apache.lucene.index.TermInfosReader.<init>
 216  0.02%  99.17%     1 300278 org.apache.lucene.index.SegmentTermEnum.next
 217  0.02%  99.19%     1 300279 java.nio.channels.spi.AbstractInterruptibleChannel.isOpen
 218  0.02%  99.21%     1 300280 java.lang.Thread.isInterrupted
 219  0.02%  99.23%     1 300281 java.lang.Thread.currentThread
 220  0.02%  99.24%     1 300284 java.lang.System.arraycopy
 221  0.02%  99.26%     1 300285 java.lang.System.arraycopy
 222  0.02%  99.28%     1 300287 java.lang.Thread.currentThread
 223  0.02%  99.30%     1 300288 java.nio.Bits.copyToByteArray
 224  0.02%  99.32%     1 300289 org.apache.lucene.store.BufferedIndexInput.readBytes
 225  0.02%  99.34%     1 300291 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
 226  0.02%  99.36%     1 300293 java.lang.Thread.isInterrupted
 227  0.02%  99.38%     1 300294 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
 228  0.02%  99.40%     1 300296 org.apache.lucene.store.BufferedIndexInput.readByte
 229  0.02%  99.41%     1 300298 sun.nio.ch.FileChannelImpl.read
 230  0.02%  99.43%     1 300301 java.lang.Thread.currentThread
 231  0.02%  99.45%     1 300302 sun.nio.ch.FileChannelImpl.ensureOpen
 232  0.02%  99.47%     1 300303 sun.nio.ch.FileDispatcher.pread0
 233  0.02%  99.49%     1 300305 sun.nio.ch.FileDispatcher.pread0
 234  0.02%  99.51%     1 300306 sun.nio.ch.NativeThread.current
 235  0.02%  99.53%     1 300307 org.apache.lucene.index.TermBuffer.toTerm
 236  0.02%  99.55%     1 300308 org.apache.lucene.store.BufferedIndexInput.refill
 237  0.02%  99.57%     1 300310 sun.nio.ch.FileChannelImpl.read
 238  0.02%  99.58%     1 300312 sun.nio.ch.NativeThread.current
 239  0.02%  99.60%     1 300313 sun.nio.ch.FileChannelImpl.read
 240  0.02%  99.62%     1 300314 sun.nio.ch.NativeThread.current
 241  0.02%  99.64%     1 300315 org.apache.lucene.store.BufferedIndexInput.refill
 242  0.02%  99.66%     1 300316 org.apache.lucene.store.BufferedIndexInput.refill
 243  0.02%  99.68%     1 300317 sun.nio.ch.FileChannelImpl.read
 244  0.02%  99.70%     1 300320 org.apache.lucene.index.TermBuffer.read
 245  0.02%  99.72%     1 300322 org.apache.lucene.store.BufferedIndexInput.refill
 246  0.02%  99.74%     1 300325 org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal
 247  0.02%  99.75%     1 300327 org.apache.lucene.store.IndexInput.readVInt
 248  0.02%  99.77%     1 300328 org.apache.lucene.store.BufferedIndexInput.refill
 249  0.02%  99.79%     1 300329 java.nio.Bits.copyToByteArray
 250  0.02%  99.81%     1 300330 org.apache.lucene.store.BufferedIndexInput.refill
 251  0.02%  99.83%     1 300331 sun.nio.ch.NativeThread.current
 252  0.02%  99.85%     1 300333 sun.misc.Unsafe.setMemory
 253  0.02%  99.87%     1 300336 org.apache.lucene.index.TermInfosReader.<init>
 254  0.02%  99.89%     1 300337 java.lang.Thread.currentThread
 255  0.02%  99.91%     1 300342 org.apache.lucene.index.FieldInfos.read
 256  0.02%  99.92%     1 300343 org.apache.lucene.store.IndexInput.readString
 257  0.02%  99.94%     1 300345 sun.nio.ch.IOUtil.read
 258  0.02%  99.96%     1 300347 java.nio.channels.spi.AbstractInterruptibleChannel.begin
 259  0.02%  99.98%     1 300348 org.apache.lucene.index.FieldInfos.addInternal
 260  0.02% 100.00%     1 300349 java.nio.channels.spi.AbstractInterruptibleChannel.begin
CPU SAMPLES END

On Nov 5, 2010, at 10:53 AM, Michael McCandless wrote:

> Hmm...
>
> So, I was going on this output from your CheckIndex:
>
>   test: field norms.........OK [296713 fields]
>
> But in fact I just looked and that number is bogus -- it's always
> equal to total number of fields, not number of fields with norms
> enabled.  I'll open an issue to fix this, but in the meantime can you
> apply this patch to your CheckIndex and run it again?
>
> Index: src/java/org/apache/lucene/index/CheckIndex.java
> ===================================================================
> --- src/java/org/apache/lucene/index/CheckIndex.java (revision 1031678)
> +++ src/java/org/apache/lucene/index/CheckIndex.java (working copy)
> @@ -570,8 +570,10 @@
>        }
>        final byte[] b = new byte[reader.maxDoc()];
>        for (final String fieldName : fieldNames) {
> -        reader.norms(fieldName, b, 0);
> -        ++status.totFields;
> +        if (reader.hasNorms(fieldName)) {
> +          reader.norms(fieldName, b, 0);
> +          ++status.totFields;
> +        }
>        }
>
>        msg("OK [" + status.totFields + " fields]");
>
> So if in fact you have already disabled norms then something else is
> the source of the sudden slowness.  Though, such a huge number of
> unique field names is not an area of Lucene that's very well tested...
> perhaps there's something silly somewhere.  Maybe you can try
> profiling just the init of your IndexReader?  (Eg, run java with
> -agentlib:hprof=cpu=samples,depth=16,interval=1).
>
> Yes, both Index.NOT_ANALYZED_NO_NORMS and Index.NO will disable norms
> as long as no document in the index ever had norms on (yes it does
> "infect" heh).
>
> Mike
>
> On Fri, Nov 5, 2010 at 1:37 PM, Mark Kristensson
> <mark.kristens...@smartsheet.com> wrote:
>> While most of our Lucene indexes are used for more traditional searching,
>> this index in particular is used more like a reporting repository. Thus, we
>> really do need to have that many fields indexed and they do need to be
>> broken out into separate fields. There may be another way to structure the
>> index to reduce the number of fields, but I'm hoping we can optimize the
>> current design and avoid (yet another) index redesign.
>>
>> I'll look into tweaking the merge policy, but I'm more interested in
>> disabling norms because scoring really doesn't matter for us. Basically, we
>> need nothing more than a binary answer from Lucene: either a record meets
>> the provided criteria (which can be a rather complex boolean query with many
>> subqueries) or it doesn't. If the record does match, then we get the IDs
>> from Lucene and run off to get the live data from our primary data store and
>> sort it (in Java) based upon criteria provided by the user, not by score.
>>
>> After our initial design mushroomed in size, we redesigned and now (I
>> thought) do not have norms on any of the fields in this index. So, I'm
>> wondering if there was something in the results from the CheckIndex that I
>> provided which indicates to you that we may have norms still enabled? I know
>> that if you have norms on any one document's field, then any other document
>> with that same field will get "infected" with norms as well.
>>
>> My understanding is that any field that uses the constants
>> Index.NOT_ANALYZED_NO_NORMS or Index.NO will not have norms on it,
>> regardless of whether or not the field is stored. Is that not correct?
>>
>> Thanks,
>> Mark
>>
>>
>> On Nov 4, 2010, at 2:56 AM, Michael McCandless wrote:
>>
>>> Likely what happened is you had a bunch of smaller segments, and then
>>> suddenly they got merged into that one big segment (_aiaz) in your
>>> index.
>>>
>>> The representation for norms in particular is not sparse, so this
>>> means the size of the norms file for a given segment will be
>>> number-of-unique-indexed-fields X number-of-documents.
>>>
>>> So this count grows quadratically on merge.
>>>
>>> Do these fields really need to be indexed?  If so, it'd be better to
>>> use a single field for all users for the indexable text if you can.
>>>
>>> Failing that, a simple workaround is to set the maxMergeMB/Docs on the
>>> merge policy; this'd prevent big segments from being produced.
>>> Disabling norms should also work around this, though that will affect
>>> hit scores...
>>>
>>> Mike
>>>
>>> On Wed, Nov 3, 2010 at 7:37 PM, Mark Kristensson
>>> <mark.kristens...@smartsheet.com> wrote:
>>>> Yes, we do have a large number of unique field names in that index,
>>>> because they are driven by user named fields in our application (with some
>>>> cleaning to remove illegal chars).
>>>>
>>>> This slowness problem has appeared very suddenly in the last couple of
>>>> weeks and the number of unique field names has not spiked in the last few
>>>> weeks. Have we crept over some threshold with our linear growth in the
>>>> number of unique field names? Perhaps there is a limit driven by the
>>>> amount of RAM in the machine that we are violating? Are there any
>>>> guidelines for the maximum number, or suggested number, of unique field
>>>> names in an index or segment? Any suggestions for potentially mitigating
>>>> the problem?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>
>>>> On Nov 3, 2010, at 2:02 PM, Michael McCandless wrote:
>>>>
>>>>> On Wed, Nov 3, 2010 at 4:27 PM, Mark Kristensson
>>>>> <mark.kristens...@smartsheet.com> wrote:
>>>>>>
>>>>>> I've run CheckIndex against the index and the results are below. The
>>>>>> net is that it's telling me nothing is wrong with the index.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>> I did not have any instrumentation around the opening of the
>>>>>> IndexSearcher (we don't use an IndexReader), just around the actual
>>>>>> query execution so I had to add some additional logging. What I found
>>>>>> surprised me: opening a search against this index takes the same 6 to 8
>>>>>> seconds that closing the indexWriter takes.
>>>>>
>>>>> IndexWriter opens a SegmentReader for each segment in the index, to
>>>>> apply deletions, so I think this is the source of the slowness.
>>>>>
>>>>> From the CheckIndex output, it looks like you have many (296,713)
>>>>> unique field names on that one large segment -- does that sound
>>>>> right?  I suspect such a very high field count is the source of the
>>>>> slowness...
>>>>>
>>>>> Mike
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
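
P.S. One note on reading the hprof summary near the top of this message: the same method shows up under several trace IDs (pread0 and SimpleStringInterner.intern, for example), so summing the count column per method may give a clearer picture than the per-trace ranking. Here is a minimal sketch of that, assuming each data row has the six columns shown in the summary (rank, self, accum, count, trace, method). HprofSummary and totalsByMethod are hypothetical names I made up for illustration; they are not part of hprof or Lucene.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: totals hprof CPU sample counts per method name. */
class HprofSummary {

    /**
     * Sums the "count" column per method.
     * Assumes rows of the form: rank self accum count trace method
     */
    static Map<String, Integer> totalsByMethod(List<String> rows) {
        // LinkedHashMap keeps methods in first-seen (i.e. rank) order
        Map<String, Integer> totals = new LinkedHashMap<String, Integer>();
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            if (cols.length != 6) {
                continue; // not a data row (e.g. the BEGIN/END marker lines)
            }
            int count;
            try {
                count = Integer.parseInt(cols[3]);
            } catch (NumberFormatException e) {
                continue; // the column header row ends up here
            }
            Integer previous = totals.get(cols[5]);
            totals.put(cols[5], previous == null ? count : previous + count);
        }
        return totals;
    }

    public static void main(String[] args) {
        // A few rows copied from the summary; in practice, read them from java.hprof.txt
        List<String> rows = Arrays.asList(
            "rank self accum count trace method",
            "1 80.40% 80.40% 4257 300165 java.lang.String.intern",
            "2 1.83% 82.23% 97 300189 sun.nio.ch.FileDispatcher.pread0",
            "4 0.72% 83.78% 38 300201 sun.nio.ch.FileDispatcher.pread0");
        for (Map.Entry<String, Integer> e : totalsByMethod(rows).entrySet()) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
    }
}
```

This only regroups the numbers hprof already reported; it says nothing about which callers are responsible for the String.intern samples, which is what the stack traces in the full java.hprof.txt would show.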