Steve, I used your idea and it works great for me; thanks once again. But when I use Index.NO_NORMS the index size increases, whereas when I use Index.TOKENIZED the size goes down.
I used the code you gave:

    BigInteger _bi = new java.math.BigInteger("9198408365809", 10);
    System.out.println(_bi.toString(36));

Other radixes increase the size. The modifications I made to my code are below:

    String outgoingNumber = "9198408365809";
    String incomingNumber = "9840861114";
    String dateSc = "070601";
    String imsiNumber = "444021365987";
    String callType = "1";
    String outgoingRoute = "DJZ01";
    String incomingRoute = "BSC01";

    // Re-encode each numeric value in base 36
    BigInteger _on = new java.math.BigInteger(outgoingNumber, 10);
    String compOutgoingNumber = _on.toString(36);
    BigInteger _in = new java.math.BigInteger(incomingNumber, 10);
    String compIncomingNumber = _in.toString(36);
    BigInteger _ds = new java.math.BigInteger(dateSc, 10);
    String compDateSc = _ds.toString(36);
    BigInteger _im = new java.math.BigInteger(imsiNumber, 10);
    String compImsiNumber = _im.toString(36);

    // Search field
    String contents = (compOutgoingNumber + " " + compIncomingNumber + " "
        + compDateSc + " " + compImsiNumber + " " + callType);
    // Display field
    String records = (compOutgoingNumber + " " + compIncomingNumber + " "
        + compDateSc + " " + outgoingRoute + " " + incomingRoute);

    File indexDir = new File("/home/Mediation/Index");
    IndexWriter indexWriter = new IndexWriter(indexDir, new StandardAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("contents", contents, Field.Store.NO, Field.Index.TOKENIZED));
    doc.add(new Field("records", records, Field.Store.YES, Field.Index.NO));
    indexWriter.addDocument(doc);

Please help me achieve that.

Sebastin wrote:
>
> Hi Steve,
> thanks a lot for your reply. It now compresses up to 50% of the original
> size. Is there any other possibility, using this code, to compress up to 80%?
>
> Steve Liles wrote:
>>
>> Compression aside, you could index the "contents" as terms in separate
>> fields instead of tokenized text, and disable storing of norms:
>>
>> String outgoingNumber="9198408365809";
>> String incomingNumber="9840861114";
>>
>> _doc.add(new Field("outgoingNumber", outgoingNumber, Store.NO,
>> Index.NO_NORMS));
>> _doc.add(new Field("incomingNumber", incomingNumber, Store.NO,
>> Index.NO_NORMS));
>>
>> According to the docs, "Index.NO_NORMS" will save you one byte per
>> indexed field per document in the index.
>>
>> Or you could index all of the data as separate terms in the same
>> "contents" field if you wanted (make the first param "contents" for all
>> of the terms), which is more comparable to what you are currently doing.
>> (Another advantage is that the Analyzer will not be used for fields
>> which are untokenized, and indexing should be faster.)
>>
>> ...
>>
>> One way to compress numerical data (possibly not the best - I'm no
>> expert) is to change the base of the number that is indexed / stored in
>> the index.
>>
>> java.lang.Long and java.math.BigInteger have methods for converting from
>> one radix to another. Taking your "outgoingNumber" as an example:
>>
>> //compression
>> BigInteger _bi = new java.math.BigInteger("9198408365809", 10);
>> System.out.println(_bi.toString(36));
>>
>> > 39douufap
>>
>> //decompression
>> BigInteger _bi = new java.math.BigInteger("39douufap", 36);
>> System.out.println(_bi.toString(10));
>>
>> > 9198408365809
>>
>> Converting to a higher radix will give you better compression but you'll
>> have to do it yourself, as the JDK classes only work up to base 36
>> <http://en.wikipedia.org/wiki/Base_36>.
>>
>> It's worth compressing your unstored "contents" field as well as your
>> stored "records" field, as the unique terms in the "contents" field will
>> effectively be stored.
>>
>> Also don't forget to convert the terms when you search too, otherwise
>> you won't find anything ;)
>>
>> Steve.
>>
>>
>> Sebastin wrote:
>>> When I use the StandardAnalyzer the storage size increases. How can I
>>> minimize the index store?
>>>
>>> Sebastin wrote:
>>>
>>>>
>>>> String outgoingNumber="9198408365809";
>>>> String incomingNumber="9840861114";
>>>> String dateSc="070601";
>>>> String imsiNumber="444021365987";
>>>> String callType="1";
>>>>
>>>> //Search Fields
>>>> String contents=(outgoingNumber+" "+incomingNumber+" "+dateSc+" "
>>>>     +imsiNumber+" "+callType);
>>>>
>>>> //Display Fields
>>>> String records=(callingPartyNumber+" "+calledPartyNumber+" "+dateSc+" "
>>>>     +chargDur+" "+incomingRoute+" "+outgoingRoute+" "+timeSc);
>>>>
>>>> IndexWriter indexWriter = new
>>>>     IndexWriter(indexDir,new StandardAnalyzer(),true);
>>>>
>>>> Document document = new Document();
>>>>
>>>> document.add(new
>>>>     Field("contents",contents,Field.Store.NO,Field.Index.TOKENIZED));
>>>>
>>>> document.add(new
>>>>     Field("records",records,Field.Store.YES,Field.Index.NO));
>>>>
>>>> indexWriter.setUseCompoundFile(true);
>>>> indexWriter.addDocument(document);
>>>> }
>>>>
>>>> Please help me achieve the minimum size.
>>>>
>>>>
>>>> Erick Erickson wrote:
>>>>
>>>>> Show us the code you use to index. Are you storing the fields?
>>>>> Omitting norms? Throwing out stop words?
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On 6/19/07, Sebastin <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi. Can anyone give me an idea for reducing the index size? Right now
>>>>>> I am getting 42% compression in my index store; I want to reach 70%.
>>>>>> I use StandardAnalyzer to write the documents. When I use
>>>>>> SimpleAnalyzer the reduction goes up to 58%, but I couldn't search
>>>>>> the documents. Please help me achieve this.
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> Jeff-188 wrote:
>>>>>>
>>>>>>>> I found that reducing my index from 8G to 4G (through not stemming)
>>>>>>>> gave me about a 10% performance improvement.
>>>>>>>
>>>>>>> How did you do this? I don't see this as an option.
>>>>>>>
>>>>>>> Jeff
>>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/ways-to-minimize-index-size--tf3401213.html#a11195406
>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>>> For additional commands, e-mail: [EMAIL PROTECTED]

--
View this message in context: http://www.nabble.com/ways-to-minimize-index-size--tf3401213.html#a11253761
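
The radix conversion discussed in this thread, and the "do it yourself" step for going past base 36, might look roughly like the sketch below. The class name RadixCodec, its method names, and the base-62 alphabet are assumptions made for illustration; they do not come from Lucene or the JDK. Only BigInteger's own constructor, toString(radix), and divideAndRemainder are standard calls.

```java
import java.math.BigInteger;

// Hypothetical helper (not part of Lucene or the JDK) for re-encoding
// decimal digit strings in a higher radix before indexing them.
public class RadixCodec {
    // Assumed base-62 alphabet; any 62 distinct characters would work.
    private static final String ALPHABET =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final BigInteger BASE = BigInteger.valueOf(ALPHABET.length());

    // Base 36 is the highest radix BigInteger supports directly.
    public static String encode36(String decimal) {
        return new BigInteger(decimal, 10).toString(36);
    }

    public static String decode36(String encoded) {
        return new BigInteger(encoded, 36).toString(10);
    }

    // Hand-rolled base 62: repeatedly divide by 62 and collect remainders.
    public static String encode62(String decimal) {
        BigInteger n = new BigInteger(decimal, 10);
        if (n.signum() < 0) {
            throw new IllegalArgumentException("negative values not supported");
        }
        if (n.signum() == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(BASE);
            sb.append(ALPHABET.charAt(qr[1].intValue()));
            n = qr[0];
        }
        // Remainders come out least-significant first, so reverse them.
        return sb.reverse().toString();
    }

    public static String decode62(String encoded) {
        BigInteger n = BigInteger.ZERO;
        for (int i = 0; i < encoded.length(); i++) {
            int digit = ALPHABET.indexOf(encoded.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-62 digit: " + encoded.charAt(i));
            }
            n = n.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        return n.toString(10);
    }

    public static void main(String[] args) {
        String number = "9198408365809";
        System.out.println(encode36(number));           // 39douufap
        System.out.println(encode62(number));
        System.out.println(decode62(encode62(number))); // 9198408365809
    }
}
```

Two caveats apply if something like this is used. Leading zeros are lost in the round trip, so a value such as "070601" decodes as "70601" unless the original width is restored separately. And a case-sensitive alphabet like base 62 only survives in fields that are not lowercased at index time; StandardAnalyzer lowercases tokens, so such terms would probably need to go into untokenized fields.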