On Jun 23, 2005, at 10:17 AM, Ulrich Schinz wrote:
Field.Text(String, Reader) is not a stored field. This is why doc.get("contents") is empty.



ok, i read that in javadoc of lucene... in dont understand what Field.Text(String,Reader,boolean) does... if i set boolean to true,
what is the stortermvector??

Term vectors is additional index storage allowing you to find out how many of each term occurred in a field. For your purposes, you don't need to use that feature.

You have some options... change to using a stored field by reading the file contents into a String and using Field.Text(String, String) instead. Or, when rendering the results, go directly to the file pointed to by doc.get("filename") and read its contents then. There are pros/cons to both of these approaches.



ok, i started to try this... but i also try to index pdf-files.. so i get an InputStream from pdftotext. if i try to convert that to a String it takes really long time,
and we have a lot of data to index....
i tried different ways to get that done:
1.
String ret = "";
InputStream is=null;
String[] cmd = {"/usr/bin/pdftotext", "test.pdf", "-"};
byte[] buffer = new byte[80];
child = Runtime.getRuntime().exec(cmd);
is = child.getInputStream();
BufferedInputStream bis = new BufferedInputStream(is,80);
while(next != -1){
++t;
next = bis.read(buffer,bis.pos, 80);
String input = new String(buffer,0,next);
ret += input;
}

not really that way, but conceptual (in real it compiles :-) )
2.
String ret = "";
InputStream is=null;
String[] cmd = {"/usr/bin/pdftotext", "test.pdf", "-"};
byte[] buffer = new byte[80];
child = Runtime.getRuntime().exec(cmd);
is = child.getInputStream();
while(next != -1){
++t;
next = is.read();
ret += String(next);
}

but those versions are both really slow... it takes me more than 20 minutes (minimum) to get a pdf file of size 900 k...
is there a way to get that faster???

You should consider using PDFBox for reading PDF files.

    Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to