I am using wget to download content from the web with the --save-headers option, which writes the HTTP response headers at the top of each downloaded file. Does Lucene make use of the content type while indexing, or do I have to parse the header myself, determine the content type, and then decide how to handle each file?
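To make the question concrete, here is a minimal sketch of the manual route in plain Java, with no Lucene calls. The class and method names are made up for illustration. It assumes each file begins with the header block that wget saved, followed by a blank line and then the body.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class SavedHeaderContentType {

    /**
     * Returns the media type (e.g. "text/html") found in the header block
     * at the start of the file, or null if no Content-Type header is present.
     */
    static String contentTypeOf(byte[] savedFile) {
        // ISO-8859-1 maps every byte to exactly one char, so decoding never fails
        // even when the body that follows the headers is binary.
        String text = new String(savedFile, StandardCharsets.ISO_8859_1);

        // The header block ends at the first blank line.
        int end = text.indexOf("\r\n\r\n");
        if (end < 0) {
            end = text.indexOf("\n\n");
        }
        if (end < 0) {
            return null; // no header/body separator found
        }

        for (String line : text.substring(0, end).split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // status line such as "HTTP/1.1 200 OK"
            }
            String name = line.substring(0, colon).trim();
            if (name.equalsIgnoreCase("Content-Type")) {
                String value = line.substring(colon + 1).trim();
                // Drop parameters such as "; charset=UTF-8".
                int semi = value.indexOf(';');
                if (semi >= 0) {
                    value = value.substring(0, semi).trim();
                }
                return value.toLowerCase(Locale.ROOT);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] sample = ("HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html; charset=UTF-8\r\n"
                + "\r\n"
                + "<html></html>").getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(contentTypeOf(sample));
    }
}
```

The returned value would then be used to pick a parser for the body (HTML, PDF, plain text and so on) before handing the extracted text to the indexer.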
Thanks!