50% completed...
I managed to map the pages and capture the content at the cut positions
properly. Now we need to navigate back and capture the topics and subtopics.
Ok...thanks...
2014-07-06 12:56 GMT-03:00 Erick Erickson :
> This isn't a Solr problem, but a PDF problem. The Tika
> projec
Hello list,
We have a fairly large Lucene database for a 30+ million post forum.
Users post and search for all kinds of things. To make sure users don't
have to type exact matches, we combine a WordDelimiterFilter with a
(Dutch) SnowballFilter.
Unfortunately users sometimes find examples of
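The combination described above can be sketched with Lucene's CustomAnalyzer builder. This is a hedged sketch, not the poster's actual configuration: the factory names ("wordDelimiterGraph", "snowballPorter") and parameters assume a recent Lucene with those factories on the classpath, and the filter values are illustrative.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class DutchAnalyzerSketch {
    public static Analyzer build() throws IOException {
        // Order matters: split on word delimiters first, then lowercase,
        // then stem, so that the stemmer sees the individual word parts.
        return CustomAnalyzer.builder()
                .withTokenizer("standard")
                .addTokenFilter("wordDelimiterGraph",
                        "generateWordParts", "1",
                        "catenateWords", "1")
                .addTokenFilter("lowercase")
                .addTokenFilter("snowballPorter", "language", "Dutch")
                .build();
    }
}
```

The same analyzer instance should be used at index and query time; mixing delimiter splitting with aggressive stemming is exactly where surprising matches tend to come from.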
I am trying to understand why I am seeing very small segment sizes during
indexing. I am using elasticsearch and one node sees heavy merge activity.
After enabling info stream logs it seems that the node is doing more, smaller
merges than the other nodes. In the TMP logs, I see a lot of merges o
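Frequent small merges are often a TieredMergePolicy floor effect: segments smaller than the floor size are all treated as floor-sized and merged aggressively. A hedged example of the relevant per-index settings (setting names are from the Elasticsearch merge module; the values are illustrative, not recommendations):

```json
{
  "index.merge.policy.floor_segment": "2mb",
  "index.merge.policy.segments_per_tier": 10,
  "index.merge.policy.max_merge_at_once": 10,
  "index.merge.policy.max_merged_segment": "5gb"
}
```

Comparing these settings, and the indexing/refresh rate, between the busy node and the quiet ones is a reasonable first step before digging further into the infostream output.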
This isn't a Solr problem, but a PDF problem. The Tika
project is what's used to extract the PDF info, including
a bunch of metadata.
Tika uses PDFBox, which at least allows you to
extract a page at a time and maybe much more (I just
barely looked at the interface)...
You can use Tika from a Java
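A sketch of the page-at-a-time extraction Erick mentions, using PDFBox directly rather than through Tika. This assumes the PDFBox 2.x API; the file name is a placeholder.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PageExtract {
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File("manual.pdf"))) {
            PDFTextStripper stripper = new PDFTextStripper();
            for (int page = 1; page <= doc.getNumberOfPages(); page++) {
                // Restrict extraction to a single page so each page can
                // be indexed as its own document if desired.
                stripper.setStartPage(page);
                stripper.setEndPage(page);
                String text = stripper.getText(doc);
                System.out.println("Page " + page + ": " + text.length() + " chars");
            }
        }
    }
}
```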
I'm building a new system where I will have several pdf files.
The content I will need to have in my indexes is:
1. Name
2. No. of Pages
3. Data File
4. Archive
When I run a search in the system, I will be typing full names that are
stored within the file in the index, then I need that syste
Hi Smitha,
You need your own custom analyzer which splits words on - or
_. Use the same analyzer for indexing and searching.
Regards
Aditya
www.findbestopensource.com
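In Lucene this is usually done with a WordDelimiterFilter or a MappingCharFilter inside a custom Analyzer. The plain-Java snippet below (no Lucene dependency; the class and method names are mine) only illustrates the token stream that both index-time and query-time analysis must agree on:

```java
public class DelimiterSplitDemo {
    // Illustrates the tokens a custom analyzer should emit for terms
    // containing '-' or '_'. Index-time and query-time analysis must
    // both produce this same split for matches to work.
    static String[] tokens(String term) {
        return term.toLowerCase().split("[-_]+");
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", tokens("error_code-404")));
        // prints: error,code,404
    }
}
```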
On 7/4/2014 11:41 AM, Smitha Kuldeep (smtt) wrote:
Hello team,
We are using lucene-core-2.9.1.jar for indexing and