[
https://issues.apache.org/jira/browse/SOLR-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064927#comment-16064927
]
Cassandra Targett commented on SOLR-10299:
------------------------------------------
Parsing the raw content is one approach that might be successful. Indexing the
generated HTML is another option. Seeing what happens with {{bin/post}} on the
HTML files would be another simple experiment to try. I'm not sure it would be
preferable, but it would reflect what end-users see. We don't do this yet, but
someday we will have raw content files that do not stand alone but are snippets
included inside another file that together become a single HTML page.
The harder questions IMO are going to be how to integrate it with the CMS,
keeping the index up to date, the facet options, the end-user UI, etc.
bq. One thing that might help in the short term could be enabling fuzzy search
mentioned on https://github.com/christian-fei/Simple-Jekyll-Search ? the
search.json file we have doesn't mention it and the docs doesn't specify
whether it is true or false by default
As I've mentioned a few times to the list(s), we're currently using
JavaScript to generate the title-keyword lookup that's in use now. That
doesn't come from Jekyll, but from an open-source Jekyll theme that I borrowed
for the basic layout of the pages. That JavaScript _can_ index the body when
it's generated, but the author of it notes in his documentation that it can
cause problems. I never had time to try it to see what these problems are so I
can't speak to it being a satisfactory stopgap - I'll guess, though, that the
problems are related to performance, relevance, and proper parsing of text
(only, you know, all the problems that we know plague inadequate attempts at
full-text search).
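To illustrate the gap between a title-keyword lookup and indexing the body, here is a minimal Python sketch. It is not the theme's JavaScript: the page data and both function names are invented for illustration, and the body match is a plain substring test with no tokenization or ranking, which is roughly the kind of shortcut that leads to the relevance and parsing problems guessed at above.

```python
# Hypothetical pages: titles, keywords, and body text are invented for illustration.
PAGES = [
    {"title": "Faceting", "keywords": "facet, facets",
     "body": "Use facet.field to count terms per field."},
    {"title": "Highlighting", "keywords": "highlight, snippets",
     "body": "Highlighting returns fragments of matching documents."},
]


def title_keyword_lookup(query, pages):
    """Match the query only against each page's title and keywords."""
    q = query.lower()
    return [p["title"] for p in pages
            if q in p["title"].lower() or q in p["keywords"].lower()]


def naive_body_lookup(query, pages):
    """Also match the body, using an unranked substring test."""
    q = query.lower()
    return [p["title"] for p in pages
            if q in p["title"].lower()
            or q in p["keywords"].lower()
            or q in p["body"].lower()]


# A word that appears only in a page body is invisible to the title-keyword lookup.
print(title_keyword_lookup("fragments", PAGES))  # []
print(naive_body_lookup("fragments", PAGES))     # ['Highlighting']
```

The body lookup finds the page, but every hit is weighted equally and results come back in page order, so it would not scale to a guide with hundreds of long pages.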
If you are interested, though, here are the docs for the keyword lookup that's
currently in place:
http://idratherbewriting.com/documentation-theme-jekyll/mydoc_search_configuration.html.
You will see immediately the similarities between that site and ours ;)
I saw the Simple-Jekyll-Search project early on, but I suspect it will also be
inadequate, for reasons similar to those that make the current JavaScript
solution inadequate. Since the theme I used already had a JavaScript-based
lookup, I didn't investigate another solution; other issues needed to be dealt
with first. Perhaps it's worth a look, I'm not sure.
By the way, the title-keyword lookup was 100% intended as *the* stopgap
solution. I knew it would be unsatisfactory, but I also know that despite all I
know of Solr, I cannot carry the majority of the weight to make this feature
happen.
> Provide search for online Ref Guide
> -----------------------------------
>
> Key: SOLR-10299
> URL: https://issues.apache.org/jira/browse/SOLR-10299
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: documentation
> Reporter: Cassandra Targett
>
> The POC to move the Ref Guide off Confluence did not address providing
> full-text search of the page content. Not because it's hard or impossible,
> but because there were plenty of other issues to work on.
> The current HTML page design provides a title index, but to replicate the
> current Confluence experience, the online version(s) need to provide a
> full-text search experience.