Obviously, there's a lot we don't know about your system and your plans, but
the narrow view your email gives us suggests that you may misunderstand the
nature of Solr. Solr is a search index, and its primary function is to help
you FIND your data based on the text or other data (e.g. spatial data) it
contains, and to do calculations (relevancy ranking, counts, facets,
analytics, etc.) relating to the documents found. Storage of data is a
secondary mission for Solr.

If storage is your main mission, you might step back and ask whether Solr
is the right tool for the job. Furthermore, storing a "full" web page, for
most web pages since the late 1990s, involves storing several files (HTML,
CSS, JavaScript, images, etc.).
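
To make the "several files" point concrete, here is a rough sketch that lists
the files a page pulls in. The sample page and the AssetCollector class are
made up for illustration, and it only looks at a few common tags, so treat it
as a starting point rather than a complete crawler:

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect URLs referenced by <script src>, <img src> and <link href>."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.assets.append(attrs["href"])


# Hypothetical page, not taken from the thread.
page = (
    '<html><head><link rel="stylesheet" href="site.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="logo.png"></body></html>'
)

collector = AssetCollector()
collector.feed(page)
print(collector.assets)
```

Each of those URLs is a separate file you would have to fetch and keep if you
want the page as a browser would have rendered it.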

If your main goal is storage, databases (RDBMS or NoSQL) are usually better
destinations for the content, offering better features with respect to
transactions, data normalization, backups, etc.

If you store the content in a database, you can still index the stored
pages with Solr, and add a field that stores a database id (or several ids,
one for each file) for retrieval.
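
A minimal sketch of that arrangement, assuming SQLite as the database purely
for the sake of a self-contained example. The table name, the column names and
the Solr field names (db_id, url, title) are hypothetical and would need to
match your own schema:

```python
import json
import sqlite3

# In-memory database for illustration; a real setup would use a persistent store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, raw_html TEXT)"
)

raw_html = '<html lang="en"><title>test</title></html>'
cur = conn.execute(
    "INSERT INTO pages (url, raw_html) VALUES (?, ?)",
    ("http://example.com/", raw_html),
)
page_id = cur.lastrowid
conn.commit()

# The document for Solr carries the searchable fields plus the database id.
# Field names here are assumptions, not something from your schema.
solr_doc = {
    "id": f"page-{page_id}",
    "db_id": page_id,
    "url": "http://example.com/",
    "title": "test",
}
payload = json.dumps([solr_doc])  # JSON body you would send to Solr's update handler

# Later, a search hit gives you db_id, and the database returns the original page.
row = conn.execute(
    "SELECT raw_html FROM pages WHERE id = ?", (solr_doc["db_id"],)
).fetchone()
stored_html = row[0]
print(stored_html == raw_html)
```

Solr finds the page, and the database hands back the content exactly as it
was stored.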

-Gus
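
P.S. On the escaping itself, Thomas's point in the quoted message below can be
checked with a few lines of Python. This is only a sketch: the string is the
one from his example, and the variable names are mine.

```python
import json

# The value exactly as it appears in the raw JSON response (escaped for JSON).
raw_json_value = r'"<html lang=\"en\"><title>test<\/title><\/html>"'

# Any JSON decoder undoes the escaping and returns the string as it was indexed.
decoded = json.loads(raw_json_value)
print(decoded)
```

So the backslashes belong to the JSON transport format, and a client that
parses the response sees the original markup.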

On Mon, May 12, 2025 at 9:52 AM Thomas Corthals <tho...@klascement.net>
wrote:

> Can you include an example of the content you index and the result you're
> seeing?
>
>
> If you index this:
>
>
> <html lang="en"><title>test</title></html>
>
>
> And it looks like this in the raw result:
>
>
> "<html lang=\"en\"><title>test<\/title><\/html>"
>
>
> That's just the escaping that needs to be done for JSON. It's applied on
> the response data before sending it off, not stored like that in the index.
> Any tool that decodes that JSON result will be working on the string as it
> was indexed.
>
>
> If it's something else, please also include the relevant field definition
> from your Solr schema so we can see what's going on there.
>
>
> Thomas
>
> On Sun, May 11, 2025 at 8:59 PM anon anon <anonimoussech...@gmail.com> wrote:
>
> > Hello.
> >
> > I want to store a FULL web page, including tags, in its full original
> > content. It is for a cyber security tool.
> >
> > When visiting http://localhost:8984/solr/#/MYCOLLECTION/query?q=*:* , I
> > see escaping (that I do not wish). I still do not see escape when
> > browsing at http://localhost:8984/solr/MYCOLLECTION/select?q=* and then
> > when I curl with curl "http://localhost:8984/solr/MYCOLLECTION/select?q=*";
> > I still have the escape.
> >
> > How to store html in a non escaped result please?
> >
> > Best regards.
> >
>


-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
