Obviously, there's lots we don't know about your system and your plans, but from the narrow view your email gives us, it looks like you may misunderstand the nature of Solr. Solr is a search index, and its primary function is to help you FIND your data based on the text or other data (e.g. spatial data) it contains, and to do calculations (relevancy ranking, counts, facets, analytics, etc.) relating to the documents found. Storage of data is a secondary mission for Solr.
If storage is your main mission, you might step back and ask whether Solr is the right tool for the job. Furthermore, for most web pages since the late 1990s, storing a "full" web page involves storing several files (HTML, CSS, JavaScript, images, etc.). If your main goal is storage, databases (RDBMS or NoSQL) are usually better destinations for the content, offering better features with respect to transactions, data normalization, backups, etc. If you store the content in a database, you can still index the stored pages with Solr and add a field that stores a database id (or several ids, one for each file) for retrieval.

-Gus

On Mon, May 12, 2025 at 9:52 AM Thomas Corthals <tho...@klascement.net> wrote:

> Can you include an example of the content you index and the result you're
> seeing?
>
> If you index this:
>
> <html lang="en"><title>test</title></html>
>
> And it looks like this in the raw result:
>
> "<html lang=\"en\"><title>test<\/title><\/html>"
>
> That's just the escaping that needs to be done for JSON. It's applied on
> the response data before sending it off, not stored like that in the index.
> Any tool that decodes that JSON result will be working on the string as it
> was indexed.
>
> If it's something else, please also include the relevant field definition
> from your Solr schema so we can see what's going on there.
>
> Thomas
>
> On Sun, May 11, 2025 at 20:59, anon anon <anonimoussech...@gmail.com> wrote:
>
> > Hello.
> >
> > I want to store a FULL web page including tags in it full original content.
> > It is for a cyber security tool.
> >
> > Once visiting at http://localhost:8984/solr/#/MYCOLLECTION/query?q=*:* , I
> > see escaping (that I do not wish), I still do not see escape when browsing
> > at : http://localhost:8984/solr/MYCOLLECTION/select?q=* and then when I
> > curl with curl "http://localhost:8984/solr/MYCOLLECTION/select?q=*" I still
> > have the escape.
> >
> > How to store html in a non escaped result please?
> >
> > Best regards.

--
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
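To illustrate the JSON point from the quoted message: the backslashes belong to the JSON encoding of the response, so any JSON decoder should hand back the string as it was indexed. Below is a minimal Python sketch of that. The field name `content`, the helper `field_values`, and the sample response body are assumptions made up for the illustration; only the `response.docs` layout follows Solr's standard JSON response format.

```python
import json


def field_values(raw_response: str, field: str) -> list[str]:
    """Decode a Solr JSON response body and return one field from each document."""
    payload = json.loads(raw_response)
    return [doc[field] for doc in payload["response"]["docs"] if field in doc]


# Raw response text roughly as curl would print it. The escapes (\" and \/)
# are part of the JSON text, not of the indexed value.
# "content" is a hypothetical field name, not taken from the original thread.
raw_response = r'''
{"response": {"numFound": 1, "start": 0, "docs": [
  {"id": "1", "content": "<html lang=\"en\"><title>test<\/title><\/html>"}
]}}
'''

# The HTML as it was sent to the index (example from the quoted message).
original_html = '<html lang="en"><title>test</title></html>'

decoded = field_values(raw_response, "content")
print(decoded[0])
```

If the decoded value matches what was indexed, nothing needs to change on the Solr side; the tool consuming the response only has to parse the JSON rather than treat the body as plain text.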
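The database suggestion above could look roughly like the following sketch: the page is stored untouched in a database, and the Solr document carries the database id so a search hit can be mapped back to the original. SQLite stands in for whichever RDBMS or NoSQL store you would actually use, and the table name `pages`, the Solr field names `db_id` and `content`, and the helper functions are all hypothetical. The sketch only builds the JSON body for Solr's update handler and does not send it.

```python
import json
import sqlite3


def store_page(conn: sqlite3.Connection, url: str, html: str) -> int:
    """Store the raw page in the database and return its row id."""
    cur = conn.execute("INSERT INTO pages (url, html) VALUES (?, ?)", (url, html))
    conn.commit()
    return cur.lastrowid


def solr_document(page_id: int, url: str, html: str) -> dict:
    """Build the document to index; db_id and content are hypothetical field names."""
    return {"id": url, "db_id": page_id, "content": html}


def load_page(conn: sqlite3.Connection, page_id: int):
    """Fetch the original, unmodified page by the id found in a search hit."""
    row = conn.execute("SELECT html FROM pages WHERE id = ?", (page_id,)).fetchone()
    return row[0] if row else None


# In-memory database so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT NOT NULL, html TEXT NOT NULL)"
)

html = '<html lang="en"><title>test</title></html>'
page_id = store_page(conn, "http://example.com/", html)

# JSON body that could be POSTed to the collection's /update handler
# (assumes the schema defines the db_id and content fields).
doc = solr_document(page_id, "http://example.com/", html)
update_body = json.dumps([doc])

# Later: a search hit returns db_id, which is used to fetch the stored original.
restored = load_page(conn, doc["db_id"])
print(restored == html)
```

With this split, Solr only has to find the page; the byte-for-byte original always comes from the database, so how a search response is encoded no longer matters for retrieval.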