I think the word “payload” is confusing me. The client is sending a JSON document. That JSON document has a “content” field which is string-valued and is escaped (stringified) JSON. Correct?
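Walter's reading of the payload can be checked outside Solr. The following is a minimal Node.js sketch, not Solr code. It uses a trimmed copy of the client's document: only `doc_id` and a shortened `content`, with values taken from the payload quoted later in this thread. Parsing the outer document leaves `content` as an ordinary string, and a second `JSON.parse` on that string yields the nested `Page` object.

```javascript
// Trimmed copy of the client's document. String.raw keeps the backslashes
// exactly as they appear on the wire.
const body = String.raw`{"doc_id":"2ff99d1a-a21b-4391-9c47-af2865acb753","content":"{\"Page\":{\"Id\":\"2ff99d1a-a21b-4391-9c47-af2865acb753\",\"Name\":\"Ronald McDonald House Idaho meals\",\"ContentType\":\"Blog\"}}"}`;

// First parse: the outer document. "content" comes back as a plain string
// whose characters happen to spell out another JSON document.
const outer = JSON.parse(body);
console.log(typeof outer.content); // prints: string

// Second parse: decode that string to reach the nested object.
const inner = JSON.parse(outer.content);
console.log(inner.Page.Name); // prints: Ronald McDonald House Idaho meals
```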
You want to parse that JSON and treat it as additional fields to index? So this content (fragment):

"content": "{\"Page\":{\"Id\":\"2ff99d1a-a21b-4391-9c47-af2865acb753\",\"Name\":\"Ronald McDonald House Idaho meals\",\"Url\":\"/blogs/st-lukes/news-and-community/2021/jan/ronald-mcdonald-house-idaho-meals\",\"Date\":\"2022-10-03T12:30:17.3388537\",\"ContentType\":\"Blog\",\"Body\":{\"Fields\":[{\"Name\":\"Heading Background Image\",\"Type\":\"Image\",\"Value\":\"\"},...

would add fields like Page, and Id and Name under that?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 28, 2022, at 1:31 PM, Matthew Castrigno <castr...@slhs.org> wrote:
>
> Hello Walter,
>
> Thank you for your reply. Yes, it is invalid JSON. However, it is "my"
> problem, unfortunately.
>
> I am looking for a way to filter the payload as a character string.
>
> The charFilter would be great; however, for that to work the payload would
> first have to be recognized as a valid field.
>
> Is there a way in SOLR to process the entire payload in this way, so I can
> turn it into proper JSON by filtering out the "\" characters?
>
> Thank you.
> From: Walter Underwood <wun...@wunderwood.org>
> Sent: Monday, November 28, 2022 2:23 PM
> To: users@solr.apache.org
> Subject: Re: Is there a way to run the entire payload of a request through a
> charFilter and not just the fields?
>
> That is invalid JSON. The client needs to fix it. I’m surprised it indexes at
> all. This should not be your problem.
>
> Paste that string into this: https://jsonlint.com
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Nov 28, 2022, at 12:57 PM, Matthew Castrigno <castr...@slhs.org> wrote:
> >
> > Hello Mikhail,
> >
> > I have to work with the payload as is; I cannot modify it. My entire
> > solution has a lot of other things going on, which would just confuse the
> > discussion.
> >
> > The issue I am having can be recreated using the update handler with the
> > script enabled (as shown in the documentation example) and json.command
> > set to false.
> >
> > Solr does not recognize a field containing the escape character "\".
> >
> > Here is a much smaller payload that demonstrates the issue:
> >
> > from the script:
> > doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
> > logger.warn(doc.toString());
> >
> > sending this payload:
> >
> > {"partner":"88027688-62c4-459a-b4d5-a8ecf9edd1bf","command":"add","doc_id":"2ff99d1a-a21b-4391-9c47-af2865acb753","content":"Page\"}
> >
> > results in this output in the console. Notice that the "content" field is
> > not listed: Solr cannot parse this part of the payload, so it simply
> > ignores it.
> >
> > SolrInputDocument(fields:
> > [partner=88027688-62c4-459a-b4d5-a8ecf9edd1bf, command=add,
> > doc_id=2ff99d1a-a21b-4391-9c47-af2865acb753])
> >
> > I am trying to find a way to filter out these escape characters so that
> > Solr, specifically org.apache.solr.common.SolrInputDocument, will recognize
> > the fields that contain them.
> >
> > Thank you.
> > ________________________________
> > From: Mikhail Khludnev <m...@apache.org>
> > Sent: Monday, November 28, 2022 1:07 PM
> > To: users@solr.apache.org
> > Subject: Re: Is there a way to run the entire payload of a request through
> > a charFilter and not just the fields?
> >
> > Hello,
> > It's still not clear. Which update request params (or curl command) do you
> > use? What if you put content as a tiny string, and then complicate it step
> > by step?
> >
> > On Mon, Nov 28, 2022 at 7:27 PM Matthew Castrigno <castr...@slhs.org> wrote:
> >
> >> Hi Mikhail,
> >>
> >> Thank you for your response. I am currently using the script update
> >> processor, but I have not been able to access the entire payload for
> >> processing. cmd.solrDoc is not correctly reading the payload. I have a
> >> payload where it is not recognizing a field value. This payload has four
> >> fields; the last one is "content", but if I do this:
> >> doc = cmd.solrDoc;
> >> logger.warn(doc.toString());
> >> The content field is not shown.
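Walter's point about the small payload, and why deleting escape characters is not a substitute for decoding them, can be seen by running both payloads through a strict JSON parser. This is a standalone Node.js sketch, not anything Solr executes. `smallPayload` is the small payload from the 12:57 PM message verbatim. `largePayload` is a cut-down stand-in for the large one. `isValidJson` is a hypothetical helper. Removing every backslash, as a character-level filter would, leaves in place the quotes that wrap the nested object, so the result still does not parse.

```javascript
// The small payload from the thread, verbatim. The backslash escapes the
// closing quote of "content", so that string is never terminated.
const smallPayload = String.raw`{"partner":"88027688-62c4-459a-b4d5-a8ecf9edd1bf","command":"add","doc_id":"2ff99d1a-a21b-4391-9c47-af2865acb753","content":"Page\"}`;

// Cut-down stand-in for the large payload: "content" is a properly
// terminated string that holds escaped JSON.
const largePayload = String.raw`{"doc_id":"2ff99d1a-a21b-4391-9c47-af2865acb753","content":"{\"Page\":{\"Name\":\"Ronald McDonald House Idaho meals\"}}"}`;

// Hypothetical helper: true if the text parses as strict JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// Delete every backslash, the way a character-level filter might.
const stripped = largePayload.replace(/\\/g, "");

console.log(isValidJson(smallPayload)); // prints: false
console.log(isValidJson(largePayload)); // prints: true
console.log(isValidJson(stripped));     // prints: false
```

Solr parses JSON with its own parser rather than `JSON.parse`, and the two may not agree on every malformed input. Treat this only as a check against strict JSON.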
> >>
> >> I want to filter that field and remove the quotes, so it is recognized as
> >> additional JSON for me to process.
> >>
> >> logger output:
> >> SolrInputDocument(fields:
> >> [partner=88027688-62c4-459a-b4d5-a8ecf9edd1bf, command=add,
> >> doc_id=2ff99d1a-a21b-4391-9c47-af2865acb753])
> >> Here is the payload:
> >> {
> >> "partner": "88027688-62c4-459a-b4d5-a8ecf9edd1bf",
> >> "command": "add",
> >> "doc_id": "2ff99d1a-a21b-4391-9c47-af2865acb753",
> >> "content":
> >> "{\"Page\":{\"Id\":\"2ff99d1a-a21b-4391-9c47-af2865acb753\",\"Name\":\"Ronald
> >> McDonald House Idaho
> >> meals\",\"Url\":\"/blogs/st-lukes/news-and-community/2021/jan/ronald-mcdonald-house-idaho-meals\",\"Date\":\"2022-10-03T12:30:17.3388537\",\"ContentType\":\"Blog\",\"Body\":{\"Fields\":[{\"Name\":\"Heading
> >> Background Image\",\"Type\":\"Image\",\"Value\":\"\"},{\"Name\":\"Tile Wide
> >> Image\",\"Type\":\"Image\",\"Value\":\"\"},{\"Name\":\"Specialties\",\"Type\":\"Treelist\",\"Value\":\"\"},{\"Name\":\"Blog
> >> Post Name\",\"Type\":\"Single-Line Text\",\"Value\":\"Ronald McDonald
> >> House, St. Luke’s Children’s find new ways to help
> >> families\"},{\"Name\":\"Blog Summary\",\"Type\":\"Rich
> >> Text\",\"Value\":\"\"},{\"Name\":\"Share Summary\",\"Type\":\"Multi-Line
> >> Text\",\"Value\":\"\"},{\"Name\":\"ShareTitle\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"Ronald McDonald House, St.
> >> Luke’s Children’s find new
> >> ways to help families\"},{\"Name\":\"Blog Post
> >> Date\",\"Type\":\"Datetime\",\"Value\":\"2021-01-18T10:10:00Z\"},{\"Name\":\"Heading\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"Better Together\"},{\"Name\":\"Rss
> >> Link\",\"Type\":\"General
> >> Link\",\"Value\":\"\"},{\"Name\":\"Providers\",\"Type\":\"Treelist\",\"Value\":\"\"},{\"Name\":\"Main
> >> Blog Image Caption\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"\"},{\"Name\":\"Procedures and
> >> Treatments\",\"Type\":\"Multilist\",\"Value\":\"\"},{\"Name\":\"Special
> >> Services\",\"Type\":\"Treelist\",\"Value\":\"\"},{\"Name\":\"Page
> >> Title\",\"Type\":\"Single-Line Text\",\"Value\":\"Ronald McDonald House,
> >> St. Luke’s Children’s finding new ways to help
> >> families\"},{\"Name\":\"ShareImage\",\"Type\":\"Image\",\"Value\":\"\"},{\"Name\":\"Share
> >> Image\",\"Type\":\"Image\",\"Value\":\"\"},{\"Name\":\"Channels\",\"Type\":\"Multilist\",\"Value\":\"Better
> >> Together\"},{\"Name\":\"Blog
> >> Tags\",\"Type\":\"Treelist\",\"Value\":\"\"},{\"Name\":\"TileHeadline\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"Ronald McDonald House, St. Luke’s Children’s find new
> >> ways to help families\"},{\"Name\":\"Include in
> >> Sitemap\",\"Type\":\"Checkbox\",\"Value\":\"1\"},{\"Name\":\"Facilities\",\"Type\":\"Treelist\",\"Value\":\"\"},{\"Name\":\"TileImage\",\"Type\":\"Image\",\"Value\":\"\"},{\"Name\":\"Tile
> >> Category\",\"Type\":\"Droptree\",\"Value\":\"Blog
> >> Post\"},{\"Name\":\"BreadcrumbTitle\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"Ronald McDonald House, St.
> >> Luke’s Children’s find new
> >> ways to help
> >> families\"},{\"Name\":\"Author\",\"Type\":\"Droplink\",\"Value\":\"{E9CF1FC9-EF41-4B6F-9D78-F206A5997A84}\"},{\"Name\":\"Restricted
> >> To Pages\",\"Type\":\"Treelist\",\"Value\":\"\"},{\"Name\":\"Meta
> >> Keywords\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"\"},{\"Name\":\"Associated Content
> >> Type\",\"Type\":\"Droptree\",\"Value\":\"Blog Post\"},{\"Name\":\"Main Blog
> >> Image\",\"Type\":\"Image\",\"Value\":\"\"},{\"Name\":\"TileSummary\",\"Type\":\"Rich
> >> Text\",\"Value\":\"\"},{\"Name\":\"Meta Description\",\"Type\":\"Multi-Line
> >> Text\",\"Value\":\"\"},{\"Name\":\"Health
> >> Topics\",\"Type\":\"Multilist\",\"Value\":\"\"},{\"Name\":\"Icon\",\"Type\":\"Image\",\"Value\":\"\"},{\"Name\":\"NavigationTitle\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"St. Luke’s Blogs\"},{\"Name\":\"Include in Search
> >> Index\",\"Type\":\"Checkbox\",\"Value\":\"1\"},{\"Name\":\"Conditions\",\"Type\":\"Treelist\",\"Value\":\"\"},{\"Name\":\"Heading
> >> Sub Text\",\"Type\":\"Single-Line Text\",\"Value\":\"Highlights from St.
> >> Luke’s and our community partners to improve
> >> health.\"},{\"Name\":\"typeaheadRollupCat\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"\"},{\"Name\":\"BlogPostYear\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"2021\"},{\"Name\":\"AuthorName\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"Anna
> >> Fritz\"},{\"Name\":\"BlogCategory\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"News and
> >> Community\"}],\"Modules\":{\"Fields\":[{\"Name\":\"Content\",\"Type\":\"Rich
> >> Text\",\"Value\":\"\"},{\"Name\":\"Image
> >> Position\",\"Type\":\"Droptree\",\"Value\":\"Right\"},{\"Name\":\"Image
> >> Source\",\"Type\":\"Image\",\"Value\":\"\"},{\"Name\":\"Image
> >> Content\",\"Type\":\"Rich Text\",\"Value\":\"<p>For more than three
> >> decades, Ronald McDonald House Charities of Idaho has provided housing to
> >> families with children seeking medical care.</p>\\n<p>It also has found new
> >> ways to help during the COVID-19 era. </p>\\n<p>When the novel coronavirus
> >> gained a foothold in Idaho in March 2020, the organization had to put
> >> safety first and made the tough decision to temporarily stop accepting new
> >> families not already staying at its new Boise house. Instead, the
> >> organization paid for hotel rooms for families it could not
> >> accommodate. </p>\\n<p>“When everything happened, because we had to
> >> pull back services, we were trying to look for other ways we could help
> >> families,” said Taylor Munson, communications manager at Ronald McDonald
> >> House Charities of Idaho. </p>\\n<p>“They are obviously already in a
> >> stressful situation with a sick child, but the pandemic amplified that
> >> because there is even more unknown now.”</p>\\n<p>So, how could their staff
> >> keep serving families with kids in need? </p>\\n<p>The team at RMHCI
> >> decided to start assembling lunch boxes filled with meals for families with
> >> kids at St. Luke’s Children’s Hospital.
> >> </p>\\n<p>Since March, the staff
> >> has provided 4,770 meals to families at St. Luke’s.</p>\\n<p>“The lunches
> >> provided by the Ronald McDonald House have been a true blessing for our
> >> families in pediatrics, the pediatric ICU and the newborn ICU,” said Sherry
> >> Iverson, director of patient and family services at St. Luke’s
> >> Children’s.</p>\\n<p>“Being at the bedside of their children of all ages is
> >> top priority for parents and remembering to take care of themselves is
> >> easily forgotten. These lunches carefully assembled by the Ronald McDonald
> >> team and then delivered to their room provide a break, healthy food and a
> >> chance to reenergize during a very stressful time.” </p>\\n<p>An additional
> >> 920 meals have been provided by RMHCI to families with children receiving
> >> care at Saint Alphonsus Health System. </p>\\n<p>The Ronald McDonald staff
> >> provides the lunch boxes four days a week, typically including sandwiches,
> >> fruit and chips, as well as snack bags. The total cost of the lunches so
> >> far has been about $24,000.</p>\\n<p>“Without these wonderful care
> >> packages, many parents would go all day without food,” Iverson said. “This
> >> partnership has been so important during this COVID
> >> pandemic.”</p>\\n<p>Some of the food items are donated from local
> >> organizations, while others are purchased by staff and assembled in the
> >> kitchen at the new Ronald McDonald House facility, near the St. Luke’s
> >> Boise Medical Center. </p>\\n<p>St. Luke’s employees pick up the meals and
> >> take them across the street to the children’s hospital.</p>\\n<p>“The
> >> feedback that we’ve gotten from families and nurses and people over at St.
> >> Luke’s is that it’s so helpful because families either may not have money
> >> to get food or they don’t want to leave their child’s bedside,” Munson
> >> said.</p>\\n<p>The Ronald McDonald House started accepting new families
> >> again at its facility in May 2020. St.
> >> Luke’s Children’s Hospital is the
> >> only children’s hospital in Idaho, which has led to a strong partnership
> >> between the medical center and nearby Ronald McDonald House. </p>\\n<p>“It
> >> has been very collaborative with St. Luke’s. We wanted to make sure what we
> >> were going to be doing was beneficial,” Munson said. </p>\\n<p>The program
> >> will continue through the end of March, marking one full year of providing
> >> meals, and then the Ronald McDonald House staff will reevaluate for short-
> >> and long-term plans, Munson said. </p>\\n<p>“The pandemic obviously isn’t
> >> ideal, but it did allow us find new ways of helping families,” Munson said.
> >> “A lot of our focus is family centered care—that’s really our goal, and
> >> feeding families is a big part of
> >> that.”</p>\\n<p><br>\\n</p>\"},{\"Name\":\"Channel\",\"Type\":\"Droptree\",\"Value\":\"Better
> >> Together\"},{\"Name\":\"Heading\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"Better Together\"},{\"Name\":\"Number to
> >> Display\",\"Type\":\"Integer\",\"Value\":\"4\"},{\"Name\":\"Related
> >> Item\",\"Type\":\"Droptree\",\"Value\":\"St Lukes Childrens
> >> Hospital\"},{\"Name\":\"Heading\",\"Type\":\"Single-Line
> >> Text\",\"Value\":\"Related
> >> Hospital\"}]}},\"Facets\":[\"Blogs\",\"Article\"],\"Title\":\"Ronald
> >> McDonald House, St. Luke’s Children’s find new ways to help
> >> families\",\"Summary\":\"\"}}"
> >> }
> >>
> >> ________________________________
> >> From: Mikhail Khludnev <m...@apache.org>
> >> Sent: Saturday, November 26, 2022 1:28 PM
> >> To: users@solr.apache.org
> >> Subject: Re: Is there a way to run the entire payload of a request through
> >> a charFilter and not just the fields?
> >>
> >> Hi Matthew.
> >> Can it be
> >> https://solr.apache.org/guide/solr/latest/configuration-guide/script-update-processor.html ?
> >>
> >> On Sat, Nov 26, 2022 at 1:15 AM Matthew Castrigno <castr...@slhs.org>
> >> wrote:
> >>
> >>> I need to filter out some characters in a payload so that SOLR will
> >>> recognize the payload as a JSON document.
> >>>
> >>> The solr.MappingCharFilterFactory functionality is what I need, but I need
> >>> it to run over the entire payload and not just the fields.
> >>>
> >>> I cannot change the payload prior to submitting to SOLR.
> >>>
> >>> Is there any way to accomplish this?
> >>>
> >>> Any insights are most appreciated.
> >>>
> >>> Thank you.
> >>>
> >>> ----------------------------------------------------------------------
> >>> "This message is intended for the use of the person or entity to which it
> >>> is addressed and may contain information that is confidential or
> >>> privileged, the disclosure of which is governed by applicable law. If the
> >>> reader of this message is not the intended recipient, you are hereby
> >>> notified that any dissemination, distribution, or copying of this
> >>> information is strictly prohibited. If you have received this message by
> >>> error, please notify us immediately and destroy the related message."
> >>>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
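If the goal is what Walter describes (decode `content` and index its pieces as fields of their own), a script update processor could do the decoding, but only once the field actually reaches the document. It cannot help with the small payload, where `content` never arrives.

The following is a sketch under stated assumptions. The helper `expandContentField` is hypothetical. It is written in ES5 style on the assumption that the JavaScript engine configured for the processor supports `JSON.parse`, and it calls only `getFieldValue` and `addField` on the `SolrInputDocument`. The dotted field names such as `Page.Id` and `Page.Name` are an illustrative choice, not something Solr prescribes. The schema would need to accept them, for example through dynamic fields or schemaless mode.

```javascript
// Hypothetical helper for a script update processor script. It reads a
// string-valued field (e.g. "content"), parses it as JSON, and adds every leaf
// value back to the same document under its dotted path, e.g. "Page.Id".
// ES5 style (var, function) is used for older engines; JSON.parse is assumed.
function expandContentField(doc, sourceField) {
  var raw = doc.getFieldValue(sourceField);
  if (raw === null || raw === undefined) {
    return 0; // the field never reached the document, so nothing to expand
  }
  var parsed = JSON.parse(String(raw)); // throws if the string is not valid JSON
  var added = 0;

  // Objects extend the path; arrays reuse it, so repeated addField calls make
  // the field multi-valued. Nulls, and a bare top-level scalar with no path,
  // are skipped.
  function walk(value, path) {
    if (Array.isArray(value)) {
      value.forEach(function (item) {
        walk(item, path);
      });
    } else if (value !== null && typeof value === "object") {
      Object.keys(value).forEach(function (key) {
        walk(value[key], path ? path + "." + key : key);
      });
    } else if (value !== null && path) {
      doc.addField(path, value);
      added++;
    }
  }

  walk(parsed, "");
  return added; // number of values added to the document
}

// Possible wiring in the processor's script (an untested assumption):
//   function processAdd(cmd) {
//     expandContentField(cmd.solrDoc, "content");
//   }
```

Flattening arrays of `{Name, Type, Value}` objects this way produces parallel multi-valued fields (`Page.Body.Fields.Name`, `Page.Body.Fields.Value`, and so on). Empty strings are kept on purpose so that those lists stay aligned by position.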