On Fri, Sep 8, 2023 at 11:39 AM Dominique Devienne <ddevie...@gmail.com> wrote:
> On Thu, Sep 7, 2023 at 10:22 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Erik Wienhold <e...@ewie.name> writes:
>> > Looks like "Huge input lookup" as reported in [1] (also from Sai) and that
>> > error is from libxml.
>>
>> Ah, thanks for the pointer. It looks like for the DOCUMENT case,
>> we could maybe relax this restriction by passing the XML_PARSE_HUGE
>> option to xmlCtxtReadDoc(). However, there are things to worry about:
>
> Just a remark from the sidelines, from someone having done a fair bit of
> XML in years past.
>
> That XPath is simple, and a streaming parser (SAX or StAX) could handle it,
> while that XML_PARSE_HUGE option probably applies to a DOM parser. So is
> there a work-around to somehow force using a streaming parser instead of
> one that must produce the whole Document, just so a few elements are
> picked out of it? FWIW. --DD

If push comes to shove, the streaming-based extraction can be done outside
the DB, stored in a new column or table, and that indexed instead. This is
in fact exactly the approach I took on one XML-handling server I wrote. To
be honest, in my case the XMLs were never large, so I used rapidxml, which
is also a DOM parser, but the same principle applies: extract the data from
the XML outside the DB using SAX (push) or StAX (pull), to avoid having a
(too) large document in memory at any time (client- or server-side). --DD
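
For what it's worth, here is a minimal sketch of that streaming idea in
Python, using the standard library's xml.etree.ElementTree.iterparse (a
pull-style parser). The document structure (orders with totals) is made up
purely for illustration; the point is that elements are processed and
discarded as they stream by, so memory stays roughly constant regardless of
document size.

```python
# Streaming (pull-style) extraction of a few elements from a large XML
# document, without ever materializing the full DOM in memory.
import io
import xml.etree.ElementTree as ET

# Stand-in for a (potentially huge) XML file or network stream.
xml_data = b"""<orders>
  <order id="1"><total>10.50</total></order>
  <order id="2"><total>99.00</total></order>
</orders>"""

totals = []
# iterparse yields each element once its end tag has been read; clearing
# the element afterwards releases its subtree, bounding memory usage.
for event, elem in ET.iterparse(io.BytesIO(xml_data), events=("end",)):
    if elem.tag == "order":
        totals.append((elem.get("id"), elem.findtext("total")))
        elem.clear()

print(totals)  # [('1', '10.50'), ('2', '99.00')]
```

The extracted values could then be inserted into an ordinary indexed
column, sidestepping any in-database XML parsing limits entirely.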