https://github.com/epugh/playing-with-solr-streaming-expressions/tree/master/streaming_expressions/src/main/java/com/o19s/solr/streaming has an example of parsing JSONL formatted docs and an example of using atomic updates ;-)
https://github.com/epugh/playing-with-solr-streaming-expressions/blob/interact_with_tika_server/streaming_expressions/src/main/java/com/o19s/solr/streaming/SpaCyStream.java is an example of interacting with SpaCy ;-)

> On Dec 29, 2021, at 10:01 AM, Damiano Albani <damiano.alb...@gmail.com> wrote:
>
> Hi Eric,
>
> Thanks for your feedback, I highly appreciate it.
> I don't mind going the route of implementing something myself. I will have
> a try.
> By any chance, apart from looking at the official codebase, do you know of
> any examples out there I could draw my inspiration from?
>
> Regards,
>
> On Wed, Dec 29, 2021 at 3:08 PM Eric Pugh <ep...@opensourceconnections.com>
> wrote:
>
>> Damiano, I don’t really have a direct answer for you. However, one of
>> the aspects of Streaming that I really like is that it’s relatively easy to
>> create your own operators and add them to Solr. I find that I often just
>> create my own operator to fill in the gap of what is available.
>>
>> I do think joining disparate datasets to make new datasets is one of the
>> most interesting uses of Streaming, so would love to see what you cook up.
>>
>> Eric
>>
>>> On Dec 29, 2021, at 6:39 AM, Damiano Albani <damiano.alb...@gmail.com>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I'm new to streaming expressions, so I'm trying to understand their
>>> features and limitations.
>>> In particular the so-called "stream operators" implementing join
>>> operations.
>>> Like "innerJoin", "leftOuterJoin", etc.
>>>
>>> I see that they support a "on" parameter, defining the *equality* check
>>> to be performed.
>>> But, coming from the SQL world, I'm used to being able to use a variety
>>> of comparison operators in join predicates. That is, not only equality,
>>> as in "equi-joins".
>>>
>>> Is there a reason why the current implementation of Solr supports
>>> equi-joins only?
>>> Would it be technically possible (and desired) to support
>>> other comparison operators with joins?
>>> And maybe somehow allow the use of the available stream evaluators
>>> <https://solr.apache.org/guide/8_11/stream-evaluator-reference.html>?
>>>
>>> To give the context of my question: I'm trying to join 2 sets of
>>> documents with a hierarchical relationship.
>>> My goal is to join them using a "path" field on one side and
>>> "descendent_path" field on the other side.
>>> But it looks like that only doc values are accessible (and not analyzed
>>> ones) in streams, so I suppose I'd be left with a join criteria like this
>>> pseudo-code:
>>>
>>>> on="starts_with(right.path, left.path)"
>>>
>>> Where, in this hypothetical example:
>>>
>>>> left.path="/categories/category1"
>>>> right.path="/categories/category1/sub-categories/sub-category-a"
>>>
>>>
>>> Or do I completely misunderstand how Solr (streams) work? ;-)
>>> Thanks for your help!
>>>
>>> Regards,
>>>
>>> --
>>> Damiano Albani
>>
>> _______________________
>> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
>> http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
>> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed
>> <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>>
>> This e-mail and all contents, including attachments, is considered to be
>> Company Confidential unless explicitly stated otherwise, regardless of
>> whether attachments are marked as such.
>>
>
> --
> Damiano Albani

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>

This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
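
For reference, the "starts_with(right.path, left.path)" predicate from the original question can be sketched in plain Java. This is only an illustration of the join logic under stated assumptions, not a Solr stream operator: it does not use Solr's streaming classes, the class and method names (PrefixJoinSketch, isUnder, join) are hypothetical, and rows are modelled as simple String maps with a "path" field. The "/" separator guard is an added assumption so that "/categories/category1" does not match "/categories/category10".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; all names here are hypothetical and
// nothing in this class touches Solr's streaming API.
class PrefixJoinSketch {

    // True when 'descendant' equals 'ancestor' or lies beneath it in the hierarchy.
    static boolean isUnder(String ancestor, String descendant) {
        if (ancestor == null || descendant == null) {
            return false;
        }
        if (descendant.equals(ancestor)) {
            return true;
        }
        // Assumption: "/" separates path segments, so compare against "ancestor/".
        String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
        return descendant.startsWith(prefix);
    }

    // Nested-loop join: emits one merged row per (left, right) pair whose
    // "path" fields satisfy the prefix predicate.
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (isUnder(l.get("path"), r.get("path"))) {
                    Map<String, String> joined = new LinkedHashMap<>();
                    // Prefix field names so both sides survive in the merged row.
                    l.forEach((k, v) -> joined.put("left." + k, v));
                    r.forEach((k, v) -> joined.put("right." + k, v));
                    out.add(joined);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left = List.of(
                Map.of("path", "/categories/category1"));
        List<Map<String, String>> right = List.of(
                Map.of("path", "/categories/category1/sub-categories/sub-category-a"),
                Map.of("path", "/categories/category10/other"));
        // Print every joined row.
        join(left, right).forEach(System.out::println);
    }
}
```

A nested loop compares every left row with every right row, which is fine for a sketch but quadratic in the input sizes. The built-in equi-joins can avoid that by merging streams sorted on the join key; a prefix predicate does not fit that scheme directly, so a custom operator would need its own strategy.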