As you mentioned, currently only equi-joins are supported. But you could
pretty quickly adapt an existing join to do what you want.

https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/client/solrj/io/stream/LeftOuterJoinStream.java
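
A minimal sketch of the kind of adaptation discussed in this thread, kept
deliberately independent of the Solr classes: a left outer join whose match
condition is an arbitrary predicate rather than field equality. The class
name PrefixJoinSketch, the method leftOuterJoin, and the "left."/"right."
key prefixes are all hypothetical and are not part of Solr. It uses a plain
nested loop over in-memory lists, which is an assumption made for brevity
and not how a streaming join would read its inputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class PrefixJoinSketch {

    // Hypothetical predicate mirroring starts_with(right.path, left.path).
    // Note: a plain prefix test also matches "/a/b10" against "/a/b1";
    // append a path separator to the left value if that matters.
    static final BiPredicate<Map<String, String>, Map<String, String>> STARTS_WITH =
        (left, right) -> {
            String l = left.get("path");
            String r = right.get("path");
            return l != null && r != null && r.startsWith(l);
        };

    // Nested-loop left outer join: every left row is emitted at least once,
    // merged with each right row for which the predicate holds.
    static List<Map<String, String>> leftOuterJoin(
            List<Map<String, String>> left,
            List<Map<String, String>> right,
            BiPredicate<Map<String, String>, Map<String, String>> on) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            boolean matched = false;
            for (Map<String, String> r : right) {
                if (on.test(l, r)) {
                    Map<String, String> merged = new LinkedHashMap<>();
                    l.forEach((k, v) -> merged.put("left." + k, v));
                    r.forEach((k, v) -> merged.put("right." + k, v));
                    out.add(merged);
                    matched = true;
                }
            }
            if (!matched) {
                // No right-side match: keep the left row on its own.
                Map<String, String> alone = new LinkedHashMap<>();
                l.forEach((k, v) -> alone.put("left." + k, v));
                out.add(alone);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left = List.of(
            Map.of("path", "/categories/category1"),
            Map.of("path", "/categories/category2"));
        List<Map<String, String>> right = List.of(
            Map.of("path", "/categories/category1/sub-categories/sub-category-a"));
        leftOuterJoin(left, right, STARTS_WITH).forEach(System.out::println);
    }
}
```

The nested loop compares every left row with every right row, so it is
only practical for small inputs. An equality join over sorted inputs can
advance both sides in step, which a general predicate does not allow.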

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Dec 29, 2021 at 10:23 AM Eric Pugh <ep...@opensourceconnections.com>
wrote:

>
> https://github.com/epugh/playing-with-solr-streaming-expressions/tree/master/streaming_expressions/src/main/java/com/o19s/solr/streaming
> has an example of parsing JSONL formatted docs and an example of using
> atomic updates ;-)
>
>
> https://github.com/epugh/playing-with-solr-streaming-expressions/blob/interact_with_tika_server/streaming_expressions/src/main/java/com/o19s/solr/streaming/SpaCyStream.java
> is an example of interacting with SpaCy ;-)
>
>
>
> > On Dec 29, 2021, at 10:01 AM, Damiano Albani <damiano.alb...@gmail.com>
> > wrote:
> >
> > Hi Eric,
> >
> > Thanks for your feedback, I highly appreciate it.
> > I don't mind going the route of implementing something myself. I will
> > have a try.
> > By any chance, apart from looking at the official codebase, do you know
> > of any examples out there I could draw my inspiration from?
> >
> > Regards,
> >
> > On Wed, Dec 29, 2021 at 3:08 PM Eric Pugh <ep...@opensourceconnections.com>
> > wrote:
> >
> >> Damiano, I don’t really have a direct answer for you. However, one of
> >> the aspects of Streaming that I really like is that it’s relatively
> >> easy to create your own operators and add them to Solr. I find that I
> >> often just create my own operator to fill in the gap of what is
> >> available.
> >>
> >> I do think joining disparate datasets to make new datasets is one of the
> >> most interesting uses of Streaming, so would love to see what you cook
> >> up.
> >>
> >> Eric
> >>
> >>> On Dec 29, 2021, at 6:39 AM, Damiano Albani <damiano.alb...@gmail.com>
> >>> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm new to streaming expressions, so I'm trying to understand their
> >>> features and limitations.
> >>> In particular the so-called "stream operators" implementing join
> >>> operations, like "innerJoin", "leftOuterJoin", etc.
> >>>
> >>> I see that they support an "on" parameter, defining the *equality* check
> >>> to be performed.
> >>> But, coming from the SQL world, I'm used to being able to use a variety
> >>> of comparison operators in join predicates. That is, not only equality,
> >>> as in "equi-joins".
> >>>
> >>> Is there a reason why the current implementation of Solr supports
> >>> equi-joins only? Would it be technically possible (and desired) to
> >>> support other comparison operators with joins?
> >>> And maybe somehow allow the use of the available stream evaluators
> >>> <https://solr.apache.org/guide/8_11/stream-evaluator-reference.html>?
> >>>
> >>> To give the context of my question: I'm trying to join 2 sets of
> >>> documents with a hierarchical relationship.
> >>> My goal is to join them using a "path" field on one side and a
> >>> "descendent_path" field on the other side.
> >>> But it looks like only doc values are accessible (and not analyzed
> >>> ones) in streams, so I suppose I'd be left with a join criterion like
> >>> this pseudo-code:
> >>>
> >>>> on="starts_with(right.path, left.path)"
> >>>
> >>> Where, in this hypothetical example:
> >>>
> >>>> left.path="/categories/category1"
> >>>> right.path="/categories/category1/sub-categories/sub-category-a"
> >>>
> >>>
> >>> Or do I completely misunderstand how Solr (streams) work? ;-)
> >>> Thanks for your help!
> >>>
> >>> Regards,
> >>>
> >>> --
> >>> Damiano Albani
> >>
> >> _______________________
> >> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> >> http://www.opensourceconnections.com | My Free/Busy <
> >> http://tinyurl.com/eric-cal>
> >> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> >> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
> >>
> >> This e-mail and all contents, including attachments, is considered to be
> >> Company Confidential unless explicitly stated otherwise, regardless of
> >> whether attachments are marked as such.
> >>
> >>
> >
> > --
> > Damiano Albani
>
