https://github.com/epugh/playing-with-solr-streaming-expressions/tree/master/streaming_expressions/src/main/java/com/o19s/solr/streaming
 has an example of parsing JSONL-formatted docs and an example of using atomic
updates ;-)

https://github.com/epugh/playing-with-solr-streaming-expressions/blob/interact_with_tika_server/streaming_expressions/src/main/java/com/o19s/solr/streaming/SpaCyStream.java
 is an example of interacting with SpaCy ;-)
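
To make the "starts_with" join from the question quoted below a bit more concrete, here is a minimal sketch of just the predicate and a naive nested-loop join, in plain Java. It is not Solr's API: the class name PrefixJoinSketch, both method names, and the "right." prefix on merged fields are made up for illustration, and rows are plain Maps rather than Solr Tuples. A real operator would still need to be wrapped in a custom stream class like the ones in the repository above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Solr.
public class PrefixJoinSketch {

    // True when rightPath lies strictly below leftPath in the hierarchy.
    // The trailing "/" keeps "/categories/category10" from matching
    // "/categories/category1".
    public static boolean isDescendant(String leftPath, String rightPath) {
        String prefix = leftPath.endsWith("/") ? leftPath : leftPath + "/";
        return rightPath.startsWith(prefix);
    }

    // Naive nested-loop inner join: O(left * right) comparisons.
    // Right-hand fields are copied under a "right." prefix (an assumption)
    // so they do not overwrite left-hand fields with the same name.
    public static List<Map<String, String>> prefixJoin(
            List<Map<String, String>> left,
            List<Map<String, String>> right,
            String leftField,
            String rightField) {
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> l : left) {
            String leftPath = l.get(leftField);
            if (leftPath == null) {
                continue; // nothing to compare against
            }
            for (Map<String, String> r : right) {
                String rightPath = r.get(rightField);
                if (rightPath != null && isDescendant(leftPath, rightPath)) {
                    Map<String, String> row = new LinkedHashMap<>(l);
                    for (Map.Entry<String, String> e : r.entrySet()) {
                        row.put("right." + e.getKey(), e.getValue());
                    }
                    joined.add(row);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left = List.of(
                Map.of("id", "c1", "path", "/categories/category1"));
        List<Map<String, String>> right = List.of(
                Map.of("id", "s1", "descendent_path",
                        "/categories/category1/sub-categories/sub-category-a"),
                Map.of("id", "s2", "descendent_path",
                        "/categories/category10/sub-categories/sub-category-x"));
        for (Map<String, String> row : prefixJoin(left, right, "path", "descendent_path")) {
            System.out.println(row);
        }
    }
}
```

The nested loop is only there to keep the sketch short. As far as I know the built-in joins rely on both sides being sorted on the join key, so a prefix join over sorted paths could probably be done as a merge as well, but I have not tried that.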



> On Dec 29, 2021, at 10:01 AM, Damiano Albani <damiano.alb...@gmail.com> wrote:
> 
> Hi Eric,
> 
> Thanks for your feedback, I highly appreciate it.
> I don't mind going the route of implementing something myself. I will have
> a try.
> By any chance, apart from looking at the official codebase, do you know of
> any examples out there I could draw my inspiration from?
> 
> Regards,
> 
> On Wed, Dec 29, 2021 at 3:08 PM Eric Pugh <ep...@opensourceconnections.com> wrote:
> 
>> Damiano,  I don’t really have a direct answer for you.   However, one of
>> the aspects of Streaming that I really like is that it’s relatively easy to
>> create your own operators and add them to Solr.   I find that I often just
>> create my own operator to fill in the gap of what is available.
>> 
>> I do think joining disparate datasets to make new datasets is one of the
>> most interesting uses of Streaming, so would love to see what you cook up.
>> 
>> Eric
>> 
>>> On Dec 29, 2021, at 6:39 AM, Damiano Albani <damiano.alb...@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I'm new to streaming expressions, so I'm trying to understand their
>>> features and limitations.
>>> In particular the so-called "stream operators" implementing join
>>> operations, like "innerJoin", "leftOuterJoin", etc.
>>> 
>>> I see that they support an "on" parameter, defining the *equality* check
>>> to be performed.
>>> But, coming from the SQL world, I'm used to being able to use a variety
>>> of comparison operators in join predicates. That is, not only equality,
>>> as in "equi-joins".
>>> 
>>> Is there a reason why the current implementation of Solr supports
>>> equi-joins only? Would it be technically possible (and desired) to
>>> support other comparison operators with joins?
>>> And maybe somehow allow the use of the available stream evaluators
>>> <https://solr.apache.org/guide/8_11/stream-evaluator-reference.html>?
>>> 
>>> To give the context of my question: I'm trying to join 2 sets of
>>> documents with a hierarchical relationship.
>>> My goal is to join them using a "path" field on one side and a
>>> "descendent_path" field on the other side.
>>> But it looks like only doc values are accessible (and not analyzed
>>> ones) in streams, so I suppose I'd be left with a join criterion like this
>>> pseudo-code:
>>> 
>>>> on="starts_with(right.path, left.path)"
>>> 
>>> Where, in this hypothetical example:
>>> 
>>>> left.path="/categories/category1"
>>>> right.path="/categories/category1/sub-categories/sub-category-a"
>>> 
>>> 
>>> Or do I completely misunderstand how Solr (streams) work? ;-)
>>> Thanks for your help!
>>> 
>>> Regards,
>>> 
>>> --
>>> Damiano Albani
>> 
> 
> -- 
> Damiano Albani

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | 
My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed 
<https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
    
This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.
