Expanding child document matches with parent fields

2024-01-25 Thread Frederic Font Corbera
Hi everyone,

I'm one of the developers behind the Freesound website
(https://freesound.org, a sound sharing website). We use Solr as our search
engine, and I'm currently experimenting with a new feature that I'd like to
implement using Solr. In summary, we have a Solr index with one document
per sound in our database and we do standard search operations there.
However, I'd like to add child documents to each of the main documents,
containing specific information about the sound at different points in
time. For example, I have a main document with basic properties like sound
title and tags, plus N child documents that each have a timestamp field
and some extra information associated with that timestamp. Here is a
simplified example of a document that could be indexed (normally my child
documents would also include dense vector fields):

[
  {
    "ID": "1",
    "title": "Recording of a street ambience",
    "tags": ["urban", "ambience", "dogs", "birds"],
    "duration": "1:21",
    "events": [{
      "ID": "1/events#0",
      "timestamp": "0:23",
      "event_description": "Dog barking"
    },{
      "ID": "1/events#1",
      "timestamp": "0:47",
      "event_description": "Bird calls"
    },{
      "ID": "1/events#2",
      "timestamp": "1:05",
      "event_description": "Dog barking"
    },
    ...
    ]
  },
  ...
]

What I want to achieve is a query that matches child documents and sorts
them by some score, while faceting on parent document fields. For example,
I want to get all documents in which a "Dog barking" event happens (if a
document has 2 such events, as in the example, I want it returned 2 times),
sorted by the score of the child document, and I want to include faceting
data for, e.g., the "duration" field (which belongs to the parent document).

One solution would be to duplicate all the parent document fields in every
child document at index time. This would work, but then I would get a lot
of redundant information in the index.

What I think would work best would be a way to extend the child document
fields and include the fields of the parent at "query time". So I'd like to
specify the field list with something like
"fl=timestamp,event_description,__parent__.duration". Is that possible?

I tried other approaches that might work, such as the parent query parser,
which returns parent documents whose child documents match some criteria.
But this has the problem of not telling me which of the child documents
matched the query, and it also does not sort them as expected because the
score is not propagated to the parent document.

That is all, thanks a lot for the support!

Cheers,

frederic





--
Frederic Font - ffont.github.io
Music Technology Group, UPF - mtg.upf.edu 
Freesound - freesound.org


Re: Expanding child document matches with parent fields

2024-01-25 Thread Mikhail Khludnev
Hello Frederic,
It sounds like you need the blockParent domain change; see:
https://solr.apache.org/guide/solr/latest/query-guide/json-faceting-domain-changes.html#block-join-domain-changes
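
As a rough illustration of the blockParent suggestion, the sketch below builds a JSON Request API body that queries child documents and facets on a parent field. The function name, the facet label and the parent filter "*:* -_nest_path_:*" are assumptions for illustration only (the filter presumes a schema that defines _nest_path_; any query that matches all parent documents and nothing else should work in its place). After the domain change, the bucket counts should refer to parent documents rather than to matching children.

```python
import json


def build_child_query_with_parent_facet(child_query, parent_filter, parent_field, rows=10):
    """Build a JSON Request API body: match children, facet on a parent field.

    parent_filter is assumed to match every parent document and no children.
    """
    return {
        "query": child_query,
        "fields": ["ID", "timestamp", "event_description", "score"],
        "sort": "score desc",
        "limit": rows,
        "facet": {
            # Hypothetical facet label; any name can be used here.
            f"by_{parent_field}": {
                "type": "terms",
                "field": parent_field,
                # Switch the facet domain from matched children to their parents.
                "domain": {"blockParent": parent_filter},
            }
        },
    }


request_body = build_child_query_with_parent_facet(
    child_query='event_description:"Dog barking"',
    parent_filter="*:* -_nest_path_:*",  # assumption: schema uses _nest_path_
    parent_field="duration",
)
print(json.dumps(request_body, indent=2))
```

The resulting body would be sent as JSON to the collection's query endpoint; the endpoint and collection name are deployment-specific and left out here.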



-- 
Sincerely yours
Mikhail Khludnev


Connection refused in solr 9.4.0

2024-01-25 Thread Gummadi, Ramesh
I am trying to upgrade SolrCloud from version 8.11.1 to 9.4.0. The Solr
service started, but connections are being refused.
See the error below from solr.log. Any pointers?

ERROR (updateExecutor-8-thread-2-processing-hostname:8988_solr
rc2_addresses_shard1_replica_n1 rc2_addresses shard1 core_node3)
[rc2_addresses shard1 core_node3 rc2_addresses_shard1_replica_n1]
o.a.s.c.SyncStrategy
http://hostname:8988/solr/rc2_addresses_shard1_replica_n1/: Could not tell
a replica to recover => org.apache.solr.client.solrj.SolrServerException:
Server refused connection at: http://hostname:8988/solr
About to connect() to hostname port 8988 (#0)
*   Trying
* Connection refused
* Failed connect to hostname:8988; Connection refused
* Closing connection 0 (edited)

-- 
Ramesh



Re: Connection refused in solr 9.4.0

2024-01-25 Thread Jan Høydahl
This same question was asked yesterday on this list. The answer is to set
SOLR_JETTY_HOST=0.0.0.0 so that Solr binds to all network interfaces, not
only the local one.
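
To check whether a node is in this state, one option is to compare a connection to the loopback address with a connection to the node's hostname. The sketch below is only a diagnostic aid under that assumption; the function names and the result labels are made up for illustration, and the port 8988 in the usage note comes from the log in the original question.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def diagnose(hostname, port, probe=can_connect):
    """Classify the listener; the labels are hypothetical, for illustration."""
    if probe(hostname, port):
        return "reachable"
    if probe("127.0.0.1", port):
        # Listening, but only on the loopback interface.
        return "loopback-only"
    return "not-listening"
```

Run on the Solr host itself, diagnose("hostname", 8988) returning "loopback-only" would be consistent with the binding problem described above.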

Jan




Re: Optimizing Solr for low memory

2024-01-25 Thread Shawn Heisey

On 1/24/24 01:27, uyil...@vivaldi.net.INVALID wrote:

> Is there a general guideline to optimize Solr for a very small number of
> documents in the core and low memory? For example, let's say 2000 documents
> and 100MB of memory. It crashes often due to OOM errors with the default
> configuration.
>
> Are there places in the Solr config where we can look to make it need less
> heap when the document count is very low? This is just for regular indexing
> and regular searches by the way, nothing fancy like facets.


The default heap size Solr starts with out of the box is 512MB.  This is 
quite small.  It is enough to run Solr, but from what I have seen, as 
soon as you add data and start to actually use it for more than the most 
simple queries, you'll need to increase the heap.


It is not going to be possible to run Solr on a system with only 100MB 
of memory.


I would say that the absolute minimum system memory requirement for 
running Solr on a non-Windows operating system is going to be about 1GB, 
and 4GB would be a lot better.


One thing you can do to reduce heap requirements is disable all the 
caches - just delete or comment out the cache definitions in solrconfig.xml.
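
As a sketch of the cache removal described above, the snippet below strips three commonly defined cache elements from the <query> section of a solrconfig.xml string. The helper name and the sample config are assumptions for illustration; the element list may not match every config, and ElementTree drops XML comments when parsing, so editing the file by hand is the safer route for a real config.

```python
import xml.etree.ElementTree as ET

# Cache elements typically found under <query>; adjust to the actual config.
CACHE_TAGS = ("filterCache", "queryResultCache", "documentCache")

SAMPLE = """<config>
  <query>
    <filterCache size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache size="512" initialSize="512" autowarmCount="0"/>
    <documentCache size="512" initialSize="512" autowarmCount="0"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
  </query>
</config>"""


def strip_caches(solrconfig_xml):
    """Return (new_xml, removed_tags) with cache definitions removed."""
    root = ET.fromstring(solrconfig_xml)
    removed = []
    query = root.find("query")
    if query is not None:
        # Iterate over a copy because elements are removed while looping.
        for child in list(query):
            if child.tag in CACHE_TAGS:
                query.remove(child)
                removed.append(child.tag)
    return ET.tostring(root, encoding="unicode"), removed


new_xml, removed = strip_caches(SAMPLE)
print("removed:", removed)
```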


Enabling docValues on fields that you use for things other than 
searching (one example is sorting) can help.


Thanks,
Shawn



Re: Expanding child document matches with parent fields

2024-01-25 Thread Frederic Font Corbera
Hi Mikhail,

Thanks a lot for your quick response! I did not know about that, and it
seems to be exactly what I was looking for. I did some quick tests with the
JSON Facet API (previously I was using the non-JSON faceting method) and
it allows me to query child documents but facet on parent fields, just as
you described. This is perfect for me.

There is one extra issue that I did not mention in my previous email.
Similar to the faceting problem, which is now solved, I have a grouping
problem: I'd like to group child documents by a field of the parent.
Again, I could fix that by indexing the parent field with each child (and
because I only need one field it would not be too bad in this case). But
maybe there is a solution similar to the one for facets? I searched the
docs but could not find one.

Thanks a lot!!!


frederic







Re: Expanding child document matches with parent fields

2024-01-25 Thread Mikhail Khludnev
You are probably talking about searching parents and then expanding each
parent into its children via the child document transformer:
https://solr.apache.org/guide/solr/latest/query-guide/document-transformers.html#child-childdoctransformerfactory
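
One possible reading of this suggestion, sketched as request parameters: search for parents with the block join parent query parser, then attach the matching children with the [child] transformer. The function name, the parameter names parent_fq and child_q, and the example filters are assumptions for illustration, and the sketch presumes a schema with _nest_path_ so that the transformer needs no parentFilter. As far as I know, the parser's score local parameter (here score=max) asks Solr to derive each parent's score from its matching children, which may help with the sorting concern raised earlier in the thread.

```python
def build_parent_search_params(child_query, parent_filter, child_filter, child_limit=10):
    """Build query parameters: match parents by children, then return children.

    child_filter is embedded in double quotes, so this sketch rejects filters
    that contain double quotes instead of trying to escape them.
    """
    if '"' in child_filter:
        raise ValueError("child_filter must not contain double quotes in this sketch")
    return {
        # Local-param dereferencing keeps the queries free of nested quoting.
        "q": "{!parent which=$parent_fq score=max v=$child_q}",
        "parent_fq": parent_filter,
        "child_q": child_query,
        # Return each parent together with its children that pass child_filter.
        "fl": f'*,score,[child childFilter="{child_filter}" limit={child_limit}]',
    }


params = build_parent_search_params(
    child_query='event_description:"Dog barking"',
    parent_filter="*:* -_nest_path_:*",  # assumption: matches all parents
    child_filter="event_description:barking",  # hypothetical simple filter
)
for name, value in params.items():
    print(f"{name}={value}")
```

Note that this returns each parent once with its children nested, so it does not by itself group children by a parent field.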
