Re: Aligning Shards from different Collections on the same Solr server based on Date Range

2021-07-09 Thread Joel Bernstein
Could you solve this problem by adding all documents to the same collection
and performing self joins? You could add a field called rec_type to
differentiate between the record types.

There are two good reasons for wanting to do this.

1) This allows you to route by the join key and easily co-locate records.

2) There is an optimized self join, which is extremely fast, that you could
take advantage of if you did this.

Let me know if this might be an option for you, and we can discuss the
optimized self join in more detail.

Joel

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Jul 2, 2021 at 6:28 PM Matt Kuiper  wrote:

> After some research, it appears the following approach may help in this
> situation and remove the requirement to co-locate indexes for joins. One
> drawback may be the types of fields supported for the join field.
>
> https://solr.apache.org/guide/8_8/other-parsers.html#cross-collection-join
>
> Matt
>
> On Wed, Jun 30, 2021 at 11:59 AM Matt Kuiper  wrote:
>
> > Hi Solr Group,
> >
> > I am not sure the following is a viable use case; I welcome input and any
> > implementation recommendations.
> >
> > I would like to perform joins over two sharded collections, where docs
> > are routed to specific shards based on a date range, and the date ranges
> > are the same for the shards in each collection.
> >
> > I understand that this means the replicas from each collection that
> > hold data to be joined need to be co-located on the same Solr server. I
> > have read solutions that use ADD REPLICA to add a Collection B replica to
> > every Solr server, assuming Collection B has only one shard. For my use
> > case I need Collection B to have multiple shards.
> >
> > Collection A   Collection B   Solr server
> > Shard1_2020    Shard1_2020    172.33.0.1:8983_solr
> > Shard2_2021    Shard2_2021    172.33.0.2:8983_solr
> > Shard3_2022    Shard3_2022    172.33.0.3:8983_solr
> >
> > I think my question comes down to: how do I partition shards by date
> > range, in a way that both Collection A and Collection B are defined by
> > the same date ranges? If I could reliably partition shards by date, and
> > know the date range of each shard, I think I could use the ADD REPLICA
> > API to align them.
> >
> > I'm not sure a compositeId routing approach would work, but I think an
> > implicit routing approach may be hard to manage over time.
> >
> > Is an approach like this viable? I am a bit concerned about maintenance.
> > Are there other ideas to support this join?
> >
> > Note: I am considering this for time-series collections.
> >
> > Matt
> >
>
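Joel's suggestion of routing by the join key can be sketched with Solr's compositeId router, which hashes the part of a document id before the "!" to pick the shard, so documents sharing that prefix land on the same shard by default. This is a minimal sketch under assumptions: the id layout joinKey!recType-localId, the key order-42, and the class and method names are hypothetical, and the code only builds ids rather than talking to Solr.

```java
import java.util.List;

// Sketch only: builds compositeId-style document ids so that records of both
// types sharing a join key are routed to the same shard. The id layout and
// names are hypothetical; each document would also carry rec_type and a join
// key field for the join itself.
final class SelfJoinRouting {

    // Builds an id of the form "joinKey!recType-localId".
    static String routedId(String joinKey, String recType, String localId) {
        if (joinKey.contains("!")) {
            throw new IllegalArgumentException("join key must not contain '!'");
        }
        return joinKey + "!" + recType + "-" + localId;
    }

    // Returns the text before the first '!', the part the compositeId router
    // hashes to choose the shard (the whole id if there is no '!').
    static String routePrefix(String id) {
        int bang = id.indexOf('!');
        return bang < 0 ? id : id.substring(0, bang);
    }

    public static void main(String[] args) {
        List<String> ids = List.of(
                routedId("order-42", "A", "1"),
                routedId("order-42", "B", "7"));
        for (String id : ids) {
            // Both ids share the prefix "order-42", so they hash to the same shard.
            System.out.println(id + " -> route prefix " + routePrefix(id));
        }
    }
}
```

With this layout, no date-based shard alignment is needed for the join: co-location follows from the shared prefix.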


Re: Microsoft.sqlserver.jdbc.SQLServerException on SOLR after upgrading SQL Server from 2012 --> 2019

2021-07-09 Thread Dwane Hall
Hey Lulu,

While this is not a Solr-specific issue, I suspect your server is not configured
to accept connections over older TLS/SSL versions, but without seeing the
handshake that can only be an assumption. In this case, setting your client to
handshake over TLS 1.0 (i.e. your driver setting of sslProtocol=TLSv1) is not
going to fix your problem if the server is rejecting that protocol.

I think you have two options:

1. Download and use a newer version of the SQL Server JDBC driver and try
connecting over TLS 1.2. A similar issue is addressed here, including a
discussion of how older versions of the driver ignore the sslProtocol setting
(https://stackoverflow.com/questions/48464863/java-1-8-0-enable-tls1-2-in-jdbc-connection).
I assume you're using Java 1.8 on the older Solr 5.x instance.

2. Confirm which TLS/SSL versions your SQL Server instance is configured to
accept, and enable the older protocols if you need to (not recommended, as these
older versions are disabled for a reason, particularly if this is a
public-facing system).
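For option 1, a connection URL for the Microsoft JDBC driver that requests encryption and asks for TLS 1.2 might look like the sketch below; in a DataImportHandler setup it would go in the url attribute of the dataSource element in data-config.xml. Host, port and database name are placeholders, and sslProtocol is only honoured by driver versions that support it. The main method also prints the TLS versions the running JVM supports and enables by default, which can be compared with what the SQL Server instance accepts.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;

// Sketch only: host, port and database name are placeholders.
final class SqlServerTlsCheck {

    // Microsoft JDBC URL that requests encryption and asks for TLS 1.2.
    // Older driver versions may ignore sslProtocol (see the linked discussion).
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database
                + ";encrypt=true"
                + ";sslProtocol=TLSv1.2";
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(jdbcUrl("db.example.org", 1433, "mydb"));

        // TLS versions this JVM can negotiate, and those enabled by default;
        // compare them with what the SQL Server instance accepts.
        SSLContext ctx = SSLContext.getDefault();
        System.out.println("Supported: "
                + Arrays.toString(ctx.getSupportedSSLParameters().getProtocols()));
        System.out.println("Enabled by default: "
                + Arrays.toString(ctx.getDefaultSSLParameters().getProtocols()));
    }
}
```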

Thanks,

Dwane

Link to the JDBC driver:
https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-ver15



From: Paul, Lulu
Sent: Thursday, 8 July 2021 8:54 PM
To: solr-u...@lucene.apache.org ; users@solr.apache.org
Subject: Microsoft.sqlserver.jdbc.SQLServerException on SOLR after upgrading SQL Server from 2012 --> 2019

Hi SOLR team,

Our .NET project (currently running on Solr 5.2.1) recently upgraded its
database (from SQL Server 2012 to SQL Server 2019).
We made all the necessary changes to the app configs and Solr configs: we
changed data-config.xml, restarted the Solr instance, restarted the server, and
performed a full import from the Solr UI. However, the error below is flagged
in the Logging section of the Solr UI:

org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: select ID from

Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could 
not establish a secure connection to SQL Server by using Secure Sockets Layer 
(SSL) encryption. Error: "SQL Server did not return a response. The connection 
has been closed.".
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
    at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1412)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1058)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:833)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:716)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:841)
    at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:191)
    at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:171)
    at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:440)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:308)


Re: Aligning Shards from different Collections on the same Solr server based on Date Range

2021-07-09 Thread Matt Kuiper
Thanks Joel!

Investigating block joins and nested child docs is on my list.

https://solr.apache.org/guide/8_8/other-parsers.html#block-join-query-parsers

https://solr.apache.org/guide/8_8/indexing-nested-documents.html#indexing-nested-documents

However, it looks like you are not suggesting nested docs, but rather a
type field to differentiate between the types of docs, plus a join field.
Not having to build nested docs prior to updates would be an advantage. And
it makes sense that the join field would allow reliable routing to the
appropriate shard for both doc types.

I will take a further look to see if this approach will work, and get back
to you if I need more info on the optimized self join.

Thanks again,
Matt
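A single-collection self join of the kind Matt describes can be expressed with Solr's standard join query parser. This is a minimal sketch, not the optimized self join Joel mentions (the thread does not describe it). The field names join_key and status, the type values A and B, and the helper are hypothetical. In SolrCloud the standard join only sees documents on the same shard, which is why routing both record types by the join key matters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: field names (rec_type, join_key, status) and values are hypothetical.
final class SelfJoinQuery {

    // Returns request parameters selecting toType records whose join key matches
    // a fromType record satisfying fromCondition, all within one collection.
    static Map<String, String> selfJoinParams(String joinField, String fromType,
                                              String fromCondition, String toType) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "rec_type:" + toType);
        // {!join from=F to=T}inner: collect F values from docs matching inner,
        // then match docs whose T value is among them.
        params.put("fq", "{!join from=" + joinField + " to=" + joinField + "}"
                + "rec_type:" + fromType + " AND " + fromCondition);
        return params;
    }

    public static void main(String[] args) {
        selfJoinParams("join_key", "B", "status:open", "A")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```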


Re: Aligning Shards from different Collections on the same Solr server based on Date Range

2021-07-09 Thread Joel Bernstein
Block join is another option. If it works for you from an indexing
standpoint, it's the most performant query-time join.

If block indexing doesn't work for you, the optimized self join is
almost as fast.


Joel Bernstein
http://joelsolr.blogspot.com/
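For the block join option, documents have to be indexed together as parent/child blocks, and queries then use the parent and child query parsers. A sketch under assumptions: rec_type:A marks parent documents, rec_type:B marks children, and status is a hypothetical field. The which/of filter must match every parent document, and the child query passed to the parent parser should not match any parent.

```java
// Sketch only: assumes rec_type:A marks parent docs and rec_type:B marks
// child docs indexed in the same block; status is a hypothetical field.
final class BlockJoinQueries {

    // Parents whose children match childQuery. allParents must match every
    // parent document, and childQuery should not match any parent.
    static String parentsOf(String allParents, String childQuery) {
        return "{!parent which=\"" + allParents + "\"}" + childQuery;
    }

    // Children of the parents matching parentQuery.
    static String childrenOf(String allParents, String parentQuery) {
        return "{!child of=\"" + allParents + "\"}" + parentQuery;
    }

    public static void main(String[] args) {
        System.out.println(parentsOf("rec_type:A", "rec_type:B AND status:open"));
        System.out.println(childrenOf("rec_type:A", "rec_type:A AND status:open"));
    }
}
```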


Solr operator and pod disruption budget

2021-07-09 Thread Joel Bernstein
With the Solr operator I see there is an updateStrategy for managed
Solr upgrades, but I don't see anything mentioned about pod disruption
budgets. I saw that a Zookeeper pod disruption budget is installed with the
Zookeeper that deploys with the Solr operator, but I didn't see one for Solr.

Does the updateStrategy take the place of pod disruption budgets, or do
we need to specify one for the lower-level (OS, etc.) pod updates?


Joel Bernstein
http://joelsolr.blogspot.com/


Re: Solr operator and pod disruption budget

2021-07-09 Thread Houston Putman
Unfortunately, the podDisruptionBudget isn't very useful for Solr applications.

If we are merely talking about restarts of the Pod Spec (which could include
the OS that the Solr image uses), then the updateStrategy takes care of that
very well. If you are talking about upgrades of the Kubernetes nodes, then
the updateStrategy doesn't come into play at all.

The issue with a PodDisruptionBudget is that it treats all pods the same.
This works well for Zookeeper, where each pod contains the same data. For
Solr, however, we could have 12 Solr nodes hosting 3 shards, each shard
spread across 4 nodes. We might then want a podDisruptionBudget of 3, so
that we only take down one node of each shard-group. However, Kubernetes
doesn't know what our shard-groups are, so it could take down node(s) that
contain 3 pods of the same shard-group, completely defeating the point of
our podDisruptionBudget.
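The shard-group argument can be checked with a small counting model. This sketch assumes the layout from the example (12 Solr pods in 3 shard-groups of 4, with shard-group = pod index / 4) and compares a single budget of 3 with one budget of 1 per shard-group, as separate StatefulSets would allow. It models only the counting, not the Kubernetes eviction API.

```java
import java.util.Set;

// Sketch only: models the example layout of 12 Solr pods in 3 shard-groups of 4
// (pod index / 4 = shard-group). It counts disruptions; it does not call Kubernetes.
final class DisruptionBudgetSketch {

    static final int PODS_PER_GROUP = 4;
    static final int GROUPS = 3;

    static int groupOf(int pod) {
        return pod / PODS_PER_GROUP;
    }

    // One PodDisruptionBudget over all Solr pods: only the total count matters.
    static boolean singleBudgetAllows(Set<Integer> down, int maxUnavailable) {
        return down.size() <= maxUnavailable;
    }

    // One budget per shard-group (e.g. one StatefulSet each) with maxUnavailable = 1.
    static boolean perGroupBudgetsAllow(Set<Integer> down) {
        int[] perGroup = new int[GROUPS];
        for (int pod : down) {
            if (++perGroup[groupOf(pod)] > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Integer> sameGroup = Set.of(0, 1, 2); // three pods of shard-group 0
        Set<Integer> spread = Set.of(0, 4, 8);    // one pod from each shard-group
        // prints "same group: single=true, per-group=false"
        System.out.println("same group: single=" + singleBudgetAllows(sameGroup, 3)
                + ", per-group=" + perGroupBudgetsAllow(sameGroup));
        // prints "spread: single=true, per-group=true"
        System.out.println("spread: single=" + singleBudgetAllows(spread, 3)
                + ", per-group=" + perGroupBudgetsAllow(spread));
    }
}
```

The single budget accepts both disruptions, while per-group budgets reject the one that would take three replicas of the same shard offline.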

We could manually create shard-groups by using multiple StatefulSets, at
which point you could use a separate podDisruptionBudget for each. This is
probably a future option that the operator will enable, but it hasn't been
planned for yet.

In the meantime, you can create PodDisruptionBudget resources alongside
your SolrCloud object. In the end this is no different from what the
Zookeeper Operator does; you just have to manage it yourself.

- Houston


Re: Solr operator and pod disruption budget

2021-07-09 Thread Joel Bernstein
Thanks Houston, that answers my question perfectly.

Joel Bernstein
http://joelsolr.blogspot.com/


Re: Solr Phonetic Search funny

2021-07-09 Thread David Smiley
Using debug=all, you can see which field(s) the match was on. That
should give clues.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
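As a sketch of David's suggestion, the query can be re-run with debug=all; the explain section of the response then shows which field(s) and terms produced each match, and parsedquery shows how 'main' was analyzed. The base URL and collection name below are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: base URL and collection name are placeholders.
// URLEncoder.encode(String, Charset) requires Java 10 or later.
final class DebugQueryUrl {

    static String selectUrl(String baseUrl, String collection, String q) {
        return baseUrl + "/" + collection + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&debug=all";
    }

    public static void main(String[] args) {
        // Open this URL and inspect debug.explain (per-document match details)
        // and debug.parsedquery (how 'main' was analyzed).
        System.out.println(selectUrl("http://localhost:8983/solr", "mycollection", "main"));
    }
}
```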


On Wed, Jul 7, 2021 at 3:23 PM Flowerday, Matthew J <
matthew.flower...@gb.unisys.com> wrote:

> Hi There
>
> I have just noticed something odd with DoubleMetaphone phonetic searching:
> records with certain values in date fields are returned as phonetic
> matches.
>
> I was searching for the word ‘main’ and found results being returned
> with nothing highlighted by the unified highlighter. I tracked the
> issue down to these date fields holding these values:
>
> "statementDate_dtr":"2019-10-28T00:00:00Z",
> "statementDate_dt":"28/10/2019",
>
> Now, the word ‘main’ has a phonetic value of ‘MN’ (according to the Solr
> Admin Tool analysis feature). Searching for either ‘main’ or ‘mn’ in the
> Admin Tool returns a match on the record with these date values.
>
> The fields are configured as
>
>  stored="true"/>
>
> I suspect the issue is down to the _dtr field, as the _dt field is
> basically a string field.
>
> If I place the text 2019-10-28T00:00:00Z in a standard string field on
> another record, the phonetic search for ‘main’ does not match that record,
> which does seem to point to the issue being related to a date field.
>
> If I then update the record and change the date to, say,
>
> "takenDate_dtr":"2021-07-07T00:00:00Z",
> "takenDate_dt":"07/07/2021",
>
> then a search for ‘main’ does not find a phonetic match.
>
> I am using Solr 8.8.1 and Solr 8.9.0, and the issue appears in both
> versions.
>
> I was wondering if anyone has seen this before?
>
> Many Thanks
>
> Matthew
>
> *Matthew Flowerday* | Consultant | ULEAF
>
> Unisys | 01908 774830| matthew.flower...@unisys.com
>
> Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17
> 8LX
>
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is for use only by the intended recipient. If you received
> this in error, please contact the sender and delete the e-mail and its
> attachments from all devices.
>