Solr Data Import | inconsistent output

2023-02-27 Thread HariBabu kuruva
Hi All,

Our developers have initiated a data import, and it shows a different status
on each refresh: one time it says the data import is completed, and the next
time it says it is in progress. Screenshots below. Please help.

[image: image.png]

[image: image.png]


-- 

Thanks and Regards,
 Hari
Mobile:9790756568


Solr Connection Pool

2023-02-27 Thread Paul Ryder
Hi All

Solr 8.1.1 - Java 8 - 1 master 2 slaves - been running fine for months

Getting a Java IllegalStateException, "Connection Pool shutdown", on both Solr
slaves. The replication screens on both slaves were also showing "Invalid Master".

Restarted the slaves and everything is back to normal.

Could this be a system resource running out on the Solr master?

Thanks in advance


RE: Cores renamed

2023-02-27 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
This has happened yet again.

Does anyone yet have any input on the idea of using the Leader's collection 
name in Leader/Follower replication (or pre-Solr8.7 Master/Slave replication), 
rather than the core name?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C]  
Sent: Thursday, June 3, 2021 10:30 AM
To: users@solr.apache.org
Subject: RE: Cores renamed

As a potential solution, I was wondering about implementing Master/Slave 
replication using the collection name of the Master rather than the core name. 
My initial experiment with this in a test environment seemed to work. Does 
anyone have any input on the idea of using the Master's collection name in 
Master/Slave replication, rather than the core name?

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C]  
Sent: Wednesday, June 02, 2021 5:46 PM
To: users@solr.apache.org
Subject: RE: Cores renamed

It happened again this morning.

Attached is an excerpt from solr.log (with port #s & IP addresses redacted) and 
below is the current CLUSTERSTATUS (with port #s redacted)

Is there yet any explanation?

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"8000-7fff",
            "state":"active",
            "replicas":{
              "core_node8":{
                "core":"ipg_report_large_shard1_replica_n7",
                "base_url":"http://solrdbprod26.be-md:/solr",
                "node_name":"solrdbprod26.be-md:_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"},
              "core_node10":{
                "core":"ipg_report_large_shard1_replica_n9",
                "base_url":"http://solrdbprod25.be-md:/solr",
                "node_name":"solrdbprod25.be-md:_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":741,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:_solr",
      "solrdbprod25.be-md:_solr"]}}

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C]  
Sent: Monday, May 17, 2021 5:01 PM
To: users@solr.apache.org
Subject: RE: Cores renamed

The entire directory for the old core gets removed

Here is CLUSTERSTATUS (again with port numbers redacted). I ran CLUSTERSTATUS 
on both nodes, and the only difference was QTime (that is, there was no real 
difference):

{
  "responseHeader":{
    "status":0,
    "QTime":5},
  "cluster":{
    "collections":{
      "ipg_report_large":{
        "pullReplicas":"0",
        "replicationFactor":"1",
        "shards":{"shard1":{
            "range":"8000-7fff",
            "state":"active",
            "replicas":{
              "core_node4":{
                "core":"ipg_report_large_shard1_replica_n3",
                "base_url":"http://solrdbprod26.be-md:/solr",
                "node_name":"solrdbprod26.be-md:_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false"},
              "core_node6":{
                "core":"ipg_report_large_shard1_replica_n5",
                "base_url":"http://solrdbprod25.be-md:/solr",
                "node_name":"solrdbprod25.be-md:_solr",
                "state":"active",
                "type":"NRT",
                "force_set_state":"false",
                "leader":"true"}}}},
        "router":{"name":"compositeId"},
        "maxShardsPerNode":"1",
        "autoAddReplicas":"false",
        "nrtReplicas":"1",
        "tlogReplicas":"0",
        "znodeVersion":710,
        "configName":"ipg_report_large"}},
    "live_nodes":["solrdbprod26.be-md:_solr",
      "solrdbprod25.be-md:_solr"]}}
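
A minimal sketch of how two saved CLUSTERSTATUS responses, such as the two in
this thread, could be compared to spot cores whose names changed. It assumes
each response has been saved as well-formed JSON and parsed into a Python dict.
The function names are made up for illustration, and replicas are matched by
node name, which assumes at most one replica of a given shard per node.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str, str]  # (collection, shard, node_name)


def replica_cores(cluster_status: dict) -> Dict[Location, str]:
    """Map each (collection, shard, node_name) to the core name hosted there."""
    cores: Dict[Location, str] = {}
    collections = cluster_status["cluster"]["collections"]
    for coll_name, coll in collections.items():
        for shard_name, shard in coll["shards"].items():
            for replica in shard["replicas"].values():
                # Assumption: one replica of a shard per node, so this key is unique.
                key = (coll_name, shard_name, replica["node_name"])
                cores[key] = replica["core"]
    return cores


def renamed_cores(before: dict, after: dict) -> List[Tuple[Location, str, str]]:
    """List (location, old_core, new_core) wherever the core name differs."""
    old, new = replica_cores(before), replica_cores(after)
    return [
        (key, old[key], new[key])
        for key in sorted(old)
        if key in new and old[key] != new[key]
    ]
```

Run periodically against saved snapshots, a check like this would at least
narrow down when a core was dropped and recreated under a new name.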

-----Original Message-----
From: matthew sporleder  
Sent: Monday, May 17, 2021 4:34 PM
To: users@solr.apache.org
Subject: Re: Cores renamed

Can you verify all of your zkHost connection params across the entire
cluster, and share the replicationFactor, autoAddReplicas, etc for the
collection?

My theory is that you have two ZooKeeper configs conflicting as master
elections happen, causing new replicas to get created on the fly.

Also -- do these cores get deleted from the filesystem or left around?

On Mon, May 17, 2021 at 4:11 PM Oakley, Craig (NIH/NLM/NCBI) [C]
 wrote:
>
> > What does the core rename itself to? That would probably be the biggest
> > hint.
>
> At 4:01pm 1/14/21, Solr decided on its own to drop the core 
> ipg_report_large_shard1_replica_n1 and to create the core 
> ipg_report_large_shard1_replica_n7 in its place
>
> At 4:33am 1/16/21, Solr decided on its o

Re: Cores renamed

2023-02-27 Thread Shawn Heisey

On 2/27/23 09:03, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:

This has happened yet again.

Does anyone yet have any input on the idea of using the Leader's collection 
name in Leader/Follower replication (or pre-Solr8.7 Master/Slave replication), 
rather than the core name?


If you're in cloud mode, you should not try to configure or initiate 
replication.  SolrCloud takes over the replication handler and uses it 
for its own purposes.  SolrCloud handles replicating the index when you 
have multiple replicas and you do not need to do anything.


In recent Solr versions, I wouldn't even include the replication handler 
in solrconfig.xml ... it is implicitly defined and does not need to be 
there.


Thanks,
Shawn


Neural search in production

2023-02-27 Thread Bruno Osiek
Hi,

We intend to use Solr neural search in production. Is anyone aware of who
is using this functionality and can share how it is performing and, maybe,
a few lessons learned?

Thanks

-- 
[]s,
B.


Reversed leftOuterJoin on clause returns incorrect results

2023-02-27 Thread Geren White
Hello,

When testing out joins in solr streams we noticed that when the on clause
is reversed the results are incorrect and the join will return as if
everything matched.

For example, if you have streamA and streamB with the following tuples:

streamA:
{
  item_id_1: "123",
  item_id_2: "456"
}

streamB:
{
  item_id: "789",
  user_id: "0"
}

Executing a stream like below:
leftOuterJoin(
  search(collection-a, q=*:*, fq="item_id_1:123", fl="item_id_1,item_id_2", qt="/export", sort="item_id_2 desc"),
  search(collection-b, fq="user_id:0", q="*:*", qt="/export", fl="item_id,user_id", sort="item_id desc"),
  on="item_id=item_id_2")

This will return something like this where all tuples are joined even
though item_id doesn't match item_id_2:
{
  item_id_1: "123",
  item_id_2: "456",
  item_id: "789",
  user_id: "0"
}

Note that the first column in the on clause is from the second table.

Is this expected behavior? We're running solr 8.11.1 and noticed it
while setting up a new query. It's an easy fix to switch the on clause but
seems like it should throw an error or handle it properly. Happy to open up
a bug ticket if this isn't expected.

Thanks,
-- 
*Geren White | Senior Director, Engineering*
*(e)* ge...@1stdibs.com


Re: Number of Collections in a SolrCloud

2023-02-27 Thread Natarajan, Rajeswari
Hi Brian and everyone,

How many Solr nodes per SolrCloud do you have to support 1,000 collections, and
what are the replication factor and memory allotted?
The maximum number of collections in a SolrCloud has been discussed on this
mailing list several times. I am interested in the specifics, and I assume this
is in production.

Thanks,
Rajeswari


On 6/29/21, 7:39 AM, "Brian Lininger" <brian.linin...@veeva.com> wrote:


Hi Matt,
Solr instance == Solr JVM. 80-90M docs is the total count of docs across
all collections. We typically have several hundred collections per Solr
instance, as we have a multi-tenant service and we keep all data segregated
by tenant.
Brian
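
As a rough way to relate collection count, replication factor and node count,
here is a back-of-the-envelope sketch. The cap of 500 replicas per Solr
instance comes from Brian's earlier message quoted below; the shard count and
replication factor in the usage note are placeholder values, not numbers
reported by anyone in this thread, and the function names are made up for
illustration.

```python
import math


def total_replicas(collections: int, shards_per_collection: int,
                   replication_factor: int) -> int:
    """Total number of replicas (cores) the whole SolrCloud has to host."""
    return collections * shards_per_collection * replication_factor


def min_nodes(collections: int, shards_per_collection: int,
              replication_factor: int, max_replicas_per_node: int = 500) -> int:
    """Smallest node count that keeps every node at or below the replica cap,
    assuming replicas are spread evenly across nodes."""
    total = total_replicas(collections, shards_per_collection, replication_factor)
    return math.ceil(total / max_replicas_per_node)
```

For example, 1,000 single-shard collections with a replication factor of 2
(both assumed) give 2,000 replicas, so min_nodes(1000, 1, 2) works out to 4
nodes at the 500-replica cap. Memory and index size per node still need to be
sized separately.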






On Mon, Jun 28, 2021, 7:41 PM mtn search <search...@gmail.com> wrote:


> Thanks Brian! Valuable information!
>
> Followup question. When you say Solr instance, in each case do you mean
> SolrCloud instance? It seems so when you speak of replica count, however
> when you stated 80-90 million docs I wondered if you meant Solr collection.
>
> Matt
>
> On Mon, Jun 28, 2021 at 5:17 PM Brian Lininger  >
> wrote:
>
> > Hi Matt,
> > We're currently running Solr 6.6.6 using Solr Cloud. Depending on the
> > application and load, we've been able to stably run upwards of 1,000
> > collections without a problem in a single SolrCloud. We try to keep the
> > total replica count per Solr instance to less than 500, but have run
> > 600-700 replicas per Solr instance without issue if the user load is
> > light. Our Solr document sizes are pretty large, but we're able to
> handle
> > 80-90M docs per instance with 700-800G of total index size. 300B docs
> does
> > seem quite large, but if the size of your docs aren't huge and you've got
> > enough shards in your collection then I wouldn't be surprised if it
> worked
> > fine. The only thing we learned is that we had to change the number of
> > threads Solr uses for loading replicas because of our high numbers 8
> > threads would take forever upon startup (look at 'coreLoadThreads') . At
> > the very least, perf test out something on a similar scale of what you're
> > thinking and see how it scales.
> > Best of Luck,
> > Brian
> >
> > On Mon, Jun 28, 2021 at 12:50 PM mtn search wrote:
> >
> > > I am guessing the consideration of hitting the limit of the number of
> > > collections within a SolrCloud is not a common experience. I wanted to
> > > raise this question again if perhaps anyone has any lessons learned or
> > > things to consider. We are currently planning work to migrate 300
> > > billion plus docs on the master nodes of a legacy master/slave
> > > installation to SolrCloud. I figure that we will push the limits of a
> > > single SolrCloud instance.
> > >
> > > Thanks again,
> > > Matt
> > >
> > > On Fri, Jun 25, 2021 at 10:15 AM mtn search wrote:
> > >
> > > > Hello,
> > > >
> > > > I am interested to learn what others have experienced in terms of
> > > > hitting a limit for the number of collections supported by a SolrCloud
> > > > instance.
> > > >
> > > > Also, does anyone have any tips/questions for evaluating when to
> > > > create a new SolrCloud and begin adding new collections to it rather
> > > > than grow the original SolrCloud instance?
> > > >
> > > > I realize there are likely a number of characteristics of a SolrCloud
> > > > to evaluate. My guess is network resources will be the key factor. I
> > > > am thinking of a SolrCloud with a 5 or 7 node ZooKeeper ensemble, with
> > > > collections containing 10-30 million docs, small doc size, heavy
> > > > indexing, small query load.
> > > >
> > > > Thanks,
> > > > Matt
> > > >
> > >
> >
> >
> > --
> >
> >
> > *Brian Lininger*
> > Technical Architect, Infrastructure & Search
> > *Veeva Systems *
> > brian.linin...@veeva.com 
> >
>





Re: Reversed leftOuterJoin on clause returns incorrect results

2023-02-27 Thread t sornin
Your join key is reversed. It should be on="item_id_2=item_id", which returns
only the tuple from the left stream (the first stream parameter of
leftOuterJoin), since there is no match.
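
To make the expected result concrete, here is a small sketch of left outer
join semantics applied to the two example tuples. It is a plain nested-loop
model written for illustration, not how Solr's streaming join is implemented,
and it does not reproduce the reversed-clause behaviour reported in the
question. The function and variable names are made up for this example.

```python
from typing import Dict, List

Row = Dict[str, str]


def left_outer_join(left: List[Row], right: List[Row],
                    left_field: str, right_field: str) -> List[Row]:
    """Join on left[left_field] == right[right_field]; keep unmatched left rows."""
    joined: List[Row] = []
    for l_row in left:
        matches = [
            r_row for r_row in right
            if left_field in l_row and r_row.get(right_field) == l_row[left_field]
        ]
        if matches:
            # One output row per matching right row, with fields merged.
            joined.extend({**l_row, **r_row} for r_row in matches)
        else:
            # No match: emit the left row on its own.
            joined.append(dict(l_row))
    return joined


stream_a = [{"item_id_1": "123", "item_id_2": "456"}]
stream_b = [{"item_id": "789", "user_id": "0"}]

# Left field first, right field second, mirroring on="item_id_2=item_id".
result = left_outer_join(stream_a, stream_b, "item_id_2", "item_id")
print(result)  # [{'item_id_1': '123', 'item_id_2': '456'}]
```

Because "456" does not equal "789", only the left tuple comes back, without
item_id or user_id.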

Hope this helps.

Mathew

On Mon, Feb 27, 2023, 4:25 PM Geren White  wrote:

> Hello,
>
> When testing out joins in solr streams we noticed that when the on clause
> is reversed the results are incorrect and the join will return as if
> everything matched.
>
> For example, if you have streamA and streamB with the following tuples:
>
> streamA:
> {
>   item_id_1: "123",
>   item_id_2: "456"
> }
>
> streamB:
> {
>   item_id: "789",
>   user_id: "0"
> }
>
> Executing a stream like below:
> leftOuterJoin(
>   search(collection-a, q=*:*, fq="item_id_1:123", fl="item_id_1,item_id_2", qt="/export", sort="item_id_2 desc"),
>   search(collection-b, fq="user_id:0", q="*:*", qt="/export", fl="item_id,user_id", sort="item_id desc"),
>   on="item_id=item_id_2")
>
> This will return something like this where all tuples are joined even
> though item_id doesn't match item_id_2:
> {
>   item_id_1: "123",
>   item_id_2: "456",
>   item_id: "789",
>   user_id: "0"
> }
>
> Note that the first column in the on clause is from the second table.
>
> Is this expected behavior? We're running solr 8.11.1 and noticed it
> while setting up a new query. It's an easy fix to switch the on clause but
> seems like it should throw an error or handle it properly. Happy to open up
> a bug ticket if this isn't expected.
>
> Thanks,
> --
> *Geren White | Senior Director, Engineering*
> *(e)* ge...@1stdibs.com
>


Re: About Using Hadoop in SolrCloud

2023-02-27 Thread David Smiley
Yes; this was shocking to me at first because the implications are big and
it's almost a secret.  Ideally the ref guide would scream this loudly;
users today care *way* more about S3 than HDFS.  The "HDFS" Solr module
uses the HDFS client API which has a pluggable back-end, and thus you can
have it talk to S3.  You can search the user list for this; maybe JIRA.
I've briefly dabbled with it (got stuck with incompatible versions) but I
know others have done this (presumably at earlier versions than what I used
at the time).  It's a simple matter of adding the correct JAR files and
some trivial configuration.  The main problem is that such a home-brew
concoction of theoretically compatible things is on your shoulders to
debug/support.  Solr isn't testing its support for this; it will fail for
some versions as it did for me.  Maybe Solr *should* test/support this.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Feb 23, 2023 at 10:59 PM Zara Parst  wrote:

> David, you made a point. Is it true that we can keep indexes in S3? I mean
> the index in use, not the backup.
>
> On Fri, Feb 24, 2023 at 1:11 AM David Smiley  wrote:
>
> > I agree with Eric, but wish to add one point:  Separation of compute from
> > storage to get: better redundancy (HDFS or S3 will do it better, maybe
> > cheaper), better elasticity (since Solr nodes become stateless; easy to
> > add more nodes), better cost?  Sacrifice indexing performance and a bit of
> > query.  Admittedly I don't have real experience here but this is my
> > thinking.  The most annoying thing about Solr's HDFS support is that
> > SolrCloud's replication is quite redundant/wasteful with that at the
> > storage layer, thus adding cost inefficiency. There is potential for
> > improvements there.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Thu, Feb 23, 2023 at 7:45 AM Eric Pugh <ep...@opensourceconnections.com> wrote:
> >
> > > I am replying, but just to the users mailing list, as it’s not
> > > appropriate for dev@.
> > >
> > > I think the short answer is that if you are already super into the
> > > Hadoop ecosystem, then you already have strong reasons why, and you can
> > > answer all of your questions listed already ;-).  You then look at Solr
> > > on Hadoop as “hey, it works with what I am already doing” at my
> > > enterprise.
> > >
> > > If you aren’t already in the Hadoop ecosystem, then there isn’t any
> > > special Solr specific reason to go this way, and indeed many reasons
> > > NOT to.   Hadoop isn’t for the faint of heart….
> > >
> > > Not an answer per se….
> > >
> > > > On Feb 23, 2023, at 5:57 AM, Zara Parst wrote:
> > > >
> > > > Hi,
> > > >
> > > > I have read in many places about using Hadoop with SolrCloud, and I am
> > > > trying to find the reason to use Hadoop in place of a local file
> > > > system. Can someone briefly explain why to use Hadoop with SolrCloud
> > > > when Solr is just using Hadoop for indexing and storing logs? Is there
> > > > any compelling reason to do that?
> > > >
> > > > Does Hadoop have any advantage over the local file system with Solr,
> > > > since I can run in cloud mode storing the index in the local file
> > > > system and can still use shards and replicas? So my question is what
> > > > advantage Hadoop will give me: does Hadoop index faster, does Hadoop
> > > > take less space to store the index, is the distributed file system
> > > > better in Hadoop for things like sharding and replication, or does it
> > > > take backups automatically?
> > > >
> > > > Please do answer this question as much as possible,
> > >
> > > ___
> > > Eric Pugh | Founder & CEO | OpenSource Connections, LLC
> > >
> > >
> >
>


Re: Return field aliasing for id field does not work

2023-02-27 Thread Rajani Maski
Thank you, I appreciate it. I will follow the Jira issue for a solution or a
workaround, if any.
Can I create a Jira account myself?
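
Until SOLR-16681 is resolved, one possible client-side workaround is to
request the raw fields without aliases (for example fl=id,new_id) and rename
them after the response arrives. This is an assumption on my part, not
something confirmed in this thread, and the function name and sample values
below are made up for illustration.

```python
from typing import Dict


def rename_fields(doc: Dict[str, object], mapping: Dict[str, str]) -> Dict[str, object]:
    """Return a new dict whose keys are the output names in `mapping`.

    `mapping` is {output_name: source_field}, mirroring fl=output_name:source_field.
    Every value is read from the original doc, so swapping names such as
    old_id <- id and id <- new_id cannot overwrite a value before it is read.
    """
    return {out: doc[src] for out, src in mapping.items() if src in doc}


# Hypothetical document as it might come back for fl=id,new_id
doc = {"id": "A1", "new_id": "B2"}

# Equivalent of fl=old_id:id,id:new_id, applied on the client
print(rename_fields(doc, {"old_id": "id", "id": "new_id"}))
# {'old_id': 'A1', 'id': 'B2'}
```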


On Sat, Feb 25, 2023 at 7:36 AM Mikhail Khludnev  wrote:

> Hello, Rajani.
> I've got the point. I filed
> https://issues.apache.org/jira/browse/SOLR-16681
> Do you need a Jira account to join the discussion?
>
> On Fri, Feb 24, 2023 at 6:19 PM Rajani Maski wrote:
>
> > Hi Mikhail,
> >
> > The issue is when "id" is the name of the alias. Map the value of "id"
> > to "old_id" (fl=old_id:id) and then map "id" to "new_id" so the return
> > field list is "fl=old_id:id,id:new_id"
> >
> >
> > On Thu, Feb 23, 2023, 9:42 AM Mikhail Khludnev  wrote:
> >
> > > Hello, Rajani.
> > > I built and launched a fresh [main] branch. fl aliasing works there;
> > > here's the proof pic:
> > > https://pasteboard.co/KNFAyGPzBKFP.png
> > >
> > >
> > > On Thu, Feb 23, 2023 at 5:06 AM Rajani Maski wrote:
> > >
> > > > Hi,
> > > >
> > > > In Solr 9.x, aliasing the "id" field in the return field list with a
> > > > different field value does not work the same as in 7.x:
> > > > fl=unique_key:id,id:some_field returns an empty docs array in the
> > > > response. Any alternative?
> > > >
> > > > Thanks,
> > > > rajani
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > https://t.me/MUST_SEARCH
> > > A caveat: Cyrillic!
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>