Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-23 Thread Shawn Heisey

On 10/20/2023 5:23 PM, David Filip wrote:

 From a matter of perspective, however, I think what I am not clear on is 
having more than one ZK “server”, and when and why I would need more than one?

Perhaps it is just terminology, but if I have three (3x) Solr instances (cores) 
running on three (3x) separate physical servers (different hardware), and I want to 
replicate shards between those three, do I have all three (3x) Solr instances 
(cores) talking to the same single (1x) ZooKeeper “server”?

Or if I have three (3x) Solr instances (cores) replicating shards between them, 
do I also need three (3x) ZooKeeper “servers”, e.g., server.1, server.2, 
server.3, each “server” assigned to one specific Solr instance (core)?


You need three ZK "servers" each running on different physical hardware 
so that ZK has fault tolerance.  This requirement of a three server 
minimum is inherent in ZK's design and cannot be changed.
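As an illustrative sketch (hostnames and ports are assumptions, not from the thread), a three-server ensemble is declared identically in each server's zoo.cfg:

```properties
# Hypothetical zoo.cfg fragment: every ensemble member lists all three
# servers. 2888 is the quorum port, 3888 the leader-election port.
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each machine additionally needs a myid file in its ZK data directory containing just its own number (1, 2, or 3).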


You need two Solr servers minimum so that Solr has fault tolerance.

You can run Solr on the same hardware as you run ZK, but it is STRONGLY 
recommended that ZK be a completely separate service from Solr, so that 
if you restart Solr, ZK does not go down, and vice versa.  For best 
performance, it is also recommended that ZK's data directory reside on a 
separate physical storage device from other processes like Solr, but if 
you have a small SolrCloud cluster and/or fast disks such as SSD, that 
may not be required.


ZK servers must all know about each other in order to maintain a 
coherent cluster.


Each Solr instance must know about all the ZK servers, which is why the 
zkhost string must list them all with an optional chroot.  Every Solr 
instance will maintain connections to all of the ZK servers simultaneously.
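For example, a solr.in.sh fragment along these lines (hostnames and the /solr chroot are illustrative) lists all three ZK servers in one zkHost string:

```shell
# Hypothetical solr.in.sh fragment: all ensemble members, comma-separated,
# with an optional chroot ("/solr") appended once at the very end.
ZK_HOST="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"
```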


As I noted before, a SolrCloud collection is composed of one or more 
shards.  Each shard is composed of one or more replicas, each of which 
is a Solr core.  One Solr instance can host many cores.  I would 
recommend NOT running multiple Solr instances on each machine.


Thanks,
Shawn



Re: SOLR 8.11.2 Cloud issue

2023-10-23 Thread Scott Q.
Oops, here we go:

https://imgur.com/a/jSVKgkZ

On Monday, 23/10/2023 at 02:24 Ishan Chattopadhyaya wrote:



Unfortunately, the mailing list doesn't allow images to be attached. I
think it is time to retire this mailing list and move to a modern
discussion forum. Thankfully, some community members are taking a lead
on that front.

On Mon, 23 Oct, 2023, 11:47 am Scott Q.,  wrote:

> Hmm...yes.
>
> Attaching image - this is when I access the ~cloud URL via direct link.
> But notice there's no Cloud menu on the left side...
>
>
> If I access the direct links
> On Sunday, 22/10/2023 at 21:38 Ishan Chattopadhyaya wrote:
>
> Are you sure you're running Solr in SolrCloud mode?
>
> On Sun, 22 Oct, 2023, 3:36 pm Scott Q.,  wrote:
>
> > I have SOLR 8.11.2 running with 5 nodes in Cloud
> > configuration and it works great. In fact it was an upgrade from an
> > earlier version that's been stable for a few years.
> >
> > Since upgrading to 8.11.2 the WEB UI no longer displays the Cloud menu
> > on the left. I can still access it via direct URL because I know the
> > links and I can see the graph & nodes. But the nodes for example don't
> > show any status such as uptime, used space, etc. I don't see any
> > errors in the logs.
> >
> > In the browser console I see it tries to access two URLs which always
> > fail and show up in red:
> >
> > http://192.168.1.10:8983/solr/admin/info/system?_=1697493975084&wt=json
> >
> > and this one:
> >
> > http://192.168.1.10:8983/solr/admin/info/system?_=1697493975084&nodes=192.168.1.10:8983_solr,192.168.1.11:8983_solr,192.168.1.13:8983_solr,192.168.1.4:8983_solr,192.168.1.9:8983_solr&wt=json
> >
> > with payload:
> >
> > * _: 1697493975084
> > * nodes: 192.168.1.10:8983_solr,192.168.1.11:8983_solr,192.168.1.13:8983_solr,192.168.1.4:8983_solr,192.168.1.9:8983_solr
> > * wt: json
> >
> >
> > Again, the system works fine but it's the Cloud menu that's wonky.
> >
> >
> > Is this a known bug or does anyone know what else might be
happening ?
> >
> >
> > Thanks!
> >
>
>


Re: what is SOLR syntax to remove duplicated documents

2023-10-23 Thread Thomas Corthals
Probably not very helpful for the original question, but for the sake of
completeness: you can use the Lucene documentID with the Luke Request
Handler.

https://solr.apache.org/guide/solr/latest/indexing-guide/luke-request-handler.html

You can not use it as a reliable identifier for your Solr document. Lucene
may change it during operations such as optimize or merge.

Thomas

On Sun, 22 Oct 2023 at 21:59, Mikhail Khludnev wrote:

> You can get internal unique identifier a kind of analogy to ROW_NUMBER via
> [docid] see
>
> https://solr.apache.org/guide/solr/latest/query-guide/document-transformers.html#docid-docidaugmenterfactory
> .
> But I'm not aware about a query accepting this number.
>


Paging a delete query?

2023-10-23 Thread Koen De Groote
I'm looking to delete a big set of documents, based on a list of ids I have.

Testing this in a test environment has caused the VM to be unresponsive for
several minutes, which isn't ideal.

Are there options to paginate a delete query?

Also, are there options to ask solr about a specific query it was asked to
process, if it's still handling it, and all that?

Regards,
Koen De Groote


Re: Paging a delete query?

2023-10-23 Thread Thomas Corthals
Hi Koen,

You'll have to implement that on the client side.

If you happen to use PHP, the Solarium PHP client has a plugin that does
just that:
https://solarium.readthedocs.io/en/stable/plugins/#buffereddelete-plugin

(Full disclosure: I wrote the plugin.)

Thomas
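For clients without such a plugin, the same client-side batching can be sketched in plain shell (file names, batch size, and the collection URL mentioned below are assumptions, not from the thread):

```shell
# Hypothetical sketch of client-side "paging" for a delete-by-id job:
# turn a file of ids (one per line) into small JSON delete payloads.
# Each emitted line can then be POSTed to /update separately, with a
# pause between batches, so no single request has to delete everything.
build_delete_batches() {
  ids_file=$1
  batch_size=$2
  split -l "$batch_size" "$ids_file" /tmp/idbatch.
  for f in /tmp/idbatch.*; do
    # Emit Solr's JSON delete-by-id form: {"delete": ["id1","id2",...]}
    printf '{"delete": [%s]}\n' "$(sed 's/.*/"&"/' "$f" | paste -sd, -)"
    rm -f "$f"
  done
}
```

Each payload would then go to something like `curl -H 'Content-Type: application/json' -d @- http://localhost:8983/solr/<collection>/update`, with one `commit=true` request after the last batch instead of committing per batch.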

On Mon, 23 Oct 2023 at 10:47, Koen De Groote wrote:

> I'm looking to delete a big set of documents, based on a list of ids I
> have.
>
> Testing this in a test environment has caused the VM to be unresponsive for
> several minutes, which isn't ideal.
>
> Are there options to paginate a delete query?
>
> Also, are there options to ask solr about a specific query it was asked to
> process, if it's still handling it, and all that?
>
> Regards,
> Koen De Groote
>


Performance issue in Wildcard Query in Solr 8.9.0

2023-10-23 Thread Vishal Patel
We are using Solr 8.9.0. We have configured SolrCloud with 2 shards, and each 
shard has one replica. We use 5 ZooKeeper nodes for the SolrCloud cluster.

We have created a collection named "documents", and the index size of one shard 
is 21GB. The schema fields are like this:

[schema field definitions stripped by the mailing list]

We want to search for data which contains "test", so we are making our query 
doc_ref:*test*. I think the wildcard query is taking high memory and CPU. Sometimes 
we have faced the issue that the collection goes into recovery mode due to usage of 
the wildcard query.
For better performance, we have implemented ReversedWildcardFilterFactory: 
https://risdenk.github.io/2018/10/25/apache-solr-leading-wildcard-queries-reversedwildcardfilterfactory.html

How can we search after applying ReversedWildcardFilterFactory? We are not 
getting benefits in terms of query execution time if we search in the same manner 
(doc_ref_rev:*test*).

Can you please suggest the best approach when we want to search for a wildcard 
string (*test*) and the index size is large?
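For reference, ReversedWildcardFilterFactory is normally configured only in the index-time analyzer of the queried field's own type; Solr's standard query parser then detects it and rewrites leading-wildcard queries on that field automatically, so a separate doc_ref_rev field is not queried directly. A sketch of such a fieldType (names and parameter values are illustrative, not from the thread):

```xml
<!-- Illustrative fieldType: the reversal filter appears only at index
     time; plain doc_ref:*test* queries then use the reversed tokens. -->
<fieldType name="text_rev" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note this mainly helps leading-wildcard queries (*test); a double-wildcard *test* still cannot use a simple prefix on either the original or the reversed tokens.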

Regards,

Vishal



Re: SOLR 9.3 - JVM Crashes

2023-10-23 Thread Ing. Andrea Vettori


> On 22 Oct 2023, at 22:28, rajani m  wrote:
> 
> Hi Andrea,
> 
>   Good to hear that.

Would be nice to understand why it’s happening however… :)

> What type of *cache* config do you have in the old
> configuration? I ask because Legacy Solr Cache from 8.x  does not work when
> solr is upgraded to 9.x. As per the docs,
> Legacy SolrCache implementations (LRUCache, LFUCache, FastLRUCache) have
> been removed. Users have to modify their existing configurations to use
> CaffeineCache instead, so I am curious to know what cache works if not
> CaffeineCache in 9.x?

The only ones that I can see in the admin UI are:

fieldCache: org.apache.solr.search.SolrFieldCacheBean
fieldValueCache: org.apache.solr.search.CaffeineCache


Thanks

— 
Ing. Andrea Vettori
Sistemi Informativi
B2BIres s.r.l.

Re: SOLR 9.3 - JVM Crashes

2023-10-23 Thread Ing. Andrea Vettori

> On 22 Oct 2023, at 23:31, Shawn Heisey  wrote:
> 
> On 10/21/2023 2:31 AM, Ing. Andrea Vettori wrote:
>> Hello, we’re using two SOLR servers (same hw, same version of solr and java, 
>> same solr config). The SOLR version is 9.3 and JVM is Adoptium JDK 17.0.8.1 
>> on Linux.
>> They both were running fine since a couple years (we upgraded from SOLR 8 to 
>> 9 with full reindexing some time ago).
> 
> DEB and RPM-based distros typically make it very easy to install OpenJDK out 
> of the box, there is no need to download something like Adoptium:

After some years using the Oracle JDK we started using AdoptOpenJDK, after trying a 
few JVM distributions (not sure that is the correct name), and then Adoptium. I never 
had any issue. And as far as I know it’s the same code base as OpenJDK…

>> Yesterday one of the server died with JVM crash with the following reason (I 
>> have the full JVM trace if needed).
>> Once restarted the server ran fine and received data updates every 15 
>> minutes, and responded to queries during the day.
>> Today the server died around the same time with the same JVM trace.
>> The time it died two times is early in the morning when we upload a lot of 
>> data. Then during the day the updates are less heavy in terms of size.
>> One strange thing is that only one of the server died, the other one is 
>> running fine and it’s receiving the same data.
>> Another thing to note is that in solrconfig we still had the “old” caches of 
>> SOLR 8 configured. Two days ago we changed the configuration to use 
>> CaffeineCache on one of the four cores (the biggest one). Not sure if it’s 
>> related but the time is suspicious… but why would it crash only on one of 
>> the servers since they’re both identical in configuration, version and 
>> hardware? Anyway I replaced solrconfig with the old configuration to see 
>> what happens tomorrow.
> 
> Sig11 crashes that are confined to one system usually indicate bad hardware.  
> It could be a bad DIMM, a bad motherboard, or a bad CPU ... in that order, 
> with the DIMM being the most likely problem.

This could be the correct reason, but then why, after restoring the old 
configuration, is it now working well on both servers, as it did in the previous 
two years? Today is the second morning it has not “failed”, so we have two days 
with crashes on the new config and two days OK with the old config. I guess we 
must leave it running for a few days to see what will happen :)

> 
> Solr 9.3 includes the workaround for the caffeine-related Java crash, and the 
> version of Java that you are running doesn't have that bug anyway.

I think the bug has been fixed only in Java >= 20?

Thanks
— 
Ing. Andrea Vettori
Sistemi Informativi
B2BIres s.r.l.

Re: Paging a delete query?

2023-10-23 Thread Koen De Groote
Hello Thomas,

Thanks for the reply. Using nodejs here, but thanks for the information, it
does help.

Kind regards,
Koen

On Mon, Oct 23, 2023 at 11:22 AM Thomas Corthals 
wrote:

> Hi Koen,
>
> You'll have to implement that on the client side.
>
> If you happen to use PHP, the Solarium PHP client has a plugin that does
> just that:
> https://solarium.readthedocs.io/en/stable/plugins/#buffereddelete-plugin
>
> (Full disclosure: I wrote the plugin.)
>
> Thomas
>
> On Mon, 23 Oct 2023 at 10:47, Koen De Groote wrote:
>
> > I'm looking to delete a big set of documents, based on a list of ids I
> > have.
> >
> > Testing this in a test environment has caused the VM to be unresponsive
> for
> > several minutes, which isn't ideal.
> >
> > Are there options to paginate a delete query?
> >
> > Also, are there options to ask solr about a specific query it was asked
> to
> > process, if it's still handling it, and all that?
> >
> > Regards,
> > Koen De Groote
> >
>


Solved/Working but Network Interface Question (was: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1))

2023-10-23 Thread David Filip
Thanks Shawn!

I have since created 3x physical servers (Virtual Machines running on a cloud 
server), each one running its own instance of both ZooKeeper and Solr, and 
every Zookeeper/Solr instance configured to know about the others.  Everything 
seems to be working as expected, and any configuration changes are replicated 
across all 3 Solr servers, and I can replicate shards across all 3 Solr servers.

What is even better is that I can make an update on any one of the 3 Solr 
servers, and search for those changes on the other two.  This is in contrast to 
the old model I found, where all updates were made on a master that replicated 
out to read-only slaves.  So a big improvement over that model (which I also 
got confused by after reading about it in older versions of the on-line 
documentation; even though I had installed Solr 9.2, most of the 
documentation I found was for <= 6).

I can take one node down, see it as ‘DEAD’, and the other two nodes continue to 
operate (both for updates and reading).  When I bring that node back online, 
within a few seconds, it is automatically updated.  So overall, I think I’m 
good now!

My confusion was partly in trying to figure out how to use the ZooKeeper that 
came bundled with each Solr instance … much easier just to ignore it and 
install ZooKeeper separately … and understanding the one-to-one relationship 
between ZooKeeper and Solr instances.  Every example I found in the 
documentation included installing everything — including multiple servers — on 
the same physical node (localhost), which sent me down the wrong path.

The only hiccup I have is an oddity having to do with my servers being 
dual-homed (two network connections):

Each server has a 10.0.2.0/24 Ethernet network connection, as well as a 
10.0.1.0/24 WiFi network connection.  The Ethernet network (10.0.2.0/24) is in 
the hosting center, and what I want the servers to use for communication 
between them, and the WiFi network (10.0.1.0/24) is so that I can access the 
servers from outside the hosting center (e.g., my office):

1. I have defined the Ethernet network addresses (10.0.2.0/24) in all of the 
ZooKeeper instances (zoo.cfg)

2. I have defined the Ethernet network addresses (10.0.2.0/24) when starting up 
each Solr instance (-z)

3. ZooKeeper from Solr Admin (Cloud => ZK Status) displays the Ethernet network 
addresses (10.0.2.0/24)

4. However, Solr from Solr Admin (Cloud => Nodes) displays the WiFi network 
addresses (10.0.1.0/24) !!!

So how do I get Solr to use the Ethernet network addresses instead of the WiFi 
network addresses?

I don’t believe this is just a display artifact, because Solr Admin could not 
display other Solr instances (Cloud => Nodes), or replicate between Solr 
instances, until I opened up the network on the WiFi network.  Once I opened up 
the firewall on the WiFi network, then everything started working.

Ideally, I would like Solr to communicate between nodes on the Ethernet 
network, but also be able to answer queries from either network (Ethernet or 
WiFi).  Is that possible?  Can I control which network interface it uses for 
inter-node communications?  The reasons are twofold:

1. Ethernet network is much faster

2. WiFi network is much less reliable (sporadic outages, slow-downs, relying on 
WiFi extenders that have to be periodically restarted)

In theory I guess I could bind Solr only to the Ethernet network (it currently 
binds to 0.0.0.0, a.k.a. all network interfaces), but then I would need to set 
up a separate proxy from my WiFi network to each of the Solr servers, which I 
would rather not do.  So is it possible to control which network interface Solr 
binds to for inter-server communication, while leaving other network interfaces 
open to queries?

Thanks,

Dave.

> On Oct 23, 2023, at 3:18 AM, Shawn Heisey  wrote:
> 
> On 10/20/2023 5:23 PM, David Filip wrote:
>> From a matter of perspective, however, I think what I am not clear on is 
>> having more than one ZK “server”, and when and why I would need more than 
>> one?
>> Perhaps it is just terminology, but if I have three (3x) Solr instances 
>> (cores) running on three (3x) separate physical servers (different 
>> hardware), and I want to replicate shards between those three, do I have all 
>> three (3x) Solr instances (cores) talking to the same single (1x) ZooKeeper 
>> “server"?
>> Or if I have three (3x) Solr instances (cores) replicating shards between 
>> them, do I also need three (3x) ZooKeeper “servers”, e.g., server.1, 
>> server.2, server.3, each “server” assigned to one specific Solr instance 
>> (core)?
> 
> You need three ZK "servers" each running on different physical hardware so 
> that ZK has fault tolerance.  This requirement of a three server minimum is 
> inherent in ZK's design and cannot be changed.
> 
> You need two Solr servers minimum so that Solr has fault tolerance.
> 
> You can run Solr on the same hardware as you run ZK, but it is STRONGLY 
> recommended 

Re: Solved/Working but Network Interface Question

2023-10-23 Thread Shawn Heisey

On 10/23/2023 8:18 AM, David Filip wrote:

In theory I guess I could bind Solr only to the Ethernet network (it currently 
binds to 0.0.0.0, a.k.a. all network interfaces), but then I would need to set 
up a separate proxy from my WiFi network to each of the Solr servers, which I 
would rather not do.  So is it possible to control which network interface Solr 
binds to for inter-server communication, while leaving other network interfaces 
open to queries?


See this section from the stock solr.xml file:

  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${solr.port.advertise:0}</int>
    <str name="hostContext">${hostContext:solr}</str>

These define how each Solr node registers itself in the clusterstate 
info that can be found in ZK.  They reference Java system properties. 
This controls how each Solr server is reached by the others.


With that, you can use "-Dhost=1.2.3.4" and "-DhostPort=8983" on the 
Solr commandline.  This is probably easiest by adding a line like the 
following to solr.in.sh, which is found in /etc/default if the service 
installer script was used.  If you're on Windows, then it would be 
solr.in.cmd and it would look different:


SOLR_ADDL_ARGS="-Dhost=1.2.3.4 -DhostPort=8983"

Don't change the hostContext.  Many parts of Solr are hardcoded to use 
/solr and if you ask Jetty to change that, you'll have a lot of problems 
that require quite a bit of manual surgery to fix.  This is legacy stuff 
left over from much older versions where it was actually possible to 
change the context without breaking things.


Alternately you could just edit the solr.xml, but it is cleaner to use 
the system properties.  If you edit the file directly, then every server 
would have a different solr.xml config.
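For what it's worth, stock solr.in.sh files also ship a dedicated SOLR_HOST variable that sets the same host property, which may read more clearly than SOLR_ADDL_ARGS (the address below is an example for the setup described, not from the thread):

```shell
# Hypothetical solr.in.sh fragment: advertise this node to ZK and to the
# other Solr nodes via its Ethernet address. Jetty itself can keep
# binding 0.0.0.0, so queries still work on the WiFi interface too.
SOLR_HOST="10.0.2.11"
```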


Thanks,
Shawn



Looking for a consultant to review our solr instance

2023-10-23 Thread Alu, Pino [EMR/ENT/IT/STL]
Hello,

Not sure if this is group to ask but we are potentially looking for a solr 
expert to examine our solr instance to help provide action items to optimize 
performance.

Thanks,

Pino Alu | HCL Commerce Administrator :: Emerson.com | Enterprise IT
Emerson | 8000 West Florissant Ave. | St. Louis | MO | 63136 | USA
T +1 314 553 1785
pino@emerson.com

Delivering Technology Solutions With Purpose



Re: Looking for a consultant to review our solr instance

2023-10-23 Thread Markus Jelsma
Hello Pino, there is a list [1] of companies offering support.

Regards,
Markus

[1] https://cwiki.apache.org/confluence/display/SOLR/Support

On Mon, 23 Oct 2023 at 17:15, Alu, Pino [EMR/ENT/IT/STL] <pino@emerson.com> wrote:

> Hello,
>
>
>
> Not sure if this is group to ask but we are potentially looking for a solr
> expert to examine our solr instance to help provide action items to
> optimize performance.
>
>
>
> Thanks,
>
>
>
> *Pino Alu* | HCL Commerce Administrator :: Emerson.com | Enterprise IT
>
> *Emerson *| 8000 West Florissant Ave. | St. Louis | MO | 63136 | USA
>
> T +1 314 553 1785
>
> pino@emerson.com
>
>
>
> *Delivering Technology Solutions With Purpose*
>
>
>


Re: knn query parser, number of results and filtering by score

2023-10-23 Thread Mirko Sertic

Hi!

I tested this, and it seems the problem is related to Solr cloud mode. 
It works perfectly in a non-cloud setup. Here is the debug response:


{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
  "q":"{!knn f=TESTEMBEDDING_EU_3  topK=3}[1.0,1.0,1.0]",
  "defType":"lucene",
  "indent":"true",
  "fl":"id,[explain],[shard],TESTEMBEDDING_EU_3",
  "q.op":"OR",
  "debug.explain.structured":"true",
  "debugQuery":"true",
  "useParams":"",
  "_":"1698059075615"
    }
  },
  "response":{
    "numFound":3,
    "start":0,
    "numFoundExact":true,
    "docs":[{
  "id":"Position11",
  "TESTEMBEDDING_EU_3":[1.0,1.0,1.0],
  "[explain]":"1.0 = within top 3\n",
  "[shard]":"[not a shard request]"
    },{
  "id":"Position10",
  "TESTEMBEDDING_EU_3":[0.9,0.9,0.9],
  "[explain]":"0.97087383 = within top 3\n",
  "[shard]":"[not a shard request]"
    },{
  "id":"Position9",
  "TESTEMBEDDING_EU_3":[0.8,0.8,0.8],
  "[explain]":"0.89285713 = within top 3\n",
  "[shard]":"[not a shard request]"
    }]
  },
  "debug":{
    "rawquerystring":"{!knn f=TESTEMBEDDING_EU_3 topK=3}[1.0,1.0,1.0]",
    "querystring":"{!knn f=TESTEMBEDDING_EU_3 topK=3}[1.0,1.0,1.0]",
"parsedquery":"KnnFloatVectorQuery(KnnFloatVectorQuery:TESTEMBEDDING_EU_3[1.0,...][3])",
"parsedquery_toString":"KnnFloatVectorQuery:TESTEMBEDDING_EU_3[1.0,...][3]",
    "explain":{
  "Position11":{
    "match":true,
    "value":1.0,
    "description":"within top 3"
  },
  "Position10":{
    "match":true,
    "value":0.97087383,
    "description":"within top 3"
  },
  "Position9":{
    "match":true,
    "value":0.89285713,
    "description":"within top 3"
  }
    },
    "QParser":"KnnQParser",
    "timing":{
  "time":2.0,
  "circuitbreaker":{
    "time":0.0
  },
  "prepare":{
    "time":0.0,
    "query":{
  "time":0.0
    },
    "facet":{
  "time":0.0
    },
    "facet_module":{
  "time":0.0
    },
    "mlt":{
  "time":0.0
    },
    "highlight":{
  "time":0.0
    },
    "stats":{
  "time":0.0
    },
    "expand":{
  "time":0.0
    },
    "terms":{
  "time":0.0
    },
    "debug":{
  "time":0.0
    }
  },
  "process":{
    "time":1.0,
    "query":{
  "time":0.0
    },
    "facet":{
  "time":0.0
    },
    "facet_module":{
  "time":0.0
    },
    "mlt":{
  "time":0.0
    },
    "highlight":{
  "time":0.0
    },
    "stats":{
  "time":0.0
    },
    "expand":{
  "time":0.0
    },
    "terms":{
  "time":0.0
    },
    "debug":{
  "time":1.0
    }
  }
    }
  }
}

Please let me know where to file a bug report. Is it the Jira at 
issues.apache.org?


Mirko

On 20.10.2023 at 12:09, Alessandro Benedetti wrote:

I agree, you can definitely raise a bug for the debug, if you do me a
favour and also test in no Cloud mode, it will help us to understand if
it's a Solr bug or Lucene bug.

I also agree with your second point about the functional expectations; that
is a very minor one though. You can create the ticket and contribute a fix if
you like, happy to review it!
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github



On Thu, 19 Oct 2023 at 17:51, Mirko Sertic  wrote:


I've prepared a testcase. Given the following documents with
TESTEMBEDDING_EU_3 is a DenseVectorField with length 3 and euclidean
distance function. They are written to a collection made of two shards
with no further routing strategy, so they should be more or less evenly
distributed between the two shards:

{
id: 'Position1',
TESTEMBEDDING_EU_3: [0, 0, 0]
}
{
id: 'Position2',
TESTEMBEDDING_EU_3: [0.1, 0.1, 0.1]
}
{
id: 'Position3',
TESTEMBEDDING_EU_3: [0.2, 0.2, 0.2]
}
{
id: 'Position4',
TESTEMBEDDING_EU_3: [0.3, 0.3, 0.3]
}
{
id: 'Position5',
TESTEMBEDDING_EU_3: [0.4, 0.4, 0.4]
}
{
id: 'Position6',
TESTEMBEDDING_EU_3: [0.5, 0.5, 0.5]
}
{
id: 'Position7',
TESTEMBEDDING_EU_3: [0.6, 0.6, 0.6]
}
{
id: 'Position8',
TESTEMBEDDING_EU_3: [0.7, 0.7, 0.7]
}
{
id: 'Position9',
TESTEMBEDDING_EU_3: [0.8, 0.8, 0.8]
}
{
id: 'Position10',
TESTEMBEDDING_EU_3: [0.9, 0.9, 0.9]
}
{
id: 'Position11',
TESTEMBEDDING_EU_3: [1.0, 1.0, 1.0]
}

How I'll do a {!knn 

TruncateFieldUpdateProcessorFactory isn't being applied

2023-10-23 Thread Walter Underwood
I’ve defined TruncateFieldUpdateProcessorFactory in our update request processor 
chain, targeted at solr.StrField, but it isn’t truncating the content. That 
field is populated by a copyField directive. Is the copyField evaluated after 
the update chain? Could that cause this to not work?
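For context: update request processors run before the document is handed to the index writer, and copyField directives are applied at that later stage, so a processor selecting the copyField target never sees that field — only the source field can be mutated. A sketch of a chain that truncates the source field instead (field name, length, and chain name are illustrative, and the parameter name should be checked against your Solr version's javadocs):

```xml
<!-- Illustrative chain: truncation must target the copyField source,
     because copyField runs after the update processor chain. -->
<updateRequestProcessorChain name="truncate-source">
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">description</str>
    <int name="maxLength">1000</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```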

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)





Solr Heap usage

2023-10-23 Thread Surya R
Hi. We have a Solr index with around 2.5M documents, running Solr 8.11.
We have a bunch of fields that we facet/sort/highlight on. Heap is 8GB, out
of 12 GB RAM.

These fields were not defined with docValues=true due to some legacy design
issues, but if we were to rewrite the app, we would definitely have
docValues enabled for those fields that participate in
facet/sort/highlight.
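As a reference point, the rewrite described above would amount to schema changes along these lines (field name and type are illustrative; changing docValues requires a full reindex):

```xml
<!-- Illustrative: string fields used for facet/sort get docValues so
     those operations read column-oriented on-disk structures instead of
     uninverting the field into a large on-heap fieldCache. -->
<field name="category" type="string" indexed="true" stored="true"
       docValues="true"/>
```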

Problem:
I do see garbage collection (G1GC) kicking in when the traffic is moderate
to high, with a "pause full" event that takes around 2 seconds to complete.
This event happens more often when the traffic is high.

I see this GC event happening when the number of humongous regions goes
beyond 1000. The humongous region size is 4M.

I took a heap dump and analyzed it with VisualVM, but I couldn't pinpoint what
objects are using the 'humongous regions'.

My question is: using VisualVM, is it possible to pinpoint which
objects are using the 'humongous regions'?

If, for example, I have a Solr indexed field called "answerSnippet", and
that is the culprit, is it possible to see the field answerSnippet in the
VisualVM heap dump output?

All I see is byte[] with 55% heap occupancy, but I don't know what objects
constitute that byte[].

Can anybody throw some light on this ? I really appreciate the help.


thanks
-- Surya


Re: Unknown CVEs in 8.11.2?

2023-10-23 Thread Ishan Chattopadhyaya
Thanks Jamie. I've upgraded some of these in branch_8_11. I didn't have a
Hadoop cluster handy, so wasn't able to upgrade the Hadoop dependency,
though.
My naive attempt to just increase the Hadoop library version resulted in
several failed tests that I wasn't able to debug.

On Tue, 3 Oct 2023 at 01:33, Jamie Jackson  wrote:

> The nice thing about the (repeatable) way that I've structured the scan is
> that I'm automatically filtering *out* everything that the Solr team has
> identified as known and ignorable.
>
> Therefore, if their list is up-to-date (which would be a good policy, and
> hopefully, it's being maintained properly), my procedure will only identify
> the outstanding issues (ones that either the Solr team doesn't know about
> yet or ones that they've identified as problematic).
>
> Anybody should be able to run my two commands and come up with a current
> list of *outstanding *issues in any given version of Solr.
>
> It's not comforting to know that there are potentially 13 critical CVEs,
> even after excluding ignorable issues.
>
> On Mon, Oct 2, 2023 at 3:29 PM Ishan Chattopadhyaya <
> ichattopadhy...@gmail.com> wrote:
>
> > Is there any issue among those that has been known to be a vulnerability
> to
> > Solr? Often times, Solr uses multiple external dependencies that may have
> > vulnerabilities that can't be exploited through Solr, and those
> > vulnerabilities are of less concern to Solr in terms of urgency to
> address
> > it.
> >
> > On Mon, 2 Oct, 2023, 10:05 pm Jamie Jackson, 
> wrote:
> >
> > > Hello,
> > >
> > > Can anybody suggest a way to get some attention to these issues? I
> > thought
> > > this was the right place for CVE discussion but maybe I was wrong about
> > > that.
> > >
> > > Thanks,
> > > Jamie
> > >
> > > On Mon, Sep 18, 2023 at 1:30 PM Jamie Jackson 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > If I'm understanding everything correctly, there are lingering
> > > > vulnerabilities in 8.11.2 dependencies.
> > > >
> > > > Total: 96 (UNKNOWN: 0, LOW: 9, MEDIUM: 48, HIGH: 26, CRITICAL: 13)
> > > >
> > > > If you have docker installed, you can run the following without
> > > installing
> > > > anything:
> > > >
> > > > # generate a list of CVEs to IGNORE, because Solr has deemed them
> > > > irrelevant
> > > > curl https://solr.apache.org/solr.vex.json | jq -r -c
> > > > '.vulnerabilities[]|select(.analysis.state == "not_affected") |.id' >
> > > > .solr.trivyignore
> > > >
> > > > # run a trivy scan on the image, and tell trivy to ignore the
> > > > aforementioned CVEs
> > > > docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v
> > > > /tmp/.trivy_cache:/root/.cache/trivy -v
> > > > $PWD/.solr.trivyignore:/.trivyignore aquasec/trivy image solr:8.11.2
> > > > --scanners vuln --ignore-unfixed -f table
> > > >
> > > > Here's the output I get. (Sorry, wrapping will probably render it
> > > > illegible.)
> > > >
> > > > solr:8.11.2 (ubuntu 20.04)
> > > > ==
> > > > Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0)
> > > >
> > > > 2023-09-18T17:20:52.962Z INFO Table result includes only package
> > > > filenames. Use '--format json' option to get the full path to the
> > package
> > > > file.
> > > >
> > > > Java (jar)
> > > > ==
> > > > Total: 96 (UNKNOWN: 0, LOW: 9, MEDIUM: 48, HIGH: 26, CRITICAL: 13)
> > > >
> > > >
> > > >
> > >
> >
> ┌──┬─┬──┬┬───┬┬──┐
> > > > │   Library│
> > > >  Vulnerability│ Severity │ Status │ Installed Version │
> > >  Fixed
> > > > Version│Title
> > > >   │
> > > >
> > > >
> > >
> >
> ├──┼─┼──┼┼───┼┼──┤
> > > > │ com.fasterxml.jackson.core:jackson-databind  │
> > > > CVE-2022-42003  │ HIGH │ fixed  │ 2.13.3│
> 2.12.7.1,
> > > > 2.13.4.1 │ deep wrapper array nesting wrt
> > > > UNWRAP_SINGLE_VALUE_ARRAYS│
> > > > │ (jackson-databind-2.13.3.jar)│
> > > >   │  ││   │
> > > >│ https://avd.aquasec.com/nvd/cve-2022-42003
> > > > │
> > > > │  │
> > > >   │  ││   │
> > > >│
> > >   │
> > > > │  │
> > > >   │  ││   │
> > > >│
> > >   │
> > > > │  │
> > > >   │   

Re: Unknown CVEs in 8.11.2?

2023-10-23 Thread Jamie Jackson
Thanks for the reply, Ishan. Once these make it into a docker image, I'll
rescan.

On Mon, Oct 23, 2023 at 3:23 PM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Thanks Jamie. I've upgraded some of these in branch_8_11. I didn't have a
> Hadoop cluster handy, so wasn't able to upgrade the Hadoop dependency,
> though.
> My naive attempt to just increase the Hadoop library version resulted in
> several failed tests that I wasn't able to debug.
>
> On Tue, 3 Oct 2023 at 01:33, Jamie Jackson  wrote:
>
> > The nice thing about the (repeatable) way that I've structured the scan
> is
> > that I'm automatically filtering *out* everything that the Solr team has
> > identified as known and ignorable.
> >
> > Therefore, if their list is up-to-date (which would be a good policy, and
> > hopefully, it's being maintained properly), my procedure will only
> identify
> > the outstanding issues (ones that either the Solr team doesn't know about
> > yet or ones that they've identified as problematic).
> >
> > Anybody should be able to run my two commands and come up with a current
> > list of *outstanding *issues in any given version of Solr.
> >
> > It's not comforting to know that there are potentially 13 critical CVEs,
> > even after excluding ignorable issues.
> >
> > On Mon, Oct 2, 2023 at 3:29 PM Ishan Chattopadhyaya <
> > ichattopadhy...@gmail.com> wrote:
> >
> > > Is there any issue among those that is known to be a vulnerability in
> > > Solr itself? Oftentimes Solr uses external dependencies that have
> > > vulnerabilities which can't be exploited through Solr, and those
> > > vulnerabilities are of less concern in terms of urgency to address them.
> > >
> > > On Mon, 2 Oct, 2023, 10:05 pm Jamie Jackson, 
> > wrote:
> > >
> > > > Hello,
> > > >
> > > > Can anybody suggest a way to get some attention to these issues? I
> > > > thought this was the right place for CVE discussion, but maybe I was
> > > > wrong about that.
> > > >
> > > > Thanks,
> > > > Jamie
> > > >
> > > > On Mon, Sep 18, 2023 at 1:30 PM Jamie Jackson 
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > If I'm understanding everything correctly, there are lingering
> > > > > vulnerabilities in 8.11.2 dependencies.
> > > > >
> > > > > Total: 96 (UNKNOWN: 0, LOW: 9, MEDIUM: 48, HIGH: 26, CRITICAL: 13)
> > > > >
> > > > > If you have docker installed, you can run the following without
> > > > installing
> > > > > anything:
> > > > >
> > > > > # generate a list of CVEs to IGNORE, because Solr has deemed them
> > > > > # irrelevant
> > > > > curl https://solr.apache.org/solr.vex.json | jq -r -c \
> > > > >   '.vulnerabilities[] | select(.analysis.state == "not_affected") | .id' \
> > > > >   > .solr.trivyignore
> > > > >
> > > > > # run a trivy scan on the image, and tell trivy to ignore the
> > > > > # aforementioned CVEs
> > > > > docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
> > > > >   -v /tmp/.trivy_cache:/root/.cache/trivy \
> > > > >   -v $PWD/.solr.trivyignore:/.trivyignore \
> > > > >   aquasec/trivy image solr:8.11.2 \
> > > > >   --scanners vuln --ignore-unfixed -f table
> > > > >
> > > > > Here's the output I get. (Sorry, wrapping will probably render it
> > > > > illegible.)
> > > > >
> > > > > solr:8.11.2 (ubuntu 20.04)
> > > > > ==
> > > > > Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0)
> > > > >
> > > > > 2023-09-18T17:20:52.962Z INFO Table result includes only package
> > > > > filenames. Use '--format json' option to get the full path to the
> > > package
> > > > > file.
> > > > >
> > > > > Java (jar)
> > > > > ==
> > > > > Total: 96 (UNKNOWN: 0, LOW: 9, MEDIUM: 48, HIGH: 26, CRITICAL: 13)
> > > > >
> > > > > [Trivy scan table, line-wrapped by mail quoting; the recoverable row:]
> > > > > Library: com.fasterxml.jackson.core:jackson-databind (jackson-databind-2.13.3.jar)
> > > > > Vulnerability: CVE-2022-42003 | Severity: HIGH | Status: fixed
> > > > > Installed Version: 2.13.3 | Fixed Version: 2.12.7.1, 2.13.4.1
> > > > > Title: deep wrapper array nesting wrt UNWRAP_SINGLE_VALUE_ARRAYS
> > > > > https://avd.aquasec.com/nvd/cve-2022-42003
> > > > > [remainder of the table truncated in the archive]
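For anyone without jq handy, the jq filter quoted above can be sketched in Python. The JSON shape here is assumed from the filter itself (`.vulnerabilities[].analysis.state` and `.id`), not copied from the real solr.vex.json:

```python
import json

# Assumed VEX-like payload, mirroring only the fields the jq filter touches.
payload = """
{"vulnerabilities": [
  {"id": "CVE-2022-42003", "analysis": {"state": "not_affected"}},
  {"id": "CVE-2021-44228", "analysis": {"state": "affected"}}
]}
"""
vex = json.loads(payload)

# Keep only the IDs marked not_affected; these become the .solr.trivyignore
# list that trivy is told to skip.
ignorable = [v["id"] for v in vex["vulnerabilities"]
             if v.get("analysis", {}).get("state") == "not_affected"]
print("\n".join(ignorable))
```

Writing the result to a file one ID per line gives the same `.solr.trivyignore` the curl-and-jq pipeline produces.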

Re: Paging a delete query?

2023-10-23 Thread Chris Hostetter


If you want to "delete by query" but you don't care about the atomicity of 
the "delete by query" (ie: queries and/or other updates may be processed in 
the middle of the delete), then that is exactly what the delete() streaming 
expression decorator was designed for...

https://solr.apache.org/guide/solr/latest/query-guide/stream-decorator-reference.html#delete
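Roughly, delete() wraps a stream that emits the ids to remove and sends them as delete-by-id commands in batches. A sketch along these lines (collection name, query, and batchSize are illustrative, not from the thread):

```
delete(mycollection,
       batchSize=500,
       search(mycollection,
              q="inStock:false",
              qt="/export",
              fl="id",
              sort="id asc"))
```

The expression is submitted to the collection's /stream handler (e.g. with curl and --data-urlencode 'expr=...'); using the /export handler for the inner search lets it stream the full result set rather than a single page.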


-Hoss
http://www.lucidworks.com/


Re: TruncateFieldUpdateProcessorFactory isn't being applied

2023-10-23 Thread Mikhail Khludnev
Hello Walter.
I'm afraid the copyField directive is applied only after the update
processor chain has run.
Could you try CloneFieldUpdateProcessorFactory before the truncating
processor instead?
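A minimal chain sketch of that suggestion (field names, chain name, and the maxLength value are placeholders, assuming the goal is a truncated string copy of a `content` field):

```xml
<!-- solrconfig.xml: clone inside the chain instead of relying on copyField,
     so the truncate processor sees the value before it is indexed -->
<updateRequestProcessorChain name="truncate-chain" default="true">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">content</str>
    <str name="dest">content_str</str>
  </processor>
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">content_str</str>
    <int name="maxLength">100</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The ordering matters: the clone and truncate processors run before RunUpdateProcessorFactory, which is where copyField directives are evaluated.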


On Mon, Oct 23, 2023 at 8:29 PM Walter Underwood 
wrote:

> I’ve defined TruncateFieldUpdateProcessorFactory in our update request
> processor chain, targeted at solr.StrField, but it isn’t truncating the
> content. That field is populated by a copyField directive. Is the copyField
> evaluated after the update chain? Could that cause this to not work?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>

-- 
Sincerely yours
Mikhail Khludnev