Solr upload configset end up with exception

2022-06-08 Thread yang ma

Hi All,

I am using Solr 8.8.2 in cloud mode. I previously uploaded a configset named 
abc and created multiple collections based on that configset.

Now I need to update the configset, but when I try to upload the new configset 
zip file to Solr, overwriting the previous one as below, I get a JSONParser 
exception:

$ curl -X POST --header Content-Type:application/octet-stream --data-binary 
@conf.zip 
'http://host:port/solr/admin/configs?action=UPLOAD&name=abc&overwrite=true'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
{
"responseHeader":{
  "status":500,
  "QTime":1},
"error":{
  "msg":"JSON Parse Error: char=\u,position=0 AFTER='\u' BEFORE=''",
  "trace":"org.noggit.JSONParser$ParseException: JSON Parse Error: 
char=\u,position=0 AFTER='\u' BEFORE=''
at org.noggit.JSONParser.err(JSONParser.java:452)
at org.noggit.JSONParser.handleNonDoubleQuoteString(JSONParser.java:819)
at org.noggit.JSONParser.next(JSONParser.java:1026)
at org.noggit.JSONParser.nextEvent(JSONParser.java:1073)
at org.noggit.ObjectBuilder.(ObjectBuilder.java:84)
at org.apache.solr.common.util.Utils.lambda$static$1(Utils.java:356)
at org.apache.solr.common.util.Utils.fromJSON(Utils.java:319)
at org.apache.solr.common.util.Utils.fromJSON(Utils.java:305)
at 
org.apache.solr.handler.admin.ConfigSetsHandler.isCurrentlyTrusted(ConfigSetsHandler.java:328)
at 
org.apache.solr.handler.admin.ConfigSetsHandler.ensureOverwritingUntrustedConfigSet(ConfigSetsHandler.java:308)
at 
org.apache.solr.handler.admin.ConfigSetsHandler.createBaseZnode(ConfigSetsHandler.java:269)
at 
org.apache.solr.handler.admin.ConfigSetsHandler.handleConfigUploadRequest(ConfigSetsHandler.java:230)
at 
org.apache.solr.handler.admin.ConfigSetsHandler.handleRequestBody(ConfigSetsHandler.java:113)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:836)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:800)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:545)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
at 
org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1612)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at 
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
at org.eclipse.jetty.ser100 76538  100  5064  100 71474  45879   632k --:--:-- 
--:--:-- --:--:--  
685kipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKi

Update Processor in Nested Document

2022-06-08 Thread maoredri
Hi, everyone.
First, I want to say that English is not my first language, so I apologize for 
any mistakes.

Recently, we moved from Solr 6 to Solr 8, and now we want to start using 
nested documents in our collections. We are going over our custom plugins to 
make them work with nested documents, and we were wondering whether Solr's 
built-in update processors support nested documents.
For example, given the document:
{
  "id": "321",
  "nestField": [
    {
      "id": "321.456",
      "coolField": [12, 12, 1]
    }
  ]
}

can I use 
[UniqFieldsUpdateProcessor](https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html)
 on "coolField"? If yes, how do I configure the processor for the nested field?

Thanks,
Maor Edri

Sent with [Proton Mail](https://proton.me/) secure email.
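For context, this is roughly what posting the nested example document to Solr's JSON update endpoint looks like; the sketch below writes the document from the question and validates it locally. The host, port, and collection name in the commented curl line are placeholders, not taken from this thread.

```shell
# Write the example parent/child document as a JSON update payload.
cat > nested-doc.json <<'EOF'
[
  {
    "id": "321",
    "nestField": [
      { "id": "321.456", "coolField": [12, 12, 1] }
    ]
  }
]
EOF

# Sanity-check the JSON locally before sending it anywhere.
python3 -m json.tool nested-doc.json > /dev/null && echo "valid JSON"

# Hypothetical upload (host, port, and the collection name "mycoll" are placeholders):
# curl -X POST -H 'Content-Type: application/json' \
#   --data-binary @nested-doc.json \
#   'http://host:port/solr/mycoll/update?commit=true'
```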

RE: Solr upload configset end up with exception

2022-06-08 Thread Steffen Moldenhauer
The stack trace looks pretty much like 
https://issues.apache.org/jira/browse/SOLR-16110
As a workaround, you might be able to upload the configset directly to 
ZooKeeper from the command line using 'solr zk upconfig' and then reload the 
collections.
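A sketch of that workaround; the ZooKeeper address, configset directory, and collection name below are placeholders, so adjust them to your cluster:

```shell
# Push the updated configset straight into ZooKeeper, bypassing the
# configs?action=UPLOAD endpoint (zkhost:2181 and /path/to/conf are placeholders):
bin/solr zk upconfig -z zkhost:2181 -n abc -d /path/to/conf

# Then reload each collection built on the "abc" configset so it picks up
# the new config (collection name is a placeholder):
curl 'http://host:port/solr/admin/collections?action=RELOAD&name=mycollection'
```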

Regards
Steffen

-Original Message-
From: yang ma  
Sent: Mittwoch, 8. Juni 2022 09:33
To: users@solr.apache.org
Subject: Solr upload configset end up with exception


[original message and stack trace snipped]

Re: Update Processor in Nested Document

2022-06-08 Thread Mikhail Khludnev
Maor, your question makes sense. I remember this discussion from a few
years ago, but I don't remember where it ended up.
Alexandre,
Do you remember it?

On Wed, Jun 8, 2022 at 10:50 AM maoredri  wrote:

> [quoted message trimmed]



-- 
Sincerely yours
Mikhail Khludnev


Solr indexing performance tips

2022-06-08 Thread Marius Grigaitis
Hi All,

Our Solr is bottlenecking on write performance (uses lots of cpu, writes
queue up). Looking for some tips on what to look into to figure out if we
can squeeze more write performance out of it without changing the setup too
drastically.

Here's the setup:
* Solr 8.2 (I know it could be upgraded to 8.11 or later, but I haven't seen
changes in the changelogs that would significantly impact performance)
* Replica setup (one node is responsible for indexing, the other nodes
replicate every 10 minutes). The indexing node runs on 8 cores and 16 GB of RAM.
* 9 different cores. Each weighs ~100 MB on disk and holds approximately 90k
documents.
* Updates are sent with the update method in batches of 1000, from around 9
parallel processes (split by core)
* The setup is pretty straightforward: no magic string processors, mostly
default configuration, just some field types defined in the core configuration,
and only a few ids are stored.

The problem:
* It currently takes around 3 hours to process updates for all documents on
a machine with 2 cores and 8 GB of RAM.

My gut-feeling assumptions:
* Since the total index size is quite small (around 1 GB or less on disk)
and the machine doing the indexing is quite powerful, I would assume indexing
(updating) should be quite fast (probably <10 minutes for all documents).
However, it takes far longer than that, so we are probably doing something
wrong.

What I am looking for:
* Ideas about what might cause this.
* Common things to check / tips for write performance, maybe some reading
material.
* Ways to measure write performance, e.g. monitoring that shows what actually
takes time on the Solr side, to narrow things down.

Sorry if I misused some Solr term.

Thank you for your tips / insights in advance.

Marius


Solr deletes all collections after recreating (docker)

2022-06-08 Thread Yurii Aleshchenko
Hi all, I tested the solr:7.5.0 and solr:8.11.1 Docker images as a single node
and as a cluster (Solr cluster + ZK cluster), and I found some problems;
maybe it is a new issue.

I configured Solr with a data volume folder /var/solr/data containing the
solr.xml and zoo.cfg files. When I created collections via the admin panel or
restored my collections from backup, I saw all collections and config files
in /var/solr/data. All right.

If I delete the Solr and ZK Docker containers (docker-compose down) while
keeping the Docker volume /var/solr/data, I still see the Solr data there
after deleting the containers. All right.

But when I start it again some time later (docker-compose up -d), Solr writes
an error to the Docker log (attached below), and all my cores and collections
are deleted by Solr.

How can I keep my data in the Docker volume, and why did Solr delete all
collections and cores on startup?


Re: Solr deletes all collections after recreating (docker)

2022-06-08 Thread Vincenzo D'Amore
Could you please post your docker-compose.yaml?

On Wed, Jun 8, 2022 at 1:36 PM Yurii Aleshchenko <
yurii.aleshche...@kommunion.com> wrote:

> [quoted message trimmed]

-- 
Vincenzo D'Amore


Re: Solr indexing performance tips

2022-06-08 Thread Shawn Heisey

On 6/8/2022 3:35 AM, Marius Grigaitis wrote:

* 9 different cores. Each weighs around ~100 MB on disk and has
approximately 90k documents inside each.
* Updating is performed using update method in batches of 1000, around 9
processes in parallel (split by core)


This means that indexing within each Solr core is single-threaded.  The 
way to increase indexing speed is to index in parallel with multiple 
threads or processes per index.  If you can increase the CPU power 
available on the Solr server when you increase the number of 
processes/threads sending data to Solr, that might help.


Thanks,
Shawn
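To make the multi-threaded suggestion concrete, here is a minimal sketch of fanning batch files out to several parallel workers with `xargs -P`. The batch directory and URL are assumptions, and the `echo` is a stand-in for the real update call:

```shell
# Create a few placeholder batch files (in real use these would be your
# pre-built JSON update payloads).
mkdir -p batches
touch batches/b1.json batches/b2.json batches/b3.json

# Run up to 4 upload workers in parallel against one core. The echo is a
# stand-in; replace it with something like:
#   curl -s -H 'Content-Type: application/json' --data-binary @{} \
#     'http://host:port/solr/core1/update'
find batches -name '*.json' -print0 |
  xargs -0 -P 4 -I{} echo "would POST {} to core1"
```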



RE: Re: Solr deletes all collections after recreating (docker)

2022-06-08 Thread Yurii Aleshchenko

Yes, no problem, it is attached below:

On 2022/06/08 11:58:12 Vincenzo D'Amore wrote:
> Could you please post your docker-compose.yaml?
>
> [quoted message trimmed]

docker-compose.yml
Description: application/yaml


deleteByQuery is killing Solr servers with OOM

2022-06-08 Thread Parag Ninawe
Hi,
We are using Solr 7.7.3.
We have observed that deleteByQuery requests cause sudden spikes in JVM heap
usage, leading to OOM.

Can someone please guide me on which Solr configuration parameters I should
check?

Thanks


Re: deleteByQuery is killing Solr servers with OOM

2022-06-08 Thread Shawn Heisey

On 6/8/2022 6:45 AM, Parag Ninawe wrote:

We are using Solr 7.7.3
We have observed that deleteByQuery parameter causing sudden spikes in JVM
causing OOM.


What do you know about the OOM?  Is it an OS-level OOM or a Java level 
OOM?  If it's Java, have you seen the actual exception?  A whole bunch 
of things can cause Java OOM, and it is not always memory.


What evidence do you have that OOM is occurring?  Where is the problem 
seen, and what are the exact symptoms that led you to conclude that OOM 
was the problem?


Thanks,
Shawn
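As a concrete starting point for answering those questions, here is a small sketch that checks for the two OOM flavors Shawn distinguishes. The log path is a typical package-install default and an assumption here, not from the thread:

```shell
# Log path is a common package-install default; override with SOLR_LOG=...
SOLR_LOG="${SOLR_LOG:-/var/solr/logs/solr.log}"

# A Java-level OOM leaves an exception in Solr's own log:
if grep -q 'OutOfMemoryError' "$SOLR_LOG" 2>/dev/null; then
  echo "Java-level OOM found in $SOLR_LOG"
else
  echo "no Java-level OOM in $SOLR_LOG"
fi

# An OS-level OOM (the kernel killing the JVM) shows up in the kernel log instead:
# dmesg | grep -i 'out of memory'
```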



Re: Solr deletes all collections after recreating (docker)

2022-06-08 Thread Shawn Heisey

On 6/8/2022 4:01 AM, Yurii Aleshchenko wrote:
How can I save my data in docker volume and why solr deleted all 
collections, cores on startup ?


It sounds like when you are recreating the setup, you are starting 
with a brand-new, empty ZK database.


All the collection configuration is in ZK, and any core that exists in a 
SolrCloud node that does not reference a known collection in the ZK 
database is deleted on startup.


If you want to preserve a SolrCloud install when rebuilding the docker 
containers, you will need to preserve the ZK database as well as the 
cores in the Solr home, or Solr will delete all the cores on startup.


Thanks,
Shawn



Re: Solr deletes all collections after recreating (docker)

2022-06-08 Thread Vincenzo D'Amore
I agree with Shawn.
ZooKeeper saves its data log in /datalog, and I see you missed adding a
volume for /datalog for ZooKeeper in your docker-compose.
That should do the trick.
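For reference, a hedged docker-compose sketch of that setup (service names, image tags, and volume names here are illustrative, not taken from the attached file): the official zookeeper image keeps its snapshot data in /data and its transaction log in /datalog, and both need volumes to survive `docker-compose down`.

```yaml
services:
  solr:
    image: solr:8.11.1
    volumes:
      - solr_data:/var/solr
  zoo:
    image: zookeeper:3.6
    volumes:
      - zoo_data:/data
      - zoo_datalog:/datalog

volumes:
  solr_data:
  zoo_data:
  zoo_datalog:
```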

On Wed, Jun 8, 2022 at 3:21 PM Shawn Heisey  wrote:

> [quoted message trimmed]

-- 
Vincenzo D'Amore


RE: Re: Solr deletes all collections after recreating (docker)

2022-06-08 Thread Yurii Aleshchenko

Thank you very much to you and Shawn; it is working.

On 2022/06/08 13:51:45 Vincenzo D'Amore wrote:
> I agree with Shawn.
> [quoted message trimmed]


Re: Solr indexing performance tips

2022-06-08 Thread Jan Høydahl
* Go multi-threaded for each core, as Shawn says. Try e.g. 2, 3 and 4 threads.
* Experiment with different batch sizes, e.g. try 500 and 2000 - what is 
optimal depends on your docs.
* Do NOT commit after each batch of 1000 docs. Instead, commit as seldom as 
your requirements allow, e.g. try commitWithin=60000 to commit every minute.

Tip: Try pushing Solr metrics to DataDog or some other service where you can 
see a dashboard with stats on requests/sec, RAM, CPU, threads, GC etc., which 
may answer your last question.
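A sketch of the batching and commit pattern above; the file names and the update URL are placeholders, and the POST itself is left commented out:

```shell
# Generate a sample JSON-lines export, then split it into batches of 1000.
seq 1 2500 | awk '{printf "{\"id\":\"%d\"}\n", $1}' > docs.jsonl
split -l 1000 docs.jsonl batch_

# Post each batch with commitWithin=60000 so Solr commits at most once a
# minute, instead of committing per batch:
for f in batch_*; do
  echo "would POST $f"
  # curl -s -H 'Content-Type: application/json' --data-binary @"$f" \
  #   'http://host:port/solr/core1/update/json/docs?commitWithin=60000'
done
```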

Jan

> On 8 Jun 2022, at 14:06, Shawn Heisey  wrote:
> 
> [quoted message trimmed]



Re: Solr indexing performance tips

2022-06-08 Thread David Hastings
> * Do NOT commit after each batch of 1000 docs. Instead, commit as seldom
> as your requirements allow, e.g. try commitWithin=60000 to commit every
> minute

This is the big one. Commit after the entire process is done, or on a
timer, if you don't need NRT searching; rarely does anyone ever need that.
A commit is a heavy operation and takes about the same time whether you
are committing 1000 documents or 100k documents.

On Wed, Jun 8, 2022 at 10:40 AM Jan Høydahl  wrote:

> [quoted message trimmed]


Solr compatibility with Oracle Database 19c Database

2022-06-08 Thread Yennam, M
Hi ,

We are currently using Solr 4.9.0, which connects to Oracle 12cR1, and we are 
planning to upgrade our database to Oracle 19c. So the question I have is: is 
Solr 4.9.0 compatible with Oracle 19c, and if not, what is the minimum version 
of Solr that supports an Oracle 19c database?

Appreciate if someone can help me answer this question.

Thanks & Regards,
Mahesh Yennam | Systems Integration Advisor
m.yen...@nttdata.com | m. +1.602.903.9656 | 
nttdataservices.com | 
@nttdataservices
NTT DATA Information Processing Services Private Limited
Consulting | Industry Solutions | Digital | Application & Infrastructure 
Services | Cloud | BPO




Re: Solr compatibility with Oracle Database 19c Database

2022-06-08 Thread Andy Lester


> On Jun 8, 2022, at 2:35 PM, Yennam, M  wrote:
> 
> We are currently using Solr 4.9.0 which is connecting Oracle 12cR1 and we are 
> planning to upgrade our Database to Oracle 19c. So, the question that I have 
> is – Is SOLR 4.9.0 compatible with Oracle 19c, if not what is the minimum 
> version of SOLR that supports Oracle 19c database.

How are you getting data from Oracle into Solr? Are you using the 
DataImportHandler? If you’re not using the DIH, then I don’t think you’re 
connecting to Oracle directly, and then it’s a non-issue.

Andy

Re: Update Processor in Nested Document

2022-06-08 Thread Mikhail Khludnev
I checked the source. There's nothing like this there yet.

On Wed, Jun 8, 2022 at 10:50 AM maoredri  wrote:

> Hi, everyone.
> First I want say that English is not my first language, so I apologize for
> any mistakes.
>
> Recently, we moved from Solr 6 to Solr 8.
> now we want to start using nested document in our collections.
> we going over our custom plugins in order to make them work for nested
> document and we were wondering if Solr's built in update processors support
> nested documents.
> for example, given the document:
> {
> "id": "321",
> "nestField":[
> {
> "id": "321.456",
> "coolField": [12,12,1]
> }
> ]
> }
>
> can I ​use [UniqFieldsUpdateProcessor](
> https://lucene.apache.org/solr/6_6_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html)
> on "coolField"? If yes, how do I configure the processor to the nested
> field?
>
> Thanks,
> Maor Edri
>
> Sent with [Proton Mail](https://proton.me/) secure email.



-- 
Sincerely yours
Mikhail Khludnev


Latencies when Solr node doesn't host queried collection

2022-06-08 Thread Elaine Cario
Sorry for such a long post.

We have a 4-node SolrCloud running Solr 8.11.1.  There are 2 nodes in one
AWS region, and 2 nodes in another region.  All nodes are in peered VPC.
All communications between the nodes are direct IP calls (no DNS).  One
node in each region holds replicas of multiple collections (single-shard
collections).  The other 2 nodes (1 in each region) are empty.  Why we did
that will become apparent.

Zookeeper is a 3 node ensemble, with 1 node in each of the two SolrCloud
regions and a 3rd node in a completely different region.

We are having issues with very high latencies. Restarting Solr sometimes
resolved them, sometimes made them a lot worse, and sometimes a restart
improved things only for them to suddenly go bad again.

Through a serendipitous side investigation of a blank Tree in the Admin UI,
we found that making the .../solr/admin/zookeeper call would take anywhere
from milliseconds to 10 seconds to 60 seconds.  The latencies were
perfectly correlated with which Zookeeper that particular Solr node was
"attached" to.  Same region: milliseconds, other region - 10 seconds. And
the outlier ZK region - 60 seconds.

Seems like some network issue, yes?  I agree, but I'm trying to convince
our network engineers that it's something inherent in Solr or Zookeeper.

The odd thing is that the query latencies seem to hinge on whether the node
which receives the query actually has at least 1 shard for the queried
collection.  We deployed Dynatrace agents to peer into what might be
happening, but all I end up seeing is there are long waits in
ZkStateReader$LazyCollectionRef.get, but only when the node doesn't have
the collection being queried.

So I'd like to understand better the difference in how Solr manages these
collection configs when the collection is resident or not.
LazyCollectionRef seems to be called when the collection isn't there, and
the timeout for the cache is 2 seconds (solr.OverseerStateUpdateDelay). Do
resident collections run down a different code path?  I ran across this old
change: https://issues.apache.org/jira/browse/SOLR-6629, which seems
related, but only in that it is the father of the current code. If I can
explain with conviction that Solr behaves differently with resident vs.
non-resident collections, then I have a path forward: either kick networking
to look at this, or suck it up and make sure that every collection is
represented with a shard on each node (which I think is a stupid work-around,
especially for small collections, but I gotta do what I gotta do).

Thanks for your attention!
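One low-tech way to put numbers on those per-node latencies is curl's built-in timing variables; the URL below is a placeholder for each node's address:

```shell
# Print connect and total time for the call that backs the Admin UI tree;
# run this against each Solr node to compare same-region vs cross-region ZK.
curl -s -o /dev/null \
  -w 'connect=%{time_connect}s total=%{time_total}s\n' \
  'http://host:port/solr/admin/zookeeper'
```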


Re: Solr compatibility with Oracle Database 19c Database

2022-06-08 Thread Gus Heck
Also note that the Data Import Handler (DIH) is no longer supported by the
Solr community. DIH has become a separate project (
https://github.com/rohitbemax/dataimporthandler) and seems to be in need of
some folks who care enough to contribute fixes to it. Using another tool or
custom code to query the database and submit updates via the Solr JSON API
or the SolrJ client is currently recommended over DIH.

On Wed, Jun 8, 2022 at 4:04 PM Andy Lester  wrote:

> [quoted messages trimmed]



-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Solr compatibility with Oracle Database 19c Database

2022-06-08 Thread Andy Lester
> some folks who care enough to contribute fixes to it. Using another tool or
> custom code to query the database and submit updates via the solr JSON api
> or SolrJ client is currently recommended over DIH.

That’s why I had to write a tool to export from Oracle, massage the data 
into JSON, and post it to Solr before we migrated from Solr 4 to Solr 8, the 
same migration the OP is looking at.

The big benefit of this is that it allowed me to have multiple importers 
running at once.  A full reindex went from taking 8 hours via the DIH to taking 
about 90 minutes with 10 importers running.
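The fan-out across importers can be sketched roughly like this; `import_batch` is a stand-in for the real Oracle-query-to-Solr-POST work, and the round-robin split is just one reasonable way to divide the keys:

```python
import concurrent.futures

def partition(ids, n_workers):
    """Deal the full list of primary keys round-robin into one batch per importer."""
    return [ids[i::n_workers] for i in range(n_workers)]

def import_batch(batch):
    """Placeholder for one importer: a real run would SELECT these rows from
    Oracle, massage them into JSON, and POST them to Solr's update handler."""
    return len(batch)  # report how many documents this worker handled

def parallel_reindex(ids, n_workers=10):
    """Run n_workers importers concurrently and return the total document count."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(import_batch, partition(ids, n_workers)))

if __name__ == "__main__":
    print(parallel_reindex(list(range(1000)), n_workers=10))  # prints 1000
```

Because each batch touches a disjoint set of ids, the importers need no coordination beyond the final commit.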

It also means that we don’t have to worry about the DIH connection as we 
migrate from Oracle 12 to Oracle 19, as the OP is doing. The OP seems to be 
in the same situation I was in a year ago.

Andy

Re: "this.stopWords" is null

2022-06-08 Thread Chris Hostetter


I suspect you are hitting this bug...

https://issues.apache.org/jira/browse/SOLR-16203

...but AFAIK that would only happen if you are explicitly using 
ClassicIndexSchemaFactory in your solrconfig.xml ... can you confirm?

Assuming I'm correct, then either switching to ManagedIndexSchemaFactory 
(and renaming your schema.xml accordingly, or letting it rename 
automatically on startup) *OR* switching all your factory declarations to 
use the "solr.ClassName" syntax should make the problem go away.
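For anyone unsure what those two options look like, a sketch (the mutable flag, file names, and filter attributes here are illustrative defaults, not taken from the reporter's configset):

```xml
<!-- Option 1: in solrconfig.xml, switch the schema factory -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

<!-- Option 2: in the schema, use the "solr.ClassName" shorthand
     instead of fully-qualified Lucene class names -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
```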

If it does not, that's very curious -- and a full copy of your entire 
configset would probably be needed to give you additional advice.



: Date: Tue, 7 Jun 2022 10:59:26 -0500
: From: Thomas Woodard 
: Reply-To: users@solr.apache.org
: To: users@solr.apache.org
: Subject: Re: "this.stopWords" is null
: 
: Commenting out the stop filter allowed documents to be indexed, confirming
: it was actually the problem. But then queries fail because of not being
: able to find the synonyms for what looks like a similar reason.
: 
: I've also tried switching the files to use absolute paths like below, but
: that also does not work:
: 
: 
: It certainly seems like the Solr configuration is simply not initializing
: the Lucene filters correctly.
: 
: On Tue, Jun 7, 2022 at 9:22 AM Thomas Woodard  wrote:
: 
: > I had an 8.11.1 implementation in progress when 9.0 came out, and am
: > trying to convert it so we don't go live on an already outdated version.
: > I'm having trouble adding documents to the index that worked fine with
: > 8.11.1. Shortened error is below:
: >
: > 2022-06-07 13:49:24.190 ERROR (qtp554868511-21) [ x:sku]
: > o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception
: > writing document id 6-1-TB-0701 to the index; possible analysis error. =>
: > org.apache.solr.common.SolrException: Exception writing document id
: > 6-1-TB-0701 to the index; possible analysis error.
: > Caused by: java.lang.NullPointerException: Cannot invoke
: > "org.apache.lucene.analysis.CharArraySet.contains(char[], int, int)"
: > because "this.stopWords" is null
: > at
: > org.apache.lucene.analysis.StopFilter.accept(StopFilter.java:97) ~[?:?]
: > at
: > 
org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:52)
: > ~[?:?]
: > at
: > 
org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:37)
: > ~[?:?]
: > at
: > 
org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1142)
: > ~[?:?]
: > at
: > org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:729)
: > ~[?:?]
: > at
: > 
org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:620)
: > ~[?:?]
: > at
: > 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:239)
: > ~[?:?]
: > at
: > 
org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432)
: > ~[?:?]
: > at
: > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1530)
: > ~[?:?]
: > at
: > org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1519)
: > ~[?:?]
: > at
: > 
org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:1046)
: > ~[?:?]
: > at
: > 
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:416)
: > ~[?:?]
: > at
: > 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:369)
: > ~[?:?]
: > at
: > 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:300)
: > ~[?:?]
: > ... 80 more
: >
: > I have double checked all the stop filters in my schema.xml in my
: > configset, and they all seem fine. The import should only be using
: > text_general, which is configured like this:
: > <fieldType name="text_general" class="solr.TextField"
: > positionIncrementGap="100" multiValued="true">
: >   <analyzer type="index">
: >     <tokenizer class="solr.StandardTokenizerFactory"/>
: >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
: >     <filter class="solr.LowerCaseFilterFactory"/>
: >   </analyzer>
: >   <analyzer type="query">
: >     <tokenizer class="solr.StandardTokenizerFactory"/>
: >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
: >     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
: >     <filter class="solr.LowerCaseFilterFactory"/>
: >   </analyzer>
: > </fieldType>
: >
: > I can't figure out what the problem is, or how to do more detailed
: > debugging to find it. Any help would be greatly appreciated.
: >
: >
: 

-Hoss
http://www.lucidworks.com/


Core Admin API - rename core issue on 8.11

2022-06-08 Thread Shamik Bandopadhyay
Hi,

  After upgrading from 7.5 to 8.11, the core admin API for renaming a core
has stopped working. For example, when I try to run this,

https://internal-kp-stage.test.com/solr/admin/cores?action=RENAME&core=knowledge_shard1_replica_n1&other=knowledge

It throws the following error.

{
  "responseHeader": {
"status": 400,
"QTime": 140
  },
  "error": {
"metadata": [
  "error-class",
  "org.apache.solr.common.SolrException",
  "root-error-class",
  "org.apache.solr.common.SolrException"
],
"msg": "Not supported in SolrCloud",
"code": 400
  }
}

This call works fine in 7.5. As per the documentation, the API format
seems to be right.

https://solr.apache.org/guide/8_11/coreadmin-api.html#coreadmin-rename

Any pointers will be appreciated.

Thanks,
Shamik


Re: Core Admin API - rename core issue on 8.11

2022-06-08 Thread Shawn Heisey

On 6/8/2022 5:06 PM, Shamik Bandopadhyay wrote:

 "msg": "Not supported in SolrCloud",


Using the CoreAdmin API when running in cloud mode is a REALLY bad 
idea.  The CoreAdmin API cannot touch information in zookeeper. With 
part of the information for a SolrCloud collection being in zookeeper 
and part of it on the disk in the core.properties files, it causes 
serious problems to try and use that API in cloud mode.


Don't do it.  Use the Collections API to make modifications to your indexes.

Looks like we made it explicitly not supported at some point in 8.x, and 
I can tell you that is a decision that will NOT be reversed.  We've seen 
too many people break a working SolrCloud install completely by using 
the CoreAdmin API to make changes.


If you are absolutely certain that the change you are wanting to make 
will not break the collection (and I am pretty sure that renaming a core 
would break something) then you can manually modify the core.properties 
file and restart Solr.  You might also need to update the collection 
information in zookeeper, which is why CoreAdmin has been disabled in 
cloud mode.
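If the goal is just a friendlier name, the Collections API gained a RENAME 
action in 8.1 which works by creating an alias rather than touching core 
names on disk, so it is safe in cloud mode. Roughly (host and collection 
names are placeholders; check the ref guide for your version):

```shell
curl 'http://host:port/solr/admin/collections?action=RENAME&name=oldCollection&target=newCollection'
```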


Thanks,
Shawn



Re: Core Admin API - rename core issue on 8.11

2022-06-08 Thread Shamik Bandopadhyay
Hi Shawn,

  Thanks for the insight. As you've mentioned, renaming the core in the
core.properties file does create unwanted consequences. I did give it a
try in a test environment earlier. Renaming the core is not essential for
us; it's just to add some convenience for a few folks using the Solr UI to
browse the data. It'll be good to have the documentation updated at
some point to remove the unsupported core admin APIs.

Appreciate your help.

Thanks,
Shamik

On Wed, Jun 8, 2022 at 4:21 PM Shawn Heisey  wrote:

> On 6/8/2022 5:06 PM, Shamik Bandopadhyay wrote:
> >  "msg": "Not supported in SolrCloud",
>
> Using the CoreAdmin API when running in cloud mode is a REALLY bad
> idea.  The CoreAdmin API cannot touch information in zookeeper. With
> part of the information for a SolrCloud collection being in zookeeper
> and part of it on the disk in the core.properties files, it causes
> serious problems to try and use that API in cloud mode.
>
> Don't do it.  Use the Collections API to make modifications to your
> indexes.
>
> Looks like we made it explicitly not supported at some point in 8.x, and
> I can tell you that is a decision that will NOT be reversed.  We've seen
> too many people break a working SolrCloud install completely by using
> the CoreAdmin API to make changes.
>
> If you are absolutely certain that the change you are wanting to make
> will not break the collection (and I am pretty sure that renaming a core
> would break something) then you can manually modify the core.properties
> file and restart Solr.  You might also need to update the collection
> information in zookeeper, which is why CoreAdmin has been disabled in
> cloud mode.
>
> Thanks,
> Shawn
>
>


Re: Solr compatibility with Oracle Database 19c Database

2022-06-08 Thread dmitri maziuk

On 2022-06-08 3:01 PM, Andy Lester wrote:




On Jun 8, 2022, at 2:35 PM, Yennam, M  wrote:

We are currently using Solr 4.9.0 which is connecting Oracle 12cR1 and we are 
planning to upgrade our Database to Oracle 19c. So, the question that I have is 
– Is SOLR 4.9.0 compatible with Oracle 19c, if not what is the minimum version 
of SOLR that supports Oracle 19c database.


How are you getting data from Oracle into Solr? Are you using the 
DataImportHandler? If you’re not using the DIH, then I don’t think you’re 
connecting to Oracle directly, and then it’s a non-issue.



I expect DIH just talks to the Oracle JDBC jar, and if there are any 
incompatibilities, they would be solved by updating that jar. It's just 
a guess though.
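For reference, the Oracle side of a DIH setup is just a JdbcDataSource 
declaration in the DIH config, so the driver jar is the only Oracle-specific 
piece. A sketch, with all connection details as placeholders:

```xml
<dataConfig>
  <!-- driver class and URL use the standard Oracle thin-driver form;
       host, service, credentials, and the query are placeholders -->
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/SERVICE"
              user="solr_user" password="..."/>
  <document>
    <entity name="doc" query="SELECT id, title FROM documents">
      <field column="ID" name="id"/>
      <field column="TITLE" name="title"/>
    </entity>
  </document>
</dataConfig>
```

Swapping the jar on the classpath for a 19c-compatible one is then the whole upgrade, assuming the SQL itself still runs.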


Dima