Recommended Java version on Ubuntu 20.04, GC

2023-01-26 Thread Adam Sjøgren
  Hi,


We have a Solr Cloud cluster running 8.11.2 on 16 servers that have just
been upgraded to Ubuntu 20.04 (from 18.04).

It looks like we are getting shards down/recovering more often than
previously, so I'm wondering what version of Java and which garbage
collector is recommended on Ubuntu 20.04?

On https://solr.apache.org/docs/8_11_2/SYSTEM_REQUIREMENTS.html it says
Java 8 or higher - we are running 11.0.17+8-1ubuntu2~20.04.

That page also links to
https://cwiki.apache.org/confluence/display/lucene/JavaBugs which very
clearly says not to use the G1 garbage collector and that the page isn't
outdated. Hm.


  Best regards,

Adam

-- 
 "The laws of perspective have been repealed!   Adam Sjøgren
  Objects no longer diminish in size with distance!"   a...@koldfront.dk



Re: Recommended Java version on Ubuntu 20.04, GC

2023-01-26 Thread Jan Høydahl
Can you say something about the root cause of the Solr processes crashing? Are
they killed by Linux?
Which version of Java did you run on 18.04? Were other changes made at the same
time, such as OS-level settings for ulimits, vm.max_map_count, swappiness, etc.?

If you have not fine-tuned your JVM settings, it is recommended to run with the
default JVM/GC settings. G1 in the latest Java 11 should be fine.

Jan
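
The OS-level settings mentioned above can be collected on each host and
compared. A minimal sketch, assuming a Linux host with /proc mounted; the
function names are made up for illustration, and the values shown are those
of the invoking shell, not necessarily of the running Solr process:

```shell
#!/bin/sh
# Print the content of a /proc file, or "unavailable" if it cannot be read.
read_proc() {
    if [ -r "$1" ]; then cat "$1"; else echo "unavailable"; fi
}

# Report the settings named in the message above, plus the kernel version.
report_os_settings() {
    echo "open files (ulimit -n): $(ulimit -n)"
    echo "vm.max_map_count: $(read_proc /proc/sys/vm/max_map_count)"
    echo "vm.swappiness: $(read_proc /proc/sys/vm/swappiness)"
    echo "kernel: $(uname -r)"
}

report_os_settings
```

Running this as the Solr user on several hosts makes differences easy to
spot. The limits of an already running process can be read from
/proc/<pid>/limits instead.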





Re: Recommended Java version on Ubuntu 20.04, GC

2023-01-26 Thread Adam Sjøgren
Jan writes:

> Can you say something about the root cause for solr processes to
> crash? Are they killed by Linux?

They are usually not crashing; I "just" see shards go into the
down/recovery state.

Some of the time they recover without intervention; at other times a
shard or two stays down and only recovers if I restart the affected Solr
instance.

Usually the load on the affected server is high, and the log files show
more GC activity when this happens.

> Which version of Java did you run on 18.04?

We were running Java 11 on Ubuntu 18.04 as well, so that would have been
11.0.17+8-1ubuntu2~18.04.

> Other changes done at the same time, such as OS-level settings for
> ulimits, vm.max_map_count, swappiness etc.

No other changes - just everything that the Ubuntu 18.04 to 20.04
upgrade did, including the Linux kernel version. That makes it a little
hard to pin down, so I was fishing for known Ubuntu 20.04 gotchas :-)

> If you have not fine-tuned your JVM settings, it is recommended to run
> with the default JVM/GC settings. G1 in latest Java-11 should be fine.

A long time ago we adjusted according to the suggestions on
https://cwiki.apache.org/confluence/display/solr/ShawnHeisey#ShawnHeisey-G1(GarbageFirst)Collector

So we are running with:

# These GC settings have shown to work well for a number of common Solr workloads
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:MaxGCPauseMillis=250 \
-XX:+UseLargePages \
-XX:AutoBoxCacheMax=2 \
-XX:BiasedLockingStartupDelay=500 \
-XX:G1HeapRegionSize=32m \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/log/solr \
"

and SOLR_HEAP="12g".


  Best regards,

Adam

-- 
 "Tears in waves      Adam Sjøgren
  Lights on fire"     a...@koldfront.dk
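
To check whether the extra GC activity described above shows up as long
pauses, the GC log can be filtered for pauses above the 250 ms target set
in GC_TUNE. A rough sketch, assuming Java 11 unified GC logging (-Xlog:gc*),
where each completed pause is logged on a line that contains "Pause" and
ends in a duration such as 35.123ms. The function name and the log path in
the usage comment are assumptions:

```shell
#!/bin/sh
# Read GC log lines on stdin and print the completed pauses that took
# longer than the given number of milliseconds.
long_gc_pauses() {
    threshold_ms="$1"
    awk -v limit="$threshold_ms" '
        # Completed pauses end in "<number>ms"; pause start lines do not.
        / Pause / && $NF ~ /^[0-9.]+ms$/ {
            ms = $NF
            sub(/ms$/, "", ms)
            if (ms + 0 > limit + 0) print
        }'
}

# Usage (the path is an assumption, adjust it to the real log location):
#   long_gc_pauses 250 < /var/log/solr/solr_gc.log
```

Comparing the number of matching lines around the time a shard went into
recovery with a quiet period gives a first indication of whether GC pauses
are involved.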



Re: Recommended Java version on Ubuntu 20.04, GC

2023-01-26 Thread Shawn Heisey

On 1/26/23 06:26, Adam Sjøgren wrote:

> We have a Solr Cloud cluster running 8.11.2 on 16 servers that have just
> been upgraded to Ubuntu 20.04 (from 18.04).
>
> It looks like we are getting shards down/recovering more often than
> previously, so I'm wondering what version of Java and which garbage
> collector is recommended on Ubuntu 20.04?
>
> On https://solr.apache.org/docs/8_11_2/SYSTEM_REQUIREMENTS.html it says
> Java 8 or higher - we are running 11.0.17+8-1ubuntu2~20.04.
>
> That page also links to
> https://cwiki.apache.org/confluence/display/lucene/JavaBugs which very
> clearly says not to use the G1 garbage collector and that the page isn't
> outdated. Hm.


Solr 8.x and later uses G1 by default.  I haven't seen any problems with 
it, even though Lucene recommends not using it.


For 8.x, I would use OpenJDK 11.  For 9.x, OpenJDK 17.

17 might work with 8.x, but its minimum requirement is Java 8.  Jumping
many major Java versions beyond the minimum requirement might cause
problems.  The latest version of Solr has a workaround for a problem
with Java 17, but I don't think that workaround is there for 8.11.x.


I would choose the ZGC collector in most cases.  But I have noticed that 
indexing throughput is a little bit better with G1 than ZGC.  If every 
little bit of indexing speed is critical, stick with G1.


Thanks,
Shawn
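
For trying ZGC on Java 11, a starting point for solr.in.sh could look like
the following. This is an untested sketch, not a setting recommended in this
thread. In Java 11, ZGC is still an experimental option, so it has to be
unlocked explicitly. The G1-specific flags from the earlier configuration
are left out, and the remaining flags are kept to a minimum on purpose:

```shell
# Assumption: the installed Java 11 build includes ZGC (Linux x64 only).
GC_TUNE=" \
-XX:+UnlockExperimentalVMOptions \
-XX:+UseZGC \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/log/solr \
"
```

Whether the installed JVM accepts these flags can be checked beforehand
with: java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -version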


How to create indexes for CSV dataset (with schema) in Java

2023-01-26 Thread marc nicole
Hello - I want to know how it is possible to create indexes for a certain
attribute (and its different values) of a document, using Java.

Thanks.


Re: How to create indexes for CSV dataset (with schema) in Java

2023-01-26 Thread Shawn Heisey

On 1/26/2023 2:22 PM, marc nicole wrote:

> Hello - I want to know how it is possible to create indexes for a certain
> attribute (and its different values) of a document, using Java.


This is a pretty vague question.  We need details of exactly what you're 
trying to do and what the environment looks like.


Thanks,
Shawn



Re: Recommended Java version on Ubuntu 20.04, GC

2023-01-26 Thread Adam Sjøgren
Shawn writes:

> Solr 8.x and later uses G1 by default.  I haven't seen any problems
> with it, even though Lucene recommends not using it.

Ok, good.

> For 8.x, I would use OpenJDK 11.  For 9.x, OpenJDK 17.

Sounds like we're on the right track version-wise then.

> I would choose the ZGC collector in most cases.  But I have noticed
> that indexing throughput is a little bit better with G1 than ZGC.  If
> every little bit of indexing speed is critical, stick with G1.

I will happily trade throughput for more stability (replicas not going
into recovering or down state) currently, so I think we will give ZGC a
go.

Any tips on reasonable settings for ZGC on an 80-node Solr cloud with ~3B
documents in a handful of collections and quite a lot of updates?


Thanks for the input, all!


  Best regards,

Adam

-- 
 "Why do you put your answer below the question and   Adam Sjøgren
  trim the quoted text?" "It's about having minimal   a...@koldfront.dk
  courtesy to your readers by not forcing them to
  re-read stuff they just read."



Re: How to create indexes for CSV dataset (with schema) in Java

2023-01-26 Thread marc nicole
In Java, I want to create an index on a collection named person, for
example on an "Age" attribute for values between 20 and 35, so that when I
query Solr for those values it retrieves them quickly. How do I do that?

On Fri, 27 Jan 2023 at 03:42, Shawn Heisey wrote:

> On 1/26/2023 2:22 PM, marc nicole wrote:
> > Hello - I want to know how it is possible to create indexes for a certain
> > attribute (and its different values) of a document, using Java.
>
> This is a pretty vague question.  We need details of exactly what you're
> trying to do and what the environment looks like.
>
> Thanks,
> Shawn
>
>
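
Solr indexes a field as a whole rather than a particular range of its
values, so the request above comes down to two steps: declare Age as an
indexed numeric field, then query it with a range filter. The question asks
about Java, but the underlying calls are plain HTTP, so they are sketched
here with curl; a Java client would send the same requests. Assumptions: a
Solr instance at localhost:8983, a collection named person with a managed
schema, and the pint field type from the default configset. The function
names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical values: adjust host, port and collection to the real setup.
SOLR_URL="http://localhost:8983/solr/person"

# Build a Solr range filter, e.g. "Age:[20 TO 35]" (both bounds inclusive).
age_range_filter() {
    printf 'Age:[%s TO %s]' "$1" "$2"
}

# Declare Age as an indexed integer point field through the Schema API.
add_age_field() {
    curl -s -X POST -H 'Content-type:application/json' \
        --data-binary '{"add-field":{"name":"Age","type":"pint","indexed":true,"stored":true}}' \
        "$SOLR_URL/schema"
}

# Return the documents whose Age lies in the given range.
query_age_range() {
    curl -s -G "$SOLR_URL/select" \
        --data-urlencode 'q=*:*' \
        --data-urlencode "fq=$(age_range_filter "$1" "$2")"
}

# Usage (needs a running Solr):
#   add_age_field
#   query_age_range 20 35
```

The range is passed as a filter query (fq) rather than in q, so Solr can
cache the matching document set independently of the main query.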