Java Versions and GC Tuning

2023-12-14 Thread Walter Underwood
Thanks for the recommendation. Are you running this on Intel or ARM64? We’ve 
mostly moved to ARM64. —wunder

> On Dec 12, 2023, at 9:55 AM, Shawn Heisey  wrote:
> 
> Java 11 is a good solid choice.  Java 17 seems to perform a little better 
> than 11 on Solr 9.x, but I haven't actually measured it.  Java 8, 11, 17, and 
> 21 are the current LTS releases.
> 
> https://en.wikipedia.org/wiki/Java_version_history
> 
> I personally wouldn't trust Solr 8.x with Java 17, as that's a jump of 9 
> major Java versions ... but others have done it and I haven't seen reports of 
> problems.  I use Java 17 with Solr 9.x on Ubuntu.
> 
> With Java 11 or later, I think it would be a good idea to switch garbage 
> collection to ZGC.  I have this in my /etc/default/solr.in.sh:
> 
> GC_TUNE=" \
>  -XX:+UnlockExperimentalVMOptions \
>  -XX:+UseZGC \
>  -XX:+ParallelRefProcEnabled \
>  -XX:+ExplicitGCInvokesConcurrent \
>  -XX:+AlwaysPreTouch \
>  -XX:+UseNUMA \
> "



Re: Solr connection refused error

2023-12-14 Thread Shawn Heisey

On 12/13/23 22:25, Deepak Goel wrote:

On Thu, 14 Dec 2023, 10:11 Anuj Bhargava,  wrote:


There are some files in /var/solr/logs. For example -
solr_oom_killer-8983-2023-12-14_05_30_42.log

and this contains -

Running OOM killer script for process 2861377 for Solr on port 8983
Killed process 2861377



Looks like you are running out of memory. Increase your heap size.


There are several resource depletion scenarios that cause OOME ... and 
only a subset of those scenarios involve memory.


We need to see the reason for the OOME in the exception to know what 
steps to take.


Solr versions before 9.2.0 may not log that exception.  In 9.2.0 the 
OOME handling was updated to something that ALWAYS logs the reason for 
the OOME.
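
For example, a quick way to surface that reason is to scan the log for the OutOfMemoryError line. This is just a minimal Python sketch, assuming the /var/solr/logs location mentioned earlier in the thread; adjust the path for your install:

# Sketch: print any OutOfMemoryError lines from solr.log so the exhausted
# resource (heap, metaspace, native threads, ...) becomes visible.
# The log path is an assumption based on the directory mentioned above.
from pathlib import Path

log = Path("/var/solr/logs/solr.log")
for line in log.read_text(errors="replace").splitlines():
    if "OutOfMemoryError" in line:
        print(line)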


I do not see any information here about what version of Solr is running.

Thanks,
Shawn



Re: Java Versions and GC Tuning

2023-12-14 Thread Shawn Heisey

On 12/14/23 08:37, Walter Underwood wrote:

Thanks for the recommendation. Are you running this on Intel or ARM64? We’ve 
mostly moved to ARM64. —wunder


This is on x64.

When ZGC was first released, it only worked on Linux/x64.

In Java 17, the supported architectures are:

Linux/x64
Linux/AArch64
macOS/x64
macOS/AArch64
Windows/x64
Windows/AArch64

Thanks,
Shawn



Re: Java Upgrade process

2023-12-14 Thread Shawn Heisey

On 12/13/23 01:15, Jim Morgan wrote:

So, just as a follow up, I'm going to post my BEFORE and AFTER solr.in.sh
configs in case I've removed anything vital. Any comments welcome, as they
don't really mean much to me.
=== BEFORE 
SOLR_HEAP="8000m"

GC_LOG_OPTS="-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails \
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime"

GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"


The solr.in.sh file included with Solr does NOT have GC_LOG_OPTS or 
GC_TUNE defined, so those had to have been added by whoever set up your 
Solr install.


You should remove them entirely and let Solr use its default settings 
for GC logging and GC tuning, unless you would like to switch to ZGC, in 
which case you can use the GC_TUNE that I provided elsewhere in this thread.


Thanks,
Shawn



Re: Java Versions and GC Tuning

2023-12-14 Thread Shawn Heisey

On 12/14/23 12:47, Shawn Heisey wrote:

On 12/14/23 08:37, Walter Underwood wrote:
Thanks for the recommendation. Are you running this on Intel or ARM64? 
We’ve mostly moved to ARM64. —wunder


This is on x64.

When ZGC was first released, it only worked on Linux/x64.

In Java 17, the supported architectures are:

     Linux/x64
     Linux/AArch64
     macOS/x64
     macOS/AArch64
     Windows/x64
     Windows/AArch64


On closer read, that list for Java 17 is applicable to Java itself, not 
necessarily ZGC.


But I would imagine that ZGC is present in the ARM version of Java.  If 
not, Solr probably won't start with ZGC options.


Thanks,
Shawn



Re: Java Versions and GC Tuning

2023-12-14 Thread Shawn Heisey

On 12/14/23 12:58, Shawn Heisey wrote:
On closer read, that list for Java 17 is applicable to Java itself, not 
necessarily ZGC.


And on a third read, I figured out that the list DOES apply to ZGC. 
Apologies for the noise!


Thanks,
Shawn



Re: Java Versions and GC Tuning

2023-12-14 Thread Shawn Heisey

On 12/14/23 13:04, Shawn Heisey wrote:

On 12/14/23 12:58, Shawn Heisey wrote:
On closer read, that list for Java 17 is applicable to Java itself, 
not necessarily ZGC.


And on a third read, I figured out that the list DOES apply to ZGC. 
Apologies for the noise!


They do have a nice clear table for what's supported and when that 
support became available:


https://wiki.openjdk.org/display/zgc#Main-SupportedPlatforms



Re: Java Versions and GC Tuning

2023-12-14 Thread Walter Underwood
Running this with Corretto 17 on my Apple Silicon MacBook:

java -XX:+PrintFlagsFinal

I see this line in the 500+ lines of output:

     bool UseZGC                               = false                                  {product} {default}

If it was platform dependent, it would say {pd product} instead of {product}.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 14, 2023, at 12:07 PM, Shawn Heisey  
> wrote:
> 
> On 12/14/23 13:04, Shawn Heisey wrote:
>> On 12/14/23 12:58, Shawn Heisey wrote:
>>> On closer read, that list for Java 17 is applicable to Java itself, not 
>>> necessarily ZGC.
>> And on a third read, I figured out that the list DOES apply to ZGC. 
>> Apologies for the noise!
> 
> They do have a nice clear table for what's supported and when that support 
> became available:
> 
> https://wiki.openjdk.org/display/zgc#Main-SupportedPlatforms
> 



Re: Solr operator and side car container

2023-12-14 Thread Joel Bernstein
We found the place to add the side car. Seems to be working as expected.


Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Dec 12, 2023 at 10:35 AM Joel Bernstein  wrote:

> Hi,
>
> Is there any facility in the Solr operator for adding a sidecar container
> that runs alongside the Solr container? I see there is an initContainer
> but I don't see anything for adding another container.
>
> Thanks,
> Joel
>
>


Re: Java Versions and GC Tuning

2023-12-14 Thread Shawn Heisey

On 12/14/2023 13:41, Walter Underwood wrote:

Running this with Corretto 17 on my Apple Silicon MacBook:

java -XX:+PrintFlagsFinal

I see this line in the 500+ lines of output:

     bool UseZGC                               = false                                  {product} {default}

If it was platform dependent, it would say {pd product} instead of {product}.


That seems very good.  I see the same on Linux/x64 OpenJDK.  I think on 
Java 17, you don't even need the -XX:+UnlockExperimentalVMOptions which 
I think is required on Java 11.




Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-14 Thread Ishan Chattopadhyaya
If you're able to do multithreaded indexing, it will go much faster.
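
For illustration, here is a minimal sketch of that idea, assuming pysolr (which the quoted script below appears to use); the core URL is taken from the thread and the batches are placeholders, so treat this as a pattern rather than a drop-in fix:

# Sketch: index batches in parallel with a thread pool and commit once at the
# end instead of after every batch. pysolr and the core URL are assumptions
# based on the quoted script.
from concurrent.futures import ThreadPoolExecutor
from pysolr import Solr

SOLR_URL = "http://localhost:8983/solr/p4a"

def index_batch(docs):
    # A client per call keeps the example free of thread-safety questions.
    Solr(SOLR_URL, always_commit=False).add(docs)
    return len(docs)

# Placeholder batches; in the real script these come from the Postgres query.
batches = [
    [{"id": str(i), "name": f"doc {i}"} for i in range(start, start + 1000)]
    for start in range(0, 5000, 1000)
]

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(index_batch, batches))

Solr(SOLR_URL).commit()
print(f"Indexed {total} documents")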

On Thu, 14 Dec, 2023, 6:51 pm Vince McMahon, 
wrote:

> Hi,
>
> I have written a custom Python program to index, which may provide better
> control than DIH.
>
> But it is still indexing at most 5000 documents. I have enabled debug
> logging after updating log4j2.xml, and I am counting documents after each
> batch of indexing.
>
> I would really appreciate your help figuring out why indexing stops at 5000
> documents.
>
> The solr.log is in the enclosed zip.
>
> try:
>     conn = psycopg2.connect(**postgres_connection)
>     cursor = conn.cursor()
>
>     # Replace "your_table" and "your_columns" with your actual table and columns
>     postgres_query = """
>         SELECT
>             ROW_NUMBER() over (order by update_at)
>             , id, name, update_at
>         FROM city
>         WHERE update_at >= %s
>         LIMIT %s OFFSET %s
>     """
>     StartDT = '2023-07-29 00:00:00.'
>
>     batch_size = 1000  # Set your desired batch size
>     offset = 0
>
>     while True:
>         rows = fetch_data_batch(cursor, postgres_query, StartDT, batch_size, offset)
>
>         # convert tuples to dicts, which is what solr wants
>         if rows:
>             print(f"Type of row: {type(rows[0])}")
>             print(f"Example row content: {rows[0]}")
>             # Get column names from cursor.description
>             column_names  = [desc[0] for desc in cursor.description]
>             rows_as_dicts = [dict(zip(column_names, row)) for row in rows]
>
>             print(f"Type of row after rows_as_dicts: {type(rows_as_dicts[0])}")
>
>         if not rows:
>             break  # No more data
>
>         # Connect to Solr
>         solr = Solr(solr_url, always_commit=True)
>
>         # Index data into Solr
>         docs = [
>             {
>                 # Assuming each row is a dictionary where keys are field names
>                 "id":        str(row_dict["id"]),
>                 "name":      row_dict["name"],
>                 "update_at": row_dict["update_at"].isoformat() if "update_at" in rows_as_dicts else None
>                 # Add more fields as needed
>             }
>             for row_dict in rows_as_dicts
>         ]
>
>         # Log the content of each document
>         for doc in docs:
>             logging.debug(f"Indexing document: {doc}")
>
>         # Index data into Solr and get the response
>         response = solr.add(docs)
>
>         solr_core = 'p4a'
>
>         # Construct the Solr select query URL
>         select_query_url = f"http://localhost:8983/solr/{solr_core}/select?q=*:*&rows=0&wt=json"
>
>         # Request the select query response
>         select_response = requests.get(select_query_url)
>
>         try:
>             # Try to parse the response as JSON
>             select_data = json.loads(select_response.text)
>
>             # Extract the number of documents from the response
>             num_docs = select_data.get("response", {}).get("numFound", -1)
>
>             print(f"Total number of documents in the core '{solr_core}': {num_docs}")
>
>         except json.decoder.JSONDecodeError as e:
>             print(f"Error decoding JSON response: {e}")
>             print(f"Response content: {select_response.text}")
>
>         # Log the outcome of the indexing operation
>         logging.info("Committing batch to Solr")
>
>         # Commit the changes to Solr
>         solr.commit()
>         time.sleep(1)
>         offset += batch_size
>
> finally:
>     # Close PostgreSQL connection
>     cursor.close()
>     conn.close()
>
> Example of content from Postgres:
> Type of row: <class 'tuple'>
> Example row content: (12001, 2175, '82a4343bc11b04f42ab309e161a0bdf3',
> datetime.datetime(2023, 7, 31, 9, 39, 13, 463165))
> Type of row after rows_as_dicts: <class 'dict'>
> Total number of documents in the core 'p4a': 5000
>
> SELECT
> ROW_NUMBER() over (order by update_at)
> , id, name, update_at
> FROM city
> WHERE update_at >= '2023-07-29 00:00:00.'
> LIMIT 1000 OFFSET 13000
>
> Type of row: <class 'tuple'>
> Example row content: (13001, 3089, '048e212232b13f05559c69a81a268f73',
> datetime.datetime(2023, 7, 31, 14, 22, 14, 1572))
> Type of row after rows_as_dicts: <class 'dict'>
> Total number of documents in the core 'p4a': 5000
>
> SELECT
> ROW_NUMBER() over (order by update_at)
> , id, name, update_at
> FROM city
> WHERE update_at >= '2023-07-29 00:00:00.'
>   

Re: my solr 8.11 is indexing 5000 only using custom code.

2023-12-14 Thread Mikhail Khludnev
NB: it's not easy to build a robust ETL from scratch (by the way, have you
asked Copilot or ChatGPT about it?).
I spot a few oddities in the code, but they are not critical.
From the log I see (FWIW, you still have DEBUG logging enabled) that 1000
records were added in roughly 17 seconds. That makes some sense. But then it
turns out 5000 records are returned on *:*. How could that be?
The exit condition `if not rows: break` is not clear to me. Why should it work?
What happens to the main paging loop? How does it stop? Does it?
Note that looping via LIMIT OFFSET is counterproductive. It's better to select
once and tweak the driver to pull page-by-page lazily; see the sketch after
this message.
Also, how many distinct ids do you have behind update_at >= '2023-07-29
00:00:00.'?
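
For what it's worth, here is a sketch of that lazy-paging pattern with a psycopg2 server-side (named) cursor, plus the distinct-id check; the table and columns come from the quoted script, while the connection parameters are placeholders:

# Sketch: run the SELECT once and stream rows lazily with a named (server-side)
# cursor instead of re-querying with LIMIT/OFFSET. Connection details are
# placeholders; table and columns come from the quoted script.
import psycopg2

conn = psycopg2.connect(dbname="mydb", user="me", password="secret", host="localhost")
start_dt = "2023-07-29 00:00:00"

with conn.cursor() as cur:
    # If this prints 5000, a numFound of 5000 simply means the same ids are
    # being overwritten batch after batch.
    cur.execute("SELECT COUNT(DISTINCT id) FROM city WHERE update_at >= %s", (start_dt,))
    print("distinct ids:", cur.fetchone()[0])

with conn.cursor(name="city_stream") as cur:  # a name makes it server-side
    cur.itersize = 1000
    cur.execute(
        "SELECT id, name, update_at FROM city WHERE update_at >= %s ORDER BY update_at",
        (start_dt,),
    )
    while True:
        rows = cur.fetchmany(1000)
        if not rows:
            break
        print(f"fetched {len(rows)} rows")  # build docs and solr.add(...) here

conn.close()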


On Fri, Dec 15, 2023 at 7:56 AM Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> If you're able to do multithreaded indexing, it will go much faster.
>
> On Thu, 14 Dec, 2023, 6:51 pm Vince McMahon, <
> sippingonesandze...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I have written a  custom python program to Index which may provide a
> > better control than DIH.
> >
> > But, it is still doing at most 5000 documentation.  I have enable debug
> > logging to show after updating the log4j2.xml  I am doing a count of
> > documents after each batch of indexing.
> >
> > I would really appreciate you help to fix the Indexing only 5000
> documents.
> >
> > The solr.log is in enclosed zip.
> >
> > try:
> > conn = psycopg2.connect(**postgres_connection)
> > cursor = conn.cursor()
> >
> > # Replace "your_table" and "your_columns" with your actual table and
> > columns
> > postgres_query = """
> > SELECT
> > ROW_NUMBER() over (order by
> update_at)
> > , id, name, update_at
> > FROM city
> > WHERE update_at >= %s
> > LIMIT %s OFFSET %s
> > """
> > StartDT = '2023-07-29 00:00:00.'
> >
> > batch_size = 1000  # Set your desired batch size
> > offset = 0
> >
> > while True:
> > rows = fetch_data_batch(cursor, postgres_query, StartDT,
> > batch_size, offset)
> >
> > # conver tuples to dicts, which is what solr wants
> > if rows:
> > print(f"Type of row: {type(rows[0])}")
> > print(f"Example row content: {rows[0]}")
> > # Get column names from cursor.description
> > column_names= [desc[0] for desc in
> cursor.description]
> > rows_as_dicts   = [dict(zip(column_names, row)) for row
> in
> > rows]
> >
> > print(f"Type of row after rows_as_dicts:
> {type(rows_as_dicts[0
> > ])}")
> >
> > if not rows:
> > break  # No more data
> >
> > # Connect to Solr
> > solr = Solr(solr_url, always_commit=True)
> >
> > # Index data into Solr
> > docs = [
> > {
> > # Assuming each row is a dictionary where keys are field
> > names
> > "id":   str(row_dict["id"]),
> > "name": row_dict["name"],
> > "update_at":row_dict["update_at"].isoformat() if
> > "update_at" in rows_as_dicts else None
> > # Add more fields as needed
> > }
> > for row_dict in rows_as_dicts
> > ]
> >
> >
> > # Log the content of each document
> > for doc in docs:
> > logging.debug(f"Indexing document: {doc}")
> >
> > # Index data into Solr and get the response
> > response = solr.add(docs)
> >
> > solr_core='p4a'
> >
> > # Construct the Solr select query URL
> > select_query_url = f"http://localhost:8983/solr/{solr_core}
> > /select?q=*:*&rows=0&wt=json"
> >
> > # Request the select query response
> > select_response = requests.get(select_query_url)
> >
> > try:
> > # Try to parse the response as JSON
> > select_data = json.loads(select_response.text)
> >
> > # Extract the number of documents from the response
> > num_docs = select_data.get("response", {}).get("numFound",
> -1)
> >
> > print(f"Total number of documents in the core '{solr_core}':
> {
> > num_docs}")
> >
> > except json.decoder.JSONDecodeError as e:
> > print(f"Error decoding JSON response: {e}")
> > print(f"Response content: {select_response.text}")
> >
> >
> > # Log the outcome of the indexing operation
> > *logging.info ("Committing batch to Solr")*
> >
> > # Commit the changes to Solr
> > *solr.commit()*
> > time.sleep(1)
> > offset += batch_size
> >
> > finally:
> > # Close PostgreSQL connection
> > cursor.close()
> > conn.close()
> >
> > Example of content from Postgres:
> > Type of row