Has this approach of importing all previous years' statistics into the single "statistics" Solr core worked for others who have a lot of stats? For the last few days I've been trying to import all of the exported statistics files listed below after renaming the beginning of each CSV file to "statistics-XXXX-...". No matter how high I set the `http.socket.timeout` parameter in Solr, I get the SocketTimeoutException below when importing the last ZIP file (statistics.zip).
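For anyone following along, here is a minimal sketch of the rename step as it is described further down this thread. It is not a tested migration script. It assumes export files named like statistics-2012_export_2013-12_5.csv (the example James gives) and strips the four-digit shard year so each name starts with statistics_export. The function names strip_shard_year and rename_exports are made up for illustration.

```shell
#!/bin/sh
# Sketch only: drop the per-year shard suffix from exported CSV names so that
# every file targets the single "statistics" core.
# Assumed naming pattern: statistics-YYYY_export_<date>_<n>.csv

# Print the new name for one file name (hypothetical helper).
# Names without a "-YYYY_export" suffix are printed unchanged.
strip_shard_year() {
  printf '%s\n' "$1" |
    sed 's/^statistics-[0-9][0-9][0-9][0-9]_export/statistics_export/'
}

# Rename every matching CSV in the directory passed as $1 (hypothetical helper).
rename_exports() {
  dir=$1
  for f in "$dir"/statistics-*_export*.csv; do
    [ -e "$f" ] || continue            # the glob matched nothing
    base=$(basename "$f")
    new=$(strip_shard_year "$base")
    if [ "$base" = "$new" ]; then
      continue                         # nothing to strip
    fi
    if [ -e "$dir/$new" ]; then
      # Do not overwrite a file that already carries the target name.
      echo "skipping $base: $new already exists" >&2
      continue
    fi
    mv -- "$f" "$dir/$new"
    echo "$base -> $new"
  done
}

# Example call, using the export directory mentioned in this thread:
# rename_exports /opt/dspace/solr-export
```

Running `rename_exports /opt/dspace/solr-export` before `bin/dspace solr-import-statistics -i statistics` would match the paths used elsewhere in this thread. The existence check is there because two shards could in principle produce the same target name, and a plain mv could then overwrite one of them.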
I'm working with the most recent code on the main branch of the DSpace repository. I've increased the Java memory given to Solr to 2GB and added the same amount to the `bin/dspace` command, but that didn't seem to help, and in some cases made things worse. At the time that I get the socket timeout error and the import-statistics process stops running the "statistics" core usually has anywhere from 20-30 million docs in the index.

Error message:

Problem encountered while trying to import index statistics.
org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://localhost:8983/solr/statistics
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:692)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
    at org.dspace.util.SolrImportExport.importIndex(SolrImportExport.java:465)
    at org.dspace.util.SolrImportExport.main(SolrImportExport.java:148)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:277)
    at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:133)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:98)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
    at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
    at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
    at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
    at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:571)
    ... 12 more

Statistics files:

 52MB Jun 7 08:54 statistics-2014.zip
130MB Jun 7 08:58 statistics-2015.zip
222MB Jun 7 09:05 statistics-2017.zip
 46MB Jun 7 09:06 statistics-2018.zip
300MB Jun 7 09:15 statistics-2019.zip
273MB Jun 7 09:22 statistics-2020.zip
415MB Jun 7 09:36 statistics-2021.zip
 30MB Jun 7 09:37 statistics-2022.zip
687MB Jun 7 10:02 statistics.zip

Thanks,
Nick

On Thursday, April 13, 2023 at 2:13:00 PM UTC-5 Tim Donohue wrote:

> Thanks James & Tomas for sharing your hints/tips here! It's obvious we
> didn't document this very well in the DSpace 7 Upgrade process. Just now,
> I've done my best to summarize your advice & add more hints in Step 10(a)
> of the Upgrade process to help others along. I even linked folks back to
> this useful dspace-tech thread for more details.
>
> https://wiki.lyrasis.org/display/DSDOC7x/Upgrading+DSpace
>
> If others have hints/tips to share, please do feel free to continue this
> thread, or add comments to the docs & we'll get them taken into account.
>
> Tim
>
> On Thursday, April 13, 2023 at 1:54:47 PM UTC-5 ha...@oakland.edu wrote:
>
>> Thanks for the information. I had similar information sent to me by
>> another person on the list so it seems we are all approaching this about
>> the same way. I think I have my statistics imported at this point.
>> To rename the statistics files I wound up using the following one liner:
>> cd /opt/dspace/solr-export; for i in $(ls *.csv); do nf=$(echo $i | sed 's/-20[1-2][0-9]_e/_e/g'); mv -v $i $nf; done
>>
>> Another suggestion was to use rename (e.g. rename
>> "s/-2015_export/_export/g" *) but for whatever reason that was not
>> working for me on my RHEL 8.7 server.
>> After renaming the files and running /opt/dspace/bin/dspace
>> solr-import-statistics -i statistics it looks like I have the statistics
>> imported.
>>
>> Thanks for the help.
>> -Tomas
>>
>> On Tue, Apr 4, 2023 at 11:25 AM James Holobetz <jhol...@gmail.com> wrote:
>>
>>> Hi Tomas,
>>>
>>> I recently had this issue and I believe that I have found a solution,
>>> which I will document in the next few days. The long and the short of it is
>>> that DSpace 7 does not support solr shards. You have to create one
>>> large solr shard (statistics) from the multiple shards. The
>>> biggest problem I found doing this was that DSpace was only ingesting the
>>> current year statistics only. The solution was to rename the *csv files
>>> that are dumped by solr-export-statistics. For example: the csv files
>>> for the solr core "statistics-2012" will look something like this --
>>> statistics-2012_export_2013-12_5.csv. You have to rename all the csv
>>> files to remove the -2012 in the filename to look like
>>> this: statistics_export_2013-12_5.csv. I downloaded the zipped up cores in
>>> csv form to my windows machine so I could use a bulk rename tool to remove
>>> the year suffix in each core. I then uploaded them to my linux box running
>>> DSpace and ingested each one using the solr-import-statistics tool.
>>> This is a very time consuming task.
>>>
>>> Hope this helps and I will try to document this in the next few days.
>>>
>>> Best regards,
>>>
>>> James Holobetz
>>>
>>> On Fri, Mar 24, 2023 at 3:37 PM Tomas Hajek <ha...@oakland.edu> wrote:
>>>
>>>> Hello,
>>>> I am working on migrating a DSpace 5.10 installation to a new server
>>>> running DSpace 7.5. I have the basic installation running on RHEL 8.7
>>>> with
>>>> Tomcat 9.0.71, Solr 8.11.2, node.js 16.18.1, and pm2 5.2.2.
>>>> I was able to import the database and assetstore and I set up the Solr
>>>> cores (authority,oai,search,statistics) from the installation instructions.
>>>> The Solr statistics from the 5.10 installation are sharded by year
>>>> and I exported with the following:
>>>>
>>>> bin/dspace solr-export-statistics -i statistics-2015
>>>> bin/dspace solr-export-statistics -i statistics-2016
>>>> ...
>>>> bin/dspace solr-export-statistics -i statistics-2022
>>>>
>>>> I have copied the exported files to the new 7.5 server
>>>> into /opt/dspace/solr-export and am trying to import them but I get the
>>>> following error (example when trying to import the 2015 statistics):
>>>>
>>>> /opt/dspace/bin/dspace solr-import-statistics -i statistics-2015
>>>> Exception: Error from server at http://localhost:8983/solr/statistics-2015: Expected mime type application/octet-stream but got text/html. <html>
>>>> <head>
>>>> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
>>>> <title>Error 404 Not Found</title>
>>>> </head>
>>>> <body><h2>HTTP ERROR 404 Not Found</h2>
>>>> <table>
>>>> <tr><th>URI:</th><td>/solr/statistics-2015/admin/luke</td></tr>
>>>> <tr><th>STATUS:</th><td>404</td></tr>
>>>> <tr><th>MESSAGE:</th><td>Not Found</td></tr>
>>>> <tr><th>SERVLET:</th><td>default</td></tr>
>>>> </table>
>>>>
>>>> </body>
>>>> </html>
>>>>
>>>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/statistics-2015: Expected mime type application/octet-stream but got text/html. <html>
>>>> <head>
>>>> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
>>>> <title>Error 404 Not Found</title>
>>>> </head>
>>>> <body><h2>HTTP ERROR 404 Not Found</h2>
>>>> <table>
>>>> <tr><th>URI:</th><td>/solr/statistics-2015/admin/luke</td></tr>
>>>> <tr><th>STATUS:</th><td>404</td></tr>
>>>> <tr><th>MESSAGE:</th><td>Not Found</td></tr>
>>>> <tr><th>SERVLET:</th><td>default</td></tr>
>>>> </table>
>>>>
>>>> </body>
>>>> </html>
>>>>
>>>> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:635)
>>>> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
>>>> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
>>>> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
>>>> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
>>>> at org.dspace.util.SolrImportExport.getMultiValuedFields(SolrImportExport.java:482)
>>>> at org.dspace.util.SolrImportExport.importIndex(SolrImportExport.java:433)
>>>> at org.dspace.util.SolrImportExport.main(SolrImportExport.java:148)
>>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>>>> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>>>> at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:277)
>>>> at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:133)
>>>> at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:98)
>>>>
>>>> Presumably this is due to not having the sharded statistics-20## cores
>>>> in Solr configured but I'm not sure at this point how to add and configure
>>>> them
>>>> so I can import the statistics. I am not very familiar with Solr.
>>>>
>>>> Can anyone enlighten me on how I might do this or correct my steps or
>>>> let me know what else to look at.
>>>>
>>>> Any assistance would be greatly appreciated.
>>>> Thank you,
>>>> -Tomas
>>>>
>>>> --
>>>> All messages to this mailing list should adhere to the Code of Conduct:
>>>> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "DSpace Technical Support" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to dspace-tech...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/dspace-tech/CAPx-GQoBmwVH6byhm%2BZv4kg%3D5zmEH%3DQStGL-y1TTD%3D8qBQFo1w%40mail.gmail.com
>>>>
>>>
>>
>> --
>>
>> Tomas Hajek
>> ha...@oakland.edu
>> 1-248-370-3505
>> Assistant Director, Research Computing and Infrastructure
>> Engineering
>> University Technology Services
>> Oakland University
>>
>

--
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/c23c20fa-64c5-466c-8246-be2c2c9ea1ebn%40googlegroups.com.