Has this approach of importing all of the previous years' statistics into the 
"statistics" Solr core worked for others who have a lot of stats? For the 
last few days I've been trying to import all of the exported statistics 
files below after renaming the beginning of each CSV file to 
"statistics-XXXX-...", but no matter how high I set the 
`http.socket.timeout` parameter in Solr, I get the 
SocketTimeoutException error below when importing the last ZIP file 
(statistics.zip). 

I'm working with the most recent code on the main branch of the DSpace 
repository. I've increased the Java memory given to Solr to 2GB and given 
the same amount to the `bin/dspace` command, but that didn't seem to help, 
and in some cases made things worse. When the socket timeout error occurs 
and the import-statistics process stops, the "statistics" core usually has 
anywhere from 20 to 30 million docs in its index. 
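Since the import stops partway through, it may help to record the document count of the "statistics" core before and after each ZIP, so you know how far each run got. A minimal sketch, assuming `curl` is installed and Solr is at the base URL shown in the error below (http://localhost:8983/solr). It uses Solr's standard `select` handler with `rows=0`; the function name `statistics_doc_count` is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of documents in a Solr core.
# Assumes curl is installed; the default base URL is the one from the error message.
statistics_doc_count() {
    local base="${1:-http://localhost:8983/solr}" core="${2:-statistics}" json
    # rows=0 asks Solr only for the match count, not the documents themselves
    json=$(curl -sf "$base/$core/select?q=*:*&rows=0&wt=json") || {
        echo "could not query $base/$core" >&2
        return 1
    }
    # Extract the value of "numFound" from the JSON response
    printf '%s\n' "$json" | sed -n 's/.*"numFound":[[:space:]]*\([0-9]*\).*/\1/p'
}
```

For example, run `statistics_doc_count` before and after each `solr-import-statistics` call and compare the two numbers.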

Error message: 

Problem encountered while trying to import index statistics.
org.apache.solr.client.solrj.SolrServerException: Timeout occurred while waiting response from server at: http://localhost:8983/solr/statistics
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:692)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
at org.dspace.util.SolrImportExport.importIndex(SolrImportExport.java:465)
at org.dspace.util.SolrImportExport.main(SolrImportExport.java:148)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:277)
at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:133)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:98)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:283)
at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:309)
at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:571)
... 12 more


Statistics files:

 52MB Jun  7 08:54 statistics-2014.zip
130MB Jun  7 08:58 statistics-2015.zip
222MB Jun  7 09:05 statistics-2017.zip
 46MB Jun  7 09:06 statistics-2018.zip
300MB Jun  7 09:15 statistics-2019.zip
273MB Jun  7 09:22 statistics-2020.zip
415MB Jun  7 09:36 statistics-2021.zip
 30MB Jun  7 09:37 statistics-2022.zip
687MB Jun  7 10:02 statistics.zip

Thanks,
Nick


On Thursday, April 13, 2023 at 2:13:00 PM UTC-5 Tim Donohue wrote:

> Thanks James & Tomas for sharing your hints/tips here!  It's obvious we 
> didn't document this very well in the DSpace 7 Upgrade process.  Just now, 
> I've done my best to summarize your advice & add more hints in Step 10(a) 
> of the Upgrade process to help others along. I even linked folks back to 
> this useful dspace-tech thread for more details.
>
> https://wiki.lyrasis.org/display/DSDOC7x/Upgrading+DSpace
>
> If others have hints/tips to share, please do feel free to continue this 
> thread, or add comments to the docs & we'll get them taken into account.
>
> Tim
>
> On Thursday, April 13, 2023 at 1:54:47 PM UTC-5 ha...@oakland.edu wrote:
>
>> Thanks for the information.  I had similar information sent to me by 
>> another person on the list, so it seems we are all approaching this in 
>> about the same way.  I think I have my statistics imported at this point.
>> To rename the statistics files I wound up using the following one-liner:
>>
>> cd /opt/dspace/solr-export; for i in $(ls *.csv); do nf=$(echo $i | sed 's/-20[1-2][0-9]_e/_e/g'); mv -v $i $nf; done
>>
>> Another suggestion was to use rename (e.g. rename 
>> "s/-2015_export/_export/g" *), but for whatever reason that was not 
>> working for me on my RHEL 8.7 server.
>> After renaming the files and running
>>
>> /opt/dspace/bin/dspace solr-import-statistics -i statistics
>>
>> it looks like I have the statistics imported.
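For anyone repeating this, here is a slightly more defensive variant of the one-liner above, as a sketch. It uses a glob instead of parsing `ls` and only touches names matching `statistics-YYYY_export_*.csv`, so it also covers years outside 2010-2029, which the `20[1-2][0-9]` pattern misses. It also skips, rather than overwrites, a file whose target name already exists, since two shards could in principle export the same month. The function name `rename_shard_exports` is hypothetical, and the directory default is the one used in this thread. As for `rename`: RHEL ships the util-linux `rename`, which takes a literal string and a replacement rather than a Perl `s/.../.../` expression. That is likely why the suggested form failed; the util-linux equivalent would be `rename -- -2015_export _export *.csv`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: strip the "-YYYY" shard suffix from exported CSVs, e.g.
#   statistics-2015_export_2015-03_0.csv -> statistics_export_2015-03_0.csv
# Files that already have no suffix (from the unsharded core) are left alone.
# The body runs in a subshell ( ... ) so "shopt -s nullglob" does not leak out.
rename_shard_exports() (
    dir="${1:-/opt/dspace/solr-export}"
    shopt -s nullglob   # an unmatched glob expands to nothing instead of itself
    for f in "$dir"/statistics-[0-9][0-9][0-9][0-9]_export_*.csv; do
        base=${f##*/}
        # Drop the leading "statistics-YYYY_export_" and rebuild the name
        new="statistics_export_${base#statistics-[0-9][0-9][0-9][0-9]_export_}"
        if [ -e "$dir/$new" ]; then
            echo "skipping $base: $new already exists" >&2
            continue
        fi
        mv -v -- "$f" "$dir/$new"
    done
)
```

Usage would be `rename_shard_exports /opt/dspace/solr-export`, followed by the same `solr-import-statistics -i statistics` command as above.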
>>
>> Thanks for the help.
>> -Tomas
>>
>> On Tue, Apr 4, 2023 at 11:25 AM James Holobetz <jhol...@gmail.com> wrote:
>>
>>> Hi Tomas,
>>>
>>> I recently had this issue and I believe that I have found a solution, 
>>> which I will document in the next few days. The long and the short of it is 
>>> that DSpace 7 does not support Solr shards. You have to create one 
>>> large Solr core (statistics) from the multiple shards. The 
>>> biggest problem I found doing this was that DSpace was only ingesting the 
>>> current year's statistics. The solution was to rename the *.csv files 
>>> that are dumped by solr-export-statistics. For example, the CSV files 
>>> for the Solr core "statistics-2012" will look something like this: 
>>> statistics-2012_export_2013-12_5.csv. You have to rename all the CSV 
>>> files to remove the "-2012" in the filename so they look like 
>>> this: statistics_export_2013-12_5.csv. I downloaded the zipped-up cores in 
>>> CSV form to my Windows machine so I could use a bulk rename tool to remove 
>>> the year suffix in each core. I then uploaded them to my Linux box running 
>>> DSpace and ingested each one using the solr-import-statistics tool. 
>>> This is a very time-consuming task.
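Doing this one shard at a time, as James did, can be scripted so that a failed run (like Nick's timeout above) leaves an obvious restart point. This is a sketch resting on an assumption drawn from this thread, not from the DSpace source: that `solr-import-statistics -i statistics` picks up every `statistics_export_*.csv` in the export directory and ignores files still named `statistics-YYYY_export_*`. `import_one_shard` and the `imported/` subdirectory are hypothetical names. On the first run, any files already named `statistics_export_*` (from the unsharded core) are imported together with the first shard.

```bash
#!/usr/bin/env bash
# Hypothetical helper: import ONE yearly shard into the DSpace 7 "statistics" core.
# Assumption inferred from this thread (not checked against the DSpace source):
# "solr-import-statistics -i statistics" imports every statistics_export_*.csv
# in the export directory and ignores files still named statistics-YYYY_export_*.
# The body runs in a subshell, so "exit" only ends this function.
import_one_shard() (
    dspace="$1" dir="$2" year="$3"
    shopt -s nullglob
    mkdir -p "$dir/imported" || exit 1
    # 1. Rename only this year's files
    for f in "$dir"/statistics-"$year"_export_*.csv; do
        base=${f##*/}
        new="statistics_export_${base#statistics-"$year"_export_}"
        if [ -e "$dir/$new" ]; then
            echo "refusing to overwrite $new" >&2
            exit 1
        fi
        mv -- "$f" "$dir/$new"
    done
    # 2. Import everything that now carries the statistics_export_ prefix
    if ! "$dspace" solr-import-statistics -i statistics; then
        echo "import for $year failed; renamed files left in $dir for a retry" >&2
        exit 1
    fi
    # 3. Park the imported files so the next shard's run does not re-import them
    for f in "$dir"/statistics_export_*.csv; do
        mv -- "$f" "$dir/imported/"
    done
)

# Example (years taken from Nick's file list), stopping at the first failure:
#   for y in 2014 2015 2017 2018 2019 2020 2021 2022; do
#       import_one_shard /opt/dspace/bin/dspace /opt/dspace/solr-export "$y" || break
#   done
```

If a run fails, the renamed files stay in place, so calling `import_one_shard` again for the same year retries that import.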
>>>
>>> Hope this helps and I will try to document this in the next few days.
>>>
>>> Best regards,
>>>
>>> James Holobetz
>>>
>>> On Fri, Mar 24, 2023 at 3:37 PM Tomas Hajek <ha...@oakland.edu> wrote:
>>>
>>>> Hello, 
>>>>    I am working on migrating a DSpace 5.10 installation to a new server 
>>>> running DSpace 7.5.  I have the basic installation running on RHEL 8.7 
>>>> with Tomcat 9.0.71, Solr 8.11.2, node.js 16.18.1, and pm2 5.2.2.  
>>>> I was able to import the database and assetstore, and I set up the Solr 
>>>> cores (authority, oai, search, statistics) following the installation instructions.
>>>>    The Solr statistics from the 5.10 installation are sharded by year, 
>>>> and I exported them with the following:
>>>>
>>>> bin/dspace solr-export-statistics -i statistics-2015
>>>> bin/dspace solr-export-statistics -i statistics-2016
>>>> ...
>>>> bin/dspace solr-export-statistics -i statistics-2022
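The per-year exports can be run as a loop; this is a sketch for the old 5.x server. It reports and skips a year whose export fails (for example, a year for which no shard exists) instead of stopping. It also exports the unsharded `statistics` core, which Nick's list above includes as `statistics.zip`. `export_statistics_cores` is a hypothetical name, and the path to `bin/dspace` is a placeholder:

```bash
#!/usr/bin/env bash
# Hypothetical helper: export each yearly shard, then the unsharded core,
# from the OLD (DSpace 5.x) installation. A failing year is reported and
# skipped; the function returns non-zero if any export failed.
export_statistics_cores() {
    local dspace="$1" first="$2" last="$3" year failed=0
    for (( year = first; year <= last; year++ )); do
        if ! "$dspace" solr-export-statistics -i "statistics-$year"; then
            echo "export of statistics-$year failed" >&2
            failed=1
        fi
    done
    # Assumed: the current core holds the most recent, not-yet-sharded statistics
    "$dspace" solr-export-statistics -i statistics || failed=1
    return "$failed"
}

# Example (placeholder path to the 5.x install):
#   export_statistics_cores /path/to/dspace/bin/dspace 2015 2022
```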
>>>>
>>>> I have copied the exported files to the new 7.5 server 
>>>> into /opt/dspace/solr-export and am trying to import them, but I get the 
>>>> following error (for example, when trying to import the 2015 statistics):
>>>>
>>>> /opt/dspace/bin/dspace solr-import-statistics -i statistics-2015
>>>> Exception: Error from server at http://localhost:8983/solr/statistics-2015: Expected mime type application/octet-stream but got text/html. <html>
>>>> <head>
>>>> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
>>>> <title>Error 404 Not Found</title>
>>>> </head>
>>>> <body><h2>HTTP ERROR 404 Not Found</h2>
>>>> <table>
>>>> <tr><th>URI:</th><td>/solr/statistics-2015/admin/luke</td></tr>
>>>> <tr><th>STATUS:</th><td>404</td></tr>
>>>> <tr><th>MESSAGE:</th><td>Not Found</td></tr>
>>>> <tr><th>SERVLET:</th><td>default</td></tr>
>>>> </table>
>>>>
>>>> </body>
>>>> </html>
>>>>
>>>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/statistics-2015: Expected mime type application/octet-stream but got text/html. <html>
>>>> <head>
>>>> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
>>>> <title>Error 404 Not Found</title>
>>>> </head>
>>>> <body><h2>HTTP ERROR 404 Not Found</h2>
>>>> <table>
>>>> <tr><th>URI:</th><td>/solr/statistics-2015/admin/luke</td></tr>
>>>> <tr><th>STATUS:</th><td>404</td></tr>
>>>> <tr><th>MESSAGE:</th><td>Not Found</td></tr>
>>>> <tr><th>SERVLET:</th><td>default</td></tr>
>>>> </table>
>>>>
>>>> </body>
>>>> </html>
>>>>
>>>> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:635)
>>>> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
>>>> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
>>>> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
>>>> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
>>>> at org.dspace.util.SolrImportExport.getMultiValuedFields(SolrImportExport.java:482)
>>>> at org.dspace.util.SolrImportExport.importIndex(SolrImportExport.java:433)
>>>> at org.dspace.util.SolrImportExport.main(SolrImportExport.java:148)
>>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>>>> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>>>> at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:277)
>>>> at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:133)
>>>> at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:98)
>>>>
>>>> Presumably this is due to not having the sharded statistics-20## cores 
>>>> configured in Solr, but I'm not sure at this point how to add and configure 
>>>> them so I can import the statistics.  I am not very familiar with Solr. 
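The 404 on `/solr/statistics-2015/admin/luke` means the new Solr has no core named `statistics-2015`. As James explains above, DSpace 7 keeps everything in the single `statistics` core, so the approach in this thread is to rename the exported files and import them into `statistics` rather than to create per-year cores. Below is a quick pre-flight check, as a sketch. It assumes `curl` and the base URL from the error; `solr_core_exists` is a hypothetical name, and `numTerms=0` just keeps the Luke request cheap on a large index:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the named Solr core answers on /admin/luke,
# the handler the failing importer request hit above. Assumes curl is installed.
solr_core_exists() {
    local base="$1" core="$2" code
    # Discard the body; keep only the HTTP status code (404 means no such core)
    code=$(curl -s -o /dev/null -w '%{http_code}' "$base/$core/admin/luke?numTerms=0")
    [ "$code" = "200" ]
}

# Example: check the target core before importing
#   solr_core_exists http://localhost:8983/solr statistics || echo "no statistics core"
```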
>>>>
>>>> Can anyone enlighten me on how I might do this, correct my steps, or 
>>>> let me know what else to look at?
>>>>
>>>> Any assistance would be greatly appreciated.
>>>> Thank you,
>>>>  -Tomas
>>>>
>>>> -- 
>>>> All messages to this mailing list should adhere to the Code of Conduct: 
>>>> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "DSpace Technical Support" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to dspace-tech...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/dspace-tech/CAPx-GQoBmwVH6byhm%2BZv4kg%3D5zmEH%3DQStGL-y1TTD%3D8qBQFo1w%40mail.gmail.com.
>>>>
>>>
>>
>> -- 
>>
>>                 Tomas Hajek
>>                 ha...@oakland.edu
>>                 1-248-370-3505
>>                 Assistant Director, Research Computing and Infrastructure Engineering
>>                 University Technology Services
>>                 Oakland University
>>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/c23c20fa-64c5-466c-8246-be2c2c9ea1ebn%40googlegroups.com.
