[ 
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007199#comment-14007199
 ] 

Chris A. Mattmann commented on TIKA-1302:
-----------------------------------------

[~talli...@apache.org] this is a good question. I believe the VM that Lewis set 
up is there so that anyone can try out Tika via the JAX-RS service. If we do the 
nightly test against a large batch of docs (which I think would be awesome, btw), 
we'll need to figure out the specs we would need and then compare them to the VM 
that Lewis just had set up. How much RAM, CPU, disk, etc. do you think we'll 
need, Tim?

> Let's run Tika against a large batch of docs nightly
> ----------------------------------------------------
>
>                 Key: TIKA-1302
>                 URL: https://issues.apache.org/jira/browse/TIKA-1302
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>
> Many thanks to [~lewismc] for TIKA-1301!  Once we get nightly builds up and 
> running again, it might be fun to run Tika regularly against a large set of 
> docs and report metrics.
> One excellent candidate corpus is govdocs1: 
> http://digitalcorpora.org/corpora/files.
> Any other candidate corpora?  
> [~willp-bl], have anything handy you'd like to contribute? 
> [http://www.openplanetsfoundation.org/blogs/2014-03-21-tika-ride-characterising-web-content-nanite]
>  ;) 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
