Re: How to regulate native memory?

2017-12-04 Thread Dominique Bejean
Hi Uwe, When you say "MMap is NOT direct memory", I understand that we can consider that the JVM can use (at least) these 3 types of memory:
- Heap memory (controlled by -Xmx and managed by GC)
- Off-heap MMap (OS cache) *which is not* Direct Memory and *is not* controlled by MaxDirect

Re: [ANNOUNCE] Web Crawler

2011-05-27 Thread Dominique Bejean
Hi, Sorry for the delay, but I haven't checked the mailing list in a long time. Crawl-Anywhere includes 3 pieces of software: a crawler, a pipeline and a Solr indexer. There is a default Solr schema used by Crawl-Anywhere, tested with Solr 1.4.1 and Solr 3.1.0. But, you can config

[ANNOUNCE] Web Crawler

2011-03-01 Thread Dominique Bejean
Hi, I would like to announce Crawl-Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes:
* a crawler
* a document processing pipeline
* a Solr indexer
The crawler has a web administration interface for managing the web sites to be crawled. Each web site crawl is configured with a lo

Re: AW: Best practices for multiple languages?

2011-01-20 Thread Dominique Bejean
Hi, During a recent Solr project we needed to index documents in a lot of languages. The natural solution with Lucene and Solr is to define one field per language. Each field is configured in the schema.xml file to use language-specific processing (tokenizing, stop words, stemming, ...). Th
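
The one-field-per-language approach mentioned in this message could be sketched on the indexing side roughly as follows. This is only an illustration: the field names (`text_en`, `text_fr`, `text_de`, `text_general`) and the helpers `field_for` and `build_document` are hypothetical and do not come from the original message. It assumes each of those fields would be declared in schema.xml with its own language-specific analyzer.

```python
# Hypothetical mapping from a language code to a language-specific field.
# The field names are assumptions for illustration; each one would need a
# matching field definition (with its own analyzer) in schema.xml.
LANGUAGE_FIELDS = {
    "en": "text_en",
    "fr": "text_fr",
    "de": "text_de",
}

# Assumed fallback field for languages without a dedicated analyzer.
FALLBACK_FIELD = "text_general"


def field_for(language):
    """Return the name of the field that should hold text in `language`."""
    return LANGUAGE_FIELDS.get(language.lower(), FALLBACK_FIELD)


def build_document(doc_id, language, body):
    """Build a plain dict for one document, routing the body text to the
    field that matches the document's language."""
    return {
        "id": doc_id,
        "language": language,
        field_for(language): body,
    }
```

For example, `build_document("42", "fr", "Bonjour le monde")` places the text in `text_fr`, while a document in an unlisted language ends up in `text_general`.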