Hi Uwe,
When you say "MMap is NOT direct memory", I understand that the JVM can
use (at least) these 3 types of memory:
- Heap memory (controlled by -Xmx and managed by the GC)
- Off-heap MMap (OS cache) *which is not* Direct Memory and *is not*
controlled by MaxDirect
Hi,
Sorry for the delay, but I haven't checked the mailing list in a
long time.
Crawl-Anywhere includes 3 pieces of software: a crawler, a pipeline and
a Solr indexer.
There is a default Solr schema used by Crawl-Anywhere, tested with Solr
1.4.1 and Solr 3.1.0.
But, you can config
Hi,
I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web
Crawler. It includes:
* a crawler
* a document processing pipeline
* a Solr indexer
The crawler has a web administration interface for managing the web sites
to be crawled. Each web site crawl is configured with a lo
Hi,
During a recent Solr project we needed to index documents in many
languages. The natural solution with Lucene and Solr is to define one
field per language. Each field is configured in the schema.xml file to
use language-specific processing (tokenizing, stop words, stemmer,
...). Th