Hi all,

Thanks for all the replies and comments.

Attached is a new bunch of patches against master. I've reworked the helpindexer.cpp code so that it can be used as a library, and I changed xmlhelp/source/cxxhelp/provider/databases.cxx to call it.

The good news is that I think this gets rid of the Java invocation on startup. The bad news is that this breaks the build, as I explain below. I attach these work-in-progress patches anyway, because I won't get around to working on this for a few days at least.

1. I converted HelpIndexer from C++'s std::string and std::wstring to rtl::OUString. This created a new problem (HelpIndexer.cxx:106): how to convert the rtl::OUString to the TCHAR* that CLucene needs. How can I convert an OUString to a TCHAR* (wchar_t*) without breaking platform independence? The current code garbles the "path" field in the index.

2. In xmlhelp/source/cxxhelp/provider/makefile.mk, I've hacked the include path to include l10ntools/source/help, which is probably not a good idea. I also don't know how to link the HelpIndexer.o file into xmlhelp (or how to create a .so for it that xmlhelp can find).

3. The conversion from using UNIX dirent.h and friends to using 'sal' still needs to happen, and I think that will help get rid of some awkward string conversions too.

4. The patch assumes both libclucene-core and libclucene-contribs-lib are available from pkg-config. To depend only on libclucene-core, remove the '#define TODO' in HelpIndexer.cxx and drop libclucene-contribs-lib from PKGCONFIG_MODULES in l10ntools/source/help/makefile.mk.
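
On the string conversion in point 1: rtl::OUString stores UTF-16 code units, whereas wchar_t is typically 32 bits wide on Linux and 16 bits on Windows, so handing the OUString buffer to CLucene as a TCHAR* would probably produce garbage wherever wchar_t is wider. Below is a minimal sketch of an explicit conversion. It is not part of the patches: std::u16string stands in for the OUString buffer so the example compiles without sal, and utf16ToWide is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not in the patches): widen UTF-16 code units to wchar_t.
// std::u16string is an assumption standing in for an rtl::OUString buffer.
std::wstring utf16ToWide(std::u16string const &source) {
    std::wstring target;
    target.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        char16_t unit = source[i];
        bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        bool hasLow = i + 1 < source.size() &&
                      source[i + 1] >= 0xDC00 && source[i + 1] <= 0xDFFF;
        if (sizeof(wchar_t) >= 4 && isHigh && hasLow) {
            // 32-bit wchar_t: merge the surrogate pair into one code point.
            std::uint32_t cp = 0x10000u +
                ((static_cast<std::uint32_t>(unit) - 0xD800u) << 10) +
                (static_cast<std::uint32_t>(source[i + 1]) - 0xDC00u);
            target.push_back(static_cast<wchar_t>(cp));
            ++i; // skip the low surrogate that was just consumed
        } else {
            // 16-bit wchar_t, or a plain BMP character: copy the unit as-is.
            target.push_back(static_cast<wchar_t>(unit));
        }
    }
    return target;
}
```

In HelpIndexer the input would presumably come from OUString::getStr() and getLength(), and the resulting std::wstring's c_str() could be passed to the Field constructor for "path". Whether TCHAR really is wchar_t on every target platform is still an assumption.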

Cheers,

Gert

On 02/14/2012 05:24 PM, Caolán McNamara wrote:
> On Tue, 2012-02-14 at 17:04 +0100, G.H.M.Valkenhoef, van wrote:
>> I noticed that CJK-based indexing is only enabled for the Japanese
>> language. Maybe this can be fixed by adding more languages to be
>> CJK-indexed.
>
> Indeed, opengrok for "CJKAnalyzer" and see if running zh-* (and possibly
> ko) through org.apache.lucene.analysis.cjk.CJKAnalyzer makes a
> difference.
>
> Which sadly might mean we need the clucene version of that too :-)
>
> C.
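
If zh-* and ko do turn out to benefit from CJK analysis, the language check in indexDocuments() could be factored into a small predicate. A minimal sketch under that assumption follows. wantsCjkAnalyzer is a hypothetical name, and the set of languages is exactly the open question from the quoted exchange, not a confirmed result.

```cpp
#include <string>

// Hypothetical predicate (not in the patches): should this help language
// be indexed with the CJK analyzer instead of the standard one?
bool wantsCjkAnalyzer(std::string const &lang) {
    // "ja" is what the current code checks; "ko" is only a suggestion so far.
    if (lang == "ja" || lang == "ko") {
        return true;
    }
    // Matches "zh" as well as regional variants such as "zh-CN" or "zh-TW".
    return lang == "zh" || lang.compare(0, 3, "zh-") == 0;
}
```

The analyzer selection would then read `wantsCjkAnalyzer(lang) ? <CJK analyzer> : <standard analyzer>`, keeping the language list in one place while it is still being tuned.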


From acd382ec5ca930df837ceac00df8fd181b38cac4 Mon Sep 17 00:00:00 2001
From: Gert van Valkenhoef <g.h.m.van.valkenh...@rug.nl>
Date: Tue, 14 Feb 2012 19:31:18 +0100
Subject: [PATCH 1/3] Add C++ HelpIndexer

---
 l10ntools/prj/build.lst               |    2 +-
 l10ntools/prj/d.lst                   |    6 +-
 l10ntools/source/help/helpindexer.cxx |  247 +++++++++++++++++++++++++++++++++
 l10ntools/source/help/makefile.mk     |   30 ++---
 4 files changed, 263 insertions(+), 22 deletions(-)
 create mode 100644 l10ntools/source/help/helpindexer.cxx

diff --git a/l10ntools/prj/build.lst b/l10ntools/prj/build.lst
index ed919a5..8e3ea70 100644
--- a/l10ntools/prj/build.lst
+++ b/l10ntools/prj/build.lst
@@ -1,4 +1,4 @@
-tr l10ntools : BERKELEYDB:berkeleydb EXPAT:expat LIBXSLT:libxslt LUCENE:lucene sal NULL
+tr l10ntools : BERKELEYDB:berkeleydb EXPAT:expat LIBXSLT:libxslt sal NULL
 tr	l10ntools						usr1	-	all	tr_mkout NULL
 tr	l10ntools\inc					nmake	-	all	tr_inc NULL
 tr	l10ntools\source					nmake	-	all	tr_src tr_inc NULL
diff --git a/l10ntools/prj/d.lst b/l10ntools/prj/d.lst
index eded848..174bb6c 100644
--- a/l10ntools/prj/d.lst
+++ b/l10ntools/prj/d.lst
@@ -26,12 +26,14 @@ mkdir: %_DEST%\bin\help\com\sun\star\help
 ..\%__SRC%\bin\txtconv %_DEST%\bin\txtconv
 ..\%__SRC%\bin\ulfconv %_DEST%\bin\ulfconv
 ..\%__SRC%\class\FCFGMerge.jar %_DEST%\bin\FCFGMerge.jar
-..\%__SRC%\class\HelpIndexerTool.jar %_DEST%\bin\HelpIndexerTool.jar
-..\%__SRC%\bin\HelpLinker %_DEST%\bin\HelpLinker
 ..\%__SRC%\bin\HelpCompiler %_DEST%\bin\HelpCompiler
 ..\%__SRC%\bin\HelpCompiler.exe %_DEST%\bin\HelpCompiler.exe
+..\%__SRC%\bin\HelpLinker %_DEST%\bin\HelpLinker
 ..\%__SRC%\bin\HelpLinker.exe %_DEST%\bin\HelpLinker.exe
 ..\%__SRC%\bin\HelpLinker* %_DEST%\bin
+..\%__SRC%\bin\HelpIndexer %_DEST%\bin\HelpIndexer
+..\%__SRC%\bin\HelpIndexer.exe %_DEST%\bin\HelpIndexer.exe
+..\%__SRC%\bin\HelpIndexer* %_DEST%\bin
 
 ..\scripts\localize %_DEST%\bin\localize
 ..\scripts\fast_merge.pl %_DEST%\bin\fast_merge.pl
diff --git a/l10ntools/source/help/helpindexer.cxx b/l10ntools/source/help/helpindexer.cxx
new file mode 100644
index 0000000..c327119
--- /dev/null
+++ b/l10ntools/source/help/helpindexer.cxx
@@ -0,0 +1,247 @@
+#include <CLucene/StdHeader.h>
+#include <CLucene.h>
+#ifdef TODO
+#include <CLucene/analysis/LanguageBasedAnalyzer.h>
+#endif
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <errno.h>
+#include <string.h>
+
+#include <string>
+#include <iostream>
+#include <algorithm>
+#include <set>
+
+// I assume that TCHAR is defined as wchar_t throughout
+
+using namespace lucene::document;
+
+class HelpIndexer {
+	private:
+		std::string d_lang;
+		std::string d_module;
+		std::string d_captionDir;
+		std::string d_contentDir;
+		std::string d_indexDir;
+		std::string d_error;
+		std::set<std::string> d_files;
+
+	public:
+
+	/**
+	 * @param lang Help files language.
+	 * @param module The module of the helpfiles.
+	 * @param captionDir The directory to scan for caption files.
+	 * @param contentDir The directory to scan for content files.
+	 * @param indexDir The directory to write the index to.
+	 */
+	HelpIndexer(std::string const &lang, std::string const &module,
+		std::string const &captionDir, std::string const &contentDir,
+		std::string const &indexDir);
+
+	/**
+	 * Run the indexer.
+	 * @return true if index successfully generated.
+	 */
+	bool indexDocuments();
+
+	/**
+	 * Get the error string (empty if no error occurred).
+	 */
+	std::string const & getErrorMessage();
+
+	private:
+
+	/**
+	 * Scan the caption & contents directories for help files.
+	 */
+	bool scanForFiles();
+
+	/**
+	 * Scan for files in the given directory.
+	 */
+	bool scanForFiles(std::string const &path);
+
+	/**
+	 * Fill the Document with information on the given help file.
+	 */
+	bool helpDocument(std::string const & fileName, Document *doc);
+
+	/**
+	 * Create a reader for the given file, and create an "empty" reader in case the file doesn't exist.
+	 */
+	lucene::util::Reader *helpFileReader(std::string const & path);
+
+	std::wstring string2wstring(std::string const &source);
+};
+
+HelpIndexer::HelpIndexer(std::string const &lang, std::string const &module,
+	std::string const &captionDir, std::string const &contentDir, std::string const &indexDir) :
+d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir), d_error(""), d_files() {}
+
+bool HelpIndexer::indexDocuments() {
+	if (!scanForFiles()) {
+		return false;
+	}
+
+#ifdef TODO
+	// Construct the analyzer appropriate for the given language
+	lucene::analysis::Analyzer *analyzer = (
+		d_lang.compare("ja") == 0 ?
+		(lucene::analysis::Analyzer*)new lucene::analysis::LanguageBasedAnalyzer(L"cjk") :
+		(lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
+#else
+	lucene::analysis::Analyzer *analyzer = (
+		(lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
+#endif
+
+	lucene::index::IndexWriter writer(d_indexDir.c_str(), analyzer, true);
+
+	// Index the identified help files
+	Document doc;
+	for (std::set<std::string>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
+		doc.clear();
+		if (!helpDocument(*i, &doc)) {
+			delete analyzer;
+			return false;
+		}
+		writer.addDocument(&doc);
+	}
+
+	// Optimize the index
+	writer.optimize();
+
+	delete analyzer;
+	return true;
+}
+
+std::string const & HelpIndexer::getErrorMessage() {
+	return d_error;
+}
+
+bool HelpIndexer::scanForFiles() {
+	if (!scanForFiles(d_contentDir)) {
+		return false;
+	}
+	if (!scanForFiles(d_captionDir)) {
+		return false;
+	}
+	return true;
+}
+
+bool HelpIndexer::scanForFiles(std::string const & path) {
+	DIR *dir = opendir(path.c_str());
+	if (dir == 0) {
+		d_error = "Error reading directory " + path + strerror(errno);
+		return true;
+	}
+
+	struct dirent *ent;
+	struct stat info;
+	while ((ent = readdir(dir)) != 0) {
+		if (stat((path + "/" + ent->d_name).c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
+			d_files.insert(ent->d_name);
+		}
+	}
+
+	closedir(dir);
+
+	return true;
+}
+
+bool HelpIndexer::helpDocument(std::string const & fileName, Document *doc) {
+	// Add the help path as an indexed, untokenized field.
+	std::wstring path(L"#HLP#" + string2wstring(d_module) + L"/" + string2wstring(fileName));
+	doc->add(*new Field(_T("path"), path.c_str(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
+
+	// Add the caption as a field.
+	std::string captionPath = d_captionDir + "/" + fileName;
+	doc->add(*new Field(_T("caption"), helpFileReader(captionPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
+	// FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
+
+	// Add the content as a field.
+	std::string contentPath = d_contentDir + "/" + fileName;
+	doc->add(*new Field(_T("content"), helpFileReader(contentPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
+	// FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
+
+	return true;
+}
+
+lucene::util::Reader *HelpIndexer::helpFileReader(std::string const & path) {
+	if (access(path.c_str(), R_OK) == 0) {
+		return new lucene::util::FileReader(path.c_str(), "UTF-8");
+	} else {
+		return new lucene::util::StringReader(L"");
+	}
+}
+
+std::wstring HelpIndexer::string2wstring(std::string const &source) {
+	std::wstring target(source.length(), L' ');
+	std::copy(source.begin(), source.end(), target.begin());
+	return target;
+}
+
+int main(int argc, char **argv) {
+	const std::string pLang("-lang");
+	const std::string pModule("-mod");
+	const std::string pOutDir("-zipdir");
+	const std::string pSrcDir("-srcdir");
+
+	std::string lang;
+	std::string module;
+	std::string srcDir;
+	std::string outDir;
+
+	bool error = false;
+	for (int i = 1; i < argc; ++i) {
+		if (pLang.compare(argv[i]) == 0) {
+			if (i + 1 < argc) {
+				lang = argv[++i];
+			} else {
+				error = true;
+			}
+		} else if (pModule.compare(argv[i]) == 0) {
+			if (i + 1 < argc) {
+				module = argv[++i];
+			} else {
+				error = true;
+			}
+		} else if (pOutDir.compare(argv[i]) == 0) {
+			if (i + 1 < argc) {
+				outDir = argv[++i];
+			} else {
+				error = true;
+			}
+		} else if (pSrcDir.compare(argv[i]) == 0) {
+			if (i + 1 < argc) {
+				srcDir = argv[++i];
+			} else {
+				error = true;
+			}
+		} else {
+			error = true;
+		}
+	}
+
+	if (error) {
+		std::cerr << "Error parsing command-line arguments" << std::endl;
+	}
+
+	if (error || lang.empty() || module.empty() || srcDir.empty() || outDir.empty()) {
+		std::cerr << "Usage: HelpIndexer -lang ISOLangCode -mod HelpModule -srcdir SourceDir -zipdir OutputDir" << std::endl;
+		return 1;
+	}
+
+	std::string captionDir(srcDir + "/caption");
+	std::string contentDir(srcDir + "/content");
+	std::string indexDir(outDir + "/" + module + ".idxl");
+	HelpIndexer indexer(lang, module, captionDir, contentDir, indexDir);
+	if (!indexer.indexDocuments()) {
+		std::cerr << indexer.getErrorMessage() << std::endl;
+		return 2;
+	}
+	return 0;
+}
diff --git a/l10ntools/source/help/makefile.mk b/l10ntools/source/help/makefile.mk
index bab01b8..e22c6a3 100644
--- a/l10ntools/source/help/makefile.mk
+++ b/l10ntools/source/help/makefile.mk
@@ -60,8 +60,10 @@ SLOFILES=\
 EXCEPTIONSFILES=\
         $(OBJ)$/HelpLinker.obj \
         $(OBJ)$/HelpCompiler.obj \
+        $(OBJ)$/helpindexer.obj \
         $(SLO)$/HelpLinker.obj \
         $(SLO)$/HelpCompiler.obj
+
 .IF "$(OS)" == "MACOSX" && "$(CPU)" == "P" && "$(COM)" == "GCC"
 # There appears to be a GCC 4.0.1 optimization error causing _file:good() to
 # report true right before the call to writeOut at HelpLinker.cxx:1.12 l. 954
@@ -72,6 +74,9 @@ NOOPTFILES=\
         $(SLO)$/HelpLinker.obj
 .ENDIF
 
+PKGCONFIG_MODULES=libclucene-core
+.INCLUDE : pkg_config.mk
+
 APP1TARGET= $(TARGET)
 APP1OBJS=\
       $(OBJ)$/HelpLinker.obj \
@@ -79,6 +84,12 @@ APP1OBJS=\
 APP1RPATH = NONE
 APP1STDLIBS+=$(SALLIB) $(BERKELEYLIB) $(XSLTLIB) $(EXPATASCII3RDLIB)
 
+APP2TARGET=HelpIndexer
+APP2OBJS=\
+      $(OBJ)$/helpindexer.obj
+APP2RPATH = NONE
+APP2STDLIBS+=$(SALLIB) $(PKGCONFIG_LIBS)
+
 SHL1TARGET	=$(LIBBASENAME)$(DLLPOSTFIX)
 SHL1LIBS=	$(SLB)$/$(TARGET).lib
 .IF "$(COM)" == "MSC"
@@ -93,26 +104,7 @@ SHL1USE_EXPORTS	=ordinal
 DEF1NAME	=$(SHL1TARGET) 
 DEFLIB1NAME	=$(TARGET)
 
-JAVAFILES = \
-    HelpIndexerTool.java			        \
-    HelpFileDocument.java
-
-
-JAVACLASSFILES = \
-    $(CLASSDIR)$/$(PACKAGE)$/HelpIndexerTool.class			        \
-    $(CLASSDIR)$/$(PACKAGE)$/HelpFileDocument.class
 
-.IF "$(SYSTEM_LUCENE)" == "YES"
-EXTRAJARFILES += $(LUCENE_CORE_JAR) $(LUCENE_ANALYZERS_JAR)
-.ELSE
-JARFILES += lucene-core-2.3.jar lucene-analyzers-2.3.jar
-.ENDIF
-JAVAFILES = $(subst,$(CLASSDIR)$/$(PACKAGE)$/, $(subst,.class,.java $(JAVACLASSFILES)))
-
-JARCLASSDIRS	   = $(PACKAGE)/*
-JARTARGET	       = HelpIndexerTool.jar
-JARCOMPRESS        = TRUE 
- 
 # --- Targets ------------------------------------------------------
 
 .INCLUDE :  target.mk
-- 
1.7.0.4

From 7388ee77361a1f8dad84b98306cbfe92c9a7ca3c Mon Sep 17 00:00:00 2001
From: Gert van Valkenhoef <g.h.m.van.valkenh...@rug.nl>
Date: Tue, 14 Feb 2012 20:19:37 +0100
Subject: [PATCH 2/3] Separate HelpIndexer into header, implementation, and main

---
 l10ntools/source/help/HelpIndexer.cxx      |  123 ++++++++++++++
 l10ntools/source/help/HelpIndexer.hxx      |   71 ++++++++
 l10ntools/source/help/HelpIndexer_main.cxx |   66 ++++++++
 l10ntools/source/help/helpindexer.cxx      |  247 ----------------------------
 l10ntools/source/help/makefile.mk          |    8 +-
 5 files changed, 265 insertions(+), 250 deletions(-)
 create mode 100644 l10ntools/source/help/HelpIndexer.cxx
 create mode 100644 l10ntools/source/help/HelpIndexer.hxx
 create mode 100644 l10ntools/source/help/HelpIndexer_main.cxx
 delete mode 100644 l10ntools/source/help/helpindexer.cxx

diff --git a/l10ntools/source/help/HelpIndexer.cxx b/l10ntools/source/help/HelpIndexer.cxx
new file mode 100644
index 0000000..ed0ce39
--- /dev/null
+++ b/l10ntools/source/help/HelpIndexer.cxx
@@ -0,0 +1,123 @@
+#include "HelpIndexer.hxx"
+
+#define TODO
+
+#ifdef TODO
+#include <CLucene/analysis/LanguageBasedAnalyzer.h>
+#endif
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <errno.h>
+#include <string.h>
+
+#include <algorithm>
+
+using namespace lucene::document;
+
+HelpIndexer::HelpIndexer(std::string const &lang, std::string const &module,
+	std::string const &captionDir, std::string const &contentDir, std::string const &indexDir) :
+d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir), d_error(""), d_files() {}
+
+bool HelpIndexer::indexDocuments() {
+	if (!scanForFiles()) {
+		return false;
+	}
+
+#ifdef TODO
+	// Construct the analyzer appropriate for the given language
+	lucene::analysis::Analyzer *analyzer = (
+		d_lang.compare("ja") == 0 ?
+		(lucene::analysis::Analyzer*)new lucene::analysis::LanguageBasedAnalyzer(L"cjk") :
+		(lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
+#else
+	lucene::analysis::Analyzer *analyzer = (
+		(lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
+#endif
+
+	lucene::index::IndexWriter writer(d_indexDir.c_str(), analyzer, true);
+
+	// Index the identified help files
+	Document doc;
+	for (std::set<std::string>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
+		doc.clear();
+		if (!helpDocument(*i, &doc)) {
+			delete analyzer;
+			return false;
+		}
+		writer.addDocument(&doc);
+	}
+
+	// Optimize the index
+	writer.optimize();
+
+	delete analyzer;
+	return true;
+}
+
+std::string const & HelpIndexer::getErrorMessage() {
+	return d_error;
+}
+
+bool HelpIndexer::scanForFiles() {
+	if (!scanForFiles(d_contentDir)) {
+		return false;
+	}
+	if (!scanForFiles(d_captionDir)) {
+		return false;
+	}
+	return true;
+}
+
+bool HelpIndexer::scanForFiles(std::string const & path) {
+	DIR *dir = opendir(path.c_str());
+	if (dir == 0) {
+		d_error = "Error reading directory " + path + strerror(errno);
+		return true;
+	}
+
+	struct dirent *ent;
+	struct stat info;
+	while ((ent = readdir(dir)) != 0) {
+		if (stat((path + "/" + ent->d_name).c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
+			d_files.insert(ent->d_name);
+		}
+	}
+
+	closedir(dir);
+
+	return true;
+}
+
+bool HelpIndexer::helpDocument(std::string const & fileName, Document *doc) {
+	// Add the help path as an indexed, untokenized field.
+	std::wstring path(L"#HLP#" + string2wstring(d_module) + L"/" + string2wstring(fileName));
+	doc->add(*new Field(_T("path"), path.c_str(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
+
+	// Add the caption as a field.
+	std::string captionPath = d_captionDir + "/" + fileName;
+	doc->add(*new Field(_T("caption"), helpFileReader(captionPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
+	// FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
+
+	// Add the content as a field.
+	std::string contentPath = d_contentDir + "/" + fileName;
+	doc->add(*new Field(_T("content"), helpFileReader(contentPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
+	// FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
+
+	return true;
+}
+
+lucene::util::Reader *HelpIndexer::helpFileReader(std::string const & path) {
+	if (access(path.c_str(), R_OK) == 0) {
+		return new lucene::util::FileReader(path.c_str(), "UTF-8");
+	} else {
+		return new lucene::util::StringReader(L"");
+	}
+}
+
+std::wstring HelpIndexer::string2wstring(std::string const &source) {
+	std::wstring target(source.length(), L' ');
+	std::copy(source.begin(), source.end(), target.begin());
+	return target;
+}
diff --git a/l10ntools/source/help/HelpIndexer.hxx b/l10ntools/source/help/HelpIndexer.hxx
new file mode 100644
index 0000000..56122e7
--- /dev/null
+++ b/l10ntools/source/help/HelpIndexer.hxx
@@ -0,0 +1,71 @@
+#ifndef HELPINDEXER_HXX
+#define HELPINDEXER_HXX
+
+#include <CLucene/StdHeader.h>
+#include <CLucene.h>
+
+#include <string>
+#include <set>
+
+// I assume that TCHAR is defined as wchar_t throughout
+
+class HelpIndexer {
+	private:
+		std::string d_lang;
+		std::string d_module;
+		std::string d_captionDir;
+		std::string d_contentDir;
+		std::string d_indexDir;
+		std::string d_error;
+		std::set<std::string> d_files;
+
+	public:
+
+	/**
+	 * @param lang Help files language.
+	 * @param module The module of the helpfiles.
+	 * @param captionDir The directory to scan for caption files.
+	 * @param contentDir The directory to scan for content files.
+	 * @param indexDir The directory to write the index to.
+	 */
+	HelpIndexer(std::string const &lang, std::string const &module,
+		std::string const &captionDir, std::string const &contentDir,
+		std::string const &indexDir);
+
+	/**
+	 * Run the indexer.
+	 * @return true if index successfully generated.
+	 */
+	bool indexDocuments();
+
+	/**
+	 * Get the error string (empty if no error occurred).
+	 */
+	std::string const & getErrorMessage();
+
+	private:
+
+	/**
+	 * Scan the caption & contents directories for help files.
+	 */
+	bool scanForFiles();
+
+	/**
+	 * Scan for files in the given directory.
+	 */
+	bool scanForFiles(std::string const &path);
+
+	/**
+	 * Fill the Document with information on the given help file.
+	 */
+	bool helpDocument(std::string const & fileName, lucene::document::Document *doc);
+
+	/**
+	 * Create a reader for the given file, and create an "empty" reader in case the file doesn't exist.
+	 */
+	lucene::util::Reader *helpFileReader(std::string const & path);
+
+	std::wstring string2wstring(std::string const &source);
+};
+
+#endif
diff --git a/l10ntools/source/help/HelpIndexer_main.cxx b/l10ntools/source/help/HelpIndexer_main.cxx
new file mode 100644
index 0000000..a1dd50b
--- /dev/null
+++ b/l10ntools/source/help/HelpIndexer_main.cxx
@@ -0,0 +1,66 @@
+#include "HelpIndexer.hxx"
+
+#include <string>
+#include <iostream>
+
+int main(int argc, char **argv) {
+	const std::string pLang("-lang");
+	const std::string pModule("-mod");
+	const std::string pOutDir("-zipdir");
+	const std::string pSrcDir("-srcdir");
+
+	std::string lang;
+	std::string module;
+	std::string srcDir;
+	std::string outDir;
+
+	bool error = false;
+	for (int i = 1; i < argc; ++i) {
+		if (pLang.compare(argv[i]) == 0) {
+			if (i + 1 < argc) {
+				lang = argv[++i];
+			} else {
+				error = true;
+			}
+		} else if (pModule.compare(argv[i]) == 0) {
+			if (i + 1 < argc) {
+				module = argv[++i];
+			} else {
+				error = true;
+			}
+		} else if (pOutDir.compare(argv[i]) == 0) {
+			if (i + 1 < argc) {
+				outDir = argv[++i];
+			} else {
+				error = true;
+			}
+		} else if (pSrcDir.compare(argv[i]) == 0) {
+			if (i + 1 < argc) {
+				srcDir = argv[++i];
+			} else {
+				error = true;
+			}
+		} else {
+			error = true;
+		}
+	}
+
+	if (error) {
+		std::cerr << "Error parsing command-line arguments" << std::endl;
+	}
+
+	if (error || lang.empty() || module.empty() || srcDir.empty() || outDir.empty()) {
+		std::cerr << "Usage: HelpIndexer -lang ISOLangCode -mod HelpModule -srcdir SourceDir -zipdir OutputDir" << std::endl;
+		return 1;
+	}
+
+	std::string captionDir(srcDir + "/caption");
+	std::string contentDir(srcDir + "/content");
+	std::string indexDir(outDir + "/" + module + ".idxl");
+	HelpIndexer indexer(lang, module, captionDir, contentDir, indexDir);
+	if (!indexer.indexDocuments()) {
+		std::cerr << indexer.getErrorMessage() << std::endl;
+		return 2;
+	}
+	return 0;
+}
diff --git a/l10ntools/source/help/helpindexer.cxx b/l10ntools/source/help/helpindexer.cxx
deleted file mode 100644
index c327119..0000000
--- a/l10ntools/source/help/helpindexer.cxx
+++ /dev/null
@@ -1,247 +0,0 @@
-#include <CLucene/StdHeader.h>
-#include <CLucene.h>
-#ifdef TODO
-#include <CLucene/analysis/LanguageBasedAnalyzer.h>
-#endif
-
-#include <unistd.h>
-#include <sys/stat.h>
-#include <dirent.h>
-#include <errno.h>
-#include <string.h>
-
-#include <string>
-#include <iostream>
-#include <algorithm>
-#include <set>
-
-// I assume that TCHAR is defined as wchar_t throughout
-
-using namespace lucene::document;
-
-class HelpIndexer {
-	private:
-		std::string d_lang;
-		std::string d_module;
-		std::string d_captionDir;
-		std::string d_contentDir;
-		std::string d_indexDir;
-		std::string d_error;
-		std::set<std::string> d_files;
-
-	public:
-
-	/**
-	 * @param lang Help files language.
-	 * @param module The module of the helpfiles.
-	 * @param captionDir The directory to scan for caption files.
-	 * @param contentDir The directory to scan for content files.
-	 * @param indexDir The directory to write the index to.
-	 */
-	HelpIndexer(std::string const &lang, std::string const &module,
-		std::string const &captionDir, std::string const &contentDir,
-		std::string const &indexDir);
-
-	/**
-	 * Run the indexer.
-	 * @return true if index successfully generated.
-	 */
-	bool indexDocuments();
-
-	/**
-	 * Get the error string (empty if no error occurred).
-	 */
-	std::string const & getErrorMessage();
-
-	private:
-
-	/**
-	 * Scan the caption & contents directories for help files.
-	 */
-	bool scanForFiles();
-
-	/**
-	 * Scan for files in the given directory.
-	 */
-	bool scanForFiles(std::string const &path);
-
-	/**
-	 * Fill the Document with information on the given help file.
-	 */
-	bool helpDocument(std::string const & fileName, Document *doc);
-
-	/**
-	 * Create a reader for the given file, and create an "empty" reader in case the file doesn't exist.
-	 */
-	lucene::util::Reader *helpFileReader(std::string const & path);
-
-	std::wstring string2wstring(std::string const &source);
-};
-
-HelpIndexer::HelpIndexer(std::string const &lang, std::string const &module,
-	std::string const &captionDir, std::string const &contentDir, std::string const &indexDir) :
-d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir), d_error(""), d_files() {}
-
-bool HelpIndexer::indexDocuments() {
-	if (!scanForFiles()) {
-		return false;
-	}
-
-#ifdef TODO
-	// Construct the analyzer appropriate for the given language
-	lucene::analysis::Analyzer *analyzer = (
-		d_lang.compare("ja") == 0 ?
-		(lucene::analysis::Analyzer*)new lucene::analysis::LanguageBasedAnalyzer(L"cjk") :
-		(lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
-#else
-	lucene::analysis::Analyzer *analyzer = (
-		(lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
-#endif
-
-	lucene::index::IndexWriter writer(d_indexDir.c_str(), analyzer, true);
-
-	// Index the identified help files
-	Document doc;
-	for (std::set<std::string>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
-		doc.clear();
-		if (!helpDocument(*i, &doc)) {
-			delete analyzer;
-			return false;
-		}
-		writer.addDocument(&doc);
-	}
-
-	// Optimize the index
-	writer.optimize();
-
-	delete analyzer;
-	return true;
-}
-
-std::string const & HelpIndexer::getErrorMessage() {
-	return d_error;
-}
-
-bool HelpIndexer::scanForFiles() {
-	if (!scanForFiles(d_contentDir)) {
-		return false;
-	}
-	if (!scanForFiles(d_captionDir)) {
-		return false;
-	}
-	return true;
-}
-
-bool HelpIndexer::scanForFiles(std::string const & path) {
-	DIR *dir = opendir(path.c_str());
-	if (dir == 0) {
-		d_error = "Error reading directory " + path + strerror(errno);
-		return true;
-	}
-
-	struct dirent *ent;
-	struct stat info;
-	while ((ent = readdir(dir)) != 0) {
-		if (stat((path + "/" + ent->d_name).c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
-			d_files.insert(ent->d_name);
-		}
-	}
-
-	closedir(dir);
-
-	return true;
-}
-
-bool HelpIndexer::helpDocument(std::string const & fileName, Document *doc) {
-	// Add the help path as an indexed, untokenized field.
-	std::wstring path(L"#HLP#" + string2wstring(d_module) + L"/" + string2wstring(fileName));
-	doc->add(*new Field(_T("path"), path.c_str(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
-
-	// Add the caption as a field.
-	std::string captionPath = d_captionDir + "/" + fileName;
-	doc->add(*new Field(_T("caption"), helpFileReader(captionPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
-	// FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
-
-	// Add the content as a field.
-	std::string contentPath = d_contentDir + "/" + fileName;
-	doc->add(*new Field(_T("content"), helpFileReader(contentPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
-	// FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
-
-	return true;
-}
-
-lucene::util::Reader *HelpIndexer::helpFileReader(std::string const & path) {
-	if (access(path.c_str(), R_OK) == 0) {
-		return new lucene::util::FileReader(path.c_str(), "UTF-8");
-	} else {
-		return new lucene::util::StringReader(L"");
-	}
-}
-
-std::wstring HelpIndexer::string2wstring(std::string const &source) {
-	std::wstring target(source.length(), L' ');
-	std::copy(source.begin(), source.end(), target.begin());
-	return target;
-}
-
-int main(int argc, char **argv) {
-	const std::string pLang("-lang");
-	const std::string pModule("-mod");
-	const std::string pOutDir("-zipdir");
-	const std::string pSrcDir("-srcdir");
-
-	std::string lang;
-	std::string module;
-	std::string srcDir;
-	std::string outDir;
-
-	bool error = false;
-	for (int i = 1; i < argc; ++i) {
-		if (pLang.compare(argv[i]) == 0) {
-			if (i + 1 < argc) {
-				lang = argv[++i];
-			} else {
-				error = true;
-			}
-		} else if (pModule.compare(argv[i]) == 0) {
-			if (i + 1 < argc) {
-				module = argv[++i];
-			} else {
-				error = true;
-			}
-		} else if (pOutDir.compare(argv[i]) == 0) {
-			if (i + 1 < argc) {
-				outDir = argv[++i];
-			} else {
-				error = true;
-			}
-		} else if (pSrcDir.compare(argv[i]) == 0) {
-			if (i + 1 < argc) {
-				srcDir = argv[++i];
-			} else {
-				error = true;
-			}
-		} else {
-			error = true;
-		}
-	}
-
-	if (error) {
-		std::cerr << "Error parsing command-line arguments" << std::endl;
-	}
-
-	if (error || lang.empty() || module.empty() || srcDir.empty() || outDir.empty()) {
-		std::cerr << "Usage: HelpIndexer -lang ISOLangCode -mod HelpModule -srcdir SourceDir -zipdir OutputDir" << std::endl;
-		return 1;
-	}
-
-	std::string captionDir(srcDir + "/caption");
-	std::string contentDir(srcDir + "/content");
-	std::string indexDir(outDir + "/" + module + ".idxl");
-	HelpIndexer indexer(lang, module, captionDir, contentDir, indexDir);
-	if (!indexer.indexDocuments()) {
-		std::cerr << indexer.getErrorMessage() << std::endl;
-		return 2;
-	}
-	return 0;
-}
diff --git a/l10ntools/source/help/makefile.mk b/l10ntools/source/help/makefile.mk
index e22c6a3..1283535 100644
--- a/l10ntools/source/help/makefile.mk
+++ b/l10ntools/source/help/makefile.mk
@@ -60,7 +60,8 @@ SLOFILES=\
 EXCEPTIONSFILES=\
         $(OBJ)$/HelpLinker.obj \
         $(OBJ)$/HelpCompiler.obj \
-        $(OBJ)$/helpindexer.obj \
+        $(OBJ)$/HelpIndexer.obj \
+        $(OBJ)$/HelpIndexer_main.obj \
         $(SLO)$/HelpLinker.obj \
         $(SLO)$/HelpCompiler.obj
 
@@ -74,7 +75,7 @@ NOOPTFILES=\
         $(SLO)$/HelpLinker.obj
 .ENDIF
 
-PKGCONFIG_MODULES=libclucene-core
+PKGCONFIG_MODULES=libclucene-core libclucene-contribs-lib
 .INCLUDE : pkg_config.mk
 
 APP1TARGET= $(TARGET)
@@ -86,7 +87,8 @@ APP1STDLIBS+=$(SALLIB) $(BERKELEYLIB) $(XSLTLIB) $(EXPATASCII3RDLIB)
 
 APP2TARGET=HelpIndexer
 APP2OBJS=\
-      $(OBJ)$/helpindexer.obj
+      $(OBJ)$/HelpIndexer.obj \
+      $(OBJ)$/HelpIndexer_main.obj
 APP2RPATH = NONE
 APP2STDLIBS+=$(SALLIB) $(PKGCONFIG_LIBS)
 
-- 
1.7.0.4

From c44f78a37c2e4919b7c6fc01efa8a04a81b014be Mon Sep 17 00:00:00 2001
From: Gert van Valkenhoef <g.h.m.van.valkenh...@rug.nl>
Date: Tue, 14 Feb 2012 21:56:08 +0100
Subject: [PATCH 3/3] HelpIndexer using rtl::OUString, called from xmlhelp

---
 l10ntools/source/help/HelpIndexer.cxx         |   59 ++++++++------
 l10ntools/source/help/HelpIndexer.hxx         |   32 ++++----
 l10ntools/source/help/HelpIndexer_main.cxx    |    9 ++-
 xmlhelp/source/cxxhelp/provider/databases.cxx |  102 +++++++++++--------------
 xmlhelp/source/cxxhelp/provider/makefile.mk   |    5 +
 5 files changed, 105 insertions(+), 102 deletions(-)

diff --git a/l10ntools/source/help/HelpIndexer.cxx b/l10ntools/source/help/HelpIndexer.cxx
index ed0ce39..f86d265 100644
--- a/l10ntools/source/help/HelpIndexer.cxx
+++ b/l10ntools/source/help/HelpIndexer.cxx
@@ -6,6 +6,8 @@
 #include <CLucene/analysis/LanguageBasedAnalyzer.h>
 #endif
 
+#include <rtl/string.hxx>
+
 #include <unistd.h>
 #include <sys/stat.h>
 #include <dirent.h>
@@ -16,9 +18,10 @@
 
 using namespace lucene::document;
 
-HelpIndexer::HelpIndexer(std::string const &lang, std::string const &module,
-	std::string const &captionDir, std::string const &contentDir, std::string const &indexDir) :
-d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir), d_error(""), d_files() {}
+HelpIndexer::HelpIndexer(rtl::OUString const &lang, rtl::OUString const &module,
+	rtl::OUString const &captionDir, rtl::OUString const &contentDir, rtl::OUString const &indexDir) :
+d_lang(lang), d_module(module), d_captionDir(captionDir), d_contentDir(contentDir), d_indexDir(indexDir),
+d_error(), d_files() {}
 
 bool HelpIndexer::indexDocuments() {
 	if (!scanForFiles()) {
@@ -28,7 +31,7 @@ bool HelpIndexer::indexDocuments() {
 #ifdef TODO
 	// Construct the analyzer appropriate for the given language
 	lucene::analysis::Analyzer *analyzer = (
-		d_lang.compare("ja") == 0 ?
+		d_lang.compareToAscii("ja") == 0 ?
 		(lucene::analysis::Analyzer*)new lucene::analysis::LanguageBasedAnalyzer(L"cjk") :
 		(lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
 #else
@@ -36,11 +39,13 @@ bool HelpIndexer::indexDocuments() {
 		(lucene::analysis::Analyzer*)new lucene::analysis::standard::StandardAnalyzer());
 #endif
 
-	lucene::index::IndexWriter writer(d_indexDir.c_str(), analyzer, true);
+	rtl::OString indexDirStr;
+	d_indexDir.convertToString(&indexDirStr, RTL_TEXTENCODING_ASCII_US, 0);
+	lucene::index::IndexWriter writer(indexDirStr.getStr(), analyzer, true);
 
 	// Index the identified help files
 	Document doc;
-	for (std::set<std::string>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
+	for (std::set<rtl::OUString>::iterator i = d_files.begin(); i != d_files.end(); ++i) {
 		doc.clear();
 		if (!helpDocument(*i, &doc)) {
 			delete analyzer;
@@ -56,7 +61,7 @@ bool HelpIndexer::indexDocuments() {
 	return true;
 }
 
-std::string const & HelpIndexer::getErrorMessage() {
+rtl::OUString const & HelpIndexer::getErrorMessage() {
 	return d_error;
 }
 
@@ -70,18 +75,23 @@ bool HelpIndexer::scanForFiles() {
 	return true;
 }
 
-bool HelpIndexer::scanForFiles(std::string const & path) {
-	DIR *dir = opendir(path.c_str());
+bool HelpIndexer::scanForFiles(rtl::OUString const & path) {
+	rtl::OString pathStr;
+	path.convertToString(&pathStr, RTL_TEXTENCODING_ASCII_US, 0);
+	DIR *dir = opendir(pathStr.getStr());
 	if (dir == 0) {
-		d_error = "Error reading directory " + path + strerror(errno);
+		d_error = rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("Error reading directory ")) + path +
+			 rtl::OUString::createFromAscii(strerror(errno));
 		return true;
 	}
 
 	struct dirent *ent;
 	struct stat info;
 	while ((ent = readdir(dir)) != 0) {
-		if (stat((path + "/" + ent->d_name).c_str(), &info) == 0 && S_ISREG(info.st_mode)) {
-			d_files.insert(ent->d_name);
+		rtl::OString entPath(pathStr);
+		entPath += rtl::OString(RTL_CONSTASCII_STRINGPARAM("/")) + rtl::OString(ent->d_name);
+		if (stat(entPath.getStr(), &info) == 0 && S_ISREG(info.st_mode)) {
+			d_files.insert(rtl::OUString::createFromAscii(ent->d_name));
 		}
 	}
 
@@ -90,34 +100,31 @@ bool HelpIndexer::scanForFiles(std::string const & path) {
 	return true;
 }
 
-bool HelpIndexer::helpDocument(std::string const & fileName, Document *doc) {
+bool HelpIndexer::helpDocument(rtl::OUString const & fileName, Document *doc) {
 	// Add the help path as an indexed, untokenized field.
-	std::wstring path(L"#HLP#" + string2wstring(d_module) + L"/" + string2wstring(fileName));
-	doc->add(*new Field(_T("path"), path.c_str(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
+	rtl::OUString path = rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("#HLP#")) + d_module + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("/")) + fileName;
+	// FIXME: the (TCHAR*) cast is a problem, because TCHAR does not match sal_Unicode
+	doc->add(*new Field(_T("path"), (TCHAR*)path.getStr(), Field::STORE_YES | Field::INDEX_UNTOKENIZED));
 
 	// Add the caption as a field.
-	std::string captionPath = d_captionDir + "/" + fileName;
+	rtl::OUString captionPath = d_captionDir + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("/")) + fileName;
 	doc->add(*new Field(_T("caption"), helpFileReader(captionPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
 	// FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
 
 	// Add the content as a field.
-	std::string contentPath = d_contentDir + "/" + fileName;
+	rtl::OUString contentPath = d_contentDir + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM("/")) + fileName;
 	doc->add(*new Field(_T("content"), helpFileReader(contentPath), Field::STORE_NO | Field::INDEX_TOKENIZED));
 	// FIXME: does the Document take responsibility for the FileReader or should I free it somewhere?
 
 	return true;
 }
 
-lucene::util::Reader *HelpIndexer::helpFileReader(std::string const & path) {
-	if (access(path.c_str(), R_OK) == 0) {
-		return new lucene::util::FileReader(path.c_str(), "UTF-8");
+lucene::util::Reader *HelpIndexer::helpFileReader(rtl::OUString const & path) {
+	rtl::OString pathStr;
+	path.convertToString(&pathStr, RTL_TEXTENCODING_ASCII_US, 0);
+	if (access(pathStr.getStr(), R_OK) == 0) {
+		return new lucene::util::FileReader(pathStr.getStr(), "UTF-8");
 	} else {
 		return new lucene::util::StringReader(L"");
 	}
 }
-
-std::wstring HelpIndexer::string2wstring(std::string const &source) {
-	std::wstring target(source.length(), L' ');
-	std::copy(source.begin(), source.end(), target.begin());
-	return target;
-}
diff --git a/l10ntools/source/help/HelpIndexer.hxx b/l10ntools/source/help/HelpIndexer.hxx
index 56122e7..833e5e7 100644
--- a/l10ntools/source/help/HelpIndexer.hxx
+++ b/l10ntools/source/help/HelpIndexer.hxx
@@ -4,20 +4,20 @@
 #include <CLucene/StdHeader.h>
 #include <CLucene.h>
 
-#include <string>
+#include <rtl/ustring.hxx>
 #include <set>
 
 // I assume that TCHAR is defined as wchar_t throughout
 
 class HelpIndexer {
 	private:
-		std::string d_lang;
-		std::string d_module;
-		std::string d_captionDir;
-		std::string d_contentDir;
-		std::string d_indexDir;
-		std::string d_error;
-		std::set<std::string> d_files;
+		rtl::OUString d_lang;
+		rtl::OUString d_module;
+		rtl::OUString d_captionDir;
+		rtl::OUString d_contentDir;
+		rtl::OUString d_indexDir;
+		rtl::OUString d_error;
+		std::set<rtl::OUString> d_files;
 
 	public:
 
@@ -28,9 +28,9 @@ class HelpIndexer {
 	 * @param contentDir The directory to scan for content files.
 	 * @param indexDir The directory to write the index to.
 	 */
-	HelpIndexer(std::string const &lang, std::string const &module,
-		std::string const &captionDir, std::string const &contentDir,
-		std::string const &indexDir);
+	HelpIndexer(rtl::OUString const &lang, rtl::OUString const &module,
+		rtl::OUString const &captionDir, rtl::OUString const &contentDir,
+		rtl::OUString const &indexDir);
 
 	/**
 	 * Run the indexer.
@@ -41,7 +41,7 @@ class HelpIndexer {
 	/**
 	 * Get the error string (empty if no error occurred).
 	 */
-	std::string const & getErrorMessage();
+	rtl::OUString const & getErrorMessage();
 
 	private:
 
@@ -53,19 +53,17 @@ class HelpIndexer {
 	/**
 	 * Scan for files in the given directory.
 	 */
-	bool scanForFiles(std::string const &path);
+	bool scanForFiles(rtl::OUString const &path);
 
 	/**
 	 * Fill the Document with information on the given help file.
 	 */
-	bool helpDocument(std::string const & fileName, lucene::document::Document *doc);
+	bool helpDocument(rtl::OUString const & fileName, lucene::document::Document *doc);
 
 	/**
 	 * Create a reader for the given file, and create an "empty" reader in case the file doesn't exist.
 	 */
-	lucene::util::Reader *helpFileReader(std::string const & path);
-
-	std::wstring string2wstring(std::string const &source);
+	lucene::util::Reader *helpFileReader(rtl::OUString const & path);
 };
 
 #endif
diff --git a/l10ntools/source/help/HelpIndexer_main.cxx b/l10ntools/source/help/HelpIndexer_main.cxx
index a1dd50b..3d69630 100644
--- a/l10ntools/source/help/HelpIndexer_main.cxx
+++ b/l10ntools/source/help/HelpIndexer_main.cxx
@@ -57,9 +57,14 @@ int main(int argc, char **argv) {
 	std::string captionDir(srcDir + "/caption");
 	std::string contentDir(srcDir + "/content");
 	std::string indexDir(outDir + "/" + module + ".idxl");
-	HelpIndexer indexer(lang, module, captionDir, contentDir, indexDir);
+	HelpIndexer indexer(
+		rtl::OUString::createFromAscii(lang.c_str()),
+		rtl::OUString::createFromAscii(module.c_str()),
+		rtl::OUString::createFromAscii(captionDir.c_str()),
+		rtl::OUString::createFromAscii(contentDir.c_str()),
+		rtl::OUString::createFromAscii(indexDir.c_str()));
 	if (!indexer.indexDocuments()) {
-		std::cerr << indexer.getErrorMessage() << std::endl;
+		std::cerr << rtl::OUStringToOString(indexer.getErrorMessage(), RTL_TEXTENCODING_UTF8).getStr() << std::endl;
 		return 2;
 	}
 	return 0;
diff --git a/xmlhelp/source/cxxhelp/provider/databases.cxx b/xmlhelp/source/cxxhelp/provider/databases.cxx
index 4a4a756..14fe6b5 100644
--- a/xmlhelp/source/cxxhelp/provider/databases.cxx
+++ b/xmlhelp/source/cxxhelp/provider/databases.cxx
@@ -39,6 +39,12 @@
 #include <algorithm>
 #include <string.h>
 
+// EDIT FROM HERE
+
+#include <HelpIndexer.hxx>
+
+// EDIT ENDS HERE
+
 // Extensible help
 #include "com/sun/star/deployment/ExtensionManager.hpp"
 #include "com/sun/star/deployment/thePackageManagerFactory.hpp"
@@ -2113,78 +2119,60 @@ rtl::OUString IndexFolderIterator::implGetIndexFolderFromPackage( bool& o_rbTemp
             // TEST
             //bIsWriteAccess = false;
 
-            Reference< script::XInvocation > xInvocation;
-            Reference< XMultiComponentFactory >xSMgr( m_xContext->getServiceManager(), UNO_QUERY );
+// EDIT FROM HERE
             try
             {
-                xInvocation = Reference< script::XInvocation >(
-                    m_xContext->getServiceManager()->createInstanceWithContext( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM(
-                    "com.sun.star.help.HelpIndexer" )), m_xContext ) , UNO_QUERY );
-
-                if( xInvocation.is() )
-                {
-                    Sequence<uno::Any> aParamsSeq( bIsWriteAccess ? 6 : 8 );
-
-                    aParamsSeq[0] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "-lang" )) );
-
-                    rtl::OUString aLang;
-                    sal_Int32 nLastSlash = aLangURL.lastIndexOf( '/' );
-                    if( nLastSlash != -1 )
-                        aLang = aLangURL.copy( nLastSlash + 1 );
-                    else
-                        aLang = rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "en" ));
-                    aParamsSeq[1] = uno::makeAny( aLang );
+                rtl::OUString aLang;
+                sal_Int32 nLastSlash = aLangURL.lastIndexOf( '/' );
+                if( nLastSlash != -1 )
+                    aLang = aLangURL.copy( nLastSlash + 1 );
+                else
+                    aLang = rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "en" ));
 
-                    aParamsSeq[2] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "-mod" )) );
-                    aParamsSeq[3] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "help" )) );
+                rtl::OUString aMod(RTL_CONSTASCII_USTRINGPARAM("help"));
 
-                    rtl::OUString aZipDir = aLangURL;
-                    if( !bIsWriteAccess )
+                rtl::OUString aZipDir = aLangURL;
+                if( !bIsWriteAccess )
+                {
+                    rtl::OUString aTempFileURL;
+                    ::osl::FileBase::RC eErr = ::osl::File::createTempFile( 0, 0, &aTempFileURL );
+                    if( eErr == ::osl::FileBase::E_None )
                     {
-                        rtl::OUString aTempFileURL;
-                        ::osl::FileBase::RC eErr = ::osl::File::createTempFile( 0, 0, &aTempFileURL );
-                        if( eErr == ::osl::FileBase::E_None )
+                        rtl::OUString aTempDirURL = aTempFileURL;
+                        try
                         {
-                            rtl::OUString aTempDirURL = aTempFileURL;
-                            try
-                            {
-                                m_xSFA->kill( aTempDirURL );
-                            }
-                            catch (Exception &)
-                            {}
-                            m_xSFA->createFolder( aTempDirURL );
-
-                            aZipDir = aTempDirURL;
-                            o_rbTemporary = true;
+                            m_xSFA->kill( aTempDirURL );
                         }
+                        catch (Exception &)
+                        {}
+                        m_xSFA->createFolder( aTempDirURL );
+
+                        aZipDir = aTempDirURL;
+                        o_rbTemporary = true;
                     }
+                }
 
-                    aParamsSeq[4] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "-zipdir" )) );
-                    rtl::OUString aSystemPath;
-                    osl::FileBase::getSystemPathFromFileURL( aZipDir, aSystemPath );
-                    aParamsSeq[5] = uno::makeAny( aSystemPath );
+                rtl::OUString aTargetDir;
+                osl::FileBase::getSystemPathFromFileURL( aZipDir, aTargetDir );
 
-                    if( !bIsWriteAccess )
-                    {
-                        aParamsSeq[6] = uno::makeAny( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "-srcdir" )) );
-                        rtl::OUString aSrcDirVal;
-                        osl::FileBase::getSystemPathFromFileURL( aLangURL, aSrcDirVal );
-                        aParamsSeq[7] = uno::makeAny( aSrcDirVal );
-                    }
+                rtl::OUString aSourceDir;
+                osl::FileBase::getSystemPathFromFileURL( aLangURL, aSourceDir );
 
-                    Sequence< sal_Int16 > aOutParamIndex;
-                    Sequence< uno::Any > aOutParam;
-                    uno::Any aRet = xInvocation->invoke( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "createIndex" )),
-                        aParamsSeq, aOutParamIndex, aOutParam );
+                rtl::OUString aCaption(RTL_CONSTASCII_USTRINGPARAM("/caption"));
+                rtl::OUString aContent(RTL_CONSTASCII_USTRINGPARAM("/content"));
 
-                    if( bIsWriteAccess )
-                        aIndexFolder = implGetFileFromPackage( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( ".idxl" )), xPackage );
-                    else
-                        aIndexFolder = aZipDir + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "/help.idxl" ));
-                }
+                HelpIndexer aIndexer(aLang, aMod, aSourceDir + aCaption, aSourceDir + aContent, aTargetDir);
+
+                if( bIsWriteAccess )
+                    aIndexFolder = implGetFileFromPackage( rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( ".idxl" )), xPackage );
+                else
+                    aIndexFolder = aZipDir + rtl::OUString(RTL_CONSTASCII_USTRINGPARAM( "/help.idxl" ));
             }
             catch (Exception &)
             {}
+
+// EDIT UNTIL HERE
+
         }
     }
 
diff --git a/xmlhelp/source/cxxhelp/provider/makefile.mk b/xmlhelp/source/cxxhelp/provider/makefile.mk
index b709797..05f4ead 100644
--- a/xmlhelp/source/cxxhelp/provider/makefile.mk
+++ b/xmlhelp/source/cxxhelp/provider/makefile.mk
@@ -67,6 +67,11 @@ LIBXSLTINCDIR=external$/libxslt
 CFLAGS+= -I$(SOLARINCDIR)$/$(LIBXSLTINCDIR)
 .ENDIF
 
+CFLAGS+= -I$(SRC_ROOT)$/l10ntools$/source$/help
+
+PKGCONFIG_MODULES=libclucene-core libclucene-contribs-lib
+.INCLUDE : pkg_config.mk
+
 .IF "$(GUI)"=="WNT"
 .IF "$(COM)"=="MSC"
 CFLAGS+=-GR
-- 
1.7.0.4
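
On the TCHAR FIXME in helpDocument() in the patch above: the following is only a sketch of one way the conversion might be done, not a tested fix. It assumes that sal_Unicode is a 16-bit UTF-16 code unit and that TCHAR is wchar_t, which may be 16 or 32 bits wide depending on the platform. The helper name utf16ToWide is hypothetical, and char16_t stands in for sal_Unicode so that the snippet compiles without the sal headers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy a UTF-16 buffer into a std::wstring.
// Where wchar_t is wider than 16 bits, valid surrogate pairs are combined
// into a single code point; where it is 16 bits, code units are copied as-is.
std::wstring utf16ToWide(const char16_t *src, std::size_t len) {
    std::wstring out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        char32_t c = src[i];
        // High surrogate followed by a low surrogate: decode the pair.
        if (sizeof(wchar_t) > 2 && c >= 0xD800 && c <= 0xDBFF && i + 1 < len) {
            char32_t lo = src[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}
```

In helpDocument() the resulting std::wstring's c_str() could then replace the (TCHAR*) cast, after adapting the parameter type to whatever sal_Unicode is on the platform being built.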

From 2f2ab3b5fca95c5ae39939b361d291cfe0a6cbb4 Mon Sep 17 00:00:00 2001
From: Gert van Valkenhoef <g.h.m.van.valkenh...@rug.nl>
Date: Tue, 14 Feb 2012 19:31:41 +0100
Subject: [PATCH] Use C++ HelpIndexer

---
 helpcontent2/settings.pmk    |   12 ------------
 helpcontent2/util/target.pmk |   21 +++------------------
 2 files changed, 3 insertions(+), 30 deletions(-)

diff --git a/helpcontent2/settings.pmk b/helpcontent2/settings.pmk
index 185438e..3716281 100755
--- a/helpcontent2/settings.pmk
+++ b/helpcontent2/settings.pmk
@@ -1,17 +1,5 @@
 .INCLUDE : $(LOCAL_COMMON_OUT)/inc$/aux_langs.mk
 .INCLUDE : $(LOCAL_COMMON_OUT)/inc$/help_exist.mk
 
-my_cp:=$(CLASSPATH)$(PATH_SEPERATOR)$(SOLARBINDIR)$/jaxp.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/juh.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/parser.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/xt.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/unoil.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/ridl.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/jurt.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/xmlsearch.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/LuceneHelpWrapper.jar$(PATH_SEPERATOR)$(SOLARBINDIR)$/HelpIndexerTool.jar$
-
-.IF "$(SYSTEM_LUCENE)" == "YES"
-my_cp!:=$(my_cp)$(PATH_SEPERATOR)$(LUCENE_CORE_JAR)$(PATH_SEPERATOR)$(LUCENE_ANALYZERS_JAR)
-.ELSE
-my_cp!:=$(my_cp)$(PATH_SEPERATOR)$(SOLARBINDIR)/lucene-core-2.3.jar$(PATH_SEPERATOR)$(SOLARBINDIR)/lucene-analyzers-2.3.jar
-.ENDIF
- 
-.IF "$(SYSTEM_DB)" != "YES"
-JAVA_LIBRARY_PATH= -Djava.library.path=$(SOLARSHAREDBIN)
-.ENDIF 
-
 aux_alllangiso_all:=$(foreach,i,$(alllangiso) $(foreach,j,$(aux_langdirs) $(eq,$i,$j  $i $(NULL))))
 aux_alllangiso:=$(foreach,i,$(aux_alllangiso_all) $(foreach,j,$(help_exist) $(eq,$i,$j  $i $(NULL))))
diff --git a/helpcontent2/util/target.pmk b/helpcontent2/util/target.pmk
index 40f6e5d..7dd7e5b 100755
--- a/helpcontent2/util/target.pmk
+++ b/helpcontent2/util/target.pmk
@@ -30,25 +30,10 @@ LINKALLADDEDDEPS=$(foreach,i,$(aux_alllangiso) $(subst,LANGUAGE,$i $(LINKADDEDDP
 
 ALLTAR : $(LINKALLTARGETS)
 
-.IF "$(SYSTEM_DB)" != "YES"
-JAVA_LIBRARY_PATH= -Djava.library.path=$(SOLARSHAREDBIN)
-.ENDIF
-
 XSL_DIR*:=$(SOLARBINDIR)
 
 $(LINKALLTARGETS) : $(foreach,i,$(LINKLINKFILES) $(COMMONMISC)$/$$(@:b:s/_/./:e:s/.//)/$i) $(subst,LANGUAGE,$$(@:b:s/_/./:e:s/.//) $(LINKADDEDDEPS)) $(COMMONMISC)$/xhp_changed.flag
     $(HELPLINKER) @$(mktmp -mod $(LINKNAME) -src $(COMMONMISC) -sty $(XSL_DIR)/embed.xsl -zipdir $(MISC)$/ziptmp$(@:b) -idxcaption $(XSL_DIR)/idxcaption.xsl -idxcontent $(XSL_DIR)/idxcontent.xsl -lang {$(subst,$(LINKNAME)_, $(@:b))} $(subst,LANGUAGE,{$(subst,$(LINKNAME)_, $(@:b))} $(LINKADDEDFILES)) $(foreach,i,$(LINKLINKFILES) $(COMMONMISC)$/{$(subst,$(LINKNAME)_, $(@:b))}/$i) -o $@.$(INPATH))
-.IF "$(SOLAR_JAVA)" == "TRUE"
-.IF "$(CHECK_LUCENCE_INDEXER_OUTPUT)" == ""
-    $(JAVAI) $(JAVAIFLAGS) $(JAVA_LIBRARY_PATH) -cp "$(my_cp)" com.sun.star.help.HelpIndexerTool -lang $(@:b:s/_/./:e:s/.//) -mod $(LINKNAME) -zipdir $(MISC)$/ziptmp$(@:b) -o $@.$(INPATH)
-.ELSE
-    $(JAVAI) $(JAVAIFLAGS) $(JAVA_LIBRARY_PATH) -cp "$(my_cp)" com.sun.star.help.HelpIndexerTool -lang $(@:b:s/_/./:e:s/.//) -mod $(LINKNAME) -zipdir $(MISC)$/ziptmp$(@:b) -o $@.$(INPATH) -checkcfsandsegname _0 _3
-.ENDIF
-   $(RENAME) $@.$(INPATH) $@
-.ELSE
-    -$(RM) $(MISC)$/ziptmp$(@:b)$/content/*.*
-    -$(RM) $(MISC)$/ziptmp$(@:b)$/caption/*.*
-    zip -j -D $@.$(INPATH) $(MISC)$/ziptmp$(@:b)$/*
-    $(RENAME) $@.$(INPATH) $@
-    -$(RM) $(MISC)$/ziptmp$(@:b)$/*.*
-.ENDIF
+    $(HELPINDEXER) -lang $(@:b:s/_/./:e:s/.//) -mod $(LINKNAME) -srcdir $(MISC)$/ziptmp$(@:b) -zipdir $(MISC)$/ziptmp$(@:b)
+    cd $(MISC)$/ziptmp$(@:b) && zip -rX --filesync zipfile.zip $(LINKNAME).*
+    $(RENAME) $(MISC)$/ziptmp$(@:b)$/zipfile.zip $@
-- 
1.7.0.4

