[ANN]VTD-XML 2.10

2011-02-25 Jimmy Zhang
VTD-XML 2.10 is now released. It can be downloaded at https://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.10/. This release includes a number of new features and enhancements. * The core API of VTD-XML has been expanded. Users can now perform cut/paste/insert on an empty element.

Re: which unicode version is supported with lucene

2011-02-25 Robert Muir
On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling wrote: > Hi Yonik, > > good point, yes we are using Jetty. > Do you know if Tomcat has this limitation? > Hi Bernd, I placed some patched Jetty jar files on https://issues.apache.org/jira/browse/SOLR-2381 for the meantime. Maybe then you can get pas

Re: Proper way to deal with shared indexer exception

2011-02-25 Simon Willnauer
I looked at your code briefly. I could imagine that you have a problem if you close the IndexReader while a search that still holds the already replaced IndexSearcher is running. I usually recommend some kind of "transactional" pattern. When you do a search you ask for the IndexS
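One way to read the "transactional" pattern Simon describes: take a counted reference on the current searcher when a search starts, release it when the search ends, and close a replaced searcher only once its last user has released it. The sketch below is a hypothetical plain-Java helper written for illustration, not a Lucene 3.0.3 API; `SharedResource`, `Lease`, `acquire` and `swap` are invented names, and the shared object is any `Closeable` (in a real setup, whatever wraps the shared searcher and its reader).

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not a Lucene class: hands out counted leases on a shared
// Closeable and closes a replaced instance only after its last lease is released.
final class SharedResource<T extends Closeable> {

    // One generation of the shared resource plus its reference count.
    private final class Slot {
        final T resource;
        final AtomicInteger refs = new AtomicInteger(1); // 1 = the holder's own reference

        Slot(T resource) { this.resource = resource; }

        void decRef() throws IOException {
            if (refs.decrementAndGet() == 0) {
                resource.close(); // nobody uses this generation any more
            }
        }
    }

    // A lease pins one generation for the duration of a single search.
    final class Lease implements AutoCloseable {
        private final Slot slot;
        private boolean released;

        private Lease(Slot slot) { this.slot = slot; }

        T get() { return slot.resource; }

        @Override
        public void close() throws IOException {
            if (!released) { // releasing twice must not over-decrement
                released = true;
                slot.decRef();
            }
        }
    }

    private Slot current;

    SharedResource(T initial) { current = new Slot(initial); }

    // Start of a "transaction": take a reference on whatever is current now.
    synchronized Lease acquire() {
        current.refs.incrementAndGet();
        return new Lease(current);
    }

    // After reindexing: publish the new instance and drop the holder's reference
    // to the old one, which is closed right away only if no search still holds it.
    void swap(T replacement) throws IOException {
        Slot old;
        synchronized (this) {
            old = current;
            current = new Slot(replacement);
        }
        old.decRef();
    }
}

class SharedResourceDemo {
    public static void main(String[] args) throws IOException {
        SharedResource<Closeable> holder =
                new SharedResource<Closeable>(() -> System.out.println("old searcher closed"));
        try (SharedResource<Closeable>.Lease lease = holder.acquire()) {
            Closeable searcher = lease.get(); // run the search against 'searcher'
            holder.swap(() -> System.out.println("new searcher closed"));
            // The old instance is still open here because this lease pins it.
        } // releasing the lease prints "old searcher closed"
    }
}
```

Because the holder keeps one reference of its own, a swapped-out instance is closed by whichever runs last: `swap` (no searches in flight) or the final `Lease.close()`.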

Re: shared IndexSearcher (lucene 3.0.3)

2011-02-25 Simon Willnauer
Hey, the "too many open files" error can be prevented by raising the limit of open files ;) there is a nice summary in the FAQ you might wanna look at: http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_an_IOException_that_says_.22Too_many_open_files.22.3F if you have further questions just

Re: which unicode version is supported with lucene

2011-02-25 Robert Muir
On Fri, Feb 25, 2011 at 10:04 AM, Yonik Seeley wrote: > But firefox complains on XML output, and any other output like JSON it > looks mangled. > My bet is Jetty's UTF8 encoding for the response also doesn't handle > the full range. > I created a JIRA issue on jetty's issue tracker with a tentati
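The following is not Jetty's code. It is an illustration of one classic way a hand-rolled "UTF-8" writer fails above the BMP, which would match the "Encoding error" symptoms described in this thread: it encodes each UTF-16 `char` on its own, so a surrogate pair turns into two 3-byte sequences (CESU-8 style) instead of one 4-byte UTF-8 sequence, and strict decoders such as browsers' XML parsers reject it. `NaiveUtf8` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: a deliberately buggy per-char encoder next to the JDK's
// correct UTF-8 encoder.
final class NaiveUtf8 {
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i); // BUG: treats each surrogate as a standalone character
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b & 0xFF));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x1D5A0)); // U+1D5A0
        System.out.println("naive:   " + hex(encodePerChar(s)));                   // ED A0 B5 ED B6 A0
        System.out.println("correct: " + hex(s.getBytes(StandardCharsets.UTF_8))); // F0 9D 96 A0
    }
}
```

For characters inside the BMP both encoders agree, which is why this kind of bug tends to surface only once supplementary characters show up in the data.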

Re: which unicode version is supported with lucene

2011-02-25 Yonik Seeley
On Fri, Feb 25, 2011 at 9:31 AM, Robert Muir wrote: > Then i searched on 'range' via the admin gui to retrieve this > document, and chrome blew up with "This page contains the following > errors: error on line 17 at column 306: Encoding error" I got an error in firefox too. I added the following

Re: which unicode version is supported with lucene

2011-02-25 Bernd Fehling
I just tried vim as editor, seems to work. - start vim - enter i (for insert) - press Ctrl+V and then Shift+U (for uppercase U) - enter the upper Unicode code point as 8 hex digits (e.g. 0001D5A0 for U+1D5A0 [MATHEMATICAL SANS-SERIF CAPITAL A]) On 25.02.2011 15:16, Yonik Seeley wrote: > On Fri, Feb 25, 2011 at 9:09 AM
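For readers following the thread from the Java side, the character Bernd enters in vim (U+1D5A0) is a single code point. Inside a Java `String` it is held as two UTF-16 `char`s, a surrogate pair. The sketch below only uses standard `java.lang.Character` and `String` methods. The class name is illustrative.

```java
// Illustration: U+1D5A0 (MATHEMATICAL SANS-SERIF CAPITAL A), entered in vim
// as 0001D5A0, stored in a Java String.
final class SupplementaryChar {
    public static void main(String[] args) {
        int codePoint = 0x1D5A0;
        // Character.toChars returns the UTF-16 form: a high and a low surrogate.
        String s = new String(Character.toChars(codePoint));

        System.out.println(s.length());                      // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1)); // D835 DDA0
        System.out.println(Character.isSupplementaryCodePoint(codePoint));     // true
        System.out.println(s.codePointAt(0) == codePoint);                     // true
    }
}
```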

Re: which unicode version is supported with lucene

2011-02-25 Robert Muir
On Fri, Feb 25, 2011 at 9:16 AM, Yonik Seeley wrote: > > On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling > wrote: > > Hi Yonik, > > > > good point, yes we are using Jetty. > > Do you know if Tomcat has this limitation? > > Tomcat's defaults are worse - you need to configure it to use UTF-8 by > de

Proper way to deal with shared indexer exception

2011-02-25 Jason Tesser
We are having issues with FileChannelClosed and are NOT calling Thread.interrupt. We also start to see AlreadyClosedException on Reader. We are running the latest 3.0.3. We have code in our Lucene Util class like this: http://pastebin.com/ifbxhVLi We have a single shared searcher and a

Re: which unicode version is supported with lucene

2011-02-25 Yonik Seeley
On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling wrote: > Hi Yonik, > > good point, yes we are using Jetty. > Do you know if Tomcat has this limitation? Tomcat's defaults are worse - you need to configure it to use UTF-8 by default for URLs. Once you do, it passes all those tests (last I checked).

Re: which unicode version is supported with lucene

2011-02-25 Bernd Fehling
Hi Yonik, good point, yes we are using Jetty. Do you know if Tomcat has this limitation? Regards, Bernd On 25.02.2011 14:54, Yonik Seeley wrote: > On Fri, Feb 25, 2011 at 8:48 AM, Bernd Fehling > wrote: >> So Solr trunk should already handle Unicode above BMP for field type string? >> Strange

Re: which unicode version is supported with lucene

2011-02-25 Yonik Seeley
On Fri, Feb 25, 2011 at 8:48 AM, Bernd Fehling wrote: > So Solr trunk should already handle Unicode above BMP for field type string? > Strange... One issue is that jetty doesn't support UTF-8 beyond the BMP: /opt/code/lusolr/solr/example/exampledocs$ ./test_utf8.sh Solr server is up. HTTP GET is
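To see what an HTTP GET for a term above the BMP has to carry in its query string, consider the four UTF-8 bytes of U+1D5A0, percent-encoded. The sketch uses only `java.net.URLEncoder`/`URLDecoder`. The class name and the `q=` parameter are illustrative. The last line shows the kind of result an application receives when a container does not decode URLs as UTF-8, in line with Yonik's note that Tomcat has to be configured for UTF-8 URLs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Illustration: percent-encoding a supplementary character for a query string.
final class QueryEncoding {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String term = new String(Character.toChars(0x1D5A0));

        // UTF-8 percent-encoding: the surrogate pair becomes one 4-byte sequence.
        String encoded = URLEncoder.encode(term, "UTF-8");
        System.out.println("q=" + encoded);                 // q=%F0%9D%96%A0

        // Decoding with the same charset restores the original String.
        System.out.println(URLDecoder.decode(encoded, "UTF-8").equals(term)); // true

        // Decoding the same bytes as ISO-8859-1 yields four unrelated chars.
        System.out.println(URLDecoder.decode(encoded, "ISO-8859-1").length()); // 4
    }
}
```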

RE: which unicode version is supported with lucene

2011-02-25 Uwe Schindler
What APIs are you using to communicate with Solr? If you are using XML it may be limited by the XML parser used... If you are using SolrJ with binary request handler it should in all cases go through. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@the

Re: which unicode version is supported with lucene

2011-02-25 Bernd Fehling
So Solr trunk should already handle Unicode above BMP for field type string? Strange... Regards, Bernd On 25.02.2011 14:40, Uwe Schindler wrote: > Solr trunk is using Lucene trunk since Lucene and Solr are merged.

RE: which unicode version is supported with lucene

2011-02-25 Uwe Schindler
Solr trunk is using Lucene trunk since Lucene and Solr are merged. > -Original Message- > From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] > Sent: Friday, February 25, 2011 2

Re: which unicode version is supported with lucene

2011-02-25 Bernd Fehling
Hi Simon, actually I'm working with Solr from trunk but followed the problem all the way down to Lucene. I think Solr trunk is built with Lucene 3.0.3. My field is: No analysis done at all, just stored the content for result display. But the result is unpredictable and can end in invalid utf-8
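One common way stored text above the BMP ends up corrupted is shown below. This is not necessarily the cause in Bernd's setup. Code that cuts a `String` at an arbitrary `char` index can split a surrogate pair. A lone surrogate is not valid UTF-16, and `String.getBytes(UTF_8)` silently replaces it with `'?'`. `safeTruncate` is a hypothetical helper that avoids the split.

```java
import java.nio.charset.StandardCharsets;

// Illustration: splitting a surrogate pair, and a truncation that avoids it.
final class SurrogateSplit {
    // Cut to at most maxChars chars, but never between a high and a low surrogate.
    static String safeTruncate(String s, int maxChars) {
        if (s.length() <= maxChars) return s;
        int end = maxChars;
        if (end > 0 && Character.isHighSurrogate(s.charAt(end - 1))) {
            end--; // back off so the pair stays intact
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "x" + new String(Character.toChars(0x1D5A0)); // 3 chars, 2 code points

        String naive = s.substring(0, 2); // "x" followed by a lone high surrogate
        byte[] bytes = naive.getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);           // 2: 'x' and the replacement '?'
        System.out.println((char) bytes[1]);        // ?

        System.out.println(safeTruncate(s, 2));     // x
    }
}
```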

shared IndexSearcher (lucene 3.0.3)

2011-02-25 Akos Tajti
Hi all, in our project we're using Lucene in Tomcat. To avoid some overhead we have a shared IndexSearcher instance. In the past we repeatedly got "too many open files" errors. To prevent this, the IndexSearcher is closed and reopened after indexing. The shared instance is not closed anywhere else in

Re: which unicode version is supported with lucene

2011-02-25 Robert Muir
On Fri, Feb 25, 2011 at 6:04 AM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > Since 3.0 is a Java Generics / move to Java 1.5 only release these > APIs are not in use yet in the latest released version. Lucene 3.1 > holds a largely converted Analyzer / TokenFilter / Tokenizer codebas
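The Java 5 supplementary-character APIs matter for analysis. A tokenizer that tests each UTF-16 `char` with `Character.isLetter(char)` sees two surrogates, neither of which is a letter, so U+1D5A0 is dropped. Testing whole code points with the `int` overloads keeps it. Both methods below are simplified letter checks written for illustration. They are not Lucene's `CharTokenizer` or any 3.1 class.

```java
// Illustration: char-by-char versus code-point-by-code-point letter tests.
final class LetterCounting {
    // Counts letters one UTF-16 unit at a time (pre-Java-5 style).
    static int lettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) n++; // surrogates are never letters
        }
        return n;
    }

    // Counts letter code points, stepping over a surrogate pair as one unit.
    static int lettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) n++;
            i += Character.charCount(cp); // 1 for BMP, 2 for supplementary
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "A" + new String(Character.toChars(0x1D5A0)); // ASCII A + sans-serif A
        System.out.println(lettersByChar(s));      // 1
        System.out.println(lettersByCodePoint(s)); // 2
    }
}
```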

Re: which unicode version is supported with lucene

2011-02-25 Simon Willnauer
On Fri, Feb 25, 2011 at 1:02 PM, Bernd Fehling wrote: > Hi Simon, > > thanks for the details. > > My platform supports and uses code above BMP (0x10000 and up). > So the limit is Lucene. > Don't know how to handle this problem. > Maybe deleting all code above BMP...??? the code will work fine ev

Re: which unicode version is supported with lucene

2011-02-25 Bernd Fehling
Hi Simon, thanks for the details. My platform supports and uses code above BMP (0x10000 and up). So the limit is Lucene. Don't know how to handle this problem. Maybe deleting all code above BMP...??? Good to hear that Lucene 3.1 will come soon. Any rough estimate of when Lucene 3.1 will be avail

Re: which unicode version is supported with lucene

2011-02-25 Simon Willnauer
Hey Bernd, On Fri, Feb 25, 2011 at 11:23 AM, Bernd Fehling wrote: > Dear list, > > a very basic question about lucene, which version of > unicode can be handled (indexed and searched) with lucene? if you ask for what the indexer / query can handle then it is really what UTF-8 can handle. Strings
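Simon's point that the indexer handles "what UTF-8 can handle" can be checked directly. UTF-8 covers every Unicode code point using 1 to 4 bytes, and the 4-byte form is what the original question calls "4 byte utf-8". A `String` with characters above the BMP survives an encode/decode round trip unchanged. The sketch uses only the JDK. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

// Illustration: UTF-8 byte lengths per code point and a lossless round trip.
final class Utf8Lengths {
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1D5A0}; // 'A', U+00E9, U+20AC, U+1D5A0
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp)); // 1, 2, 3, 4
        }

        String text = "caf\u00e9 " + new String(Character.toChars(0x1D5A0));
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String back = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(back.equals(text)); // true
    }
}
```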

which unicode version is supported with lucene

2011-02-25 Bernd Fehling
Dear list, a very basic question about Lucene: which version of Unicode can be handled (indexed and searched) with Lucene? It looks like Lucene can only handle the very old Unicode 2.0 but not the newer 3.1 version (4-byte UTF-8). Is that true? Regards, Bernd

Re: Converting an existing index format to Lucene Index

2011-02-25 Lokendra Singh
Eddie and Aditya: Thanks a lot for your suggestions! The collect-and-commit approach looks good. @Eddie: My old index is on the order of a million pairs. Though I haven't run any such tests yet, I will be running them soon for analysis. Regards Lokendra On Fri, Feb 25, 2011 at 2:23 PM, Edwar

Re: Converting an existing index format to Lucene Index

2011-02-25 Edward Drapkin
On 2/25/2011 12:26 AM, Lokendra Singh wrote: Hi all, I am looking for some guidelines on directly converting an already existing index to a Lucene index. The index available to me is a set of pairs, where each pair is: < word , fileName >, i.e. a word as 'value1', and 'value2' being the
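A possible preprocessing step, assuming the old index can be streamed as `< word , fileName >` pairs: collect the words per file first, so that each file can later become one Lucene document (for example a stored file-name field plus a multi-valued word field). This is a hypothetical sketch in plain Java. The Lucene write is left out, `PairGrouping` and `groupByFile` are invented names, and it is not necessarily the collect-and-commit approach suggested in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group <word, fileName> pairs by file name, preserving
// first-seen file order and the order of words within each file.
final class PairGrouping {
    static Map<String, List<String>> groupByFile(List<String[]> pairs) {
        Map<String, List<String>> wordsByFile = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            String word = pair[0];
            String fileName = pair[1];
            wordsByFile.computeIfAbsent(fileName, k -> new ArrayList<>()).add(word);
        }
        return wordsByFile;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"lucene", "a.txt"});
        pairs.add(new String[] {"index", "b.txt"});
        pairs.add(new String[] {"search", "a.txt"});
        System.out.println(groupByFile(pairs)); // {a.txt=[lucene, search], b.txt=[index]}
    }
}
```

For around a million pairs, an in-memory map like this should normally fit in a default-sized JVM heap. For much larger inputs the pairs could be sorted by file name first and grouped in a single streaming pass instead.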