Re: which unicode version is supported with lucene

2011-02-27 Thread Yonik Seeley
On Sun, Feb 27, 2011 at 2:15 PM, Bernd Fehling wrote: > Yep, it's back online. > Just did a short test and reported my results to JIRA, but is the > error from the XML output still a Jetty problem or is it from XMLWriter? The patch has been committed, so you should just be able to try trunk (or 3

RE: which unicode version is supported with lucene

2011-02-27 Thread Bernd Fehling
ielefeld.de] > > Sent: Sunday, February 27, 2011 3:04 PM > > To: java-user@lucene.apache.org > > Subject: Re: which unicode version is supported with lucene > > > > Hi Robert, > > > > thanks to you and Yonik for looking into this. > > As soon

RE: which unicode version is supported with lucene

2011-02-27 Thread Uwe Schindler
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] > Sent: Sunday, February 27, 2011 3:04 PM > To: java-user@lucene.apache.org > Subject: Re: which unicode version is supported with lucene > > Hi Robert, > > thanks to you and Yonik for looking into this. > As soon a

Re: which unicode version is supported with lucene

2011-02-27 Thread Bernd Fehling
Hi Robert, thanks to you and Yonik for looking into this. As soon as Apache JIRA is back online I will try your Jetty version and give feedback. Regards, Bernd > On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling > wrote: > > Hi Yonik, > > > > good point, yes we are using Jetty. > > Do you know if

Re: which unicode version is supported with lucene

2011-02-25 Thread Robert Muir
On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling wrote: > Hi Yonik, > > good point, yes we are using Jetty. > Do you know if Tomcat has this limitation? > Hi Bernd, I placed some patched Jetty jar files on https://issues.apache.org/jira/browse/SOLR-2381 in the meantime. Maybe then you can get pas

Re: which unicode version is supported with lucene

2011-02-25 Thread Robert Muir
On Fri, Feb 25, 2011 at 10:04 AM, Yonik Seeley wrote: > But Firefox complains about XML output, and any other output like JSON > looks mangled. > My bet is Jetty's UTF8 encoding for the response also doesn't handle > the full range. > I created a JIRA issue on Jetty's issue tracker with a tentati
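
Yonik's bet is that Jetty's response encoder mishandles characters above the BMP. We have not looked at Jetty's code here, so the following is only a sketch of one common shape such a bug takes: an encoder that walks a Java String one UTF-16 char at a time emits each surrogate half as its own 3-byte sequence, which browsers reject as invalid UTF-8. The class and method names are made up for illustration; only `String.getBytes(StandardCharsets.UTF_8)` is the real JDK call being contrasted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: NOT Jetty's actual encoder.
class NaiveUtf8Demo {

    // Hypothetical naive encoder: treats every UTF-16 char as if it were a
    // code point, so a surrogate pair becomes two 3-byte sequences.
    static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // The JDK encoder combines the surrogate pair into one code point
    // and emits a single 4-byte sequence.
    static byte[] encodeCorrectly(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "F0 9D".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D5A0 MATHEMATICAL SANS-SERIF CAPITAL A
        String s = new String(Character.toChars(0x1D5A0));
        System.out.println("per-char: " + hex(encodePerChar(s)));   // ED A0 B5 ED B6 A0
        System.out.println("JDK:      " + hex(encodeCorrectly(s))); // F0 9D 96 A0
    }
}
```

The per-char output encodes the surrogates themselves (the CESU-8 style form); RFC 3629 forbids that in UTF-8, which is why strict parsers such as a browser's XML parser report an encoding error.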

Re: which unicode version is supported with lucene

2011-02-25 Thread Yonik Seeley
On Fri, Feb 25, 2011 at 9:31 AM, Robert Muir wrote: > Then I searched on 'range' via the admin GUI to retrieve this > document, and Chrome blew up with "This page contains the following > errors: error on line 17 at column 306: Encoding error" I got an error in Firefox too. I added the following
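
Chrome's "Encoding error" and the Firefox failure both suggest a response body that is not well-formed UTF-8. One way to check a saved response outside the browser is a JDK `CharsetDecoder` configured to report malformed input instead of silently replacing it. This is a minimal sketch; the class name is hypothetical, and the sample bytes are U+1D5A0 encoded correctly and with its two surrogate halves encoded separately.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Check {

    // Returns true only if the bytes decode as well-formed UTF-8.
    // A fresh decoder is created per call because CharsetDecoder is not thread-safe.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // U+1D5A0 as a proper 4-byte UTF-8 sequence
        byte[] good = {(byte) 0xF0, (byte) 0x9D, (byte) 0x96, (byte) 0xA0};
        // The same character with each surrogate half encoded on its own
        byte[] bad = {(byte) 0xED, (byte) 0xA0, (byte) 0xB5, (byte) 0xED, (byte) 0xB6, (byte) 0xA0};
        System.out.println("good: " + isValidUtf8(good));
        System.out.println("bad:  " + isValidUtf8(bad));
    }
}
```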

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
I just tried vim as an editor; it seems to work. - start vim - enter i (for insert) - enter Ctrl+V and then Shift+U (for uppercase U) - enter the upper Unicode code point as 8 hex digits (e.g. 0001D5A0 for U+1D5A0 [MATHEMATICAL SANS-SERIF CAPITAL A]) On 25.02.2011 15:16, Yonik Seeley wrote: > On Fri, Feb 25, 2011 at 9:09 AM
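
When no suitable editor is at hand, the same test character can be produced in Java. Java source has no 8-digit `\U` escape, so a character above U+FFFF is either written as its surrogate pair or built from the code point. A small sketch; the class and method names are illustrative only.

```java
// Illustrative helper for building test input above the BMP.
class CodePointInput {

    // Java has no 8-digit escape for code points above U+FFFF,
    // so build the string from the code point instead.
    static String fromCodePoint(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    // Shows the UTF-16 code units of a string as Java-style escapes.
    static String utf16Escapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D5A0 MATHEMATICAL SANS-SERIF CAPITAL A
        String a = fromCodePoint(0x1D5A0);
        // Prints the two surrogate code units (D835, DDA0) as escapes
        System.out.println(utf16Escapes(a));
        System.out.println(Integer.toHexString(a.codePointAt(0)));
    }
}
```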

Re: which unicode version is supported with lucene

2011-02-25 Thread Robert Muir
On Fri, Feb 25, 2011 at 9:16 AM, Yonik Seeley wrote: > > On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling > wrote: > > Hi Yonik, > > > > good point, yes we are using Jetty. > > Do you know if Tomcat has this limitation? > > Tomcat's defaults are worse - you need to configure it to use UTF-8 by > de

Re: which unicode version is supported with lucene

2011-02-25 Thread Yonik Seeley
On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling wrote: > Hi Yonik, > > good point, yes we are using Jetty. > Do you know if Tomcat has this limitation? Tomcat's defaults are worse - you need to configure it to use UTF-8 by default for URLs. Once you do, it passes all those tests (last I checked).
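
Yonik's point about Tomcat is that the container has to decode the percent-escaped bytes in a URL as UTF-8. In Tomcat that is the `URIEncoding` attribute on the HTTP Connector; versions of that era defaulted to ISO-8859-1. The JDK's `URLDecoder` makes the difference easy to see. This sketch only illustrates the decoding step, not what Tomcat or Jetty do internally, and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UriDecodingDemo {

    // U+1D5A0 percent-encoded as its four UTF-8 bytes, as it would appear in a query string.
    static final String QUERY_VALUE = "%F0%9D%96%A0";

    // Decodes a percent-encoded value using the named charset.
    static String decode(String value, String charsetName) {
        try {
            return URLDecoder.decode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps each byte to one char: four unrelated Latin-1 characters.
        String latin1 = decode(QUERY_VALUE, "ISO-8859-1");
        // UTF-8 reassembles the four bytes into one supplementary code point.
        String utf8 = decode(QUERY_VALUE, "UTF-8");
        System.out.println("ISO-8859-1 chars: " + latin1.length());
        System.out.println("UTF-8 code points: " + utf8.codePointCount(0, utf8.length()));
        System.out.println("is U+1D5A0: " + (utf8.codePointAt(0) == 0x1D5A0));
    }
}
```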

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
Hi Yonik, good point, yes we are using Jetty. Do you know if Tomcat has this limitation? Regards, Bernd On 25.02.2011 14:54, Yonik Seeley wrote: > On Fri, Feb 25, 2011 at 8:48 AM, Bernd Fehling > wrote: >> So Solr trunk should already handle Unicode above BMP for field type string? >> Strange

Re: which unicode version is supported with lucene

2011-02-25 Thread Yonik Seeley
On Fri, Feb 25, 2011 at 8:48 AM, Bernd Fehling wrote: > So Solr trunk should already handle Unicode above BMP for field type string? > Strange... One issue is that jetty doesn't support UTF-8 beyond the BMP: /opt/code/lusolr/solr/example/exampledocs$ ./test_utf8.sh Solr server is up. HTTP GET is

RE: which unicode version is supported with lucene

2011-02-25 Thread Uwe Schindler
...@thetaphi.de > -----Original Message----- > From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] > Sent: Friday, February 25, 2011 2:48 PM > To: java-user@lucene.apache.org > Subject: Re: which unicode version is supported with lucene > > > So Solr trunk should alre

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
3 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -----Original Message----- >> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] >> Sent: Friday, February 25, 2011 2:19 PM >> To: simon.willna...@gmail.com >> Cc: java-user@lucene.apache.org

RE: which unicode version is supported with lucene

2011-02-25 Thread Uwe Schindler
25, 2011 2:19 PM > To: simon.willna...@gmail.com > Cc: java-user@lucene.apache.org > Subject: Re: which unicode version is supported with lucene > > Hi Simon, > > actually I'm working with Solr from trunk but followed the problem all the > way down to Lucene. I think

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
Hi Simon, actually I'm working with Solr from trunk but followed the problem all the way down to Lucene. I think Solr trunk is built with Lucene 3.0.3. My field is: No analysis done at all, just stored the content for result display. But the result is unpredictable and can end in invalid UTF-8
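
Bernd does not say how the invalid UTF-8 arises. One common cause in Java is an unpaired surrogate, for example a string cut at a char index that falls inside a surrogate pair; no UTF-8 encoder can emit a correct byte sequence for it. The helper below is a hypothetical diagnostic, not part of Lucene or Solr, that reports the first unpaired surrogate in a stored value.

```java
// Hypothetical diagnostic for stored field values; not a Lucene/Solr API.
class SurrogateCheck {

    // Returns the index of the first unpaired surrogate, or -1 if the
    // sequence is well-formed UTF-16 (and therefore encodable as valid UTF-8).
    static int firstUnpairedSurrogate(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return i; // high surrogate with no low half after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high half before it
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 'A', U+1D5A0 (as a surrogate pair), 'B'
        String ok = "A" + new String(Character.toChars(0x1D5A0)) + "B";
        // Cutting at char index 2 splits the pair and leaves a lone high surrogate.
        String truncated = ok.substring(0, 2);
        System.out.println(firstUnpairedSurrogate(ok));
        System.out.println(firstUnpairedSurrogate(truncated));
    }
}
```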

Re: which unicode version is supported with lucene

2011-02-25 Thread Robert Muir
On Fri, Feb 25, 2011 at 6:04 AM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > Since 3.0 is a Java Generics / move to Java 1.5 only release these > APIs are not in use yet in the latest released version. Lucene 3.1 > holds a largely converted Analyzer / TokenFilter / Tokenizer codebas
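
The Analyzer/Tokenizer conversion mentioned here matters because code that inspects text one `char` at a time sees each half of a surrogate pair separately, and neither half is a letter. A minimal sketch with hypothetical names, contrasting a char-based loop with one that uses the Java 5 code point methods (`String.codePointAt`, `Character.charCount`). It illustrates the general technique and is not Lucene's actual tokenizer code.

```java
// Illustrative comparison; not Lucene code.
class CodePointIteration {

    // Char-at-a-time: each surrogate half is tested on its own,
    // and surrogates are never letters, so U+1D5A0 is missed.
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) n++;
        }
        return n;
    }

    // Code-point-aware: read a whole code point, then advance by 1 or 2 chars.
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) n++;
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // "ab" followed by U+1D5A0 MATHEMATICAL SANS-SERIF CAPITAL A (an uppercase letter)
        String s = "ab" + new String(Character.toChars(0x1D5A0));
        System.out.println("by char:       " + countLettersByChar(s));
        System.out.println("by code point: " + countLettersByCodePoint(s));
    }
}
```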

Re: which unicode version is supported with lucene

2011-02-25 Thread Simon Willnauer
On Fri, Feb 25, 2011 at 1:02 PM, Bernd Fehling wrote: > Hi Simon, > > thanks for the details. > > My platform supports and uses code above BMP (0x10000 and up). > So the limit is Lucene. > Don't know how to handle this problem. > Maybe deleting all code above BMP...??? the code will work fine ev

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
Hi Simon, thanks for the details. My platform supports and uses code above BMP (0x10000 and up). So the limit is Lucene. Don't know how to handle this problem. Maybe deleting all code above BMP...??? Good to hear that Lucene 3.1 will come soon. Any rough estimate when Lucene 3.1 will be avail

Re: which unicode version is supported with lucene

2011-02-25 Thread Simon Willnauer
Hey Bernd, On Fri, Feb 25, 2011 at 11:23 AM, Bernd Fehling wrote: > Dear list, > > a very basic question about lucene, which version of > unicode can be handled (indexed and searched) with lucene? if you ask for what the indexer / query can handle then it is really what UTF-8 can handle. Strings
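
To make the UTF-8 point concrete: Java Strings are UTF-16, so a character above the BMP (the "4 byte UTF-8" case in Bernd's question) counts as one code point, two `char`s, and four UTF-8 bytes. A small sketch; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

class LengthMetrics {

    // Returns {UTF-16 code units, code points, UTF-8 bytes} for a string.
    static int[] measure(String s) {
        return new int[] {
            s.length(),
            s.codePointCount(0, s.length()),
            s.getBytes(StandardCharsets.UTF_8).length
        };
    }

    public static void main(String[] args) {
        // 'A' (1 char, 1 UTF-8 byte) + U+1D5A0 (2 chars, 4 UTF-8 bytes)
        String s = "A" + new String(Character.toChars(0x1D5A0));
        int[] m = measure(s);
        System.out.println("chars=" + m[0] + " codePoints=" + m[1] + " utf8Bytes=" + m[2]);
        // chars=3 codePoints=2 utf8Bytes=5
    }
}
```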

which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
Dear list, a very basic question about Lucene: which version of Unicode can be handled (indexed and searched) with Lucene? It looks like Lucene can only handle the very old Unicode 2.0 but not the newer 3.1 version (4-byte UTF-8). Is that true? Regards, Bernd --