I know some of the original team members - I could ask.

Are there specific questions, or just "is anybody still minding the fire"?

-- Ken

On Nov 1, 2011, at 2:43pm, Nick Burch wrote:

> On Tue, 1 Nov 2011, Robert Muir wrote:
>> Well, as an alternative to them committing the EBCDIC detection, perhaps we 
>> could look at the charset detection APIs and propose some API additions so 
>> that users (like Tika) can plug in custom detectors?
> 
> In theory it should be pluggable, but I seem to recall we needed to tweak a 
> few core bits to get the detector working (around negative matches for 
> control characters)
> 
> Looking at the svn version history, the ICU4J team don't appear to have done 
> any work on their character detectors in several years. From the lack of 
> responses when I asked on their list about extending them, I fear there may 
> not be anyone left in their project who's interested in charset detectors any 
> more. I'd love to be proved wrong though, if anyone has any personal contacts 
> on the project they could prod about it?
> 
> Nick

--------------------------
Ken Krugler
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr


