I know some of the original team members - I could ask. Are there specific questions, or just "is anybody still minding the fire"?
-- Ken

On Nov 1, 2011, at 2:43pm, Nick Burch wrote:

> On Tue, 1 Nov 2011, Robert Muir wrote:
>> Well as an alternative for them committing the ebcdic detection, perhaps we
>> could look at the Charset detection apis and propose some API additions so
>> that users (like Tika) can plug in custom detectors?
>
> In theory it should be pluggable, but I seem to recall we needed to tweak a
> few core bits to get the detector working (around negative matches for
> control characters)
>
> Looking at the svn version history, the ICU4J team don't appear to have done
> any work on their character detectors in several years. From the lack of
> responses when I asked on their list about extending them, I fear there may
> not be anyone left in their project who's interested in charset detectors any
> more. I'd love to be proved wrong though, if anyone has any personal contacts
> on the project they could prod about it?
>
> Nick

--------------------------
Ken Krugler
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr