Stefan Bodewig wrote:
> On 2009-03-02, Wolfgang Glas <wolfgang.g...@ev-i.at> wrote:
>
>> Stefan Bodewig wrote:
>>> On 2009-03-01, Wolfgang Glas <wolfgang.g...@ev-i.at> wrote:
>
>>>> 1) Unicode extra fields are written for all ZIP entries and not only
>>>> for entries, which are not encodable by the encoding set to
>>>> ZipArchiveOutputStream.
>
>>> Maybe room for yet another flag?  Or an enum-like option
>
>>> setCreateUnicodeExtraFields(NEVER | ALWAYS | NOT_ENCODABLE)
>
> Consider the WinZIP case, WinZIP wouldn't recognize the EFS.  If you
> set the encoding to UTF-8 and use your code and only add extra fields
> for non-encodable paths, WinZIP will never see the correct path.

According to my tests, WinZip recognizes the EFS flag upon reading. Upon
writing, WinZip uses extra fields and encodes filenames as Cp437, which is
really the most useful variant these days.

Secondly, if you set the encoding to UTF-8, there's no need for Unicode
extra fields anyway. But as mentioned above, the most portable,
tool-readable variant, as requested by the reporter of the original
SANDBOX-176 issue, is writing Cp437 and adding Unicode extra fields. EFS
support in the wild is not really widespread, probably due to a mid-air
collision between the writing of the specification and the implementation
of widespread ZIP tools...

>> I like the idea of a unicode policy flag ;-)
>
> May be a better approach, agreed.  But only if we manage to cover all
> border cases.
>
>> My suggestion is
>
>> setUnicodePolicy(
>>   SURROGATES |   /* no extra fields, no utf-8 fallback, only %Uxxxx
>>                     surrogates */
>>   EXTRA_FIELDS | /* extra fields for unencodable entries, no utf-8
>>                     fallback */
>>   EXTRA_FIELDS_ALWAYS | /* extra fields for all entries, no utf-8
>>                            fallback */
>>   UTF8_FALLBACK | /* fall back to utf-8 plus EFS flag for unencodable
>>                      entries. */
>>   UTF8_FALLBACK_EXTRA_FIELDS | /* fall back to utf-8 plus EFS flag plus
>>                                   extra fields for unencodable */
>>   UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS /* fall back to utf-8 plus EFS flag
>>                                        for unencodable entries, extra
>>                                        fields for all entries. */
>> )
>
>> We might drop the last two options and we might choose a better
>> wording, however the direction should IMHO be as above mentioned...
>
> This covers all permutations, agreed.
>
> Names, names, I'm really bad at them.
>
> EXTRA_FIELDS => ADD_EXTRA_FIELDS_FOR_UNENCODABLE
> EXTRA_FIELDS_ALWAYS => ADD_EXTRA_FIELDS
> UTF8_FALLBACK => FALL_BACK_TO_UTF8
> UTF8_FALLBACK_EXTRA_FIELDS => FALL_BACK_TO_UTF8_PLUS_EXTRA_FIELD
> UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS => FALL_BACK_TO_UTF8_ADD_EXTRA_FIELDS
>
> but looking at the names we may be better off with two independent
> options.
> Hmm, yes, right now I prefer two flags because they seem to
> be orthogonal.

I think you should choose whichever approach better fits your needs in
Ant ;-) At least you have to write an XML parser for these settings and the
documentation, so you might choose the approach which can be explained in
a few words.

I can live very well with two options ;-)

  Wolfgang
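
For illustration only, the two-option variant discussed above could be sketched in Java roughly as follows. All names here (`UnicodePolicySketch`, `ExtraFieldPolicy`, `decide`, `fallbackToUtf8`) are hypothetical and are not the ZipArchiveOutputStream API. The sketch also assumes that the EFS flag is set exactly when the name ends up being written as UTF-8. It only decides what would be written for a single entry name; it does not produce a ZIP archive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the "two independent options" idea; not a real API.
final class UnicodePolicySketch {

    /** Hypothetical option 1: when to add a Unicode extra field. */
    enum ExtraFieldPolicy { NEVER, ALWAYS, NOT_ENCODABLE }

    /** What would be written for a single entry name. */
    static final class Decision {
        final Charset nameCharset; // charset used for the name in the header
        final boolean efsFlag;     // assumption: set exactly when the name is UTF-8
        final boolean extraField;  // whether a Unicode extra field is added

        Decision(Charset nameCharset, boolean efsFlag, boolean extraField) {
            this.nameCharset = nameCharset;
            this.efsFlag = efsFlag;
            this.extraField = extraField;
        }
    }

    /**
     * Combines the extra-field policy with a boolean UTF-8 fallback
     * (hypothetical option 2). The two options do not influence each other.
     */
    static Decision decide(String name, Charset encoding,
                           boolean fallbackToUtf8, ExtraFieldPolicy policy) {
        // Can the configured encoding represent every character of the name?
        boolean encodable = encoding.newEncoder().canEncode(name);

        // Fall back to UTF-8 only for unencodable names, and only if allowed.
        Charset nameCharset =
                (!encodable && fallbackToUtf8) ? StandardCharsets.UTF_8 : encoding;
        boolean efsFlag = StandardCharsets.UTF_8.equals(nameCharset);

        boolean extraField = policy == ExtraFieldPolicy.ALWAYS
                || (policy == ExtraFieldPolicy.NOT_ENCODABLE && !encodable);

        return new Decision(nameCharset, efsFlag, extraField);
    }

    public static void main(String[] args) {
        // US-ASCII stands in for Cp437 here because every JVM must provide it.
        Decision d = decide("caf\u00e9.txt", StandardCharsets.US_ASCII,
                false, ExtraFieldPolicy.NOT_ENCODABLE);
        System.out.println(d.nameCharset + " efs=" + d.efsFlag
                + " extraField=" + d.extraField);
    }
}
```

Under these assumptions, each value of the single-enum proposal maps onto one pair of settings: `fallbackToUtf8 = false` with `NOT_ENCODABLE` corresponds to EXTRA_FIELDS, `true` with `NEVER` to UTF8_FALLBACK, and `true` with `ALWAYS` to UTF8_FALLBACK_EXTRA_FIELDS_ALWAYS.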