On second thought, Text::Filter::NoPunctuation is probably a better
name than Text::Filter::Unpunctuate.
However, a more general solution might be a pair of filters:
Text::Filter::Transliterate (applying a "tr"-style mapping, with the
"from" and "to" character lists passed to the filter) and
Text::Filter::Delete (deleting the characters specified by a string
or regex).
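A rough sketch of the two general filters described above, written as
plain closures rather than real Text::Filter subclasses. The names
make_transliterate and make_delete are hypothetical and are not part of
any existing CPAN module; the mapping is done with a hash lookup instead
of "tr" itself, because "tr" does not interpolate variables at run time.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not an existing CPAN API.

# Return a filter that maps each character of $from to the character at
# the same position in $to (a run-time stand-in for tr/FROM/TO/).
sub make_transliterate {
    my ($from, $to) = @_;
    die "from and to must have the same length\n"
        unless length($from) == length($to);
    my @from = split //, $from;
    my @to   = split //, $to;
    my %map;
    @map{@from} = @to;
    return sub {
        my ($text) = @_;
        # Replace mapped characters, leave all others untouched.
        $text =~ s/(.)/exists $map{$1} ? $map{$1} : $1/gse;
        return $text;
    };
}

# Return a filter that deletes whatever $spec describes: a compiled
# regex (qr//) is used as-is, a plain string is read as a character set.
sub make_delete {
    my ($spec) = @_;
    die "nothing to delete\n" unless defined $spec && length $spec;
    my $re = ref($spec) eq 'Regexp' ? $spec : qr/[\Q$spec\E]/;
    return sub {
        my ($text) = @_;
        $text =~ s/$re//g;
        return $text;
    };
}

my $no_punct = make_delete(',.!?');
print $no_punct->("Hello, World!"), "\n";    # prints "Hello World"
```

With this shape, the lowercase and punctuation-removal filters become
particular configurations of the two general ones rather than separate
classes.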
- Brian
On 2012-12-19 21:26, Brian Katzung wrote:
Ben,
How about creating Text::Filter::LowerCase and
Text::Filter::Unpunctuate as derived classes of Text::Filter?
- Brian
On 2012-12-19 13:56, Ben Deutsch wrote:
Hello,
I'm writing a small module to apply "lossy" filters to text, to
enable better subsequent lossless compression. For example, "Hello,
World!" would become "hello, world!" with the "lowercase" filter, or
"Hello World" with the punctuation removal filter. The module does not
perform the compression itself; it only reduces the entropy of the
text in question.
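To make the two example filters concrete, here is a minimal sketch. The
function names are made up for illustration and are not the module's
actual interface, and char_entropy is only a crude order-0 estimate
(bits per character from single-character frequencies), not a measure of
what a real compressor would achieve.

```perl
use strict;
use warnings;

# Illustrative names only; not the module's actual interface.

# "lowercase" filter: fold all letters to lower case.
sub lowercase {
    my ($text) = @_;
    return lc $text;
}

# Punctuation removal filter: drop every POSIX punctuation character.
sub remove_punctuation {
    my ($text) = @_;
    $text =~ s/[[:punct:]]//g;
    return $text;
}

# Order-0 Shannon entropy in bits per character, computed from the
# frequency of each single character in the text.
sub char_entropy {
    my ($text) = @_;
    my $n = length $text;
    return 0 unless $n;
    my %count;
    $count{$_}++ for split //, $text;
    my $h = 0;
    for my $c (values %count) {
        my $p = $c / $n;
        $h -= $p * log($p) / log(2);
    }
    return $h;
}

print lowercase("Hello, World!"), "\n";             # prints "hello, world!"
print remove_punctuation("Hello, World!"), "\n";    # prints "Hello World"
```

Lowercasing merges symbols, so it can never raise this order-0 estimate.
Removing punctuation shortens the text and shrinks the alphabet, but its
effect on the per-character figure depends on the input.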
As a working title, I'm using
Text::Lossy
as the module name. But "Text" is quite a large and well-known
top-level namespace, so I'm asking if this is a good fit, and if not,
what I might call the module instead.
One thing I do *not* want to do is place it in the "Acme" namespace –
the module may sound a bit silly, but it strives to do exactly what
it says on the tin: every filter reduces the entropy while still
retaining most of the meaning. For example, reducing the entire text
to the empty string (while great for compression) is right out.
Thanks for your time,
Ben Deutsch
--
Brian Katzung, Kappa Computer Solutions, LLC
Offering web, client/server, open source, and traditional
software development and mixed operating system support
for business, education, and science
Phone: 847.412.0713 http://www.kappacs.com