Robert Wohlfarth writes:

> I am looking to release a collection of modules for converting data.
> The modules read data from a source, convert the data, then add it
> into an SQL database.
> 
> The modules are named like this...
> * Data::ETL
> * Data::ETL::Extract
> * Data::ETL::Extract::Excel
> * Data::ETL::Extract::DelimitedText
> * Data::ETL::Extract::XML
> * Data::ETL::Load
> * Data::ETL::MSAccess
> 
> In my mind, ETL means "Extract-Transform-Load".

That wouldn't've occurred to me, but the Wikipedia page for ‘Extract,
transform, load’ is the top link when searching DuckDuckGo for “ETL”, so
it seems reasonable to use it in a module name if your target audience
is people already working in the field and familiar with its jargon.

> Is "Data" an appropriate place?

Yes ... and no. Data:: is appropriate for pretty much every module on
CPAN, in that an awful lot of code does stuff with data. That makes it a
suboptimal namespace, because it doesn't define what's specific about
this particular module.

In particular, it didn't suggest databases to me, or even data
warehousing (which the ETL Wikipedia page suggests is the main use of
ETL). It'd be good for the name to indicate that field in some way.

> Thoughts on the naming convention "Data::ETL"?

The combination of a very broad namespace and an acronym makes it hard
to guess at the area of the module; for instance, that would be an
equally good name for a module that processes data searching for
extra-terrestrial life ...

If the database-loading part uses DBI connections then the DBIx::
namespace would be good for indicating that.

Unfortunately for you, DataWarehouse::ETL is already used by another
module. Ideally you'd mention that module in your docs, explaining to
new users the difference between them. If your name can help to indicate
the distinctive feature of yours, so much the better — but often that
isn't possible if they are simply different approaches to the same
problem.

One possibility for a suite of connected modules that only really work
together is to concoct a ‘fanciful’ brand name for the framework, like
Moose or Catalyst, and put all your modules under either $Brand:: or
something like DataWarehouse::$Brand::.

A framework name works well if, say, your $whatever::Extract::Excel
module is only intended to be used with other modules in your framework
and doesn't really make sense as a standalone module for somebody just
wanting to extract data from an Excel spreadsheet (and get back a Perl
data structure they can do what they want with). The brand name
indicates that it's part of the framework and to be used with that.

Hope that helps.

Smylers
-- 
http://twitter.com/Smylers2
