I don't think the widest variable should be kept; the same old rules
should be followed (e.g. in MATCH, if the var is already present, leave it
alone). The point is that in 99% of cases the incompatible-width vars are
completely irrelevant to the MATCH at hand, which fails only
because a feeding file happens to have the same var somewhere but with
a different width.
To ftr: people create .sav files from Excel data, or using import
wizards. And even trained staff may need to cope with a name that is
longer than expected.
Cheers
frans
On 26/03/2015 12:10, Alan Mead wrote:
On 3/26/2015 3:59 AM, ftr wrote:
So this means that the programs that produce the CSV files produce
output with different string variable widths?
Is this due to the programs or to the people who use the programs?
In general, when you import text files you fix the variable width in
the DATA LIST.
Or you use GET DATA/TYPE=
http://www.gnu.org/software/pspp/manual/pspp.html#GET-DATA
And why don't you set FORMAT on each of the separate files before you
integrate them?
When I worked on a project that sounds similar to yours, we gave the
local data producers serious training before the field work. It made
the local projects aware of what was at stake (motivation), and it got
the local heads to control the consistency of the data to be sent. We
could not do that data control ourselves: we had no direct access to
the local projects, the local heads knew them better, and it would have
cost us too much. The training ensured that data were sent in a
coherent format and on time.
Maybe you have to train your local people?
Just some ideas for local problem solving. I am happy that we have
volunteers doing the programming work, so we should not overburden
them with work that we can do on our side.
ftr,
It sounds like you don't run into this problem, so maybe this
discussion isn't relevant for you.
But to repeat the reasons why this change is a good idea: (1) it would
still be EASIER to have PSPP deal with this problem automatically,
rather than forcing me to deal with this issue; and (2) it would
be a simple way to create another point distinguishing PSPP as
superior to SPSS.
I have given some thought to why SPSS has this limitation. One
possibility is that it's simply an old limitation due to some original
hardware or software issues. I speculate below that at the time of
SPSS's inception, string data was neither particularly common nor
important, and that variable lengths would have been rare. Also, it could be
due to performance issues, but if so I'm sure it would be faster for
PSPP to resolve this issue than for me to do so manually; I assume
that fixing this issue wouldn't generally slow down merge/join files?
I cannot imagine a situation where having this restriction on matching
string length would be a feature. But if PSPP solves the problem by
truncating longer strings, then some data would be lost and sometimes
that will be unacceptable so it would be good to issue a warning or
force people to turn on this feature. If the solution can be to
change the final string length to the longest encountered string
length (and, I assume, therefore truncate no data) then I cannot see a
problem arising from this feature.
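
To make the two options concrete, here is a minimal sketch in Python
(not PSPP syntax, and not how PSPP is implemented). It assumes each
input file can be reduced to a mapping of string-variable name to
declared width; the names reconcile_widths, fit, survey_a and survey_b
are hypothetical and exist only for this illustration. The "widest"
policy takes the longest width seen, so no data is truncated; the
"first" policy keeps the width from the first file and warns when a
later file is wider.

```python
import warnings


def reconcile_widths(files, policy="widest"):
    """Pick one width per string variable across several input files.

    `files` is a list of dicts mapping variable name -> declared width
    (a hypothetical stand-in for each file's dictionary).
    policy="widest": use the longest width seen, so nothing is truncated.
    policy="first":  keep the width from the first file that has the
                     variable and warn when a later file is wider.
    """
    if policy not in ("widest", "first"):
        raise ValueError(f"unknown policy: {policy!r}")
    merged = {}
    for widths in files:
        for name, width in widths.items():
            if name not in merged:
                merged[name] = width
            elif policy == "widest":
                merged[name] = max(merged[name], width)
            elif width > merged[name]:
                warnings.warn(
                    f"{name}: width {width} exceeds {merged[name]}; "
                    "longer values would be truncated"
                )
    return merged


def fit(value, width):
    """Pad with spaces or truncate so the value is exactly `width` long."""
    return value[:width].ljust(width)


# Two surveys that store the same variables with different widths.
survey_a = {"email": 20, "sex": 4}
survey_b = {"email": 35, "sex": 6}
print(reconcile_widths([survey_a, survey_b]))
print(repr(fit("male", 6)))
```

Under these assumptions the extra cost of the "widest" policy is one
pass over the file dictionaries before the merge, plus padding shorter
values as they are read.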
I also speculate that this problem is far more of an issue today than
when SPSS was first created, because string data is easier (sometimes
more natural) to collect today. SPSS would have originally (i.e.,
circa 1970) been fed punch cards and most string data would have been
generated either by the researcher (like a coding) or by something
like a Scantron or a Scantron-like response grid. I'm sure someone had
participants respond by writing something in but it would have been
keyed into the computer into a fixed width. Using a physical storage
medium (cards) would have discouraged strings unless they were
necessary and encouraged researchers to use the shortest possible
length. Compare that to now: my web-based surveys often have
variable-length strings like email, user agent and other string-based
metadata, and often the survey includes fill-in-the-blank or short answer
questions. Often I get datasets where responses are strings, rather
than numeric codes (e.g., "male" and "female"). Even if they are the
same data (e.g., email), it would be natural for these variables to
have different lengths across different surveys. I don't foresee these
conditions changing.
-Alan
--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.
science + technology = better workers
+815.588.3846 (Office)
+267.334.4143 (Mobile)
http://www.alanmead.org
Announcing the Journal of Computerized Adaptive Testing (JCAT), a
peer-reviewed electronic journal designed to advance the science and
practice of computerized adaptive testing: http://www.iacat.org/jcat
_______________________________________________
Pspp-users mailing list
Pspp-users@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-users