Re: String variables combining files

Frans Houweling Thu, 26 Mar 2015 04:50:14 -0700

I don't think the widest variable should be saved; the same old rules should be followed (e.g. in MATCH, if the var is already present, leave it alone). The point is, in 99% of cases the incompatible-width vars are completely irrelevant for the MATCH at hand, which will fail only because a feeding file happens to have the same var somewhere but with a different width.

To ftr: people create .sav files from Excel data, or by using import wizards. And even trained staff may need to cope with a name that is longer than expected.
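A minimal Python sketch of the rule Frans describes, assuming a hypothetical model in which each file's dictionary maps variable name to width (0 for numeric, a positive byte count for a string). The first definition of a variable wins, and a later definition with a different width is recorded rather than treated as fatal. The function name `merge_first_wins` is made up for illustration; this is not PSPP's implementation, and it models only the dictionary, not how MATCH chooses values.

```python
# Hypothetical model of a file dictionary: variable name -> width, where
# width 0 means numeric and a positive width means a string of that many
# bytes. This is an illustration, not PSPP's internal representation.

def merge_first_wins(dictionaries):
    """Keep the first definition of each variable; log later mismatches."""
    merged = {}
    mismatches = []
    for file_index, dictionary in enumerate(dictionaries):
        for name, width in dictionary.items():
            if name not in merged:
                merged[name] = width
            elif merged[name] != width:
                # Leave the existing variable alone and note the mismatch
                # instead of failing the whole match.
                mismatches.append((name, file_index, merged[name], width))
    return merged, mismatches


files = [
    {"id": 0, "email": 40},
    {"id": 0, "email": 64, "comment": 200},
]
merged, mismatches = merge_first_wins(files)
print(merged)      # {'id': 0, 'email': 40, 'comment': 200}
print(mismatches)  # [('email', 1, 40, 64)]
```

Under this rule, a width mismatch on a variable that the match does not depend on becomes a note instead of an error.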

Cheers
frans

On 26/03/2015 12:10, Alan Mead wrote:

On 3/26/2015 3:59 AM, ftr wrote:
So this means that the programs that produce the CSV files produce output with different string variable widths?
Is this due to the programs or to the people who use the programs?
In general, when you import text files you fix the variable width in the DATA LIST.
Or you use GET DATA/TYPE=
http://www.gnu.org/software/pspp/manual/pspp.html#GET-DATA
And why don't you set FORMAT on each of the separate files before you integrate them?
When I worked on a project that sounds similar to yours, we gave the local data producers serious training before fieldwork. That training made the local projects aware of what was at stake (motivation) and made the local heads check the consistency of the data to be sent. We could not do that ourselves: we had no direct access to the local projects, the local heads knew them better, and data control would have cost us too much. It also ensured that data were sent in a coherent format and on time.
Maybe you have to train your local people?
Just some ideas for local problem solving. I am happy that we have volunteers doing the programming work, so we should not overburden them with work that we can do on our side.
ftr,
It sounds like you don't run into this problem, so maybe this discussion isn't relevant for you.
But to repeat the reasons why this change is a good idea: (1) it would still be EASIER to have PSPP deal with this problem automatically, rather than forcing me to deal with it; and (2) it would be a simple way to create another point distinguishing PSPP as superior to SPSS.
I have given some thought to why SPSS has this limitation. One possibility is that it's simply an old limitation due to some original hardware or software issue. I speculate below that at the time of SPSS's inception, string data was neither particularly common nor important, and that strings of varying length would be rare. It could also be due to performance issues, but if so I'm sure it would be faster for PSPP to resolve this issue than for me to do so manually; I assume that fixing this issue wouldn't generally slow down merging/joining files?
I cannot imagine a situation where having this restriction on matching string length would be a feature. But if PSPP solves the problem by truncating longer strings, then some data would be lost, and sometimes that will be unacceptable, so it would be good to issue a warning or force people to turn this feature on explicitly. If the solution can be to change the final string length to the longest string length encountered (and, I assume, therefore truncate no data), then I cannot see a problem arising from this feature.
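Alan's preferred rule, widening each string variable to the longest length encountered so that no data is truncated, can be sketched in Python under a hypothetical model where each file's dictionary maps variable name to width (0 for numeric, a positive byte count for a string). The names `merge_widest` and `values_lost_by_truncation` are made up for illustration and are not PSPP's API; the second helper shows the check a truncation warning could be based on if a shorter width were kept instead.

```python
# Hypothetical model: variable name -> width, where 0 means numeric and a
# positive width means a string of that many bytes. Not PSPP's actual
# implementation, just an illustration of the "widest wins" rule.

def merge_widest(dictionaries):
    """Widen each string variable to the longest width found in any file."""
    merged = {}
    for dictionary in dictionaries:
        for name, width in dictionary.items():
            if name in merged and (merged[name] == 0) != (width == 0):
                # Numeric vs. string is a genuine conflict; widening can't fix it.
                raise ValueError(f"{name!r} is numeric in one file and string in another")
            merged[name] = max(merged.get(name, width), width)
    return merged


def values_lost_by_truncation(values, width):
    """Return the values that would not fit in `width` bytes (UTF-8)."""
    return [v for v in values if len(v.encode("utf-8")) > width]


files = [{"id": 0, "email": 40}, {"id": 0, "email": 64}]
print(merge_widest(files))  # {'id': 0, 'email': 64}

# If the merge kept width 40 instead, this 43-byte value would be cut short:
long_email = "someone.with.a.rather.long.name@example.org"
print(values_lost_by_truncation(["a@example.org", long_email], 40))
# ['someone.with.a.rather.long.name@example.org']
```

Widening only ever increases a width, so no string value loses data; the numeric/string case is left as an error because no width resolves it.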
I also speculate that this problem is far more of an issue today than when SPSS was first created, because string data is easier (sometimes more natural) to collect today. SPSS would originally (i.e., circa 1970) have been fed punch cards, and most string data would have been generated either by the researcher (like a coding) or by something like a Scantron or a Scantron-like response grid. I'm sure someone had participants respond by writing something in, but it would have been keyed into the computer at a fixed width. Using a physical storage medium (cards) would have discouraged strings unless they were necessary and encouraged researchers to use the shortest possible length. Compare that to now: my web-based surveys often have variable-length strings like email, user agent and other string-based metadata, and often the survey includes fill-in-the-blank or short-answer questions. Often I get datasets where responses are strings rather than numeric codes (e.g., "male" and "female"). Even if they are the same data (e.g., email), it would be natural for these variables to have different lengths across different surveys. I don't foresee these conditions changing.
-Alan


--

Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

+815.588.3846 (Office)
+267.334.4143 (Mobile)

http://www.alanmead.org

Announcing the Journal of Computerized Adaptive Testing (JCAT), a
peer-reviewed electronic journal designed to advance the science and
practice of computerized adaptive testing: http://www.iacat.org/jcat


_______________________________________________
Pspp-users mailing list
Pspp-users@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-users