I agree with John. You're right that unix cat is a good solution for
concatenating delimited files in identical formats (and I think you can
find windows-based cat programs) but that's an unusual use-case. The
problem that John is pointing out is that PSPP and SPSS are assigning
meta-data to the columns and that process is imperfect (although, in my
experience, it works a lot better if the data are all generic numeric
scalars).
Programs like PSPP work well when you are joining files in more complex
ways. You're basically complaining that flyswatters work well for
killing flies but hammers are less effective and leave big holes in the
wall.
I haven't examined how PSPP handles this, but I had the impression that
SPSS looks at the first few records (maybe it's all the records, but it
seems unduly influenced by early records) to guess the meta-data, which
works reasonably well if there's a single file. But treating each file
independently has the potential to cause a lot of trouble when there are
several files. My worst-case scenario is using SPSS to join several
delimited files on an alphanumeric key (e.g., email or Qualtrics' id or
a hash). What I wish PSPP/SPSS would do is to detect that the keys have
different lengths and either just ignore it or silently increase the
length of the smaller key. If PSPP already does this, kudos!
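
Until the tools handle this, one workaround is to measure the key yourself
before importing. The following is only a minimal Python sketch, assuming
each delimited file has a header row and that the key column is called
"email"; the function name and sample rows are hypothetical. It reports the
widest key across all files, so that the same string width (for example an
A-format of that width) can be declared for the key in every file before
joining.

```python
import csv


def max_key_width(sources, key):
    """Return the length of the longest value in column `key`.

    Each source is any iterable of CSV text lines with a header row,
    for example an open file or a list of strings.
    """
    widest = 0
    for lines in sources:
        for row in csv.DictReader(lines):
            widest = max(widest, len(row[key]))
    return widest


# Hypothetical sample data: two exports keyed on an email address.
file_a = ["email,score", "al@example.org,3", "bo@example.org,5"]
file_b = ["email,score", "christina@example.org,4"]

width = max_key_width([file_a, file_b], "email")
print(f"declare the key as A{width} in every file")
```

With real files, pass open file objects (opened with newline="") instead of
the lists used here.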
-Alan
On 11/12/2013 2:04 AM, Ken Singh wrote:
Thank you.
In the case of concatenating the csv files I don't think format
specifiers are essential. As long as the files are of exactly the
same format all variables will align. Once combined the file could be
saved then loaded into the editor (which guesses the formats for each
column). That said, I understand PSPP is meant to be a clone of
SPSS, so likely there is no good solution available. It may still be
the case that it's more expedient to use cat or "copy /a" to join
files then import into the editor. I had attempted a variation of
your second suggestion but maybe conceded too early. I'll play with
both of your suggestions. Thanks again.
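
A caveat with plain cat or "copy /a": if every file starts with a header
row, the header is repeated in the middle of the combined file. The
following is a minimal Python sketch of a header-aware concatenation,
assuming all files share exactly the same columns; the function name, the
has_header flag, and the sample rows are hypothetical.

```python
import csv
import io


def concat_csv(sources, out, has_header=True):
    """Append every source to `out`, writing the header row only once.

    Raises ValueError if a later header differs from the first one,
    because the columns would then no longer line up.
    """
    writer = csv.writer(out, lineterminator="\n")
    first_header = None
    for lines in sources:
        reader = csv.reader(lines)
        if has_header:
            header = next(reader, None)
            if header is None:
                continue  # empty file, nothing to append
            if first_header is None:
                first_header = header
                writer.writerow(header)
            elif header != first_header:
                raise ValueError(f"header mismatch: {header} != {first_header}")
        writer.writerows(reader)


# Hypothetical sample data standing in for two of the raw#.txt files.
raw1 = ["x,y,z", "1,2,3"]
raw7 = ["x,y,z", "4,5,6", "7,8,9"]

combined = io.StringIO()
concat_csv([raw1, raw7], combined)
print(combined.getvalue())
```

For files without a header row, has_header=False reduces this to a plain
concatenation.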
K.
On Tue, Nov 12, 2013 at 2:28 AM, John Darrington
<j...@darrington.wattle.id.au>
wrote:
On Mon, Nov 11, 2013 at 10:35:40PM -0500, Ken Singh wrote:
Hello,
It is unclear to me how to quickly and efficiently combine a set of
comma separated files into one data file. The easy solution would be to
use the unix 'cat' command to concatenate the files then import using
the graphical interface. However, I'm interested in a purely PSPP based
solution.
I have tried GET DATA in combination with SAVE but it appears that the
dataset must be made active. I am not certain about this step.
GET DATA
/TYPE=TXT /ARRANGEMENT=DELIMITED
/FILE='e:\Dropbox\data\raw1.txt'
/DELIMITERS=','
SAVE
/OUTFILE = 'e:\dropbox\data\tmp.sav'
Here is one way you could solve that problem, assuming that both your
CSV files have the same arrangement:
dataset declare d_one.
dataset activate d_one.
GET DATA /TYPE=TXT /FILE='one.csv' /VARIABLES=x F8.2 y F8.2 z F8.2.
dataset declare d_two.
dataset activate d_two.
GET DATA /TYPE=TXT /FILE='two.csv' /VARIABLES=x F8.2 y F8.2 z F8.2.
dataset declare d_concat.
dataset activate d_concat.
ADD FILES /FILE=d_one /FILE=d_two.
LIST.
It also appears that the VARIABLES subcommand is required. Is there a
solution for when one has dozens of variables?
The problem with CSV is that there is no metadata. How should pspp (or
anyone else!) know if a column is to be interpreted as a string, a date,
or whatever? If you happen to know that all your files have the same
arrangement, then one solution you could try is to use psppire's import
function to "guess" the arrangement of each file (hopefully it should
guess each one identically) and save to a .sav - then you can use
ADD FILES to concatenate all the files at once.
The bigger problem is that I have many raw#.txt files, not necessarily
contiguously numbered. Any suggestions would be most appreciated.
Again, if the order in which the .txt files should be read cannot be
determined from the names, then you must tell it. At the end of the day,
pspp is a statistical analysis tool, not an artificial intelligence
engine (although the format guesser does attempt to go a small way in
that direction).
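
When the order can be read from the numbers in the file names, the
repetitive syntax can be generated rather than typed. The following is
only a minimal Python sketch: it sorts the names by the first number they
contain and prints syntax following the declare/activate/GET DATA/ADD FILES
pattern shown earlier in this thread, with the DELIMITERS subcommand taken
from the original attempt. The function names, the dataset names d_1, d_2
and so on, and the sample file names are hypothetical, and the generated
syntax has not been run through pspp here. It assumes bare file names (no
digits elsewhere in the path) that contain no single quotes.

```python
import re


def numeric_order(names):
    """Sort file names by the first number they contain (raw2 before raw10)."""
    def key(name):
        match = re.search(r"\d+", name)
        return (int(match.group()) if match else -1, name)
    return sorted(names, key=key)


def build_syntax(names, variables, delimiter=","):
    """Return syntax text that loads each file and stacks them all.

    `variables` is the text for the VARIABLES subcommand, for example
    "x F8.2 y F8.2 z F8.2"; it is assumed to be the same for every file.
    """
    lines = []
    datasets = []
    for i, name in enumerate(numeric_order(names), start=1):
        ds = f"d_{i}"
        datasets.append(ds)
        lines += [
            f"dataset declare {ds}.",
            f"dataset activate {ds}.",
            f"GET DATA /TYPE=TXT /FILE='{name}' /DELIMITERS='{delimiter}'"
            f" /VARIABLES={variables}.",
        ]
    lines += [
        "dataset declare d_concat.",
        "dataset activate d_concat.",
        "ADD FILES " + " ".join(f"/FILE={ds}" for ds in datasets) + ".",
    ]
    return "\n".join(lines)


# Hypothetical, non-contiguously numbered file names.
syntax = build_syntax(["raw10.txt", "raw2.txt", "raw7.txt"], "x F8.2 y F8.2 z F8.2")
print(syntax)
```

The VARIABLES text still has to be written once by hand, since the CSV
files themselves carry no metadata.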
Hope this is helpful.
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.
_______________________________________________
Pspp-users mailing list
Pspp-users@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-users
--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.
+815.588.3846 (Office)
+267.334.4143 (Mobile)
http://www.alanmead.org
Announcing the Journal of Computerized Adaptive Testing (JCAT), a
peer-reviewed electronic journal designed to advance the science and
practice of computerized adaptive testing: http://www.iacat.org/jcat