Re: Pspp-users Digest, Vol 128, Issue 3

Alan Mead Thu, 19 Jan 2017 12:41:22 -0800

That's a good question.  When I learned the syntax, it was the only way
to do it.  There are some good resources available online to address
specific questions.  I usually google something like "spss syntax compute"
and that produces a lot of hits for most questions.

I should say that these days, I do use the GUI to generate the syntax
for virtually all analyses. What I do is navigate to the dialog for
the analysis that I want, add some random variables (for most analyses,
you have to find numeric variables), select the options I want and then
paste the syntax.  I then edit it to include the variables I want.  If
the analysis is just a variable or two and they are easily found, then
it's just as easy to simply pick that one.  But if there are many
variables and especially if they are arranged in the dataset
contiguously, then it's far easier to edit the syntax to change the
randomly-selected variables to something like "X to Y" (which includes
all columns between variable X and variable Y in the dataset, including
X and Y).
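
(For illustration, a minimal sketch of that edit. The variable names age,
income, q1 and q20 are hypothetical, q1 through q20 are assumed to be
contiguous in the dataset, and the exact syntax a dialog pastes may
differ a little from this.)

* As pasted from the dialog with two placeholder variables.
DESCRIPTIVES /VARIABLES=age income
  /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM.

* Edited by hand to cover every column from q1 through q20.
DESCRIPTIVES /VARIABLES=q1 TO q20
  /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM.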

I personally also use the GUI exclusively to generate syntax for reading
in raw data (reading SAV files with GET is pretty trivial). I try hard
to analyze tab-delimited files with the variable names on top and I've
found that those are usually read very well into PSPP/SPSS.  I paste the
syntax generated by the GUI so I can edit it if needed. For example,
sometimes it will guess wrong about a variable type.  Or I might want to
manually change a variable name so it matches another file.
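
(A sketch of what such pasted-and-edited syntax might look like for a
tab-delimited file with variable names on the first line. The file name
and the variables id, region and zipcode are hypothetical. Here zipcode
is assumed to have been guessed as numeric and changed by hand to a
string format.)

* Hypothetical file and variable names, edited after pasting.
GET DATA /TYPE=TXT
  /FILE='survey.txt'
  /ARRANGEMENT=DELIMITED
  /DELIMITERS='\t'
  /FIRSTCASE=2
  /VARIABLES=
    id F8.0
    region F2.0
    zipcode A5.

(FIRSTCASE=2 skips the header line, so the names come from the VARIABLES
list and can be renamed there to match another file.)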

So, I write syntax mainly for data manipulation (finding bad data,
scoring, creating new variables from input data, etc.) and that includes
a fairly small number of statements:

execute.
count
compute
recode
value label
variable label
temporary
select if
do if ... else ... end if.
sort cases
get
save
write
format
merge files
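
(A short sketch of how several of these fit together, with hypothetical
file and variable names. It assumes that "value label", "variable
label" and "format" above refer to the VALUE LABELS, VARIABLE LABELS
and FORMATS commands, and that "merge files" refers to MATCH FILES or
ADD FILES, which are not shown here.)

* Hypothetical file and variable names throughout.
GET FILE='survey.sav'.
VARIABLE LABELS sat1 'Overall satisfaction'.
VALUE LABELS sat1 1 'Very dissatisfied' 5 'Very satisfied'.
* Count how many of the five items each respondent left missing.
COUNT n_blank = sat1 TO sat5 (MISSING).
DO IF (n_blank = 0).
  COMPUTE sat_mean = MEAN(sat1 TO sat5).
ELSE.
  COMPUTE sat_mean = $SYSMIS.
END IF.
FORMATS sat_mean (F4.2).
SORT CASES BY region.
SAVE OUTFILE='survey_scored.sav'.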

Maybe I'm digressing. And maybe these comments are mainly for people who
will be doing long, complex, or very important analyses.  But I highly
recommend using syntax (whether generated by GUI or by hand) for all
analyses.  For one thing, it's self-documenting (if you save it)... You
can go back later and see exactly what you did (e.g., what variables
were included in "INDEX"? How was that Likert scale scored? What regions
went into that market segment?) and if you find a problem, you already
have all the syntax you need to re-run the analysis.  In fact, PSPP
produces readable output, but with SPSS, if you no longer have a copy of
SPSS, you won't be able to read the output file or the SAV file.  So, the
syntax is the only file that will be readable (they're just text files;
you can open them with your favorite text editor or Notepad/Wordpad on
Windows). If you did an SPSS analysis and saved just the output and SAV
data, you might not be able to read either file years (or months) from
now when you no longer have SPSS.

I also think you should avoid ever modifying existing variables, so that
you can re-run your syntax to reproduce an analysis. (You could also
never over-write a SAV file, so that the modified variables become part
of a new SAV file, but this is fraught with peril and tends to lead to a
series of undocumented but indistinguishable datasets, DATA1.SAV,
DATA2.SAV, etc.... Far better to document your analysis in syntax and
avoid modifying existing variables by creating new ones.)
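
(A minimal sketch of the difference, using a hypothetical 5-point item
named q3. RECODE without INTO would overwrite q3 in place, so running
the same statement twice would flip the responses back again. With INTO
the original is left alone and the syntax can be re-run.)

* Reverse-score a hypothetical 5-point item into a NEW variable.
RECODE q3 (1=5) (2=4) (3=3) (4=2) (5=1) INTO q3_rev.
VARIABLE LABELS q3_rev 'q3, reverse-scored'.
EXECUTE.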

Sometimes, you can also re-use old syntax (if you analyze similar
datasets frequently).

Also, I recommend that when feasible (and sometimes it simply isn't),
you should avoid using SAV files. Or only use them as temporary files,
not as permanent storage of data.  Instead, your analysis should begin
by reading a "raw" data source and then do the whole analysis. The
reason is that you cannot tell what data transformations have been
applied to a SAV dataset, whereas if you read the data from a raw source,
you always know that the raw source data is in its known original
state.  This might not be an issue if your analyses do not require data
transformations; but I find that most of my analyses do require a lot. 
In those cases, this isn't a trivial issue.

Once I had an NSF grant which entailed creating an ethics measure and we
used SPSS to score it.  It would have been the work of centuries to
re-create the scoring (and verifying it) through a GUI for each
dataset.  Instead, I copied a fairly complex chunk of syntax and adapted
it to the names of the variables in the current dataset.  I had a syntax
error in one statement because the number of items had changed.  Because
it didn't execute, my data were half scored (half unscored) and I
compounded the problem by not noticing the error and using the score in
a later analysis.  If I'd written the syntax to create new scored
variables, it wouldn't be possible for my scored variables to be "half
scored" ... some of them wouldn't exist.  And in that case, the missing
variables would have stopped the analysis, instead of allowing erroneous
results (from half-scored data) to be produced.
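
(A sketch of scoring into new variables. This is not the actual scoring
syntax from that project; the item names and the answer key are
hypothetical.)

* Hypothetical answer key: item1 keyed 2, item2 keyed 4, item3 keyed 1.
* Each scored item is a NEW variable (1 = keyed response, 0 = otherwise).
COMPUTE s1 = (item1 = 2).
COMPUTE s2 = (item2 = 4).
COMPUTE s3 = (item3 = 1).
COMPUTE total = SUM(s1, s2, s3).
EXECUTE.
DESCRIPTIVES /VARIABLES=total.

(If the statement creating s2 failed, s2 would not exist, so one would
expect the COMPUTE of total and the later DESCRIPTIVES to stop with an
error instead of analyzing half-scored data. Listing s1, s2, s3
explicitly is deliberate: "s1 TO s3" would quietly span whatever
variables happen to lie between s1 and s3.)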

This is definitely a problem in complex analyses or when manipulating
data, but I'd argue it's a potential source of error in any analysis
that involves any degree of data transformation.  IIRC, PSPP distributes
a small example dataset with some kind of Likert data (customer
satisfaction ratings?) and in some version of that example dataset, one
of the items had been reversed (i.e., the Likert responses had been
swapped to 1->5, 2->4, 4->2, 5->1) and saved. You cannot tell this from
the SAV file (at all).  In fact, I'm inferring it from a data analysis,
but it's the only possible way that one Likert item could be so
different.  Garbage in, garbage out and you often cannot verify that a
SAV file is not "garbage" unless you've just created it.

You should also be generous in adding comments to your syntax.  A
comment is a note to the reader/analyst about the syntax and looks like
this:
* data cleaning code .
* removed item 12 on 17-jan-2017 because it had a poor ITC .
* this is the composite that worked best out of the four we tried. its
R2 was 0.56.
* scoring for the customer satisfaction Likert responses.

I will admit that syntax requires adhering to the rules of PSPP/SPSS
syntax.  You leave off a period at the end or a quote (or use the wrong
quote) and PSPP/SPSS gives you a cryptic error message.  I think this is
one of the reasons novice PSPP/SPSS users avoid syntax, but I think they
are handicapping themselves as a result.

One final thing:  one of the main advantages of PSPP is that it's free
(i.e., user-editable) software, which includes the manual.  So if you
have modifications to make the manual clearer or to add examples, I'm
sure the developers will be delighted to see your changes/additions.

-Alan

On 1/19/2017 1:11 PM, Aj Hollenbach wrote:
> Thanks Alan. What is the best approach, in your opinion, to learning
> the syntax for these types of expressions? Again, I wholly relied upon
> the GUI in SPSS. I did take a look at the PSPP manual, but did not
> immediately see examples of the structure of the syntax.
>
> Thanks,
> Allen
>
> On Thu, Jan 19, 2017 at 12:00 PM, <pspp-users-requ...@gnu.org> wrote:
>
>     Today's Topics:
>
>        1. Re: Selecting cases using the "IF" Function (Alan Mead)
>
>
>     ---------- Forwarded message ----------
>     From: Alan Mead <ame...@alanmead.org>
>     To: pspp-users@gnu.org
>     Cc: 
>     Date: Wed, 18 Jan 2017 11:10:03 -0600
>     Subject: Re: Selecting cases using the "IF" Function
>     As Dr. Water says, syntax is a solution.  The steps would be to
>     (1) paste the desired analysis and then (2) edit the syntax to
>     insert the "IF" statement above it.
>
>     You also need to decide if you want to "permanently" delete the
>     non-selected cases or not.  If I have a long series of analyses, I
>     might select cases (say valid cases) and save them (or use a
>     filter).  But Hollenbach describes analyzing subsets of the
>     dataset and in that case I often find the temporary command to be
>     helpful.  The syntax for _each analysis_ would look like this:
>
>     temporary.
>     select if( region = 1 or (region=1 and id=3)).
>     freq ...
>
>     You would highlight all three statements and run them.  The
>     "temporary" command causes the selection to be in effect only for
>     the next analysis. You repeat the "temporary" and "select if" for
>     each analysis (or, again, use a filter).
>
>     BTW, I honestly think just typing the syntax of the "select if" is
>     easier than using the GUI.
>
>     -Alan
>
>
>     On 1/18/2017 9:53 AM, Aj Hollenbach wrote:
>>     Hi PSPP Users,
>>
>>     I am transitioning from SPSS to PSPP and am having some troubles
>>     with case selection. Specifically, under SPSS, I used to be able
>>     to select cases using a radio button in the Data / Select Cases
>>     dialogue box that stated "Select if condition is satisfied...".
>>     However, under PSPP, I have found that this option is not
>>     available, and that you can only select cases based upon (1) a
>>     random sample, (2) case range, or (3) a filter variable. In other
>>     words, there is no option for using the IF function for selection
>>     purposes. I am attaching screenshots from both programs.
>>
>>     I greatly appreciate any advice that others might have on how to
>>     best make a selection of cases using a conditional IF statement.
>>     In short, I am running analysis of household survey data, but
>>     only want to use data from a handful of the administrative
>>     jurisdictions (provinces) within the larger data set.
>>
>>     Regards,
>>     Allen
>>
>>     PS: I am running GNU pspp 0.10.1-g1082b8
>
>     -- 
>
>     Alan D. Mead, Ph.D.
>     President, Talent Algorithms Inc.
>
>     science + technology = better workers
>
>     http://www.alanmead.org
>
>
>     _______________________________________________
>     Pspp-users mailing list
>     Pspp-users@gnu.org
>     https://lists.gnu.org/mailman/listinfo/pspp-users
>

-- 

Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

http://www.alanmead.org

I've... seen things you people wouldn't believe...
functions on fire in a copy of Orion.
I watched C-Sharp glitter in the dark near a programmable gate.
All those moments will be lost in time, like Ruby... on... Rails... Time for Pi.

          --"The Register" user Alister, applying the famous 
            "Blade Runner" speech to software development
