Preparing Your Data for Analysis
Organizing Your Data for Analysis
Suppose you have three test scores collected from a class of 10 students (5 males and 5 females) during a semester. Each student was assigned an identification number. For each student you have an identification number, the student's gender, and scores for test one, test two, and test three (the full data set is displayed toward the end of this section for you to view). Your first task is to present the data in a form acceptable to SPSS for processing.
SPSS uses data organized in rows and columns. Cases are represented in rows and variables are represented in columns.
variable ↓

| Name | Test1 | Test2 | Test3 | |
|---|---|---|---|---|
| Tim | 20 | 23 | 24 | ← case |
| Hans | 21 | 26 | 28 | |
A case contains information for one unit of analysis (e.g., a person, an animal, a machine). Variables are information collected for each case, such as name, score, age, income, educational level. In the above chart, there are two cases and four variables.
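The row-and-column idea can be sketched outside SPSS as well. The following Python snippet is only an illustration (Python is not part of SPSS, and the names `variables`, `cases`, `tim`, and `test1_column` are made up for this sketch); it stores the two cases from the chart and reads them once by row and once by column.

```python
# Each case is one row; each variable is one column (values from the chart above).
variables = ["Name", "Test1", "Test2", "Test3"]
cases = [
    ["Tim", 20, 23, 24],
    ["Hans", 21, 26, 28],
]

# Reading across a row gives every variable for one case.
tim = dict(zip(variables, cases[0]))

# Reading down a column gives one variable for every case.
test1_column = [row[variables.index("Test1")] for row in cases]

print(len(cases), "cases,", len(variables), "variables")
print(tim)
print(test1_column)
```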
In SPSS, variable names shorter than eight characters are recommended. A name must begin with a letter; the remaining characters can be any letter, any digit, a period, or one of the symbols @, #, _, or $. Variable names cannot end with a period, and names that end with an underscore should be avoided. Blanks and special characters such as &, !, ?, ', and * cannot be used in a variable name. Variable names are not case sensitive. Each variable name must be unique; duplication is not allowed.
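If you plan many variable names in advance, it can help to check them against the naming rules before typing them into SPSS. The sketch below is a hypothetical Python helper, not an SPSS feature; the function names `check_name` and `find_duplicates` are invented here, and the pattern assumes plain ASCII letters.

```python
import re

# Pattern built only from the rules stated above: a leading letter, then
# letters, digits, periods, or the symbols @ # _ $ (ASCII letters assumed).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")


def check_name(name):
    """Return a list of problems found in one proposed variable name."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must begin with a letter and use only letters, digits, . @ # _ $")
    if name.endswith("."):
        problems.append("cannot end with a period")
    if name.endswith("_"):
        problems.append("ending with an underscore should be avoided")
    return problems


def find_duplicates(names):
    """Names are not case sensitive, so compare them in lower case."""
    seen, duplicates = set(), []
    for name in names:
        key = name.lower()
        if key in seen:
            duplicates.append(name)
        seen.add(key)
    return duplicates


print(check_name("test1"))    # no problems expected
print(check_name("test&1"))   # contains a character that is not allowed
print(find_duplicates(["id", "sex", "ID"]))
```

The length recommendation is left out of the check on purpose, since it is a recommendation rather than a rule.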
Most variables are either numeric (e.g., 12, 93.23) or character/string/alphanumeric (e.g., F, f, john). A numeric value can have more than 40 characters, but only the first 16 digits are accurate. The maximum number of decimal positions depends on the number of digits before the decimal point, because a numeric variable holds a total of 16 valid digits. String variables with a defined width of eight or fewer characters are short strings; those with more than eight characters (up to 255) are long strings. Short string variables can be used in many SPSS procedures. You may leave a blank for any missing numeric value or enter a user-defined missing value (e.g., 9, 999). For string variables, however, a blank is considered a valid value. You may choose to enter a user-defined missing value (e.g., x, xxx, na) for short string variables, but long string variables cannot have user-missing values.
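The difference between how blanks are treated in numeric and string fields can be illustrated with a small Python sketch. This is only a model of the rules described above, not SPSS code; the function names `read_numeric` and `read_string` are hypothetical, and the missing-value codes are the examples used in the text.

```python
def read_numeric(field, user_missing=(9, 999)):
    """Return a float, or None when the field is blank or a user-defined missing code."""
    text = field.strip()
    if text == "":
        return None              # a blank numeric field counts as missing
    value = float(text)
    if value in user_missing:
        return None              # user-defined missing code such as 9 or 999
    return value


def read_string(field, user_missing=("x", "xxx", "na")):
    """A blank string is a valid value; only the listed codes count as missing."""
    if field in user_missing:
        return None
    return field


print(read_numeric("93.23"))
print(read_numeric("   "))
print(read_numeric("999"))
print(repr(read_string(" ")))
print(read_string("na"))
```

Note that choosing 9 as a missing code only makes sense when 9 cannot occur as a real value for that variable.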
Following the conventions above, let us assign names for the variables in our data set: id, sex, test1, test2, and test3. Once the variables are named according to SPSS conventions, it is good practice to prepare a code book with details of the data layout. Following is a code book for the data in discussion. Note that this step is to present your data in an organized fashion. It is not mandatory for data analysis. A code book becomes especially handy when dealing with a large number of variables. A short sample data set, like the following, may not need a code book, but it is included for illustration.
| var. name | width | columns | var. type | var. labels |
|---|---|---|---|---|
| id | 2 | 8 | Numeric | identification no. |
| sex | 1 | 8 | String | student gender (f, m) |
| test1 | 2 | 8 | Numeric | test one score |
| test2 | 2 | 8 | Numeric | test two score |
| test3 | 2 | 8 | Numeric | test three score |
In the above code book, width indicates the length of a variable measured in digits or characters. For example, the value for variable id takes a maximum of two fields since the highest identification number in our example is going to be 10. The value for variable sex takes a maximum of one field, and so on. Columns affect only the display of values in the Data Editor. Changing the column width does not change the defined width of a variable. Var. type specifies the data type (numeric, comma, dot, scientific notation, date, custom currency or string). In our example, sex is the only string variable coded as f for female, m for male.
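A code book can also be kept in machine-readable form, which makes it easy to confirm that raw values fit the widths you declared. The following Python sketch assumes the code book above; the names `CODE_BOOK` and `too_wide` are invented for illustration and are not part of SPSS.

```python
# (name, width, type, label), copied from the code book above.
CODE_BOOK = [
    ("id", 2, "numeric", "identification no."),
    ("sex", 1, "string", "student gender (f, m)"),
    ("test1", 2, "numeric", "test one score"),
    ("test2", 2, "numeric", "test two score"),
    ("test3", 2, "numeric", "test three score"),
]


def too_wide(record):
    """Return the names of variables whose text value exceeds the defined width."""
    return [
        name
        for name, width, _type, _label in CODE_BOOK
        if len(record[name]) > width
    ]


ok = {"id": "10", "sex": "f", "test1": "83", "test2": "85", "test3": "91"}
bad = {"id": "100", "sex": "f", "test1": "83", "test2": "85", "test3": "91"}
print(too_wide(ok))
print(too_wide(bad))
```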
The next issue is entering your data into the computer. There are several options. You may create a data file using one of your favorite text editors, or word processing packages (e.g., Word Perfect, MS-Word). Files created using word processing software should be saved in text format before trying to read them into an SPSS session. You may enter your data into a spreadsheet (e.g., Lotus 123, Excel, dBASE) and read it directly into SPSS for Windows. Finally, you may enter the data directly into the spreadsheet-like Data Editor of SPSS for Windows. In this document we are going to examine two of the above data entry methods: using a text editor/word processor, and using the Data Editor of SPSS for Windows.
Using an Editor/Word Processor to Enter Data
Let us first look into the steps for using a text editor or word processor for entering data. Note that if you have a data set with a limited number of variables, you may want to use the SPSS Data Editor to enter your data. However, this example is for illustration purposes. Open up your editor session, or word processing session, and enter the variable values into appropriate columns as outlined in the code book. If you are using a word processor, make sure to save your data in text format. Your completed data file will appear as follows. (Note: The first line is included as a column marker line and is not part of the data. It must be removed before saving or using the data for analysis.)
12345678901234567890
01 f 83 85 91
02 f 65 72 68
03 f 90 94 90
04 f 87 80 82
05 f 78 86 80
06 m 60 74 64
07 m 88 96 92
08 m 84 79 82
09 m 90 87 93
10 m 76 73 70
Save the data as a text file named grade.dat onto a flash drive or onto the hard drive.
Notice that in the above data layout one blank space is left after each variable. It is optional whether to leave a space between variable values. For example, you may choose to enter the data as follows:
01f838591
02f657268
03f909490
04f878082
05f788680
06m607464
07m889692
Whichever style (format) you choose, as long as you convey the format correctly to SPSS, it should not have any impact on the analysis. In the above layout, each case/observation has only one line (record) of data. In another situation you may have multiple records per observation.
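To see why the two styles carry the same information, consider the sketch below. It is written in Python purely for illustration (SPSS itself reads such files with its own commands), and the column positions are read off the two listings above. The names `SPACED`, `PACKED`, and `read_record` are hypothetical.

```python
# Column positions (1-based, inclusive) for the two layouts shown above.
SPACED = {"id": (1, 2), "sex": (4, 4), "test1": (6, 7), "test2": (9, 10), "test3": (12, 13)}
PACKED = {"id": (1, 2), "sex": (3, 3), "test1": (4, 5), "test2": (6, 7), "test3": (8, 9)}


def read_record(line, layout):
    """Cut one line into variables using the given column positions."""
    record = {}
    for name, (start, end) in layout.items():
        text = line[start - 1:end]
        # sex is the only string variable; everything else is numeric.
        record[name] = text if name == "sex" else int(text)
    return record


spaced_case = read_record("01 f 83 85 91", SPACED)
packed_case = read_record("01f838591", PACKED)
print(spaced_case)
print(spaced_case == packed_case)
```

Only the description of the layout changes between the two styles; the values recovered for each case are identical.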
Creating a Command file to read in your data
In many instances, you may have an external ASCII data file made available to you for analysis, just like the data, grade.dat, we discussed earlier. In such a situation, you do not have to enter your data again into the Data Editor. You can direct SPSS to read the file from the SPSS Syntax Editor window.
Suppose you want to read the file, grade.dat, into SPSS from a Syntax Editor window and create a system file. Creating a command file is a faster way to define your variables, especially if you have a large number of variables. You may create a command file using your favorite editor or word processor and then read it into a Syntax Editor window or open a Syntax Editor window and type in the command lines.
To read your already created command file into a Syntax Editor window
- Select File → Open → Syntax...
- Choose the syntax file (with .sps extension) you want to read and click Open
In the following example we are opening a new Syntax Editor window.
- Select File → New → Syntax
When the Syntax Editor window appears, type:
DATA LIST FILE='C:\TEMP\GRADE.DAT' FIXED
/ id 1-2 sex 4 (A) test1 6-7 test2 9-10 test3 12-13.
EXECUTE.
SAVE OUTFILE='C:\TEMP\SAMPLE1.SAV'.
- Click and drag with your mouse to highlight the lines entered, then click Run and choose Selection. Alternatively, you can click the run icon on the toolbar
The command file will read the specified variable values from the data file, grade.dat, on C:\TEMP, and create a system file, sample1.sav, on C:\TEMP. Make sure you specify the pathname correctly, indicating both the location of the external data file and where the newly created file is to be written. However, you do not have to save a system file to do the analysis, so the last line is optional. Every time you run the above lines, SPSS creates an active file stored in the computer's memory. For large data sets, however, it will save processing time if you save the data as a system file and access that file for analysis.
In the above command lines, DATA LIST defines a raw data file by assigning names and formats to each variable in the file. The data can be in fixed format (values for the same variable are always entered in the same location on the same record for each case) or in free format (values for consecutive variables are not in particular columns but are entered one after the other, separated by blanks or commas). In our example, we used the fixed format. FIXED is the default if no format is specified, so we did not have to use the FIXED keyword; it is included for the sake of illustration. The only string variable in the data is sex, which is identified with an (A) after the variable name and column location.
We do not have any numeric variables with decimal places. SPSS assumes that decimal points are explicitly coded in the data file. If there are no decimal points, the numeric variables are assumed to be integers. To indicate noninteger values for data that have not been coded with decimal points, specify the implied number of decimal places in parentheses after the variable name and column location as in gpa 16-18 (2). This means the variable gpa is in columns 16-18 and is recorded as, for example, 389, and it will be assigned 3.89 by SPSS.
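The arithmetic behind implied decimal places can be sketched in a few lines of Python. This is an illustration only, not SPSS code: the function name `read_implied` is hypothetical, the sample line with a gpa value in columns 16-18 is made up for the example, and the rule that an explicitly coded decimal point is used as-is is stated here as an assumption.

```python
def read_implied(line, start, end, places=0):
    """Read columns start..end (1-based, inclusive) with implied decimal places."""
    text = line[start - 1:end].strip()
    if "." in text:
        # Assumption: a decimal point coded in the data is taken as-is.
        return float(text)
    # No decimal point coded: shift the digits by the implied number of places.
    return int(text) / 10 ** places


# Hypothetical record: the grade.dat layout with a gpa value added in columns 16-18.
line = "10 m 76 73 70  389"
print(read_implied(line, 16, 18, places=2))
print(read_implied(line, 6, 7))
```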
Inline data
In the above example your data are being read from an external file, grade.dat, on C:\TEMP. Another option is to keep the data within the command file. In that case you direct SPSS to read your inline data from the command lines with the BEGIN DATA and END DATA commands, and you omit the FILE subcommand from the DATA LIST command. The BEGIN DATA command follows the DATA LIST command, and the END DATA command follows the last line of data. All procedure commands should come after the END DATA command, but transformation commands can be specified before BEGIN DATA. For example, to read the contents of grade.dat as inline data, modify the above command lines as follows:
DATA LIST FREE
/ id * sex (a) test1 test2 test3.
BEGIN DATA
01 f 83 85 91
02 f 65 72 68
03 f 90 94 90
04 f 87 80 82
05 f 78 86 80
06 m 60 74 64
07 m 88 96 92
08 m 84 79 82
09 m 90 87 93
10 m 76 73 70
END DATA.
EXECUTE.
SAVE OUTFILE='C:\TEMP\SAMPLE1.SAV'.
In this example, we used a FREE format data layout for illustration. Each variable value is separated by a blank space. Since we are using free format, the column specification after each variable is dropped. Note that the variable sex is a one-character string variable. In free field format, when you specify a string format, that format applies to all preceding variables. This means SPSS would read both id and sex with the string format. To avoid this, place an asterisk (*) after the variable id to convey that id must be read with the default numeric format. FIXED format can also be used with inline data. You may type the above lines into a Syntax Editor window, or read a text file containing the inline data into a Syntax Editor window, and execute it as explained above. Keeping data inline may not be an efficient option when you have a large number of data lines.
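The idea of free format, where values are taken one after the other rather than from fixed columns, can be modeled with the short Python sketch below. It is an illustration rather than a description of how SPSS is implemented; the names `NAMES`, `STRING_VARS`, and `read_free` are hypothetical, and as a simplification the sketch treats any run of blanks or commas as a single separator.

```python
import re

NAMES = ["id", "sex", "test1", "test2", "test3"]
STRING_VARS = {"sex"}  # every other variable uses the default numeric format


def read_free(text):
    """Assign values to variables in order, ignoring column positions."""
    # Simplification: any run of blanks, line breaks, or commas is one separator.
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    cases = []
    for i in range(0, len(tokens), len(NAMES)):
        chunk = tokens[i:i + len(NAMES)]
        case = {
            name: (value if name in STRING_VARS else float(value))
            for name, value in zip(NAMES, chunk)
        }
        cases.append(case)
    return cases


cases = read_free("01 f 83 85 91\n02,f,65,72,68")
for case in cases:
    print(case)
```

Because position does not matter, the blank-separated first record and the comma-separated second record are read the same way.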
Using Text Import Wizard to Read Text Data
Using Text Import Wizard is another way to direct SPSS to read an external ASCII data file.
Suppose you want to read the file, grade.dat, into SPSS from Text Import Wizard.
- Select File → Read Text Data
- Select Text (*.txt) as the file type in the Open File dialog box, choose the data file grade.dat in C:\TEMP, and click Open
- The Text Import Wizard opens. Follow Step 1 through Step 6 of the wizard to specify how the data should be read.
In the following example we are opening grade.dat.
- Step 1 of 6: Check 'no' in 'Does your text file match a predefined format?' and click Next.
- Step 2 of 6: Check 'fixed width' in 'How are your variables arranged?', check 'no' in 'Are variable names included at the top of your file?' and click Next.
- Step 3 of 6: Keep all default checks and click Next.
- Step 4 of 6: Insert breaks to specify where each variable begins. Then click Next.
- Step 5 of 6: This step sets the specifications for the variables selected in the data preview. Click V1 to highlight the column. Type id in the Variable name box and set the Data format to Numeric. In the same way, rename V2, V3, V4, and V5 to sex, test1, test2, and test3, respectively. Only sex is a string variable. Click Next.
- Step 6 of 6: Keep all default checks and click Finish.
The data file is read into SPSS. We can save the data file as SAMPLE1.SAV.
Using the SPSS Data Editor for entering data
Suppose you want to use the SPSS for Windows features for data entry. In that case, you enter data directly into the SPSS spreadsheet-like Data Editor. This is convenient if you have only a small number of variables. The first step is to enter the data into the Data Editor window by opening an SPSS for Windows session. You will define your variables, variable type (e.g., numeric, string), number of decimal places, and any other necessary attributes while you are entering the data. In this mode of data entry, you must define each variable in the Data Editor. You cannot define a group of variables (e.g., Q1 to Q10) using the Data Editor. To define a group of variables, without individually specifying them, you would use the Syntax window.
Let us start an SPSS for Windows session to enter the above data set. If you are using your own PC, start Windows and launch SPSS. If you are using a PC in a UITS Student Technology Center:
- Log on to an available workstation
- Click the Start button
- Click All Programs → Departmentally Sponsored → Statistics-Math → SPSS 18.0
This opens the SPSS Data Editor window (titled Untitled). The Data Editor window contains the menu bar, which you use to open files, choose statistical procedures, create graphs, etc. When you start an SPSS session, the Data Editor window always opens first.
You are ready to enter your data once the Data Editor window appears. The first step is to enter the variable names that will appear as the top row of the data file. When you start the session, the top row of the Data Editor window contains a dimmed var as the title of every column, indicating that no data are present. In our sample data set, discussed above, there are five variables named earlier as id, sex, test1, test2, and test3. Let us now enter these variable names into the Data Editor.
To define the variables, click on the Variable View tab at the lower left corner of the Data Editor window and:
- Type in the variable name, id, at the first row under the column Name.
- Press the Tab key to fill-in the variable's attributes with default settings.
SPSS considers all variables as numeric variables by default. Since id is a numeric variable you do not have to redefine the variable type for id. However, you may want to change the current format for decimal places.
- Enter 0 for Decimals.
Now let us define the second variable, sex.
- Type in the variable name, sex, at the second row under the column Name.
- Press the Tab key to fill-in the variable's attributes with default settings.
- To modify the variable type, click on the icon in the Type column.
- Select String by clicking on the circle to the left.
Define the remaining three numeric variables, test1, test2, and test3, the same way the variable id was defined. Once you have finished, the Variable View screen should list all five variables, with sex as the only String variable.

Click on the Data View tab. Now enter the data, pressing [Tab] or the right arrow key after each entry. After entering the last variable value for case number one, use the arrow keys to move the cursor to the beginning of the next line. Continue the process until all the data are entered.

Saving Your SPSS Data
After you have entered/read the data into the Data Editor, save it onto the flash drive. Those who are working from personally owned computers might want to save the file to the hard disk.
- Select Save... or Save As... from the File menu. A dialog box appears

- In the box below File Name type C:\TEMP\sample1.sav.
- Click OK
The data will be saved as an SPSS format file which is readable only by SPSS for Windows. Note that the data file, grade.dat, you saved earlier and the file, sample1.sav, you saved now are in different formats.
Even after saving the data file, the data will still be displayed on your screen. If not, select sample1.sav-SPSS Data Editor from the Window menu.