View on GitHub
File an issue

7 Statistical Analysis (SAS)

7.1 Setup

In this section, we’ll provide some examples in SAS of basic statistical analyses you might need to implement at some point.

The analyses you conduct as part of real research will almost certainly be more complicated than the code shown here. However, the hope is that you’ll get enough of a feel for the mechanics of SAS to begin putting the pieces together as demanded by a given project.

7.1.1 Data

If you’d like to try any of these examples on your own, we’re using the NHANES I Epidemiologic Followup Study (NHEFS) data set freely available at the website for Causal Inference: What If by Miguel A. Hernán and James M. Robins.

For both SAS and R, we’ll use the CSV file they provide.

The website itself contains extensive code examples in SAS, Stata, R, and Python—everything you could want!—so those who will be taking advanced epidemiologic or causal inference coursework should refer to those resources for guidance on how to implement relevant analyses. We’ll cover a little bit here, but only briefly.

7.1.2 NHEFS Codebook

Variable name Description
active IN YOUR USUAL DAY, HOW ACTIVE ARE YOU? IN 1971, 0:very active, 1:moderately active, 2:inactive
age AGE IN 1971
alcoholfreq HOW OFTEN DO YOU DRINK? IN 1971 0: Almost every day, 1: 2-3 times/week, 2: 1-4 times/month, 3: < 12 times/year, 4: No alcohol last year, 5: Unknown
alcoholhowmuch WHEN YOU DRINK, HOW MUCH DO YOU DRINK? IN 1971
alcoholpy HAVE YOU HAD 1 DRINK PAST YEAR? IN 1971, 1:EVER, 0:NEVER; 2:MISSING
alcoholtype WHICH DO YOU MOST FREQUENTLY DRINK? IN 1971 1: BEER, 2: WINE, 3: LIQUOR, 4: OTHER/UNKNOWN
allergies USE ALLERGIES MEDICATION IN 1971, 1:EVER, 0:NEVER
asthma DX ASTHMA IN 1971, 1:EVER, 0:NEVER
bithcontrol BIRTH CONTROL PILLS PAST 6 MONTHS? IN 1971 1:YES, 0:NO, 2:MISSING
birthplace CHECK STATE CODE - SECOND PAGE
boweltrouble USE BOWEL TROUBLE MEDICATION IN 1971, 1:EVER, 0:NEVER, ; 2:MISSING
bronch DX CHRONIC BRONCHITIS/EMPHYSEMA IN 1971, 1:EVER, 0:NEVER
cholesterol SERUM CHOLESTEROL (MG/100ML) IN 1971
chroniccough DX CHRONIC COUGH IN 1971, 1:EVER, 0:NEVER
colitis DX COLITIS IN 1971, 1:EVER, 0:NEVER
dadth DAY OF DEATH
dbp DIASTOLIC BLOOD PRESSURE IN 1982
death DEATH BY 1992, 1:YES, 0:NO
diabetes DX DIABETES IN 1971, 1:EVER, 0:NEVER, 2:MISSING
education AMOUNT OF EDUCATION BY 1971: 1: 8TH GRADE OR LESS, 2: HS DROPOUT, 3: HS, 4:COLLEGE DROPOUT, 5: COLLEGE OR MORE
exercise IN RECREATION, HOW MUCH EXERCISE? IN 1971, 0:much exercise,1:moderate exercise,2:little or no exercise
hayfever DX HAY FEVER IN 1971, 1:EVER, 0:NEVER
hbp DX HIGH BLOOD PRESSURE IN 1971, 1:EVER, 0:NEVER, 2:MISSING
hbpmed USE HIGH BLOOD PRESSURE MEDICATION IN 1971, 1:EVER, 0:NEVER, ; 2:MISSING
headache USE HEADACHE MEDICATION IN 1971, 1:EVER, 0:NEVER
hepatitis DX HEPATITIS IN 1971, 1:EVER, 0:NEVER
hf DX HEART FAILURE IN 1971, 1:EVER, 0:NEVER
hightax82 LIVING IN A HIGHLY TAXED STATE IN 1982, High taxed state of residence=1, 0 otherwise
ht HEIGHT IN CENTIMETERS IN 1971
income TOTAL FAMILY INCOME IN 1971 11:<$1000, 12: 1000-1999, 13: 2000-2999, 14: 3000-3999, 15: 4000-4999, 16: 5000-5999, 17: 6000-6999, 18: 7000-9999, 19: 10000-14999, 20: 15000-19999, 21: 20000-24999, 22: 25000+
infection USE INFECTION MEDICATION IN 1971, 1:EVER, 0:NEVER
lackpep USELACK OF PEP MEDICATION IN 1971, 1:EVER, 0:NEVER
marital MARITAL STATUS IN 1971 1: Under 17, 2: Married, 3: Widowed, 4: Never married, 5: Divorced, 6: Separated, 8: Unknown
modth MONTH OF DEATH
nerves USE NERVES MEDICATION IN 1971, 1:EVER, 0:NEVER
nervousbreak DX NERVOUS BREAKDOWN IN 1971, 1:EVER, 0:NEVER
otherpain USE OTHER PAINS MEDICATION IN 1971, 1:EVER, 0:NEVER
pepticulcer DX PEPTIC ULCER IN 1971, 1:EVER, 0:NEVER
pica DO YOU EAT DIRT OR CLAY, STARCH OR OTHER NON STANDARD FOOD? IN 1971 1:EVER, 0:NEVER; 2:MISSING
polio DX POLIO IN 1971, 1:EVER, 0:NEVER
pregnancies TOTAL NUMBER OF PREGNANCIES? IN 1971
price71 AVG TOBACCO PRICE IN STATE OF RESIDENCE 1971 (US$2008)
price71_82 DIFFERENCE IN AVG TOBACCO PRICE IN STATE OF RESIDENCE 1971-1982 (US$2008)
price82 AVG TOBACCO PRICE IN STATE OF RESIDENCE 1982 (US$2008)
qsmk QUIT SMOKING BETWEEN 1ST QUESTIONNAIRE AND 1982, 1:YES, 0:NO
race 0: WHITE 1: BLACK OR OTHER IN 1971
sbp SYSTOLIC BLOOD PRESSURE IN 1982
school HIGHEST GRADE OF REGULAR SCHOOL EVER IN 1971
seqn UNIQUE PERSONAL IDENTIFIER
sex 0: MALE 1: FEMALE
smokeintensity NUMBER OF CIGARETTES SMOKED PER DAY IN 1971
smkintensity 82_71 INCREASE IN NUMBER OF CIGARETTES/DAY BETWEEN 1971 and 1982
smokeyrs YEARS OF SMOKING
tax71 TOBACCO TAX IN STATE OF RESIDENCE 1971 (US$2008)
tax71_82 DIFFERENCE IN TOBACCO TAX IN STATE OF RESIDENCE 1971-1982 (US$2008)
tax82 TOBACCO TAX IN STATE OF RESIDENCE 1971 (US$2008)
tb DX TUBERCULOSIS IN 1971, 1:EVER, 0:NEVER
tumor DX MALIGNANT TUMOR/GROWTH IN 1971, 1:EVER, 0:NEVER
weakheart USE WEAK HEART MEDICATION IN 1971, 1:EVER, 0:NEVER
wt71 WEIGHT IN KILOGRAMS IN 1971
wt82 WEIGHT IN KILOGRAMS IN 1982
wt82_71 WEIGHT CHANGE IN KILOGRAMS
wtloss USE WEIGHT LOSS MEDICATION IN 1971, 1:EVER, 0:NEVER
yrdth YEAR OF DEATH

7.2 Exploratory Data Analysis

First, let’s read in our data and glance at it using a couple of different views.

SAS Documentation: PROC IMPORT

proc import datafile="data/nhefs.csv"
            out=nhefs
            dbms=csv
            replace;
run;

* information about the variables in NHEFS;
proc contents data=nhefs; run;
Data Set Name WORK.NHEFS Observations 1629
Member Type DATA Variables 64
Engine V9 Indexes 0
Created 12/15/2022 20:38:42 Observation Length 512
Last Modified 12/15/2022 20:38:42 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding latin1 Western (ISO)

Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 13
First Data Page 1
Max Obs per Page 127
Obs in First Data Page 107
Number of Data Set Repairs 0
Filename /tmp/SAS_work6AC9000060C8_jrgant-AW/nhefs.sas7bdat
Release Created 9.0401M7
Host Created Linux
Inode Number 7998379
Access Permission rw-rw-r–
Owner Name jrgant
File Size 896KB
File Size (bytes) 917504

Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
53 active Num 8 BEST12. BEST32.
10 age Num 8 BEST12. BEST32.
39 alcoholfreq Num 8 BEST12. BEST32.
41 alcoholhowmuch Num 8 BEST12. BEST32.
38 alcoholpy Num 8 BEST12. BEST32.
40 alcoholtype Num 8 BEST12. BEST32.
46 allergies Num 8 BEST12. BEST32.
24 asthma Num 8 BEST12. BEST32.
55 birthcontrol Num 8 BEST12. BEST32.
20 birthplace Num 8 BEST12. BEST32.
50 boweltrouble Num 8 BEST12. BEST32.
25 bronch Num 8 BEST12. BEST32.
57 cholesterol Num 8 BEST12. BEST32.
32 chroniccough Num 8 BEST12. BEST32.
30 colitis Num 8 BEST12. BEST32.
6 dadth Num 8 BEST12. BEST32.
8 dbp Num 8 BEST12. BEST32.
3 death Num 8 BEST12. BEST32.
34 diabetes Num 8 BEST12. BEST32.
15 education Num 8 BEST12. BEST32.
54 exercise Num 8 BEST12. BEST32.
33 hayfever Num 8 BEST12. BEST32.
28 hbp Num 8 BEST12. BEST32.
49 hbpmed Num 8 BEST12. BEST32.
43 headache Num 8 BEST12. BEST32.
31 hepatitis Num 8 BEST12. BEST32.
27 hf Num 8 BEST12. BEST32.
58 hightax82 Num 8 BEST12. BEST32.
16 ht Num 8 BEST12. BEST32.
12 income Num 8 BEST12. BEST32.
52 infection Num 8 BEST12. BEST32.
48 lackpep Num 8 BEST12. BEST32.
13 marital Num 8 BEST12. BEST32.
5 modth Num 8 BEST12. BEST32.
47 nerves Num 8 BEST12. BEST32.
37 nervousbreak Num 8 BEST12. BEST32.
44 otherpain Num 8 BEST12. BEST32.
29 pepticulcer Num 8 BEST12. BEST32.
42 pica Num 8 BEST12. BEST32.
35 polio Num 8 BEST12. BEST32.
56 pregnancies Num 8 BEST12. BEST32.
59 price71 Num 8 BEST12. BEST32.
60 price82 Num 8 BEST12. BEST32.
63 price71_82 Num 8 BEST12. BEST32.
2 qsmk Num 8 BEST12. BEST32.
11 race Num 8 BEST12. BEST32.
7 sbp Num 8 BEST12. BEST32.
14 school Num 8 BEST12. BEST32.
1 seqn Num 8 BEST12. BEST32.
9 sex Num 8 BEST12. BEST32.
22 smkintensity82_71 Num 8 BEST12. BEST32.
21 smokeintensity Num 8 BEST12. BEST32.
23 smokeyrs Num 8 BEST12. BEST32.
61 tax71 Num 8 BEST12. BEST32.
62 tax82 Num 8 BEST12. BEST32.
64 tax71_82 Num 8 BEST12. BEST32.
26 tb Num 8 BEST12. BEST32.
36 tumor Num 8 BEST12. BEST32.
45 weakheart Num 8 BEST12. BEST32.
17 wt71 Num 8 BEST12. BEST32.
18 wt82 Num 8 BEST12. BEST32.
19 wt82_71 Num 8 BEST12. BEST32.
51 wtloss Num 8 BEST12. BEST32.
4 yrdth Num 8 BEST12. BEST32.

* print first five rows of dataset;
proc print data=nhefs(obs=5); run;
Obs seqn qsmk death yrdth modth dadth sbp dbp sex age race income marital school education ht wt71 wt82 wt82_71 birthplace smokeintensity smkintensity82_71 smokeyrs asthma bronch tb hf hbp pepticulcer colitis hepatitis chroniccough hayfever diabetes polio tumor nervousbreak alcoholpy alcoholfreq alcoholtype alcoholhowmuch pica headache otherpain weakheart allergies nerves lackpep hbpmed boweltrouble wtloss infection active exercise birthcontrol pregnancies cholesterol hightax82 price71 price82 tax71 tax82 price71_82 tax71_82
1 233 0 0 . . . 175 96 0 42 1 19 2 7 1 174.1875 79.04 68.94604024 -10.09395976 47 30 -10 29 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 1 3 7 0 1 0 0 0 0 0 1 0 0 0 0 2 2 . 197 0 2.18359375 1.7399902344 1.1022949219 0.4619750977 0.4437866211 0.6403808594
2 235 0 0 . . . 123 80 0 36 0 18 2 9 2 159.375 58.63 61.23496995 2.60496995 42 20 -10 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 4 0 1 0 0 0 0 0 0 0 0 1 0 0 2 . 301 0 2.3466796875 1.7973632813 1.3649902344 0.5718994141 0.5493164063 0.79296875
3 244 0 0 . . . 115 75 1 56 1 15 3 11 2 168.5 56.81 66.22448602 9.41448602 51 20 -14 26 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 3 4 . 0 1 1 0 0 1 0 0 0 0 0 0 2 0 2 157 0 1.5695800781 1.5134277344 0.5512695313 0.2309875488 0.0561981201 0.3202514648
4 245 0 1 85 2 14 148 78 0 68 1 15 3 5 1 170.1875 59.42 64.41011654 4.99011654 37 3 4 53 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 2 3 4 0 0 1 1 0 0 0 0 0 0 0 1 2 2 . 174 0 1.5065917969 1.4519042969 0.5249023438 0.2199707031 0.0547943115 0.3049926758
5 252 0 0 . . . 118 77 0 40 0 18 2 11 2 181.875 87.09 92.07925111 4.98925111 42 20 0 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 1 2 0 1 0 0 0 0 0 0 1 0 0 1 1 2 . 216 0 2.3466796875 1.7973632813 1.3649902344 0.5718994141 0.5493164063 0.79296875

For now, let’s focus on observations (rows) that don’t have any missing data. To do this, let’s get a summary of how much missingness we have across all observations.

proc means data=nhefs nmiss STACKODSOUTPUT;
    ods output summary=nhefs_miss;
run;
Variable N Miss
seqn 0
qsmk 0
death 0
yrdth 1311
modth 1307
dadth 1307
sbp 77
dbp 81
sex 0
age 0
race 0
income 62
marital 0
school 0
education 0
ht 0
wt71 0
wt82 63
wt82_71 63
birthplace 92
smokeintensity 0
smkintensity82_71 0
smokeyrs 0
asthma 0
bronch 0
tb 0
hf 0
hbp 0
pepticulcer 0
colitis 0
hepatitis 0
chroniccough 0
hayfever 0
diabetes 0
polio 0
tumor 0
nervousbreak 0
alcoholpy 0
alcoholfreq 0
alcoholtype 0
alcoholhowmuch 417
pica 0
headache 0
otherpain 0
weakheart 0
allergies 0
nerves 0
lackpep 0
hbpmed 0
boweltrouble 0
wtloss 0
infection 0
active 0
exercise 0
birthcontrol 0
pregnancies 903
cholesterol 16
hightax82 92
price71 92
price82 92
tax71 92
tax82 92
price71_82 92
tax71_82 92

proc print data=nhefs_miss; run;
Obs Variable NMiss
1 seqn 0
2 qsmk 0
3 death 0
4 yrdth 1311
5 modth 1307
6 dadth 1307
7 sbp 77
8 dbp 81
9 sex 0
10 age 0
11 race 0
12 income 62
13 marital 0
14 school 0
15 education 0
16 ht 0
17 wt71 0
18 wt82 63
19 wt82_71 63
20 birthplace 92
21 smokeintensity 0
22 smkintensity82_71 0
23 smokeyrs 0
24 asthma 0
25 bronch 0
26 tb 0
27 hf 0
28 hbp 0
29 pepticulcer 0
30 colitis 0
31 hepatitis 0
32 chroniccough 0
33 hayfever 0
34 diabetes 0
35 polio 0
36 tumor 0
37 nervousbreak 0
38 alcoholpy 0
39 alcoholfreq 0
40 alcoholtype 0
41 alcoholhowmuch 417
42 pica 0
43 headache 0
44 otherpain 0
45 weakheart 0
46 allergies 0
47 nerves 0
48 lackpep 0
49 hbpmed 0
50 boweltrouble 0
51 wtloss 0
52 infection 0
53 active 0
54 exercise 0
55 birthcontrol 0
56 pregnancies 903
57 cholesterol 16
58 hightax82 92
59 price71 92
60 price82 92
61 tax71 92
62 tax82 92
63 price71_82 92
64 tax71_82 92

Let’s drop every row with a missing value for any variable:

Complete case code adapted from an example on SAS blogs.

/* h/t https://blogs.sas.com/content/iml/2015/02/23/complete-cases.html */

data nhefs2;
    set nhefs;
    if cmiss(of _ALL_)=0;
run;

proc contents data=nhefs2; run;
Data Set Name WORK.NHEFS2 Observations 44
Member Type DATA Variables 64
Engine V9 Indexes 0
Created 12/15/2022 20:38:45 Observation Length 512
Last Modified 12/15/2022 20:38:45 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding latin1 Western (ISO)

Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 127
Obs in First Data Page 44
Number of Data Set Repairs 0
Filename /tmp/SAS_work1D400000634E_jrgant-AW/nhefs2.sas7bdat
Release Created 9.0401M7
Host Created Linux
Inode Number 7998383
Access Permission rw-rw-r–
Owner Name jrgant
File Size 128KB
File Size (bytes) 131072

Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
53 active Num 8 BEST12. BEST32.
10 age Num 8 BEST12. BEST32.
39 alcoholfreq Num 8 BEST12. BEST32.
41 alcoholhowmuch Num 8 BEST12. BEST32.
38 alcoholpy Num 8 BEST12. BEST32.
40 alcoholtype Num 8 BEST12. BEST32.
46 allergies Num 8 BEST12. BEST32.
24 asthma Num 8 BEST12. BEST32.
55 birthcontrol Num 8 BEST12. BEST32.
20 birthplace Num 8 BEST12. BEST32.
50 boweltrouble Num 8 BEST12. BEST32.
25 bronch Num 8 BEST12. BEST32.
57 cholesterol Num 8 BEST12. BEST32.
32 chroniccough Num 8 BEST12. BEST32.
30 colitis Num 8 BEST12. BEST32.
6 dadth Num 8 BEST12. BEST32.
8 dbp Num 8 BEST12. BEST32.
3 death Num 8 BEST12. BEST32.
34 diabetes Num 8 BEST12. BEST32.
15 education Num 8 BEST12. BEST32.
54 exercise Num 8 BEST12. BEST32.
33 hayfever Num 8 BEST12. BEST32.
28 hbp Num 8 BEST12. BEST32.
49 hbpmed Num 8 BEST12. BEST32.
43 headache Num 8 BEST12. BEST32.
31 hepatitis Num 8 BEST12. BEST32.
27 hf Num 8 BEST12. BEST32.
58 hightax82 Num 8 BEST12. BEST32.
16 ht Num 8 BEST12. BEST32.
12 income Num 8 BEST12. BEST32.
52 infection Num 8 BEST12. BEST32.
48 lackpep Num 8 BEST12. BEST32.
13 marital Num 8 BEST12. BEST32.
5 modth Num 8 BEST12. BEST32.
47 nerves Num 8 BEST12. BEST32.
37 nervousbreak Num 8 BEST12. BEST32.
44 otherpain Num 8 BEST12. BEST32.
29 pepticulcer Num 8 BEST12. BEST32.
42 pica Num 8 BEST12. BEST32.
35 polio Num 8 BEST12. BEST32.
56 pregnancies Num 8 BEST12. BEST32.
59 price71 Num 8 BEST12. BEST32.
60 price82 Num 8 BEST12. BEST32.
63 price71_82 Num 8 BEST12. BEST32.
2 qsmk Num 8 BEST12. BEST32.
11 race Num 8 BEST12. BEST32.
7 sbp Num 8 BEST12. BEST32.
14 school Num 8 BEST12. BEST32.
1 seqn Num 8 BEST12. BEST32.
9 sex Num 8 BEST12. BEST32.
22 smkintensity82_71 Num 8 BEST12. BEST32.
21 smokeintensity Num 8 BEST12. BEST32.
23 smokeyrs Num 8 BEST12. BEST32.
61 tax71 Num 8 BEST12. BEST32.
62 tax82 Num 8 BEST12. BEST32.
64 tax71_82 Num 8 BEST12. BEST32.
26 tb Num 8 BEST12. BEST32.
36 tumor Num 8 BEST12. BEST32.
45 weakheart Num 8 BEST12. BEST32.
17 wt71 Num 8 BEST12. BEST32.
18 wt82 Num 8 BEST12. BEST32.
19 wt82_71 Num 8 BEST12. BEST32.
51 wtloss Num 8 BEST12. BEST32.
4 yrdth Num 8 BEST12. BEST32.

Oops, we have only 44 rows left in the data set! Perhaps we should reconsider and pick several variables that will be of interest to us for our current purposes.

%let selectvars = age alcoholfreq asthma wt71 smokeintensity sbp allergies weakheart;

data nhefs3;
    set nhefs(keep=&selectvars);
    if cmiss(of _ALL_)=0;
run;

proc contents data=nhefs3; run;
Data Set Name WORK.NHEFS3 Observations 1552
Member Type DATA Variables 8
Engine V9 Indexes 0
Created 12/15/2022 20:38:45 Observation Length 64
Last Modified 12/15/2022 20:38:45 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding latin1 Western (ISO)

Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 1021
Obs in First Data Page 977
Number of Data Set Repairs 0
Filename /tmp/SAS_work4619000063F6_jrgant-AW/nhefs3.sas7bdat
Release Created 9.0401M7
Host Created Linux
Inode Number 7998383
Access Permission rw-rw-r–
Owner Name jrgant
File Size 192KB
File Size (bytes) 196608

Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 age Num 8 BEST12. BEST32.
6 alcoholfreq Num 8 BEST12. BEST32.
8 allergies Num 8 BEST12. BEST32.
5 asthma Num 8 BEST12. BEST32.
1 sbp Num 8 BEST12. BEST32.
4 smokeintensity Num 8 BEST12. BEST32.
7 weakheart Num 8 BEST12. BEST32.
3 wt71 Num 8 BEST12. BEST32.

That’s better. We’ve discarded 77 rows and focused on a subset of variables of interest.

Before we do some exploratory data analysis, let’s recall that category 5 for one of the variables we selected (alcoholfreq) indicated that alcohol frequency for that individual was unknown. This is missing data, and we should recode the alcoholfreq variable so that SAS recognizes these observations as such.

In addition, SAS currently believes that alcoholfreq is a numeric variable, though it is a categorical variable.

We can add a new variable to our nhefs3 data set that recodes alcoholfreq to meet our needs. We’ll name our new variable alcfreqcat

CHARACTER VARIABLES How did we get alcfreqcat to be a character variable when we based it on the numeric variable alcoholfreq? Notice in the data step that we initialized alcfreqcat using '.' instead of .. Wrapping the dot in quotes indicated to SAS that we wanted it to treat alfreqcat as a character variable.

data nhefs3;
    set nhefs3;
    if alcoholfreq = 5 then alcfreqcat = '.';
    else alcfreqcat = alcoholfreq;
run;

proc freq data=nhefs3;
    table alcfreqcat*alcoholfreq;
run;
Frequency
Percent
Row Pct
Col Pct
Table of alcfreqcat by alcoholfreq
alcfreqcat alcoholfreq
0 1 2 3 4 5 Total
.
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
5
0.32
100.00
100.00
5
0.32
0
320
20.62
100.00
100.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
320
20.62
1
0
0.00
0.00
0.00
217
13.98
100.00
100.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
217
13.98
2
0
0.00
0.00
0.00
0
0.00
0.00
0.00
489
31.51
100.00
100.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
489
31.51
3
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
329
21.20
100.00
100.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
329
21.20
4
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
0
0.00
0.00
0.00
192
12.37
100.00
100.00
0
0.00
0.00
0.00
192
12.37
Total
320
20.62
217
13.98
489
31.51
329
21.20
192
12.37
5
0.32
1552
100.00

proc contents data=nhefs3; run;
Data Set Name WORK.NHEFS3 Observations 1552
Member Type DATA Variables 9
Engine V9 Indexes 0
Created 12/15/2022 20:38:46 Observation Length 72
Last Modified 12/15/2022 20:38:46 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64
Encoding latin1 Western (ISO)

Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 908
Obs in First Data Page 866
Number of Data Set Repairs 0
Filename /tmp/SAS_work71760000654C_jrgant-AW/nhefs3.sas7bdat
Release Created 9.0401M7
Host Created Linux
Inode Number 7998381
Access Permission rw-rw-r–
Owner Name jrgant
File Size 192KB
File Size (bytes) 196608

Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
2 age Num 8 BEST12. BEST32.
9 alcfreqcat Char 1
6 alcoholfreq Num 8 BEST12. BEST32.
8 allergies Num 8 BEST12. BEST32.
5 asthma Num 8 BEST12. BEST32.
1 sbp Num 8 BEST12. BEST32.
4 smokeintensity Num 8 BEST12. BEST32.
7 weakheart Num 8 BEST12. BEST32.
3 wt71 Num 8 BEST12. BEST32.

Although SAS will automatically throw out observations with missing data when running regressions and other procedures, it will do so based on which variables are included in model formulas. So that we conduct all the analyses below on the same subset of the NHEFS data, let’s drop any rows from our data set with missing values of alcfreqcat and drop the original alcoholfreq variable.

data nhefs4;
    set nhefs3(where=(alcfreqcat ne '.') drop=alcoholfreq);
run;

proc print data=nhefs4(obs=5); run;
Obs sbp age wt71 smokeintensity asthma weakheart allergies alcfreqcat
1 175 42 79.04 30 0 0 0 1
2 123 36 58.63 20 0 0 0 0
3 115 56 56.81 20 0 0 0 3
4 148 68 59.42 3 0 1 0 2
5 118 40 87.09 20 0 0 0 2

Let’s summarize our dataset:

* save list of continuous variables and summarize;
%let contvars = age wt71 smokeintensity sbp;
proc means data=nhefs4;
    var &contvars;
run;
Variable N Mean Std Dev Minimum Maximum
age
wt71
smokeintensity
sbp
1547
1547
1547
1547
43.6528765
70.9031157
20.5416936
128.7039431
12.0298947
15.3891998
11.7258480
19.0608817
25.0000000
39.5800000
1.0000000
87.0000000
74.0000000
151.7300000
80.0000000
229.0000000

* save list of categorical variables and summarize;
%let catvars = asthma allergies alcfreqcat weakheart;
proc freq data=nhefs4;
    tables &catvars;
run;
asthma Frequency Percent Cumulative
Frequency
Cumulative
Percent
0 1474 95.28 1474 95.28
1 73 4.72 1547 100.00

allergies Frequency Percent Cumulative
Frequency
Cumulative
Percent
0 1448 93.60 1448 93.60
1 99 6.40 1547 100.00

alcfreqcat Frequency Percent Cumulative
Frequency
Cumulative
Percent
0 320 20.69 320 20.69
1 217 14.03 537 34.71
2 489 31.61 1026 66.32
3 329 21.27 1355 87.59
4 192 12.41 1547 100.00

weakheart Frequency Percent Cumulative
Frequency
Cumulative
Percent
0 1512 97.74 1512 97.74
1 35 2.26 1547 100.00

7.3 Tabular Analysis

Soon you will be no stranger to two-by-two tables and contingency tables. Introductory statistics and epidemiologic methods courses will require you to analyze such tables, either to measure the strength of an exposure’s effect on some outcome or to establish a statistical association between two variables.

Going back to our NHEFS data, let’s say we’re interested in a possible association between quitting smoking and weight loss. The qsmk variable is a binary indicator of whether or not an individual quit smoking between 1971 and 1982, while the sbp variable measures an individual’s change in weight (kilograms) between 1971 and 1982.

Typically, we would not want to dichotomize a continuous variable like sbp, but for the sake of this example, let’s begin by creating a binary indicator of whether a subject gained weight during this timeframe.

data nhefs;
    set nhefs;
    if sbp > 140 then sbp_hi = 1;
    else if sbp <= 140 then sbp_hi = 0;
    if sbp = . then sbp_hi = .;
run;

proc freq data=nhefs;
    table sbp_hi;
run;
sbp_hi Frequency Percent Cumulative
Frequency
Cumulative
Percent
0 1192 76.80 1192 76.80
1 360 23.20 1552 100.00
Frequency Missing = 77

The summary statistics we generated above should comply with our expectations: those we marked as having a high systolic blood pressure have values of sbp above 140 mm/Hg, while those we marked as not having high systolic blood pressure all have values of sbp less than or equal to 140 mm/Hg.

Note, too, that we have 63 missing values in our new variable, just as we saw in the original outcome.

proc freq data=nhefs;
    table sbp_hi * qsmk /chisq;
run;
Frequency
Percent
Row Pct
Col Pct
Table of sbp_hi by qsmk
sbp_hi qsmk
0 1 Total
0
908
58.51
76.17
78.34
284
18.30
23.83
72.26
1192
76.80
1
251
16.17
69.72
21.66
109
7.02
30.28
27.74
360
23.20
Total
1159
74.68
393
25.32
1552
100.00
Frequency Missing = 77


Statistics for Table of sbp_hi by qsmk

Statistic DF Value Prob
Chi-Square 1 6.0872 0.0136
Likelihood Ratio Chi-Square 1 5.9256 0.0149
Continuity Adj. Chi-Square 1 5.7508 0.0165
Mantel-Haenszel Chi-Square 1 6.0833 0.0136
Phi Coefficient 0.0626
Contingency Coefficient 0.0625
Cramer's V 0.0626

Fisher's Exact Test
Cell (1,1) Frequency (F) 908
Left-sided Pr <= F 0.9939
Right-sided Pr >= F 0.0088
Table Probability (P) 0.0028
Two-sided Pr <= P 0.0155


Sample Size = 1552
Frequency Missing = 77

A natural choice here would be to conduct a two-sample test for the equality of proportions, where we compare the proportion of subjects in the quit smoking group with high systolic blood pressure versus the same proportion in the non-quit smoking group.

The output above shows us the group proportions, the \(\chi^2\) test statistic (elicited by specifying the table option chisq), and the corresponding p-value. We also get a 95% confidence interval for the difference in proportions.

7.4 Regression Models

In this section we will fit some regression models that you may encounter in classes or in your own research. We will leave the theoretical background to your statistics and methods courses but aim to provide enough information so that you understand the basic idea behind each approach and can refer back to this page in the future.

All upcoming examples use variables from the NHEFS dataset that we’ve referred back to several times by now.

7.4.1 Linear regression

Exposure of interest: quitting smoking between 1971 and 1982 (qsmk)
Outcome: change in weight between 1971 and 1982 (continuous, sbp)

First, we can run an unadjusted model with weight change as the outcome and quitting smoking as the predictor variable.

ods graphics off;
proc reg data=nhefs;
    model sbp = qsmk /clb; * /clb adds confidence limits to output table;
quit;
Model: MODEL1
Dependent Variable: sbp

Number of Observations Read 1629
Number of Observations Used 1552
Number of Observations with Missing Values 77

Analysis of Variance
Source DF Sum of
Squares
Mean
Square
F Value Pr > F
Model 1 4673.90689 4673.90689 12.98 0.0003
Error 1550 558280 360.18067
Corrected Total 1551 562954

Root MSE 18.97843 R-Square 0.0083
Dependent Mean 128.70941 Adj R-Sq 0.0077
Coeff Var 14.74517

Parameter Estimates
Variable DF Parameter
Estimate
Standard
Error
t Value Pr > |t| 95% Confidence Limits
Intercept 1 127.69888 0.55747 229.07 <.0001 126.60541 128.79235
qsmk 1 3.99069 1.10782 3.60 0.0003 1.81771 6.16367

SAS’s PROC REG does not handle product terms or categorical variables within the model statement itself. Let’s say we wanted to fit an adjusted model and include a product term to test an “interaction” between sex and age. First, we have to add the interaction variable to our nhefs dataset, as shown below, and include the new variable the model formula.

The code below fits a linear model for the effect of qsmk on sbp adjusted for years of smoking (smokeyrs), intensity of smoking (smokeintensity), diabetes, sex, age, and the “interaction” between sex and age (sex_age).

data nhefs;
    set nhefs;
    sex_age = sex * age;
run;

ods graphics off;
proc reg data=nhefs;
    model sbp = qsmk smokeyrs smokeintensity diabetes sex age sex_age /clb;
quit;
Model: MODEL1
Dependent Variable: sbp

Number of Observations Read 1629
Number of Observations Used 1552
Number of Observations with Missing Values 77

Analysis of Variance
Source DF Sum of
Squares
Mean
Square
F Value Pr > F
Model 7 98121 14017 46.56 <.0001
Error 1544 464833 301.05758
Corrected Total 1551 562954

Root MSE 17.35101 R-Square 0.1743
Dependent Mean 128.70941 Adj R-Sq 0.1706
Coeff Var 13.48076

Parameter Estimates
Variable DF Parameter
Estimate
Standard
Error
t Value Pr > |t| 95% Confidence Limits
Intercept 1 113.85479 2.95509 38.53 <.0001 108.05838 119.65120
qsmk 1 1.51803 1.03207 1.47 0.1415 -0.50638 3.54245
smokeyrs 1 0.06761 0.08043 0.84 0.4007 -0.09016 0.22537
smokeintensity 1 -0.06613 0.03933 -1.68 0.0929 -0.14328 0.01102
diabetes 1 -0.62827 0.44678 -1.41 0.1599 -1.50463 0.24809
sex 1 -16.63281 3.36583 -4.94 <.0001 -23.23489 -10.03073
age 1 0.39023 0.09439 4.13 <.0001 0.20507 0.57538
sex_age 1 0.28503 0.07558 3.77 0.0002 0.13678 0.43329

SAS offers two additional ways to fit a linear regression, both of which handle categorical variables and interaction terms directly, unlike PROC REG. These options are PROC GLM and PROC GENMOD. We’ll focus on PROC GENMOD here.

proc genmod data=nhefs;
    model sbp = qsmk /link=identity dist=normal;
run;
Model Information
Data Set WORK.NHEFS
Distribution Normal
Link Function Identity
Dependent Variable sbp

Number of Observations Read 1629
Number of Observations Used 1552
Missing Values 77

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 1550 558280.0358 360.1807
Scaled Deviance 1550 1552.0000 1.0013
Pearson Chi-Square 1550 558280.0358 360.1807
Scaled Pearson X2 1550 1552.0000 1.0013
Log Likelihood -6769.1980
Full Log Likelihood -6769.1980
AIC (smaller is better) 13544.3961
AICC (smaller is better) 13544.4116
BIC (smaller is better) 13560.4380

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates
Parameter DF Estimate Standard
Error
Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 127.6989 0.5571 126.6070 128.7908 52540.9 <.0001
qsmk 1 3.9907 1.1071 1.8208 6.1606 12.99 0.0003
Scale 1 18.9662 0.3404 18.3106 19.6453

Note: The scale parameter was estimated by maximum likelihood.


proc genmod data=nhefs;
    model sbp = qsmk smokeyrs smokeintensity diabetes sex:age /link=identity dist=normal;
run;
Model Information
Data Set WORK.NHEFS
Distribution Normal
Link Function Identity
Dependent Variable sbp

Number of Observations Read 1629
Number of Observations Used 1552
Missing Values 77

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 1544 464832.8969 301.0576
Scaled Deviance 1544 1552.0000 1.0052
Pearson Chi-Square 1544 464832.8969 301.0576
Scaled Pearson X2 1544 1552.0000 1.0052
Log Likelihood -6627.0482
Full Log Likelihood -6627.0482
AIC (smaller is better) 13272.0965
AICC (smaller is better) 13272.2132
BIC (smaller is better) 13320.2222

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates
Parameter DF Estimate Standard
Error
Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 113.8548 2.9475 108.0779 119.6317 1492.13 <.0001
qsmk 1 1.5180 1.0294 -0.4996 3.5356 2.17 0.1403
smokeyrs 1 0.0676 0.0802 -0.0896 0.2248 0.71 0.3994
smokeintensity 1 -0.0661 0.0392 -0.1430 0.0108 2.84 0.0919
diabetes 1 -0.6283 0.4456 -1.5017 0.2451 1.99 0.1586
sex 1 -16.6328 3.3571 -23.2127 -10.0529 24.55 <.0001
sex_age 1 0.2850 0.0754 0.1373 0.4328 14.29 0.0002
age 1 0.3902 0.0942 0.2057 0.5748 17.18 <.0001
Scale 1 17.3062 0.3106 16.7080 17.9259

Note: The scale parameter was estimated by maximum likelihood.


7.4.2 Logistic regression

Let’s revisit our tabular analysis, in which we were interested in whether quitting smoking affected the likelihood of a subject’s gaining weight between 1971 and 1982.

We can answer the same question using a logistic regression model, using the glm() function.

proc logistic data=nhefs;
    model sbp_hi(event='1') = qsmk;
    estimate 'qsmk (1 vs. 0)' qsmk 1 /exp;
run;
Model Information
Data Set WORK.NHEFS
Response Variable sbp_hi
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring

Number of Observations Read 1629
Number of Observations Used 1552

Response Profile
Ordered
Value
sbp_hi Total
Frequency
1 0 1192
2 1 360

Probability modeled is sbp_hi=1.



Note: 77 observations were deleted due to missing values for the response or explanatory variables.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion Intercept Only Intercept and
Covariates
AIC 1683.227 1679.301
SC 1688.574 1689.996
-2 Log L 1681.227 1675.301

Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 5.9256 1 0.0149
Score 6.0872 1 0.0136
Wald 6.0586 1 0.0138

Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard
Error
Wald
Chi-Square
Pr > ChiSq
Intercept 1 -1.2858 0.0713 325.0998 <.0001
qsmk 1 0.3282 0.1333 6.0586 0.0138

Odds Ratio Estimates
Effect Point Estimate 95% Wald
Confidence Limits
qsmk 1.388 1.069 1.803

Association of Predicted Probabilities and
Observed Responses
Percent Concordant 23.1 Somers' D 0.065
Percent Discordant 16.6 Gamma 0.163
Percent Tied 60.3 Tau-a 0.023
Pairs 429120 c 0.532

Estimate
Label Estimate Standard Error z Value Pr > |z| Exponentiated
qsmk (1 vs. 0) 0.3282 0.1333 2.46 0.0138 1.3885

proc genmod data=nhefs;
    model sbp_hi(ref='0') = qsmk /link=logit dist=bin;
    estimate 'qsmk (1 vs. 0)' qsmk 1 /exp;
run;
Model Information
Data Set WORK.NHEFS
Distribution Binomial
Link Function Logit
Dependent Variable sbp_hi

Number of Observations Read 1629
Number of Observations Used 1552
Number of Events 360
Number of Trials 1552
Missing Values 77

Response Profile
Ordered
Value
sbp_hi Total
Frequency
1 1 360
2 0 1192


PROC GENMOD is modeling the probability that sbp_hi=‘1’. One way to change this to model the probability that sbp_hi=‘0’ is to specify the DESCENDING option in the PROC statement.

Parameter Information
Parameter Effect
Prm1 Intercept
Prm2 qsmk

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Log Likelihood -837.6506
Full Log Likelihood -837.6506
AIC (smaller is better) 1679.3012
AICC (smaller is better) 1679.3089
BIC (smaller is better) 1689.9958

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates
Parameter DF Estimate Standard
Error
Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 -1.2858 0.0713 -1.4256 -1.1460 325.10 <.0001
qsmk 1 0.3282 0.1333 0.0668 0.5895 6.06 0.0139
Scale 0 1.0000 0.0000 1.0000 1.0000

Note: The scale parameter was held fixed.


Contrast Estimate Results
Label Mean Estimate Mean L'Beta Estimate Standard
Error
Alpha L'Beta Chi-Square Pr > ChiSq
Confidence Limits Confidence Limits
qsmk (1 vs. 0) 0.5813 0.5167 0.6433 0.3282 0.1333 0.05 0.0668 0.5895 6.06 0.0139
Exp(qsmk (1 vs. 0)) 1.3884 0.1851 0.05 1.0691 1.8031

As in the linear regression example, you could run a model adjusting for other factors, in which case you would modify the model formula accordingly. We will forego running an adjusted model here to avoid redundancy.

7.4.3 Log-binomial regression

While the logistic regression remains the most frequently used generalized linear model (GLM) for binary outcomes, epidemiologists are often interested in risk differences and risk ratios.

Luckily, we can use the log-binomial model to estimate contrasts in risks.

Try this Calculate the risk ratio manually and convince yourself that the log-binomial model gave you the correct answer.

proc genmod data=nhefs;
    model sbp_hi(ref='0') = qsmk /link=log dist=bin;
    estimate 'qsmk (1 vs. 0)' qsmk 1 /exp;
run;
Model Information
Data Set WORK.NHEFS
Distribution Binomial
Link Function Log
Dependent Variable sbp_hi

Number of Observations Read 1629
Number of Observations Used 1552
Number of Events 360
Number of Trials 1552
Missing Values 77

Response Profile
Ordered
Value
sbp_hi Total
Frequency
1 1 360
2 0 1192


PROC GENMOD is modeling the probability that sbp_hi=‘1’. One way to change this to model the probability that sbp_hi=‘0’ is to specify the DESCENDING option in the PROC statement.

Parameter Information
Parameter Effect
Prm1 Intercept
Prm2 qsmk

Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Log Likelihood -837.6506
Full Log Likelihood -837.6506
AIC (smaller is better) 1679.3012
AICC (smaller is better) 1679.3089
BIC (smaller is better) 1689.9958

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates
Parameter DF Estimate Standard
Error
Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 -1.5299 0.0559 -1.6394 -1.4204 749.85 <.0001
qsmk 1 0.2474 0.0987 0.0539 0.4409 6.28 0.0122
Scale 0 1.0000 0.0000 1.0000 1.0000

Note: The scale parameter was held fixed.


Contrast Estimate Results
Label Mean Estimate Mean L'Beta Estimate Standard
Error
Alpha L'Beta Chi-Square Pr > ChiSq
Confidence Limits Confidence Limits
qsmk (1 vs. 0) 1.2807 1.0553 1.5542 0.2474 0.0987 0.05 0.0539 0.4409 6.28 0.0122
Exp(qsmk (1 vs. 0)) 1.2807 0.1265 0.05 1.0553 1.5542

To get the risk ratio for the effect of quitting smoking on weight gain, we again exponentiate the beta coefficient of interest: those who quit smoking were 1.28 times as likely to have high systolic blood pressure as those who did not quit smoking.

7.4.4 Modified Poisson regression

Another way to get a risk ratio using GLMs involves a procedure called modified Poisson regression. “Modified” here refers to a necessary modification of the standard error estimates when using a Poisson regression model to estimate a risk ratio. Again, save the statistical details for later. All we want to know for the moment are the nuts-and-bolts: how to fit the model and how to modify the standard errors.

proc genmod data=nhefs;
    class seqn;
    model sbp_hi(ref='0') = qsmk /link=log dist=poisson;
    repeated subject=seqn;
    estimate 'qsmk (1 vs. 0)' qsmk 1 /exp;
run;
Model Information
Data Set WORK.NHEFS
Distribution Poisson
Link Function Log
Dependent Variable sbp_hi

Number of Observations Read 1629
Number of Observations Used 1552
Missing Values 77

Class Level Information
Class Levels Values
seqn 1629 233 235 244 245 252 257 262 266 419 420 428 431 434 443 446 455 457 596 603 604 605 616 618 619 620 804 806 813 816 818 825 828 831 1094 1096 1101 1103 1104 1106 1109 1116 1120 1124 1126 1127 1129 1133 1135 1148 1476 1480 1498 1505 1513 1515 1519 1523 …

Parameter Information
Parameter Effect
Prm1 Intercept
Prm2 qsmk

Algorithm converged.

GEE Model Information
Correlation Structure Independent
Subject Effect seqn (1629 levels)
Number of Clusters 1629
Clusters With Missing Values 77
Correlation Matrix Dimension 1
Maximum Cluster Size 1
Minimum Cluster Size 0

Algorithm converged.

GEE Fit Criteria
QIC 2302.3462
QICu 2302.4294

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard
Error
95% Confidence Limits Z Pr > |Z|
Intercept -1.5299 0.0559 -1.6394 -1.4204 -27.38 <.0001
qsmk 0.2474 0.0987 0.0539 0.4409 2.51 0.0122

Contrast Estimate Results
Label Mean Estimate Mean L'Beta Estimate Standard
Error
Alpha L'Beta Chi-Square Pr > ChiSq
Confidence Limits Confidence Limits
qsmk (1 vs. 0) 1.2807 1.0553 1.5542 0.2474 0.0987 0.05 0.0539 0.4409 6.28 0.0122
Exp(qsmk (1 vs. 0)) 1.2807 0.1265 0.05 1.0553 1.5542

In the prior chapter, which showcased the log-binomial and modified Poisson model fitting procedures in R, we noted that the “modified” in modified Poisson refers to the adjusted of the standard error. In order to fit the modified Poisson model correctly in SAS, you must specify the repeated option as above, where the argument passed to subject is the unique observation identifier.