3. The Negative Binomial Regression Model (NBRM)

The SAS GENMOD procedure, STATA .nbreg command, and LIMDEP Negbin$ command estimate the negative binomial regression model (NBRM).


3.1 NBRM in SAS

The GENMOD procedure estimates the NBRM when the /DIST=NEGBIN option is specified. Note that the dispersion parameter reported by SAS is the same quantity as alpha in STATA and LIMDEP.

PROC GENMOD DATA = masil.accident;
   MODEL accident=emps strict /DIST=NEGBIN LINK=LOG;
RUN;

                                      The GENMOD Procedure
 
                                       Model Information
 
                            Data Set                    COUNT.WASTE
                            Distribution          Negative Binomial
                            Link Function                       Log
                            Dependent Variable             Accident
                            Observations Used                   778
 
 
                             Criteria For Assessing Goodness Of Fit
 
                  Criterion                 DF           Value        Value/DF
 
                  Deviance                 775        589.7752          0.7610
                  Scaled Deviance          775        589.7752          0.7610
                  Pearson Chi-Square       775        845.6033          1.0911
                  Scaled Pearson X2        775        845.6033          1.0911
                  Log Likelihood                       37.5628
 
 
          Algorithm converged.
 
                                Analysis Of Parameter Estimates
 
                                   Standard     Wald 95% Confidence       Chi-
   Parameter     DF    Estimate       Error           Limits            Square    Pr > ChiSq
 
   Intercept      1      0.3851      0.1278      0.1345      0.6357       9.07        0.0026
   Emps           1      0.0052      0.0023      0.0008      0.0096       5.29        0.0214
   Strict         1     -0.6703      0.1671     -0.9978     -0.3427      16.09        <.0001
   Dispersion     1      3.9554      0.3501      3.3254      4.7048
 
NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.

The restricted (intercept-only) model, estimated with the code below, produces a log likelihood of 28.8627. Thus, the likelihood ratio statistic for goodness-of-fit is 17.4002 = 2 * (37.5628 - 28.8627), with 2 degrees of freedom (p<.0002).

PROC GENMOD DATA = masil.accident;
   MODEL accident= /DIST=NEGBIN LINK=LOG;
RUN;

The likelihood ratio statistic for overdispersion, which compares the NBRM with the PRM, is 1409.5838 = 2 * (37.5628 - (-667.2291)), where -667.2291 is the log likelihood of the PRM.
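The two likelihood ratio statistics above are simple arithmetic on the reported log likelihoods. The following Python sketch reproduces them and computes the goodness-of-fit p-value. Python is not used elsewhere in this document, and the variable and function names are illustrative only. The p-value uses the closed form of the chi-squared upper tail with 2 degrees of freedom, exp(-x/2).

```python
import math

# Log likelihoods as reported in this section (SAS PROC GENMOD scale).
ll_full = 37.5628        # NBRM with emps and strict
ll_restricted = 28.8627  # NBRM with the intercept only
ll_poisson = -667.2291   # value used in the overdispersion comparison above

def lr_stat(ll_unrestricted, ll_constrained):
    """Likelihood ratio statistic: twice the gain in log likelihood."""
    return 2.0 * (ll_unrestricted - ll_constrained)

lr_fit = lr_stat(ll_full, ll_restricted)
lr_overdispersion = lr_stat(ll_full, ll_poisson)

# Two slopes are restricted to zero, so the reference distribution has
# 2 degrees of freedom; its upper tail is exactly exp(-x / 2).
p_fit = math.exp(-lr_fit / 2.0)

print(f"LR goodness-of-fit = {lr_fit:.4f}, p = {p_fit:.8f}")
print(f"LR overdispersion  = {lr_overdispersion:.4f}")
```

Only differences of log likelihoods enter the statistic, so any constant that a package adds to or drops from every log likelihood cancels out.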


3.2 NBRM in STATA

STATA has the .nbreg command for the NBRM. The command reports three log likelihood statistics: for the PRM, restricted NBRM (constant-only model), and unrestricted NBRM (full model), which make it easy to conduct likelihood ratio tests.

. nbreg accident emps strict

Fitting comparison Poisson model:
 
Iteration 0:   log likelihood = -1821.5112 
Iteration 1:   log likelihood = -1821.5101 
Iteration 2:   log likelihood = -1821.5101 
 
Fitting constant-only model:
 
Iteration 0:   log likelihood = -1256.6761 
Iteration 1:   log likelihood = -1152.6155 
Iteration 2:   log likelihood = -1125.6643 
Iteration 3:   log likelihood = -1125.4183 
Iteration 4:   log likelihood = -1125.4183 
 
Fitting full model:
 
Iteration 0:   log likelihood = -1117.1731 
Iteration 1:   log likelihood = -1116.7201 
Iteration 2:   log likelihood = -1116.7182 
Iteration 3:   log likelihood = -1116.7182 
 
Negative binomial regression                      Number of obs   =        778
                                                  LR chi2(2)      =      17.40
                                                  Prob > chi2     =     0.0002
Log likelihood = -1116.7182                       Pseudo R2       =     0.0077
 
------------------------------------------------------------------------------
    accident |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        emps |   .0051981   .0022595     2.30   0.021     .0007694    .0096267
      strict |  -.6702548   .1671191    -4.01   0.000    -.9978021   -.3427074
       _cons |   .3851111   .1278468     3.01   0.003      .134536    .6356861
-------------+----------------------------------------------------------------
    /lnalpha |    1.37509   .0885176                      1.201599    1.548582
-------------+----------------------------------------------------------------
       alpha |   3.955434   .3501257                       3.32543    4.704793
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1409.58 Prob>=chibar2 = 0.000
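The /lnalpha and alpha rows of the output above are two views of the same estimate: alpha is the exponential of /lnalpha. The following Python sketch (illustrative names, standard library only) recovers the alpha row from the /lnalpha row, assuming the standard error of alpha comes from the usual delta-method approximation and the confidence limits from exponentiating the /lnalpha limits.

```python
import math

# Values copied from the /lnalpha row of the nbreg output above.
lnalpha = 1.37509
se_lnalpha = 0.0885176
ci_lnalpha = (1.201599, 1.548582)

# Point estimate: alpha = exp(ln(alpha)).
alpha = math.exp(lnalpha)

# Delta method (assumed here): se(exp(t)) is approximately exp(t) * se(t).
se_alpha = alpha * se_lnalpha

# Exponentiating the endpoints keeps the interval positive, which is why
# the limits are not symmetric around alpha.
ci_alpha = tuple(math.exp(bound) for bound in ci_lnalpha)

print(f"alpha = {alpha:.6f}, se = {se_alpha:.6f}")
print(f"95% CI = ({ci_alpha[0]:.5f}, {ci_alpha[1]:.5f})")
```

The same limits appear in the Dispersion row of the SAS output in section 3.1.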

The restricted model, or “constant-only model,” gives a log likelihood of -1125.4183. Thus, the likelihood ratio statistic for goodness-of-fit is 17.4002 = 2 * [-1116.7182 - (-1125.4183)] (p<.0002). The p-value is computed as follows (.disp and .di are abbreviations of .display).

. disp chi2tail(2, 17.4002)
.00016657

The likelihood ratio test for overdispersion yields a chi-squared statistic of 1409.5838 (p<.0001) and rejects the null hypothesis of alpha=0. This statistically significant evidence of overdispersion indicates that the NBRM is preferred to the PRM.

. di 2 * (-1116.7182 - (-1821.5101))
1409.5838

The p-value of the likelihood ratio statistic for overdispersion is computed as follows:

. di chi2tail(1, 1409.5838)
1.74e-308
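Because alpha=0 lies on the boundary of the parameter space, STATA labels the test chibar2(01): the reference distribution is a 50:50 mixture of a point mass at zero and a chi-squared with 1 degree of freedom, so its p-value is half of the chi2tail(1, x) value. The following Python sketch (illustrative function names, standard library only) shows both versions, using the identity that the chi-squared(1) upper tail equals erfc(sqrt(x/2)).

```python
import math

def chi2_tail_df1(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chibar2_01_tail(x):
    """Upper tail of a 50:50 mixture of a point mass at 0 and chi-squared(1), for x > 0."""
    return 0.5 * chi2_tail_df1(x)

# Likelihood ratio statistic: NBRM versus the comparison Poisson model.
lr = 2 * (-1116.7182 - (-1821.5101))

p_chi2 = chi2_tail_df1(lr)        # counterpart of chi2tail(1, lr)
p_boundary = chibar2_01_tail(lr)  # boundary-adjusted version

print(f"LR = {lr:.4f}")
print(f"chi2(1) tail = {p_chi2:.3g}, chibar2(01) tail = {p_boundary:.3g}")
```

Probabilities this small sit at the edge of double-precision arithmetic, so the printed digits may vary across platforms; the conclusion that alpha=0 is rejected is the same under either version.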

Now, let us calculate marginal effects (or changes) at the means of the independent variables. For the binary variable strict, read the discrete change labeled “0->1”, since its marginal change at the mean (.5077) is meaningless.

. prchange

nbreg: Changes in Predicted Rate for accident
 
        min->max      0->1     -+1/2    -+sd/2  MargEfct
  emps    1.5326    0.0055    0.0068    0.2585    0.0068
strict   -0.8931   -0.8931   -0.8885   -0.4383   -0.8721
 
exp(xb):   1.3011
 
           emps   strict
    x=  42.0129  .507712
sd(x)=  38.1548  .500262
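The MargEfct and 0->1 columns above can be checked by hand, since the NBRM predicted rate is exp(xb). The following Python sketch (illustrative names, standard library only) plugs in the nbreg coefficients and the sample means shown above, assuming the marginal effect is the derivative b_k * exp(xb) at the means and the discrete change holds emps at its mean.

```python
import math

# NBRM estimates and sample means from the nbreg and prchange output above.
b = {"_cons": 0.3851111, "emps": 0.0051981, "strict": -0.6702548}
means = {"emps": 42.0129, "strict": 0.507712}

def rate(emps, strict):
    """Predicted rate exp(xb) for the given covariate values."""
    return math.exp(b["_cons"] + b["emps"] * emps + b["strict"] * strict)

# Predicted rate with both variables at their means (exp(xb) in the output).
mu_bar = rate(means["emps"], means["strict"])

# Marginal effect at the means: d exp(xb) / d x_k = b_k * exp(xb).
marginal = {name: b[name] * mu_bar for name in ("emps", "strict")}

# Discrete change as strict moves from 0 to 1, with emps held at its mean.
change_strict = rate(means["emps"], 1) - rate(means["emps"], 0)

print(f"exp(xb) at means     = {mu_bar:.4f}")
print(f"marginal effect emps = {marginal['emps']:.4f}")
print(f"marginal effect strict = {marginal['strict']:.4f}")
print(f"0->1 change in strict  = {change_strict:.4f}")
```

The derivative for strict treats a binary variable as continuous at .5077, a value no observation takes, which is why the 0->1 change is the better summary.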



3.3 NBRM in LIMDEP

LIMDEP has the Negbin$ command for the NBRM, which reports the PRM as well. Note that the standard errors of the parameter estimates differ slightly from those of SAS and STATA. The Marginal Effects$ and Means$ subcommands compute marginal effects at the means of the independent variables; the Means$ subcommand may not be omitted.

NEGBIN;
   Lhs=ACCIDENT;
   Rhs=ONE,EMPS,STRICT;
   Marginal Effects;
   Means$

+---------------------------------------------+
| Poisson Regression                          |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 08, 2005 at 09:35:36AM.|
| Dependent variable             ACCIDENT     |
| Weighting variable                 None     |
| Number of observations              778     |
| Iterations completed                  8     |
| Log likelihood function       -1821.510     |
| Restricted log likelihood     -1883.921     |
| Chi squared                    124.8218     |
| Degrees of freedom                    2     |
| Prob[ChiSqd > value] =         .0000000     |
| Chi- squared =  4944.94781  RsqP=  -.0051   |
| G  - squared =  2827.20794  RsqD=   .0423   |
| Overdispersion tests: g=mu(i)  :  4.720     |
| Overdispersion tests: g=mu(i)^2:  4.253     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     .3900961420   .46678663E-01    8.357   .0000
 EMPS      .5418599057E-02  .74341923E-03    7.289   .0000     42.012853
 STRICT      -.7041663804   .66761926E-01  -10.547   .0000     .50771208
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
 
Normal exit from iterations. Exit status=0.
 
+---------------------------------------------+
| Negative Binomial Regression                |
| Maximum Likelihood Estimates                |
| Model estimated: Sep 08, 2005 at 09:35:36AM.|
| Dependent variable             ACCIDENT     |
| Weighting variable                 None     |
| Number of observations              778     |
| Iterations completed                  8     |
| Log likelihood function       -1116.718     |
| Restricted log likelihood     -1821.510     |
| Chi squared                    1409.584     |
| Degrees of freedom                    1     |
| Prob[ChiSqd > value] =         .0000000     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     .3851110699       .12855240    2.996   .0027
 EMPS      .5198057234E-02  .22602075E-02    2.300   .0215     42.012853
 STRICT      -.6702547660       .16729839   -4.006   .0001     .50771208
          Dispersion parameter for count data model
 Alpha        3.955434012       .35680876   11.086   .0000
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)
 
 
+-------------------------------------------+
| Partial derivatives of expected val. with |
| respect to the vector of characteristics. |
| They are computed at the means of the Xs. |
| Observations used for means are All Obs.  |
| Conditional Mean at Sample Point   1.3011 |
| Scale Factor for Marginal Effects  1.3011 |
+-------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Constant     .5010628939       .19396434    2.583   .0098
 EMPS      .6763123170E-02  .29746591E-02    2.274   .0230     42.012853
 STRICT      -.8720595665       .22469308   -3.881   .0001     .50771208
 (Note: E+nn or E-nn means multiply by 10 to + or -nn power.)

The coefficients for EMPS and STRICT in the last table (.0068 and -.8721) are identical to the corresponding marginal effects calculated in STATA.

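SAS, STATA, and LIMDEP produce almost the same parameter estimates and goodness-of-fit statistics (Table 4). Note that SAS reports log likelihoods on a different scale (37.5628 versus -1116.7182 for the full model), but the differences between models, and therefore the likelihood ratios, are the same.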

Figure 3 compares the PRM and NBRM. Look at the predictions for zero counts of the two models. As the likelihood ratio test indicates, the NBRM seems to fit these data better than the PRM.

Figure 3. Comparison of the Poisson and Negative Binomial Regression Models
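A rough version of the zero-count comparison can be sketched from the LIMDEP estimates in section 3.3. The following Python code is an illustration under stated assumptions, not the computation behind Figure 3: it evaluates each model only at the means of the independent variables, and it assumes the NB2 parameterization, in which the variance is mu + alpha * mu^2 and the probability of a zero count is (1 + alpha * mu)^(-1/alpha). All names are illustrative.

```python
import math

# Estimates copied from the LIMDEP output in section 3.3.
alpha = 3.955434012
means = (1.0, 42.012853, 0.50771208)  # constant, EMPS, STRICT
b_poisson = (0.3900961420, 0.005418599057, -0.7041663804)
b_negbin = (0.3851110699, 0.005198057234, -0.6702547660)

def mean_rate(coefs):
    """Predicted rate exp(xb) with all variables at their sample means."""
    return math.exp(sum(c * x for c, x in zip(coefs, means)))

mu_poisson = mean_rate(b_poisson)
mu_negbin = mean_rate(b_negbin)

# Probability of observing a zero count under each model.
p0_poisson = math.exp(-mu_poisson)
p0_negbin = (1.0 + alpha * mu_negbin) ** (-1.0 / alpha)

# NB2 variance (assumed parameterization); the Poisson variance equals its mean.
var_negbin = mu_negbin + alpha * mu_negbin ** 2

print(f"Poisson: mean = {mu_poisson:.4f}, P(y=0) = {p0_poisson:.4f}")
print(f"NBRM:    mean = {mu_negbin:.4f}, P(y=0) = {p0_negbin:.4f}, variance = {var_negbin:.4f}")
```

With nearly the same mean, the NBRM assigns a much larger probability to a zero count than the PRM does, which is the pattern the paragraph above asks the reader to look for.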



