Regression Models for Categorical Dependent Variables using Stata

J. Scott Long, Jeremy Freese


Table of Contents

    Preface

    Part I General Information

    1 Introduction

    1.1 What is this book about?
    1.2 Which models are considered?
    1.3 Who is this book for?
    1.4 How is the book organized?
    1.5 What software do you need?
    1.5.1 Updating Stata 8
    1.5.2 Installing SPost
    Installing SPost using net search
    Installing SPost using net install
    1.5.3 What if commands do not work?
    1.5.4 Uninstalling SPost
    1.5.5 Additional files available on the web site
    1.6 Where can I learn more about the models?

    2 Introduction to Stata
    2.1 The Stata interface
    Changing the scrollback buffer size
    Changing the display of variable names in the Variables window
    2.2 Abbreviations
    2.3 How to get help
    2.3.1 Online help
    2.3.2 Manuals
    2.3.3 Other resources
    2.4 The working directory
    2.5 Stata file types
    2.6 Saving output to log files
    Options
    2.6.1 Closing a log file
    2.6.2 Viewing a log file
    2.6.3 Converting from SMCL to plain text or PostScript
    2.7 Using and saving datasets
    2.7.1 Data in Stata format
    2.7.2 Data in other formats
    2.7.3 Entering data by hand
    2.8 Size limitations on datasets
    2.9 do-files
    2.9.1 Adding comments
    2.9.2 Long lines
    2.9.3 Stopping a do-file while it is running
    2.9.4 Creating do-files
    Using Stata's do-file editor
    Using other editors to create do-files
    2.9.5 A recommended structure for do-files
    2.10 Using Stata for serious data analysis
    2.11 The syntax of Stata commands
    2.11.1 Commands
    2.11.2 Variable lists
    2.11.3 if and in qualifiers
    Examples of if qualifier
    2.11.4 Options
    2.12 Managing data
    2.12.1 Looking at your data
    2.12.2 Getting information about variables
    2.12.3 Missing values
    2.12.4 Selecting observations
    2.12.5 Selecting variables
    2.13 Creating new variables
    2.13.1 generate command
    2.13.2 replace command
    2.13.3 recode command
    2.13.4 Common transformations for RHS variables
    Breaking a categorical variable into a set of binary variables
    More examples of creating binary variables
    Nonlinear transformations
    Interaction terms
    2.14 Labeling variables and values
    2.14.1 Variable labels
    2.14.2 Value labels
    2.14.3 notes command
    2.15 Global and local macros
    2.16 Graphics
    2.16.1 The graph command
    2.16.2 Displaying previously drawn graphs
    2.16.3 Printing graphs
    2.16.4 Combining graphs
    2.17 A brief tutorial
    A batch version

    3 Estimation, Testing, Fit, and Interpretation
    3.1 Estimation
    3.1.1 Stata's output for ML estimation
    3.1.2 ML and sample size
    3.1.3 Problems in obtaining ML estimates
    3.1.4 The syntax of estimation commands
    Variable lists
    Specifying the estimation sample
    Options
    3.1.5 Reading the output
    Header
    Estimates and standard errors
    3.1.6 Reformatting output with estimates table
    3.1.7 Reformatting output with outreg
    3.1.8 Alternative output with listcoef
    Options for types of coefficients
    Other options
    Standardized coefficients
    Factor and percent change
    3.1.9 Storing estimation results
    3.2 Post-estimation analysis
    3.3 Testing
    3.3.1 Wald tests
    The accumulate option
    3.3.2 LR tests
    Avoiding invalid LR tests
    3.4 Measures of fit
    Syntax of fitstat
    Options
    Models and measures
    Example of fitstat
    Methods and formulas for fitstat
    3.5 Interpretation
    3.5.1 Approaches to interpretation
    3.5.2 Predictions using predict
    3.5.3 Overview of prchange, prgen, prtab, and prvalue
    Specifying the levels of variables
    Options for controlling output
    3.5.4 Syntax for prchange
    Options
    3.5.5 Syntax for prgen
    Options
    Variables generated
    3.5.6 Syntax for prtab
    Options
    3.5.7 Syntax for prvalue
    Options
    3.5.8 Computing marginal effects using mfx compute
    3.6 Next steps

    Part II Models for Specific Kinds of Outcomes

    4 Models for Binary Outcomes
    4.1 The statistical model
    4.1.1 A latent variable model
    4.1.2 A nonlinear probability model
    4.2 Estimation using logit and probit
    Variable lists
    Specifying the estimation sample
    Weights
    Options
    Example
    4.2.1 Observations predicted perfectly
    4.3 Hypothesis testing with test and lrtest
    4.3.1 Testing individual coefficients
    One and two-tailed tests
    Testing single coefficients using test
    Testing single coefficients using lrtest
    4.3.2 Testing multiple coefficients
    Testing multiple coefficients using test
    Testing multiple coefficients using lrtest
    4.3.3 Comparing LR and Wald tests
    4.4 Residuals and influence using predict
    4.4.1 Residuals
    Example
    4.4.2 Influential cases
    4.5 Scalar measures of fit using fitstat
    Example
    4.6 Interpretation using predicted values
    4.6.1 Predicted probabilities with predict
    4.6.2 Individual predicted probabilities with prvalue
    4.6.3 Tables of predicted probabilities with prtab
    4.6.4 Graphing predicted probabilities with prgen
    4.6.5 Changes in predicted probabilities
    Marginal change
    Discrete change
    4.7 Interpretation using odds ratios with listcoef
    Multiplicative coefficients
    Effect of the base probability
    Percent change in the odds
    4.8 Other commands for binary outcomes

    5 Models for Ordinal Outcomes
    5.1 The statistical model
    5.1.1 A latent variable model
    5.1.2 A nonlinear probability model
    5.2 Estimation using ologit and oprobit
    Variable lists
    Specifying the estimation sample
    Weights
    Options
    5.2.1 Example of attitudes toward working mothers
    5.2.2 Predicting perfectly
    5.3 Hypothesis testing with test and lrtest
    5.3.1 Testing individual coefficients
    5.3.2 Testing multiple coefficients
    5.4 Scalar measures of fit using fitstat
    5.5 Converting to a different parameterization
    5.6 The parallel regression assumption
    5.7 Residuals and outliers using predict
    5.8 Interpretation
    5.8.1 Marginal change in y
    5.8.2 Predicted probabilities
    5.8.3 Predicted probabilities with predict
    5.8.4 Individual predicted probabilities with prvalue
    5.8.5 Tables of predicted probabilities with prtab
    5.8.6 Graphing predicted probabilities with prgen
    5.8.7 Changes in predicted probabilities
    Marginal change with prchange
    Marginal change with mfx compute
    Discrete change with prchange
    Computing discrete change for a 10-year increase in age
    5.8.8 Odds ratios using listcoef
    5.9 Less-common models for ordinal outcomes
    5.9.1 Generalized ordered logit model
    5.9.2 The stereotype model
    5.9.3 The continuation ratio model

    6 Models for Nominal Outcomes
    6.1 The multinomial logit model
    6.1.1 Formal statement of the model
    6.2 Estimation using mlogit
    Variable lists
    Specifying the estimation sample
    Weights
    Options
    6.2.1 Example of occupational attainment
    6.2.2 Using different base categories
    6.2.3 Predicting perfectly
    6.3 Hypothesis testing of coefficients
    6.3.1 mlogtest for tests of the MNLM
    Options
    6.3.2 Testing the effects of the independent variables
    A likelihood-ratio test
    A Wald test
    Testing multiple independent variables
    6.3.3 Tests for combining dependent categories
    A Wald test for combining outcomes
    Using test [category]
    An LR test for combining outcomes
    Using constraint with lrtest
    6.4 Independence of irrelevant alternatives
    Hausman test of IIA
    Small and Hsiao test of IIA
    Conclusions regarding tests of IIA
    6.5 Measures of fit
    6.6 Interpretation
    6.6.1 Predicted probabilities
    6.6.2 Predicted probabilities with predict
    Using predict to compare mlogit and ologit
    6.6.3 Individual predicted probabilities with prvalue
    6.6.4 Tables of predicted probabilities with prtab
    6.6.5 Graphing predicted probabilities with prgen
    Plotting probabilities for one outcome and two groups
    Graphing probabilities for all outcomes for one group
    6.6.6 Changes in predicted probabilities
    Computing marginal and discrete change with prchange
    Marginal change with mfx compute
    6.6.7 Plotting discrete changes with prchange and mlogview
    6.6.8 Odds ratios using listcoef and mlogview
    Listing odds ratios with listcoef
    Plotting odds ratios
    6.6.9 Using mlogplot
    6.6.10 Plotting estimates from matrices with mlogplot
    Options for using matrices with mlogplot
    Global macros and matrices used by mlogplot
    Example
    6.7 The conditional logit model
    6.7.1 Data arrangement for conditional logit
    6.7.2 Fitting the conditional logit model
    Options
    Example of the clogit model
    6.7.3 Interpreting results from clogit
    Using odds ratios
    Using predicted probabilities
    6.7.4 Fitting the multinomial logit model using clogit
    Setting up the data
    Creating interactions
    Fitting the model
    6.7.5 Using clogit to fit mixed models

    7 Models for Count Outcomes
    7.1 The Poisson distribution
    7.1.1 Fitting the Poisson distribution with the poisson command
    7.1.2 Computing predicted probabilities with prcounts
    Syntax
    Options
    Variables generated
    7.1.3 Comparing observed and predicted counts with prcounts
    7.2 The Poisson regression model
    7.2.1 Estimating the PRM with poisson
    Variable lists
    Specifying the estimation sample
    Weights
    Options
    7.2.2 Example of fitting the PRM
    7.2.3 Interpretation using the rate µ
    Factor change in E(y|x)
    Percent change in E(y|x)
    Example of factor and percent change
    Marginal change in E(y|x)
    Example of marginal change using prchange
    Example of marginal change using mfx compute
    Discrete change in E(y|x)
    Example of discrete change using prchange
    7.2.4 Interpretation using predicted probabilities
    Example of predicted probabilities using prvalue
    Example of predicted probabilities using prgen
    Example of predicted probabilities using prcounts
    7.2.5 Exposure time
    7.3 The negative binomial regression model
    7.3.1 Fitting the NBRM with nbreg
    7.3.2 Example of fitting the NBRM
    Comparing the PRM and NBRM using estimates table
    7.3.3 Testing for overdispersion
    7.3.4 Interpretation using the rate µ
    7.3.5 Interpretation using predicted probabilities
    7.4 Zero-inflated count models
    7.4.1 Estimation of zero-inflated models with zinb and zip
    Variable lists
    Options
    7.4.2 Example of fitting the ZIP and ZINB models
    7.4.3 Interpretation of coefficients
    7.4.4 Interpretation of predicted probabilities
    Predicted probabilities with prvalue
    Predicted probabilities with prgen
    7.5 Comparisons among count models
    7.5.1 Comparing mean probabilities
    7.5.2 Tests to compare count models
    LR tests of α
    Vuong test of non-nested models

    8 Additional Topics
    8.1 Ordinal and nominal independent variables
    8.1.1 Coding a categorical independent variable as a set of dummy variables
    8.1.2 Estimation and interpretation with categorical independent variables
    8.1.3 Tests with categorical independent variables
    Testing the effect of membership in one category versus the reference category
    Testing the effect of membership in two nonreference categories
    Testing that a categorical independent variable has no effect
    Testing whether treating an ordinal variable as interval loses information
    8.1.4 Discrete change for categorical independent variables
    Computing discrete change with prchange
    Computing discrete change with prvalue
    8.2 Interactions
    8.2.1 Computing gender differences in predictions with interactions
    8.2.2 Computing gender differences in discrete change with interactions
    8.3 Nonlinear nonlinear models
    8.3.1 Adding nonlinearities to linear predictors
    8.3.2 Discrete change in nonlinear nonlinear models
    8.4 Using praccum and forvalues to plot predictions
    Options
    8.4.1 Example using age and age-squared
    8.4.2 Using forvalues with praccum
    8.4.3 Using praccum for graphing a transformed variable
    8.4.4 Using praccum to graph interactions
    8.5 Extending SPost to other estimation commands
    8.6 Using Stata more efficiently
    8.6.1 profile.do
    8.6.2 Changing screen fonts and window preferences
    8.6.3 Using ado-files for changing directories
    8.6.4 me.hlp file
    8.6.5 Scrolling in the Results Window in Windows
    8.7 Conclusions

    A Syntax for SPost Commands
    A.1 brant
    Syntax
    Description
    Options
    Examples
    Saved results
    A.2 fitstat
    Syntax
    Description
    Options
    Examples
    Saved results
    A.3 listcoef
    Syntax
    Description
    Options
    Examples
    Saved results
    A.4 mlogplot
    Syntax
    Description
    Options
    Examples
    A.5 mlogtest
    Syntax
    Description
    Options
    Examples
    Saved results
    Acknowledgment
    A.6 mlogview
    Syntax
    Description
    Options
    Dialog box controls
    A.7 Overview of prchange, prgen, prtab, and prvalue
    Syntax
    Examples
    A.8 praccum
    Syntax
    Description
    Options
    Examples
    New variables generated
    A.9 prchange
    Syntax
    Description
    Options
    Examples
    A.10 prcounts
    Syntax
    Description
    Options
    New variables generated
    Examples
    A.11 prgen
    Syntax
    Description
    Options
    Examples
    New variables generated
    A.12 prtab
    Syntax
    Description
    Options
    Examples
    A.13 prvalue
    Syntax
    Description
    Options
    Examples
    Saved results

    B Description of Datasets
    B.1 binlfp2
    B.2 couart2
    B.3 gsskidvalue2
    B.4 nomocc2
    B.5 ordwarm2
    B.6 science2
    B.7 travel2

    Author index

    Subject index