Multivariable Model-Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables

Patrick Royston and Willi Sauerbrei

Table of Contents

Preface

1 Introduction

  • 1.1 Real-Life Problems as Motivation for Model Building
    • 1.1.1 Many Candidate Models
    • 1.1.2 Functional Form for Continuous Predictors
    • 1.1.3 Example 1: Continuous Response
    • 1.1.4 Example 2: Multivariable Model for Survival Data
  • 1.2 Issues in Modelling Continuous Predictors
    • 1.2.1 Effects of Assumptions
    • 1.2.2 Global versus Local Influence Models
    • 1.2.3 Disadvantages of Fractional Polynomial Modelling
    • 1.2.4 Controlling Model Complexity
  • 1.3 Types of Regression Model Considered
    • 1.3.1 Normal-Errors Regression
    • 1.3.2 Logistic Regression
    • 1.3.3 Cox Regression
    • 1.3.4 Generalized Linear Models
    • 1.3.5 Linear and Additive Predictors
  • 1.4 Role of Residuals
    • 1.4.1 Uses of Residuals
    • 1.4.2 Graphical Analysis of Residuals
  • 1.5 Role of Subject-Matter Knowledge in Model Development
  • 1.6 Scope of Model Building in our Book
  • 1.7 Modelling Preferences
    • 1.7.1 General Issues
    • 1.7.2 Criteria for a Good Model
    • 1.7.3 Personal Preferences
  • 1.8 General Notation

2 Selection of Variables

  • 2.1 Introduction
  • 2.2 Background
  • 2.3 Preliminaries for a Multivariable Analysis
  • 2.4 Aims of Multivariable Models
  • 2.5 Prediction: Summary Statistics and Comparisons
  • 2.6 Procedures for Selecting Variables
    • 2.6.1 Strength of Predictors
    • 2.6.2 Stepwise Procedures
    • 2.6.3 All-Subsets Model Selection Using Information Criteria
    • 2.6.4 Further Considerations
  • 2.7 Comparison of Selection Strategies in Examples
    • 2.7.1 Myeloma Study
    • 2.7.2 Educational Body-Fat Data
    • 2.7.3 Glioma Study
  • 2.8 Selection and Shrinkage
    • 2.8.1 Selection Bias
    • 2.8.2 Simulation Study
    • 2.8.3 Shrinkage to Correct for Selection Bias
    • 2.8.4 Post-estimation Shrinkage
    • 2.8.5 Reducing Selection Bias
    • 2.8.6 Example
  • 2.9 Discussion
    • 2.9.1 Model Building in Small Datasets
    • 2.9.2 Full, Pre-specified or Selected Model?
    • 2.9.3 Comparison of Selection Procedures
    • 2.9.4 Complexity, Stability and Interpretability
    • 2.9.5 Conclusions and Outlook

3 Handling Categorical and Continuous Predictors

  • 3.1 Introduction
  • 3.2 Types of Predictor
    • 3.2.1 Binary
    • 3.2.2 Nominal
    • 3.2.3 Ordinal, Counting, Continuous
    • 3.2.4 Derived
  • 3.3 Handling Ordinal Predictors
    • 3.3.1 Coding Schemes
    • 3.3.2 Effect of Coding Schemes on Variable Selection
  • 3.4 Handling Counting and Continuous Predictors: Categorization
    • 3.4.1 ‘Optimal’ Cutpoints: A Dangerous Analysis
    • 3.4.2 Other Ways of Choosing a Cutpoint
  • 3.5 Example: Issues in Model Building with Categorized Variables
    • 3.5.1 One Ordinal Variable
    • 3.5.2 Several Ordinal Variables
  • 3.6 Handling Counting and Continuous Predictors: Functional Form
    • 3.6.1 Beyond Linearity
    • 3.6.2 Does Nonlinearity Matter?
    • 3.6.3 Simple versus Complex Functions
    • 3.6.4 Interpretability and Transportability
  • 3.7 Empirical Curve Fitting
    • 3.7.1 General Approaches to Smoothing
    • 3.7.2 Critique of Local and Global Influence Models
  • 3.8 Discussion
    • 3.8.1 Sparse Categories
    • 3.8.2 Choice of Coding Scheme
    • 3.8.3 Categorizing Continuous Variables
    • 3.8.4 Handling Continuous Variables

4 Fractional Polynomials for One Variable

  • 4.1 Introduction
  • 4.2 Background
    • 4.2.1 Genesis
    • 4.2.2 Types of Model
    • 4.2.3 Relation to Box–Tidwell and Exponential Functions
  • 4.3 Definition and Notation
    • 4.3.1 Fractional Polynomials
    • 4.3.2 First Derivative
  • 4.4 Characteristics
    • 4.4.1 FP1 and FP2 Functions
    • 4.4.2 Maximum or Minimum of a FP2 Function
  • 4.5 Examples of Curve Shapes with FP1 and FP2 Functions
  • 4.6 Choice of Powers
  • 4.7 Choice of Origin
  • 4.8 Model Fitting and Estimation
  • 4.9 Inference
    • 4.9.1 Hypothesis Testing
    • 4.9.2 Interval Estimation
  • 4.10 Function Selection Procedure
    • 4.10.1 Choice of Default Function
    • 4.10.2 Closed Test Procedure for Function Selection
    • 4.10.3 Example
    • 4.10.4 Sequential Procedure
    • 4.10.5 Type I Error and Power of the Function Selection Procedure
  • 4.11 Scaling and Centering
    • 4.11.1 Computational Aspects
    • 4.11.2 Examples
  • 4.12 FP Powers as Approximations to Continuous Powers
    • 4.12.1 Box–Tidwell and Fractional Polynomial Models
    • 4.12.2 Example
  • 4.13 Presentation of Fractional Polynomial Functions
    • 4.13.1 Graphical
    • 4.13.2 Tabular
  • 4.14 Worked Example
    • 4.14.1 Details of all Fractional Polynomial Models
    • 4.14.2 Function Selection
    • 4.14.3 Details of the Fitted Model
    • 4.14.4 Standard Error of a Fitted Value
    • 4.14.5 Fitted Odds Ratio and its Confidence Interval
  • 4.15 Modelling Covariates with a Spike at Zero
  • 4.16 Power of Fractional Polynomial Analysis
    • 4.16.1 Underlying Function Linear
    • 4.16.2 Underlying Function FP1 or FP2
    • 4.16.3 Comment
  • 4.17 Discussion

5 Some Issues with Univariate Fractional Polynomial Models

  • 5.1 Introduction
  • 5.2 Susceptibility to Influential Covariate Observations
  • 5.3 A Diagnostic Plot for Influential Points in FP Models
    • 5.3.1 Example 1: Educational Body-Fat Data
    • 5.3.2 Example 2: Primary Biliary Cirrhosis Data
  • 5.4 Dependence on Choice of Origin
  • 5.5 Improving Robustness by Preliminary Transformation
    • 5.5.1 Example 1: Educational Body-Fat Data
    • 5.5.2 Example 2: PBC Data
    • 5.5.3 Practical Use of the Pre-transformation gδ(x)
  • 5.6 Improving Fit by Preliminary Transformation
    • 5.6.1 Lack of Fit of Fractional Polynomial Models
    • 5.6.2 Negative Exponential Pre-transformation
  • 5.7 Higher Order Fractional Polynomials
    • 5.7.1 Example 1: Nerve Conduction Data
    • 5.7.2 Example 2: Triceps Skinfold Thickness
  • 5.8 When Fractional Polynomial Models are Unsuitable
    • 5.8.1 Not all Curves are Fractional Polynomials
    • 5.8.2 Example: Kidney Cancer
  • 5.9 Discussion

6 MFP: Multivariable Model-Building with Fractional Polynomials

  • 6.1 Introduction
  • 6.2 Motivation
  • 6.3 The MFP Algorithm
    • 6.3.1 Remarks
    • 6.3.2 Example
  • 6.4 Presenting the Model
    • 6.4.1 Parameter Estimates
    • 6.4.2 Function Plots
    • 6.4.3 Effect Estimates
  • 6.5 Model Criticism
    • 6.5.1 Function Plots
    • 6.5.2 Graphical Analysis of Residuals
    • 6.5.3 Assessing Fit by Adding More Complex Functions
    • 6.5.4 Consistency with Subject-Matter Knowledge
  • 6.6 Further Topics
    • 6.6.1 Interval Estimation
    • 6.6.2 Importance of the Nominal Significance Level
    • 6.6.3 The Full MFP Model
    • 6.6.4 A Single Predictor of Interest
    • 6.6.5 Contribution of Individual Variables to the Model Fit
    • 6.6.6 Predictive Value of Additional Variables
  • 6.7 Further Examples
    • 6.7.1 Example 1: Oral Cancer
    • 6.7.2 Example 2: Diabetes
    • 6.7.3 Example 3: Whitehall I
  • 6.8 Simple Versus Complex Fractional Polynomial Models
    • 6.8.1 Complexity and Modelling Aims
    • 6.8.2 Example: GBSG Breast Cancer Data
  • 6.9 Discussion
    • 6.9.1 Philosophy of MFP
    • 6.9.2 Function Complexity, Sample Size and Subject-Matter Knowledge
    • 6.9.3 Improving Robustness by Preliminary Covariate Transformation
    • 6.9.4 Conclusion and Future

7 Interactions

  • 7.1 Introduction
  • 7.2 Background
  • 7.3 General Considerations
    • 7.3.1 Effect of Type of Predictor
    • 7.3.2 Power
    • 7.3.3 Randomized Trials and Observational Studies
    • 7.3.4 Predefined Hypothesis or Hypothesis Generation
    • 7.3.5 Interactions Caused by Mismodelling Main Effects
    • 7.3.6 The ‘Treatment–Effect’ Plot
    • 7.3.7 Graphical Checks, Sensitivity and Stability Analyses
    • 7.3.8 Cautious Interpretation is Essential
  • 7.4 The MFPI Procedure
    • 7.4.1 Model Simplifications
    • 7.4.2 Check of the Results and Sensitivity Analysis
  • 7.5 Example 1: Advanced Prostate Cancer
    • 7.5.1 The Fitted Model
    • 7.5.2 Check of the Interactions
    • 7.5.3 Final Model
    • 7.5.4 Further Comments and Interpretation
    • 7.5.5 FP Model Simplification
  • 7.6 Example 2: GBSG Breast Cancer Study
    • 7.6.1 Oestrogen Receptor Positivity as a Predictive Factor
    • 7.6.2 A Predefined Hypothesis: Tamoxifen–Oestrogen Receptor Interaction
  • 7.7 Categorization
    • 7.7.1 Interaction with Categorized Variables
    • 7.7.2 Example: GBSG Study
  • 7.8 STEPP
  • 7.9 Example 3: Comparison of STEPP with MFPI
    • 7.9.1 Interaction in the Kidney Cancer Data
    • 7.9.2 Stability Investigation
  • 7.10 Comment on Type I Error of MFPI
  • 7.11 Continuous-by-Continuous Interactions
    • 7.11.1 Mismodelling May Induce Interaction
    • 7.11.2 MFPIgen: An FP Procedure to Investigate Interactions
    • 7.11.3 Examples of MFPIgen
    • 7.11.4 Graphical Presentation of Continuous-by-Continuous Interactions
    • 7.11.5 Summary
  • 7.12 Multi-Category Variables
  • 7.13 Discussion

8 Model Stability

  • 8.1 Introduction
  • 8.2 Background
  • 8.3 Using the Bootstrap to Explore Model Stability
    • 8.3.1 Selection of Variables Within a Bootstrap Sample
    • 8.3.2 The Bootstrap Inclusion Frequency and the Importance of a Variable
  • 8.4 Example 1: Glioma Data
  • 8.5 Example 2: Educational Body-Fat Data
    • 8.5.1 Effect of Influential Observations on Model Selection
  • 8.6 Example 3: Breast Cancer Diagnosis
  • 8.7 Model Stability for Functions
    • 8.7.1 Summarizing Variation between Curves
    • 8.7.2 Measures of Curve Instability
  • 8.8 Example 4: GBSG Breast Cancer Data
    • 8.8.1 Interdependencies among Selected Variables and Functions in Subsets
    • 8.8.2 Plots of Functions
    • 8.8.3 Instability Measures
    • 8.8.4 Stability of Functions Depending on Other Variables Included
  • 8.9 Discussion
    • 8.9.1 Relationship between Inclusion Fractions
    • 8.9.2 Stability of Functions

9 Some Comparisons of MFP with Splines

  • 9.1 Introduction
  • 9.2 Background
  • 9.3 MVRS: A Procedure for Model Building with Regression Splines
    • 9.3.1 Restricted Cubic Spline Functions
    • 9.3.2 Function Selection Procedure for Restricted Cubic Splines
    • 9.3.3 The MVRS Algorithm
  • 9.4 MVSS: A Procedure for Model Building with Cubic Smoothing Splines
    • 9.4.1 Cubic Smoothing Splines
    • 9.4.2 Function Selection Procedure for Cubic Smoothing Splines
    • 9.4.3 The MVSS Algorithm
  • 9.5 Example 1: Boston Housing Data
    • 9.5.1 Effect of Reducing the Sample Size
    • 9.5.2 Comparing Predictors
  • 9.6 Example 2: GBSG Breast Cancer Study
  • 9.7 Example 3: Pima Indians
  • 9.8 Example 4: PBC
  • 9.9 Discussion
    • 9.9.1 Splines in General
    • 9.9.2 Complexity of Functions
    • 9.9.3 Optimal Fit or Transferability?
    • 9.9.4 Reporting of Selected Models
    • 9.9.5 Conclusion

10 How to Work with MFP

  • 10.1 Introduction
  • 10.2 The Dataset
  • 10.3 Univariate Analyses
  • 10.4 MFP Analysis
  • 10.5 Model Criticism
    • 10.5.1 Function Plots
    • 10.5.2 Residuals and Lack of Fit
    • 10.5.3 Robustness Transformation and Subject-Matter Knowledge
    • 10.5.4 Diagnostic Plot for Influential Observations
    • 10.5.5 Refined Model
    • 10.5.6 Interactions
  • 10.6 Stability Analysis
  • 10.7 Final Model
  • 10.8 Issues to be Aware of
    • 10.8.1 Selecting the Main-Effects Model
    • 10.8.2 Further Comments on Stability
    • 10.8.3 Searching for Interactions
  • 10.9 Discussion

11 Special Topics Involving Fractional Polynomials

  • 11.1 Time-Varying Hazard Ratios in the Cox Model
    • 11.1.1 The Fractional Polynomial Time Procedure
    • 11.1.2 The MFP Time Procedure
    • 11.1.3 Prognostic Model with Time-Varying Effects for Patients with Breast Cancer
    • 11.1.4 Categorization of Survival Time
    • 11.1.5 Discussion
  • 11.2 Age-specific Reference Intervals
    • 11.2.1 Example: Fetal Growth
    • 11.2.2 Using FP Functions as Smoothers
    • 11.2.3 More Sophisticated Distributional Assumptions
    • 11.2.4 Discussion
  • 11.3 Other Topics
    • 11.3.1 Quantitative Risk Assessment in Developmental Toxicity Studies
    • 11.3.2 Model Uncertainty for Functions
    • 11.3.3 Relative Survival
    • 11.3.4 Approximating Smooth Functions
    • 11.3.5 Miscellaneous Applications

12 Epilogue

  • 12.1 Introduction
  • 12.2 Towards Recommendations for Practice
    • 12.2.1 Variable Selection Procedure
    • 12.2.2 Functional Form for Continuous Covariates
    • 12.2.3 Extreme Values or Influential Points
    • 12.2.4 Sensitivity Analysis
    • 12.2.5 Check for Model Stability
    • 12.2.6 Complexity of a Predictor
    • 12.2.7 Check for Interactions
  • 12.3 Omitted Topics and Future Directions
    • 12.3.1 Measurement Error in Covariates
    • 12.3.2 Meta-analysis
    • 12.3.3 Multi-level (Hierarchical) Models
    • 12.3.4 Missing Covariate Data
    • 12.3.5 Other Types of Model
  • 12.4 Conclusion

Appendix A: Data and Software Resources

  • A.1 Summaries of Datasets
  • A.2 Datasets used more than once
    • A.2.1 Research Body Fat
    • A.2.2 GBSG Breast Cancer
    • A.2.3 Educational Body Fat
    • A.2.4 Glioma
    • A.2.5 Prostate Cancer
    • A.2.6 Whitehall I
    • A.2.7 PBC
    • A.2.8 Oral Cancer
    • A.2.9 Kidney Cancer
  • A.3 Software

Appendix B: Glossary of Abbreviations

References
Index