
Multivariable Model-building: A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables
Patrick Royston and Willi Sauerbrei
Table of Contents
Preface
1 Introduction
- 1.1 Real-Life Problems as Motivation for Model Building
- 1.1.1 Many Candidate Models
- 1.1.2 Functional Form for Continuous Predictors
- 1.1.3 Example 1: Continuous Response
- 1.1.4 Example 2: Multivariable Model for Survival Data
- 1.2 Issues in Modelling Continuous Predictors
- 1.2.1 Effects of Assumptions
- 1.2.2 Global versus Local Influence Models
- 1.2.3 Disadvantages of Fractional Polynomial Modelling
- 1.2.4 Controlling Model Complexity
- 1.3 Types of Regression Model Considered
- 1.3.1 Normal-Errors Regression
- 1.3.2 Logistic Regression
- 1.3.3 Cox Regression
- 1.3.4 Generalized Linear Models
- 1.3.5 Linear and Additive Predictors
- 1.4 Role of Residuals
- 1.4.1 Uses of Residuals
- 1.4.2 Graphical Analysis of Residuals
- 1.5 Role of Subject-Matter Knowledge in Model Development
- 1.6 Scope of Model Building in our Book
- 1.7 Modelling Preferences
- 1.7.1 General Issues
- 1.7.2 Criteria for a Good Model
- 1.7.3 Personal Preferences
- 1.8 General Notation
2 Selection of Variables
- 2.1 Introduction
- 2.2 Background
- 2.3 Preliminaries for a Multivariable Analysis
- 2.4 Aims of Multivariable Models
- 2.5 Prediction: Summary Statistics and Comparisons
- 2.6 Procedures for Selecting Variables
- 2.6.1 Strength of Predictors
- 2.6.2 Stepwise Procedures
- 2.6.3 All-Subsets Model Selection Using Information Criteria
- 2.6.4 Further Considerations
- 2.7 Comparison of Selection Strategies in Examples
- 2.7.1 Myeloma Study
- 2.7.2 Educational Body-Fat Data
- 2.7.3 Glioma Study
- 2.8 Selection and Shrinkage
- 2.8.1 Selection Bias
- 2.8.2 Simulation Study
- 2.8.3 Shrinkage to Correct for Selection Bias
- 2.8.4 Post-estimation Shrinkage
- 2.8.5 Reducing Selection Bias
- 2.8.6 Example
- 2.9 Discussion
- 2.9.1 Model Building in Small Datasets
- 2.9.2 Full, Pre-specified or Selected Model?
- 2.9.3 Comparison of Selection Procedures
- 2.9.4 Complexity, Stability and Interpretability
- 2.9.5 Conclusions and Outlook
3 Handling Categorical and Continuous Predictors
- 3.1 Introduction
- 3.2 Types of Predictor
- 3.2.1 Binary
- 3.2.2 Nominal
- 3.2.3 Ordinal, Counting, Continuous
- 3.2.4 Derived
- 3.3 Handling Ordinal Predictors
- 3.3.1 Coding Schemes
- 3.3.2 Effect of Coding Schemes on Variable Selection
- 3.4 Handling Counting and Continuous Predictors: Categorization
- 3.4.1 ‘Optimal’ Cutpoints: A Dangerous Analysis
- 3.4.2 Other Ways of Choosing a Cutpoint
- 3.5 Example: Issues in Model Building with Categorized Variables
- 3.5.1 One Ordinal Variable
- 3.5.2 Several Ordinal Variables
- 3.6 Handling Counting and Continuous Predictors: Functional Form
- 3.6.1 Beyond Linearity
- 3.6.2 Does Nonlinearity Matter?
- 3.6.3 Simple versus Complex Functions
- 3.6.4 Interpretability and Transportability
- 3.7 Empirical Curve Fitting
- 3.7.1 General Approaches to Smoothing
- 3.7.2 Critique of Local and Global Influence Models
- 3.8 Discussion
- 3.8.1 Sparse Categories
- 3.8.2 Choice of Coding Scheme
- 3.8.3 Categorizing Continuous Variables
- 3.8.4 Handling Continuous Variables
4 Fractional Polynomials for One Variable
- 4.1 Introduction
- 4.2 Background
- 4.2.1 Genesis
- 4.2.2 Types of Model
- 4.2.3 Relation to Box-Tidwell and Exponential Functions
- 4.3 Definition and Notation
- 4.3.1 Fractional Polynomials
- 4.3.2 First Derivative
- 4.4 Characteristics
- 4.4.1 FP1 and FP2 Functions
- 4.4.2 Maximum or Minimum of a FP2 Function
- 4.5 Examples of Curve Shapes with FP1 and FP2 Functions
- 4.6 Choice of Powers
- 4.7 Choice of Origin
- 4.8 Model Fitting and Estimation
- 4.9 Inference
- 4.9.1 Hypothesis Testing
- 4.9.2 Interval Estimation
- 4.10 Function Selection Procedure
- 4.10.1 Choice of Default Function
- 4.10.2 Closed Test Procedure for Function Selection
- 4.10.3 Example
- 4.10.4 Sequential Procedure
- 4.10.5 Type I Error and Power of the Function Selection Procedure
- 4.11 Scaling and Centering
- 4.11.1 Computational Aspects
- 4.11.2 Examples
- 4.12 FP Powers as Approximations to Continuous Powers
- 4.12.1 Box-Tidwell and Fractional Polynomial Models
- 4.12.2 Example
- 4.13 Presentation of Fractional Polynomial Functions
- 4.13.1 Graphical
- 4.13.2 Tabular
- 4.14 Worked Example
- 4.14.1 Details of all Fractional Polynomial Models
- 4.14.2 Function Selection
- 4.14.3 Details of the Fitted Model
- 4.14.4 Standard Error of a Fitted Value
- 4.14.5 Fitted Odds Ratio and its Confidence Interval
- 4.15 Modelling Covariates with a Spike at Zero
- 4.16 Power of Fractional Polynomial Analysis
- 4.16.1 Underlying Function Linear
- 4.16.2 Underlying Function FP1 or FP2
- 4.16.3 Comment
- 4.17 Discussion
5 Some Issues with Univariate Fractional Polynomial Models
- 5.1 Introduction
- 5.2 Susceptibility to Influential Covariate Observations
- 5.3 A Diagnostic Plot for Influential Points in FP Models
- 5.3.1 Example 1: Educational Body-Fat Data
- 5.3.2 Example 2: Primary Biliary Cirrhosis Data
- 5.4 Dependence on Choice of Origin
- 5.5 Improving Robustness by Preliminary Transformation
- 5.5.1 Example 1: Educational Body-Fat Data
- 5.5.2 Example 2: PBC Data
- 5.5.3 Practical Use of the Pre-transformation gδ(x)
- 5.6 Improving Fit by Preliminary Transformation
- 5.6.1 Lack of Fit of Fractional Polynomial Models
- 5.6.2 Negative Exponential Pre-transformation
- 5.7 Higher Order Fractional Polynomials
- 5.7.1 Example 1: Nerve Conduction Data
- 5.7.2 Example 2: Triceps Skinfold Thickness
- 5.8 When Fractional Polynomial Models are Unsuitable
- 5.8.1 Not all Curves are Fractional Polynomials
- 5.8.2 Example: Kidney Cancer
- 5.9 Discussion
6 MFP: Multivariable Model-Building with Fractional Polynomials
- 6.1 Introduction
- 6.2 Motivation
- 6.3 The MFP Algorithm
- 6.3.1 Remarks
- 6.3.2 Example
- 6.4 Presenting the Model
- 6.4.1 Parameter Estimates
- 6.4.2 Function Plots
- 6.4.3 Effect Estimates
- 6.5 Model Criticism
- 6.5.1 Function Plots
- 6.5.2 Graphical Analysis of Residuals
- 6.5.3 Assessing Fit by Adding More Complex Functions
- 6.5.4 Consistency with Subject-Matter Knowledge
- 6.6 Further Topics
- 6.6.1 Interval Estimation
- 6.6.2 Importance of the Nominal Significance Level
- 6.6.3 The Full MFP Model
- 6.6.4 A Single Predictor of Interest
- 6.6.5 Contribution of Individual Variables to the Model Fit
- 6.6.6 Predictive Value of Additional Variables
- 6.7 Further Examples
- 6.7.1 Example 1: Oral Cancer
- 6.7.2 Example 2: Diabetes
- 6.7.3 Example 3: Whitehall I
- 6.8 Simple Versus Complex Fractional Polynomial Models
- 6.8.1 Complexity and Modelling Aims
- 6.8.2 Example: GBSG Breast Cancer Data
- 6.9 Discussion
- 6.9.1 Philosophy of MFP
- 6.9.2 Function Complexity, Sample Size and Subject-Matter Knowledge
- 6.9.3 Improving Robustness by Preliminary Covariate Transformation
- 6.9.4 Conclusion and Future
7 Interactions
- 7.1 Introduction
- 7.2 Background
- 7.3 General Considerations
- 7.3.1 Effect of Type of Predictor
- 7.3.2 Power
- 7.3.3 Randomized Trials and Observational Studies
- 7.3.4 Predefined Hypothesis or Hypothesis Generation
- 7.3.5 Interactions Caused by Mismodelling Main Effects
- 7.3.6 The ‘Treatment-Effect’ Plot
- 7.3.7 Graphical Checks, Sensitivity and Stability Analyses
- 7.3.8 Cautious Interpretation is Essential
- 7.4 The MFPI Procedure
- 7.4.1 Model Simplifications
- 7.4.2 Check of the Results and Sensitivity Analysis
- 7.5 Example 1: Advanced Prostate Cancer
- 7.5.1 The Fitted Model
- 7.5.2 Check of the Interactions
- 7.5.3 Final Model
- 7.5.4 Further Comments and Interpretation
- 7.5.5 FP Model Simplification
- 7.6 Example 2: GBSG Breast Cancer Study
- 7.6.1 Oestrogen Receptor Positivity as a Predictive Factor
- 7.6.2 A Predefined Hypothesis: Tamoxifen-Oestrogen Receptor Interaction
- 7.7 Categorization
- 7.7.1 Interaction with Categorized Variables
- 7.7.2 Example: GBSG Study
- 7.8 STEPP
- 7.9 Example 3: Comparison of STEPP with MFPI
- 7.9.1 Interaction in the Kidney Cancer Data
- 7.9.2 Stability Investigation
- 7.10 Comment on Type I Error of MFPI
- 7.11 Continuous-by-Continuous Interactions
- 7.11.1 Mismodelling May Induce Interaction
- 7.11.2 MFPIgen: An FP Procedure to Investigate Interactions
- 7.11.3 Examples of MFPIgen
- 7.11.4 Graphical Presentation of Continuous-by-Continuous Interactions
- 7.11.5 Summary
- 7.12 Multi-Category Variables
- 7.13 Discussion
8 Model Stability
- 8.1 Introduction
- 8.2 Background
- 8.3 Using the Bootstrap to Explore Model Stability
- 8.3.1 Selection of Variables Within a Bootstrap Sample
- 8.3.2 The Bootstrap Inclusion Frequency and the Importance of a Variable
- 8.4 Example 1: Glioma Data
- 8.5 Example 2: Educational Body-Fat Data
- 8.5.1 Effect of Influential Observations on Model Selection
- 8.6 Example 3: Breast Cancer Diagnosis
- 8.7 Model Stability for Functions
- 8.7.1 Summarizing Variation between Curves
- 8.7.2 Measures of Curve Instability
- 8.8 Example 4: GBSG Breast Cancer Data
- 8.8.1 Interdependencies among Selected Variables and Functions in Subsets
- 8.8.2 Plots of Functions
- 8.8.3 Instability Measures
- 8.8.4 Stability of Functions Depending on Other Variables Included
- 8.9 Discussion
- 8.9.1 Relationship between Inclusion Fractions
- 8.9.2 Stability of Functions
9 Some Comparisons of MFP with Splines
- 9.1 Introduction
- 9.2 Background
- 9.3 MVRS: A Procedure for Model Building with Regression Splines
- 9.3.1 Restricted Cubic Spline Functions
- 9.3.2 Function Selection Procedure for Restricted Cubic Splines
- 9.3.3 The MVRS Algorithm
- 9.4 MVSS: A Procedure for Model Building with Cubic Smoothing Splines
- 9.4.1 Cubic Smoothing Splines
- 9.4.2 Function Selection Procedure for Cubic Smoothing Splines
- 9.4.3 The MVSS Algorithm
- 9.5 Example 1: Boston Housing Data
- 9.5.1 Effect of Reducing the Sample Size
- 9.5.2 Comparing Predictors
- 9.6 Example 2: GBSG Breast Cancer Study
- 9.7 Example 3: Pima Indians
- 9.8 Example 4: PBC
- 9.9 Discussion
- 9.9.1 Splines in General
- 9.9.2 Complexity of Functions
- 9.9.3 Optimal Fit or Transferability?
- 9.9.4 Reporting of Selected Models
- 9.9.5 Conclusion
10 How to Work with MFP
- 10.1 Introduction
- 10.2 The Dataset
- 10.3 Univariate Analyses
- 10.4 MFP Analysis
- 10.5 Model Criticism
- 10.5.1 Function Plots
- 10.5.2 Residuals and Lack of Fit
- 10.5.3 Robustness Transformation and Subject-Matter Knowledge
- 10.5.4 Diagnostic Plot for Influential Observations
- 10.5.5 Refined Model
- 10.5.6 Interactions
- 10.6 Stability Analysis
- 10.7 Final Model
- 10.8 Issues to be Aware of
- 10.8.1 Selecting the Main-Effects Model
- 10.8.2 Further Comments on Stability
- 10.8.3 Searching for Interactions
- 10.9 Discussion
11 Special Topics Involving Fractional Polynomials
- 11.1 Time-Varying Hazard Ratios in the Cox Model
- 11.1.1 The Fractional Polynomial Time Procedure
- 11.1.2 The MFP Time Procedure
- 11.1.3 Prognostic Model with Time-Varying Effects for Patients with Breast Cancer
- 11.1.4 Categorization of Survival Time
- 11.1.5 Discussion
- 11.2 Age-specific Reference Intervals
- 11.2.1 Example: Fetal Growth
- 11.2.2 Using FP Functions as Smoothers
- 11.2.3 More Sophisticated Distributional Assumptions
- 11.2.4 Discussion
- 11.3 Other Topics
- 11.3.1 Quantitative Risk Assessment in Developmental Toxicity Studies
- 11.3.2 Model Uncertainty for Functions
- 11.3.3 Relative Survival
- 11.3.4 Approximating Smooth Functions
- 11.3.5 Miscellaneous Applications
12 Epilogue
- 12.1 Introduction
- 12.2 Towards Recommendations for Practice
- 12.2.1 Variable Selection Procedure
- 12.2.2 Functional Form for Continuous Covariates
- 12.2.3 Extreme Values or Influential Points
- 12.2.4 Sensitivity Analysis
- 12.2.5 Check for Model Stability
- 12.2.6 Complexity of a Predictor
- 12.2.7 Check for Interactions
- 12.3 Omitted Topics and Future Directions
- 12.3.1 Measurement Error in Covariates
- 12.3.2 Meta-analysis
- 12.3.3 Multi-level (Hierarchical) Models
- 12.3.4 Missing Covariate Data
- 12.3.5 Other Types of Model
- 12.4 Conclusion
Appendix A: Data and Software Resources
- A.1 Summaries of Datasets
- A.2 Datasets used more than once
- A.2.1 Research Body Fat
- A.2.2 GBSG Breast Cancer
- A.2.3 Educational Body Fat
- A.2.4 Glioma
- A.2.5 Prostate Cancer
- A.2.6 Whitehall I
- A.2.7 PBC
- A.2.8 Oral Cancer
- A.2.9 Kidney Cancer
- A.3 Software
Appendix B: Glossary of Abbreviations
References
Index