New Webpage
The webpage for Biostat 778 during the 2007--2008 school year will be located at http://www.biostat.jhsph.edu/~rpeng/biostat778/.
You can download a dataset for use in Problem 3:
It is a comma-separated-value file, which you can read into R using 'read.csv()'.
In your log-linear Poisson regression model, you should treat the 'l1pm10' variable as the "x" variable and all of the other variables should be part of the "z" vector. Day of the week ('dow') should be treated as a categorical/factor variable and you should also include a smooth function of time with 4 x 14 degrees of freedom in the model. This can be done using natural splines in R, so
ns(date, 4 * 14)
constructs the natural spline basis matrix.
There are some missing values in the file and you can simply delete all cases with any missing values.
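As a rough sketch of the setup described above, the following fits the model to simulated data with glm(). The outcome name 'death' and the extra covariate 'tmpd' are hypothetical stand-ins, since the column names of the real file are not listed here, and glm() is used only as a reference fit, not necessarily the method the problem asks you to implement.

```r
library(splines)  # provides ns()

set.seed(1)
n <- 3 * 365

# Simulated stand-in for the homework file.
# 'death' and 'tmpd' are hypothetical column names.
d <- data.frame(
  date   = seq_len(n),
  dow    = rep(1:7, length.out = n),
  l1pm10 = rnorm(n, mean = 0, sd = 10),
  tmpd   = rnorm(n, mean = 55, sd = 15)
)
d$death <- rpois(n, lambda = exp(3 + 0.0005 * d$l1pm10))
d$l1pm10[sample(n, 20)] <- NA   # mimic the missing values

# Delete all cases with any missing values
d <- na.omit(d)

# Log-linear Poisson model: 'l1pm10' is "x"; everything else is in "z"
fit <- glm(death ~ l1pm10 + factor(dow) + tmpd + ns(date, 4 * 14),
           data = d, family = poisson(link = "log"))

coef(fit)["l1pm10"]   # estimated coefficient for the "x" variable
```

With the real data you would replace the simulated data frame with the result of 'read.csv()' and put every remaining column of the file into the formula.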
The final homework is due on the last day of class:
Datasets for problem 1:
Datasets for problem 3:
Dataset for problem 4: Download BetaBinomial.R
The datasets above can all be read into R using the function 'dget()'. In fact, 'dget()' can read a dataset directly off the website. So to download the dataset for problem 4, you can do
data <- dget("http://www.biostat778.org/files/BetaBinomial.R")
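For context, 'dget()' reads back the text representation of an R object written by 'dput()'. A small local round trip, using a made-up list in place of the course data, looks like this:

```r
# A made-up object standing in for a homework dataset
x <- list(y = c(3, 1, 4), n = 10)

# Write the object as R source text, then read it back
f <- tempfile(fileext = ".R")
dput(x, file = f)
x2 <- dget(f)

identical(x, x2)
```

This is why the files have a ".R" extension: each one contains R code that recreates the object.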
For problem 5, you can obtain the data by running the following in R:
d <- read.csv("http://www.biostat.jhsph.edu/MCAPS/estimates-subset.csv")
est <- subset(d, outcome == "heart failure", c(beta, var))
The data frame should have two columns, "beta" and "var". You can use the hierarchical model to pool the betas to get an overall log-relative risk.
For more background, these estimates come from the paper Dominici F, et al. (2006) JAMA, 295 (10) 1127--1134. Compare the overall log-relative risks that you get with the ones in the paper. You will have to multiply your estimates by 1000 to make a fair comparison.
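To get a feel for what pooling does, here is a minimal sketch that uses the DerSimonian-Laird moment estimator on simulated estimates. This is a simple stand-in, not the hierarchical model fit the problem asks for, and the function name 'pool_dl' and the simulated values are made up for illustration.

```r
# DerSimonian-Laird random-effects pooling (a stand-in technique)
pool_dl <- function(beta, var) {
  k <- length(beta)
  w <- 1 / var
  mu_fixed <- sum(w * beta) / sum(w)          # fixed-effect estimate
  Q <- sum(w * (beta - mu_fixed)^2)           # heterogeneity statistic
  tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_star <- 1 / (var + tau2)                  # random-effects weights
  list(mu   = sum(w_star * beta) / sum(w_star),
       se   = sqrt(1 / sum(w_star)),
       tau2 = tau2)
}

# Simulated stand-in for the 'est' data frame (values are made up)
set.seed(2)
k <- 30
v <- runif(k, 0.5e-7, 2e-7)
theta <- rnorm(k, mean = 8e-4, sd = 3e-4)
est <- data.frame(beta = rnorm(k, theta, sqrt(v)), var = v)

fit <- pool_dl(est$beta, est$var)
1000 * fit$mu   # rescaled as suggested above for comparison with the paper
```

A full hierarchical analysis would also account for the uncertainty in the between-location variance, which this moment estimator treats as known once estimated.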
Homework 2 is available in PDF. It is due Wednesday November 22.
The reference for finding the observed information matrix when using the EM algorithm is:
Louis, Thomas A. (1982) "Finding the observed information matrix when using the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, 44, 226--233.
Meilijson's method of calculating the observed information when the data are independent can be found in
Meilijson, I. (1989) "A fast improvement to the EM algorithm on its own terms," JRSS-B, 51 (1), 127--138.
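As I understand it, the idea for independent data is to approximate the observed information by the empirical covariance of the per-observation score vectors at the MLE. A minimal sketch, assuming you already have an n x p matrix of individual scores (in an EM setting these would be conditional expectations of the complete-data scores), is below. The function name 'empirical_info' is made up, and the Poisson example is only a check against a case where the exact answer is known; it does not involve missing data.

```r
# Empirical information from per-observation scores (one row each)
empirical_info <- function(scores) {
  scores <- as.matrix(scores)
  total <- colSums(scores)          # approximately zero at the MLE
  crossprod(scores) - outer(total, total) / nrow(scores)
}

# Check on an i.i.d. Poisson sample, where the score for
# observation i is y_i / lambda - 1
set.seed(3)
y <- rpois(500, lambda = 4)
lambda_hat <- mean(y)
s <- y / lambda_hat - 1

I_emp <- empirical_info(s)          # empirical approximation
I_obs <- sum(y) / lambda_hat^2      # exact observed information
c(empirical = as.numeric(I_emp), observed = I_obs)
```

The two numbers should be close but not equal, since the empirical version is only an approximation that improves with sample size.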
Homework 1 is now available as PDF. It is due on Thursday November 9.
Biostatistics 778: Advanced Statistical Computing
Instructor: Roger Peng
Teaching Assistant: Aristide Achy-Brou
Time: Tuesday, Thursday, 8:30-9:50am
Room: W4007
Textbook: Lange, K. (1999). Numerical Analysis for Statisticians, Springer.
Course weblog: http://www.biostat778.org/
Office hours: Please just send me an email (rpeng AT jhsph.edu) or drop by my office (E3535)
Overview
This course covers the theory and application of common algorithms used in statistical computing. Topics include root finding algorithms, optimization algorithms, numerical integration methods, Monte Carlo, Markov chain Monte Carlo, stochastic optimization, and bootstrapping.
Textbook and Reading
The textbook is Ken Lange's Numerical Analysis for Statisticians. We will not cover every topic in the book, but we will cover most of the important ones. The textbook will be supplemented with handouts given in class.
Grading and Exams
Grading for the class will be based on homework assignments. There will be no exams.
Homework
Students are required to use LaTeX for typesetting and the R programming language for computing. There may be some assignments requiring the use of the C programming language.
There will be one homework assignment every two weeks. The homeworks will typically be a mix of reading, programming, and mathematical exercises.
Prerequisites
Students should have completed at least one year of doctoral-level statistics/biostatistics theory and methods courses. Prior programming experience would be useful.
This course will be taught again in October 2006 (2nd term). See you then!
I wrote up a short (incomplete) bibliography for the class.
Homework 6 is now available. This is the last homework and it's due on Friday December 23rd.
Apologies for the delay on this homework. It is available in PDF:
The data are available here also:
Homework 4 is available as PDF:
The data for Problem 2 are available here:
The data can be read directly off the website using 'dget()'.
A reference for Meilijson's method can be found at JSTOR. Also, Tom Louis' paper may be useful for Problem 2.
There's no homework this week. Have a nice Thanksgiving!