September 07, 2007

New Webpage

The webpage for Biostat 778 during the 2007--2008 school year will be located at http://www.biostat.jhsph.edu/~rpeng/biostat778/.

December 11, 2006

Dataset for Homework 3

You can download a dataset for use in Problem 3:

It is a comma-separated-value file, which you can read into R using 'read.csv()'.

In your log-linear Poisson regression model, treat the 'l1pm10' variable as the "x" variable; all of the other variables should be part of the "z" vector.  Day of the week ('dow') should be treated as a categorical/factor variable, and the model should also include a smooth function of time with 4 x 14 degrees of freedom.  This can be done using natural splines in R (from the 'splines' package), so

ns(date, 4 * 14)

constructs the natural spline basis matrix.

There are some missing values in the file and you can simply delete all cases with any missing values.
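
To make the model specification above concrete, here is a rough sketch on simulated data.  The outcome name 'death', the covariate 'tmpd', and all of the simulated values are made up for illustration; only 'l1pm10', 'dow', and 'date' come from the description above.  In this sketch 'date' is a plain numeric day index, which may not match the format in the actual file.

```r
library(splines)  # provides ns()

set.seed(1)
n <- 14 * 365

# Simulated stand-in for the homework file ('death' and 'tmpd' are hypothetical)
sim <- data.frame(
    date   = seq_len(n),                    # numeric day index
    dow    = rep_len(1:7, n),               # day of week coded 1..7
    l1pm10 = rnorm(n, mean = 0, sd = 10),
    tmpd   = rnorm(n, mean = 55, sd = 15)
)
sim$death <- rpois(n, lambda = exp(3 + 0.001 * sim$l1pm10))
sim$l1pm10[sample(n, 20)] <- NA             # inject some missing values

# Delete all cases with any missing values
cc <- na.omit(sim)

# Log-linear Poisson regression: "x" = l1pm10, everything else goes in "z"
fit <- glm(death ~ l1pm10 + tmpd + factor(dow) + ns(date, 4 * 14),
           data = cc, family = poisson)

# Estimate and standard error for the l1pm10 coefficient
coef(summary(fit))["l1pm10", ]
```

Wrapping 'dow' in 'factor()' is what makes R expand it into indicator variables rather than fitting a single linear term.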

December 03, 2006

Homework 3

The final homework is due on the last day of class:

November 11, 2006

Homework 2 Datasets

Datasets for problem 1:

Datasets for problem 3:

Dataset for problem 4:  Download BetaBinomial.R

The datasets above can all be read into R using the function 'dget()'.  In fact, 'dget()' can be used to read a dataset directly off the website.  So to download the dataset for problem 4, you can do

data <- dget("http://www.biostat778.org/files/BetaBinomial.R")
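
If you have not used 'dget()' before, it reads back the text representation written by 'dput()'.  Here is a small round trip through a temporary file; the object 'toy' is made up for illustration and has nothing to do with the homework data.

```r
# 'toy' is a made-up object used only to illustrate the round trip
toy <- list(n = c(10, 12, 9), y = c(3, 5, 2))

f <- tempfile(fileext = ".R")
dput(toy, file = f)    # write an R-code representation of the object
back <- dget(f)        # parse and evaluate that file to rebuild the object

identical(toy, back)
```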

For problem 5, you can obtain data by running in R:

d <- read.csv("http://www.biostat.jhsph.edu/MCAPS/estimates-subset.csv")
est <- subset(d, outcome == "heart failure", c(beta, var))

The data frame should have two columns---"beta" and "var".  You can use the hierarchical model to pool the betas to get an overall log-relative risk.

For more background, these estimates come from the paper Dominici F, et al. (2006) JAMA, 295 (10) 1127--1134.  Compare the overall log-relative risks that you get with the ones in the paper.  You will have to multiply your estimates by 1000 to make a fair comparison.
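
As a point of comparison only, here is a sketch of one simple way to pool estimates under a two-level normal model: a method-of-moments (DerSimonian-Laird) estimate of the between-location variance followed by a precision-weighted average.  This is not necessarily the approach the homework asks for, and the function name 'pool_dl' and the simulated 'beta' and 'var' values are made up for illustration.

```r
# Pool estimates beta_i with known variances var_i under
#   beta_i ~ N(theta_i, var_i),  theta_i ~ N(mu, tau2)
pool_dl <- function(beta, var) {
    w <- 1 / var
    mu_fixed <- sum(w * beta) / sum(w)       # fixed-effect (tau2 = 0) average
    Q <- sum(w * (beta - mu_fixed)^2)        # heterogeneity statistic
    k <- length(beta)
    # Method-of-moments estimate of tau2, truncated at zero
    tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
    w_star <- 1 / (var + tau2)
    mu <- sum(w_star * beta) / sum(w_star)   # pooled estimate
    list(mu = mu, se = sqrt(1 / sum(w_star)), tau2 = tau2)
}

# Simulated stand-in with the same two columns as 'est'
set.seed(2)
k <- 50
v <- (runif(k, 0.5, 2) * 1e-3)^2
sim_est <- data.frame(beta = rnorm(k, mean = 1e-3, sd = sqrt(5e-7 + v)),
                      var  = v)

res <- pool_dl(sim_est$beta, sim_est$var)
res$mu
res$se
```

With the real data you would pass 'est$beta' and 'est$var' instead of the simulated columns.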

Homework 2

Homework 2 is available in PDF.  It is due Wednesday November 22.

November 09, 2006

References

The reference for finding the observed information matrix when using the EM algorithm is:

Louis, Thomas A. (1982) "Finding the observed information matrix when using the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, 44, 226--233.

Meilijson's method of calculating the observed information when the data are independent can be found in

Meilijson, I. (1989) "A fast improvement to the EM algorithm on its own terms," JRSS-B, 51 (1), 127--138.
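
For a rough idea of how this works with independent data: the observed information at the MLE can be approximated by the sum of squared (or outer products of) per-observation scores, and each score is available from the E-step quantities.  Here is a sketch for a toy problem of my own choosing, a two-component normal mixture with known components and unknown mixing proportion 'p'; the data and settings are simulated and are not from any homework.

```r
set.seed(3)
n <- 500
z <- rbinom(n, 1, 0.3)                        # latent labels (simulated)
y <- rnorm(n, mean = ifelse(z == 1, 0, 3))    # observed data

f1 <- dnorm(y, mean = 0)                      # component 1 density (known)
f2 <- dnorm(y, mean = 3)                      # component 2 density (known)

# EM for the mixing proportion p
p <- 0.5
for (iter in 1:200) {
    w <- p * f1 / (p * f1 + (1 - p) * f2)     # E-step: posterior membership
    p <- mean(w)                              # M-step
}

# Per-observation scores: conditional expectation of the complete-data score
w <- p * f1 / (p * f1 + (1 - p) * f2)
s <- w / p - (1 - w) / (1 - p)

info <- sum(s^2)          # empirical observed information at the MLE
se <- 1 / sqrt(info)      # approximate standard error for p
c(p = p, se = se)
```

The scores sum to (numerically) zero at the MLE, which is why no centering term is needed in 'info'.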

November 01, 2006

Sample Word Lists for Anagrams

Here are some sample word lists (dictionaries) for the "Anagrams" program in Homework 1.

October 26, 2006

Homework 1

Homework 1 is now available as PDF.  It is due on Thursday November 9.

October 23, 2006

Syllabus

Biostatistics 778:  Advanced Statistical Computing

Instructor: Roger Peng
Teaching Assistant:  Aristide Achy-Brou

Time: Tuesday, Thursday, 8:30-9:50am
Room:  W4007
Textbook: Lange, K (1999).  Numerical Analysis for Statisticians, Springer.
Course weblog:  http://www.biostat778.org/
Office hours:  Please just send me an email (rpeng AT jhsph.edu) or drop by my office (E3535)


Overview

This course covers the theory and application of common algorithms used in statistical computing.  Topics include root finding algorithms, optimization algorithms, numerical integration methods, Monte Carlo, Markov chain Monte Carlo, stochastic optimization, and bootstrapping.

Textbook and Reading

The textbook is Ken Lange's Numerical Analysis for Statisticians. We will not cover every topic in the book but most of the important ones.  The textbook will be supplemented with handouts given in class.

Grading and Exams

Grading for the class will be based on homework assignments.  There will be no exams.

Homework

Students are required to use LaTeX for typesetting and the R programming language for computing.  There may be some assignments requiring the use of the C programming language. 

There will be one homework assignment every two weeks.  The homeworks will typically be a mix of reading, programming, and mathematical exercises.

Prerequisites

Students should have completed at least one year of doctoral-level statistics/biostatistics theory and methods courses.  Prior programming experience would be useful.

June 27, 2006

Until later....

This course will be taught again in October 2006 (2nd term).  See you then!

December 22, 2005

Partial Bibliography

I wrote up a short (incomplete) bibliography for the class.    

December 17, 2005

Homework 6

Homework 6 is now available.  This is the last homework and it's due on Friday December 23rd.

December 06, 2005

Homework 5

Apologies for the delay on this homework.  It is available in PDF:

The data are available here also:

November 29, 2005

Homework 4

Homework 4 is available as PDF:

The data for Problem 2 are available here:

The data can be read directly off the website using 'dget()'.

A reference for Meilijson's method can be found at JSTOR.  Also, Tom Louis' paper may be useful for Problem 2.

November 22, 2005

No homework

There's no homework this week.  Have a nice Thanksgiving!