Revised Simulations

Content type: User Generated
Subject: Statistics
School: GWU
Type: Homework

Statistics: Running Simulations for Five Methods
Given a linear model $Y = \beta_0 + X^{*\top}\beta + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$.
The error terms $\varepsilon$ are independent and identically distributed (iid) normal random
variables with mean 0 and variance $\sigma^2$. The model has three fixed but unknown quantities
to estimate: the intercept $\beta_0$, the slope coefficients (written $\beta_1$ in the
single-predictor case), and the error variance $\sigma^2$. The notation is slightly modified
here: the response is written as $Y$ because the model will be fitted to a set of $n$ data
points indexed by $i = 1, 2, \dots, n$.
1. Running simulations for five methods:
LASSO (Tibshirani; 1996)
Summary
True model: $y = X\beta + \epsilon$, where $\epsilon \sim N_n(0, I)$.
Example 1
Small signal. Lots of noise.
$\beta = (\underbrace{1, \dots, 1}_{15}, \underbrace{0, \dots, 0}_{4985})^{T}$
$p = 5000 > n = 1000$
Uncorrelated predictors:
o $X_i \overset{iid}{\sim} N(0, I)$
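For reference, the penalized least-squares objective that the glmnet package documents for family="gaussian" can be written as below. This comes from the package documentation rather than from the text above; $\lambda$ and $\alpha$ are tuning parameters, and $\alpha$ corresponds to the alpha argument in the glmnet calls used for the fits.

$$
\min_{\beta_0,\,\beta}\;\frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2
+ \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
$$

Setting $\alpha = 1$ gives the LASSO penalty, $\alpha = 0$ gives ridge, and values in between give the elastic net.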
Generating Data
library(MASS) # Package needed to generate correlated predictors
library(glmnet) # Package to fit ridge/lasso/elastic net models
# Generate data
set.seed(19875) # Set seed for reproducibility
n <- 1000 # Number of observations
p <- 5000 # Number of predictors included in model
real_p <- 15 # Number of true predictors
x <- matrix(rnorm(n*p), nrow=n, ncol=p)
y <- apply(x[,1:real_p], 1, sum) + rnorm(n)
# Split data into train (2/3) and test (1/3) sets
train_rows <- sample(1:n, .66*n)
x.train <- x[train_rows, ]
x.test <- x[-train_rows, ]
y.train <- y[train_rows]
y.test <- y[-train_rows]
Fit Models
# Fit models

# (For plots on left):
fit.lasso <- glmnet(x.train, y.train, family="gaussian", alpha=1)
fit.ridge <- glmnet(x.train, y.train, family="gaussian", alpha=0)
fit.elnet <- glmnet(x.train, y.train, family="gaussian", alpha=.5)
# 10-fold Cross validation for each alpha = 0, 0.1, ... , 0.9, 1.0
# (For plots on Right)
for (i in 0:10) {
  assign(paste("fit", i, sep=""),
         cv.glmnet(x.train, y.train, type.measure="mse",
                   alpha=i/10, family="gaussian"))
}
# Plot solution paths:
par(mfrow=c(3,2))
# For plotting options, type '?plot.glmnet' in R console
plot(fit.lasso, xvar="lambda")
plot(fit10, main="LASSO")
plot(fit.ridge, xvar="lambda")
plot(fit0, main="Ridge")
plot(fit.elnet, xvar="lambda")
plot(fit5, main="Elastic Net")
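Output
[Figure: 3 x 2 grid of plots. Left column: coefficient paths against log(lambda) for LASSO, ridge and elastic net. Right column: the corresponding 10-fold cross-validation MSE curves, titled "LASSO", "Ridge" and "Elastic Net".]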
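The test split created earlier (x.test, y.test) is not used in the part of the document shown here. A minimal sketch of how it could be used to compare methods by test-set mean squared error is given below. It is self-contained and uses smaller dimensions so that it runs quickly. The sizes n = 300 and p = 600, the seed, the choice of s = "lambda.1se", and the helper name test_mse are illustrative assumptions, not taken from the original.

```r
library(glmnet)  # provides cv.glmnet() and its predict() method

set.seed(1)      # assumed seed, for reproducibility of this sketch only
n <- 300         # assumed (smaller than the 1000 used in Example 1)
p <- 600         # assumed (smaller than the 5000 used in Example 1)
real_p <- 15     # number of true predictors, as in Example 1

# Same data-generating process as Example 1: first real_p coefficients are 1
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rowSums(x[, 1:real_p]) + rnorm(n)

# Hold out roughly one third of the rows for testing
train_rows <- sample(seq_len(n), size = round(0.66 * n))
x.train <- x[train_rows, ]
y.train <- y[train_rows]
x.test <- x[-train_rows, ]
y.test <- y[-train_rows]

# Hypothetical helper: cross-validate on the training set for a given alpha,
# then return the mean squared prediction error on the test set
test_mse <- function(alpha) {
  cv_fit <- cv.glmnet(x.train, y.train, type.measure = "mse",
                      alpha = alpha, family = "gaussian")
  y_hat <- predict(cv_fit, newx = x.test, s = "lambda.1se")
  mean((y.test - y_hat)^2)
}

# Test-set MSE for ridge (alpha = 0), elastic net (0.5) and LASSO (1)
mse <- sapply(c(ridge = 0, elnet = 0.5, lasso = 1), test_mse)
print(mse)
```

With a sparse true coefficient vector like this one, the LASSO and elastic net errors would typically be expected to fall below the ridge error, but the exact values depend on the seed and the dimensions chosen.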
