Econometrics I

TA Christian Alemán

Session 5: Friday 18, February 2022

Activity 1: Simulating Consistency of OLS

Consider the model

$y_{i} = \beta_{0}+\beta_{1}x_{i}+\epsilon_{i}$

$x\sim U[a,b]$

We will show consistency of $\beta_{1}^{OLS}$

$E(\hat{\beta}_{1}|X) = \beta_{1}$

Lets assume different distributions of $\epsilon$

  1. Normal : $\epsilon \sim \mathcal{N}(0,\sigma^{2})$
  2. Poisson: $\epsilon \sim Pois(\lambda)$
  3. Pareto: $\epsilon \sim Pareto(x_{m},\alpha)$

We will Simulate a population of 50000

Make 10000 draws of sample of n = [25,100,1000,10000];

% Housekeeping
clc
close all
clear all

% Set seed for reproductibility


% Options
opt.load_betas = 1;                 % 1:load betas to save time


% Parameters
par.N = 100000;                      % Population
par.n_ms = 10000;                    % Number of MC simulations
par.n_grid = [25,100,1000,10000];   % Sample size
par.beta0 = 1;
par.beta1 = 2;

% Parameters distributions of \epsilon:
par.sigma = 1;      % Standard deviation normal
par.lambda = 1;     % Poisson Rate
par.k = 1;          % Shape if >=1 var is infinity try 4
par.sigma_gp = 2;   % Scale
par.theta = 0;
par.ub = 100;
par.lb = 0;

% Generate population
vars.e_pop = NaN(par.N,3);
for i = 1:3
    if i == 1
        rng(123+i)
        vars.e_pop(:,1) = par.sigma.*randn(par.N,1);
        vars.e_pop(:,1) = vars.e_pop(:,1)-mean(vars.e_pop(:,1)); % demean
    elseif i == 2
        rng(123+i)
        vars.e_pop(:,2) = sim_poisson(par.N,par.lambda);
        vars.e_pop(:,2) = vars.e_pop(:,2)-mean(vars.e_pop(:,2));
    elseif i == 3
        rng(123+i)
        vars.e_pop(:,3) = gprnd(par.k,par.sigma_gp,par.theta,par.N,1);
        %vars.e_pop(:,3) = sim_gp(par.N,par.k,par.sigma_gp,par.theta);
        vars.e_pop(:,3) = vars.e_pop(:,3)-mean(vars.e_pop(:,3));
    end
end

vars.x_pop = unifrnd(par.lb,par.ub,par.N,1);
vars.y_pop = par.beta0 +par.beta1.*repmat(vars.x_pop,1,3) +vars.e_pop;

% Here we do the sampling and compute the distributions of parameters

% Initialize holder
vars.hbeta1 = NaN(par.n_ms,4,3);

if opt.load_betas ==1
    load('data')
    vars.hbeta1 = mc_data;
else
    % Generate NS number of samples from the universe.
    for i = 1:3  %  Distributions
        for ii = 1:4    % Sample sizes
            disp([i,ii])
            for j = 1:par.n_ms   % Number of MC simulations

                I_sample = randsample(par.N,par.n_grid(ii));
                x_sample = vars.x_pop(I_sample);
                y_sample = vars.y_pop(I_sample,i);
                % Run regression
                X = [ones(par.n_grid(ii),1),x_sample];
                Y = y_sample;
                hat_betas = X\Y;
                % Save betas
                vars.hbeta1(j,ii,i) = hat_betas(2);
            end
        end
    end

    mc_data = vars.hbeta1;
    save('data','mc_data')
    %save('data_inc','data_income')
end

vars.kdensity = NaN(100,4,3);            % Initialize Kernel Density
par.support  = NaN(100,4,3);
 for i = 1:3  %  Distributions
        for ii = 1:4    % Sample sizes
        % Do kernel approximations to smooth histograms
        [vars.kdensity(:,ii,i),par.support(:,ii,i)] = ksdensity(vars.hbeta1(:,ii,i),'Kernel','normal');
        end
 end

figure(1)
hold on
p1 = plot(par.support(:,1,1),vars.kdensity(:,1,1),'r-','linewidth',1.2);
p2 = plot(par.support(:,2,1),vars.kdensity(:,2,1),'r--','linewidth',1.2);
p3 = plot(par.support(:,3,1),vars.kdensity(:,3,1),'r-.','linewidth',1.2);
p4 = plot(par.support(:,4,1),vars.kdensity(:,4,1),'r.','linewidth',1.2);
xlim([1.98,2.02])
legend([p1 p2 p3 p4],{'$n=25$','$n=100$','$n=1000$','$n=10000$'},'interpreter','latex')
title('Kernel Density Approximations, Normal Errors')
ylabel('Density')

figure(2)
hold on
p1 = plot(par.support(:,1,2),vars.kdensity(:,1,2),'r-','linewidth',1.2);
p2 = plot(par.support(:,2,2),vars.kdensity(:,2,2),'r--','linewidth',1.2);
p3 = plot(par.support(:,3,2),vars.kdensity(:,3,2),'r-.','linewidth',1.2);
p4 = plot(par.support(:,4,2),vars.kdensity(:,4,2),'r.','linewidth',1.2);
xlim([1.98,2.02])
legend([p1 p2 p3 p4],{'$n=25$','$n=100$','$n=1000$','$n=10000$'},'interpreter','latex')
title('Kernel Density Approximations, Poisson Errors')
ylabel('Density')

figure(3)
hold on
p1 = plot(par.support(:,1,3),vars.kdensity(:,1,3),'r-','linewidth',1.2);
p2 = plot(par.support(:,2,3),vars.kdensity(:,2,3),'r--','linewidth',1.2);
p3 = plot(par.support(:,3,3),vars.kdensity(:,3,3),'r-.','linewidth',1.2);
p4 = plot(par.support(:,4,3),vars.kdensity(:,4,3),'r.','linewidth',1.2);
xlim([-4,10])
legend([p1 p2 p3 p4],{'$n=25$','$n=100$','$n=1000$','$n=10000$'},'interpreter','latex')
title('Kernel Density Approximations, Pareto Errors')
ylabel('Density')

figure(4)
hold on
p4 = plot(par.support(1:30,4,3),vars.kdensity(1:30,4,3),'r-','linewidth',1.2);
legend([p4],{'$n=10000$'},'interpreter','latex')
title('Kernel Density Approximations, Pareto Errors')
ylabel('Density')

Activity 1.1: Asymtotic Normality

Normality test for the last case H0: x comes from a standard normal distribution

name = {'Normal:','Poisson:','Pareto:'};
for i = 1:3
    disp(name{i})
    test_cdf = makedist('normal','mu',mean(vars.hbeta1(:,4,i)),'sigma',std(vars.hbeta1(:,4,i)));
    %test_cdf = [vars.hbeta1(:,4,i),cdf('normal',vars.hbeta1(:,4,i),mean(vars.hbeta1(:,4,i)),std(vars.hbeta1(:,4,i)))];
    h = kstest(vars.hbeta1(:,4,i),'CDF',test_cdf); % Test if the data are from the hypothesized distribution.
if h==1
    disp('We reject the Null: Then distribution is not normal')
else
    disp('We CANNOT reject the Null: Then distribution is normal')
end
end
Normal:
We CANNOT reject the Null: Then distribution is normal
Poisson:
We CANNOT reject the Null: Then distribution is normal
Pareto:
We reject the Null: Then distribution is not normal

Activity 2: Testing Non Linear Constrains WALD test

Consider the following consumption function with different short- and long-run marginal propensities to consume (MPC).

$lnC_{t}=\beta_{1}+\beta_{2}lnY_{t}+\beta_{3}lnC_{t-1}+\epsilon_{t}$

  1. $C_{t}$ is Consumption at $t$ (in real USD)
  2. $Y_{t}$ is Disposable Income at $t$ (in real USD)
  1. $\beta_{2}: Short Run MPC$
  2. $\frac{\beta_{2}}{1-\beta_{3}}=\gamma: Long Run MPC$

We are interested in knowing whether $\gamma=1$

load('data_inc2')
%{
1: C0
2: C1
3: Y
%}
data_income = log(data_income);
n = size(data_income,1);
K = 3;
X = [ones(n,1),data_income(:,3),data_income(:,2)];
y = data_income(:,1);
% Compute OLS
betas = X\y;
hgamma = betas(2)/(1-betas(3));

Param = {'beta 1';'beta 2';'beta 3';'gamma'};


hat_betas = [betas;hgamma];
tab_B = table(Param,hat_betas);
disp('OLS Estimates:')
disp(tab_B)
OLS Estimates:
      Param       hat_betas
    __________    _________

    {'beta 1'}    0.0031416
    {'beta 2'}     0.074958
    {'beta 3'}      0.92462
    {'gamma' }      0.99446

Predict errors

he = y-X*betas;
hsigma = (he'*he)/(n-K);
% Compute Asymtotic VarCovar
Avar_beta = hsigma.*(inv(X'*X));
AVCV = sqrt((diag(Avar_beta)));     % Variance Covariance

% Compute the analytical derivatives:
% Derivative: \partial beta2/(1-beta3)\partial(\beta)
grad = [0;1/(1-betas(3));betas(2)/(1-betas(3))^(2)];
% Compute the asymtotic Variance
Avar_Cbeta = grad'*Avar_beta*grad;
% Since we are testing one restriction we compute the z-score:
% H0: gamma = 1
z = (hgamma-1)./sqrt(Avar_Cbeta);
disp(['Z-Score = ',num2str(z)])
disp('We cannot reject \gamma=1')
Z-Score = -0.33863
We cannot reject \gamma=1

Activity 3: Bootstrap standard errors

3.1 Non-Parametric Bootstrap

  1. Generate B samples with replacement of pairs $(y{i},x_{i})$
  2. Estimate the bootstrap $\hat{\beta}$ by fitting the model
  3. Compute the standard errors
par.n = par.n_grid(3);      % Sample size
par.K = 2;
vars.x =1+2*randn(par.n,2) ;
vars.x(:,1) =4+vars.x(:,1);
vars.e =2*randn(par.n,1) ;
vars.y = vars.x*ones(2 ,1) + vars.e ;
if par.K ==3
    vars.x = [ones(par.n,1),vars.x];
    vars.y = par.beta0+vars.x(:,2:end)*ones(2 ,1) + vars.e ;
end



vars.hbeta = vars.x\vars.y ;
vars.ehat = vars.y-vars.x*vars.hbeta;
par.B = 1000;%

hbeta_sample =NaN(par.B,par.K);
for i = 1:par.B

    I_sample = ceil(par.n*rand(1,par.n));
    ysample = vars.y(I_sample);
    xsample(:,1) = vars.x(I_sample,1);
    xsample(:,2) = vars.x(I_sample,2);

    hbeta_sample(i,:)  = (xsample\ysample)';
end
diff = hbeta_sample-repmat(vars.hbeta',par.B,1);
bootVCV = diff'*diff/par.B;
OLSVCV = (vars.e'*vars.e)/par.n*inv(vars.x'*vars.x);

vars.SEbeta_hat_OLS = sqrt(diag(OLSVCV));
vars.SEbeta_hat_boot = sqrt(diag(bootVCV));

Param = {'beta 0';'beta 1'};
if  par.K ==3
    Param = {'beta 0';'beta 1','beta2'};
end

SE_OLS = vars.SEbeta_hat_OLS;
SE_Bootstrap = vars.SEbeta_hat_boot;
tab_SE = table(Param,SE_OLS,SE_Bootstrap);
disp('Non-Parametric Standard Errors:')
disp(tab_SE)
Non-Parametric Standard Errors:
      Param        SE_OLS     SE_Bootstrap
    __________    ________    ____________

    {'beta 0'}    0.012984      0.012932  
    {'beta 1'}    0.031073      0.030937  

3.2 Parametric Bootstrap

  1. Generate 2*B samples with replacement of $e_{i},x_{i}$ indepedently
  2. Construct values of $y$
  3. Estimate the bootstrap $\hat{\beta}$ by fitting the model
  4. Compute the standard errors
for i = 1:par.B

    I_sample = ceil(par.n*rand(1,par.n));
    II_sample = ceil(par.n*rand(1,par.n));
    esample = vars.ehat(I_sample);
    xsample(:,1) = vars.x(I_sample,1);
    xsample(:,2) = vars.x(I_sample,2);
    ysample = xsample*vars.hbeta+esample;
    hbeta_sample(i,:)  = (xsample\ysample)';
end
diff = hbeta_sample-repmat(vars.hbeta',par.B,1);
bootVCV = diff'*diff/par.B;
OLSVCV = (vars.e'*vars.e)/par.n*inv(vars.x'*vars.x);

vars.SEbeta_hat_OLS = sqrt(diag(OLSVCV));
vars.SEbeta_hat_boot = sqrt(diag(bootVCV));


SE_OLS = vars.SEbeta_hat_OLS;
SE_Bootstrap = vars.SEbeta_hat_boot;
tab_SE = table(Param,SE_OLS,SE_Bootstrap);
disp('Parametric Standard Errors:')
disp(tab_SE)


dert_stop = 1;
%-----------------------------------------------------------
function data = sim_gp(N,k,sigma,theta)
% This function simulates a vector data for the generalized Pareto using the inverse
% CDF method:
%{

Inputs:
    N     : Number of observations to be simulated
    k     : Index Shape \chi
    sigma : Scale
    theta : Threshold location \mu

Output:
    data  : NX1 vector of data

With shape ξ > 0  and location μ = σ / ξ
the GPD is equivalent to the Pareto distribution with scale x m = σ / ξ
 and shape α = 1 / ξ
%}
var_x = rand(N,1);
data = sigma./k.*((1-var_x).^(-k)-1)+theta;
end
%-----------------------------------------------------------
function data = sim_poisson(N,lambda)
% This function simulates a vector data for the generalized Poisson Distribution using the inverse
% CDF method:
%{

Inputs:
    N          : Number of observations to be simulated
    lambda     : Poisson rate

Output:
    data  : NX1 vector of data


%}
var_x = rand(N,1);
data = icdf('Poisson',var_x,lambda);
end
Parametric Standard Errors:
      Param        SE_OLS     SE_Bootstrap
    __________    ________    ____________

    {'beta 0'}    0.012984       0.01324  
    {'beta 1'}    0.031073      0.030953