Econometrics I

TA Christian Alemán

Session 8: Monday, March 14, 2022

1: Measurement Error

  1. Measurement error in the dependent variable
  2. Measurement error in the independent variable
  3. IV Estimation

The case with no measurement error

Consider the following DGP:

$y^{*} = \beta_{0} + \beta_{1}x^{*} + e$

$e \sim N(0,2)$

% Housekeeping
clear
clc
close all


% Set seed for reproducibility
rng(123)

N=1000;
betas = [2 2]';
err = sqrt(2).*randn(N,1);
x2_star = 10+2.*randn(N,1);
X_star = [ones(N,1) x2_star];
y_star = X_star*betas +err;
ols_func = @(X,y) (X'*X)\(X'*y);


% The OLS estimation of the true model is
names = char("beta_{0}", "beta_{1}"); % char replaces the deprecated str2mat

[b] = mc_ols(y_star, X_star, names, 0, 1);
y_pred = X_star*b;

figure(1)
hold on
plot(x2_star,y_star,'kx','MarkerFaceColor','k')
plot(x2_star, y_pred,'m-','linewidth',1.1)
ylabel('$y_{i}$', 'interpreter','latex')
xlabel('$x_{2,i}$','interpreter','latex')
title('No measurement error')
*********************************************************
OLS estimation results
Observations 1000
R-squared 0.904879
Sigma-squared 1.961686

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       1.621       0.214       7.566       0.000
beta_{1}       2.038       0.021      97.437       0.000

*********************************************************

1.1: Measurement error in the dependent variable

$y = y^{*}+\nu$

In this case OLS is still unbiased and consistent.
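To see why, substitute the mismeasured variable into the DGP: the measurement error is simply absorbed into the regression error,

$y = \beta_{0} + \beta_{1}x^{*} + (e+\nu)$

and the composite error $e+\nu$ is still mean-zero and uncorrelated with $x^{*}$. The only cost is a larger error variance, and hence larger standard errors.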

v =  2.*randn(N,1);
y = y_star +v;

% OLS estimation
[b] = mc_ols(y, X_star, names, 0, 1);
y_pred = X_star*b;

figure(2)
hold on
plot(x2_star,y,'kx','MarkerFaceColor','k')
plot(x2_star, y_pred,'m-','linewidth',1.1)
ylabel('$y_{i}$', 'interpreter','latex','fontsize',16)
xlabel('$x_{2,i}$','interpreter','latex','fontsize',16)
title('Measurement error in the dependent variable')

% Verify this with a Monte Carlo experiment:
no_reps = 1000;
K = size(X_star,2);
MC_betas = nan(no_reps,K);

 for i = 1:no_reps
    e_rep = sqrt(4).*randn(N,1);  %The error term of the regression
    y_star = X_star*betas +e_rep;
    v = 2.*randn(N,1);
    y_rep = y_star +v;
    MC_betas(i,:) = (ols_func(X_star,y_rep))';
 end

figure(3)
histfit(MC_betas(:,2))
xline(mean(MC_betas(:,2)),'k-','linewidth',1.2)
title('$\hat{\beta}_{1}$','Interpreter','latex')
*********************************************************
OLS estimation results
Observations 1000
R-squared 0.745898
Sigma-squared 6.189020

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       1.897       0.381       4.984       0.000
beta_{1}       2.010       0.037      54.125       0.000

*********************************************************

1.2: Measurement error in the explanatory variable: Attenuation bias

$x = x^{*}+\nu$

With classical measurement error in the explanatory variable, the OLS estimator is biased towards 0. This is known as attenuation bias.
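The size of the attenuation follows from the classical errors-in-variables formula,

$\mathrm{plim}\,\hat{\beta}_{1} = \beta_{1}\,\frac{\sigma^{2}_{x^{*}}}{\sigma^{2}_{x^{*}}+\sigma^{2}_{\nu}}$

With $\sigma_{x^{*}} = 2$ and $\sigma_{\nu} = 8$ as in the code below, the factor is $4/68 \approx 0.06$, so $\mathrm{plim}\,\hat{\beta}_{1} \approx 0.12$, matching the estimate of 0.125 in the output. In the Monte Carlo further below, $\sigma_{\nu} = 2$ instead, giving a factor of $4/8 = 0.5$, which is why the histogram centers near 1 rather than at the true value of 2.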

% Now suppose the explanatory variable is measured with error
v =  8.*randn(N,1);
x2 = x2_star +v;

X = [ones(N,1) x2];

% OLS estimation
[b] = mc_ols(y_star, X, names, 0, 1);
y_pred = X*b;

figure(4)
hold on
plot(x2,y_star,'kx','MarkerFaceColor','k')
plot(x2, y_pred,'m-','linewidth',1.2)
ylabel('$y_{i}$', 'interpreter','latex','fontsize',16)
xlabel('$x_{2,i}$','interpreter','latex','fontsize',16)
xlim([2,18])
ylim([5,40])
title('Measurement error in the independent variable')

% Verify this with a Monte Carlo experiment:

K = size(X,2);
MC_betas = nan(no_reps,K);

 for i = 1:no_reps
    e_rep = sqrt(4).*randn(N,1);  %The error term of the regression
    y_star = X_star*betas +e_rep;
    v = 2.*randn(N,1);
    x2 = x2_star +v;       %Classical additive measurement error
    X = [ones(N,1) x2];
    MC_betas(i,:) = (ols_func(X,y_star))';
 end


figure(5)
hold on
histfit(MC_betas(:,2))
p1 = xline(mean(MC_betas(:,2)),'k-','linewidth',1.2);
p2 = xline(2,'m-','linewidth',1.2);
title('$\hat{\beta}_{1}$','Interpreter','latex')
legend([p1,p2],{'Biased Estimate','True'})
*********************************************************
OLS estimation results
Observations 1000
R-squared 0.049470
Sigma-squared 21.048797

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}      20.894       0.229      91.191       0.000
beta_{1}       0.125       0.017       7.207       0.000

*********************************************************

Dealing with measurement error

  1. IV
  2. 2SLS

Example: Instrumental Variable (IV) estimation.

n = 1000;

z1 = 3.*randn(n,1);
z2 = 2.*randn(n,1);

x = 3-2*z2+4*z1+randn(n,1);
y = 2+2*x +12*z2+randn(n,1);

% Case with no omitted variable

X = [ones(n,1) x z2];
names = char("beta_{0}", "beta_{1}", "beta_{2}");
b = mc_ols(y, X, names, 0, 1);
*********************************************************
OLS estimation results
Observations 1000
R-squared 0.998771
Sigma-squared 0.998597

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       2.043       0.033      62.688       0.000
beta_{1}       2.001       0.003     770.861       0.000
beta_{2}      12.004       0.017     688.998       0.000

*********************************************************

Suppose we observe only x and z1, but not z2.
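The omitted-variable bias formula tells us what to expect. Since $x = 3 - 2z_{2} + 4z_{1} + u$ with $z_{1} \sim N(0,9)$ and $z_{2} \sim N(0,4)$, we have $Cov(x,z_{2}) = -2 \cdot 4 = -8$ and $Var(x) = 4 \cdot 4 + 16 \cdot 9 + 1 = 161$, so

$\mathrm{plim}\,\hat{\beta}_{1} = \beta_{1} + 12\,\frac{Cov(x,z_{2})}{Var(x)} = 2 - \frac{96}{161} \approx 1.40$

close to the estimate of 1.423 in the output below.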

Case with omitted variable:

X = [ones(n,1)  x];
names = char("beta_{0}", "beta_{1}");
[b] = mc_ols(y, X, names, 0, 1);
*********************************************************
OLS estimation results
Observations 1000
R-squared 0.413590
Sigma-squared 475.999738

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       3.591       0.710       5.059       0.000
beta_{1}       1.423       0.054      26.531       0.000

*********************************************************

Solution

We can use z1 as an instrument:

  1. z1 is correlated with x (relevance)
  2. z1 is uncorrelated with z2, and thus uncorrelated with the error term (exogeneity); this is what makes the IV estimator below consistent
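Under these two conditions, the just-identified IV estimator is

$\hat{\beta}_{IV} = (Z'X)^{-1}Z'y$

which is exactly what the commented line b_IV = inv(Z'*X)*Z'*y in the code below computes.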

Case 1: A very good instrument

disp('Check Correlation between x and z1')
disp(corr(x,z1))

Z = [ones(n,1) z1];

[b] = mc_olsIV(y,X,Z, names, 0, 1);
%b_IV= inv(Z'*X)*Z'*y
Check Correlation between x and z1
    0.9501


*********************************************************
IV estimation results
Observations 1000
R-squared 0.354973
Sigma-squared 523.580490

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       1.920       0.724       2.652       0.008
beta_{1}       1.959       0.119      16.477       0.000

*********************************************************

Two Stage Least Squares

b_hat_x = (Z'*Z)\(Z'*X);  % first stage: regress each column of X on Z
X_hat = Z*b_hat_x;        % first-stage fitted values

[b] = mc_ols(y,X_hat, names, 0, 1);
%b_2SLS = inv(X_hat'*X_hat)*X_hat'*y
*********************************************************
OLS estimation results
Observations 1000
R-squared 0.707395
Sigma-squared 237.512636

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       1.920       0.503       3.818       0.000
beta_{1}       1.959       0.040      49.120       0.000

*********************************************************
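In the just-identified case (one instrument for one endogenous regressor), 2SLS reproduces the IV estimator algebraically:

$\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y = (Z'X)^{-1}Z'y = \hat{\beta}_{IV}$

This is why the 2SLS point estimates above (1.920 and 1.959) match the IV results exactly. Only the reported standard errors differ, because the second-stage OLS formula computes residuals from $\hat{X}$ rather than $X$.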

Case 2: A good instrument

z1= 0.9.*randn(n,1);
x = 3- 2*z2+4*z1+randn(n,1);
y = 2+ 2*x +12*z2+randn(n,1);

disp('Check Correlation between x and z1')
disp(corr(x,z1))


X = [ones(n,1)  x];
Z = [ones(n,1) z1];

[b] = mc_olsIV(y,X,Z, names, 0, 1);
%b_IV= inv(Z'*X)*Z'*y
Check Correlation between x and z1
    0.6684


*********************************************************
IV estimation results
Observations 1000
R-squared -0.859892
Sigma-squared 553.530829

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       1.339       0.748       1.791       0.074
beta_{1}       2.130       0.412       5.170       0.000

*********************************************************
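Two remarks on this output. First, the negative R-squared is not a mistake: IV does not minimize the sum of squared residuals, so R-squared loses its usual interpretation. Second, the standard error on $\hat{\beta}_{1}$ jumps from 0.119 to 0.412 because, under homoskedasticity,

$Avar(\hat{\beta}_{1,IV}) = \frac{\sigma^{2}_{e}}{n\,\rho^{2}_{xz}\,\sigma^{2}_{x}}$

so a weaker correlation between $x$ and $z_{1}$ (and, here, also a smaller $Var(x)$, since $z_{1}$ now has variance $0.81$) inflates the IV standard error.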

Two Stage Least Squares

b_hat_x = (Z'*Z)\(Z'*X);  % first stage: regress each column of X on Z
X_hat = Z*b_hat_x;        % first-stage fitted values

[b] = mc_ols(y,X_hat, names, 0, 1);
%b_2SLS = inv(X_hat'*X_hat)*X_hat'*y



*********************************************************
OLS estimation results
Observations 1000
R-squared 0.194707
Sigma-squared 239.666920

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       1.339       0.645       2.076       0.038
beta_{1}       2.130       0.137      15.534       0.000

*********************************************************

Case 3: A weak instrument

z1= 0.07.*randn(n,1);
x = 3- 2*z2+4*z1+randn(n,1);
y = 2+ 2*x +12*z2+randn(n,1);

disp('Check Correlation between x and z1')
disp(corr(x,z1))


X = [ones(n,1)  x];
Z = [ones(n,1) z1];

[b] = mc_olsIV(y,X,Z, names, 0, 1);
%b_IV= inv(Z'*X)*Z'*y
Check Correlation between x and z1
    0.0707


*********************************************************
IV estimation results
Observations 1000
R-squared -0.326477
Sigma-squared 319.804815

Results (Ordinary var-cov estimator)

          estimate   st.err.   t-stat.   p-value  
beta_{0}       5.832       0.417      14.000       0.000
beta_{1}       0.632       4.026       0.157       0.875

*********************************************************
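A standard diagnostic for this situation is the first-stage F-statistic: regress x on the instruments and test the excluded instrument; a common rule of thumb treats F below roughly 10 as a sign of a weak instrument. The following sketch (not part of the original script) computes it with the ols helper defined in the Functions section, reusing x and Z from Case 3:

% Weak-instrument check: with a single instrument, F = t^2
[b_fs, sig_fs] = ols(x, Z);           % first stage: x on [1 z1]
V_fs = sig_fs.*inv(Z'*Z);             % ordinary var-cov estimator
F_fs = (b_fs(2)/sqrt(V_fs(2,2)))^2;   % first-stage F-statistic
disp('First-stage F-statistic')
disp(F_fs)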

Two Stage Least Squares

b_hat_x = (Z'*Z)\(Z'*X);  % first stage: regress each column of X on Z
X_hat = Z*b_hat_x;        % first-stage fitted values

[b] = mc_ols(y,X_hat, names, 0, 1);
%b_2SLS = inv(X_hat'*X_hat)*X_hat'*y



Functions

%-------------------------------------------------------------

function [beta,sigma,r]= ols(y,x)
% Simple OLS regression
t = size(x,1);
beta = (x'*x)\(x'*y); % backslash solves the normal equations (preferred to inv)
sigma = (y-x*beta)'*(y-x*beta)/(t-rank(x));
r = y - x*beta;

end
%-------------------------------------------------------------


function prettyprint(mat, rlabels, clabels)
%{
This function prints matrices with row and column labels

Copyright (C) 2010 Michael Creel <michael.creel@uab.es>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation
%}
	% left pad the column labels
	a = size(rlabels,2);
	for i = 1:a
		fprintf(' ');
	end
	fprintf('  ');

	% print the column labels
    try
	clabels = ['        ';clabels]; % pad to 8 characters wide
    catch
        % padding failed; keep the labels as given
    end
	clabels = strjust(clabels,'right');

	k = size(mat,2);
	for i = 1:k
		fprintf('%s  ',clabels(i+1,:));
	end

	% now print the row labels and rows
	fprintf('\n');
	k = size(mat,1);
	for i = 1:k
		if ischar(rlabels(i,:))
			fprintf(rlabels(i,:));
		else
			fprintf('%i', rlabels(i,:));
		end
		fprintf('  %10.3f', mat(i,:));
		fprintf('\n');
	end
end


function result = eemult_mv(m,v)

	if not(ismatrix(m))
		error("eemult_mv: first arg must be a matrix");
    end

	if not(isvector(v))
		error("eemult_mv: second arg must be a vector");
    end

	[rm, cm] = size(m);
	[rv, cv] = size(v);

	if (rm == rv)
		v = kron(v, ones(1,cm));
		result = m .* v;
	elseif (cm == cv)
		v = kron(v, ones(rm, 1));
		result = m .* v;
	else
		error("eemult_mv: dimension of vector must match one of the dimensions of the matrix");
    end
end
%-------------------------------------------------------------

function [b, varb, e, ess] = mc_ols(y, x, names, silent, regularvc)
%{

Copyright (C) 2010 Michael Creel <michael.creel@uab.es>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation

Calculates ordinary LS estimator using the Huber-White heteroscedastic
consistent variance estimator.

 inputs:
 y: dep variable
 x: matrix of regressors
 names (optional) names of regressors
 silent (bool) default false. controls screen output
 regularvc (bool) default false. use normal varcov estimator, instead of het consistent (default)

 outputs:
 b: estimated coefficients
 varb: estimated covariance matrix of coefficients (Huber-White by default, ordinary OLS if requested with switch)
 e: ols residuals
 ess: sum of squared residuals

%}
	k = size(x,2);

	if nargin < 5, regularvc = 0; end
	if nargin < 4, silent = 0; end
	if (nargin < 3) || (size(names,1) ~= k)
		names = 1:k;
		names = names';
	end

	[b, sigsq, e] = ols(y,x);



	xx_inv = inv(x'*x);
	n = size(x,1);

	ess = e' * e;

	% Ordinary or het. consistent variance estimate
	if regularvc==1
		varb = xx_inv*sigsq;
	else
		% Huber-White sandwich estimator (the default):
		% inv(X'X) * sum_i e_i^2 x_i x_i' * inv(X'X),
		% using eemult_mv to scale row i of x by e(i)^2
		varb = xx_inv * (x' * eemult_mv(x, e.^2)) * xx_inv;
	end

	seb = sqrt(diag(varb));
	t = b ./ seb;

	tss = y - mean(y);
	tss = tss' * tss;
	rsq = 1 - ess / tss;

	labels = char('estimate', 'st.err.', 't-stat.', 'p-value');
	if silent==0
		fprintf('\n*********************************************************\n');
		fprintf('OLS estimation results\n');
		fprintf('Observations %d\n',n);
		fprintf('R-squared %f\n',rsq);
		fprintf('Sigma-squared %f\n',sigsq);
		p = 2 - 2*tcdf(abs(t), n - k);
		results = [b, seb, t, p];
		if regularvc
			fprintf('\nResults (Ordinary var-cov estimator)\n\n');
		else
			fprintf('\nResults (Het. consistent var-cov estimator)\n\n');
		end
		prettyprint(results, names, labels);
		fprintf('\n*********************************************************\n');
	end

end
%-------------------------------------------------------------

function [b, varb, e, ess] = mc_olsIV(y, x, z, names, silent, regularvc)
%{

Copyright (C) 2010 Michael Creel <michael.creel@uab.es>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation

Calculates the instrumental variables (IV) estimator, with z as the
matrix of instruments.

 inputs:
 y: dep variable
 x: matrix of regressors
 z: matrix of instruments
 names (optional) names of regressors
 silent (bool) default false. controls screen output
 regularvc (bool) default false. use normal varcov estimator, instead of het consistent (default)

 outputs:
 b: estimated coefficients
 varb: estimated covariance matrix of coefficients (Huber-White by default, ordinary OLS if requested with switch)
 e: ols residuals
 ess: sum of squared residuals

%}
	k = size(x,2);

	if nargin < 6, regularvc = 0; end
	if nargin < 5, silent = 0; end
	if (nargin < 4) || (size(names,1) ~= k)
		names = 1:k;
		names = names';
	end

	%[b, sigsq, e] = ols(y,x);
    t = size(x,1);
    b = inv(z'*x)*z'*y;
    sigsq = (y-x*b)'*(y-x*b)/(t-rank(x));
    e = y - x*b;



	xx_inv = inv(z'*x);
	n = size(x,1);

	ess = e' * e;

	% Ordinary or het. consistent variance estimate
	if regularvc==1
		varb = xx_inv*sigsq;
	else
		% Huber-White sandwich for IV:
		% inv(Z'X) * Z'diag(e.^2)Z * inv(X'Z)
		varb = xx_inv * (z' * eemult_mv(z, e.^2)) * xx_inv';
	end

	seb = sqrt(diag(varb));
	t = b ./ seb;

	tss = y - mean(y);
	tss = tss' * tss;
	rsq = 1 - ess / tss;

	labels = char('estimate', 'st.err.', 't-stat.', 'p-value');
	if silent==0
		fprintf('\n*********************************************************\n');
		fprintf('IV estimation results\n');
		fprintf('Observations %d\n',n);
		fprintf('R-squared %f\n',rsq);
		fprintf('Sigma-squared %f\n',sigsq);
		p = 2 - 2*tcdf(abs(t), n - k);
		results = [b, seb, t, p];
		if regularvc
			fprintf('\nResults (Ordinary var-cov estimator)\n\n');
		else
			fprintf('\nResults (Het. consistent var-cov estimator)\n\n');
		end
		prettyprint(results, names, labels);
		fprintf('\n*********************************************************\n');
	end

end