
# Author: espin086

## Quantifying Multicollinearity

Multicollinearity is one of the more serious problems that can arise in a regression analysis, or even in a simple frequency and mean analysis with a causal interpretation. Regression analysis assumes a degree of independence between the explanatory factors. In practice, however, many explanatory variables are correlated with each other. Imagine you are trying to understand which factors are leading causes of lung cancer. A host of socioeconomic and demographic factors can be correlated with behaviors such as smoking. It may be that people who smoke are more likely to 1) work in factories with carcinogens, 2) have poorer diets, or 3) live closer to toxic waste dumps or under power lines. How can one separate out the important causal risk factors for the development of lung cancer when all of these factors move together? This inseparability is the essence of what econometricians call the multicollinearity problem. Fortunately, we don’t have to guess or speculate about the severity of this problem. One can use a regression analysis to quantify it. The statistic is called the Variance Inflation Factor (VIF), and it is computed from the R-squared of the regression of the suspected multicollinear variable on the other explanatory variables: VIF = 1 / (1 − R-squared). Here is the formal mathematical syntax for the VIF in the context of a causal regression between smoking and lung cancer. This is...
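As a rough illustration of the VIF calculation, here is a minimal Python sketch using only the standard library. It uses the standard definition VIF = 1 / (1 − R-squared), where R-squared comes from regressing one explanatory variable on the others. The data are simulated and the variable names (`smoking`, `factory_work`, `diet`) are hypothetical, chosen only to echo the lung cancer example; none of this comes from the original post's data.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(y, predictors):
    """R-squared from an OLS regression of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_i for r, y_i in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((y_i - f) ** 2 for y_i, f in zip(y, fitted))
    ss_tot = sum((y_i - y_bar) ** 2 for y_i in y)
    return 1.0 - ss_res / ss_tot

def vif(target, others):
    """Variance Inflation Factor of `target` given the other explanatory variables."""
    return 1.0 / (1.0 - r_squared(target, others))

# Simulated (made-up) data: factory_work moves closely with smoking, diet does not.
random.seed(0)
n = 500
smoking = [random.gauss(0, 1) for _ in range(n)]
factory_work = [0.9 * s + random.gauss(0, 0.3) for s in smoking]
diet = [random.gauss(0, 1) for _ in range(n)]

vif_smoking = vif(smoking, [factory_work, diet])
vif_diet = vif(diet, [smoking, factory_work])
print(f"VIF(smoking) = {vif_smoking:.2f}, VIF(diet) = {vif_diet:.2f}")
```

In this simulated setup the VIF for `smoking` should be large because `factory_work` was constructed to track it, while the VIF for `diet` should stay close to 1. A common rule of thumb treats values above 5 or 10 as a sign of problematic multicollinearity, though the cutoff is a judgment call.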

## Programming in R: Modelling Investment Portfolios with Matrix Algebra

An investment portfolio is a collection of investments. These investments can be anything, including real estate, merchandise inventory, or a collection of businesses in a multinational corporation. However, the term is most commonly used to describe an investment in stocks and bonds in financial markets. Whatever the context may be, a portfolio is a collection of assets that are purchased at a certain price, held for a certain time (possibly generating income or costs during the holding period), and then sold for a profit or loss. Matrix algebra is a branch of mathematics that is often used to model investment portfolios. The goal of this post is to introduce the use of matrix algebra, via the programming language R, to answer commonly asked questions about investment portfolios in stocks. The expected return and the riskiness of the portfolio will be analyzed both analytically and computationally.

1. Vectors and Matrix Definition

The following example is for 3 assets but could easily be extended to a many-asset representation of the portfolio problem. The following notation is used to represent the asset returns, their joint normal distribution, expected returns, variance of returns, and the covariance of returns. R represents the asset return for investments A, B, C. The returns are distributed as a multivariate normal, mu subscript i is the expected return for asset i, sigma squared subscript i is the variance of returns for asset i,...
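The matrix formulation can be sketched in a few lines. The example below is written in Python rather than R, and every number in it (the weights, expected returns, and covariance matrix for assets A, B, C) is a hypothetical value chosen for illustration, not taken from the post. It computes the portfolio expected return as w'mu and the portfolio variance as w'Σw.

```python
import math

# Hypothetical inputs for assets A, B, C (illustrative values only).
weights = [0.5, 0.3, 0.2]        # portfolio weights w, summing to 1
mu = [0.08, 0.05, 0.03]          # expected returns mu_i
sigma = [                        # covariance matrix: variances on the diagonal
    [0.04, 0.01, 0.00],
    [0.01, 0.02, 0.005],
    [0.00, 0.005, 0.01],
]

def portfolio_expected_return(w, mu):
    """Expected portfolio return: the inner product w' mu."""
    return sum(w_i * mu_i for w_i, mu_i in zip(w, mu))

def portfolio_variance(w, sigma):
    """Portfolio variance: the quadratic form w' Sigma w."""
    n = len(w)
    return sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))

expected_return = portfolio_expected_return(weights, mu)
variance = portfolio_variance(weights, sigma)
volatility = math.sqrt(variance)  # standard deviation of portfolio returns

print(f"expected return = {expected_return:.4f}")
print(f"variance        = {variance:.4f}")
print(f"volatility      = {volatility:.4f}")
```

With these made-up inputs the expected return works out to 0.061 and the variance to 0.0158. In R the same quantities would typically be written with the `%*%` matrix product and `t()` for the transpose.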

## The Least Squares Assumptions

This post presents the ordinary least squares (OLS) assumptions. The assumptions are important for understanding when OLS will and will not give useful results. The objective of this post is to define the assumptions of ordinary least squares; another post will address methods for identifying violations of these assumptions and potential solutions for dealing with them.

ASSUMPTION #1: The conditional distribution of the error term, given a level of the independent variable X, has a mean of zero. This assumption states that the regression errors will on average be equal to zero. It still allows for over- and underestimates of Y, but implies that the OLS estimates will fluctuate around the true value of Y.

ASSUMPTION #2: (X, Y) for all n observations are independently and identically distributed. The second assumption requires that the observations of X and Y are not systematically chosen in a way that is biased. Typically, randomly selected samples of X and Y are considered to be independent and identically distributed. This assumption is important when the aim of the regression analysis is to look at the effect of a treatment X on an outcome Y. If the treatment isn’t randomly assigned, there is no guarantee that the outcome Y is caused by X. Suppose that one is evaluating a program that provides job-training to prisoners and would like...

## Election Outcomes and Economic Performance

The following links contain the dataset and the STATA program used to generate the econometric estimates found in this post.

- Data: Election Outcomes and Economic Performance (1996)
- STATA Program: Election Outcomes and Economic Performance

This blog post replicates the analysis of the relationship between economic performance and election outcomes done by Fair (1996). The regression model will be used to predict the likelihood that President Obama is reelected based on several important political and economic variables identified in the analysis. Outlined below are the variables, data, and regression model predictions. These estimates suggest that President Obama will lose the next election by a narrow margin, receiving 49.4% of the popular vote.

VARIABLES

The model will use several variables to predict the percentage of votes going to Democratic presidential candidates (demvote) based on important political and economic factors. The variables are:

- incum = takes on a value of 1 if a Democrat is the incumbent and -1 if the incumbent is Republican
- partyWH = takes on a value of 1 if a Democrat is in the White House and -1 otherwise
- gnews = number of quarters, from the first 15 quarters of the incumbent presidency, where growth in per capita output was above 2.9 percent
- inf = average annual inflation rate in the first 15 quarters of the incumbent presidency

There are also several interaction terms between the variable partyWH and gnews and inf that are used...

## Evaluation of 1Q-2011 GDP Forecast: Missed by only 1/3 of 1%

A few months ago I created an ARIMA forecast for the 4th quarter of 2010 and the 1st quarter of 2011 GDP numbers. My forecast proved to be very accurate in predicting the growth of the economy over the last 6 months. Please see my previous post for the 4th quarter evaluation.

- Evaluation of Actual vs. My Forecast of GDP for 4th Quarter 2010: http://espin086.wordpress.com/2011/02/03/evaluation-of-previous-posts-gdp-forecast-off-by-15-of-1/
- My Forecast for 1st Quarter 2011 GDP: http://espin086.wordpress.com/2011/01/16/gdp-forecast-for-2010q4-and-2011q1-box-jenkins-methodology-and-arima-forecast-model/
- Actual 1st Quarter 2011 GDP: http://www.bea.gov/newsreleases/national/gdp/gdpnewsrelease.htm

In the 4th quarter my forecast for US GDP was off by 1/5 of 1%. The first quarter forecast was nearly as accurate, missing by only 1/3 of 1%. I forecasted that GDP in the first quarter of 2011 would be 15,062 billion dollars, and actual GDP came in at 15,006 billion dollars. Actual GDP grew by 1.9% in the first quarter of 2011, compared to my estimate of 1.8%. Although these forecasts are extremely accurate, I believe that a multivariate approach such as a Vector Autoregressive process would provide even greater accuracy. Given the uncertainty of the current economic environment, accurate forecasts can aid in better planning and risk mitigation for both governments and...
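The "1/3 of 1%" figure can be checked with a short calculation. The sketch below uses only the two GDP levels quoted in the text; measuring the error as a percentage of actual GDP is an assumption about how the miss was computed.

```python
# GDP levels in billions of dollars, as quoted in the post.
forecast_gdp = 15062.0
actual_gdp = 15006.0

# Forecast error in levels, and as a percentage of actual GDP
# (assumed definition of the percentage miss).
error = forecast_gdp - actual_gdp
pct_error = 100.0 * error / actual_gdp

print(f"forecast error = {error:.0f} billion dollars")
print(f"percent error  = {pct_error:.2f}%")
```

The forecast overshoots by 56 billion dollars, or roughly 0.37% of actual GDP, which is in line with the headline figure of about one third of one percent.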