Torturing the statistics until they confess

ANYONE who has spent any amount of time reading economic journals will have come across a number of instances where people with PhDs in econometrics and many years' modelling experience criticise complex research published by other people with PhDs in econometrics and many years' modelling experience.

Unfortunately, even far simpler presentations of statistics can mislead, and they deserve the same scrutiny. This report deals with a few of the more basic issues, from the perspective of a user of economic statistics in South Africa.

There are two ways in which economic data tends to be presented to SA's investment community. The first is by pretty graphs, which show good "eyeball" relationships (only sometimes with any attempt to actually establish a statistical relationship between the variables presented). The second is a bombardment of seemingly impressive correlations between variables.

Playing with pictures

Charts showing seemingly good relationships between variables are the default method of presenting theories about the way economies and markets work. The two questions to ask when confronted with a chart are: Why has the analyst chosen this particular timeframe to present the data? And why has the analyst chosen these particular Y-axis scales?

The choice of timeframe in particular is open to abuse. For example, I've seen a chart which "proves" that the US Federal Reserve cannot influence inflation, by illustrating high inflation rates in the US.

However, that chart starts at the first oil price shock in 1973 and ends before the first inflation hawk, Paul Volcker, took over at the Fed. It tells us absolutely nothing about how the Fed currently reacts to inflation.

Sometimes there are valid reasons to use timescales that are shorter than the available data - for example, if there's been a structural change in the variables being measured (indeed, that would be a good reason to exclude, for example, the above-mentioned time period when looking at how the Fed responds to inflation).

Often, however, longer timescales simply don't suit the analysis or don't "look" as good (which is why the back-up of a correlation coefficient is always useful).

For example, Chart 1 shows the relationship between the rand/US dollar exchange rate and the JP Morgan Emerging Markets Bond index spread since mid-2002. The chart looks great.

One might conclude that these two will continue to move in lockstep, and watch carefully what global emerging market bonds are doing to help predict the rand.

However, Chart 2, which shows the same relationship on a more extended timescale, gives a very different picture.

There are long periods of time when there's very little relationship between the two and periods when the correlation is even negative rather than positive.

Furthermore, on these different Y-axis scales even the good relationship of the past two years does not appear quite as good as it does in the chart above (of course, the statistical relationship remains the same).

It so happens that in this instance there are logical reasons to explain the periods of breakdown in the relationship.

However, while the first chart would imply that one can simply expect the relationship to continue, the second chart would warn that one needs to be aware of the type of circumstances that could see the relationship break down.

The implication is very different.

Going beyond "eyeballing" to examine the statistics brings the differences home even more powerfully. In Chart 1 the correlation coefficient is an impressive 0,96. In Chart 2, it's -0,14! Not only is the relationship far weaker than that presented in the first chart - indeed, far too weak to be considered statistically useful - it now presents itself as a negative rather than a positive correlation.

So, two charts of the same variables can lead to very different conclusions, simply depending on the timescales chosen.
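
The effect is easy to reproduce. The sketch below is an illustration only: it uses synthetic numbers, not the actual rand or EMBI spread data, and assumes a series in which the two variables are weakly and negatively related for 96 months and then move closely together for the most recent 24. The `pearson` helper and all parameter values are assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic data only: 96 "early" months with a weak negative link,
# followed by 24 "recent" months in which the two series move together.
x, y = [], []
for month in range(120):
    xv = random.gauss(0, 1)
    if month < 96:
        yv = -0.5 * xv + random.gauss(0, 1)
    else:
        yv = xv + random.gauss(0, 0.2)
    x.append(xv)
    y.append(yv)

recent_corr = pearson(x[-24:], y[-24:])  # what the "short" chart would show
full_corr = pearson(x, y)                # what the "long" chart would show

print(f"last 24 months: {recent_corr:.2f}")
print(f"all 120 months: {full_corr:.2f}")
```

Reporting the coefficient for more than one window, rather than only for the window that looks best, is one simple way to make the choice of timeframe visible to the reader.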

Of course, sometimes timescales can be too long. Structural changes in economies and markets change relationships.

For example, in an environment of structurally lower inflation, a given GDP growth rate will be associated with a lower level of inflation than history may suggest.

Or one may be presented with data that one's told will return to a 40- or 50-year mean - even though there may have been significant structural changes over that time that will have led to a change in equilibrium value of the variable. (And it's sometimes claimed that a variable will revert to a mean when there have been no tests conducted to even establish whether or not it is, in fact, mean-reverting in the first place.)

Correlation causes... conclusions?

A good correlation by itself means nothing. A well-known study in Sweden found a high correlation between stork sightings and the number of births in certain regions of the country. Do we conclude that storks bring babies?

Studies in Britain have found that asthma prevalence has risen as computer chip speed has improved. Do faster computers cause asthma?

Despite the fact that the first thing drummed into any economics student is that "correlation does not mean causation" there are still too many instances where a correlation is presented as some kind of "proof".

If the analyst cannot explain the relationship, one has to conclude that the correlation is spurious. That's data mining - simply looking for good statistical relationships without an underlying justification for the relationship. These analysts "use statistics much like a drunk man uses a lamp post, more for support than illumination".

Simply put, if one doesn't understand the theoretical underpinning of a relationship, there's little reason to trust it to continue in the future.
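
One common source of meaningless correlations is a shared trend. The following sketch, again on synthetic data, builds two series that have nothing to do with each other except that both drift upwards over time. Their levels are almost perfectly correlated; their period-to-period changes are not. Comparing the two is one simple check, offered here as an illustration rather than a complete test for spuriousness, and the series names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def diffs(series):
    """Period-to-period changes of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


random.seed(1)  # fixed seed so the synthetic data is reproducible
n = 200

# Two unrelated synthetic series; the only thing they share is an upward trend.
series_a = [1.0 * t + random.gauss(0, 5) for t in range(n)]
series_b = [2.0 * t + random.gauss(0, 5) for t in range(n)]

levels_corr = pearson(series_a, series_b)
changes_corr = pearson(diffs(series_a), diffs(series_b))

print(f"correlation of levels:  {levels_corr:.2f}")
print(f"correlation of changes: {changes_corr:.2f}")
```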

There are often correlations that are good only because both variables are dependent on a third variable. A pertinent example is the relationship between the rand/US dollar exchange rate and that of the Australian dollar.

The correlation is good - but not because the rand is dependent on movements in Antipodean currencies (or vice versa).

Rather, both countries are commodity exporters and their currencies are therefore influenced by commodity prices; and both are high-yielding currencies in a global context and therefore are influenced by global risk appetite.
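
The third-variable effect can be sketched with synthetic data. Below, two made-up currency series are each driven by the same made-up commodity factor plus independent noise. The raw correlation between them is high, but once the common factor is stripped out of both (a simple partial correlation) almost nothing is left. All names and coefficients are assumptions chosen for illustration, not estimates for the rand or the Australian dollar.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def residuals(ys, xs):
    """Residuals from a one-variable least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


random.seed(2)  # fixed seed so the synthetic data is reproducible
n = 500

# Hypothetical common driver and two hypothetical currencies that depend on it.
commodity = [random.gauss(0, 1) for _ in range(n)]
currency_a = [2.0 * c + random.gauss(0, 0.5) for c in commodity]
currency_b = [1.5 * c + random.gauss(0, 0.5) for c in commodity]

raw_corr = pearson(currency_a, currency_b)
# Correlation between what is left of each currency after removing the driver.
partial_corr = pearson(residuals(currency_a, commodity),
                       residuals(currency_b, commodity))

print(f"raw correlation:     {raw_corr:.2f}")
print(f"partial correlation: {partial_corr:.2f}")
```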

Incidentally, there are also reasons for the occasional good relationship between the rand and the EMBI spread.

Using the EMBI spread - or, perhaps, the Australian dollar - as a summary of, or proxy for, global factors that affect the rand is fine as long as it's understood that this is the case and that one's not being presented with some kind of "causal" link.

For causal relationships one should rather use - for example - commodity prices directly as an input into a rand model and understand why one's doing so.

Finally, models are sometimes presented with umpteen variables and amazingly high R-squareds. But adding more variables will raise the R-squared anyway - even if they make no economic sense in the equation.
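
A small experiment makes the point. In the sketch below, a synthetic dependent variable is driven by one genuine explanatory variable; ten columns of pure noise are then added to the regression one at a time. The R-squared never falls as junk is added, while the adjusted R-squared, which penalises each extra variable, tells a more sober story. The least-squares routine is a bare-bones illustration written for this example, and the data and parameter values are assumptions.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of an OLS fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


random.seed(3)  # fixed seed so the synthetic data is reproducible
n = 40

driver = [random.gauss(0, 1) for _ in range(n)]           # genuine variable
y = [1.0 + 0.8 * d + random.gauss(0, 1) for d in driver]  # dependent variable
junk = [[random.gauss(0, 1) for _ in range(n)] for _ in range(10)]  # noise

results = []
for extra in range(len(junk) + 1):
    k = 1 + extra  # number of explanatory variables, excluding the intercept
    r2 = r_squared([driver] + junk[:extra], y)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    results.append((k, r2, adj_r2))
    print(f"{k:2d} variables: R-squared {r2:.3f}, adjusted {adj_r2:.3f}")
```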

Another problem - one that seems to be fairly common in some models presented in SA - is that some of the explanatory variables are themselves correlated (multicollinearity), which makes the individual coefficient estimates unstable and hard to interpret, however impressive the R-squared looks.

For example, one might use the rand and inflation to explain interest rates - but there's obviously a relationship between the rand and inflation in the first instance. The textbooks tell us we can adjust for these factors; in practice it's not always clear-cut.

These issues aside, it's also important to understand what the purpose of the model is. Most are presented from the perspective "this is the best historical fit, therefore it's the best model to predict with".

First, only some of these are provided with an out-of-sample history to show that they are actually good at predicting.

Second, one has to watch for "garbage in, garbage out": a historical relationship may be good but how trustworthy, for example, is the commodity price forecast used as an input to a rand forecast model?

Finally, a model showing a high R-squared may be excellent at explaining the past - but the more variables involved, the greater is the difficulty in using it to forecast, simply because one then has to provide forecasts for each of those variables to forecast the dependent variable. In practice, that leads to a high degree of forecast risk.
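
The gap between historical fit and predictive power can be illustrated with a simple hold-out test. The sketch below fits a one-variable model on the first 80 observations of a synthetic series and then scores it on the final 40, during which the underlying relationship is assumed to have changed. The break point, coefficients and helper functions are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def r_squared(xs, ys, intercept, slope):
    """R-squared of the given line on the given data (can be negative)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


random.seed(4)  # fixed seed so the synthetic data is reproducible

# Synthetic data: the slope is 1.0 for 80 periods, then switches to -0.5.
x = [random.gauss(0, 1) for _ in range(120)]
y = [(1.0 if i < 80 else -0.5) * xi + random.gauss(0, 0.3)
     for i, xi in enumerate(x)]

train_x, train_y = x[:80], y[:80]  # history used to fit the model
test_x, test_y = x[80:], y[80:]    # data the model has never seen

intercept, slope = fit_line(train_x, train_y)
in_sample = r_squared(train_x, train_y, intercept, slope)
out_of_sample = r_squared(test_x, test_y, intercept, slope)

print(f"in-sample R-squared:     {in_sample:.2f}")
print(f"out-of-sample R-squared: {out_of_sample:.2f}")
```

A negative out-of-sample R-squared means the model predicts worse than simply using the average of the hold-out period, which is exactly the kind of result a good historical fit can hide.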

The bottom line is that it's very easy to manipulate charts and numbers.

However, it's important to know if the statistics back up the proposed relationship and equally important to know if the relationship makes economic sense in the first place.

The analyst should be able to explain the choice of variables and timeframe, and what kind of factors could cause a change in the relationship.

Coming up with a number is easy; interpreting it meaningfully isn't.

Chantal Valentine

VALENTINE joined Coronation a year ago. She's responsible for formulating the macroeconomic view and fixed interest strategy, as well as contributing to asset allocation decisions. She studied economics and finance. Prior to joining the asset management industry, she gained broad-based experience and has worked in academia, a mining house, a large bank and stockbroking over the past 14 years. She was rated the top analyst in both economic trends (domestic) and fixed interest securities while on the sell-side.
