#### Evaluating the system: Part 3a

Posted on January 6, 2011 at 3:00 AM

What if the distribution is not normal? An assumption in the t-test is that the underlying distribution of the data is normal. However, the distribution of profit/loss figures of a trading system is anything but normal, especially if there are stops and profit targets, as the distribution of profits and losses for trades taken by the Directional RSI system shows. Think about it for a moment. Rarely will a profit greater than the profit target occur. In fact, a lot of trades are going to bunch up with a profit equal to that of the profit target. Other trades are going to bunch up where the stop loss is set, with losses equal to that; and there will be trades that fall somewhere in between, depending on the exit method. The shape of the distribution will not be that of the bell curve that describes the normal distribution. This is a violation of one of the assumptions underlying the t-test. In this case, however, the Central Limit Theorem comes to the rescue. It states that as the number of cases in the sample increases, the distribution of the sample mean approaches normal. By the time there is a sample size of 10, the errors resulting from the violation of the normality assumption will be small, and with sample sizes greater than 20 or 30, they will have little practical significance for inferences regarding the mean. Consequently, many statistics can be applied with reasonable assurance that the results will be meaningful, as long as the sample size is adequate, as was the case in the example above, which had an n of 45.
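The Central Limit Theorem argument can be checked with a small simulation. The sketch below is illustrative only: the trade distribution (55% of trades stopped out at -500, 30% hitting a +1000 target, the rest exiting somewhere in between) is an invented assumption, not data from the Directional RSI system, and `simulate_trade` and `skew_of_sample_means` are hypothetical helper names. It measures how skewed the distribution of the sample mean is for different sample sizes; a value near zero indicates a roughly symmetric, more normal-looking distribution.

```python
import random
import statistics

def simulate_trade(rng):
    """One hypothetical trade P&L, bunched at the stop, at the target, or in between.
    The probabilities and dollar amounts are assumptions chosen for illustration."""
    u = rng.random()
    if u < 0.55:
        return -500.0                    # stopped out
    if u < 0.85:
        return 1000.0                    # profit target hit
    return rng.uniform(-500.0, 1000.0)   # exited somewhere in between

def skewness(values):
    """Third standardized moment; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def skew_of_sample_means(n, replications=4000, seed=1):
    """Draw many samples of n trades and return the skewness of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(simulate_trade(rng) for _ in range(n))
        for _ in range(replications)
    ]
    return skewness(means)

if __name__ == "__main__":
    for n in (1, 10, 45):
        print(f"n = {n:2d}: skewness of the sample mean = {skew_of_sample_means(n):.3f}")
```

Under these assumptions the skewness of the sample mean should shrink roughly in proportion to one over the square root of n, which is the sense in which larger samples make the normality violation less important.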

What if there is serial dependence? A more serious violation, which makes the above-described application of the t-test not quite cricket, is serial dependence, which is when cases constituting a sample (e.g., trades) are not statistically independent of one another. Trades come from a time series. When a series of trades that occurred over a given span of dates is used as a sample, it is not quite a random sample. A truly random sample would mean that, say, 100 trades were randomly taken from the period when the contract for the market started (e.g., 1983 for the S&P 500) to far into the future; such a sample would not only be less likely to suffer from serial dependence, but be more representative of the population from which it was drawn. However, when developing trading systems, sampling is usually done from one narrow point in time; consequently, each trade may be correlated with those adjacent to it and so would not be independent.

The practical effect of this, statistically, is to reduce the effective sample size. When trying to make inferences, if there is substantial serial dependence, it may be as if the sample contained only half or even one-fourth of the actual number of trades or data points observed. To top it off, the extent of serial dependence cannot definitively be determined. A rough “guesstimate,” however, can be made. One such guesstimate may be obtained by computing a simple lag/lead serial correlation: a correlation is computed between the profit and loss for Trade i and the profit and loss for Trade i + 1, with i ranging from 1 to n - 1. In the example, the Durbin-Watson statistic was 1.552. Since that statistic is approximately 2(1 - r), where r is the lag-one serial correlation, this corresponds to an r of roughly 0.22: not very high, but a lower correlation (a Durbin-Watson value closer to 2) would be preferable.
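As a rough illustration of the lag/lead calculation described above, the following sketch computes the correlation between each trade's profit or loss and that of the next trade, along with a Durbin-Watson statistic. The function names are hypothetical, and the Durbin-Watson value here is computed on deviations from the mean trade result, which is an assumption; the figure quoted for the example may have been computed on a different set of residuals.

```python
import math
import statistics

def lag1_correlation(pnl):
    """Pearson correlation between Trade i and Trade i + 1, for i = 1 .. n - 1."""
    x, y = pnl[:-1], pnl[1:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def durbin_watson(pnl):
    """Durbin-Watson statistic on deviations from the mean (assumed residuals).
    Values near 2 suggest little serial correlation; values below 2 suggest
    positive serial correlation, values above 2 negative serial correlation."""
    m = statistics.fmean(pnl)
    e = [p - m for p in pnl]
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

if __name__ == "__main__":
    # Made-up trade results, for demonstration only.
    trades = [250.0, -500.0, 1000.0, -500.0, -500.0, 1000.0, 320.0, -500.0]
    print("lag-1 correlation:", round(lag1_correlation(trades), 3))
    print("Durbin-Watson:    ", round(durbin_watson(trades), 3))
```

With only a few dozen trades, both numbers are themselves noisy estimates, so they are best read as a rough gauge rather than a precise measurement.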

Serial dependence is a serious problem. If there is a substantial amount of it, it would need to be compensated for by treating the sample as if it were smaller than it actually is. Another way to deal with the effect of serial dependence is to draw a random sample of trades from a larger sample of trades computed over a longer period of time. This would also tend to make the sample of trades more representative of the population.
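Both remedies can be sketched in a few lines. The shrinkage rule used here, n(1 - r)/(1 + r), is a common approximation that assumes the dependence between trades looks like a first-order autoregressive process; it is an assumption introduced for illustration, not a method stated in this article, and the function names are hypothetical. Under that rule, a lag-one correlation of one-third halves the effective sample size, and a correlation of 0.6 cuts it to one-fourth.

```python
import math
import random
import statistics

def effective_sample_size(n, r):
    """Approximate effective sample size for lag-1 correlation r (AR(1) assumption).
    Negative correlation is not credited: the result is capped at n."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return min(float(n), n * (1.0 - r) / (1.0 + r))

def adjusted_t_statistic(pnl, r):
    """t-statistic for mean profit > 0, using the reduced sample size in the
    standard error instead of the raw number of trades."""
    n_eff = effective_sample_size(len(pnl), r)
    std_error = statistics.stdev(pnl) / math.sqrt(n_eff)
    return statistics.fmean(pnl) / std_error

def random_subsample(pnl, k, seed=0):
    """Draw k trades at random (without replacement) from a longer trade history."""
    return random.Random(seed).sample(pnl, k)

if __name__ == "__main__":
    print(effective_sample_size(45, 0.22))   # 45 trades with mild serial correlation
```

If the adjusted statistic is used for a significance test, the degrees of freedom should be reduced to match the effective sample size as well.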

What if the markets change? When developing trading systems, a third assumption of the t-test may be inadvertently violated. There are no precautions that can be taken to prevent it from happening or to compensate for its occurrence. The reason is that the population from which the development or verification sample was drawn may be different from the population from which future trades may be taken. This would happen if the market underwent some real structural or other change. As mentioned before, the population of trades of a system operating on the S&P 500 before 1983 would be different from the population after that year since, in 1983, options and futures started trading on the S&P 500 and the market changed. This sort of thing can devastate any method of evaluating a trading system. No matter how much a system is back-tested, if the market changes before trading begins, the trades will not be taken from the same market for which the system was developed and tested; the system will fall apart. All systems, even currently profitable ones, will eventually succumb to market change. Regardless of the market, change is inevitable. It is just a question of when it will happen. Despite this grim fact, the use of statistics to evaluate systems remains essential, because if the market does not change substantially shortly after trading of the system commences, or if the change is not sufficient to grossly affect the system’s performance, then a reasonable estimate of expected probabilities and returns can be calculated.
