Personal Online Notedesk

Site Blog


Evaluating the system: Part 1

Posted on January 6, 2011 at 5:00 AM

Among the kinds of inferential statistics that are most useful to traders are t-tests. T-tests are useful for determining the probability that the mean or sum of any series of independent values (derived from a sampling process) is greater or less than some other such mean, equals a fixed number, or falls within a certain band. For example, t-tests can reveal the probability that the total profits from a series of trades, each with its individual profit/loss figure, could be greater than some threshold as a result of chance or sampling. These tests are also useful for evaluating samples of returns, e.g., the daily or monthly returns of a portfolio over a period of years. Finally, t-tests can help to set the boundaries of likely future performance (assuming no structural change in the market), making possible such statements as “the probability that the average profit will be between x and y in the future is greater than 95%”.


Although, for statistical reasons, the system developer should seek the largest sample possible, there is a trade-off between sample size and representativeness when dealing with the financial markets. Larger samples mean samples that go farther back in time, which is a problem because the market of years ago may be fundamentally different from the market of today (remember the S&P 500 in 1983?). This means that a larger sample may sometimes be a less representative sample, or one that confounds several distinct populations of data! Therefore, keep in mind that, although the goal is to have the largest sample possible, it is equally important to make sure the period from which the sample is drawn is still representative of the market being predicted.


Let us look at how statistics are used when developing and evaluating a trading system. The examples below employ a system that was optimized on one sample of data (the in-sample data) and then run (tested) on another sample of data (the out-of-sample data).



Evaluating the system: Part 2

Posted on January 6, 2011 at 4:00 AM

The parameters of the trading model have already been set. A sample of data was drawn from a period in the past, in this specific case, 1/2/1990 through 3/31/2000; this is the out-of-sample or verification data. The model was then run on this out-of-sample data, and it generated simulated trades. Forty-five trades were taken. This set of trades can itself be considered a sample of trades, one drawn from the population of all trades that the system took in the past or will take in the future; i.e., it is a sample of trades taken from the universe or population of all trades for that system. At this point, some inference must be made regarding the average profit per trade in the population as a whole, based on the sample of trades. Could the performance obtained in the sample be due to chance alone? To find the answer, the system must be statistically evaluated.



Trading Strategy:

  • System name: Directional RSI.
  • Alert when the Positive Directional Movement Index moves above the Negative Directional Movement Index.
  • Buy when the Relative Strength Index crosses above 50.
  • Sell when the Relative Strength Index crosses above 70.
  • Directional Movement Index parameter: 5.
  • Relative Strength Index parameter: 8.
  • Time frame: daily.
  • Initial equity: $ 10,000.00.


System function:


var Bar: integer;

for Bar := 1 to BarCount - 1 do
begin
  if not LastPositionActive then
  begin
    { Entry: +DI(5) above -DI(5) and RSI(8) crossing above 50 }
    if GetSeriesValue( Bar, DIPlusSeries( 5 ) ) > GetSeriesValue( Bar, DIMinusSeries( 5 ) ) then
      if CrossOverValue( Bar, RSISeries( #Close, 8 ), 50 ) then
        BuyAtMarket( Bar + 1, 'Buy' );
  end
  else
  begin
    { Exit: RSI(8) crossing above 70 }
    if CrossOverValue( Bar, RSISeries( #Close, 8 ), 70 ) then
      SellAtMarket( Bar + 1, LastPosition, 'Sell' );
  end;
end;




Trade History:

Evaluating the system: Part 3a

Posted on January 6, 2011 at 3:00 AM

What if the distribution is not normal? An assumption in the t-test is that the underlying distribution of the data is normal. However, the distribution of profit/loss figures of a trading system is anything but normal, especially if there are stops and profit targets, as the distribution of profits and losses for trades taken by the Directional RSI system illustrates. Think of it for a moment. Rarely will a profit greater than the profit target occur. In fact, a lot of trades are going to bunch up with a profit equal to that of the profit target. Other trades are going to bunch up where the stop loss is set, with losses equal to that; and there will be trades that will fall somewhere in between, depending on the exit method. The shape of the distribution will not be that of the bell curve that describes the normal distribution. This is a violation of one of the assumptions underlying the t-test. In this case, however, the Central Limit Theorem comes to the rescue. It states that as the number of cases in the sample increases, the distribution of the sample mean approaches normal. By the time there is a sample size of 10, the errors resulting from the violation of the normality assumption will be small, and with sample sizes greater than 20 or 30, they will have little practical significance for inferences regarding the mean. Consequently, many statistics can be applied with reasonable assurance that the results will be meaningful, as long as the sample size is adequate, as was the case in the example above, which had an n of 45.
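The Central Limit Theorem argument above can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the trade generator (a 25% chance of hitting a +400 profit target, otherwise a -100 stop) is hypothetical and is not the Directional RSI system. It compares the skewness of individual trades with the skewness of the means of samples of 45 trades.

```python
import random
import statistics

def skewness(values):
    """Skewness: third central moment divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

def simulate_trade(rng):
    """Hypothetical trade: 25% hit a +400 target, 75% hit a -100 stop."""
    return 400.0 if rng.random() < 0.25 else -100.0

rng = random.Random(42)  # fixed seed so the run is repeatable

# Individual trades: a two-point, strongly skewed distribution.
trades = [simulate_trade(rng) for _ in range(20000)]

# Each entry is the mean profit of one simulated sample of 45 trades.
sample_means = [
    statistics.fmean([simulate_trade(rng) for _ in range(45)])
    for _ in range(2000)
]

skew_trades = skewness(trades)
skew_means = skewness(sample_means)
print(f"skewness of individual trades:  {skew_trades:.2f}")
print(f"skewness of means of 45 trades: {skew_means:.2f}")
```

The individual trades are far from normal, while the sample means should come out much closer to symmetric, which is the property the t-test relies on.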


What if there is serial dependence? A more serious violation, which makes the above-described application of the t-test not quite cricket, is serial dependence, which is when cases constituting a sample (e.g., trades) are not statistically independent of one another. Trades come from a time series. When a series of trades that occurred over a given span of dates is used as a sample, it is not quite a random sample. A truly random sample would mean that the trades were randomly taken from the period when the contract for the market started (e.g., 1983 for the S&P 500) to far into the future; such a sample would not only be less likely to suffer from serial dependence, but be more representative of the population from which it was drawn. However, when developing trading systems, sampling is usually done from one narrow point in time; consequently, each trade may be correlated with those adjacent to it and so would not be independent.


The practical effect of this statistically is to reduce the effective sample size. When trying to make inferences, if there is substantial serial dependence, it may be as if the sample contained only half or even one-fourth of the actual number of trades or data points observed. To top it off, the extent of serial dependence cannot definitively be determined. A rough “guesstimate,” however, can be made. One such guesstimate may be obtained by computing a simple lag/lead serial correlation: a correlation is computed between the profit and loss for Trade i and the profit and loss for Trade i + 1, with i ranging from 1 to n - 1. In the example, the Durbin-Watson statistic was 1.552. A value of 2 indicates no serial correlation, and the statistic relates to the lag-1 correlation r roughly as DW ≈ 2(1 - r), so 1.552 corresponds to a correlation of about 0.22: not very high, but a value closer to 2 would be preferable.
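The lag/lead correlation and the Durbin-Watson statistic described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figures in this post: the function names are made up for the example, the Durbin-Watson statistic is computed on deviations from the mean trade (an assumption about how the residuals were defined), and the profit/loss list is hypothetical.

```python
from math import sqrt

def lag1_correlation(pnl):
    """Pearson correlation between trade i and trade i + 1."""
    x, y = pnl[:-1], pnl[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def durbin_watson(pnl):
    """Durbin-Watson statistic on deviations from the mean trade."""
    mean = sum(pnl) / len(pnl)
    resid = [p - mean for p in pnl]
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Hypothetical profit/loss figures, not the system's actual trades.
example_pnl = [120.0, 310.0, -80.0, 250.0, 90.0, 400.0, -60.0, 180.0]
print(f"lag-1 correlation: {lag1_correlation(example_pnl):.3f}")
print(f"Durbin-Watson:     {durbin_watson(example_pnl):.3f}")
```

A Durbin-Watson value near 2 goes with a lag-1 correlation near zero; values well below 2 indicate positive serial correlation, and values well above 2 indicate negative serial correlation.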


Serial dependence is a serious problem. If there is a substantial amount of it, it would need to be compensated for by treating the sample as if it were smaller than it actually is. Another way to deal with the effect of serial dependence is to draw a random sample of trades from a larger sample of trades computed over a longer period of time. This would also tend to make the sample of trades more representative of the population.
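One way to treat the sample as smaller than it is can be sketched as below. The adjustment n_eff = n(1 - r) / (1 + r) is a common approximation that assumes the dependence behaves like a first-order autoregressive process; that assumption, and the conversion r ≈ 1 - DW/2, are not part of the original analysis. The inputs are the trade count, mean, standard deviation and Durbin-Watson value reported for the example.

```python
from math import sqrt

def effective_sample_size(n, r):
    """Approximate effective n for a mean under AR(1)-type dependence."""
    return n * (1 - r) / (1 + r)

n, mean_trade, sd_trade = 45, 199.4936, 150.7709  # figures reported for the example
dw = 1.552                                        # reported Durbin-Watson statistic

r = 1 - dw / 2                         # approximate lag-1 serial correlation
n_eff = effective_sample_size(n, r)    # sample size after the adjustment
se_adj = sd_trade / sqrt(n_eff)        # standard error using the smaller n
t_adj = mean_trade / se_adj            # t-statistic against a mean of zero

print(f"r = {r:.3f}, effective n = {n_eff:.1f}")
print(f"adjusted standard error = {se_adj:.2f}, adjusted t = {t_adj:.2f}")
```

Under these assumptions the 45 trades count as fewer independent observations, and the t-statistic shrinks accordingly while remaining large.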


What if the markets change? When developing trading systems, a third assumption of the t-test may be inadvertently violated. There are no precautions that can be taken to prevent it from happening or to compensate for its occurrence. The reason is that the population from which the development or verification sample was drawn may be different from the population from which future trades may be taken. This would happen if the market underwent some real structural or other change. As mentioned before, the population of trades of a system operating on the S&P 500 before 1983 would be different from the population after that year since, in 1983, options and futures started trading on the S&P 500 and the market changed. This sort of thing can devastate any method of evaluating a trading system. No matter how much a system is back-tested, if the market changes before trading begins, the trades will not be taken from the same market for which the system was developed and tested; the system will fall apart. All systems, even currently profitable ones, will eventually succumb to market change. Regardless of the market, change is inevitable. It is just a question of when it will happen. Despite this grim fact, the use of statistics to evaluate systems remains essential, because if the market does not change substantially shortly after trading of the system commences, or if the change is not sufficient to grossly affect the system’s performance, then a reasonable estimate of expected probabilities and returns can be calculated.



Evaluating the system: Part 3b

Posted on January 6, 2011 at 2:00 AM

Normal Distribution test (Skewness-Kurtosis).


Serial Correlation test (Durbin-Watson).



Evaluating the system: Part 4

Posted on January 6, 2011 at 1:00 AM

Over the 10 years of data on which the system was optimized, there were 45 trades (n = 45). The mean or average trade yielded about $199.4936, and the trades were moderately variable, with a sample standard deviation of around $150.7709. The expected standard deviation of the mean (the standard error, i.e., the sample standard deviation divided by √45) suggests that if samples of this kind were repeatedly taken, the mean would vary only about one-seventh as much as the individual trades, and that many of the samples would have mean profitabilities in the range of $199.4936 ± $22.47549.


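The t-statistic for the best-performing system from the set of optimization runs was 8.876 (the mean trade divided by its standard error, with 44 degrees of freedom), which has a reported statistical significance of 0.000, i.e., a p-value that rounds to zero at three decimal places. This was a fairly strong result.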
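The arithmetic behind these figures can be reproduced from the reported mean, standard deviation and trade count. The sketch below is only a check of the numbers: the 95% confidence interval is an addition for illustration, and the critical value 2.015 (two-sided, 44 degrees of freedom) is taken from standard t tables rather than from the original analysis.

```python
from math import sqrt

n = 45                   # number of trades
mean_trade = 199.4936    # reported mean profit per trade
sd_trade = 150.7709      # reported sample standard deviation

# Standard error of the mean: how much the sample mean is expected to vary.
se = sd_trade / sqrt(n)

# One-sample t-statistic for the null hypothesis that the true mean is zero.
t_stat = mean_trade / se

# Two-sided 95% critical value for 44 degrees of freedom (from t tables).
T_CRIT_95 = 2.015
ci_low = mean_trade - T_CRIT_95 * se
ci_high = mean_trade + T_CRIT_95 * se

print(f"standard error = {se:.4f}")
print(f"t-statistic    = {t_stat:.3f}")
print(f"95% interval   = {ci_low:.2f} to {ci_high:.2f}")
```

The small difference between this standard error and the $22.47549 quoted above comes from rounding in the reported standard deviation.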


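The Durbin-Watson statistic was 1.552, while the critical values are dL = 1.48 and dU = 1.57. Because the statistic falls between dL and dU, the test is inconclusive: it does not establish positive serial correlation between trades, but it does not rule it out either. The statistical analyses discussed above are therefore likely to be approximately correct, although the possibility of mild serial dependence should be kept in mind.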


Sometimes we need to make sure that the system really makes a profit. To check this, we used the 25th percentile as the test value instead of zero.
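A test against a nonzero test value can be sketched as below. This is one possible reading of the procedure, offered as an assumption: the 25th percentile is taken to be that of the trade profit/loss figures, and the mean trade is tested against it. The function name and the profit/loss list are hypothetical.

```python
from math import sqrt
import statistics

def t_against(pnl, test_value):
    """One-sample t-statistic for H0: the mean trade equals test_value."""
    n = len(pnl)
    se = statistics.stdev(pnl) / sqrt(n)   # standard error of the mean
    return (statistics.fmean(pnl) - test_value) / se

# Hypothetical profit/loss figures, not the system's actual trades.
example_pnl = [120.0, 310.0, -80.0, 250.0, 90.0, 400.0, -60.0, 180.0]

# 25th percentile of the trades, used as the test value.
q1 = statistics.quantiles(example_pnl, n=4)[0]
t_q1 = t_against(example_pnl, q1)
print(f"25th percentile = {q1:.2f}, t against it = {t_q1:.3f}")
```

A large positive t-statistic here would indicate that the mean trade lies well above the chosen test value, which is a stricter requirement than lying above zero whenever that value is positive.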