2021-04-23

Exponential Moving Average Calculations in SQL Server

Financial systems use a wide variety of analysis functions to identify trends and derive other indicators from periodic price or volume data.  One of these is the Exponential Moving Average (EMA), which averages prices with more recent periods weighted more heavily than older ones.  I'm going to demonstrate a way this can be calculated in SQL Server's T-SQL in a set-based way.

The EMA has only one parameter: the number of periods, from which the weighting value is calculated.

The calculation is:

weight = 2 / ( number of periods + 1 )  [which will yield a value between 0 and 1]

EMA = CurrentPrice * weight + PreviousPeriodEMA * (1 - weight)

    = p0 * w + EMA1 * (1 - w)

where p0 is the price for the current period, EMA1 is the EMA value for the previous period, and w is the weighting factor.

As each EMA is calculated from the previous EMA, most solutions calculate this iteratively, but it can be calculated in a set-based manner.

The EMA is a sum of the price for each period, multiplied by a weighting based on the age of the period:

  • The current period price (p0) has been multiplied by the weight (w).
  • The previous period price (p1) has been multiplied by w and by (1-w), so w * (1-w) in total.
  • The period before that (p2) has been multiplied by w, then by (1-w), and then by (1-w) again, so w * (1-w) * (1-w) = w * (1-w)^2 in total.
  • The nth period ago (pn) has been multiplied by w * (1-w)^n.

This pattern continues, with each previous period being multiplied by an extra (1-w) factor all the way back until the weighting value becomes insignificantly small.  This pattern is similar to the way that compound interest calculations are performed.

We can therefore calculate the weighting value for a period of a given age by using the POWER() function, ready to be multiplied by the price for that period, and then a straight SUM() to wrap everything up.
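The equivalence between the iterative recurrence and this weighted-sum form is easy to check outside the database.  Here's a quick Python sketch (Python is used purely to verify the arithmetic; the prices are arbitrary sample data):

```python
# Compare the iterative EMA recurrence with the closed-form weighted sum
# p0*w + p1*w*(1-w) + p2*w*(1-w)^2 + ...  Both are seeded from zero so
# they share the same implicit starting point.
periods = 10
w = 2.0 / (periods + 1)

prices = [100 + (i % 7) * 0.5 for i in range(500)]  # arbitrary sample prices

# Iterative form: EMA = price * w + previous EMA * (1 - w)
ema = 0.0
for p in prices:
    ema = p * w + ema * (1 - w)

# Set-based form: each price weighted by w * (1-w)^age (age 0 = newest)
weighted = sum(p * w * (1 - w) ** age
               for age, p in enumerate(reversed(prices)))

assert abs(ema - weighted) < 1e-6
```

With 500 periods of preceding data the two forms agree to well below the precision we care about.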

There is an extra wrinkle.  At the start of the data, the initial EMA value should be seeded with a Simple Moving Average (SMA) of the price, which then feeds the iterative calculation above.  However, it is considerably more straightforward to run the EMA calculation over enough periods that the weighting becomes too small to matter at the desired precision level (remembering to account for the cumulative rounding errors).  As long as there is sufficient data preceding the periods being calculated, this gives the correct result.

First, calculate the number of periods we need to consider before the weighting function rounds to zero.

DECLARE @periods int = 10  -- the EMA period parameter
DECLARE @weight decimal(18,9) = 2.0 / (1.0 + @periods)
DECLARE @periodCount int = 10

WHILE ( @weight * POWER(1.0 - @weight, @periodCount) > 0 )
BEGIN
    SET @periodCount = @periodCount + 1
END

I'm using a decimal(18,9) throughout these calculations (which can store 9 digits to the left of the decimal point and 9 to the right), although in this case I'm expecting a result which is more in the range of decimal(9,3).  The extra storage of the decimal(18,9) allows us to disregard the rounding errors for hundreds of thousands of operations (and therefore rows).

At this precision, an EMA period value of 10 causes the weighting to become insignificantly small after around 95 periods.  A larger EMA period value needs more rows and is therefore more costly to run, so for large period values it may be worth calculating the precision, and therefore the number of rows, to a tighter tolerance, to avoid an unnecessarily costly query that fetches rows only to multiply them by zero.  Conversely, calculating the results to a lesser precision would cause the weighting to become insignificant sooner.
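The cut-off loop above can be mirrored outside SQL Server as a sanity check.  This Python sketch uses the decimal module quantised to 9 decimal places to stand in for decimal(18,9) (an approximation of T-SQL's type behaviour, not an exact reproduction):

```python
from decimal import Decimal, ROUND_HALF_UP

periods = 10
w = Decimal(2) / (Decimal(1) + periods)   # the weighting factor, ~0.1818

NINE_DP = Decimal("0.000000001")          # smallest step of decimal(18,9)

# Walk forward until the weighting for a period of this age rounds to
# zero at 9 decimal places.
period_count = 0
while (w * (1 - w) ** period_count).quantize(NINE_DP, ROUND_HALF_UP) > 0:
    period_count += 1
```

For a 10-period EMA this lands a little under 100 periods, consistent with the "around 95 periods" figure above.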

For the actual query, we need only INNER JOIN the table to itself so that each period has the data from the previous periods it needs, calculate the weighting value based on the period age, and then SUM() it all with a GROUP BY to collapse the record set back to one row per period.

DECLARE @periodData TABLE ( period int, closePrice decimal(10,2) )

SELECT period, SUM(weightedClose) EMA, AVG(closePrice) closePrice
FROM (
    SELECT PeriodData.period,
           PeriodData.closePrice,
           PeriodData.period - EmaData.period periodAge,
           EmaData.closePrice * @weight
               * POWER(1.0 - @weight, (PeriodData.period - EmaData.period)) weightedClose
    FROM @periodData PeriodData
    INNER JOIN @periodData EmaData
        ON EmaData.period BETWEEN (PeriodData.period - @periodCount)
                              AND PeriodData.period
) DVTBL
GROUP BY period
ORDER BY period DESC

Note that we make no special case for the current period, as the difference in period number will be zero, yielding (1-w)^0, which is 1.

Much of the calculation above is actually fairly static.  The number of periods to be considered and the weighting for each period age are constant for a given EMA period value, so they can be completely pre-calculated and stored in a table.  This would eliminate a lot of the run-time math and improve the speed.

As we said above, this calculation only yields the correct answer when there are enough previous periods, so we should also make certain that it only returns a result in that case.  If a period doesn't have enough periods before it to be SUM()med, it will yield a value that is simply too small, and therefore misleading.  Adding a HAVING clause to the query ensures that only periods with a sufficiently large maximum period age are returned.

... ) DVTBL
GROUP BY period
HAVING MAX(periodAge) >= @periodCount
ORDER BY period DESC
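The effect the HAVING clause guards against can be seen numerically.  This Python sketch uses a constant price of 100, so the true EMA is exactly 100, and shows how a period with too little history sums to a misleadingly small value:

```python
# Constant price of 100: the true EMA is therefore exactly 100, which
# makes it obvious when the weighted sum is short of weight.
w = 2.0 / (10 + 1)          # 10-period EMA weighting
price = 100.0

def weighted_sum(max_age):
    # Sum over however many prior periods are actually available.
    return sum(price * w * (1 - w) ** age for age in range(max_age + 1))

full = weighted_sum(99)     # plenty of history: converges to the true EMA
short = weighted_sum(5)     # only 5 prior rows exist for this period

assert abs(full - price) < 1e-6   # effectively 100
assert short < 75.0               # misleadingly small, hence the HAVING
```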

Hope this helps
