strategy choices and impact: May 2015

Introduction

Standard deviation is a statistical term that measures the amount of variability or dispersion around an average. Standard deviation is also a measure of volatility. Generally speaking, dispersion is the difference between the actual value and the average value. The larger this dispersion or variability is, the higher the standard deviation. The smaller this dispersion or variability is, the lower the standard deviation. Chartists can use the standard deviation to measure expected risk and determine the significance of certain price movements.

CONCEPT OF STANDARD DEVIATION (SD) AND STANDARD ERROR OF THE MEAN (SEM)

Studying an entire population is time- and resource-intensive and not always feasible; studies are therefore often done on a sample, and the data are summarized using descriptive statistics. These findings are then generalized to the larger, unobserved population using inferential statistics.

For example, to understand the cholesterol levels of a population, cholesterol levels are measured in a study sample drawn from that population. The findings in this sample are best described by two parameters: the mean and the SD. The sample mean is the average of the observations and is denoted by X̄. It is the center of the distribution of observations (central tendency). The other parameter, the SD, describes the dispersion of individual observations about the mean. In other words, it characterizes the typical distance of an observation from the center of the distribution. The more dispersed the observations are, the greater the variability. Thus, a low SD signifies less variability, while a high SD indicates that the data are more spread out. Mathematically, the SD is

$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} $$

where s = sample SD; X = individual value; X̄ = sample mean; n = sample size.

Figure 1a shows the cholesterol levels of a population of 200 healthy individuals. The cholesterol of most individuals is between 190 and 210 mg/dl, with a mean (μ) of 200 mg/dl and an SD (σ) of 10 mg/dl. A study of 10 individuals drawn from the same population, with cholesterol levels of 180, 200, 190, 180, 220, 190, 230, 190, 190 and 180 mg/dl, gives X̄ = 195 mg/dl and SD (s) = 17.1 mg/dl.
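The sample figures quoted above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the data are the ten cholesterol values listed in the text.

```python
import math
import statistics

# Cholesterol levels (mg/dl) of the 10 sampled individuals quoted in the text.
sample = [180, 200, 190, 180, 220, 190, 230, 190, 190, 180]

n = len(sample)
mean = sum(sample) / n  # sample mean, X-bar

# Sample SD: sum the squared deviations, divide by n - 1, take the square root.
squared_deviations = [(x - mean) ** 2 for x in sample]
sd = math.sqrt(sum(squared_deviations) / (n - 1))

# The standard library's sample SD uses the same n - 1 divisor.
assert math.isclose(sd, statistics.stdev(sample))

print(f"mean = {mean:.1f} mg/dl, SD = {sd:.2f} mg/dl")
```

The computed SD is about 17.16 mg/dl, in line with the 17.1 mg/dl reported; dividing by n instead of n - 1 would give roughly 16.3 mg/dl.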

Figure 1

If one draws three different groups of 10 individuals each, one will obtain three different means and SDs. (Adapted from Glantz, 2002)

These sample results are used to make inferences on the premise that what is true for a randomly selected sample will be true, more or less, for the population from which the sample was chosen. That is, the sample mean (X̄) estimates the true but unknown population mean (μ), and the sample SD (s) estimates the population SD (σ). However, the precision with which sample results estimate population parameters needs to be addressed. In the above case, X̄ = 195 mg/dl estimates the population mean μ = 200 mg/dl. If other samples of 10 individuals are selected, intrinsic variability makes it unlikely that exactly the same mean and SD would be observed [Figures 1b, c and d]; we may therefore expect a different estimate of the population mean every time.

Figure 2 shows the means of 25 groups of 10 individuals each, drawn from the population shown in Figure 1. If these 25 group means are treated as 25 observations, then according to the Central Limit Theorem they will be approximately normally distributed, largely regardless of the shape of the original population, provided the samples are large enough. The mean of all such sample means will equal the mean of the original population, and the standard deviation of all such sample means is called the standard error of the mean (SEM), as explained below.

Figure 2

This figure illustrates the means of 25 groups of 10 individuals each, drawn from the population of 200 individuals shown in Figure 1. The means of the three groups shown in Figure 1 are marked using circles filled with the corresponding patterns.
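The repeated-sampling idea behind Figure 2 can be imitated with a small simulation. The sketch below does not use the article's data: the population is a hypothetical stand-in generated from a normal distribution with mean 200 and SD 10, and the seed is arbitrary.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only to make the sketch repeatable

# Hypothetical stand-in for the population in Figure 1:
# 200 values drawn from a normal distribution with mean 200 and SD 10.
population = [random.gauss(200, 10) for _ in range(200)]

# Draw 25 groups of 10 individuals each and record each group's mean.
group_means = [
    statistics.mean(random.sample(population, 10)) for _ in range(25)
]

population_sd = statistics.pstdev(population)
sd_of_means = statistics.stdev(group_means)  # empirical spread of the means

print(f"population SD            = {population_sd:.2f}")
print(f"SD of the 25 group means = {sd_of_means:.2f}")
print(f"population SD / sqrt(10) = {population_sd / 10 ** 0.5:.2f}")
```

The group means should spread far less than the individual values, and their SD should land in the neighbourhood of the population SD divided by the square root of the group size.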

SEM is the standard deviation of the means of random samples drawn from the original population. Just as the sample SD (s) is an estimate of the variability of observations, SEM is an estimate of the variability of the possible values of sample means. Because averaging smooths out individual extremes, sample means vary less than the individual observations in the original population. SEM is therefore a measure of the precision with which the sample mean X̄ estimates the population mean μ. The precision increases as the sample size increases [Figure 3].

Figure 3

The figure shows that the SEM is a function of the sample size

Thus, SEM quantifies uncertainty in the estimate of the mean.[13,14] Mathematically, the best estimate of SEM from a single sample is

$$ \sigma_M = \frac{s}{\sqrt{n}} $$

where σ_M = SEM; s = SD of sample; n = sample size.
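To see how the estimate of SEM depends on sample size, the short sketch below divides the sample SD by the square root of n. The function name `sem` and the larger sample sizes are illustrative assumptions; only s = 17.1 mg/dl and n = 10 come from the text.

```python
import math

def sem(sd, n):
    """Standard error of the mean estimated from one sample: s / sqrt(n)."""
    return sd / math.sqrt(n)

sample_sd = 17.1  # mg/dl, the sample SD quoted in the text

# n = 10 is the sample in the text; 40 and 160 are hypothetical larger samples.
for n in (10, 40, 160):
    print(f"n = {n:>3}: SEM = {sem(sample_sd, n):.2f} mg/dl")
```

For the sample of 10 this gives an SEM of about 5.41 mg/dl. Each fourfold increase in n halves the SEM, while the SD itself does not shrink with sample size.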

However, SEM by itself doesn't convey much useful information. Its main function is to help construct confidence intervals (CIs).[16] A CI is a range of values that is believed to encompass the actual (“true”) population value. This true population value is usually not known, but it can be estimated from an appropriately selected sample. If samples are drawn repeatedly from the population and a CI is constructed for each sample, then a certain percentage of the CIs will include the true population value and the rest will not. Wider CIs indicate lesser precision, while narrower ones indicate greater precision.[17]

A CI can be calculated for any desired degree of confidence using the sample size and the variability (SD) of the sample. 95% CIs are by far the most commonly used; they indicate that 95% of intervals constructed this way are expected to include the true parameter value. The CI for the true population mean μ is given by[12]

$$ \bar{X} \pm z \frac{s}{\sqrt{n}} $$

where X̄ = sample mean; s = SD of sample; n = sample size; and z (the standardized score) is the value of the standard normal distribution for the chosen level of confidence. For a 95% CI, z = 1.96.

The 95% CI for the population mean based on the first sample, with mean and SD of 195 mg/dl and 17.1 mg/dl respectively, is 184.4 to 205.6 mg/dl; in this case the interval includes the true population mean μ = 200 mg/dl. In essence, a confidence interval is a range that we expect, with some level of confidence, to include the actual value of the population mean.[17]
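The interval for the first sample can be reproduced as follows. This is a minimal sketch that follows the text in using z = 1.96; the function name `confidence_interval` is an illustrative assumption.

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) for mean ± z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Sample mean, SD and size quoted in the text (mg/dl).
lower, upper = confidence_interval(mean=195, sd=17.1, n=10)
print(f"95% CI: {lower:.1f} to {upper:.1f} mg/dl")
```

The limits come out at about 184.4 and 205.6 mg/dl, a range that brackets the population mean of 200 mg/dl.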

APPLICATION

As explained above, SD and SEM estimate quite different things. Yet in many articles SEM and SD are used interchangeably, and authors summarize their data with SEM because it makes the data seem less variable and more representative. Unlike SD, which quantifies variability, SEM quantifies uncertainty in the estimate of the mean.[13] As readers are generally interested in the variability within the sample rather than the proximity of the sample mean to the population mean, data should be summarized with SD and not with SEM.[18,19]

The importance of SD in clinical settings is discussed below. In an atherosclerotic disease study, an investigator reports the mean peak systolic velocity (PSV) in the carotid artery, a measure of stenosis, as 220 cm/sec with an SD of 10 cm/sec.[20] In this case it would be unusual to observe a PSV less than 200 cm/sec or greater than 240 cm/sec, as about 95% of the population falls within 2 SD of the mean, assuming that the population follows a normal distribution. The mean and SD thus give a quick summary of the population and a range against which to compare specific findings. Unfortunately, investigators are quite likely to report the PSV as 220 ± 1.6 cm/sec (SEM). If one confused the SEM with the SD, one would believe that the range of the population is narrow (216.8 to 223.2 cm/sec), which is not the case.
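The two ranges in the PSV example are simple arithmetic, sketched below with the figures from the text. The helper name `two_unit_range` is an illustrative assumption.

```python
# Figures quoted in the PSV example (cm/sec).
mean_psv = 220.0
sd_psv = 10.0
sem_psv = 1.6

def two_unit_range(center, spread):
    """Return (center - 2 * spread, center + 2 * spread)."""
    return center - 2 * spread, center + 2 * spread

sd_range = two_unit_range(mean_psv, sd_psv)    # where about 95% of individuals fall
sem_range = two_unit_range(mean_psv, sem_psv)  # where about 95% of sample means fall

print(f"mean ± 2 SD : {sd_range[0]:.1f} to {sd_range[1]:.1f} cm/sec")
print(f"mean ± 2 SEM: {sem_range[0]:.1f} to {sem_range[1]:.1f} cm/sec")
```

The first range describes individual patients and the second describes the precision of the mean, which is why reading one as the other is misleading.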

Additionally, when two groups are compared (e.g. treatment and control groups), SD helps in visualizing the effect size, an index of how much difference there is between the two groups.[12] Effect size conveys the magnitude of the difference and so helps differentiate between statistical significance and practical importance. It is calculated as the difference between the means divided by the pooled or average standard deviation of the two groups. Generally, an effect size of 0.8 or more is considered large and indicates that the means of the two groups are separated by at least 0.8 SD; effect sizes of 0.5 and 0.2 are considered moderate and small respectively, and indicate that the means are separated by 0.5 and 0.2 SD.[12] The same interpretation cannot be made with SEM. More importantly, SEMs do not provide a direct visual impression of the effect size if the number of subjects differs between groups.
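As an illustration of the effect-size calculation described above, the sketch below divides the difference in means by a pooled SD. The two groups are made-up values, not data from any study, and the pooling formula shown (weighting each group's variance by its degrees of freedom) is one common choice, assumed here.

```python
import math
import statistics

def effect_size(group_a, group_b):
    """Difference between the group means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 divisor)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical cholesterol values (mg/dl) for two small groups.
control = [200, 190, 210, 205, 195]
treatment = [190, 180, 200, 195, 185]

d = effect_size(control, treatment)
print(f"effect size = {d:.2f}")
```

With these made-up values the means differ by 10 mg/dl and the pooled SD is about 7.9 mg/dl, giving an effect size of about 1.26, which would count as large on the scale above.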

As an exception, the SD may be a deceptive index of variability in the many experimental situations where a biological variable differs grossly from a normal distribution (e.g. the distribution of plasma creatinine, the growth rate of tumors and the plasma concentration of immune or inflammatory mediators). In these cases, because of the skewed distribution, SD will be an inflated measure of variability. Such data can instead be presented using other measures of variability (e.g. mean absolute deviation and the interquartile range), or can be transformed (common transformations include the logarithmic, inverse, square root, and arc sine transformations).[17]
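A small sketch can show how one extreme value inflates the SD of skewed data while the interquartile range stays modest. The values below are hypothetical and chosen only to be right-skewed; the quartiles use the default method of Python's `statistics.quantiles`.

```python
import math
import statistics

# Hypothetical right-skewed measurements with one very large value.
values = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 1.5, 2.0, 6.0]

mean = statistics.mean(values)
sd = statistics.stdev(values)

# Quartiles and interquartile range, which are less sensitive to the extreme value.
q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# SD after a logarithmic transformation, one of the options named in the text.
log_sd = statistics.stdev([math.log(v) for v in values])

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"median = {median:.2f}, IQR = {iqr:.2f}")
print(f"SD of log-transformed values = {log_sd:.2f}")
```

Here the mean is pulled above most of the observations and the SD is comparable in size to the mean itself, whereas the median and interquartile range describe the bulk of the data.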

Some journal editors require their authors to use the SD and not the SEM. There are two reasons for this trend. First, the SEM is a function of the sample size, so it can be made smaller simply by increasing the sample size (n) [Figure 3]. Second, the interval (mean ± 2 SEM) will contain approximately 95% of the means of samples, but will never contain 95% of the observations on individuals; in the latter situation, mean ± 2 SD is needed.[21]

In general, the use of the SEM should be limited to inferential statistics, where the author explicitly wants to inform the reader about the precision of the study and how well the sample represents the entire population.[22] In graphs and figures too, use of SD is preferable to SEM. Further, standard deviations should preferably be reported in parentheses [i.e., mean (SD)] rather than as mean ± SD, as the latter can be confused with a 95% CI.[17]

Calculation

StockCharts.com calculates the standard deviation for a population, which assumes that the periods involved represent the whole data set, not a sample from a bigger data set. The calculation steps are as follows:

1. Calculate the average (mean) price for the number of periods or observations.

2. Determine each period's deviation (close less average price).

3. Square each period's deviation.

4. Sum the squared deviations.

5. Divide this sum by the number of observations.

6. The standard deviation is then equal to the square root of that number.
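The six steps above can be sketched in Python as follows. The closing prices are made-up values, not QQQQ data, and the function names are illustrative; the result is checked against the standard library's population SD.

```python
import math
import statistics

def population_sd(closes):
    """Population standard deviation, following the six steps listed above."""
    average = sum(closes) / len(closes)             # step 1: average price
    deviations = [c - average for c in closes]      # step 2: close less average
    squared = [d ** 2 for d in deviations]          # step 3: square each deviation
    total = sum(squared)                            # step 4: sum the squares
    variance = total / len(closes)                  # step 5: divide by the count
    return math.sqrt(variance)                      # step 6: square root

def rolling_population_sd(closes, window=10):
    """Population SD over each consecutive window of closing prices."""
    return [
        population_sd(closes[end - window:end])
        for end in range(window, len(closes) + 1)
    ]

# Hypothetical closing prices for 12 periods.
closes = [50.0, 50.5, 51.0, 50.8, 51.2, 51.5, 51.1, 50.9, 51.4, 51.8, 52.0, 51.6]

running = rolling_population_sd(closes, window=10)
assert math.isclose(running[0], statistics.pstdev(closes[:10]))
print([round(value, 3) for value in running])
```

Each window recomputes its own average before measuring deviations, so the first value is available only after the 10th period.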

The StockCharts.com example applies these steps to a 10-period standard deviation using QQQQ data. Notice that the 10-period average is calculated after the 10th period and that this average is applied to all 10 periods. Building a running standard deviation by hand with this formula would be quite laborious; Excel offers an easier way with its STDEVP function.

CONCLUSION

Proper understanding and use of fundamental statistics such as SD and SEM will allow more reliable analysis, interpretation, and communication of data to readers. Although SEM and SD are often used interchangeably to express variability, they measure different things. SEM, an inferential parameter, quantifies uncertainty in the estimate of the mean, whereas SD, a descriptive parameter, quantifies variability. As readers are generally interested in the variability within the sample, descriptive data should be summarized with SD. Use of SEM should be limited to computing CIs, which measure the precision of the population estimate.


Saturday, May 30, 2015
