Student's t-test

The t-test is a parametric statistical test based on Student’s t distribution: under the null hypothesis, the test statistic follows a Student’s t distribution. The Student’s t distribution arises with small samples, in situations where the test statistic would follow the normal distribution if the sample were large and the population variance were known.

The test was developed by William Sealy Gosset, an English chemist and statistician, while he was working for the Guinness brewery in Dublin, Ireland. He published his results under the pen name “Student”.

The assumptions underlying Student’s t-test are based on the normal distribution.

Assumptions

Student’s t-test assumes that the test statistic would follow a normal distribution if the sample size were large and the population variance known. In general, for a one-sample t-test with statistic T = Z/s, where Z and s are functions of the data and s is a scaling parameter, three assumptions need to be met:

§  Z follows a standard normal distribution under the null hypothesis;

§  (n − 1)s² follows a chi-squared distribution with n − 1 degrees of freedom under the null hypothesis, where n is the sample size;

§  Z and s are independent.

In the one-sample t-test, Z = \frac{\overline{x} - \mu_0}{\sigma/\sqrt{n}}, where \overline{x} is the mean of the sample data, n is the sample size and σ is the population standard deviation. When the population standard deviation is unknown, it is estimated by s, the sample standard deviation.

The t-test comparing the means of two independent samples makes the following assumptions:

§  The observations in the two samples are independent.

§  Each of the two populations being compared is normally distributed.

§  The two populations have the same variance (for the standard pooled test; Welch’s test relaxes this assumption).

 

Cases of the t-Test

Student’s t-tests are most frequently applied in the following four cases:

§  A one-sample location test. The test is used to assess whether the mean of a normally distributed population has a value specified under the null hypothesis.

§  A two-sample location test. Under the null hypothesis, the test is used to assess whether the means of two normally distributed populations are equal. These tests are also referred to as independent or unpaired-samples t-tests, because the units or subjects within the two samples are unrelated and non-overlapping.

§  A paired or “repeated measures” t-test. The test is used to assess the difference between two response means, measured on the same statistical unit. Under the null hypothesis, the difference between the population means is zero.

§  A test of whether the slope of a regression line differs significantly from 0.

 

Unpaired and paired two-sample t-tests

Two-sample t-tests for a difference in mean can be either unpaired or paired.

In general, paired t-tests are used for greater power when similar or equivalent test units can be paired or can form a block. Paired t-tests help reduce the effects of confounding factors.

When two separate sets of independent and identically distributed samples are obtained, the unpaired, or "independent samples", t-test may be used.

Calculations

Explicit expressions that can be used to carry out various t-tests are given in the AroniSmartStart section explaining the choice of a parametric test. The formula used to estimate the t statistic under the null hypothesis is given in each case. To test the significance of each statistic, a one-tailed (left or right) or a two-tailed test may be conducted.

Once a t value is determined, the p-value can be computed either from the AroniStat 1.0.1 and AroniSmartLytics distribution module or from the AroniSmartLytics data analysis module. If the calculated p-value is below the statistical significance level chosen by the researcher or practitioner, then the null hypothesis is rejected in favor of the alternative hypothesis. The parametric section of AroniSmartLytics shows the following calculations and allows testing the following scenarios:

Independent one-sample t-test

In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic

t = \frac{\overline{x} - \mu_0}{s/\sqrt{n}},

where \overline{x} is the sample mean, s is the sample standard deviation and n is the sample size. The number of degrees of freedom used in this test is n − 1.
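The one-sample statistic above can be computed directly. A minimal sketch in plain Python (standard library only; the sample values and the hypothesized mean 5.0 are hypothetical):

```python
from statistics import mean, stdev
from math import sqrt

def one_sample_t(data, mu0):
    """t statistic for H0: population mean == mu0 (df = n - 1)."""
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))

# Hypothetical sample: does its mean differ from 5.0?
sample = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]
t = one_sample_t(sample, 5.0)  # compare |t| to the t critical value with n - 1 df
```

The resulting t value is then compared to the Student’s t distribution with n − 1 degrees of freedom, as described above.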

Independent two-sample t-test

·         Equal sample sizes, equal variance

o    This test is used only when both of the following hold:

o    the two sample sizes (that is, the number n of participants in each group) are equal;

o    it can be assumed that the two distributions have the same variance.

o    Violations of these assumptions are discussed below.


·         Unequal sample sizes, equal variance

o    This test is used only when it can be assumed that the two distributions have the same variance. (When this assumption is violated, see below.) The t statistic to test whether the means are different can be calculated as follows: t = \frac{\overline{x}_1 - \overline{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}, where s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} is the pooled variance estimate, and the degrees of freedom are n_1 + n_2 − 2.

·         Unequal sample sizes, unequal variance

This test, also known as Welch’s test, is used only when the two population variances are assumed to be different (the two sample sizes may or may not be equal) and hence must be estimated separately. The degrees-of-freedom calculation is called the Welch–Satterthwaite equation.
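Both two-sample variants can be sketched in plain Python (standard library only; any data passed to these functions are hypothetical):

```python
from statistics import mean, variance
from math import sqrt

def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances (df = n1 + n2 - 2)."""
    n1, n2 = len(x), len(y)
    # pooled variance estimate
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

With equal sample sizes, the pooled statistic reduces to the equal-sample-size formula; Welch’s version does not pool the variances and returns a (generally non-integer) approximate degrees of freedom.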

Dependent t-test for paired samples

This test is used when the samples are dependent. Dependence may be achieved in one of two ways:

·         one sample that has been tested twice (repeated measures)

·          two samples that have been matched or "paired".

To test two dependent samples, the differences between all paired observations are calculated. The average (\overline{X}_D) and standard deviation (s_D) of those differences are used in the t statistic. The constant μ0 is non-zero when the aim is to assess whether the average of the differences is significantly different from μ0. The number of degrees of freedom used is n − 1, with n being the number of pairs.
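The paired calculation above can be sketched in plain Python (standard library only; the before/after measurements are hypothetical):

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(before, after, mu0=0.0):
    """Paired t statistic: tests whether the mean difference equals mu0 (df = n - 1)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return (mean(diffs) - mu0) / (stdev(diffs) / sqrt(n))

# Hypothetical repeated measures on the same four units
t = paired_t(before=[10, 12, 9, 11], after=[12, 14, 10, 13])
```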

Slope of a regression line

Let the model to be fit be:

Y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \ldots, n,

where the x_i are known, n is the sample size, α and β are unknown parameters, the ε_i are independent, identically normally distributed random errors with expected value 0 and unknown variance σ^2, and the Y_i are observed.

Under the null hypothesis, the slope β is equal to some specified value β0, often taken to be 0, in which case the hypothesis is that x and y are unrelated.

 

Let

\hat\alpha, \hat\beta = the least-squares estimates of the intercept and the slope, respectively,

SE_{\hat\alpha}, SE_{\hat\beta} = the standard errors of the least-squares estimates.

Then

t_{score} = \frac{\hat\beta - \beta_0}{SE_{\hat\beta}}

has a t-distribution with n − 2 degrees of freedom if the null hypothesis is true.

The standard error of the slope may be written as follows:

SE_{\hat\beta} = \frac{\sqrt{\frac{1}{n-2}\sum_{i=1}^n (Y_i - \hat\alpha - \hat\beta x_i)^2}}{\sqrt{\sum_{i=1}^n (x_i - \overline{x})^2}}

The estimated equation may be written in terms of the residuals.

Let \hat\varepsilon_i = Y_i - \hat\alpha - \hat\beta x_i be the residuals, or estimated errors.

Then, the sum of squares of the residuals is:

SSR = \sum_{i=1}^n \hat\varepsilon_i^2.

And t_{score} is given by:

t_{score} = \frac{(\hat\beta - \beta_0)\sqrt{n-2}}{\sqrt{SSR/\sum_{i=1}^n (x_i - \overline{x})^2}}.
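The slope test can be sketched in plain Python (standard library only; the x and y values are hypothetical):

```python
from statistics import mean
from math import sqrt

def slope_t(x, y, beta0=0.0):
    """t statistic for H0: regression slope == beta0 (df = n - 2)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha = ybar - beta * xbar
    ssr = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    se_beta = sqrt(ssr / (n - 2)) / sqrt(sxx)  # standard error of the slope
    return (beta - beta0) / se_beta

# Nearly linear hypothetical data: the slope is clearly nonzero
t = slope_t([1, 2, 3, 4], [2.0, 4.1, 5.9, 8.0])
```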

Alternatives to the t-test for location problems

For exactness, the t-test and Z-test require normality of the sample means, and the t-test additionally requires that the sample variance follow a scaled chi-squared distribution and that the sample mean and sample variance be statistically independent.

The t-test is appropriate, and provides an exact closed-form formula, for testing the equality of the means of two normal populations with unknown but equal variances. For exactness, the t-test additionally requires that:

·         the sample scaled variance follows a chi-squared distribution

·         the sample mean and variance are statistically independent.

Welch's version of the t-test is a nearly exact test for the case where the data are normal but the variances may differ.

For moderately large samples and a one-tailed test, the t-test is relatively robust to moderate violations of the normality assumption.

When the data belong to more than two groups, one-way analysis of variance (ANOVA) may be used as the general case.
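As a check on that connection, the one-way ANOVA F statistic can be sketched in plain Python (standard library only; the groups are hypothetical). With exactly two groups, F equals the square of the pooled two-sample t statistic:

```python
from statistics import mean

def anova_f(*groups):
    """One-way ANOVA F statistic (df = k - 1 and N - k)."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

f = anova_f([1, 2, 3], [2, 3, 4])  # with two groups, f equals t**2 for the pooled t-test
```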

 

Multivariate testing

Student's t statistic can be generalized to cases in which multiple responses need to be tested simultaneously. Hotelling’s T-square statistic, named after the American statistician Harold Hotelling, is used in these cases. The test handles cases where the responses may be correlated. Hotelling's T² statistic follows a T² distribution, which in most cases is converted to an F distribution.

One-sample T² test

For a one-sample multivariate test, the hypothesis is that the mean vector (μ) is equal to a given vector (μ0). The test statistic is defined as:

T^2 = n(\overline{\mathbf x} - \boldsymbol{\mu}_0)' \mathbf{S}^{-1} (\overline{\mathbf x} - \boldsymbol{\mu}_0),

where n is the sample size, \overline{\mathbf x} is the vector of column means and S is the m x m sample covariance matrix.
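For the bivariate case (m = 2), the statistic can be computed with an explicit 2 x 2 matrix inverse. A minimal sketch in plain Python (standard library only; the data and μ0 are hypothetical):

```python
from statistics import mean

def hotelling_t2(data, mu0):
    """One-sample Hotelling T^2 for bivariate data.

    data: list of (x, y) observations; mu0: hypothesized mean vector (mu_x, mu_y).
    Computes T^2 = n * d' S^{-1} d with d = xbar - mu0 and S the sample covariance.
    """
    n = len(data)
    xb = mean(p[0] for p in data)
    yb = mean(p[1] for p in data)
    # sample covariance matrix entries (divisor n - 1)
    sxx = sum((p[0] - xb) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - yb) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - xb) * (p[1] - yb) for p in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = xb - mu0[0], yb - mu0[1]
    # quadratic form d' S^{-1} d using the explicit 2x2 inverse
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return n * quad

t2 = hotelling_t2([(1, 2), (2, 3), (3, 5), (4, 6)], mu0=(2, 3))
```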

Two-sample T² test

For a two-sample multivariate test, the hypothesis is that the mean vectors (μ1, μ2) of the two samples are equal. The test statistic is defined as:

T^2 = \frac{n_1 n_2}{n_1+n_2}(\overline{\mathbf x}_1-\overline{\mathbf x}_2)'{\mathbf S_\text{pooled}}^{-1}(\overline{\mathbf x}_1-\overline{\mathbf x}_2).

 

©AroniSoft LLC, 2010-2011.