Student's t-test
A t-test is a parametric statistical test based on Student's t distribution. The test assumes that, under the null hypothesis, the test statistic computed from the collected data follows a Student's t distribution. The Student's t distribution is applied to small samples, in situations where the test statistic would follow a normal distribution if the sample were large and the variance were known.
The test was developed by William Sealy Gosset, an English chemist, while he was working for the Guinness brewery in Dublin, Ireland. He published his results under the pen name Student.
The assumptions underlying Student's t-test are based on the normal distribution.
Assumptions
Student's t-test assumes that the test statistic would follow a normal distribution if the sample size were large. In general, for a one-sample t-test, three assumptions need to be met:
§ Z follows a standard normal distribution under the null hypothesis
§ (n − 1)s²/σ² follows a chi-squared distribution with n − 1 degrees of freedom under the null hypothesis, where n is the sample size
§ Z and s are independent
where T = Z/s, and Z and s are functions of the data: Z is a standardized statistic that is sensitive to the alternative hypothesis, and s is a scaling parameter that allows the distribution of T to be determined.
In the one-sample t-test, Z = (x̄ − μ0) / (σ/√n), where x̄ is the mean of the sample data, μ0 is the population mean specified under the null hypothesis, n is the sample size, and σ is the population standard deviation. When the population standard deviation is unknown, it is estimated in the one-sample t-test by s, the sample standard deviation.
The t-test comparing the means of two independent samples adds the following assumption:
§ The observations in the two samples are independent of each other.
Cases of the t-Test
The Student's t-test may be divided into four situations, depending on the case. Student's t-tests frequently cover the following cases:
§ A one-sample location test. The test is used to assess
whether the mean of a normally distributed population has a value specified under the
null hypothesis.
§ A two-sample location test: the test is used to assess whether the means of two normally distributed populations are equal under the null hypothesis. The tests may be conducted under either the equal-variance or the unequal-variance assumption, as detailed in the Calculations section below. These tests are also referred to as independent or unpaired samples t-tests, because the units or subjects in the two samples are unrelated or non-overlapping.
§ Paired or "repeated measures" t-test: the test is used to assess the differences between two response means, measured on the same statistical unit. Under the null hypothesis, the difference between the population means is zero.
§ A test of whether the slope of a regression line differs significantly from 0.
Unpaired and paired two-sample t-tests
Two-sample t-tests for a difference in mean
can be either unpaired or paired.
In general, paired t-tests are used for greater power when
similar or equivalent test units can be paired or can form a block. Paired
t-tests help reduce the effects of confounding factors.
When two separate sets of independent and identically distributed samples can be obtained, the unpaired, or "independent samples," t-test may be used.
Calculations
Explicit expressions that can be used to carry out various t-tests are given in the AroniSmartStart section explaining the choice of a parametric test. The formula used to estimate the t statistic under the null hypothesis is given in each case. To test the significance of each test, a one-tailed (left or right) or two-tailed test may be conducted.
Once a t value is determined, the p-value can be computed either from the AroniStat 1.0.1 and AroniSmartLytics distribution module or from the AroniSmartLytics data analysis module. If the calculated p-value is below the statistical significance level chosen by the researcher or practitioner, then the null hypothesis is rejected in favor of the alternative hypothesis. The parametric section of AroniSmartLytics shows the following calculations and allows testing the following scenarios:
Independent one-sample t-test
In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic
t = (x̄ − μ0) / (s / √n)
where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size. The number of degrees of freedom used in this test is n − 1.
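As an illustration outside of AroniSmartLytics, the same statistic can be computed directly. The sketch below is in Python with NumPy and SciPy (neither is part of the product) and uses hypothetical sample data:

import numpy as np
from scipy import stats

# Hypothetical sample data and the mean specified under the null hypothesis
x = np.array([5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3])
mu0 = 5.0

n = x.size
s = x.std(ddof=1)                                # sample standard deviation
t_stat = (x.mean() - mu0) / (s / np.sqrt(n))     # t = (x_bar - mu0) / (s / sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-tailed p-value, n - 1 degrees of freedom

# Cross-check with SciPy's built-in one-sample t-test
t_check, p_check = stats.ttest_1samp(x, mu0)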
Independent two-sample t-test
· Equal sample sizes, equal variance
This test is used only when both:
o the two sample sizes (that is, the number, n, of participants in each group) are equal;
o it can be assumed that the two distributions have the same variance.
Violations of these assumptions are discussed below. The t statistic to test whether the means are different can be calculated as follows:
t = (x̄1 − x̄2) / (s_p · √(2/n))
where s_p = √((s1² + s2²)/2) is the pooled standard deviation, s1² and s2² are the two sample variances, and the number of degrees of freedom is 2n − 2.
· Unequal sample sizes, equal variance
This test is used only when it can be assumed that the two distributions have the same variance. (When this assumption is violated, see below.) The t statistic to test whether the means are different can be calculated as follows:
t = (x̄1 − x̄2) / (s_p · √(1/n1 + 1/n2))
where s_p² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) is the pooled variance estimator and the number of degrees of freedom is n1 + n2 − 2. (A code sketch following this list illustrates both the pooled and the Welch calculations.)
· Unequal sample sizes, unequal variance
This test, also known as Welch's t-test, is used only when the two population variances are assumed to be different (the two sample sizes may or may not be equal) and hence must be estimated separately. The t statistic is:
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
The degrees of freedom calculation is called the Welch-Satterthwaite equation:
d.f. = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
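As an illustration outside of the product, the pooled (equal-variance) and Welch (unequal-variance) calculations above can be sketched in Python with NumPy and SciPy, on hypothetical data:

import numpy as np
from scipy import stats

# Hypothetical independent samples
x1 = np.array([20.1, 22.4, 19.8, 21.5, 20.9, 22.0])
x2 = np.array([18.7, 19.9, 21.2, 18.4, 20.3, 19.5, 20.8])
n1, n2 = x1.size, x2.size

# Pooled (equal-variance) t-test with n1 + n2 - 2 degrees of freedom
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_pooled = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Welch's t-test with Welch-Satterthwaite degrees of freedom
v1, v2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2
t_welch = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Cross-checks with SciPy
t_eq, p_eq = stats.ttest_ind(x1, x2, equal_var=True)
t_uneq, p_uneq = stats.ttest_ind(x1, x2, equal_var=False)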
Dependent t-test for paired samples
This test is used when the samples are dependent. Dependence may be achieved in one of two ways:
· one sample that has been tested twice (repeated measures);
· two samples that have been matched or "paired".
To test two dependent samples, the differences between all pairs of observations are calculated. The average (X̄_D) and standard deviation (s_D) of those differences are used in the equation:
t = (X̄_D − μ0) / (s_D / √n)
The constant μ0 is non-zero when the need is to assess whether the average of the differences is significantly different from μ0. The number of degrees of freedom used is n − 1, with n being the number of pairs.
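A minimal sketch of the paired test, again in Python with hypothetical before/after measurements on the same units:

import numpy as np
from scipy import stats

# Hypothetical repeated measures on the same subjects
before = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.3])
after = np.array([11.6, 11.0, 12.4, 12.5, 11.2, 11.8])

d = after - before                               # paired differences
n = d.size
mu0 = 0.0                                        # hypothesized mean difference
t_stat = (d.mean() - mu0) / (d.std(ddof=1) / np.sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # n - 1 degrees of freedom

# Cross-check with SciPy's paired t-test
t_check, p_check = stats.ttest_rel(after, before)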
Slope of a regression line
Let the model to be fit be:
Y_i = α + β·x_i + ε_i
where x_i, i = 1, ..., n are known, n is the sample size, α and β are unknown, ε_i are independent, identically and normally distributed random errors with expected value 0 and unknown variance σ², and Y_i, i = 1, ..., n are observed.
Under the null hypothesis, the slope β is
equal to some specified value β0, often taken to be
0, in which case the hypothesis is that x and y are
unrelated.
Let
α̂, β̂ = the least-squares estimates of the intercept and the slope, respectively,
SE_α̂, SE_β̂ = the standard errors of the least-squares estimates.
Then
t_score = (β̂ − β0) / SE_β̂
has a t-distribution with n − 2 degrees of freedom if the null hypothesis is true. The standard error of the slope estimate may be written as follows:
SE_β̂ = √( (1/(n − 2)) · Σ ε̂_i² ) / √( Σ (x_i − x̄)² )
The estimated equation may be written in terms of the residuals. Let
ε̂_i = y_i − ŷ_i = y_i − (α̂ + β̂·x_i)
be the residuals or estimated errors. Then, the sum of squares of the residuals is:
SSR = Σ ε̂_i²
And t_score is given by:
t_score = (β̂ − β0) · √(n − 2) / √( SSR / Σ (x_i − x̄)² )
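The slope test can be sketched in Python on hypothetical data; scipy.stats.linregress reports the same two-tailed p-value for the null hypothesis β0 = 0 and serves as a cross-check:

import numpy as np
from scipy import stats

# Hypothetical data for a simple linear regression
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3, 6.9])
n = x.size
beta0 = 0.0                                      # slope under the null hypothesis

# Least-squares estimates of the slope and intercept
sxx = np.sum((x - x.mean()) ** 2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
alpha_hat = y.mean() - beta_hat * x.mean()

resid = y - (alpha_hat + beta_hat * x)           # estimated errors
ssr = np.sum(resid ** 2)                         # sum of squared residuals
se_beta = np.sqrt(ssr / (n - 2)) / np.sqrt(sxx)  # standard error of the slope

t_score = (beta_hat - beta0) / se_beta           # t with n - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_score), df=n - 2)

# Cross-check (for beta0 = 0) with SciPy
res = stats.linregress(x, y)                     # res.slope, res.pvalue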
Alternatives to the t-test for location problems
For exactness, the t-test and Z-test require normality of the sample means, and the t-test additionally requires that the sample variance follow a scaled chi-squared distribution and that the sample mean and sample variance be statistically independent.
The t-test is appropriate and provides an exact closed-form formula for testing the equality of the means of two normal populations with unknown, but equal, variances. In addition to the normality requirement, the exactness of the t-test requires that:
· the scaled sample variance follows a chi-squared distribution;
· the sample mean and sample variance are statistically independent.
Welch's version of the t-test is a nearly exact test for the case where the data are normal but the variances may differ.
For moderately large samples and a one-tailed test, the t-test is relatively robust to moderate violations of the normality assumption.
When the data belong to more than two groups, one-way analysis of variance (ANOVA) may be used as the more general case, as in the brief sketch below.
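As a brief illustration of this more general case, a one-way ANOVA on three hypothetical groups can be run with SciPy:

import numpy as np
from scipy import stats

# Three hypothetical groups; ANOVA tests whether all group means are equal
g1 = np.array([4.8, 5.1, 5.3, 4.9])
g2 = np.array([5.6, 5.9, 5.4, 5.8])
g3 = np.array([5.0, 5.2, 4.7, 5.1])
f_stat, p_value = stats.f_oneway(g1, g2, g3)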
Multivariate testing
Student's t statistic can be generalized to the cases in which multiple responses need to be tested simultaneously. Hotelling's T-squared (T²) statistic, named after the American statistician Harold Hotelling, is used in these cases. The test handles cases where the responses may be correlated. Hotelling's T² statistic follows a T² distribution, which in most cases is converted to an F distribution.
One-sample T² test
For a one-sample multivariate test, the hypothesis is that the mean vector (μ) is equal to a given vector (μ0). The test statistic is defined as:
T² = n · (x̄ − μ0)′ S⁻¹ (x̄ − μ0)
where n is the sample size, x̄ is the vector of column means, and S is the m × m sample covariance matrix.
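A minimal sketch of the one-sample T² calculation in Python with hypothetical bivariate data; the conversion to an F statistic uses F = (n − m)·T² / (m·(n − 1)) with (m, n − m) degrees of freedom:

import numpy as np
from scipy import stats

# Hypothetical sample of n observations on m = 2 correlated responses
X = np.array([[5.1, 3.4], [4.8, 3.1], [5.4, 3.6], [5.0, 3.3],
              [5.3, 3.5], [4.9, 3.2], [5.2, 3.4], [5.1, 3.3]])
mu0 = np.array([5.0, 3.5])                       # mean vector under the null hypothesis

n, m = X.shape
xbar = X.mean(axis=0)                            # vector of column means
S = np.cov(X, rowvar=False)                      # m x m sample covariance matrix

diff = xbar - mu0
T2 = n * diff @ np.linalg.solve(S, diff)         # T^2 = n (xbar - mu0)' S^-1 (xbar - mu0)

# Convert to an F statistic with (m, n - m) degrees of freedom
F = (n - m) / (m * (n - 1)) * T2
p_value = stats.f.sf(F, m, n - m)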
Two-sample T² test
For a two-sample multivariate test, the hypothesis is that the mean vectors (μ1, μ2) of the two samples are equal. The test statistic is defined as:
T² = (n1·n2 / (n1 + n2)) · (x̄1 − x̄2)′ S_pooled⁻¹ (x̄1 − x̄2)
where S_pooled = ((n1 − 1)S1 + (n2 − 1)S2) / (n1 + n2 − 2) is the pooled sample covariance matrix.
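Following the same pattern, a sketch of the two-sample T² calculation on hypothetical data; here F = (n1 + n2 − m − 1)·T² / (m·(n1 + n2 − 2)) with (m, n1 + n2 − m − 1) degrees of freedom:

import numpy as np
from scipy import stats

# Two hypothetical samples of m = 2 correlated responses
X1 = np.array([[5.1, 3.4], [4.8, 3.1], [5.4, 3.6], [5.0, 3.3], [5.3, 3.5], [4.9, 3.2]])
X2 = np.array([[5.6, 3.8], [5.9, 3.9], [5.5, 3.7], [5.8, 4.0], [5.7, 3.6], [6.0, 3.9], [5.6, 3.8]])
n1, m = X1.shape
n2 = X2.shape[0]

diff = X1.mean(axis=0) - X2.mean(axis=0)
S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
            (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)

T2 = (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(S_pooled, diff)

# Convert to an F statistic with (m, n1 + n2 - m - 1) degrees of freedom
F = (n1 + n2 - m - 1) / (m * (n1 + n2 - 2)) * T2
p_value = stats.f.sf(F, m, n1 + n2 - m - 1)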
©AroniSoft LLC, 2010-2011.