M.Com-1st Semester-2023, Bhadrak Autonomous College, Statistics
Dr. Ramesh Chandra Das
Asst. Professor of Commerce
E-mail: rameshchandradas99@gmail.com
Questions:
Question (1)
(a) Three fair coins are tossed 3000 times. Find the frequencies of the distribution of heads and tails and tabulate the results. Also calculate the mean and standard deviation of the distribution.
(b) The probability that a Poisson variate X takes a positive value is (1 − e^(−1.5)). Find the probability that X lies between −1.5 and +1.5.
(c) Airline passengers arrive randomly and independently at the passenger screening facility of a major international airport. The mean arrival rate is one passenger per minute.
i. Calculate the probability of no arrivals in a one-minute period.
ii. Compute the probability that three or fewer passengers arrive in a one-minute period.
(d) Among 10,000 random digits, in how many cases do we expect the digit three to appear at most 950 times? (The area under the normal curve for Z = 1.667 is approximately 0.4525.)
Question (2)
(a) A random sample of 400 items gives a mean of 4.45 and a variance of 4. Can the sample be regarded as drawn from a normal population with mean 4? (At the 5% level of significance.)
(b) In a city, 350 out of 600 persons were found to be vegetarian. On the basis of this data, can we say that the majority of the population in the city is vegetarian?
(c) From a population of size 240, a sample of 49 individuals is taken. From this sample, the mean is found to be 15.8 and the standard deviation 4.2. Construct a 98 percent confidence interval for the population mean.
(d) In a random sample of 1000 items, the mean weight is 45 kg with a standard deviation of 15 kg. Assuming the distribution is normal, find the number of items weighing between 40 and 60 kg.
Question (3)
(a) A book has 700 pages. The number of pages with various numbers of misprints is recorded below. At the 5% significance level, are the misprints distributed according to the Poisson law?
| No. of misprints (x) | 0 | 1 | 2 | 3 | 4 | 5 | Total |
| No. of pages with x misprints | 616 | 70 | 10 | 2 | 1 | 1 | 700 |
(b) In a laboratory experiment, two random samples gave the following results.
| Sample | Size | Sample mean | Sum of squares of deviations from the mean |
| 1 | 10 | 15 | 90 |
| 2 | 12 | 14 | 108 |
Test the equality of the sample variances at the 5% level of significance.
(c) In a test given to two groups of students drawn from two normal populations, the marks obtained were as follows.
| Group A | 18 | 20 | 36 | 50 | 49 | 36 | 34 | 49 | 41 |
| Group B | 29 | 28 | 26 | 35 | 30 | 44 | 46 | | |
Examine at the 5% level of significance whether the two populations have the same variance.
(d) The numbers of automobile accidents per week in a certain community were as follows.
12, 8, 20, 2, 14, 10, 10, 15, 6, 9, 4
Are these frequencies in agreement with the belief that accident conditions were the same during this 10-week period?
Question (4)
(a) What is a multiple regression equation?
(b) What is the difference between the coefficient of multiple determination and the coefficient of multiple correlation?
(c) Estimate the equation that best describes the effect of advertising outlay and number of salesmen on sales.
| Sales (Rs lakhs) | 100 | 80 | 60 | 120 | 150 | 90 | 70 | 130 |
| Ad. outlay (Rs lakhs) | 40 | 30 | 20 | 50 | 60 | 40 | 20 | 60 |
| Salesmen | 10 | 10 | 7 | 15 | 20 | 12 | 8 | 14 |
Predict sales if the advertising outlay is Rs 45 lakhs and the number of salesmen is 15.
(d) Give notes on the run test.
Answer (1)
(a) The experiment follows the binomial distribution, since each toss of a coin has two outcomes (head or tail). The probability of getting a head in a single toss is P(H) = 1/2 = 0.5, and the probability of getting a tail in a single toss is P(T) = 1/2 = 0.5.
The question asks for the frequencies of heads and tails when 3 fair coins are tossed 3000 times.
⇒ The number of heads (H) in a toss of 3 fair coins can be 0, 1, 2 or 3.
⇒ The number of tails (T) in a toss of 3 fair coins can be 0, 1, 2 or 3.
Under the binomial distribution with n = 3 and p = q = 1/2, the probability of r heads is
P(X = r) = C(3, r) × (1/2)^r × (1/2)^(3 − r)
In case of 0 heads: 1 × (1/2)^3 = 0.1250
In case of 1 head: 3 × (1/2)^3 = 0.3750
In case of 2 heads: 3 × (1/2)^3 = 0.3750
In case of 3 heads: 1 × (1/2)^3 = 0.1250
The same formula gives the probabilities for tails:
In case of 0 tails: 0.1250
In case of 1 tail: 0.3750
In case of 2 tails: 0.3750
In case of 3 tails: 0.1250
| No. of heads (X) | No. of tails (3 − X) | Probability | Expected frequency |
| 0 | 3 | 0.1250 | 3000 × 0.1250 = 375 |
| 1 | 2 | 0.3750 | 3000 × 0.3750 = 1125 |
| 2 | 1 | 0.3750 | 3000 × 0.3750 = 1125 |
| 3 | 0 | 0.1250 | 3000 × 0.1250 = 375 |
| Total | | 1 | 3000 |
Mean = np = 3 × 1/2 = 1.5
Standard deviation = √(npq) = √(3 × 1/2 × 1/2) = √0.75 = 0.866
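The table and the two summary measures can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative.

```python
from math import comb, sqrt

n, p, trials = 3, 0.5, 3000  # coins per toss, P(head), number of tosses

# Binomial probability of r heads, then the expected frequency in 3000 tosses
probs = [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]
expected = [trials * pr for pr in probs]

mean = n * p                  # np
sd = sqrt(n * p * (1 - p))    # sqrt(npq)

for r, (pr, e) in enumerate(zip(probs, expected)):
    print(f"{r} heads, {n - r} tails: P = {pr:.4f}, expected = {e:.0f}")
print(f"mean = {mean}, sd = {sd:.3f}")
```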
(b)
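A minimal sketch of how this part can be computed, under two assumptions: that P(X > 0) = 1 − e^(−1.5) fixes the Poisson mean at 1.5, and that "between −1.5 and +1.5" covers the only values a Poisson variate can take in that range, namely 0 and 1. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variate with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 1.5  # assumed from P(X > 0) = 1 - e^(-1.5), i.e. P(X = 0) = e^(-1.5)

# X is a non-negative integer, so -1.5 < X < 1.5 means X = 0 or X = 1
p_between = poisson_pmf(0, lam) + poisson_pmf(1, lam)
print(round(p_between, 4))
```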
(c)
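A minimal sketch of the two probabilities asked for, assuming arrivals follow a Poisson distribution with a mean of one passenger per minute as stated in the question. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variate with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 1.0  # mean arrivals per minute, from the question

p_none = poisson_pmf(0, lam)                               # (i) no arrivals
p_three_or_fewer = sum(poisson_pmf(k, lam) for k in range(4))  # (ii) X <= 3

print(f"P(X = 0) = {p_none:.4f}")
print(f"P(X <= 3) = {p_three_or_fewer:.4f}")
```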



(d)
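A minimal sketch of the normal approximation for this part. It assumes each digit appears with probability 0.1, so the count of threes is binomial with n = 10,000, and it applies no continuity correction. The table area 0.4525 is the value quoted in the question.

```python
from math import sqrt
from statistics import NormalDist

n, p = 10_000, 0.1             # number of digits, P(digit is 3)
mean = n * p                   # 1000
sd = sqrt(n * p * (1 - p))     # 30

z = (950 - mean) / sd          # about -1.667

# Left-tail probability using the table area quoted in the question
p_table = 0.5 - 0.4525
# Same tail from the standard normal CDF, as a cross-check
p_cdf = NormalDist().cdf(z)

print(f"z = {z:.3f}, P(X <= 950) = {p_table:.4f} (table), {p_cdf:.4f} (CDF)")
```

Read this way, roughly 4.75% of such sets of 10,000 digits would be expected to contain the digit three at most 950 times.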

Answer (2)
(a) Given: sample size n = 400, sample mean (X̄) = 4.45
Sample SD (s) = √4 = 2
Null Hypothesis (H0): µ = 4 (the sample has been drawn from a population with mean µ = 4)
Alternative Hypothesis (H1): µ ≠ 4 (two-tail test)
The population SD is not given, so the sample SD is used in its place; with n = 400 the sample is large enough for a Z test.
The level of significance α = 0.05
Test statistic:
Z = (X̄ − µ) / (s/√n) = (4.45 − 4) / (2/√400) = 0.45/0.1 = 4.5
The critical value Z0.05 (two tail) = 1.96
Here the calculated value (4.5) > critical value (1.96), so the null hypothesis is rejected. It can be concluded that the sample is not drawn from a population with mean 4.
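The test statistic can be recomputed as follows. This is a sketch with illustrative variable names; the critical value 1.96 is the table value used in the answer.

```python
from math import sqrt

n = 400            # sample size
x_bar = 4.45       # sample mean
s = sqrt(4)        # sample SD from the variance of 4
mu0 = 4            # hypothesised population mean

standard_error = s / sqrt(n)
z = (x_bar - mu0) / standard_error

critical = 1.96    # two-tail critical value at the 5% level
reject_h0 = abs(z) > critical
print(f"Z = {z:.2f}, reject H0: {reject_h0}")
```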
(b) Given: sample size (n) = 600
p = sample proportion = 350/600 = 0.5833
The claim to be tested is that more than 50% (0.5) of the people are vegetarian.
Null Hypothesis (H0): P ≤ 0.5 (the proportion of vegetarians is not more than 50%)
Alternative Hypothesis (H1): P > 0.5 (one-tail test)
Standard error of proportion = √(P0 × Q0 / n) = √(0.5 × 0.5 / 600) = 0.02041
Z = (p − P0) / SE = (0.5833 − 0.5) / 0.02041 = 4.082
The critical value of Z0.05 (one-tail test) = 1.645
Here the calculated value (4.082) > critical value (1.645), so the null hypothesis is rejected. It can be concluded that the proportion of vegetarians is greater than 50%.
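A short sketch of the one-tail proportion test, with the standard error computed under the hypothesised proportion 0.5. Variable names are illustrative; 1.645 is the table value used in the answer.

```python
from math import sqrt

n = 600
vegetarians = 350
p_hat = vegetarians / n          # sample proportion
p0 = 0.5                         # hypothesised proportion

# Standard error under H0 uses p0, not the sample proportion
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

critical = 1.645                 # one-tail critical value at the 5% level
print(f"SE = {se:.5f}, Z = {z:.3f}, reject H0: {z > critical}")
```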
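(c) Given: sample size (n) = 49
Sample mean (X̄) = 15.8
Sample standard deviation (s) = 4.2
For a 98 percent confidence level, Z = 2.326.
µ = X̄ ± Z × (s/√n) = 15.8 ± 2.326 × (4.2/√49) = 15.8 ± 2.326 × 0.6 = 15.8 ± 1.3956, i.e. from 14.4044 to 17.1956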
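The interval can be reproduced with the sketch below. It also shows, as an optional variant that is not part of the worked answer, the finite population correction √((N − n)/(N − 1)) that some textbooks apply when the sample is a sizeable fraction of the population (here 49 out of 240). The function name `confidence_interval` is hypothetical.

```python
from math import sqrt

def confidence_interval(mean, s, n, z, population=None):
    """Return (lower, upper, margin); applies the finite population
    correction only when a population size is supplied."""
    se = s / sqrt(n)
    if population is not None:
        se *= sqrt((population - n) / (population - 1))
    margin = z * se
    return mean - margin, mean + margin, margin

z_98 = 2.326  # table value for a 98% confidence level

low, high, margin_plain = confidence_interval(15.8, 4.2, 49, z_98)
print(f"without correction: {low:.4f} to {high:.4f}")

low_c, high_c, margin_fpc = confidence_interval(15.8, 4.2, 49, z_98, population=240)
print(f"with correction:    {low_c:.4f} to {high_c:.4f}")
```

The corrected interval is slightly narrower because sampling a fifth of the population leaves less uncertainty about its mean.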
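(d) The weights are assumed to be normally distributed with mean 45 kg and standard deviation 15 kg. First calculate the probability that an item weighs between 40 and 60 kg, using Z = (X − 45)/15 at each limit. Then multiply the sample size (1000) by this probability to find the number of items.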
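The two steps described above can be sketched with the standard library's `NormalDist`. The result may differ by an item or so from a hand calculation that rounds Z to two decimals before reading the table.

```python
from statistics import NormalDist

weights = NormalDist(mu=45, sigma=15)   # assumed normal, as stated in the question
n_items = 1000

# Step 1: probability that a weight falls between 40 and 60 kg
prob = weights.cdf(60) - weights.cdf(40)

# Step 2: scale the probability by the sample size
count = n_items * prob
print(f"P(40 < X < 60) = {prob:.4f}, expected items = {count:.0f}")
```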

Answer (3)
(a)
| No. of misprints per page (x) | No. of pages (f) | fx |
| 0 | 616 | 0 |
| 1 | 70 | 70 |
| 2 | 10 | 20 |
| 3 | 2 | 6 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| Total | 700 | 105 |
Average number of misprints per page (X̄) = Σfx / Σf = 105/700 = 0.15
Mean (m) = 0.15
If the distribution follows the Poisson law, the expected frequencies can be calculated as follows:
P(r = 0) = e^(−m) = e^(−0.15) = 0.86071
| No. of misprints per page (x) | Cumulative probability | Absolute probability | Expected frequency |
| 0 | 0.86071 | 0.86071 | 602.497 |
| 1 | 0.98981 | 0.98981 − 0.86071 = 0.12910 | 90.370 |
| 2 | 0.99950 | 0.99950 − 0.98981 = 0.00969 | 6.783 |
| 3 | 0.99998 | 0.99998 − 0.99950 = 0.00048 | 0.336 |
| 4 | 1.00000 | 1.00000 − 0.99998 = 0.00002 | 0.014 |
| 5 | 1.00000 | 1.00000 − 1.00000 = 0.00000 | 0.000 |
| Total | | 1.000 | 700 |
Null Hypothesis (H0): The distribution follows the Poisson distribution
Alternative Hypothesis (H1): The distribution does not follow the Poisson distribution
| Observed frequency (O) | Expected frequency (E) | (O−E)^2/E |
| 616 | 602.497 | 0.302626 |
| 70 | 90.37 | 4.591534 |
| 10 | 6.783 | 1.525739 |
| 2 | 0.336 | 8.240762 |
| 1 | 0.014 | 69.44257 |
| 1 | 0 | omitted (E = 0) |
| Total | | 84.10323 |
The critical value of χ^2 at the 0.05 level with df = 5 is 11.07.
Here the calculated value (84.10323) > critical value (11.07), so the null hypothesis is rejected. The result is significant at the 0.05 level, and it can be interpreted that the misprints are not distributed according to the Poisson law.
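The expected frequencies in the upper classes are far below 1, which makes the (O−E)^2/E terms unstable. A common textbook convention, shown here as a sketch and not as the method of the worked answer, is to pool the sparse classes and to subtract one further degree of freedom because the mean was estimated from the data. The pooling into "2 or more" and the table value 3.841 (0.05 level, df = 1) are assumptions of this sketch.

```python
from math import exp, factorial

observed = [616, 70, 10, 2, 1, 1]       # pages with 0..5 misprints
total = sum(observed)                    # 700 pages
m = sum(x * f for x, f in enumerate(observed)) / total  # estimated mean, 0.15

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variate with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

# Pool the classes "2 or more" so that no expected frequency is tiny
obs_pooled = [observed[0], observed[1], sum(observed[2:])]
e0 = total * poisson_pmf(0, m)
e1 = total * poisson_pmf(1, m)
exp_pooled = [e0, e1, total - e0 - e1]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1             # one extra df lost for estimating m
critical = 3.841                         # chi-square table value, 0.05 level, df = 1
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 > critical}")
```

Under these assumptions the conclusion is unchanged: the Poisson hypothesis is rejected at the 5% level.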
(b) Given: mean of sample 1 = 15, sample size n1 = 10
Mean of sample 2 = 14, sample size n2 = 12
Variance of sample 1: s1^2 = 90/(10 − 1) = 10
Variance of sample 2: s2^2 = 108/(12 − 1) = 9.818
Null Hypothesis (H0): σ1^2 = σ2^2
Alternative Hypothesis (H1): σ1^2 ≠ σ2^2
F = s1^2 / s2^2 = 10/9.818 = 1.0185
Critical value of F (0.05; 9, 11) = 2.8962
Here the calculated value (1.0185) < critical value (2.8962), so the study fails to reject the null hypothesis; the difference is not significant. It can be interpreted that the two samples come from populations with equal variance.
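A brief sketch of the variance-ratio calculation from the sums of squared deviations. The critical value is the table figure quoted in the answer; variable names are illustrative.

```python
n1, ss1 = 10, 90     # sample 1: size, sum of squared deviations from the mean
n2, ss2 = 12, 108    # sample 2

var1 = ss1 / (n1 - 1)    # unbiased sample variance, 10.0
var2 = ss2 / (n2 - 1)    # about 9.818

# The larger variance goes in the numerator; here that is sample 1,
# so the degrees of freedom are (n1 - 1, n2 - 1) = (9, 11)
f_stat = max(var1, var2) / min(var1, var2)

critical = 2.8962        # F table value at the 0.05 level for (9, 11) df
print(f"F = {f_stat:.4f}, reject H0: {f_stat > critical}")
```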
(c)
| Group A | Group B |
| 18 | 29 |
| 20 | 28 |
| 36 | 26 |
| 50 | 35 |
| 49 | 30 |
| 36 | 44 |
| 34 | 46 |
| 49 | |
| 41 | |
| s1 = 11.906 | s2 = 8.021 |
| n1 = 9 | n2 = 7 |
Here s1 and s2 are the sample standard deviations, so s1^2 = 141.75 and s2^2 = 64.33.
Null Hypothesis (H0): σ1^2 = σ2^2
Alternative Hypothesis (H1): σ1^2 ≠ σ2^2
F = s1^2 / s2^2 = 141.75/64.33 = 2.203
Critical value of F (0.05; 8, 6) = 4.1468
Here the calculated value (2.203) < critical value (4.1468), so the study fails to reject the null hypothesis; the difference is not significant. It can be interpreted that the two populations have equal variance.
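The standard deviations and the F ratio can be recomputed from the raw marks. This sketch uses `statistics.variance`, which divides by n − 1; the critical value is the table figure quoted in the answer.

```python
from statistics import variance

group_a = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group_b = [29, 28, 26, 35, 30, 44, 46]

var_a = variance(group_a)    # sample variance with n - 1 in the denominator
var_b = variance(group_b)

# Group A has the larger variance, so it goes in the numerator
# and the degrees of freedom are (9 - 1, 7 - 1) = (8, 6)
f_stat = var_a / var_b

critical = 4.1468            # F table value at the 0.05 level for (8, 6) df
print(f"s1^2 = {var_a:.2f}, s2^2 = {var_b:.2f}, F = {f_stat:.3f}")
print("reject H0:", f_stat > critical)
```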
(d) Null Hypothesis (H0): The number of automobile accidents per week is the same in every week
Alternative Hypothesis (H1): The number of automobile accidents per week is not the same in every week
Under H0 the expected frequency for each week is the average of the observed values: 110/11 = 10.
| Observed frequency (O) | Expected frequency (E) | (O−E)^2/E |
| 12 | 10 | 0.4 |
| 8 | 10 | 0.4 |
| 20 | 10 | 10 |
| 2 | 10 | 6.4 |
| 14 | 10 | 1.6 |
| 10 | 10 | 0 |
| 10 | 10 | 0 |
| 15 | 10 | 2.5 |
| 6 | 10 | 1.6 |
| 9 | 10 | 0.1 |
| 4 | 10 | 3.6 |
| Total | | 26.6 |
The critical value of χ^2 at the 0.05 level with df = 10 is 18.31.
Here the calculated value (26.6) > critical value (18.31), so the null hypothesis is rejected. The result is significant at the 0.05 level, and it can be interpreted that the number of automobile accidents per week was not the same, i.e. accident conditions were not the same over the period.
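A short sketch of the chi-square calculation against equal expected frequencies, using the eleven values listed in the table. The critical value is the table figure quoted in the answer.

```python
observed = [12, 8, 20, 2, 14, 10, 10, 15, 6, 9, 4]

# Under H0 every week has the same expected count: the overall average
expected = sum(observed) / len(observed)

chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

critical = 18.31     # chi-square table value, 0.05 level, df = 10
print(f"E = {expected}, chi2 = {chi2:.1f}, df = {df}, reject H0: {chi2 > critical}")
```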
Answer (4)
(a) What is a multiple regression equation?
Multiple regression explains the relationship between two or more independent (predictor) variables and one dependent (criterion) variable. The dependent variable is modeled as a function of the independent variables, each with its own coefficient, together with a constant term. Because it uses two or more predictor variables, it is called multiple regression.
The multiple regression equation takes the following form:
Y = a + b1X1 + b2X2 + … + bnXn
Here a is the constant term and the bi's (i = 1, 2, …, n) are the regression coefficients. Each bi represents the amount by which the criterion variable changes when the corresponding predictor variable changes by one unit, with the other predictors held constant.
As an example, the test score of a student in an exam may depend on factors such as focus while attending class, food intake before the exam and the amount of sleep before the exam. Using multiple regression, one can estimate the relationship between the score and these factors.
Multiple regression in SPSS is done by selecting "Analyze" from the menu, then "Regression", and then "Linear".
Assumptions:
· The model should be properly specified: only relevant variables should be included, and the model should be reliable.
· Linearity is assumed: the relationship between the predictors and the criterion variable should be linear.
· Normality is assumed: the residuals (errors) of the model should be normally distributed.
· Homoscedasticity is assumed: the variance of the residuals is constant across all levels of the predicted variable.
Certain terms help in understanding multiple regression:
· The beta value (standardized coefficient) measures how strongly a predictor variable influences the criterion variable; it is measured in units of standard deviation.
· R is the measure of association between the observed value and the predicted value of the criterion variable. R Square, or R2, is the square of this measure and indicates the percentage of overlap between the predictor variables and the criterion variable. Adjusted R2 is an estimate of the R2 that would be obtained if this model were used with a new data set.
(b) Coefficient of correlation vs. coefficient of determination
Coefficient of correlation (r):
· It measures the strength and direction of the linear relationship between two variables (x and y), with possible values between −1 and +1.
· Positive correlation: the two variables rise and fall together; +1 is perfect positive correlation.
· Negative correlation: one variable goes up as the other goes down; −1 is perfect negative correlation.
· No correlation: if there is no linear correlation or only a weak linear correlation, r is close to 0.
Coefficient of determination (r2):
· Coefficient of determination (r2) = coefficient of correlation (r) × coefficient of correlation (r).
· It gives the percentage of variation in y that is explained by all the x variables together.
· Its value is (usually) between 0 and 1, and it indicates the strength of the linear regression model.
· The higher the R2 value, the less scattered the data points are around the fitted model, so the better the model; the lower the R2 value, the more scattered the data points.
In multiple regression the same relationship holds: the coefficient of multiple determination (R2) is the square of the coefficient of multiple correlation (R).
(c) Multiple regression solution
Ŷ = a + b1X1 + b2X2
| | Y | X1 | X2 | X1Y | X2Y | X1X2 | X1^2 | X2^2 | Y^2 |
| | 100 | 40 | 10 | 4000 | 1000 | 400 | 1600 | 100 | 10000 |
| | 80 | 30 | 10 | 2400 | 800 | 300 | 900 | 100 | 6400 |
| | 60 | 20 | 7 | 1200 | 420 | 140 | 400 | 49 | 3600 |
| | 120 | 50 | 15 | 6000 | 1800 | 750 | 2500 | 225 | 14400 |
| | 150 | 60 | 20 | 9000 | 3000 | 1200 | 3600 | 400 | 22500 |
| | 90 | 40 | 12 | 3600 | 1080 | 480 | 1600 | 144 | 8100 |
| | 70 | 20 | 8 | 1400 | 560 | 160 | 400 | 64 | 4900 |
| | 130 | 60 | 14 | 7800 | 1820 | 840 | 3600 | 196 | 16900 |
| Total | 800 | 320 | 96 | 35400 | 10480 | 4270 | 14600 | 1278 | 86800 |
| Mean | 100 | 40 | 12 | | | | | | |
The normal equations are:
800 = 8a + 320b1 + 96b2 ---------------- Eq (1)
35400 = 320a + 14600b1 + 4270b2 --------------- Eq (2)
10480 = 96a + 4270b1 + 1278b2 --------------- Eq (3)
After solving the above equations, we find:
a = 17.336
b1 = 1.193
b2 = 2.912
Y = 17.336 + 1.193X1 + 2.912X2 ----- Predicted model
Given: Adv. outlay (X1) = 45, No. of salesmen (X2) = 15
So, the predicted sales (Y) = 17.336 + 1.193 × 45 + 2.912 × 15 = 17.336 + 53.685 + 43.68 = 114.701 (Rs lakhs)
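The three normal equations can be solved numerically as a check. The sketch below builds them from the raw data and applies Cramer's rule with a small hand-written 3×3 determinant; the helper names `det3` and `replace_col` are illustrative, and any linear solver would serve equally well.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def replace_col(m, col, v):
    """Copy of m with column `col` replaced by the vector v."""
    return [[v[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]

sales = [100, 80, 60, 120, 150, 90, 70, 130]   # Y
ads = [40, 30, 20, 50, 60, 40, 20, 60]         # X1
staff = [10, 10, 7, 15, 20, 12, 8, 14]         # X2
n = len(sales)

# Sums that appear in the normal equations
s1, s2, sy = sum(ads), sum(staff), sum(sales)
s11 = sum(x * x for x in ads)
s22 = sum(z * z for z in staff)
s12 = sum(x * z for x, z in zip(ads, staff))
s1y = sum(x * y for x, y in zip(ads, sales))
s2y = sum(z * y for z, y in zip(staff, sales))

coeffs = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
rhs = [sy, s1y, s2y]

# Cramer's rule: each unknown is a ratio of determinants
d = det3(coeffs)
a, b1, b2 = (det3(replace_col(coeffs, k, rhs)) / d for k in range(3))

predicted = a + b1 * 45 + b2 * 15
print(round(a, 3), round(b1, 3), round(b2, 3), round(predicted, 2))
```

Solving without intermediate rounding gives an intercept near 17.33, slightly different from a value obtained by rounding b1 and b2 first; the predicted sales agree to one decimal place.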
(d) Run test
A run is a sequence of similar or like events, items or symbols that is preceded by and followed by an event, item or symbol of a different type, or by none at all. A series is unlikely to be random when there appear to be either too many or too few runs. In such a case, a run test needs to be carried out to determine the randomness. The run test helps us decide whether a sequence of events, items or symbols is the result of a random process.
Example of runs
A data scientist carrying out research interviewed 10 persons during a survey. We denote the genders of the people by M for men and F for women.
Assume the respondents were chosen as follows:
Scenario 1
M M M M M F F F F F
Scenario 2
F M F M F M F M F M
Scenario 3
F F F M M F M M F F
· Scenario 1 has only 2 runs and therefore cannot be considered random, because there are too few runs.
· Scenario 2 has 10 runs, which is too many, and therefore would not be considered random.
· Scenario 3 has 5 runs, and therefore we need to perform a test to determine the randomness of the data.
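Counting runs is the first step of the test, and it is easy to automate. The sketch below only counts runs for the three scenarios; it does not carry out the significance test itself. The function name `count_runs` is illustrative.

```python
from itertools import groupby

def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))

scenarios = {
    "Scenario 1": "MMMMMFFFFF",
    "Scenario 2": "FMFMFMFMFM",
    "Scenario 3": "FFFMMFMMFF",
}

run_counts = {name: count_runs(seq) for name, seq in scenarios.items()}
for name, runs in run_counts.items():
    print(name, "->", runs, "runs")
```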
…………………………………………….Jai
Hind………………………………………………