M.Com-1st Semester-2023, Bhadrak Autonomous College, Statistics

Dr. Ramesh Chandra Das

Asst. Professor of Commerce

E-mail- rameshchandradas99@gmail.com

Question (1)

(a) Three fair coins are tossed 3000 times. Find the frequencies of the distribution of heads and tails and tabulate the results. Also calculate the Mean and Standard Deviation of the distribution.

(b) The probability that a Poisson variate takes a positive value is (1-e^-1.5). Find also the probability that x lies between -1.5 and +1.5

(c) Airline passengers arrive randomly and independently at the passenger screening facility of a major international airport. The mean arrival rate is one passenger per minute.

i. Calculate the probability of no arrivals in a one-minute period.

ii. Compute the probability that three or fewer passengers arrive in a one-minute period.

(d) Among 10,000 random digits, in how many cases do we expect the digit three to appear at most 950 times? (The area under the normal curve for Z = 1.667 is approximately 0.4525.)

Question (2)

a) A random sample of 400 items gives the mean 4.45 and the variance as 4. Can the sample be regarded as drawn from a normal population with mean 4? (At 5% level of significance)

b) In a city 350 out of 600 persons were found to be vegetarian. On the basis of this data, can we say that majority of the population in the city is vegetarian?

c) From a population of size 240, a sample of 49 individuals is taken. From this sample, the mean is found to be 15.8 and the standard deviation 4.2. Construct a 98 percent confidence interval for the population mean.

d) In a random sample of 1000 items, the mean weight is 45 kg with a standard deviation of 15 kg. Assuming the distribution is normal, find the number of items weighing between 40 and 60 kg.

Question (3)

(a) A book has 700 pages. The number of pages with various numbers of misprints is recorded below. At the 5% significance level, are the misprints distributed according to the Poisson law?

No of Misprints (x)	0	1	2	3	4	5	Total
No of pages with x misprint	616	70	10	2	1	1	700

(b) In a laboratory experiment, two random samples gave the following results.

Sample	Size	Sample Mean	Sum square of deviation from mean
1	10	15	90
2	12	14	108

Test the equality of sample variance at 5% level of significance.

(c)

Group A	18	20	36	50	49	36	34	49	41
Group B	29	28	26	35	30	44	46

Examine at 5% level of significance, whether the two populations have the same variance.

(d) The numbers of automobile accidents per week in a certain community were as follows.

12, 8, 20, 2, 14, 10, 10, 15, 6, 9, 4

Are these frequencies in agreement with the belief that accident conditions were the same during this 10-week period?

Question (4)

(a) What is a multiple regression equation?

(b) What is the difference between the coefficient of multiple determination and the coefficient of multiple correlation?

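(c) The following data relate to sales, advertising outlay and number of salesmen.

Sales (Rs Lakhs)	Ad. Outlay (Rs in Lakhs)	Salesmen
100	40	10
80	30	10
60	20	7
120	50	15
150	60	20
90	40	12
70	20	8
130	60	14

Predict sales when the advertising outlay is Rs 45 lakhs and the number of salesmen is 15.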

(d) Give notes on

Run test

Answer (1)

(a) The number of heads follows a binomial distribution, because each coin toss has two possible outcomes (head or tail). The probability of getting a head in a single toss is P(H) = 1/2 = 0.5.

The probability of getting a tail in a single toss is P(T) = 1/2 = 0.5.

The question asks for the expected frequencies of heads and tails when 3 fair coins are tossed 3000 times.

⇒ The number of heads (H) in a toss of 3 fair coins can be 0, 1, 2 or 3.

⇒ The number of tails (T) in a toss of 3 fair coins can be 0, 1, 2 or 3.

Under the binomial distribution, the probability of r heads is P(X = r) = C(3, r) × (1/2)^r × (1/2)^(3-r). For example, P(0 heads) = C(3, 0) × (1/2)^0 × (1/2)^3 = 1/8.

In case of 0 Heads = 0.1250

In case of 1 Head = 0.3750

In case of 2 Heads = 0.3750

In case of 3 Heads = 0.1250

In case of 0 Tails = 0.1250

In case of 1 Tail = 0.3750

In case of 2 Tails = 0.3750

In case of 3 Tails = 0.1250

No of Heads (X)	No of Tails (3-X)	Probability	Expected Frequency
0	3	0.1250	3000×0.1250 = 375
1	2	0.3750	3000×0.3750 = 1125
2	1	0.3750	3000×0.3750 = 1125
3	0	0.1250	3000×0.1250 = 375
Total		1.0000	3000

Mean = np = 3 × 1/2 = 1.5

Variance = npq = 3 × 1/2 × 1/2 = 0.75

Standard Deviation = √(npq) = √0.75 = 0.866
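
The binomial working for the three coins can be checked with a short script. This is a minimal sketch in Python using only the standard library; the variable names (n_coins, n_trials, probs, expected) are illustrative assumptions and are not part of the original answer.

```python
from math import comb, sqrt

n_coins = 3       # coins thrown in each toss
p = 0.5           # probability of a head for a fair coin
n_trials = 3000   # number of times the three coins are tossed

# Binomial probability of r heads, and the matching expected frequency
probs = {r: comb(n_coins, r) * p**r * (1 - p)**(n_coins - r)
         for r in range(n_coins + 1)}
expected = {r: n_trials * pr for r, pr in probs.items()}

# Mean and standard deviation of the number of heads per toss
mean = n_coins * p
sd = sqrt(n_coins * p * (1 - p))

for r in probs:
    # heads, tails, probability, expected frequency
    print(r, n_coins - r, probs[r], expected[r])
print("mean:", mean, "sd:", round(sd, 3))
```

The standard deviation is the square root of npq, so it is taken after the variance 0.75 has been computed.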

(b)

(c)

(d)

Answer (2)

(a) Given Sample size n = 400, sample mean (X̄) = 4.45

Sample SD (s) = √4 = 2

Null Hypothesis H₀: µ = 4 (the sample has been drawn from a population with mean µ = 4)

Alternative Hypothesis (H₁): µ ≠ 4 (Two tail test)

Only the sample SD is given, not the population SD. Because the sample is large (n = 400), the sample SD is used as an estimate of the population SD and the Z test is applied.

The level of significance α = 0.05

Test statistic

Z = (X̄ − µ) / (s/√n) = (4.45 − 4) / (2/√400) = 0.45 / 0.1 = 4.5

The critical value Z_0.05 (two tail) = 1.96

Here the calculated value (4.5) > critical value (1.96), so the null hypothesis is rejected. Hence the sample cannot be regarded as drawn from a population with mean 4.
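
The large-sample Z test for the mean can be sketched as below. This is only an illustration under the figures given in the question; the variable names and the hard-coded critical value 1.96 (two-tail, 5% level) are assumptions made for the example.

```python
from math import sqrt

n = 400             # sample size
sample_mean = 4.45
sample_sd = sqrt(4)  # variance is given as 4
mu0 = 4             # hypothesised population mean
z_critical = 1.96   # two-tail critical value at the 5% level (from tables)

standard_error = sample_sd / sqrt(n)
z = (sample_mean - mu0) / standard_error

# Reject H0 when |Z| exceeds the critical value
reject_h0 = abs(z) > z_critical
print(round(z, 2), reject_h0)
```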

(b) Given the sample size (n) = 600

p = sample proportion = 350/600 = 0.5833

The claim to be tested is that more than 50% (0.5) of the population is vegetarian.

Null Hypothesis H₀: P ≤ 0.5 (the proportion of vegetarians is not more than 50%)

Alternative Hypothesis (H₁): P > 0.5 (One tail test)

Standard error of proportion = √(pq/n) = √(0.5833 × 0.4167 / 600) = 0.02012

Z = (p − P) / SE = (0.5833 − 0.5) / 0.02012 = 4.140

The critical value of Z_0.05 (one tail test) = 1.645

Here the calculated value (4.140) > critical value (1.645), so the null hypothesis is rejected. Hence it can be concluded that the proportion of vegetarians is greater than 50%, that is, the majority of the population in the city is vegetarian.
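
A short check of the proportion test is sketched below. It computes the standard error in two ways: from the sample proportion, which reproduces the Z value 4.140 used in this answer, and from the hypothesised proportion 0.5, which many textbooks prefer. The second variant is shown only for comparison and is an assumption beyond the original working; the decision is the same either way.

```python
from math import sqrt

n = 600
vegetarians = 350
p_hat = vegetarians / n   # sample proportion
p0 = 0.5                  # proportion under the null hypothesis
z_critical = 1.645        # one-tail critical value at the 5% level (from tables)

# Standard error based on the sample proportion (as in the answer)
se_sample = sqrt(p_hat * (1 - p_hat) / n)
z_sample = (p_hat - p0) / se_sample

# Standard error based on the hypothesised proportion (textbook alternative)
se_null = sqrt(p0 * (1 - p0) / n)
z_null = (p_hat - p0) / se_null

print(round(z_sample, 3), round(z_null, 3))
print(z_sample > z_critical, z_null > z_critical)
```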

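(c) Given population size N = 240, sample size n = 49

Sample mean (X̄) = 15.8

Sample standard deviation (s) = 4.2

For 98 percent confidence, Z = 2.326

µ = X̄ ± Z (s/√n) = 15.8 ± 2.326 (4.2/√49) = 15.8 ± 1.3956, that is, from 14.4044 to 17.1956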
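
The interval can be reproduced with the sketch below. Because the sample of 49 is more than 5% of the population of 240, the script also shows the interval with the finite population correction √((N − n)/(N − 1)) applied to the standard error. The correction is not part of the working above; it is included here as an assumption about what the question may intend, and it gives a slightly narrower interval.

```python
from math import sqrt

N = 240          # population size
n = 49           # sample size
sample_mean = 15.8
sample_sd = 4.2
z = 2.326        # Z value for 98 percent confidence (from tables)

# Interval without the finite population correction (as in the answer)
se = sample_sd / sqrt(n)
margin = z * se
interval = (sample_mean - margin, sample_mean + margin)

# Interval with the finite population correction (optional refinement)
fpc = sqrt((N - n) / (N - 1))
margin_fpc = z * se * fpc
interval_fpc = (sample_mean - margin_fpc, sample_mean + margin_fpc)

print([round(v, 4) for v in interval])
print([round(v, 4) for v in interval_fpc])
```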

(d) The distribution is normal, so first calculate the probability that an item weighs between 40 and 60 kg from the standard normal table, using Z = (X − 45)/15. Then multiply the sample size (1000) by this probability to find the number of items.
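
The two steps described for part (d) can be sketched as follows. The helper normal_cdf is a hypothetical name introduced for this illustration; it evaluates the normal distribution function through math.erf instead of a printed table, so the result may differ slightly from a hand calculation with rounded table values.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Probability that a normal variable with the given mean and SD is below x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

n_items = 1000
mean_weight = 45   # kg
sd_weight = 15     # kg

# Step 1: probability that an item weighs between 40 and 60 kg
prob = normal_cdf(60, mean_weight, sd_weight) - normal_cdf(40, mean_weight, sd_weight)

# Step 2: expected number of items in that range
expected_items = n_items * prob
print(round(prob, 4), round(expected_items))
```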

Answer (3)

(a)

No of misprints per pages (x)	No of pages (f)	fx
0	616	0
1	70	70
2	10	20
3	2	6
4	1	4
5	1	5
	700	105

Average number of misprints per page (X̄) = Σfx / Σf = 105/700 = 0.15

Mean (m) = 0.15

If the misprints follow the Poisson law, the expected frequencies can be calculated as follows:

P(r = 0) = e^-m = e^-0.15 = 0.86071

No of misprints per page (X)	Cumulative probability	Absolute probability	Expected frequency
0	0.86071	0.86071	700×0.86071 = 602.497
1	0.98981	0.98981−0.86071 = 0.1291	700×0.1291 = 90.37
2	0.99950	0.99950−0.98981 = 0.00969	700×0.00969 = 6.783
3	0.99998	0.99998−0.99950 = 0.00048	700×0.00048 = 0.336
4	1.0000	1.0000−0.99998 = 0.00002	700×0.00002 = 0.014
5	1.0000	1.0000−1.0000 = 0.0000	700×0 = 0
Total		1.000	700

Null Hypothesis (H₀): The distribution follows the Poisson distribution

Alternative Hypothesis (H₁): The distribution does not follow the Poisson distribution

Observed frequency	Expected frequency	(O-E)²/E
616	602.497	0.302626
70	90.37	4.591534
10	6.783	1.525739
2	0.336	8.240762
1	0.014	69.44257
1	0	0
Total		84.10323

The critical value of χ² (0.05, df = 5) = 11.07

Here the calculated value (84.10323) > critical value (11.07), so the null hypothesis is rejected at the 0.05 level of significance. It can be interpreted that the misprints are not distributed according to the Poisson law.
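
The goodness-of-fit calculation can also be sketched in code. The version below follows a common textbook refinement that the working above does not use: classes with expected frequency below 5 are pooled (here, 2 or more misprints), and one extra degree of freedom is deducted because the Poisson mean is estimated from the data. The pooling rule and the hard-coded critical value 3.841 (χ², 5% level, df = 1) are assumptions of this sketch. The decision to reject is the same under both approaches.

```python
from math import exp, factorial

observed = [616, 70, 10, 2, 1, 1]   # pages with 0, 1, ..., 5 misprints
n_pages = sum(observed)

# Estimate the Poisson mean from the data
mean = sum(x * f for x, f in enumerate(observed)) / n_pages

# Expected Poisson frequencies for 0..5 misprints
expected = [n_pages * exp(-mean) * mean**x / factorial(x)
            for x in range(len(observed))]

# Pool the classes with 2 or more misprints so every expected frequency is at least 5;
# the pooled expected frequency takes the whole remaining tail
obs_pooled = [observed[0], observed[1], sum(observed[2:])]
exp_pooled = [expected[0], expected[1], n_pages - expected[0] - expected[1]]

chi_square = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1   # one parameter (the mean) was estimated
critical_value = 3.841         # chi-square table, 5% level, df = 1

print(round(mean, 2), [round(e, 3) for e in exp_pooled])
print(round(chi_square, 3), df, chi_square > critical_value)
```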

(b) Given the mean of sample 1: 15, sample size n₁ = 10, sum of squares of deviations from the mean = 90

Given the mean of sample 2: 14, sample size n₂ = 12, sum of squares of deviations from the mean = 108

Variance of sample 1: s₁² = 90/(10 − 1) = 10

Variance of sample 2: s₂² = 108/(12 − 1) = 9.818

Null Hypothesis (H₀): σ₁² = σ₂²

Alternative Hypothesis (H₁): σ₁² ≠ σ₂²

F = s₁²/s₂² = 10/9.818 = 1.018

Critical value of F (0.05; 9, 11) = 2.8962

Here the calculated value (1.018) < critical value (2.8962), so the study fails to reject the null hypothesis; the difference is not significant. It can be interpreted that the two populations have equal variances.

(c)

Group A	Group B
18	29
20	28
36	26
50	35
49	30
36	44
34	46
49
41

s₁ = 11.905	s₂ = 8.021
n₁ = 9	n₂ = 7

Null Hypothesis (H₀): σ₁² = σ₂²

Alternative Hypothesis (H₁): σ₁² ≠ σ₂²

F = s₁²/s₂² = (11.905)²/(8.021)² = 141.73/64.34 = 2.203

Critical value of F (0.05; 8, 6) = 4.1468

Here the calculated value (2.203) < critical value (4.1468), so the study fails to reject the null hypothesis; the difference is not significant. It can be interpreted that the two populations have equal variances.
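
The variance-ratio (F) statistics for parts (b) and (c) can be recomputed with the sketch below. It uses statistics.variance, which divides by n − 1, and puts the larger variance in the numerator. The critical values are copied from this answer; the variable names are illustrative assumptions.

```python
from statistics import variance

# Part (b): only the sums of squared deviations are given
var_b1 = 90 / (10 - 1)
var_b2 = 108 / (12 - 1)
f_b = max(var_b1, var_b2) / min(var_b1, var_b2)

# Part (c): raw observations are given
group_a = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group_b = [29, 28, 26, 35, 30, 44, 46]
var_a = variance(group_a)   # sample variance with n - 1 in the denominator
var_b = variance(group_b)
f_c = max(var_a, var_b) / min(var_a, var_b)

# Critical values of F at the 5% level, as quoted in the answer
f_crit_b = 2.8962   # df = (9, 11)
f_crit_c = 4.1468   # df = (8, 6)

print(round(f_b, 3), f_b < f_crit_b)
print(round(var_a, 2), round(var_b, 2), round(f_c, 3), f_c < f_crit_c)
```

In both parts the computed ratio is below the critical value, so the hypothesis of equal variances is not rejected.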

(d) Null Hypothesis (H₀): The number of automobile accidents per week is the same in every week (accident conditions were the same)

Alternative Hypothesis (H₁): The number of automobile accidents per week is not the same in every week

Under H₀, the expected frequency for each week is the mean of the observed frequencies, which is 10.

Observed frequency	Expected frequency	(O-E)^2/E
12	10	0.4
8	10	0.4
20	10	10
2	10	6.4
14	10	1.6
10	10	0
10	10	0
15	10	2.5
6	10	1.6
9	10	0.1
4	10	3.6
Total		26.6

The critical value of χ² (0.05, df = 10) = 18.31

Here the calculated value (26.6) > critical value (18.31), so the null hypothesis is rejected at the 0.05 level of significance. It can be interpreted that the number of automobile accidents per week is not equal, that is, accident conditions were not the same during the period.
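
The chi-square statistic for the accident data can be checked with the sketch below. It uses the observed frequencies exactly as tabulated in this answer and the critical value quoted above; the variable names are assumptions made for the illustration.

```python
observed = [12, 8, 20, 2, 14, 10, 10, 15, 6, 9, 4]   # accidents per week, as tabulated

# Under H0 every week has the same expected frequency: the mean of the observations
expected = sum(observed) / len(observed)

chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1
critical_value = 18.31   # chi-square table value quoted in the answer (5% level, df = 10)

print(expected, round(chi_square, 1), df, chi_square > critical_value)
```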

Answer (4)

(a) What is Multiple regression Equation?

Multiple regression generally explains the relationship between multiple independent or predictor variables and one dependent or criterion variable. A dependent variable is modeled as a function of several independent variables with corresponding coefficients, along with the constant term. Multiple regression requires two or more predictor variables, and this is why it is called multiple regression.

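The multiple regression equation explained above takes the following form:

Y = a + b₁X₁ + b₂X₂ + … + bₙXₙ

Here, a is the constant term and the bᵢ (i = 1, 2, …, n) are the regression coefficients. Each bᵢ represents the amount by which the criterion variable changes when the corresponding predictor variable changes by one unit, with the other predictors held constant.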

As an example, let’s say that the test score of a student in an exam will be dependent on various factors like his focus while attending the class, his intake of food before the exam and the amount of sleep he gets before the exam. Using this test one can estimate the appropriate relationship among these factors.

Multiple regression in SPSS is done by selecting “analyze” from the menu. Then, from analyze, select “regression,” and from regression select “linear.”

Assumptions:

· There should be proper specification of the model in multiple regression. This means that only relevant variables must be included in the model and the model should be reliable.

· Linearity must be assumed; the model should be linear in nature.

· Normality must be assumed in multiple regression. This means that the residuals (errors) of the model should be normally distributed.

· Homoscedasticity must be assumed; the variance of the residuals is constant across all levels of the predicted variable.

There are certain terminologies that help in understanding multiple regression. These terminologies are as follows:

· The beta value measures how strongly the predictor variable influences the criterion variable; it is expressed in units of standard deviation.

· R, is the measure of association between the observed value and the predicted value of the criterion variable. R Square, or R², is the square of the measure of association which indicates the percent of overlap between the predictor variables and the criterion variable. Adjusted R² is an estimate of the R² if you used this model with a new data set.

(b) Coefficient of correlation vs. Coefficient of determination

Coefficient of Correlation (r):

· It measures the strength and the direction of a linear relationship between two variables (x and y) with possible values between -1 and 1.

· Positive Correlation: It indicates that the two variables tend to rise and fall together. +1 is perfect +ve correlation

· Negative Correlation: It indicates that the two variables tend to move in opposite directions. One goes up and the other goes down. -1 is perfect -ve correlation

· No correlation: If there is no linear correlation or a weak linear correlation, r is close to 0.

Coefficient of determination (r²):

· Coefficient of determination (r²) = Coefficient of Correlation (r) x Coefficient of Correlation (r)

· It provides the percentage of variation in y which is explained by all the x variables together

· Its value is (usually) between 0 and 1 and it indicates the strength of the linear regression model

· The higher the R² value, the less scattered the data points are around the fitted model, so the better the fit. The lower the R² value, the more scattered the data points are.

(c) Multiple Regression solution

Ŷ = a + b₁X₁ + b₂X₂

	Y	X1	X2	X1Y	X2Y	X1X2	X1^2	X2^2	Y^2
	100	40	10	4000	1000	400	1600	100	10000
	80	30	10	2400	800	300	900	100	6400
	60	20	7	1200	420	140	400	49	3600
	120	50	15	6000	1800	750	2500	225	14400
	150	60	20	9000	3000	1200	3600	400	22500
	90	40	12	3600	1080	480	1600	144	8100
	70	20	8	1400	560	160	400	64	4900
	130	60	14	7800	1820	840	3600	196	16900
Total	800	320	96	35400	10480	4270	14600	1278	86800

Mean
Y	100
X1	40
X2	12

800 = 8a + 320b1 + 96b2----------------Eq (1)

35400 = 320a + 14600b1 + 4270b2 --------------- Eq (2)

10480 = 96a + 4270b1 + 1278b2 --------------- Eq (3)

Solving the above three equations, we find:

a = 17.336

b1 = 1.193

b2 = 2.912

Y = 17.336 + 1.193X1 + 2.912X2 -----Predicted Model

Given Adv. Outlay = 45

No. of salesmen = 15

So, the predicted sales (Y) = 17.336 + 1.193×45 + 2.912×15 = 17.336 + 53.685 + 43.68 = 114.701 (Rs lakhs)
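
The normal equations can also be solved numerically. The sketch below is one possible way to do it in plain Python: it works with sums of deviations from the means, which is algebraically equivalent to solving Eq (1) to Eq (3). The data are taken from the table in this answer; the variable names are illustrative assumptions. Because the script keeps full precision, its intercept can differ slightly from the hand-rounded value, while the predicted sales figure agrees to two decimal places.

```python
y = [100, 80, 60, 120, 150, 90, 70, 130]   # sales (Rs lakhs)
x1 = [40, 30, 20, 50, 60, 40, 20, 60]      # advertising outlay (Rs lakhs)
x2 = [10, 10, 7, 15, 20, 12, 8, 14]        # number of salesmen

n = len(y)
my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n

# Sums of squares and cross-products of deviations from the means
s11 = sum((a - m1) ** 2 for a in x1)
s22 = sum((b - m2) ** 2 for b in x2)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

# Solve the two-variable system for b1 and b2 by Cramer's rule, then recover a
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
a = my - b1 * m1 - b2 * m2

predicted_sales = a + b1 * 45 + b2 * 15
print(round(a, 3), round(b1, 3), round(b2, 3), round(predicted_sales, 2))
```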

(d) Run test

A run is a sequence of similar or like events, items or symbols that is preceded by and followed by an event, item or symbol of a different type, or by none at all. Randomness of the series is unlikely when there appear to be either too many or too few runs. In such cases, a run test needs to be carried out to determine the randomness. The run test helps us decide whether a sequence of events, items or symbols is the result of a random process.

Example of Runs

A data scientist carrying out research interviewed 10 persons during a survey. We denote the genders of the people by M for men and F for women.

Assuming the respondents were chosen as follows:

Scenario 1

M M M M M F F F F F

Scenario 2

F M F M F M F M F M

Scenario 3

F F F M M F M M F F

· Scenario 1 has only 2 runs and therefore cannot be considered random, because there are too few runs

· Scenario 2 has 10 runs, which is too many, and therefore would not be considered random

· Scenario 3 has 5 runs and therefore we need to perform a test to determine the randomness of the data.
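
Counting the runs in the three scenarios can be sketched as follows. The function count_runs groups consecutive identical symbols; runs_z_score is a hypothetical helper that applies the usual large-sample normal approximation for the number of runs, with mean 2n₁n₂/(n₁+n₂) + 1. With only 10 observations this approximation is rough, so the z values are indicative only.

```python
from itertools import groupby
from math import sqrt

def count_runs(sequence):
    """Count maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))

def runs_z_score(sequence):
    """Normal-approximation z score for the number of runs (two symbol types)."""
    first, second = sorted(set(sequence))
    n1, n2 = sequence.count(first), sequence.count(second)
    n = n1 + n2
    mean_runs = 2 * n1 * n2 / n + 1
    var_runs = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (count_runs(sequence) - mean_runs) / sqrt(var_runs)

scenarios = {
    "Scenario 1": "MMMMMFFFFF",
    "Scenario 2": "FMFMFMFMFM",
    "Scenario 3": "FFFMMFMMFF",
}

for name, seq in scenarios.items():
    print(name, count_runs(seq), round(runs_z_score(seq), 3))
```

A z score far from zero in either direction points to too few or too many runs, while a value close to zero is consistent with a random sequence.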

…………………………………………….Jai Hind………………………………………………
