M.Com-1st Semester-2023, Bhadrak Autonomous College, Statistics

 


Dr. Ramesh Chandra Das

Asst. Professor of Commerce

E-mail- rameshchandradas99@gmail.com

Question (1)

(a)    Three fair coins are tossed 3000 times. Find the frequencies of the distribution of heads and tails and tabulate the results. Also calculate the Mean and Standard Deviation of the distribution.

(b)   The probability that a Poisson variate X takes a positive value is (1 - e^(-1.5)). Also find the probability that X lies between -1.5 and +1.5.

(c)    Airline passengers arrive randomly and independently at the passenger screening facility of a major international airport. The mean arrival rate is one passenger per minute.

                                            i.            Calculate the probability of no arrivals in a one-minute period.

                                          ii.            Compute the probability that three or fewer passengers arrive in a one-minute period.

(d)   Among 10,000 random digits, in how many cases do we expect the digit three to appear at most 950 times? (The area under the normal curve for Z = 1.667 is approximately 0.4525.)

Question (2)

a)      A random sample of 400 items gives the mean 4.45 and the variance as 4. Can the sample be regarded as drawn from a normal population with mean 4? (At 5% level of significance)

b)      In a city, 350 out of 600 persons were found to be vegetarian. On the basis of this data, can we say that the majority of the population in the city is vegetarian?

c)      From a population of size 240, a sample of 49 individuals is taken. From this sample, the mean is found to be 15.8 and the standard deviation is 4.2. Construct a 98 percent confidence interval for the population mean.

d)     In a random sample of 1000 items, the mean weight is 45 kg with a standard deviation of 15 kg. Assuming the normality of the distribution, find the number of items weighing between 40 and 60 kg.

Question (3)

(a)    A book has 700 pages. The number of pages with various numbers of misprints is recorded below. At the 5% significance level, are the misprints distributed according to the Poisson law?

No. of misprints (x)             0      1     2     3     4     5     Total
No. of pages with x misprints    616    70    10    2     1     1     700

 

 

 

 

(b)   In a laboratory experiment, two random samples gave the following results.

Sample    Size    Sample mean    Sum of squares of deviations from the mean
1         10      15             90
2         12      14             108

Test the equality of the variances at the 5% level of significance.

 

(c)    In a test given to two groups of students drawn from two normal populations, the marks obtained were as follows.

Group A    18    20    36    50    49    36    34    49    41
Group B    29    28    26    35    30    44    46

Examine, at the 5% level of significance, whether the two populations have the same variance.

(d)   The numbers of automobile accidents per week in a certain community were as follows.

12, 8, 20, 2, 14, 10, 10, 15, 6, 9, 4

Are these frequencies in agreement with the belief that accident conditions were the same during this 10-week period?

 

Question (4)

(a)    What is a multiple regression equation?

(b)   What is the difference between the coefficient of multiple determination and the coefficient of multiple correlation?

 

(c)    Estimate the equation that best describes the effect of advertising outlay and number of salesmen on sales.

Sales (Rs lakhs)    Ad. outlay (Rs lakhs)    Salesmen
100                 40                       10
80                  30                       10
60                  20                       7
120                 50                       15
150                 60                       20
90                  40                       12
70                  20                       8
130                 60                       14

Predict sales when the advertising outlay is Rs 45 lakhs and the number of salesmen is 15.

 

(d)   Write notes on the run test.

 

 

 

 

 

 

 

Answer (1)

(a)    The experiment follows the binomial distribution, because each toss of a coin has two possible outcomes (head or tail). The probability of getting a head in a single toss is P(H) = 1/2 = 0.5,

and the probability of getting a tail in a single toss is P(T) = 1/2 = 0.5.

 

The question asks for the frequencies of heads and tails when 3 fair coins are tossed 3000 times.

The number of heads (H) in a toss of 3 fair coins can be 0, 1, 2 or 3.

⇒ The number of tails (T) in a toss of 3 fair coins can be 0, 1, 2 or 3.

Under the binomial distribution with n = 3 and p = q = 1/2, P(X = r) = nCr × p^r × q^(n-r), so

In case of 0 heads:    3C0 × (1/2)^0 × (1/2)^3 = 0.1250

In case of 1 head:     3C1 × (1/2)^1 × (1/2)^2 = 0.3750

In case of 2 heads:    3C2 × (1/2)^2 × (1/2)^1 = 0.3750

In case of 3 heads:    3C3 × (1/2)^3 × (1/2)^0 = 0.1250

By symmetry, the same probabilities hold for the number of tails:

In case of 0 tails:    0.1250

In case of 1 tail:     0.3750

In case of 2 tails:    0.3750

In case of 3 tails:    0.1250

No. of heads (X)    No. of tails (3-X)    Probability    Expected frequency
0                   3                     0.1250         3000 × 0.1250 = 375
1                   2                     0.3750         3000 × 0.3750 = 1125
2                   1                     0.3750         3000 × 0.3750 = 1125
3                   0                     0.1250         3000 × 0.1250 = 375
Total                                     1.0000         3000

 

Mean = np = 3 × 1/2 = 1.5

Variance = npq = 3 × 1/2 × 1/2 = 0.75

Standard Deviation = √(npq) = √0.75 = 0.866
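The binomial working for three coins can be checked with a short Python sketch. It is only an illustration using the standard library; the variable names (n_coins, n_tosses, rows) are chosen here and are not part of the original answer.

```python
from math import comb, sqrt

n_coins = 3       # coins thrown in each toss
p_head = 0.5      # probability of a head for a fair coin
n_tosses = 3000   # number of times the three coins are tossed

# One row per possible number of heads: (heads, tails, probability, expected frequency)
rows = []
for heads in range(n_coins + 1):
    prob = comb(n_coins, heads) * p_head**heads * (1 - p_head)**(n_coins - heads)
    rows.append((heads, n_coins - heads, prob, n_tosses * prob))

mean = n_coins * p_head                      # np
variance = n_coins * p_head * (1 - p_head)   # npq
sd = sqrt(variance)                          # square root of npq

for heads, tails, prob, freq in rows:
    print(f"{heads} heads, {tails} tails: P = {prob:.4f}, expected frequency = {freq:.0f}")
print(f"mean = {mean}, variance = {variance}, sd = {sd:.3f}")
```

The standard deviation is the square root of npq, so it comes out near 0.866 while npq itself is 0.75.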

 

 

 

 

 

 

 

 

 

 

(b)    

 

(c)   

 

 

 

 

 

 

(d)    

Answer (2)

 

(a)     Given: sample size n = 400, sample mean (X̄) = 4.45

Sample SD (s) = √4 = 2

Null Hypothesis (H0): µ = 4 (the sample has been drawn from a population with mean µ = 4)

Alternative Hypothesis (H1): µ ≠ 4 (two-tailed test)

The population SD is not given, only the sample SD. Because the sample is large (n = 400), the sample SD is used in its place and the Z test applies.

The level of significance α = 0.05

Test statistic

Z = (X̄ - µ) / (s/√n) = (4.45 - 4) / (2/√400) = 0.45 / 0.1 = 4.5

The critical value Z0.05 (two-tailed) = 1.96

Here the calculated value (4.5) > critical value (1.96), so the null hypothesis is rejected. Hence it can be concluded that the sample is not drawn from a population with mean 4.
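The test statistic can be evaluated directly as a check. The sketch below uses only the Python standard library; the variable names are illustrative, and the critical value is taken from the standard normal distribution rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

n = 400              # sample size
sample_mean = 4.45
sample_var = 4.0     # sample variance, so s = 2
mu0 = 4.0            # population mean under H0
alpha = 0.05

se = sqrt(sample_var) / sqrt(n)                 # standard error s / sqrt(n)
z = (sample_mean - mu0) / se                    # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value, about 1.96
reject_h0 = abs(z) > z_crit

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```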

(b)   Given the sample size (n) = 600

p = sample proportion = 350/600 = 0.5833, and q = 1 - p = 0.4167

The claim to be tested is that more than 50% (0.5) of the people are vegetarian.

Null Hypothesis (H0): P ≤ 0.5 (the proportion of vegetarians is not more than 50%)

Alternative Hypothesis (H1): P > 0.5 (one-tailed test)

Standard error of proportion = √(pq/n) = √(0.5833 × 0.4167 / 600) = 0.02012

Z = (p - P) / SE = (0.5833 - 0.5) / 0.02012 = 4.140

The critical value of Z0.05 (one-tailed test) = 1.645

Here the calculated value (4.140) > critical value (1.645), so the null hypothesis is rejected. Hence it can be concluded that the proportion of vegetarians is greater than 50%, i.e. the majority of the population in the city is vegetarian.
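As a check, the proportion test can be sketched in Python. The worked answer computes the standard error from the sample proportion; many textbooks instead use the hypothesised proportion 0.5 under H0. The sketch shows both as an illustration, with variable names chosen here; the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

x, n = 350, 600     # vegetarians found, persons sampled
p0 = 0.5            # hypothesised proportion under H0
alpha = 0.05

p_hat = x / n                                  # sample proportion
se_sample = sqrt(p_hat * (1 - p_hat) / n)      # SE from the sample proportion (as in the answer)
se_null = sqrt(p0 * (1 - p0) / n)              # SE from the hypothesised proportion (alternative)

z_sample = (p_hat - p0) / se_sample
z_null = (p_hat - p0) / se_null
z_crit = NormalDist().inv_cdf(1 - alpha)       # one-tailed critical value, about 1.645

print(f"Z (sample SE) = {z_sample:.3f}, Z (null SE) = {z_null:.3f}, critical = {z_crit:.3f}")
```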

(c)    Given sample size (n) = 49

Sample mean (X̄) = 15.8

Sample standard deviation (s) = 4.2

For a 98 percent confidence level, Z = 2.326

                 µ = X̄ ± Z (s/√n) = 15.8 ± 2.326 (4.2/√49) = 15.8 ± 1.3956, i.e. from 14.4044 to 17.1956
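The interval can be reproduced with a few lines of Python. The question also states a population size of 240, and since the sample of 49 is roughly a fifth of it, some textbooks multiply the standard error by a finite population correction, √((N - n)/(N - 1)). The worked answer does not use it; the sketch below computes both versions purely as an illustration, and the variable names are assumptions made here.

```python
from math import sqrt
from statistics import NormalDist

N, n = 240, 49           # population size and sample size
mean, s = 15.8, 4.2      # sample mean and sample standard deviation
confidence = 0.98

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.326
se = s / sqrt(n)                                     # 4.2 / 7 = 0.6
fpc = sqrt((N - n) / (N - 1))                        # finite population correction

ci_plain = (mean - z * se, mean + z * se)                # interval as in the worked answer
ci_fpc = (mean - z * se * fpc, mean + z * se * fpc)      # narrower interval with the correction

print(f"without correction: {ci_plain[0]:.4f} to {ci_plain[1]:.4f}")
print(f"with correction:    {ci_fpc[0]:.4f} to {ci_fpc[1]:.4f}")
```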

(d)   The distribution is normal, so first calculate the probability that an item weighs between 40 and 60 kg. Then multiply the sample size by this probability to find the number of items.
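The two steps described above can be sketched in Python with the standard library's NormalDist. This is only an illustration under the stated assumption of normality; the variable names are chosen here.

```python
from statistics import NormalDist

n_items = 1000
weights = NormalDist(mu=45, sigma=15)   # mean 45 kg, standard deviation 15 kg

# Step 1: probability that a weight lies between 40 and 60 kg
prob = weights.cdf(60) - weights.cdf(40)

# Step 2: expected number of items in that range
expected_items = n_items * prob

print(f"P(40 < X < 60) = {prob:.4f}, expected number of items = {expected_items:.0f}")
```

In Z terms the limits are (40 - 45)/15 = -0.33 and (60 - 45)/15 = 1, which gives a probability of about 0.47.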

Answer (3)

(a)

No. of misprints per page (x)    No. of pages (f)    fx
0                                616                 0
1                                70                  70
2                                10                  20
3                                2                   6
4                                1                   4
5                                1                   5
Total                            700                 105

Average number of misprints per page (X̄) = Σfx / Σf = 105 / 700 = 0.15

Mean (m) = 0.15

If the distribution follows the Poisson law, the expected frequencies can be calculated as follows:

P (r) = e^(-m) × m^r / r!, so P (r=0) = e^(-0.15) = 0.86071

No. of misprints per page (X)    Cumulative probability    Absolute probability             Expected frequency
0                                0.86071                   0.86071                          700 × 0.86071 = 602.497
1                                0.98981                   0.98981 - 0.86071 = 0.1291       700 × 0.1291 = 90.37
2                                0.99950                   0.99950 - 0.98981 = 0.00969      700 × 0.00969 = 6.783
3                                0.99998                   0.99998 - 0.99950 = 0.00048      700 × 0.00048 = 0.336
4                                1.0000                    1.0000 - 0.99998 = 0.00002       700 × 0.00002 = 0.014
5                                1.0000                    1.0000 - 1.0000 = 0.0000         700 × 0 = 0
Total                                                      1.000                            700

Null Hypothesis (H0): The misprints follow the Poisson distribution

Alternative Hypothesis (H1): The misprints do not follow the Poisson distribution

Observed frequency (O)    Expected frequency (E)    (O-E)^2/E
616                       602.497                   0.302626
70                        90.37                     4.591534
10                        6.783                     1.525739
2                         0.336                     8.240762
1                         0.014                     69.44257
1                         0                         0 (E = 0, so the term is taken as 0)
Total                                               84.10323

The critical value of χ2 at 0.05 with df = 5 is 11.07

Here the calculated value (84.10323) > critical value (11.07), so the null hypothesis is rejected at the 0.05 level. It can be interpreted that the misprints do not follow the Poisson distribution.
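The goodness-of-fit calculation can also be sketched in Python. Note that this sketch is a variant, not the working above: it follows the common textbook practice of pooling classes whose expected frequency is below 5 (here x = 2 and above) and of deducting one extra degree of freedom because the mean was estimated from the data. The pooling rule, the resulting df = 1 and the table value 3.841 are assumptions of this illustration.

```python
from math import exp

# Observed number of pages for each number of misprints
observed = {0: 616, 1: 70, 2: 10, 3: 2, 4: 1, 5: 1}

n_pages = sum(observed.values())
mean = sum(x * f for x, f in observed.items()) / n_pages   # 105 / 700 = 0.15

# Poisson probabilities for 0 and 1 misprints, and for the pooled class "2 or more"
p0 = exp(-mean)
p1 = mean * p0
p_tail = 1 - p0 - p1

obs = [observed[0], observed[1], sum(f for x, f in observed.items() if x >= 2)]
expected = [n_pages * p for p in (p0, p1, p_tail)]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
df = len(obs) - 1 - 1     # classes - 1 - one estimated parameter
critical = 3.841          # chi-square table value at 0.05 for df = 1

print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {chi_sq > critical}")
```

With pooling the statistic is much smaller than 84.1, because the very small expected frequencies no longer dominate, but the null hypothesis is still rejected.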

(b)  Given the mean of sample 1: 15, sample size n1 = 10

Given the mean of sample 2: 14, sample size n2 = 12

Variance of sample 1: s1^2 = 90 / (10 - 1) = 10

Variance of sample 2: s2^2 = 108 / (12 - 1) = 9.818

Null Hypothesis (H0): σ1^2 = σ2^2

Alternative Hypothesis (H1): σ1^2 ≠ σ2^2

F = s1^2 / s2^2 = 10 / 9.818 = 1.0185

Critical value of F at 0.05 with (9, 11) df = 2.8962

Here the calculated value (1.0185) < critical value (2.8962), so the study fails to reject the null hypothesis; the difference is not significant. It can be interpreted that the two populations have equal variance.
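The variance ratio can be checked with a short sketch. The critical value is simply copied from the answer above rather than computed, and the variable names are illustrative.

```python
n1, ss1 = 10, 90     # sample 1: size, sum of squared deviations from the mean
n2, ss2 = 12, 108    # sample 2: size, sum of squared deviations from the mean

var1 = ss1 / (n1 - 1)    # unbiased estimate of variance, sample 1
var2 = ss2 / (n2 - 1)    # unbiased estimate of variance, sample 2

# By convention the larger variance goes in the numerator
f_stat = max(var1, var2) / min(var1, var2)
f_crit = 2.8962          # table value for (9, 11) df at 0.05, as quoted in the answer

print(f"s1^2 = {var1:.3f}, s2^2 = {var2:.3f}, F = {f_stat:.4f}, reject H0: {f_stat > f_crit}")
```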

(c)

Group A    Group B
18         29
20         28
36         26
50         35
49         30
36         44
34         46
49
41

s1 = 11.905    s2 = 8.021
n1 = 9         n2 = 7

Null Hypothesis (H0): σ1^2 = σ2^2

Alternative Hypothesis (H1): σ1^2 ≠ σ2^2

F = s1^2 / s2^2 = 141.75 / 64.33 = 2.203

Critical value of F at 0.05 with (8, 6) df = 4.1468

Here the calculated value (2.203) < critical value (4.1468), so the study fails to reject the null hypothesis; the difference is not significant. It can be interpreted that the two populations have equal variance.
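The sample variances and their ratio can be recomputed from the raw marks with the standard library. This is a sketch for checking the arithmetic; the critical value is copied from the answer above, not computed.

```python
from statistics import mean, variance

group_a = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group_b = [29, 28, 26, 35, 30, 44, 46]

# statistics.variance uses the n - 1 divisor (sample variance)
var_a = variance(group_a)
var_b = variance(group_b)

f_stat = var_a / var_b    # larger variance in the numerator
f_crit = 4.1468           # table value for (8, 6) df at 0.05, as quoted in the answer

print(f"mean A = {mean(group_a)}, mean B = {mean(group_b)}")
print(f"s1^2 = {var_a:.2f}, s2^2 = {var_b:.2f}, F = {f_stat:.3f}, reject H0: {f_stat > f_crit}")
```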

(d)    Null Hypothesis (H0): Accident conditions were the same, i.e. the expected number of automobile accidents is equal in every week

Alternative Hypothesis (H1): Accident conditions were not the same, i.e. the expected number of automobile accidents is not equal in every week

Under H0 the expected frequency for each week is the mean number of accidents, 110 / 11 = 10.

Observed frequency (O)    Expected frequency (E)    (O-E)^2/E
12                        10                        0.4
8                         10                        0.4
20                        10                        10
2                         10                        6.4
14                        10                        1.6
10                        10                        0
10                        10                        0
15                        10                        2.5
6                         10                        1.6
9                         10                        0.1
4                         10                        3.6
Total                                               26.6

The critical value of χ2 at 0.05 with df = 10 is 18.31

Here the calculated value (26.6) > critical value (18.31), so the null hypothesis is rejected at the 0.05 level. It can be interpreted that the number of automobile accidents per week is not equal, i.e. accident conditions were not the same during the period.
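The chi-square statistic for equal weekly frequencies can be verified with a few lines of Python. The critical value is taken from the answer above; the variable names are illustrative.

```python
accidents = [12, 8, 20, 2, 14, 10, 10, 15, 6, 9, 4]   # observed accidents per week

# Under H0 every week has the same expected frequency: the overall mean
expected = sum(accidents) / len(accidents)

chi_sq = sum((o - expected) ** 2 / expected for o in accidents)
df = len(accidents) - 1
critical = 18.31    # chi-square table value at 0.05 for df = 10, as quoted in the answer

print(f"expected per week = {expected}, chi-square = {chi_sq:.1f}, df = {df}")
print("reject H0:", chi_sq > critical)
```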

Answer (4)

(a)    What is a multiple regression equation?

Multiple regression generally explains the relationship between multiple independent or predictor variables and one dependent or criterion variable.  A dependent variable is modeled as a function of several independent variables with corresponding coefficients, along with the constant term.  Multiple regression requires two or more predictor variables, and this is why it is called multiple regression.

The multiple regression equation explained above takes the following form:

                               Y = a + b1X1 + b2X2 + … + bnXn

Here, the bi's (i = 1, 2, …, n) are the regression coefficients. Each one represents the amount by which the criterion variable changes when the corresponding predictor variable changes by one unit, with the other predictors held constant.

 

As an example, let’s say that the test score of a student in an exam will be dependent on various factors like his focus while attending the class, his intake of food before the exam and the amount of sleep he gets before the exam.  Using this test one can estimate the appropriate relationship among these factors.

 

Multiple regression in SPSS is done by selecting “analyze” from the menu.  Then, from analyze, select “regression,” and from regression select “linear.”

 

Assumptions:

 

·         The model should be properly specified. This means that only relevant variables should be included in the model and the model should be reliable.

·         Linearity must be assumed; the relationship between the predictors and the criterion variable should be linear in nature.

·         Normality must be assumed in multiple regression. This means that the residuals (errors) should be approximately normally distributed.

·         Homoscedasticity must be assumed; the variance of the residuals is constant across all levels of the predicted variable.

There are certain terminologies that help in understanding multiple regression.  These terminologies are as follows:

 

·         The beta value measures how strongly the predictor variable influences the criterion variable; it is measured in units of standard deviation.

·         R is the measure of association between the observed value and the predicted value of the criterion variable. R Square, or R2, is the square of this measure of association and indicates the percent of overlap between the predictor variables and the criterion variable. Adjusted R2 modifies R2 to account for the number of predictors in the model, and so gives a more conservative estimate of how well the model would fit a new data set.

(b)   Coefficient of correlation vs. Coefficient of determination

Coefficient of Correlation (r):

·         It measures the strength and the direction of a linear relationship between two variables (x and y), with possible values between -1 and +1.

·         Positive correlation: the two variables move in the same direction; they rise and fall together. +1 is a perfect positive correlation.

·         Negative correlation: the two variables move in opposite directions; one goes up as the other goes down. -1 is a perfect negative correlation.

·         No correlation: if there is no linear correlation or only a weak linear correlation, r is close to 0.

Coefficient of determination (r2):

·         Coefficient of determination (r2) = Coefficient of Correlation (r) × Coefficient of Correlation (r)

·         It gives the proportion of the variation in y that is explained by all the x variables together.

·         Its value is (usually) between 0 and 1 and it indicates the strength of the linear regression model.

·         In multiple regression, the coefficient of multiple correlation (R) is the correlation between the observed and the predicted values of y and lies between 0 and 1; the coefficient of multiple determination (R2) is its square.

·         The higher the R2 value, the less scattered the data points are around the fitted model and the better the fit. The lower the R2 value, the more scattered the data points.

(c)    Multiple Regression solution

Ŷ = a + b1X1 + b2X2

 

         Y      X1     X2    X1Y      X2Y      X1X2    X1^2     X2^2    Y^2
         100    40     10    4000     1000     400     1600     100     10000
         80     30     10    2400     800      300     900      100     6400
         60     20     7     1200     420      140     400      49      3600
         120    50     15    6000     1800     750     2500     225     14400
         150    60     20    9000     3000     1200    3600     400     22500
         90     40     12    3600     1080     480     1600     144     8100
         70     20     8     1400     560      160     400      64      4900
         130    60     14    7800     1820     840     3600     196     16900
Total    800    320    96    35400    10480    4270    14600    1278    86800
Mean     100    40     12

The normal equations are:

800 = 8a + 320b1 + 96b2 ---------------- Eq (1)

35400 = 320a + 14600b1 + 4270b2 --------------- Eq (2)

10480 = 96a + 4270b1 + 1278b2 --------------- Eq (3)

After solving the above equations, we find

a = 17.336

b1 = 1.193

b2 = 2.912

 Ŷ = 17.336 + 1.193X1 + 2.912X2 ----- Predicted Model

Given advertising outlay (X1) = 45

No. of salesmen (X2) = 15

So, the predicted sales (Ŷ) = 17.336 + 1.193 × 45 + 2.912 × 15

  = 17.336 + 53.685 + 43.68 = 114.701 (Rs lakhs)
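The normal equations can be solved numerically as a check. The sketch below works in deviation form (sums of squares and products about the means), which is algebraically equivalent to solving the three equations above; the helper name sum_products and the variable names are chosen here for illustration. Because the hand working rounds b1 and b2 before computing a, the intercept from the code can differ slightly in the later decimals.

```python
sales = [100, 80, 60, 120, 150, 90, 70, 130]   # Y, Rs lakhs
outlay = [40, 30, 20, 50, 60, 40, 20, 60]      # X1, Rs lakhs
salesmen = [10, 10, 7, 15, 20, 12, 8, 14]      # X2

n = len(sales)

def sum_products(u, v):
    """Sum of element-wise products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

mean_y = sum(sales) / n
mean_1 = sum(outlay) / n
mean_2 = sum(salesmen) / n

# Sums of squares and products about the means
s11 = sum_products(outlay, outlay) - n * mean_1 * mean_1
s22 = sum_products(salesmen, salesmen) - n * mean_2 * mean_2
s12 = sum_products(outlay, salesmen) - n * mean_1 * mean_2
s1y = sum_products(outlay, sales) - n * mean_1 * mean_y
s2y = sum_products(salesmen, sales) - n * mean_2 * mean_y

# Solve the two deviation-form equations for b1 and b2, then recover a
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
a = mean_y - b1 * mean_1 - b2 * mean_2

predicted = a + b1 * 45 + b2 * 15   # outlay Rs 45 lakhs, 15 salesmen

print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"predicted sales = {predicted:.2f} Rs lakhs")
```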

 

 

(d) Run test

A run is a sequence of similar or like events, items or symbols that is preceded by and followed by an event, item or symbol of a different type, or by none at all. Randomness of the series is unlikely when there appear to be either too many or too few runs. In such a case, a run test needs to be carried out to determine the randomness. The run test helps us to decide whether a sequence of events, items or symbols is the result of a random process.

 Example of Runs

A data scientist carrying out research interviewed 10 persons during a survey. We denote the genders of the people by M for men and F for women.

Assume the respondents were chosen as follows:

 

Scenario 1

M M M M M F F F F F

 

Scenario 2

F M F M F M F M F M

 

Scenario 3

F F F M M F M M F F

 

·         Scenario 1 has only 2 runs and therefore cannot be considered random, because there are too few runs.

·         Scenario 2 has 10 runs, which is too many, and therefore would not be considered random.

·         Scenario 3 has 5 runs and therefore we need to perform a test to determine the randomness of the data.
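Counting the runs in the three scenarios, and comparing the count with what a random sequence would be expected to give, can be sketched as follows. The function names count_runs and runs_summary are hypothetical, and the Z value uses the usual large-sample normal approximation for the number of runs, which is only rough for a sample of 10.

```python
from itertools import groupby
from math import sqrt

def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))

def runs_summary(sequence):
    """Return (runs, expected runs, Z) for a sequence with two kinds of symbol."""
    first, second = sorted(set(sequence))
    n1, n2 = sequence.count(first), sequence.count(second)
    runs = count_runs(sequence)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return runs, expected, (runs - expected) / sqrt(var)

scenarios = {
    "Scenario 1": "MMMMMFFFFF",
    "Scenario 2": "FMFMFMFMFM",
    "Scenario 3": "FFFMMFMMFF",
}

for name, seq in scenarios.items():
    runs, expected, z = runs_summary(seq)
    print(f"{name}: runs = {runs}, expected = {expected:.1f}, Z = {z:.2f}")
```

A Z value far from zero in either direction (too few or too many runs) points away from randomness, while a value near zero, as in Scenario 3, gives no evidence against it.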

 

 

 

 

 

 

 

…………………………………………….Jai Hind………………………………………………
