COVID-19 2020 Pandemic Data Analysis in Stata

The attached CovidData2020.dta data set contains information about COVID-19-related variables and a few other country-level indicators for countries with a population of at least 1 million. In answering the questions below, treat this data set as a random sample from a population of countries from different time periods. Good luck!

  1. For each of the following variables, state the level of precision and report an appropriate measure of central tendency and of dispersion, if one exists.
    1. europe [5pts]
    2. casesperc [5pts]
    3. hdi2018cat [5pts]
  2. The UN Development Programme (UNDP) ranks countries into four categories of human development based on life expectancy, education, and income. The variable hdi2018 in the data set contains the raw scores for this measure, and hdi2018cat reports the four levels of development across countries.
    1. Provide a 99% confidence interval for the mean of hdi2018. Interpret this interval. [4pts]
    2. What factors affect the width of this confidence interval? Briefly explain. [4pts]
    3. Are there any differences between the European countries and the Middle Eastern countries in terms of their 2018 HDI scores? [4pts]
    4. Do your conclusions in 2.3 change if you use the 4-category version of the HDI? [4pts]
    5. Is there a relationship between HDI and the median age in countries in the sample? Can you briefly explain why? [4pts]
  3. For this question, we will look at reported deaths due to COVID-19.
    1. Focusing on the Middle Eastern countries, are there any outlier countries for deaths per 1 million individuals (deaths1m), and tests per 1 million individuals (tests1m)? [4pts]
    2. Do you expect any differences between the European countries and the Middle Eastern countries in terms of totaldeaths, the total number of recorded deaths due to COVID-19? Briefly explain. [4pts]
    3. State your null and alternative hypotheses. Conduct the appropriate test and interpret the results. [4pts]
    4. Does your answer change if you instead do the comparison based on total deaths per 1 million individuals (deaths1m)? Why? [4pts]
    5. What is the sampling distribution of the test statistic you used in 3.4? [5pts]
    6. What would be a Type I error and a Type II error in the context of your answer to question 3.4? Briefly explain. [5pts]
  4. One researcher argues that countries that test more individuals (tests1m) should have a lower death rate, measured as the proportion of deaths out of total identified cases.
    1. Is the researcher’s expectation supported in the sample? Conduct an appropriate test. [4pts]
    2. What is the p-value in your test in 4.1? What information does it provide? [5pts]
    3. Do your conclusions in 4.1 change if you focus only on the countries in the Middle East? [4pts]
    4. Provide a scatterplot for the relationship in the Middle East and identify countries on the plot. [4pts]
  5. A scholar argues that the high level of testing in the UAE resulted in a lower number of COVID-19-related deaths in the country.
    1. Define the causal effect in the above argument. [4pts]
    2. Suppose that, hypothetically, one proposes conducting an experiment to assess the effect of widespread testing on recorded death rates due to COVID-19 in different countries. Briefly describe the key aspects of such an experiment that would increase its internal validity. [8pts]
  6. One researcher is interested in measuring governments' success in keeping COVID-19 under control. He proposes to use totalcases1m, the total number of detected cases per 1 million individuals, as a measure of government success. He argues that higher values of this variable would indicate failed government policies. Is this a valid measure of government success? Briefly explain. [10pts]
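
As a starting point, the following is a minimal sketch of Stata commands for a first look at the data. It assumes that CovidData2020.dta sits in the current working directory and that the variable names match those quoted in the questions. The `ci means` syntax assumes Stata 14 or later (older versions use `ci hdi2018, level(99)`). It illustrates commands only and is not a model answer.

```stata
* Load the data set named above (assumes it is in the working directory)
use "CovidData2020.dta", clear

* Inspect storage types, labels, and value coding before describing each variable
describe europe casesperc hdi2018cat
codebook europe hdi2018cat

* Frequency tables for the categorical variables
tabulate europe
tabulate hdi2018cat

* Detailed summary (mean, median, standard deviation, percentiles) for a continuous variable
summarize casesperc, detail

* 99% confidence interval for the mean of hdi2018
ci means hdi2018, level(99)

* Box plots as a first screen for outlying values
graph box deaths1m
graph box tests1m
```

Restricting any of these commands to a region, for example with an `if` qualifier, depends on how regions are coded in the data set, so that step is left out here.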