POLS 6482 ADVANCED MULTIVARIATE STATISTICS
Tenth Assignment
Due 12 November 2001


  1. This problem continues our analysis of the 105th and 106th congressional district data from homeworks 3, 4, 5, 6, and 8. I made some additional corrections to the file, so download the new Stata file below:

    105th and 106th Congressional District Data (HDMG106.DTA)

    1. Download the following Excel file:

      2000 Census Data For Congressional Districts (CD2000CENSUS.XLS)

      and merge it into HDMG106.DTA. (Note that this will require some ingenuity on your part!) Use the following variable names and definitions (note that the variable district also appears in the Excel file; it is one of the variables you will need in order to do the merge):
        
      statenmlong     str20  %20s                   Name of state (long)
      total_pop       double %10.0g                 Total population of CD 2000
      white00         float  %9.0g                  Percent White 2000 
      black00         float  %9.0g                  Percent Black 2000
      asian00         float  %9.0g                  Percent Asian 2000
      hispanic00      float  %9.0g                  Percent Hispanic 2000
      owner00         float  %9.0g                  Percent Owner-Occupied Housing Units 2000
      
      Run the d and summ commands and report the results.
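
      One possible shape for the merge is sketched below. It is only a sketch under stated assumptions: it uses the newer import excel and merge m:1 syntax (Stata 12 or later), it assumes the spreadsheet's first row holds column headers, and it assumes HDMG106.DTA already contains (or has been given) a state identifier that matches statenmlong. The spreadsheet's actual column headers are not shown in this assignment, so the renaming step is left as a placeholder.

      ```stata
      * Sketch only; the file layout and key variables are assumptions, not facts
      * about the course files.
      import excel using "CD2000CENSUS.XLS", firstrow clear

      * Inspect the imported names, then rename them to the required names
      * (statenmlong, total_pop, white00, ...). The rename commands depend on the
      * spreadsheet's real headers and are therefore omitted here.
      describe

      * After renaming: keep the merge keys plus the new variables and save a
      * temporary copy.
      keep statenmlong district total_pop white00 black00 asian00 hispanic00 owner00
      tempfile census
      save `census'

      use "HDMG106.DTA", clear
      * Assumption: the master file has a state key comparable to statenmlong.
      * If it does not, construct one before merging.
      merge m:1 statenmlong district using `census'
      tabulate _merge

      describe
      summarize
      ```

      Tabulating _merge before going further shows how many districts failed to match, which is usually where the ingenuity is needed.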

    2. In STATA run the regressions:

      regress bush00 black00 south hispanic00 income owner00 dwnom1n dwnom2n dole96
      regress gore00 black00 south hispanic00 income owner00 dwnom1n dwnom2n clint96

      Analyze these two regressions. What do you think accounts for the differences between them? Be specific.

    3. In STATA compute the correlation matrix for the independent variables:

      correlate black00 hispanic00 income owner00 dwnom1n dwnom2n

      Examine the entries of the correlation matrix. Do you see anything that strikes you as odd? Be explicit.

    4. To obtain the eigenvalues and eigenvectors of the correlation matrix, use the STATA command:

      factor black00 hispanic00 income owner00 dwnom1n dwnom2n, pc

      To obtain a graph of the eigenvalues, use the STATA command:

      greigen, xlabel(1,2,3,4,5,6)

      Does this graph lead you to believe that there is a significant multicollinearity problem among these independent variables? Why or why not?
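
      In Stata 9 and later, the pca and screeplot commands supersede factor ..., pc and greigen. The sketch below shows that newer route and adds one rough summary of collinearity, the ratio of the largest to the smallest eigenvalue of the correlation matrix. The matrix name ev is arbitrary, and the ratio is offered as an illustration rather than as part of the assignment.

      ```stata
      * Sketch for Stata 9 or later.
      pca black00 hispanic00 income owner00 dwnom1n dwnom2n
      screeplot

      * Eigenvalues are stored in e(Ev), ordered from largest to smallest.
      matrix ev = e(Ev)

      * Ratio of largest to smallest retained eigenvalue; a very large ratio
      * suggests that some linear combination of the variables is nearly constant.
      display "largest/smallest eigenvalue = " ev[1,1]/ev[1,colsof(ev)]
      ```

      If the variables were uncorrelated, every eigenvalue would equal 1 and the ratio would be 1; the further the eigenvalues spread from 1, the stronger the linear dependence among the variables.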

  2. This problem deals with congressional elections. Below you will find a dataset that includes variables created by David Lublin and Gary Jacobson. The observations are congressional districts for the 1960 to 1994 period. Some of the data are missing, so your regressions may not cover the entire time period. To open the dataset in Stata you will have to increase the default memory size. To do this, use the command:

    set mem 20m

    which allocates 20 megabytes of memory for Stata to work with.

    Congressional Elections Data From Lublin and Jacobson (Stata Dataset)

    Download the dataset and bring it up in Stata. If you issue the d command you will see:
    
    . d
    
    Contains data from D:\statadat\lublin5.dta
      obs:         7,832                          
     vars:            39                          1 Nov 2001 11:35
     size:     1,057,320 (98.7% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    year            int    %8.0g                  year
    congress        byte   %8.0g                  congress (87-104)
    icpsrid         long   %12.0g                 icpsr id #
    icpsrst         byte   %8.0g                  icpsr state code
    cdist1          byte   %8.0g                  cong. district (p&r)
    statenm         str7   %9s                    state name
    cdist2          byte   %8.0g                  cong. district (lublin)
    dempct          float  %9.0g                  demo. % two party vote
    blkpct          float  %9.0g                  black percent of pop.
    whpct           float  %9.0g                  white percent of pop.
    forpct          double %10.0g                 foreign born % of pop.
    south           byte   %8.0g                  south (1=confederacy + KY +OK,
                                                    0=north)
    incomewh        float  %9.0g                  white median family income
    incomebl        long   %12.0g                 black median family income
    hs25            float  %9.0g                  percent 25 and older completing
                                                    high school or more
    college         float  %9.0g                  percent 25 or older completed 4
                                                    yrs college or more
    party1          int    %8.0g                  party code (100=Dem, 200=Rep)
    blackrep        byte   %8.0g                  blackrep =1 if black
                                                    representative, 0 otherwise
    latinorp        byte   %8.0g                  latinorp=1 if mexican, 2=PR,
                                                    3=Cuban, 0 otherwise
    womanrep        byte   %8.0g                  woman representative (1=woman,
                                                    0=man)
    incumb1         byte   %8.0g                  incumbency (0=repub, 1=demo.,
                                                    2=open)
    votesd          long   %12.0g                 number of votes for democrat
    votesr          long   %12.0g                 number votes for republican
    demvshr         float  %9.0g                  democrats share two-party vote
    whowon          byte   %8.0g                  0 = repub won, 1= demo. won,
                                                    99=3rd party won
    incshr          float  %9.0g                  incumbents share 2-party vote,
                                                    99.9=unopposed
    incshrl         float  %9.0g                  incumbents share 2-party vote
                                                    last elect, 99.9=unpposed
    redist          byte   %8.0g                  redistricted: 0=district
                                                    unchange, 1=re-districting
    incumbst        byte   %8.0g                  incumbency status:  
                                                    0 = republican incumbent
                                                    1 = democratic incumbent
                                                    2 = open seat formerly held by democrat
                                                    3 = open seat formerly held by republican
                                                    4 = open seat, new (from redistricting)
                                                    5 = two incumbents (from redistricting)
                                                    9 = third-party incumbent
    challeng        byte   %8.0g                  challenger quality
                                                    0 = challenger has not held elective office  
                                                    1 = challenger has held elective office  
                                                    2 = only Democratic candidate for open seat has held office  
                                                    3 = only Republican candidate for open seat has held office  
                                                    4 = both candidates for open seat have held office  
                                                    5 = no challenger  
                                                    6 = no Democrat candidate (open)  
                                                    7 = no Republican candidate (open)  
    challenh        byte   %8.0g                  challenger misc. information
                                                    0 = Nothing special  (ignore)
                                                    1 = At Large or multi-candidate race  
                                                    2 = unopposed  
                                                    3 = incumbent switched parties since last election  
                                                    4 = challenger was state legislator  
                                                    5 = only Democrat was state legislator (open seat)  
                                                    6 = only Republican was state legislator (open seat)  
                                                    7 = both candidates for open seat were state legislators  
                                                    8 = challenger is former U.S. Representative  
                                                    9 = odd race, third party; in general, DO NOT USE  
    icpsrid2        long   %12.0g                 icpsr id number
    party2          int    %8.0g                  party id (100=Dem, 200=Repub)
    name            str11  %11s                   member name
    dwnom1          float  %9.0g                  dwnominate 1st dimension
    dwnom2          float  %9.0g                  dwnominate 2nd dimension
                                                    (multiply by .3)
    partynm         str13  %13s                   name of political party
    xincome         long   %12.0g                 median family income
    xhispct         float  %9.0g                  percent hispanic
    -------------------------------------------------------------------------------
    Sorted by:

    If you issue the summ command you will see:

    . summ
    
        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
            year |    7832    1976.996   10.37915       1960       1994
        congress |    7832    95.49783   5.189574         87        104
         icpsrid |    7832     12325.2   7208.363          2      95120
         icpsrst |    7832    36.75447   21.00158          1         82
          cdist1 |    7832    9.979443   10.88324          1         99
         statenm |       0
          cdist2 |    7832    9.566394   9.151734          1         52
          dempct |    7832    56.98605   23.56704          0        100
          blkpct |    7595    11.12508   14.51647   .0194025   95.50033
           whpct |    7595    85.85531   15.74295   3.862633   99.89686
          forpct |    6723    5.732316    6.60551    .116483   58.52188
           south |    7832    .2858784   .4518606          0          1
        incomewh |    5859    17896.74   11820.14   2088.375      78717
        incomebl |    5856    12378.12   8885.767       1213      66320
            hs25 |    6723    57.11782    15.5113       14.8       92.3
         college |    6723    12.88006   7.004294        1.9       51.4
          party1 |    7830    140.3649   49.22029        100        329
        blackrep |    7832    .0390705   .1937751          0          1
        latinorp |    7832     .020429   .1716504          0          3
        womanrep |    7832     .046859   .2113504          0          1
         incumb1 |    7817    1.242548   .6364613          0          3
          votesd |    6382    83320.55   53081.79          0    1872351
          votesr |    6443    71985.16   57503.79          0    1786018
         demvshr |    7832    57.00632   23.59968          0        100
          whowon |    7832     .601762    .525787          0          9
          incshr |    7832    71.56447   18.54318       20.6     99.999
         incshrl |    7832    69.86711   17.08056       22.1     99.999
          redist |    7832    .2893258   .4534784          0          1
        incumbst |    7832    .8476762   .8639607          0          9
        challeng |    7832    1.135981   1.803747          0          9
        challenh |    7832    1.124362   2.017193          0          9
        icpsrid2 |    7832     12325.2   7208.363          2      95120
          party2 |    7832    140.4164   49.26602        100        329
            name |       0
          dwnom1 |    7832   -.0354424   .3335639      -1.07       1.37
          dwnom2 |    7832    .0107231   .5186352      -1.83       1.43
         partynm |       0
         xincome |    6723    15494.69   10600.03       1968      64199
         xhispct |    4780    6.610872   11.38954   .0137409   83.71677
           order |    7832      3916.5   2261.048          1       7832

    1. Your assignment is to build a model of the Democratic vote share. That is, use demvshr as your dependent variable (note: do not use dempct, which has some errors in it!). You are free to use any independent variables you want, but you must include median family income (xincome) in your specification. Whatever other independent variables you use, you must have a reasonable explanation for your specification!

    2. Note that xincome is in nominal dollars! To see the distribution of xincome use the graph command in Stata; namely:

      graph xincome congress

      To correct the xincome variable, as well as the incomewh and incomebl variables, you need to apply a price deflator: multiply each income variable by 100/90.6 for congresses 88 - 91, by 100/125.3 for 93 - 97, by 100/289.1 for 98 - 102, and by 100/420.3 for 103 - 104. These transformations convert the income variables to 1967 dollars.
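
      The deflation step might be written as follows. This is a minimal sketch: the variable name deflator and the 67 suffix on the new variables are made up for illustration, and the foreach, inrange(), and scatter syntax assumes a reasonably recent Stata. Congresses that fall outside the four ranges listed above get a missing deflator, so their deflated incomes are missing as well.

      ```stata
      * Build the deflator from the four ranges given in the assignment.
      generate double deflator = .
      replace deflator = 100/90.6  if inrange(congress, 88, 91)
      replace deflator = 100/125.3 if inrange(congress, 93, 97)
      replace deflator = 100/289.1 if inrange(congress, 98, 102)
      replace deflator = 100/420.3 if inrange(congress, 103, 104)

      * Create 1967-dollar versions and leave the nominal variables untouched.
      foreach v of varlist xincome incomewh incomebl {
          generate double `v'67 = `v' * deflator
      }

      * Re-plot to check that the upward drift across congresses has shrunk.
      scatter xincome67 congress
      ```

      Creating new variables instead of overwriting xincome keeps the nominal values available for comparison.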

    3. When you have settled on your specification and have finished your analysis using Stata, paste the variables that you settled on into EVIEWS and replicate your analysis using EVIEWS.