538 Masculinity EDA

I saw this tweet on a Friday night and wanted to dive right in:

Hey data science instructors! Are you looking for data that’s rich + real + realistic while still being accessible to novices? Then check out the latest update to the #fivethirtyeight #rstats package! 1/4 pic.twitter.com/XMaeZK2kOj
— Albert Y. Kim (@rudeboybert) August 2, 2019

I’m going to see what I can do tonight (Friday August 2nd, 2019- 9:50 PM start time)

Here is the original 538 post: https://fivethirtyeight.com/features/what-do-men-think-it-means-to-be-a-man/

Start with a basic set-up. I haven’t previously worked with the fivethirtyeight package but I’ll install and load it, and the tidyverse for my standard data manipulation.

?fivethirtyeight
data("masculinity_survey")
dat <- masculinity_survey # I replace the long title with this casue I'm lazy here. 
head(dat)

## # A tibble: 6 x 12
##   question response overall age_18_34 age_35_64 age_65_over white_yes
##   <fct>    <fct>      <dbl>     <dbl>     <dbl>       <dbl>     <dbl>
## 1 "In gen… Very ma…    0.37     0.290      0.42        0.37      0.34
## 2 "In gen… Somewha…    0.46     0.47       0.46        0.47      0.5 
## 3 "In gen… Not ver…    0.11     0.13       0.09        0.13      0.11
## 4 "In gen… Not at …    0.05     0.1        0.02        0.03      0.04
## 5 "In gen… No answ…    0.01     0          0.01        0.01      0.01
## 6 How imp… Very im…    0.16     0.18       0.17        0.13      0.11
## # … with 5 more variables: white_no <dbl>, children_yes <dbl>,
## #   children_no <dbl>, straight_yes <dbl>, straight_no <dbl>

nrow(dat)

## [1] 189

ncol(dat)

## [1] 12

Okay, it appears there are 189 observations across 12 variables. First column is question, and second column is responses.

unique(dat$question)

##  [1] In general, how masculine or "manly" do you feel?                                                                                          
##  [2] How important is it to you that others see you as masculine?                                                                               
##  [3] Where have you gotten your ideas about what it means to be a good man? (Select all that apply.)                                            
##  [4] Do you think that society puts pressure on men in a way that is unhealthy or bad for them?                                                 
##  [5] Ask a friend for professional advice                                                                                                       
##  [6] Ask a friend for personal advice                                                                                                           
##  [7] Express physical affection to male friends, like hugging, rubbing shoulders                                                                
##  [8] Cry                                                                                                                                        
##  [9] Get in a physical fight with another person                                                                                                
## [10] Have sexual relations with women, including anything from kissing to sex                                                                   
## [11] Have sexual relations with men, including anything from kissing to sex                                                                     
## [12] Watch sports of any kind                                                                                                                   
## [13] Work out                                                                                                                                   
## [14] See a therapist                                                                                                                            
## [15] Feel lonely or isolated                                                                                                                    
## [16] Which of the following do you worry about on a daily or near daily basis? (Select all that apply.)                                         
## [17] Which of the following categories best describes your employment status?                                                                   
## [18] AMONG EMPLOYED: In which of the following ways would you say is an advantage to be a man at your work right now? (Select all that apply).  
## [19] AMONG EMPLOYED: In which of the following ways would you say is a disadvantage to be a man at your work right now? (Select all that apply).
## [20] AMONG EMPLOYED: Have you seen or heard of a sexual harassment incident at your work? If so, how did you respond? (Select all that apply.)  
## [21] AMONG EMPLOYED: How much have you heard about the #MeToo movement?                                                                         
## [22] AMONG EMPLOYED WHO'VE HEARD OF METOO: As a man, would you say you think about your behavior at work differently in the wake of #MeToo?     
## [23] Do you typically feel as though you're expected to make the first move in romantic relationships?                                          
## [24] How often do you try to be the one who pays when on a date?                                                                                
## [25] AMONG THOSE WHO TRY TO PAY ON FIRST DATE: Which of the following are reasons why you try to pay on dates? (Select all that apply).         
## [26] When you want to be physically intimate with someone, how do you gauge their interest? (Select all that apply.)                            
## [27] Over the past 12 months, when it comes to sexual boundaries, which of the following things have you done? (Select all that apply.)         
## [28] Have you changed your behavior in romantic relationships in the wake of #MeToo movement?                                                   
## [29] Are you now married, widowed, divorced, separated, or have you never been married?                                                         
## [30] Do you have any children? (Select all that apply.)                                                                                         
## [31] Would you describe your sexual orientation as:                                                                                             
## [32] Are you:                                                                                                                                   
## [33] What is the last grade of school you completed?                                                                                            
## [34] age                                                                                                                                        
## 34 Levels: age ...

Alright, inspecting the data I see that there are 34 questions most with a Likert scale (1-5) score in the response, some grouping variables, and then each response has the summary statistics of the people that responded to that question.

#Demographics Some questions drive into demographics. If the survey is biased, I’ll believe the results are less generalizable. Let’s look into these first.

summary_questions <- c("age", "What is the last grade of school you completed?", "Are you:", 
                      "Would you describe your sexual orientation as:", "Do you have any children? (Select all that apply.)", "Are you now married, widowed, divorced, separated, or have you never been married?", "Which of the following categories best describes your employment status?")

summary_graph <- function(df, question_fact){
  df %>% 
    filter(question==question_fact) %>% 
    ggplot(aes(x=response, y=overall *100, fill=response)) +
    geom_col() +
    ylab("Percent of sample") + 
    NULL
    
}

#Test out my new function
summary_graph(df=dat, question_fact=summary_questions[1])

#It's doing what I thought it would. 

#Run it on all my demographics variables
lapply(summary_questions, FUN=summary_graph, df=dat)

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

Our sample is likely representative because that’s how they would want the survey to be. These conform to my general (and very rough) knowledge of demographics in the USA.

Looking at the questions more, I think I can subdivide into themes:

Sexuality
Role in society
Emotional health
Activities

#Emotional health

eh_questions <- c("Cry", "Get in a physical fight with another person", "Feel lonely or isolated", "See a therapist")

summary_graph_nolegend <- function(df, question_fact){
  df %>% 
    filter(question==question_fact) %>% 
    ggplot(aes(x=response, y=overall *100, fill=response)) +
    geom_col() +
    ylab("Percent of sample") + 
    theme(legend.position="none",
          text = element_text(size=10)) + 
    ggtitle(question_fact) + 
    NULL
    
}
#Test out my new function
summary_graph_nolegend(df=dat, question_fact=eh_questions[1])

#It's doing what I thought it would. 

#Run it on all my demographics variables
x <- lapply(summary_questions, FUN=summary_graph_nolegend, df=dat)
#x_legend <- cowplot::get_legend(summary_graph(df=dat, question_fact=eh_questions[1]))
#x <- x %>% append(x_legend)
#?plot_grid # Had to check how plotlsit variable worked. 
cowplot::plot_grid(plotlist=x, ncol=2, rel_widths = c(1,1,0.5))

I want to drill down into some of the other variables for the ‘lonely/isolated’ question. Will need to rearrange the data a little bit for this.

question_fact = "Feel lonely or isolated"

tidy_dat <- dat %>% 
  gather(key="demographic", value="value", -question:-overall) %>% 
   mutate(response = fct_relevel(response, c("No answer", "Never, and not open to it", "Never, but open to it", "Rarely", "Sometimes", "Often"), after=0))

#Save this as a variable so I can review later.
#Lots of the response variables are Likert scale formatter so I order them. It doesn't affect the other variables. 

tidy_dat %>% 
  filter(question==question_fact) %>% 
  filter(grepl(demographic, pattern="age")) %>% 
  ggplot(aes(x=response, y=value *100, fill=response)) +
  geom_col() +
  ylab("Percent of sample") + 
  theme(text = element_text(size=10),
        axis.text.x = element_blank()) + 
  ggtitle(question_fact) + 
  facet_wrap(~demographic) +

  NULL

This graph is slightly informative. It could be improved by moving the factors to some kind of ascending order that makes sense.

tidy_dat %>% 
  filter(question==question_fact) %>% 
  filter(grepl(demographic, pattern="age")) %>% 
  ggplot(aes(x=response, y=value *100, fill=response)) +
  geom_col() +
  ylab("Percent of sample") + 
  theme(text = element_text(size=10),
        axis.text.x = element_blank()) + 
  ggtitle(question_fact) + 
  facet_wrap(~demographic) +
  viridis::scale_fill_viridis(discrete=T, direction=-1) +
  NULL

#?viridis::scale_fill_viridis() #This is a great colour palette for R. I switch direction to be -1 to get darker colours at the 'darker' end of the spectrum  and switch to discrete values

Here, I facet_wrap the data by age demographic and then add a colour scale that’s more meaningful to the data. If we start thinking in bins, >50% of the 18-34 group is sometimes or often lonely comapred to <30% in the >=65 group.

Why is this?

#Role in Society

question_fact <- "Ask a friend for personal advice"



sub_question <- function(df=tidy_dat, question_fact=NA, demographic_pattern=NA){
  df %>% 
  filter(question==question_fact) %>% 
  filter(grepl(demographic, pattern=demographic_pattern)) %>% 
  ggplot(aes(x=response, y=value *100, fill=response)) +
  geom_col() +
  ylab("Percent of sample") + 
  theme(text = element_text(size=10),
        axis.text.x = element_blank()) + 
  ggtitle(question_fact) + 
  facet_wrap(~demographic) +
  viridis::scale_fill_viridis(discrete=T, direction=1) +
  NULL
}


sub_question(df=tidy_dat, question_fact="Ask a friend for personal advice", 
             demographic_pattern = "straight")

#I could lapply this because there's only 4 
demographic_patterns <- c("straight", "age", "white", "children")

lapply(demographic_patterns, FUN=sub_question, df=tidy_dat, question_fact="Ask a friend for personal advice")

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

It seems a lot of the responses while useful on a Likert scale might be better as roughly yes or no answers. Never-Rarely seem like 0, and Sometimes and Often seem like 1s.

tidy_dat <- tidy_dat %>% 
  mutate(response_binary = if_else(response=="Sometimes"|response=="Often", 1, 0))

#Sexuality

#Multi-answer questions Many of the questions have multiple answers possible (“Answer all that apply”).

aata_questions <- unique(dat$question)[grepl(unique(dat$question), pattern="all that apply")]
aata_plots <- function(df, question_fact, demographic_pattern){
  df %>% 
    filter(question==question_fact) %>% 
    filter(grepl(demographic, pattern=demographic_pattern)) %>% 
    ggplot(aes(x=response, y=value *100, fill=response)) +
    geom_col() +
    coord_flip() + 
    ylab("Percent of sample") + 
    theme(text = element_text(size=10),
          axis.text.x = element_blank(),
          legend.position="none") + 
    ggtitle(question_fact) + 
    facet_wrap(~demographic) +
    viridis::scale_fill_viridis(discrete=T, direction=1) +
    NULL
}

aata_plots(df=tidy_dat, question=aata_questions[1], demographic_pattern="age")

# lapply(aata_questions, FUN=aata_plots, df=tidy_dat, demographic_pattern="age")

That’s all for now, but I’ll update another day.

Written on August 2, 2018