Friday, February 23, 2024

There is no "average person" (re-post and edit)

At one point I thought having separate blogs for separate topics would be the way to go. But I've since decided that one blog for each general "life area" (e.g., professional stuff) is easier to manage. Since this blog is my professional and survey/statistics blog, I'm re-posting some of the entries from those other blogs I started. This is a re-post and edit of the blog post here: https://statpractice.blogspot.com/2015/04/there-is-no-average-person.html

We've all heard phrases like the ones below: 

  • "The average American uses..."
  • "Average LA residents drove..."
  • "The average family has 2.4 kids."

I've heard phrases like this too many times this week, and I have to put my contempt for them in writing (I guess it was a bad week back in 2015). My wife knows how much I hate these phrases, and it's a running joke for her to point them out to me when we hear them.

Why do I hate this phrase? It's partly because it's not accurate. When someone says "The average American owns 1.6 cars," what they mean to say is "Americans own an average of 1.6 cars." Such phrasing has become notoriously ridiculous in cases like cars and kids, for which it's impossible (or at least very unlikely) to own half a car or have half a kid. But it sneaks in through other statistics that make more sense as fractions (e.g., "The average Californian uses 54.3 gallons of water a day").

I'm a stickler for grammar and accuracy, but that's not the main reason why I hate this phrasing. It goes deeper than that.

  1. It makes interpretation of the statistic difficult for a lay audience, and makes it easy to criticize the statistic (or statistics in general). "No one has 2.4 kids!" "Who are these 'average' people?!"
  2. There is no "average American" (or any member of any population) but there can be a multivariate average of multiple characteristics among any group of people or objects. In this "The average person..." phrasing, we're not talking about that, though. If you think through the grammar of the sentence, it implies that they've created some sort of multivariate average from some sample and are reporting a statistic for that group. Really all they're doing is misapplying the term "average" to the people (e.g., Americans, Californians) when it should be applied to the outcome (e.g., cars, kids, gallons of water).
  3. Use and misuse of statistics in popular media is already a thorny area. Why contribute to it when a simple grammatical change can contribute to our global statistical literacy?
This obviously isn't the most frequent or problematic statistical literacy issue out there, but maybe that's why it bugs me so much. It's easy to avoid, and fixing it could be one baby step toward global statistical literacy.
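
To make the distinction concrete, here is a minimal Python sketch. The household data are made up purely for illustration (they are not from any survey), and the variable names are my own. It computes the average of each outcome and then checks whether any single household actually matches those averages.

```python
from statistics import mean

# Hypothetical data: (cars, kids) for five made-up households.
households = [(1, 2), (2, 3), (2, 2), (1, 4), (2, 1)]

# The average is computed over each outcome (cars, kids), not over "people".
avg_cars = mean([cars for cars, _ in households])
avg_kids = mean([kids for _, kids in households])

# Look for an "average household": one that matches both averages at once.
average_households = [h for h in households if h == (avg_cars, avg_kids)]

print(f"Households own an average of {avg_cars} cars")
print(f"Households have an average of {avg_kids} kids")
print(f"Households matching both averages: {len(average_households)}")
```

With these made-up numbers the averages work out to 1.6 cars and 2.4 kids, and no household in the list has either value. The averages describe the outcomes across the group, which is why "households own an average of 1.6 cars" is the phrasing that matches the arithmetic.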

Let me know what you think! What other statistical concept or term misuses bother you most?      

Thursday, July 29, 2021

Why is everyone using periods in slide bullets (???) and why it must stop!

Have you noticed this too? Over the past few years it seems like everyone's slides have periods after every bullet. Not just bullets that are grammatical sentences, but every one! I've only been at my current company for about 4 years, so I'm not sure if it's a company style or a scourge that PowerPoint has brought to our PPT-filled lives.

 

There seems to be some online discussion about it (https://www.quora.com/Do-you-use-periods-in-PowerPoint-presentations), and I've seen a couple recommendations advising to only use them with grammatical sentences. But this is something I've always had a gut (negative) feeling about, so here's my attempt at the argument against periods in bullets…as bullets of course! :)

 

1.       They are extra visual content that doesn't carry meaning. As such, they violate principles of parsimony and avoiding visual clutter (Tufte's data-ink ratio).

2.       You just don't need them. Think about what a period does. It separates sentences (individual thoughts or ideas) in a paragraph (a series of running text). You don't have that in bullets (except these, and I'm using periods here). In bullets, particularly within slides, thoughts are separated by bulleting and structured by indenting.

3.       It makes it look like you don't know what a sentence is. Of course, so does having no period when there should be one. But overall, I prefer that imprecision to the other. Plus, good bullets shouldn't be full sentences (generally speaking). If you're writing a lot of text that needs to be punctuated, like these bullets, you should probably edit your slides one more time.

 

 

With that in mind (and realizing that the "bullets" above make it look like I don't know what a sentence is), here's a revised version of them, as if edited for a PPT presentation.

 

 

1.       Extra visual content and clutter

        • Doesn't carry meaning
        • Reduces "data-ink" ratio (Tufte)

2.       Don't need them…plain and simple

        •  Periods separate sentences (thoughts or ideas) in a paragraph (running text)
        • Bullets are separated by vertical space (hard return)
          • Structured by indenting


3.       Looks like presenter doesn't know what a sentence is 

        • Trade-off: No period when there should be one (preferred)
        • Good bullets shouldn't (usually) be full sentences
          • Reduce bullets to essential text to communicate ideas (review and edit)

  

Wednesday, February 4, 2015

My ResearchGate questions

I've become a big fan of the social networking site ResearchGate.com. It's a great place to publicize your research, and follow the research of friends and colleagues. Lately, I've been using it as a place to post questions (and answer some) that otherwise would lead me down online rabbit holes looking for answers. If you're on ResearchGate you can see the questions I've posted, so this blog post is more for myself. I've posted so many (and on related topics), that I've decided it's easiest to list them here for my own reference. Otherwise I lose track of them, and I feel bad leaving a loose end and not saying thanks to those who have posted. Here they are in chronological order (roughly) and grouped by topic.

1) Interviewer-respondent interaction, sociolinguistics, and psycholinguistics

https://www.researchgate.net/post/What_are_your_favorite_books_and_papers_on_quantitative_statistical_analysis_in_sociolinguistics_or_psycholinguistics

2) Professional/career issues

CV
https://www.researchgate.net/post/Can_anyone_recommend_a_CV_category_for_edited_but_not_peer-reviewed_articles

Discipline v. Field 
https://www.researchgate.net/post/What_is_the_difference_between_the_words_discipline_and_field

OneNote software
https://www.researchgate.net/post/How_to_make_OneNote_operate_more_efficiently

Reference management software
https://www.researchgate.net/post/Will_you_fill_out_my_survey_on_reference_management_software


3) Statistics generally (including data management)

Data analysis workflow
https://www.researchgate.net/post/Statistical_analysis_and_data_management_workflow

Data documentation
https://www.researchgate.net/post/What_are_your_favorite_model_surveys_for_the_extensiveness_and_instructiveness_of_data_collection_and_or_data_analysis_documentation

Emacs
https://www.researchgate.net/post/Has_anyone_put_together_an_Emacs_starter_kit_for_social_scientists_like_Kieran_Healys_but_for_PC

MLM
https://www.researchgate.net/post/What_are_your_favorite_contemporary_papers_on_multilevel_binary_and_multinomial_logistic_regression_with_survey_weighted_data

https://www.researchgate.net/post/Best_applied_statistics_books_on_nonlinear_mixed_models

p-values and multiple testing error
https://www.researchgate.net/post/Can_anyone_help_with_this_possible_misunderstanding_about_p-values_in_multiple_testing

Reliability analysis
https://www.researchgate.net/post/What_are_the_proper_techniques_for_analyzing_reliability_of_categorical_variables_with_a_large_number_of_categories

Variable recoding
https://www.researchgate.net/post/Do_you_like_to_use_0_or_1_as_the_base_category_for_your_categorical_nominal_and_ordinal_variables

4) Bullying and peer victimization

https://www.researchgate.net/post/Has_anyone_conducted_a_survey_of_school_teachers_and_administrators_asking_about_how_they_develop_or_choose_their_bullying_prevention_programs

5) Nonresponse 

Participation request sequencing
https://www.researchgate.net/post/Within-household_interviews_Better_to_ask_for_more_upfront_or_later

Age and nonresponse
https://www.researchgate.net/post/What_is_the_relationship_between_age_and_nonresponse_bias


6) Sexuality research

https://www.researchgate.net/post/Does_anyone_have_a_survey_collecting_sexual_orientation_data_on_teens_or_older_adults

https://www.researchgate.net/post/Are_gay_and_lesbian_people_less_likely_than_straight_people_to_report_their_sexual_orientation_in_surveys 


7) About ResearchGate itself

https://www.researchgate.net/post/Should_ResearchGate_develop_an_app_or_mobile_site


8) Surveying minors 

Parental consent
https://www.researchgate.net/post/Why_do_parents_refuse_to_let_their_teenage_kids_participate_in_surveys

Incentives
https://www.researchgate.net/post/What_is_an_effective_amount_to_offer_a_teenager_13-18_to_do_a_15_min_phone_survey

Thursday, January 22, 2015

Sources for Survey Questions and Measures

[FYI: I've decided that posting my links lists via blog posts, and editing them as I find more links, is easier than putting them on a static website or using a link service like Diigo. This is my first attempt at that, so the list will be updated and may change shape in the future. I'll add links and take suggestions on display if you have any. Thanks in advance.]

There are a lot of places to find survey questions online. The two main sources are organized, searchable databases (sometimes by topic) and survey documentation websites (where you can view the questionnaires themselves). You can also find them scattered on websites, in publications, and other places. I always advise researchers to remember that they can design their own questions or modify existing ones, and the fact that a question has been used by other researchers doesn't necessarily make it a good question (unless your sole purpose is replication).

Here are some of the question repositories and sites I've found most helpful (or recommended by others). Databases and static documentation sites are intermingled below, and sorted by topic when possible.

Descriptions to come as I have time and a chance to use them. Feel free to suggest additions.


General Demographic, Social, and Omnibus Survey, and Poll, Questions

American Community Survey Questionnaire Archive (U.S. Census Bureau)

European Social Survey (ESS) Questionnaire




ICPSR Web Site
We usually think of ICPSR as a data warehouse, but I've found that when you search their database you get question text with summaries of responses. So it serves dual purpose.

iPOLL Databank


A product of NCHS. Holds cognitive interview and pretesting results as well as question text.

Survey Monkey's International Question Bank | SurveyMonkey Blog






Health Survey Questions

Behavioral Risk Factor Surveillance System (BRFSS) State-Added Question Database
This is the best place to find state-added BRFSS questions by state and topic

Behavioral Risk Factor Surveillance System (BRFSS) Questionnaires (CDC)


California Health Interview Survey (CHIS) Questionnaires

Cancer Questions from Grid-enabled Measures (GEM) Database

National Health Interview Survey (NHIS) - Questionnaires

NQF: Quality Positioning System 
Used by hospitals for quality of care questions

Patient Reported Outcome Measurement System (PROMIS) Database

Health outcomes measures database sponsored by NIH. See more about PROMIS here 


SHADAC's State Reform Survey Item Matrix (SRSIM)
A well-designed Excel spreadsheet with question text from various states

Religion Survey Questions

Congregational Survey Question Bank

Sexuality Survey Questions

Questionnaire for Gay Sentiment Study

Sexual orientation questions on LGBTData.com
The most comprehensive resource for LGBT questions and data sources I've found yet. 

Thursday, November 13, 2014

Reflections on programming (cross-post)

For some unknown obsessive-compulsive reason, I've taken to sorting my infrequent blog posts by topic. This one linked below wasn't really about survey methodology per se, so I posted it elsewhere. Thought some of you may be interested, though.

http://researchefficiency.blogspot.com/2014/11/the-long-road-that-is-short.html

Tuesday, July 8, 2014

A beginner's guide to response rates

One of the most common types of questions I get in survey practice is "What is a good response rate?" or "Is my survey's response rate good enough? Do I have nonresponse bias?" Survey methodologists reading this are probably taking a deep breath and figuring out where to start their response. Here are a few things I think everyone should know about response rates (non-technical...I will post later on AAPOR response rate calculation).

1) The answer depends on what you mean by "good". 

"Good" can mean "high enough to publish in a specific journal," or, "high overall (e.g. 80-90% or more)," or what people usually want to know, "Are my results biased?"

"Good" might also mean "Do (will) I have enough cases for key analyses?". 

In my mind, "good" should mean "relative to other surveys in the same mode with similar design features". We just can't expect 50% RR's from RDD surveys and shouldn't get upset when we don't see them. 

2) Any statement about survey "goodness" or data quality has to be conditional on the amount of resources spent/spendable. 

'nuff said.

3) Response rates are good for some things, but not others.

Good for:

a) Tracking an ongoing survey's performance over time
b) Comparing surveys that are conducted under the same or very similar "essential survey conditions" (e.g., mode, contact materials and protocols, costs/resources)
c) Planning survey costs, inference (CI's and power analyses), and number of completed cases
d) Making initial assessments of approximate representation of key subgroups

Not good for:

a) Assessing nonresponse bias. See work by Groves, Groves & Peytcheva and others. This is lesson number 1 or 2 in survey methodology training, but often isn't intuitive outside our field until explained. Statisticians usually understand this inherently, but substantive researchers may not. Easily taught though. 

On a related note, I was just reviewing notes from Jill Montaquila and Kristen Olson's short course "Practical Tools for Nonresponse Bias Studies." If you want to learn more about how to assess NR bias, I recommend taking the course.
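
To illustrate why the response rate alone doesn't tell you about nonresponse bias, here is a minimal Python sketch. It uses the standard deterministic expression for the bias of a respondent mean: the nonrespondent share times the difference between respondent and nonrespondent means. The function name and all of the numbers are hypothetical, and in practice the nonrespondent mean is rarely known, which is the whole difficulty.

```python
def nonresponse_bias(resp_mean, nonresp_mean, response_rate):
    """Bias of the respondent mean relative to the full-sample mean.

    Uses the deterministic view of nonresponse:
    bias = (1 - response_rate) * (resp_mean - nonresp_mean)
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return (1 - response_rate) * (resp_mean - nonresp_mean)


# Hypothetical surveys estimating a proportion (all numbers are made up).
# Survey A: high response rate, but nonrespondents differ from respondents.
bias_a = nonresponse_bias(resp_mean=0.60, nonresp_mean=0.30, response_rate=0.80)
# Survey B: low response rate, but nonrespondents resemble respondents.
bias_b = nonresponse_bias(resp_mean=0.50, nonresp_mean=0.50, response_rate=0.30)

print(f"Survey A (80% RR) bias: {bias_a:+.3f}")
print(f"Survey B (30% RR) bias: {bias_b:+.3f}")
```

In this made-up comparison the survey with the higher response rate ends up with the larger bias, because bias depends on how different the nonrespondents are on the thing you're measuring, not just on how many of them there are.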

Friday, July 4, 2014

My other blog...

I've debated about how much to post here that isn't specifically about survey methodology, so I started a secondary blog at
http://researchefficiency.blogspot.com
It seems like a lot of the things I want to blog about lately are research practice, coding, project management, efficiency, etc. The new blog will be the outlet for those kinds of topics (with some cross-posting of course). See my recent post on developing a personal code library and an earlier one on Excel shortcuts.
http://researchefficiency.blogspot.com/2014/07/developing-your-personal-statistical.html

Tuesday, June 3, 2014

Training in survey methodology and practice


There's an upcoming DC AAPOR event on survey methodology training in DC on June 13 (http://dc-aapor.org/upcomingevents.php).  I can't attend so I thought I'd share some of my own thoughts on the matter here. I think about this topic from three different perspectives. 

As a survey methodology instructor and trainer of future methodologists: 
  1. Instructors should distinguish clearly between whether their training (e.g., their course or degree curriculum) is about "survey research practice" or "survey methodology" or what fraction of each. Those seeking practice training can be turned off by methodological debates and esoterica, and the line between esoterica and fundamentals isn't always clear, particularly in an interdisciplinary field like survey methodology. Survey methodology is an applied, yet scientific field, and should have a balance of both perspectives.
  2. While a methodology focus trains the next generation of scientists and leaders, it may not give one a good enough broad-based training in concrete techniques because the focus is on isolating and filling gaps in small areas of the field. That doesn't mean that graduate programs can't have both. For example, an MS program could have an applied track, for those who want to go to work after training, and a "theory" (for lack of a better word) track for those who want to go on to the PhD.
  3. Include official, sanctioned specializations (see student point 2 below) outside of survey methodology programs.
  4. Use Bloom's taxonomy of learning when planning courses. I've used this in my own courses, and it helps operationalize clear course outcomes and goals and structure the course to meet them. Otherwise we just end up teaching what we happened to learn in the way we happened to learn it, and may not be optimizing instruction and student experiences for the outcomes we want them to have.
As a student:
  1. Stats v. Social Science focus: I'm sure opinions are split on this (my own opinion is split depending on the context). Groves (as you might expect) wanted us each to be strong in all of it (and "Do it better than we did." A tall order). On the other hand, the broader you go in topics, the less focused you can be. I'm glad I pushed my statistical boundaries and learned things I never thought I would learn. Although I still consider myself more of a social scientist, I can practice at a level of statistics I never thought I would. The counter argument is that it's been hard to focus on one or two problems and get research done. If you're going to go broad, make sure you get things out and published regularly so you don't end up with a scattered CV.
  2. Talks are fun but only pubs really matter (can't emphasize that enough now that I'm out). Take extra time to publish before life gets in the way (e.g., an extra 6 mos or year before defending, or a postdoc instead of a "regular job"). I guess this means faculty should be giving you room to publish (either co-authoring or solo papers based on class projects). MPSM/JPSM have good models for this in the Practicum, TSE and Design Seminar courses.
     
  3. Read and study outside survey methodology: Not just to find your field of application, but to find areas that will advance survey methodology. For example, I took social psych courses and read the communications and linguistics literature in my graduate work. I still try to keep a portion of an eye on decision science and other psychological and behavioral economics research that has something to say about measurement and nonresponse "decisions". I'm sure there are parallels in statistical work (e.g., estimation techniques or applied problems that aren't in the main view of usual survey statistics).

As an employer:
  1. I expect (or hope) that students coming out of formal survey methodology training (v. social, psych, or education research methods in another field/discipline or from stats programs) will have a balance of conceptual perspective and concrete skills. For example, I expect JPSM, MPSM, and SRAM students (or those who take my course) to have a handle on the TSE framework and terminology, at least at a level that facilitates discussion so we can quickly/easily zero in on whether we're talking about coverage error, sampling error, or what. I don't know if every SM program is teaching this (or a similar) model, but we need something that moves us from niche jargon to relatively standard technical terminology. I'm probably biased, but I feel like TSE does that (as well as some of the other frameworks out there). Terminology and models are a core part of the science of survey methodology in my mind, but I also expect grads to be able to DO things.
  2. I expect soc and stat side students to have decent quant skills (both interpretation and production). More so if on stats side. It doesn't seem right to me (given the current social science paradigm) to turn out students that can't do basic analysis, basic experiment design, or understand the basics of survey weights and variance estimation. Students should seek this kind of training if their program doesn't provide it. I would expect even MS students to have a working knowledge of these things and be able to refresh as needed.
  3. If I was hiring an MS level soc-side person I would expect these classes
    1. Data collection methods
    2. Questionnaire design
    3. Applied Sampling
    4. Cognition - or course on social aspects of measurement
    5. Practicum courses
      1. Covering nonresponse avoidance and sampling techniques...really "how to"
    6. Intro stats (2 semesters, through at least linear and logistic regression)
    7. Analysis of complex sample survey data
  4. If I was hiring an MS level stat person I would expect these classes
    1. Data collection methods
    2. Applied Sampling
    3. Sampling theory (or something more mathematical than applied sampling)
    4. Missing data/imputation
    5. Practicum courses
      1. Covering analysis
    6. Intro stats (3 semesters, through at least linear and logistic regression)
    7. Analysis of complex sample survey data
    8. Advanced variance estimation
    9. Introduction to latent variable models
      1. Pref with some exposure to complex survey data
    10. Introduction to longitudinal analysis
      1. Pref with some exposure to complex survey data

Friday, May 23, 2014

Methods of efficiency

I'm convinced that micro-level behaviors, habits, and actions are just as important to becoming a productive researcher as having big ideas. I've been working on improving those things over the past year. You could call these things the "methods of doing research work" but many apply to other kinds of creative work, technical work, and project management. We don't talk about them a lot in professional circles because they're not the big/sexy ideas that change the world in one fell swoop. However, they are the mitochondria of our research cells, and I think we should share tips and tricks like this much more often for the larger benefit of the field.

My wife and I had breakfast with our friends Mario and Ana this past week and we barely got to share personal stories because we were sharing efficiency tips and tricks the whole time. Here are two pieces of software I've come to love (M & A, one is the thing I couldn't recall the name of and another I just found this week). Both reduce the keying/mousing you have to do, which seems small but adds up. AutoHotkey lets you program scripts and macros for any key combination or mouse movement, so it is VERY versatile and great for jobs that require repeated key/mouse movements. Breevy (which I just started using today) lets you record keyboard shortcuts and text-expansion phrases like you can do in Word with Autocomplete/correct, but works across all programs in Windows. Sure beats programming specific kb shortcuts within individual programs.

Mention other favorites if you have them.

Job Opening at NASS

I'm not sure I'm brazen enough to believe that my blog reaches more people than the SRMS and AAPOR listservs, but I thought I'd post this NASS job opening to help out a colleague. NASS has always seemed like a fun and innovative place to me. And the federal home of Likert of course :)

*********************************************************
Hello all,

We are looking to fill a senior level mathematical statistician position.  It will be a great opportunity for the right person.

The description is below, and the job will be open for applications until June 5.  Please pass along to any other interested candidates.  Thanks!


The U.S.D.A.’s National Agricultural Statistics Service (NASS) is searching for a senior mathematical statistician (ST-1529-00) who will serve as the Research and Development Division’s Deputy Director for Science and Planning. The National Agricultural Statistics Service (NASS) is the data collection and dissemination arm of the U.S. Department of Agriculture. NASS gathers and publishes a vast array of information on every facet of U.S. agriculture, including production, economics, demographics, and the environment. The incumbent serves NASS as a research statistician in mathematical statistics for agricultural surveys and censuses, geospatial techniques, statistical modeling for estimation and process measurement. Primary qualifications include a senior science degree of technical skill in mathematical statistics and probability sampling, especially in the area of geospatial analyses, model-based estimators, and non-sampling errors. Research activities include advanced survey sampling design and estimation methods and theory; geospatial estimation methods and theory; measurement error models; nonsampling error models; list, area and multiple frame sampling methods and theory; forecasting techniques; statistical modeling for estimation; and multivariate and quality control methods. The incumbent will lead teams of mathematical statisticians and serves in research management and science leadership by advising the Administrator of NASS, Director of Research and Development Division and Director of the Statistical Methodology on statistical issues and methodology affecting NASS programs. The incumbent’s research efforts will be focused about 90 percent internally, on improving the Agency’s census and survey estimation programs and research, and about 10 percent split between liaison activities with the statistical community as a whole and reimbursable consulting with external entities. 
More information on the position and how to apply may be found using the following link: https://www.usajobs.gov/GetJob/ViewDetails/370711100




Jaki S. McCarthy
Senior Cognitive Research Methodologist
USDA's National Agricultural Statistics Service
Research and Development Division