My First Hackathon

When I was first getting started with programming and data science, one piece of advice I heard repeatedly was to attend a hackathon. Initially, I was pretty skeptical: I pictured staying up all night, fueled by Red Bull, vying to win a prize. This was unappealing because 1) I’m not overly competitive, and 2) I like to sleep. But when I found a hackathon weekend focused on “civic hacking,” or using publicly available data for civic good, I figured it was worth a shot.

[Image: coffee mug and laptop]

I also pictured coffee. Lots of coffee.

And so I found myself one weekend in June at the 2015 SF Day of Civic Hacking at SF State. One of the projects pitched focused on the health impacts of climate change, which appealed to me. My hope for the day was to pair with a more experienced Python developer and glean some knowledge from them. Instead, I found myself in a group with a few other people who were interested in the topic, but none of us were particularly confident in our programming skills. I was the most familiar with R and Shiny apps, so I became the de facto software developer, which was definitely not what I had expected!

We hoped to build a visualization for policymakers that compared climate change to changes in community health, which would ideally increase concern about global warming. We had a lot of ideas about how we could do so and what our visualization might look like. However, when we got down to it, finding relevant data (and cleaning it) took a lot longer than any of us anticipated.

Our final product by the end of the weekend was a simple R Shiny app that showed changing average temperatures and the number of West Nile Virus (WNV) cases for each county in California. As a caveat, this visualization is primarily hypothesis-generating: a simple correlation certainly doesn’t imply a causal link. We initially hoped to include more measures of health than WNV incidence, but the WNV data was the easiest to find, so we included it in the prototype. I also think the tool might be more useful if it were more granular (i.e., more local than the county level), which might make it ‘come alive’ for people that climate change is affecting their own communities.


[Image: screenshot of the “Health and Climate Impacts” Shiny app]

OK, so it’s not the prettiest visualization you’ve ever seen.

To my astonishment, we placed third in the competition. In a weekend full of surprises, this was certainly a happy one. I chalk it up in large part to having a working prototype. Sure, it wasn’t perfect, and maybe other teams’ ideas were more impressive, but we at least had a small tool that you could click on and interact with.

In the end, I found the weekend pretty fun, and I did learn a lot about going from an idea to a working product. I also gained more confidence in building a quick-and-dirty Shiny app (code is here if you’re interested).

Personal Belief Exemptions in California

I’m a confessed health nerd and have been working on immunization projects for several years, so it’s only natural that one of my first data projects was about vaccination statistics.

In fall 2014, I took an excellent introductory R course through the Berkeley Extension school. I know several academics who use R in their research, and though I had only been briefly exposed to it in grad school, I was interested in really learning it. So I enrolled in the course, a little unsure about what I would find. Ultimately, I benefited from having a well-organized overview of working with data programmatically, and it was the first time I was introduced to a few key computer science concepts. The instructor was even kind enough to coach me through writing my first for loop.

The course culminated in a final project, in which we were to clean and analyze a dataset of our choosing and generate a few data visualizations from it. I knew that the state of California had an open data portal with some health-related datasets, and I was intrigued to find school-level data on immunization rates and personal belief exemptions for kindergarten and 7th-grade students.

Personal belief exemptions (PBEs) occur when a parent requests exemption from the immunization requirement for school entry because some or all immunizations are contrary to the parent’s beliefs. These exemptions are pretty controversial and have almost certainly contributed to outbreaks of vaccine-preventable diseases. In June 2015, California eliminated these non-medical exemptions, turning California from one of the most lenient states in enforcing vaccine requirements for school entry into one of the strictest.

You can see my code here and download the datasets from the California Department of Public Health.

My main findings were that there is a statistically significant difference in PBE rates between public and private schools in California, and between small and large schools (though rates are low in aggregate for both; more on that later). These are probably two sides of the same coin, as private schools tend to be smaller than public schools.
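For readers curious how such a comparison can be run, here is a minimal sketch of one common option, a two-proportion z-test on pooled counts of students with and without a PBE. It is not necessarily the test used in this project, and the function name `two_proportion_z_test` and the counts in the example are hypothetical placeholders, not figures from the California Department of Public Health data.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1, x2: counts of students with a PBE in each group (hypothetical inputs)
    n1, n2: total enrolled students in each group
    Returns (z, p_value).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis that both groups share one rate
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only (not CDPH figures):
# group 1 = private schools, group 2 = public schools
z, p = two_proportion_z_test(x1=300, n1=10_000, x2=2_500, n2=100_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

This sketch treats every student as an independent observation, which ignores the clustering of exemptions within schools discussed later in this post, so a method that accounts for school-level clustering would be more defensible for real data.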

[Image: CountyRates, PBE rates by county]

At the time, I was interested in finding which counties had the highest PBE rates in their school systems. While one usually hears about high PBE rates in private schools in LA or Marin County, it was interesting that the counties with the highest PBE rates are mostly sparsely populated counties in northern California. Looking back on it now, those are also counties with fewer schools (and students) overall, so their rates rest on small numbers, and ranking counties by rate is probably not the best method of assessing risk. Overlaying PBE rates onto a map of California would probably be a more interesting way to represent this data visually, but alas, that was beyond my skills at the time. Maybe I’ll find the time soon to go back and do that.
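To make the small-numbers problem concrete, here is a minimal sketch using the Wilson score interval, one standard confidence interval for a proportion. Both hypothetical counties below have the same 10% PBE rate, but the one with far fewer students gets a much wider interval. The function name `wilson_interval` and the counts are illustrative assumptions, not values from the dataset.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Center is pulled slightly toward 0.5, more strongly when n is small
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Two hypothetical counties with the same 10% PBE rate
small = wilson_interval(10, 100)        # few students
large = wilson_interval(1_000, 10_000)  # many students
print(f"small county: {small[0]:.3f} to {small[1]:.3f}")
print(f"large county: {large[0]:.3f} to {large[1]:.3f}")
```

Ranking counties by the point estimate alone throws away this uncertainty, which is one reason sparsely populated counties can float to the top of a list like the one above.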

There are also major caveats to relying on school records to assess immunization rates: are those students actually unvaccinated, or are the school’s records just incomplete?

I’ve since read and contributed to other research on this issue (using much more advanced methods than mine). Immunization rates often have extremely local consequences, and aggregating at the county level is not necessarily that helpful. Unvaccinated and undervaccinated communities are often quite small but concentrated. While the rate in a given county may seem low, there are often specific schools with shockingly high PBE rates. That results in pretty high risk for people in schools with many unvaccinated students, who mostly interact with other at-risk, unvaccinated individuals.
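The aggregation problem is easy to see in a toy example. This minimal sketch assumes hypothetical school-level records of (county, school, enrolled, PBE count); it computes each county’s pooled PBE rate and separately flags individual schools at or above a threshold. The record layout, the names, and the 10% threshold are all illustrative assumptions, not the structure of the actual CDPH files.

```python
from collections import defaultdict


def summarize_by_county(records, school_threshold=0.10):
    """Pool PBE counts by county and flag high-rate schools.

    records: iterable of (county, school, enrolled, pbe_count) tuples (hypothetical layout)
    Returns {county: (pooled_rate, [(school, school_rate), ...])}.
    """
    enrolled = defaultdict(int)
    pbes = defaultdict(int)
    flagged = defaultdict(list)
    for county, school, n, pbe in records:
        enrolled[county] += n
        pbes[county] += pbe
        # Flag any single school whose own rate meets the threshold
        if n > 0 and pbe / n >= school_threshold:
            flagged[county].append((school, pbe / n))
    return {
        county: (pbes[county] / enrolled[county], flagged[county])
        for county in enrolled
        if enrolled[county] > 0
    }


# Hypothetical data: two large schools with low rates, one small school with a high rate
records = [
    ("County A", "School 1", 1_000, 10),
    ("County A", "School 2", 1_000, 10),
    ("County A", "School 3", 100, 40),
]
summary = summarize_by_county(records)
rate, high_schools = summary["County A"]
print(f"County A pooled PBE rate: {rate:.1%}")
print("Schools at or above threshold:", high_schools)
```

Here the county as a whole sits under 3% while one small school is at 40%, which is exactly the pattern a county-level ranking would hide.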

While this project certainly wasn’t groundbreaking, and I’m sure there are much better statistical approaches to the dataset, it was a nice way to combine my first forays into data science and analysis with my interest in public health. Also, can I just say that I love ggplot? Trying to plot things in Python makes me really appreciate how much simpler I found plotting in R.