My First Hackathon

When I was first getting started with programming and data science, one piece of advice I heard repeatedly was to attend a hackathon. Initially, I was pretty skeptical: I pictured staying up all night fueled by Red Bull, vying to win a prize. This was unappealing because 1) I’m not overly competitive, and 2) I like to sleep. But when I found a hackathon weekend focused on “civic hacking,” or using publicly available data for civic good, I figured it was worth a shot.

coffee mug and laptop

I also pictured coffee. Lots of coffee

And so I found myself one weekend in June at the 2015 SF Day of Civic Hacking at SF State. One of the projects pitched was around the health impacts of climate change, which appealed to me. My hope for the day was to pair with a more experienced Python developer and glean some knowledge from them. Instead, I found myself in a group with a few other people who were interested in the topic, but none of us were particularly confident in our programming skills. I was the most familiar with R and Shiny apps, so I became the de facto software developer, which was definitely not what I had expected!

We hoped to build a visualization for policy makers that compared climate change to changes in community health, which would ideally increase concern about global warming. We had a lot of ideas about how we could do so and what our visualization might look like. However, when we got down to it, finding relevant data (and cleaning it) took a lot longer than any of us anticipated.

Our final product by the end of the weekend was a simple R Shiny app that showed changing average temperatures and the number of West Nile Virus (WNV) cases for each county in California. As a caveat, this visualization is primarily hypothesis-generating, as a simple correlation certainly shouldn’t imply a causal link. We initially hoped to include more measures of health than WNV incidence, but the WNV data was easy to find, so that is what went into the prototype. I also think the tool might be more useful if it were more granular (i.e., more local than county level), which might help people see that climate change is having an effect on their own community.
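For a sense of what “a simple correlation” means in practice, here is a minimal Python sketch that computes a Pearson correlation coefficient for one county. The numbers are made up purely for illustration and are not from the app (which was written in R); `pearson_r`, `avg_temp_f`, and `wnv_cases` are hypothetical names.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly values for a single county (NOT real data).
avg_temp_f = [58.1, 58.4, 59.0, 59.3, 60.2]
wnv_cases = [2, 3, 3, 6, 7]

print(f"r = {pearson_r(avg_temp_f, wnv_cases):.2f}")
```

Even a value of r close to 1 would only say that the two series move together. It says nothing about whether one causes the other, which is why a tool like this is better suited to generating hypotheses than to testing them.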

 

Health and Climate Impacts

Ok so it’s not the prettiest visualization you’ve ever seen.

To my astonishment, we placed 3rd in the competition. In a weekend full of surprises, this was certainly a happy one. I chalk it up in large part to having a working prototype. Sure, it wasn’t perfect, and maybe other people’s ideas were more impressive, but we at least had a small tool that you could click on and interact with.

In the end, I found the weekend pretty fun and I did learn a lot about going from an idea to a working product. I also gained more confidence in building a quick and dirty Shiny app (code is here if you’re interested).

Personal Belief Exemptions in California

I am a confessed health nerd, and have been working on immunization projects for several years, so it’s only natural that one of my first data projects was around vaccination stats.

In Fall of 2014, I took an excellent introduction to R through the Berkeley Extension school. I know several academics who use R in their research, and though I had only been briefly exposed to it in grad school, I was interested in really learning it. So I enrolled in the course, a little unsure about what I was going to find. Ultimately I think I really benefited from having a well-organized overview of working with data programmatically, and it was the first time I was introduced to a few key computer science concepts. The instructor was even kind enough to coach me through writing my first for loop.

The course culminated in a final project, in which we were to clean and analyze a dataset of our choosing and generate a few data visualizations with it. I knew that the state of California had an open data portal with some health-related datasets, and I was intrigued to find school-level data on immunization rates and personal belief exemptions for kindergarten and 7th-grade students.

Personal belief exemptions (PBE) occur when a parent requests exemption from the immunization requirement for school entry because all or some immunizations are contrary to the parent’s beliefs. These exemptions are pretty controversial, and have almost certainly contributed to outbreaks of vaccine-preventable diseases. In June of 2015, California eliminated these non-medical exemptions, turning California from one of the most lenient to one of the strictest states in enforcing vaccine requirements for school entry.

You can see my code here and download the datasets from the California Department of Public Health.

My main findings for this project were that there is a statistically significant difference in PBE rates between public and private schools in California, and between small and large schools (though both are low in the aggregate; more on that later). These are probably two sides of the same coin, as private schools tend to be smaller than public schools.
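The post doesn’t say which statistical test was used, so treat the following as one common option rather than the actual method: a pooled two-proportion z-test, sketched in Python with only the standard library. The function name `two_proportion_z_test` and the counts are hypothetical and are not taken from the California dataset.

```python
import math


def two_proportion_z_test(exempt_a, total_a, exempt_b, total_b):
    """Pooled two-sided z-test for a difference between two proportions.

    Returns (z, p_value).
    """
    rate_a = exempt_a / total_a
    rate_b = exempt_b / total_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (exempt_a + exempt_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (NOT real data): students with a PBE / students enrolled.
z, p = two_proportion_z_test(exempt_a=50, total_a=1000,   # e.g. private schools
                             exempt_b=20, total_b=1000)   # e.g. public schools
print(f"z = {z:.2f}, p = {p:.4f}")
```

With real data you would plug in the exemption and enrollment totals for each group of schools. A small p-value suggests the two rates differ, but it does not tell you why they differ.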

CountyRates

At the time, I was interested in finding which counties had the highest PBE rates in their school systems. While one usually hears about high PBE rates in private schools in LA or Marin County, it was interesting that the counties with the highest PBE rates are mostly sparsely populated counties in northern California. Looking back on it now, those are also counties with fewer schools (and students) overall, so a county-level ranking is probably not the best method of assessing risk. Overlaying PBE rates onto a map of California would probably be a more interesting way to visually represent this data, but alas that was beyond my skills at the time. Maybe I’ll find the time soon to go back and do that.

There are also major caveats with relying on school records for assessing immunization rates: are those students actually unvaccinated, or are there just incomplete records at the school?

I’ve since read and contributed to other research on this issue (which used much more advanced methods than I did here). Immunization rates often have extremely local consequences, and aggregating across counties is not necessarily that helpful. Unvaccinated or undervaccinated communities are often quite small but concentrated. While the rate in a given county may seem low, there are often specific schools with shockingly high PBE rates. That results in pretty high risk for people in those schools with many unvaccinated students (who are mostly interacting with other at-risk, unvaccinated individuals).
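To make the aggregation point concrete, here is a small Python sketch that compares a county-wide PBE rate with the highest rate at any single school in that county. The records are entirely hypothetical (not from the California dataset), and `summarize_by_county` is a name made up for this example.

```python
from collections import defaultdict

# Hypothetical school records (NOT real data):
# (county, school, students enrolled, students with a PBE)
schools = [
    ("County A", "School 1", 400, 4),
    ("County A", "School 2", 350, 3),
    ("County A", "School 3", 50, 20),
    ("County B", "School 4", 300, 9),
    ("County B", "School 5", 300, 9),
]


def summarize_by_county(records):
    """Return {county: (aggregate_rate, highest_school_rate)}."""
    totals = defaultdict(lambda: [0, 0])  # county -> [enrolled, exemptions]
    highest = defaultdict(float)          # county -> highest school-level rate
    for county, _school, enrolled, exemptions in records:
        totals[county][0] += enrolled
        totals[county][1] += exemptions
        highest[county] = max(highest[county], exemptions / enrolled)
    return {
        county: (exempt / enrolled, highest[county])
        for county, (enrolled, exempt) in totals.items()
    }


summary = summarize_by_county(schools)
for county, (aggregate, worst) in summary.items():
    print(f"{county}: county-wide {aggregate:.1%}, highest school {worst:.1%}")
```

In this toy example the two counties have similar county-wide rates, yet one of them contains a small school where a large share of students have exemptions. That is exactly the kind of cluster a county-level summary hides.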

While this project certainly wasn’t anything groundbreaking and I’m sure there are much better statistical approaches to the dataset, it was a nice way to combine my first forays into data science and analysis with my interest in public health. Also, can I just say that I love ggplot? Trying to plot things in Python makes me really appreciate how much simpler I found plotting in R.

 

Bring it on, 2016!

Welcome to the first post in my blogging experiment!

I can tell 2016 is going to be a big year for me. I have a lot of upcoming changes in my life, which is good (if a tiny bit scary). So with all that’s going to happen, why am I also trying to start blogging? That’s a good question. I’m not sure I have a better answer than “I want to.” I want to document my thoughts, reactions, trials, and successes as I make a career transition.

So to kick it all off, more about me, where I’m coming from and where I’m heading:

Yosemite Falls

I’m originally from the Midwest, but spent 5 years on the East Coast after college before moving to northern California three years ago. Now I call the Bay Area home. I miss the East Coast sometimes, but the beauty (and mild weather) of California has certainly grown on me.

I am a total health nerd. I have a Master’s degree in Public Health from Boston University and have been working in public health research for the last several years. I work for a group that uses informal, online data sources to better detect infectious diseases. My pet project is focused on immunization and making it easier for people to get their flu shots. It’s pretty fascinating stuff, and over time I’ve become more and more interested in the technical side of it all. I anticipate I will continue to love (and likely write about) health tech and all the exciting possibilities in that field.

As such, I’ve spent the last year or so learning more and more about data science. I’ve been learning to program (dabbling in R, now focusing on Python), brushing up on stats classes, and ever so gently dipping my toes into machine learning. I was swimming along this way until the end of 2015, when I decided to take the plunge: I applied to a couple of data science immersive programs (AKA bootcamps). I’m so excited to announce that I will be joining the Galvanize Data Science Feb ’16 cohort on their SF SOMA campus.

So now I’m steeling myself for the intensive whirlwind that I’m sure the program will prove to be, as I gain a solid foundation of core data science skills in just 13 short weeks. When researching the program, I read through a couple of former students’ blogs and found it really useful to hear their first-person experiences. Let’s see where it takes me!