I’m officially halfway through the data science immersive (DSI) program at Galvanize. I had the past week off from classes as a chance to review and solidify what we’ve learned so far, and to start thinking about what I’d like to do for my capstone project. The capstone project serves as (a start to) a data science portfolio. We will eventually present our projects to recruiters from various tech companies, so it’s really a chance to show off what I’ve learned over the course of the DSI.
I’m mulling over a few different ideas. Ideally, I’d love to do a health-themed project, as that’s a topic I’m both interested in and have subject matter expertise in. Unfortunately, finding good datasets can be a challenge. I’d previously been advised to get experience working with patient-level or claims data, but there are few open sources of patient data because of the obvious privacy concerns. CMS (Centers for Medicare & Medicaid Services) does have some limited, de-identified datasets available, so perhaps I can find an interesting question to answer based on that data. It’s funny: starting with a dataset instead of a question feels like a bit of a backwards approach, but it’s reasonable given our time constraints. We have about two and a half weeks to construct our projects, so we can’t get too hung up on constructing unique datasets.