Entering the Project Phase

We’ve finished the structured curriculum at Galvanize and are now working on our capstone projects. I did a bit of flip-flopping in picking a project topic. While I’m (obviously) really interested in healthcare applications of data science, there are fewer open data sources available in that area. I poked around a few available datasets but ultimately decided on a non-healthcare project that I think will challenge me more and give me more opportunity to show off my newly developed skills. Also, it’s food-related, which is probably right up there with healthcare on my interests list.

[Image: table set for brunch with waffles on a plate]

Yum!

For the past week, I’ve been building a cross-domain recommender system that lets a user enter one of their favorite restaurants in SF and receive a recipe they might enjoy. So far, I have a simple model that computes the similarity between the given restaurant’s menu and every recipe in my database, based on text analysis of the menu’s item descriptions and the recipes’ ingredient lists. I also have a Flask app up and running (only on my local machine so far, so no links to it yet, but soon!). I’m pretty happy with my progress this week, especially since I still have the upcoming week to play around with the model. I’d like to try some more sophisticated text mining techniques that will hopefully result in better recommendations.
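For the curious, here is a rough sketch of what a text-similarity recommender like this could look like. It is not my actual model: I’m assuming TF-IDF weighting and cosine similarity as one common way to compare texts, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_recipes`), the sample menu, and the sample recipes are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF, so terms that appear everywhere still get a small weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_recipes(menu_text, recipes):
    """Score each recipe's ingredient text against the menu text, best first."""
    names = list(recipes)
    vectors = tfidf_vectors([menu_text] + [recipes[name] for name in names])
    menu_vec = vectors[0]
    scored = [(name, cosine(menu_vec, vec))
              for name, vec in zip(names, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data, purely for illustration.
menu = "buttermilk waffles with maple syrup and fresh berries"
recipes = {
    "Belgian waffles": "flour buttermilk eggs butter maple syrup",
    "Beef chili": "ground beef kidney beans tomato chili powder onion",
}
ranking = rank_recipes(menu, recipes)
```

With this toy data, the waffle recipe ranks first because it shares words like “buttermilk” and “maple” with the menu, while the chili shares none and scores zero. A real version would probably want stop-word removal and some stemming on top of this.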

If you’re interested in building simple web apps or websites, I definitely recommend checking out Bootstrap templates. See Start Bootstrap for some free, downloadable templates. With just a tiny bit of HTML and CSS knowledge, you can customize the templates and make your site look really slick. It took me maybe an hour to go from a bare-bones, text-only website to a nicely formatted, image-rich design. Can’t wait to share the finished product!

Halfway there!

I’m officially halfway through the data science immersive (DSI) program at Galvanize. I actually had the past week off from classes as a chance to review and solidify what we’ve learned so far and to start thinking about what I’d like to do for my capstone project. The capstone project serves as (a start to) a data science portfolio. We will eventually present our projects to recruiters from various tech companies, so it’s really a chance to show off what I’ve learned over the course of the DSI.

[Image: portfolio]

I’m mulling over a few different ideas. Ideally, I’d love to do a health-themed project, as that’s a topic I’m both interested in and have subject matter expertise in. Unfortunately, finding good datasets can be a challenge. I’d previously been advised to get experience working with patient-level or claims data, but there are obviously fewer open sources of patient data, since releasing it raises pretty clear privacy issues. CMS (Centers for Medicare & Medicaid Services) does have some limited, de-identified datasets available, so perhaps I can find an interesting question to answer based on that data. It’s funny: starting with a dataset instead of a question feels like a bit of a backwards approach, but it’s reasonable given our time constraints. We have about two and a half weeks to build our projects, so we can’t get too hung up on constructing unique datasets.