Analytics work needs to be discoverable. Let’s talk about how to get there [and how a doc workspace like Hyperquery can help].

Back when I was a data scientist, I spent a substantial amount of time doing product analytics work — opportunity sizing, experiment deep dives, ad-hoc checks. But although I worked across a wide range of tools — Jupyter/Python, tidyverse, Superset, internal tools, even Java UDFs — the bulk of this…

And why your data science team needs it.

🗃️ First, how Airbnb’s data discovery tool changed my life.

In my career, I’ve been fortunate enough to work on some fun problems: I studied the mathematics of rivers during my Ph.D. at MIT, worked on uplift models and open-sourced pylift at Wayfair, and implemented novel homepage targeting models & CUPED improvements at Airbnb. But in all of this work…

Notes from Industry

The way we share our analytics work is terrible. Let’s talk about what we can do to make this better and reduce your ad-hoc overload.

“Hey, quick question — I know you’re busy but can you re-run this analysis when you have a sec? And if you could also help us dig up queries for these old decisions as well that’d be great.” (Image by author)

You are an analytics machine. You take in coffee and output actionable insights. But you have one kryptonite: the infamous ad-hoc request.

“Hey can you quickly pull these numbers again…”
“Hey where was that analysis that you made…”
“Hey do you have the query for this?”
“Hey is this the table you used…

Why analytics work should often prioritize discoverability and reproducibility, not version control and code review.

[Image from Freepik]

Process is critical to scaling an org, and we’ve gotten the processes for analytics wrong.

A critical aspect of scaling organizations is process. Process allows you to normalize and incorporate best practices to ensure things work smoothly and scalably even when no one is minding the controls. But process in analytics organizations is something that is frequently overlooked, and too often we default to the…

Office Hours

Here are some practical steps you can take to get there.

Look at this poor, confused executive trying to make decisions without analytics support. Mountains? Clocks? Chess? What? 🤔 [Image from Freepik]

“We are a data-driven company.”

I hear you groaning at your screen already. Let’s talk about what this phrase means these days, what we as data scientists and analysts really want it to mean, and how to bridge that gap. Read this before you respond to your next ad-hoc request.

😤 What does “data-driven” mean these days, and why is it so dangerous?

Queries need context unavailable in IDEs. [You’ll have a better time writing in Hyperquery.]

This looks like an IDE, yeah? So, not this. [Image from Freepik]

I’m just going to say it: the traditional IDE format is not great for writing queries for analytics work. I’ll start by explaining why, then tell you what you can do about it.

First, my explanation — there are two things that you really need to know when you’re writing…

Robert Yi

Chief Product Officer / Co-founder at Hyperquery. Formerly: ds @ Airbnb, Wayfair; Ph.D. @ MIT, physics @ Harvard. twitter.com/imrobertyi
