Flexible piping in Python with Pipey

Why you should be piping and how to do it.

Robert Yi
Jan 29, 2020 (3 min read)

What is piping?

We’ve all had to write Python code with heavy nesting, like this:

print(abs(sum([1,2,-4])))

Piping (with Pipey) lets you write this using a pipe operator >> as follows:

[1,2,-4] >> Sum >> Abs >> Print

This syntactic sugar is called piping, and it allows you to pass the output of one command as the input of the next without nesting functions inside functions inside functions. It’s not natively supported in Python, so we wrote a library to support it.
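To give a feel for how a library can add a pipe operator to Python, here is a minimal sketch built on the reflected right-shift method `__rrshift__`. The class name `MiniPipe` is a hypothetical stand-in invented for this illustration; it is not Pipey's actual implementation, which may work differently.

```python
# Minimal sketch of a pipeable wrapper; NOT Pipey's actual implementation.
class MiniPipe:
    """Hypothetical stand-in that makes a one-argument function usable with >>."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, left):
        # Python evaluates `left >> self` by first trying left.__rshift__(self).
        # When that is missing or returns NotImplemented, it falls back to
        # self.__rrshift__(left), which is where the piping happens.
        return self.func(left)


Sum = MiniPipe(sum)
Abs = MiniPipe(abs)

# `>>` is left-associative, so this reads as ([1, 2, -4] >> Sum) >> Abs.
result = [1, 2, -4] >> Sum >> Abs
print(result)  # 1
```

Because the wrapper sits on the right-hand side of `>>`, ordinary values such as lists and integers can be piped in without being wrapped themselves.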

Piping orders commands so they follow the flow of logic, making code substantially more readable and declarative. Piping is one of the best things about bash:

ls | grep "filename" > tmp.txt

And it’s also one of the best things about R:

data %>% transform(a = b/c) %>% head(100)

Some libraries already enable this sort of declarative syntactic sugar (pipe, dfply, dplython), but we weren’t always satisfied with their syntax choices or their functionality. On syntax: while | is the natural choice to parallel bash piping, it is too common an operator to work in a general framework. On functionality: they often support only single arguments, or only particular input types such as pandas dataframes. So we decided to write our own, with careful and intentional choices so that it might function as an unopinionated, flexible, Pythonic framework.

Should you be piping? (pros/cons)

As a general programming paradigm, piping has one notable pitfall: it’s harder to debug. Eight lines of code may now be condensed into one, so when your traceback points to that line, it can be unclear which step broke. Granted, the Python traceback handles this fairly well, so it’s generally not a huge problem.
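One way to soften the debugging problem is a pass-through stage that prints the value flowing through the chain. The sketch below is an assumption-laden illustration: `MiniPipe` is a hypothetical stand-in for a pipeable wrapper and `tap` is a helper invented here, neither taken from Pipey.

```python
class MiniPipe:
    """Hypothetical stand-in for a pipeable wrapper (not Pipey's own class)."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, left):
        return self.func(left)


def tap(label):
    """Build a pass-through stage that reports the value flowing through it."""
    def _show(value):
        print(f"[{label}] {value!r}")
        return value  # hand the value on unchanged
    return MiniPipe(_show)


Sum = MiniPipe(sum)
Abs = MiniPipe(abs)

# Each tap prints the intermediate value without altering the result.
result = [1, 2, -4] >> tap("input") >> Sum >> tap("after Sum") >> Abs
```

Once the chain behaves as expected, the `tap` stages can be deleted without touching the remaining steps.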

Nonetheless, we’d argue that this drawback is outweighed by the concision and maintainability that piping lends to heavily chained workflows. Data work in particular is often done in environments that carry hidden state (such as Jupyter or R Markdown), where reproducibility can be difficult to achieve consistently. Piping mitigates this danger by (1) enforcing a consistent order of operations and (2) disallowing hidden state.

Consequently, code written in the piping paradigm is naturally reproducible, production-ready, and stable as soon as it is written.

How to pipe

Pipey is just a framework, meant to make it easy to turn ordinary functions into things that can be piped. To start, let’s install pipey:

pip install pipey

Then, import pipey’s Pipeable object:

from pipey import Pipeable

Now let’s revisit the example given at the beginning of this article:

print(abs(sum([1,2,-4])))

You can define piping-compatible versions of these functions with Pipeable:

Sum = Pipeable(sum)
Abs = Pipeable(abs)
Print = Pipeable(print)

This lets you rewrite the nested call as a simple chain:

[1,2,-4] >> Sum >> Abs >> Print

Easy, right?

Alternatively, you might want to make this entire sequence more reusable and roll the whole thing up into a function, in which case you can use Pipeable as a decorator:

@Pipeable
def SumAbsPrint(arraylike):
    # Sum first: abs() cannot be applied to a list directly.
    return arraylike >> Sum >> Abs >> Print

which can then be invoked with:

[1,2,-4] >> SumAbsPrint

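Positional and keyword arguments also work out of the box: simply pass them to the function on the receiving end of the pipe. In the example below, b and c are given explicitly, so the piped list fills the remaining parameter, a: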

@Pipeable
def Func(a, b=2, c=3):
    pass

[1,2,3] >> Func(b=0, c=0)
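For readers curious how a call like `Func(b=0, c=0)` can sit on the receiving end of a pipe, here is one possible mechanism, sketched under assumptions: calling the wrapper stores the extra arguments and returns a new wrapper, and the piped value is supplied as the first positional argument. `MiniPipe` and `Scale` are hypothetical names for this illustration, and Pipey's real Pipeable may handle calls differently.

```python
class MiniPipe:
    """Hypothetical stand-in; Pipey's real Pipeable may work differently."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args      # extra positional arguments bound ahead of time
        self.kwargs = kwargs  # keyword arguments bound ahead of time

    def __call__(self, *args, **kwargs):
        # Calling the wrapper does not run func. It returns a new wrapper
        # that remembers the extra arguments until a value is piped in.
        return MiniPipe(self.func, *args, **kwargs)

    def __rrshift__(self, left):
        # The piped value becomes the first positional argument.
        return self.func(left, *self.args, **self.kwargs)


@MiniPipe
def Scale(values, factor=1, offset=0):
    return [v * factor + offset for v in values]


doubled = [1, 2, 3] >> Scale(factor=2)
shifted = [1, 2, 3] >> Scale(factor=2, offset=1)
plain = [1, 2, 3] >> Scale  # no call needed when the defaults are fine
```

A side effect of this design is that the wrapped function can no longer be called directly in the usual way, since every call only binds arguments. A real library would need to decide how to handle that trade-off.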

For a full overview of Pipey’s functionality, see the docs on the repo page. That’s all for now. Happy piping!

Written by Robert Yi

Chief Product Officer, Hyperquery (hyperquery.ai). Former ds @ Airbnb, Wayfair; Ph.D. @ MIT, physics @ Harvard. twitter.com/imrobertyi Also at think.ryi.me
