Prepare your Data. Rinse. Repeat.

March 14, 2016 by Ken Kaczmarek

As everyone knows by now, data is the new black. Conventional wisdom says businesses will either be data-driven or be at a serious competitive disadvantage.

But, as everyone also knows by now, it takes significant effort to refine raw data before you can extract its sweet analytic elixirs. Our app is a productivity tool that falls squarely into this world of data preparation. However, data preparation is a pretty broad category itself, and it can mean a lot of things to a lot of people.

So, we recently set out to create a roughly one-minute explainer video to help highlight what our app brings to this data preparation party.

It’s a Mad, Mad, Mad, Mad World

Whether you call it data processing or data preparation or ETL, the basic workflow is the same:

  • Get the data
  • Change the data
  • Use the data
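
To make those three steps concrete, here is a minimal sketch in plain Python. It is not how our app works internally; the sample data and the function names (get_data, change_data, use_data) are made up purely for illustration.

```python
import csv
import io

# Hypothetical raw input: stray spaces and one missing amount.
RAW_CSV = """name,amount
alice, 10
bob,
carol, 7
"""

def get_data(text):
    """Get: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def change_data(rows):
    """Change: drop rows with no amount, trim names, convert amounts to int."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip rows where the amount is missing
        cleaned.append({"name": row["name"].strip(), "amount": int(amount)})
    return cleaned

def use_data(rows):
    """Use: compute a simple total from the cleaned rows."""
    return sum(row["amount"] for row in rows)

total = use_data(change_data(get_data(RAW_CSV)))
print(total)
```

Even in a toy like this, most of the code lives in the "change" step, which is usually where things get complicated.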

It’s a seemingly basic set of steps, although in practice data preparation gets hairy pretty quickly. To make this process more efficient, you can try approaching it in a variety of ways.

One way might be to focus on the ultimate goal, such as data mapping, data discovery, data cleaning, data enrichment and so on.

Another way would be to look at the types of data and the users involved: say, a librarian cleaning up a set of civic data, a data scientist making sense of social media data, or an IT person setting up a data feed to send purchase order data to a supplier.

Go With the Flow

Yet another way to approach data preparation is to determine whether the project is a one-off effort with a single set of data (static) or something that is needed again and again (repeatable). This is where our app comes in.

As a bunch of data geeks, we certainly love data exploration and have done (more than!) our fair share of one-off data projects. In fact, because it’s such an important facet of the process, we’ve built exploratory functionality into the app itself (our “workspaces”). This is really useful for getting your hands dirty and prototyping processes interactively.

But when you get down to it, the core of our app is the flow.

If you’re a team of folks with a bunch of data sets flying around and are having to repeatedly transform/combine/clean them, we aim to make this process a snap. As we see it, the best kind of data process is one that’s quick to set up and valuable enough to keep running again and again.
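
As a rough illustration of what "repeatable" means here, the sketch below defines a flow once and then runs it on each new batch of data. This is a generic Python pattern, not our app's actual API; make_flow, strip_names, drop_duplicates and the sample batches are all hypothetical.

```python
from functools import reduce

def make_flow(*steps):
    """Compose steps into one callable that runs them in order."""
    def flow(rows):
        return reduce(lambda data, step: step(data), steps, rows)
    return flow

def strip_names(rows):
    """Trim surrounding whitespace from each name."""
    return [{**row, "name": row["name"].strip()} for row in rows]

def drop_duplicates(rows):
    """Keep the first row for each name, ignoring case."""
    seen, unique = set(), []
    for row in rows:
        key = row["name"].lower()
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Set the flow up once...
clean = make_flow(strip_names, drop_duplicates)

# ...then run it again and again as new data arrives.
monday = [{"name": " Ann "}, {"name": "ann"}, {"name": "Bo"}]
tuesday = [{"name": "Cy"}, {"name": "Cy "}]

print(clean(monday))
print(clean(tuesday))
```

The setup cost is paid once, when the steps are chosen. Every later run is a single call.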

Got a need for repeatable data flows? Give us a shout; we’d be happy to help.