
Ognjen Regoje

Challenges when starting data analysis as a first-time founder

#analysis #data #stats #supplybunny

One of the first challenges a first-time founder encounters is having to do data analysis.

In the course of building Supplybunny, we tried to base our decisions on data as much as possible. I first had to brush up on my statistics and then actually build the systems to calculate and present the data. I hope that sharing the challenges we faced, and how we tackled them, will be useful to others in a similar position.

While my experience comes from a startup environment, I think other types of organizations could also benefit from it when starting new initiatives.

Get a refresher on what different formulae mean

Before getting started, it’s important to get a refresher on the different formulae. Statistics is an entire field of mathematics, and since there isn’t time to take a full university course, you need a crash course.

I learned a lot from a guide called “Statistics for Developers”. Google returns a lot of similar results, but I couldn’t track down the exact article I had read.

Coursera also has lots of good statistics courses.

For startups specifically, I got a lot of value from a series of posts by Jonathan Hsu on due diligence at Social Capital.

Not agreeing on the same definitions of numbers in advance

Very early on, we noticed that we occasionally didn’t agree on the exact definitions of the numbers we wanted to track.

Some examples:

  • “Average” is a loaded word that can have a few meanings. The arithmetic mean is the most common definition, but not always the most useful, because a handful of unusually large orders can skew it. We often found the median to be more indicative.
  • The jargon you use can also be ambiguous. For instance, “average order value” can be interpreted to mean the amount the customer paid, the amount the supplier receives, or the amount Supplybunny receives.
  • Combining different groups raises the question of which formula to use. For instance, the average order for a group of suppliers might mean an average of all their orders, or an average of each supplier’s average.
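To make the first and last points concrete, here is a small Python sketch with made-up order values. The supplier names and numbers are hypothetical, not Supplybunny data. It compares the mean with the median, and a pooled average with an average of averages.

```python
from statistics import mean, median

# Hypothetical order values per supplier (made-up numbers, not real data).
orders_by_supplier = {
    "supplier_a": [100, 120, 110, 1970],  # one unusually large order
    "supplier_b": [40, 60],
}

all_orders = [value for values in orders_by_supplier.values() for value in values]

# "Average" as the arithmetic mean vs the median of all orders.
overall_mean = mean(all_orders)      # pulled up by the single large order
overall_median = median(all_orders)  # closer to a "typical" order

# "Average order for the group" read two different ways:
# 1. pooled: every order counts once
pooled_average = sum(all_orders) / len(all_orders)
# 2. average of averages: every supplier counts once, regardless of volume
average_of_averages = mean(mean(values) for values in orders_by_supplier.values())

print(f"mean={overall_mean:.1f} median={overall_median:.1f}")
# mean=400.0 median=105.0
print(f"pooled={pooled_average:.1f} average_of_averages={average_of_averages:.1f}")
# pooled=400.0 average_of_averages=312.5
```

The same six orders yield four different “averages” depending on which definition you pick. That is why it pays to agree on the formula up front.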

What to do about it

We got into the habit of stating the exact formula we wanted and scrutinizing each term in it.

Then, after the implementation, I’d go through the actual code and explain exactly how each number was calculated, to make sure it matched what we had originally proposed.

This way, not only did we agree on what we were tracking, but we also made sure we were using the correct formula.
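One way to make that code walkthrough easier is to give each candidate definition its own name and write the agreed formula into its docstring. The sketch below does this for the three readings of “average order value”. The `Order` fields are hypothetical, and it assumes the marketplace keeps exactly the difference between what the customer paid and what the supplier received. Real fees, taxes and refunds would change that term.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical fields; a real order model will look different.
    customer_paid: float      # total charged to the customer
    supplier_received: float  # amount paid out to the supplier


def aov_customer_paid(orders: list[Order]) -> float:
    """AOV (customer view) = sum(customer_paid) / number of orders."""
    return sum(o.customer_paid for o in orders) / len(orders)


def aov_supplier_received(orders: list[Order]) -> float:
    """AOV (supplier view) = sum(supplier_received) / number of orders."""
    return sum(o.supplier_received for o in orders) / len(orders)


def aov_platform_revenue(orders: list[Order]) -> float:
    """AOV (marketplace view) = sum(customer_paid - supplier_received) / number of orders.

    Assumes the marketplace keeps exactly the difference; fees, taxes and
    refunds are ignored in this sketch.
    """
    return sum(o.customer_paid - o.supplier_received for o in orders) / len(orders)


orders = [Order(100, 90), Order(200, 180), Order(60, 54)]
print(aov_customer_paid(orders), aov_supplier_received(orders), aov_platform_revenue(orders))
# 120.0 108.0 12.0
```

Because the name and docstring state the formula term by term, anyone reviewing the code can check it against what was agreed without reverse-engineering the calculation.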

Not collecting numbers on time

Before starting any initiative it’s important to have the data that paints a picture of the current situation. If you do not have a baseline, you cannot tell if your initiative is achieving the desired results.

But roadmaps are often short and change frequently, particularly in a startup. As a result we often couldn’t predict what numbers we’d need the following week or month.

Not collecting numbers on time can also mean that you can’t calculate them retroactively, because the underlying data point was never recorded in the first place.

For instance, if a customer requests a quotation and then changes the quantity, there’d be no way to tell what the original order was unless you captured that change.

Similarly, if you didn’t record who signed a supplier up in the first place, it’d be very difficult to tell which team member their orders “belong” to.
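A minimal sketch of capturing both data points, assuming hypothetical `Quotation` and `Supplier` models. The idea is to append quantity changes to a history instead of overwriting a single field, and to store who signed the supplier up at signup time. The names, fields and team members are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class QuantityChange:
    quantity: int
    changed_at: datetime


@dataclass
class Quotation:
    """Hypothetical quotation model: quantity changes are appended, never overwritten."""
    product: str
    history: list[QuantityChange] = field(default_factory=list)

    def set_quantity(self, quantity: int) -> None:
        self.history.append(QuantityChange(quantity, datetime.now(timezone.utc)))

    @property
    def original_quantity(self) -> int:
        return self.history[0].quantity

    @property
    def current_quantity(self) -> int:
        return self.history[-1].quantity


@dataclass
class Supplier:
    name: str
    signed_up_by: str  # hypothetical: team member credited with the signup


def orders_per_team_member(order_suppliers: list[str], suppliers: list[Supplier]) -> Counter:
    """Attribute each order (given by supplier name) to whoever signed that supplier up."""
    owner = {s.name: s.signed_up_by for s in suppliers}
    return Counter(owner[name] for name in order_suppliers)


quote = Quotation("flour")
quote.set_quantity(10)
quote.set_quantity(25)
print(quote.original_quantity, quote.current_quantity)  # 10 25

suppliers = [Supplier("Fresh Farms", "alice"), Supplier("Bakery Co", "bob")]
print(orders_per_team_member(["Fresh Farms", "Fresh Farms", "Bakery Co"], suppliers))
# Counter({'alice': 2, 'bob': 1})
```

Neither field costs much to store when the feature is built. Neither can be reconstructed later if it was never stored.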

What to do about it

While building a feature, I’d often add tracking that we didn’t need at the moment but that we might want to look at in the future.

Firstly, this was helpful because even if we couldn’t get the exact numbers we needed, we’d at least have some idea. Secondly, it made me think about the business context and the potential for future development.

We also implemented export to Excel for a lot of things. That way, the relevant users could calculate ad-hoc numbers that the stats dashboard didn’t have yet.
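A lightweight way to get an “export to Excel” is plain CSV, which Excel opens directly. The sketch below uses only Python’s standard `csv` module. The original export may well have produced native spreadsheet files instead, so treat this as one possible approach. The function names and row fields are hypothetical.

```python
import csv
import io


def orders_to_csv(orders: list[dict], columns: list[str]) -> str:
    """Render order rows as CSV text; keys not listed in `columns` are skipped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(orders)
    return buffer.getvalue()


def save_for_excel(csv_text: str, path: str) -> None:
    # "utf-8-sig" writes a byte-order mark, which helps Excel recognise UTF-8
    # (e.g. non-ASCII supplier names); newline="" keeps the csv module's line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(csv_text)


rows = [
    {"id": 1, "supplier": "Fresh Farms", "total": 120.5, "internal_note": "not exported"},
    {"id": 2, "supplier": "Bakery Co", "total": 80},
]
print(orders_to_csv(rows, ["id", "supplier", "total"]))
```

Passing an explicit column list keeps internal fields out of the export while letting the same rows feed the dashboard.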

I also got quite good at SQL and calculating numbers on the fly.
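As an illustration of the kind of on-the-fly query this involves, here is a sketch using Python’s built-in `sqlite3` with an in-memory database. The `orders` table, its columns and the rows are hypothetical. A production database would have its own schema and date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, supplier TEXT, total REAL, created_at TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO orders (supplier, total, created_at) VALUES (?, ?, ?)",
    [
        ("Fresh Farms", 120.0, "2019-01-05"),
        ("Fresh Farms", 80.0, "2019-01-20"),
        ("Bakery Co", 200.0, "2019-02-02"),
    ],
)

# Ad-hoc question: how many orders, and what total value, per supplier per month?
query = """
    SELECT supplier,
           strftime('%Y-%m', created_at) AS month,
           COUNT(*) AS order_count,
           SUM(total) AS order_value
    FROM orders
    GROUP BY supplier, strftime('%Y-%m', created_at)
    ORDER BY supplier, month
"""
rows = conn.execute(query).fetchall()
print(rows)
# [('Bakery Co', '2019-02', 1, 200.0), ('Fresh Farms', '2019-01', 2, 200.0)]
```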

And finally, if we really didn’t have the data we needed, we’d delay the decision by a week and I’d immediately implement the missing tracking. We preferred a week’s delay to going in totally blind.

Not knowing what numbers to collect

After starting an initiative, it’s sometimes difficult to tell which numbers are good indicators of its performance.

For instance, if you’re doing something to attract new suppliers, the obvious number to track is the number of new suppliers. But that is not sufficient, because it says nothing about their quality. You might be signing up a large number of suppliers, most of whom never transact at all.

What to do about it

We started treating every initiative as a funnel. At every step of the funnel, we tried to work out what indicated the success of the previous step.

For the example above, we considered the number of signups as one step, profile completion as the next, the addition of products as the one after that, and so on.
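A minimal sketch of that supplier funnel in Python. It assumes each supplier is a dict of boolean flags, and it adds “received_order” as a hypothetical final step, since the text only says “and so on”. A supplier counts towards a step only if it also reached every earlier step, which keeps the counts monotonic.

```python
# Hypothetical funnel for new suppliers; "received_order" is an assumed final step.
FUNNEL_STEPS = ["signed_up", "completed_profile", "added_products", "received_order"]


def funnel_report(suppliers: list[dict]) -> list[tuple[str, int, float]]:
    """For each step: suppliers who reached it (and every earlier step),
    plus the conversion rate from the previous step."""
    report = []
    previous = None
    for i, step in enumerate(FUNNEL_STEPS):
        reached = sum(
            1 for s in suppliers if all(s.get(earlier) for earlier in FUNNEL_STEPS[: i + 1])
        )
        if previous is None:
            rate = 1.0  # the first step is the baseline
        else:
            rate = reached / previous if previous else 0.0
        report.append((step, reached, rate))
        previous = reached
    return report


suppliers = [
    {"signed_up": True, "completed_profile": True, "added_products": True, "received_order": True},
    {"signed_up": True, "completed_profile": True, "added_products": True},
    {"signed_up": True, "completed_profile": True},
    {"signed_up": True},
]
for step, reached, rate in funnel_report(suppliers):
    print(f"{step:<18} {reached:>3} {rate:.0%}")
```

With this sample data the step-to-step conversion is 75%, 67% and 50%. A jump in signups with a drop in profile completion would show up immediately, which a raw signup count hides.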

We would also consider orthogonal data. For instance: of the suppliers that signed up, what is their breakdown by category? Of the suppliers that didn’t complete their profile, which fields did they leave empty? Of the suppliers that added products, how many did they add?
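Each of those three questions is a simple aggregation. Here is a Python sketch. The profile fields, categories and supplier records are hypothetical, and it uses the median for “how many products”, in line with the earlier point about skew.

```python
from collections import Counter
from statistics import median

# Hypothetical profile fields a supplier is asked to fill in.
PROFILE_FIELDS = ["description", "logo", "delivery_areas"]

suppliers = [
    {"category": "produce", "description": "Family farm", "logo": None,
     "delivery_areas": ["north"], "product_count": 0},
    {"category": "bakery", "description": "", "logo": "logo.png",
     "delivery_areas": None, "product_count": 12},
    {"category": "produce", "description": "Organic veg", "logo": "veg.png",
     "delivery_areas": ["south"], "product_count": 30},
]

# Of the suppliers that signed up: breakdown by category.
by_category = Counter(s["category"] for s in suppliers)

# Of the suppliers with an incomplete profile: which fields were left empty.
# (Complete profiles contribute nothing to this count.)
empty_fields = Counter(f for s in suppliers for f in PROFILE_FIELDS if not s.get(f))

# Of the suppliers that added products: how many they added.
product_counts = [s["product_count"] for s in suppliers if s["product_count"] > 0]
median_products = median(product_counts) if product_counts else 0

print(by_category)      # Counter({'produce': 2, 'bakery': 1})
print(empty_fields)
print(median_products)  # 21.0
```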

Eventually, this became part of our process when working on something new. It’s worth noting that this is a skill, and you will get better at it over time.

Not knowing which numbers correlate best

In any system with at least some complexity, a lot of numbers are interrelated. It’s often difficult to tell beforehand how they affect each other and which ones are the best indicators.

For instance, if you’re measuring the success of a marketing campaign for discounted products, sales of products that are not discounted might increase as well. Furthermore, since the number of complaints is often proportional to the number of orders you process, your complaint counts are going to increase too, even if the complaint rate per order stays the same.

The challenge then is to cut through the noise and figure out where those complaints are originating from.
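One way to separate “more orders” from “worse service” is to normalise complaints by order volume per segment. The Python sketch below uses hypothetical suppliers and made-up numbers for the periods before and during a campaign. Segmenting by supplier is an assumption; category or product would work the same way.

```python
def complaints_per_100_orders(orders: dict[str, int], complaints: dict[str, int]) -> dict[str, float]:
    """Complaint rate per segment (e.g. supplier or category), normalised by order volume."""
    return {
        segment: 100 * complaints.get(segment, 0) / count
        for segment, count in orders.items()
        if count > 0
    }


# Hypothetical numbers for the weeks before and during a campaign.
before_orders = {"Fresh Farms": 200, "Bakery Co": 100}
before_complaints = {"Fresh Farms": 4, "Bakery Co": 2}
during_orders = {"Fresh Farms": 400, "Bakery Co": 150}
during_complaints = {"Fresh Farms": 8, "Bakery Co": 9}

print(sum(before_complaints.values()), sum(during_complaints.values()))  # 6 17
print(complaints_per_100_orders(before_orders, before_complaints))
# {'Fresh Farms': 2.0, 'Bakery Co': 2.0}
print(complaints_per_100_orders(during_orders, during_complaints))
# {'Fresh Farms': 2.0, 'Bakery Co': 6.0}
```

Raw complaints nearly tripled, but Fresh Farms’ rate did not move. In this made-up example, the extra complaints originate with Bakery Co.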

What to do about it

Learning how different numbers are interrelated comes with experience of the business. The better you understand the business, the better you’ll understand the relationships between its different parts.

This is best learned by regularly looking at your data, trying to find correlations, and then calculating how strong those correlations are.
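For the “how strong” part, the Pearson correlation coefficient is a common starting point. It ranges from -1 to 1 and only captures linear relationships. The sketch below implements it directly so the formula is visible, then ranks candidate metrics by the absolute value of r. The metric names and weekly numbers are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r in [-1, 1]; measures linear association only."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def rank_correlations(target: list[float], candidates: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank candidate metrics by the strength (|r|) of their correlation with the target."""
    scores = {name: pearson(target, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical weekly numbers.
weekly_orders = [10, 12, 15, 20]
candidates = {
    "new_suppliers": [5, 3, 4, 5],
    "marketing_spend": [1, 2, 3, 4],
}
for name, r in rank_correlations(weekly_orders, candidates):
    print(f"{name}: r = {r:.2f}")
# marketing_spend: r = 0.98
# new_suppliers: r = 0.30
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient. Four data points are far too few to trust in practice; the sample is only there to show the mechanics.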

Using too many tools

In the beginning, we used several disconnected tools. Most often this was because we were attracted to a particular feature that a tool had. We ended up with detailed tracking, but with data that could not easily be reconciled into a single set.

For instance, we used Google Analytics to track marketing-related information but Mixpanel to track events for A/B tests.

What to do about it

Eventually, we realized that most tools have similar feature sets. There was rarely, if ever, a single feature that made using a particular tool necessary.

Instead, we decided to use fewer tools but to use each one more completely.

This meant integrating with Google Analytics much more thoroughly. We used events extensively, for A/B testing as well as marketing. We also integrated its ecommerce features. This made it much easier to get a complete picture of the things we were trying to track.

In conclusion

To get started effectively on a data analysis project, you should:

  1. Get a refresher course on statistics
  2. Agree on specific definitions
  3. Understand that data gathering needs to begin before development
  4. Track more than you think you need
  5. Consider the previous, next and orthogonal numbers
  6. Make full use of fewer tools
  7. Continuously analyze and learn

This post was inspired by a question from One, a former colleague, about the main challenges people encounter when implementing data analytics.