Use Python and JAX to efficiently build infinitely wide networks for deeper network insights on your finite machine

What happens when your neural networks stretch into infinity. Image: ESA/Hubble & NASA

One notorious problem with deep learning and deep neural networks (DNNs) is that they can become black boxes. Let's say that we have fitted a network with good test performance on a given classification problem. However, we are now stuck: we cannot make sense of the final weights that have been learned or adequately visualize the problem space!
Another issue arises in the real world. In practical applications of neural networks, we often fall back on training ensembles of networks and using the averaged output of many models. This can be more powerful than the output of one single network…
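The ensemble idea itself fits in a few lines of plain Python. The three toy "models" below are hypothetical stand-ins for trained networks, just to fix the shape of the averaging:

```python
# Three toy "models" (hypothetical stand-ins for trained networks),
# each mapping an input to a prediction.
def model_a(x):
    return 2.0 * x + 1.0

def model_b(x):
    return 2.0 * x + 2.0

def model_c(x):
    return 2.0 * x + 3.0

def ensemble_predict(models, x):
    """The simplest form of ensembling: average the outputs of all models."""
    return sum(m(x) for m in models) / len(models)

prediction = ensemble_predict([model_a, model_b, model_c], 3.0)
```

The averaged prediction smooths out the idiosyncrasies of any single member, which is exactly the effect that makes ensembles (and, in the limit, infinitely wide networks) attractive.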

Efficiently exploring the parameter space through Bayesian Optimization with skopt in Python. TL;DR: my hyperparameters are always better than yours.

Explore vast canyons of the problem space efficiently — Photo by Fineas Anton on Unsplash

In this post, we will build a machine learning pipeline using multiple optimizers and use the power of Bayesian Optimization to arrive at the optimal configuration for all our parameters. All we need is the sklearn Pipeline and skopt.
You can use your favorite ML models, as long as they have a sklearn wrapper (looking at you, XGBoost and NGBoost).
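To get a feel for the idea behind Bayesian Optimization (explore the space first, then refine around promising points), here is a crude, surrogate-free stand-in in plain Python. The objective is hypothetical; skopt replaces the naive refinement loop below with a Gaussian-process surrogate that decides where to sample next:

```python
import random

def objective(x):
    # Hypothetical objective, e.g. a validation loss over one hyperparameter.
    return (x - 2.0) ** 2

random.seed(0)

# Stage 1: explore the search space [-5, 5] with random samples.
best_x = min((random.uniform(-5.0, 5.0) for _ in range(25)), key=objective)

# Stage 2: exploit, proposing candidates ever closer around the best point so far.
radius = 1.0
for _ in range(50):
    candidate = best_x + random.gauss(0.0, radius)
    if objective(candidate) < objective(best_x):
        best_x = candidate
    radius *= 0.95  # shrink the search neighborhood over time
```

With skopt itself, the equivalent call would be along the lines of `gp_minimize(lambda p: objective(p[0]), [(-5.0, 5.0)], n_calls=50)`.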

About Hyperparameters

The critical point in finding the best model to solve a problem is not just the model itself. We also need to find the optimal parameters that make our model work best on the given dataset. This is called finding or…

A simple, yet meaningful probabilistic Pyro model to uncover change-points over time.

Solitude. A photo by Sasha Freemind on Unsplash

One profound claim made by the media is that the rate of suicides among younger people in the UK rose from the 1980s to the 2000s. You might find it on the news, in publications, or it may simply be accepted as truth by the public. But how can you make this measurable?

Making an assumption tangible

In order to make this claim testable, we look for data and find an overview of the suicide rates, specifically for England and Wales, at the Office for National Statistics (UK), together with an overall visualization.

Generally, one type of essential question to ask…
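One essential question is where in time the rate changes. Before reaching for Pyro, the core of a change-point analysis can be sketched in plain Python with a maximum-likelihood split. The counts below are a hypothetical toy series, not the ONS data:

```python
import math

# A hypothetical toy series of event counts with an apparent jump,
# standing in for the yearly rates discussed above.
counts = [1, 2, 1, 1, 2, 1, 8, 9, 10, 9, 8, 9]

def poisson_loglik(xs, rate):
    """Poisson log-likelihood up to a constant (the log(x!) terms are the same
    for every candidate split on the same data, so they cancel in comparisons)."""
    return sum(x * math.log(rate) - rate for x in xs)

def best_changepoint(xs):
    """Try every split point, score both segments under their own mean rate,
    and keep the split with the highest total log-likelihood."""
    best_t, best_score = None, -math.inf
    for t in range(1, len(xs)):
        r1 = sum(xs[:t]) / t
        r2 = sum(xs[t:]) / (len(xs) - t)
        score = poisson_loglik(xs[:t], r1) + poisson_loglik(xs[t:], r2)
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

A Pyro model generalizes this by putting priors on the rates and the change-point and returning a full posterior instead of a single best split.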

Make the best of missing data the Bayesian way. Improve model performance and make comparative benchmarks using Monte Carlo methods.

A missing frame ready to throw you off your model. Photo by Vilmos Heim on Unsplash.

The Ugly Data

How do you handle missing data, gaps in your data-frames, or noisy parameters?
You have spent hours at work, in the lab, or in the wild generating or curating a dataset for an interesting research question or hypothesis. Then, terribly enough, you find that some of the measurements for a parameter are missing!
Another case that might throw you off is unexpected noise that was introduced at some point in the experiment and has doomed some of your measurements to be extreme outliers. …
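A crude Monte Carlo sketch of the idea: fill the gaps many times and look at the spread of the resulting estimates. The measurements below are hypothetical, and drawing from the observed values is a simple stand-in for sampling a fitted posterior predictive distribution:

```python
import random

random.seed(42)

# A hypothetical measurement series with gaps (None marks a missing value).
data = [4.9, 5.1, None, 5.0, 4.8, None, 5.2, 5.0]
observed = [x for x in data if x is not None]

def impute_once(xs, pool):
    """One imputation: fill each gap with a random draw from the observed values."""
    return [x if x is not None else random.choice(pool) for x in xs]

# Multiple imputation: repeat the fill many times and average the estimates,
# so the uncertainty about the gaps shows up in the spread of the results.
estimates = [sum(impute_once(data, observed)) / len(data) for _ in range(1000)]
mc_mean = sum(estimates) / len(estimates)
```

Unlike dropping rows or plugging in a single mean, the spread of `estimates` tells you how much the missing values could plausibly move your conclusion.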

Build better Data Science workflows with probabilistic programming languages and counter the shortcomings of classical ML.

The tools to build, train and tune your probabilistic models. Photo by Patryk Grądys on Unsplash.

We should always aim to create better Data Science workflows.
But in order to achieve that, we first have to find out what is lacking.

Classical ML workflows are missing something

Classical Machine Learning pipelines work great. The usual workflow looks like this:

  1. Have a use-case or research question with a potential hypothesis,
  2. build and curate a dataset that relates to the use-case or research question,
  3. build a model,
  4. train and validate the model,
  5. maybe even cross-validate, while grid-searching hyper-parameters,
  6. test the fitted model,
  7. deploy the model for the use-case,
  8. answer the research question or hypothesis you posed.
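As a minimal stand-in for steps 3 to 6, here is the shape of that loop with a toy dataset and a trivial mean-predictor in place of a real model:

```python
# Toy data and a train/test split (steps 1-2 compressed into two lines).
data = [2.0, 2.1, 1.9, 2.2, 2.0, 1.8, 2.1, 2.3]
train, test = data[:6], data[6:]

# Steps 3-4: "build" and "train" the model; here, just predict the training mean.
prediction = sum(train) / len(train)

# Step 6: test the fitted model with a mean-squared error.
mse = sum((y - prediction) ** 2 for y in test) / len(test)
```

Note what the classical fit hands back: a single number, `prediction`, with no measure of how certain the model is about it.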

As you might have noticed, one severe shortcoming is…

Modeling U.S. cancer-death rates with two Bayesian approaches: MCMC in STAN and SVI in Pyro.

Modeling death-rates across U.S. counties — Photo by Joey Csunyo on Unsplash

Single-parameter models are an excellent way to get started with the topic of probabilistic modeling. These models consist of one parameter that influences our observation and which we can infer from the given data. In this article, we look at the performance of and compare two well-established frameworks — the statistical language STAN and the Pyro Probabilistic Programming Language (PPL).

Kidney Cancer Data

One old and established dataset covers the cases of kidney cancer in the U.S. from 1980–1989, which is available here (see [1]). Given are U.S. counties, their total populations, and the numbers of reported cancer deaths.
Our task is to infer…
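For a single-parameter model like this, the Bayesian update can even be written down in closed form before involving STAN or Pyro. A common choice is a Poisson likelihood for the deaths with a conjugate Gamma prior on the rate; the numbers below are hypothetical, not taken from the actual dataset:

```python
# Single-parameter model for one county's death rate theta:
#   deaths ~ Poisson(theta * population),  theta ~ Gamma(a, b)
# With a conjugate Gamma prior, the posterior is available in closed form:
#   theta | data ~ Gamma(a + deaths, b + population)
# The numbers below are hypothetical, chosen only to illustrate the update.
a, b = 20.0, 430000.0          # prior: roughly a nationwide baseline rate
deaths, population = 3, 53637  # one small county's observations

post_a = a + deaths
post_b = b + population
posterior_mean = post_a / post_b   # Gamma mean = shape / rate
```

Notice how the prior keeps a small county's estimate from collapsing onto its noisy raw rate `deaths / population`; this shrinkage is what the MCMC and SVI approaches in the article reproduce for the full dataset.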

The way from data novice to professional

Clock with reverse numeral, Jewish Town-hall Clock, Prague - by Richard Michael (all rights reserved).

There exists the idea that practicing something for over 10,000 hours lets you acquire proficiency in a subject. The concept is based on the book Outliers by M. Gladwell. The mentioned 10k hours are the time you spend practicing or studying a subject until you have a firm grasp of it and can be called proficient. Though this number of hours is somewhat arbitrary, we will take a look at how those many hours can be spent to gain proficiency in the field of Data Science.

Imagine this as a learning budget in your Data-apprenticeship journey. If I were…

One reason why Bayesian Modeling works with real-world data. The approximate lighthouse in the sea of randomness.

Photo by William Bout on Unsplash

When you want to gain more insights into your data, you rely on programming frameworks that allow you to interact with probabilities. All you have in the beginning is a collection of data points: just a glimpse into the underlying distribution from which your data comes. However, you do not only want simple data points in the end. What you want are elaborate, talkative density distributions with which you can perform tests. For this, you use probabilistic frameworks like TensorFlow Probability, Pyro, or STAN to compute posteriors of probabilities.
As we will see, this computation is not always feasible and…
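The hard part is the normalizing constant of Bayes' rule. For a single parameter, you can still brute-force it on a grid, which makes the problem concrete; here is a minimal sketch for a coin-bias parameter with a flat prior (a toy example, not from the article):

```python
# Grid approximation of a posterior for a single coin-bias parameter theta:
#   p(theta | data) ∝ p(data | theta) * p(theta)
# The hard part in general is the normalizing constant; on a one-dimensional
# grid we can simply brute-force it.
heads, flips = 7, 10
grid = [i / 1000 for i in range(1, 1000)]           # theta values in (0, 1)
prior = [1.0 for _ in grid]                          # flat prior
likelihood = [t ** heads * (1 - t) ** (flips - heads) for t in grid]
unnormalized = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(unnormalized)                         # the integral that is intractable in general
posterior = [u / evidence for u in unnormalized]

posterior_mean = sum(t * p for t, p in zip(grid, posterior))
```

With many parameters this grid explodes exponentially, which is exactly why frameworks fall back on MCMC or variational approximations instead.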

The answer to: “Why is my model running forever?” or the classic: “I think it might have converged?”

A lot of the models that the average Data Scientist or ML engineer uses daily rely on numerical optimization methods. Studying the optimization and performance of different functions helps us gain a better understanding of how the process works.
The challenge we face on a daily basis is that someone gives us a model of how they think the world or their problem works. Now you, as a Data Scientist, have to find the optimal solution to the problem. For example, you look at an energy function and want to find the absolute, global minimum for your tool to work or…
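Both questions from the title show up already in the simplest optimizer. Here is gradient descent on a toy energy function (a hypothetical one-dimensional example), with an explicit convergence check instead of blind iteration:

```python
# Gradient descent on a toy "energy function" E(x) = (x - 3)^2,
# whose global minimum at x = 3 stands in for the optimum we are after.
def grad(x):
    return 2.0 * (x - 3.0)   # dE/dx

x, lr, steps = -10.0, 0.1, 0
# Stop when the gradient is (near) zero -- the honest answer to
# "has it converged?" -- but guard against running forever.
while abs(grad(x)) > 1e-8 and steps < 10_000:
    x -= lr * grad(x)
    steps += 1
```

On a convex function like this one, the check on the gradient is enough; on the bumpy energy landscapes of real models, it only certifies a local minimum, which is where the real difficulty begins.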

Connect the dots over time and forecast with confidence(-intervals).

Connecting dots across time — photo by israel palacio on Unsplash

Have you ever wondered how to account for uncertainties in time-series forecasts?
Have you ever thought there should be a way to generate data points from previously seen data and make judgement calls about certainties? I know I have.
If you want to build models that capture probabilities and hold confidences, we recommend using a probabilistic programming framework like Pyro.
In a previous article, we looked at NGBoost and applied it to the M5 forecasting challenge on Kaggle. As a quick recap — the M5 forecasting challenge asks us to predict how the sales of Walmart items will develop over time. It…
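The crudest version of a forecast with a confidence band needs no framework at all: forecast the last observed value and turn the history of one-step changes into an empirical interval. The sales numbers below are toy values, not the M5 data:

```python
# Naive one-step forecast with an empirical uncertainty band.
# Toy sales series (hypothetical, not the M5 data).
sales = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
residuals = sorted(b - a for a, b in zip(sales, sales[1:]))  # one-step changes

forecast = sales[-1]            # naive forecast: tomorrow looks like today
lo = forecast + residuals[0]    # most pessimistic change seen so far
hi = forecast + residuals[-1]   # most optimistic change seen so far
```

A Pyro model replaces this min/max band with a full posterior predictive distribution, from which you can read off any interval you like.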

Richard Michael

I am a Data Scientist and M.Sc. student in Bioinformatics at the University of Copenhagen. You can find more content on my weekly blog.
