How we used Bayesian models to balance customer experience and courier earnings at Glovo

Javier Mas Adell ’17 (Data Science)

Neon sign depicts Bayes' Theorem

Glovo is a three-sided marketplace composed of couriers, customers, and partners. Balancing the interests of all sides of our platform is at the core of most strategic decisions taken at Glovo. To balance those interests optimally, we need to understand quantitatively the relationship between the main KPIs that represent the interests of each side.

I recently published an article on Glovo’s Engineering blog where I explain how we used Bayesian modeling to tackle the modeling problems we were facing due to the inherent heterogeneity and volatility of Glovo’s operations. The example in the article concerns balancing the interests of two of the three sides of our marketplace: customer experience and courier earnings.

The skillset I developed during the Barcelona GSE Master’s in Data Science is what has enabled me to do work like this, which requires knowledge of machine learning as well as other fields like Bayesian statistics and optimization.

Connect with the author


Javier Mas Adell ’17 is Lead Data Scientist at Kannact. He is an alum of the Barcelona GSE Master’s in Data Science.

Tackling domestic violence using large-scale empirical analysis

New paper in Journal of Empirical Legal Studies co-authored by Ria Ivandić ’13 (Economics)

A woman holds a sign in front of her face that reads, "Love shouldn't hurt."
Photo by Anete Lusina from Pexels

In England, domestic violence accounts for one-third of all assaults involving injury. A crucial part of tackling this abuse is risk assessment – determining what level of danger someone may be in so that they can receive the appropriate help as quickly as possible. Risk assessment also helps police prioritise their response to domestic abuse calls when resources are severely constrained. In this research, we asked how we can improve on existing risk assessment, a question that arose from discussions with policy makers who pointed to the lack of systematic evidence on this.

Currently, the risk assessment is done through a standardised list of questions – the so-called DASH form (Domestic Abuse, Stalking and Harassment and Honour-Based Violence) – which consists of 27 questions used to categorise a case as standard, medium, or high risk. The resulting DASH risk scores have limited power in predicting which cases will result in violence in the future. Following this research, we suggest that a two-part procedure would do better, both in prioritising calls for service and in providing protective resources to the victims with the greatest need.

In our predictive models, we use individual-level records on domestic abuse calls, crimes, and victims and perpetrators from the Greater Manchester Police to construct the criminal and domestic abuse history variables of the victim and perpetrator. We combine these with DASH questionnaire data in order to forecast reported violent recidivism for victim-perpetrator pairs. Our predictive models are random forests, a machine-learning method consisting of a large number of classification trees that individually classify each observation as a predicted failure or non-failure. Importantly, we take the different costs of misclassification into account. Predicting no recidivism when it actually happens (a false negative) is far worse in terms of social costs than predicting recidivism when it does not happen (a false positive). While we set the cost of a false negative versus a false positive at 10:1, this is a parameter that can be adjusted by stakeholders.
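
To make that cost asymmetry concrete, here is a minimal sketch in R – my illustration, not the authors’ code – of how a 10:1 cost ratio translates into a lower vote threshold for predicting recidivism in a random forest. The data frame dash_data and the outcome recid are hypothetical placeholders.

```r
# Minimal sketch (not the authors' implementation): a random forest whose
# decision threshold encodes a 10:1 false-negative : false-positive cost ratio.
library(randomForest)

cost_ratio <- 10                    # a false negative costs 10x a false positive
cutoff_pos <- 1 / (1 + cost_ratio)  # predict recidivism above ~9% of tree votes

# 'dash_data' is a hypothetical data frame of DASH responses and
# criminal-history features; 'recid' is a factor with levels c("no", "yes").
rf <- randomForest(
  recid ~ .,
  data   = dash_data,
  ntree  = 500,
  cutoff = c(1 - cutoff_pos, cutoff_pos)  # vote thresholds for ("no", "yes")
)
predictions <- predict(rf, newdata = dash_data)
```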

We show that machine-learning methods are far more effective at assessing which victims of domestic violence are most at risk of further abuse than conventional risk assessments. The random forest model based on the criminal history variables together with the DASH responses significantly outperforms the models based on DASH alone. The negative prediction error – that is, the share of cases predicted to have no violence where violence nevertheless occurs in the future – is low, at 6.3%, compared with 11.5% for an officer’s DASH risk score alone. We also examine how much each feature contributes to the model’s performance. No single feature clearly outranks all others in importance; rather, it is the combination of a wide variety of predictors, each contributing its own ‘insight’, that makes the model so powerful.

Following this research, we have been in discussion with police forces across the United Kingdom and with policy makers working on the Domestic Abuse Bill to explore how our findings could be incorporated into the response to domestic abuse. We hope this research acts as a building block for increasing the use of administrative datasets and empirical analysis to improve domestic violence prevention.

This post is based on the following article:

Grogger, J., Gupta, S., Ivandic, R. and Kirchmaier, T. (2021), Comparing Conventional and Machine-Learning Approaches to Risk Assessment in Domestic Abuse Cases. Journal of Empirical Legal Studies, 18: 90-130. https://doi.org/10.1111/jels.12276 


Connect with the author

Ria Ivandić ’13 is a Researcher at LSE’s Centre for Economic Performance (CEP). She is an alum of the Barcelona GSE Master’s in Economics.

Machine Learning for the Sustainable Management of Main Water Supply Assets

Maryam Rahbaralam ’19 (Data Science)


Maryam Rahbaralam ’19 (Data Science) presented “Machine Learning for the Sustainable Management of Main Water Supply Assets” with Jaume Cardús (Aigües de Barcelona) during the Pioneering Fields and Applications (Strong AI) session at the 2019 Big Data and AI Congress in Barcelona.

Abstract

The machine learning model we developed predicts the probability of failure for each pipe section of the water supply network, allowing early renewal of the sections in the most detrimental condition in terms of social, environmental, and economic consequences.


Maryam Rahbaralam ’19 is a Data Scientist at the Barcelona Supercomputing Center (BSC). She is an alum of the Barcelona GSE Master’s in Data Science.


Using H2O for competitive data science

Reposted from H2O


In this special H2O guest blog post, Gaston Besanson and Tim Kreienkamp talk about their experience using H2O for competitive data science. They are both students in the new Master of Data Science Program at the Barcelona Graduate School of Economics and used H2O in an in-class Kaggle competition for their Machine Learning class. Gaston’s team came in second, scoring 0.92838 in overall accuracy, slightly surpassed by Tim’s team with 0.92964, on a subset of the famous “Forest Cover” dataset.

What is your background prior to this challenge?

Tim: We both are students in the Master of Data Science at the Graduate School of Economics in Barcelona. I come from a business background. I took part in a few Kaggle challenges before, but didn’t have a formal machine learning background before this class.

Gaston: I have a mixed background in Economics, Finance and Law, with no prior experience in Kaggle or machine learning other than Andrew Ng’s online course :).

Could you give a brief introduction to the dataset and the challenges associated with it?

Tim: The good thing about this dataset is that it is relatively “clean” (no missing values, etc.) and small (7 MB of training data). This allows for fast iteration, so you can test a couple of different methods and hunches relatively quickly (relatively – a classmate of ours spent $300 on AWS trying to train support vector machines). The main challenge I see is the multiclass nature – this always makes things harder, as one basically has to train seven models (due to the one-vs-all nature of multiclass classification).

Gaston: Yes, this dataset is a classic on Kaggle: Forest Cover Type Prediction. As Tim said – and to add to it – there are 7 types of trees and 54 features (10 quantitative variables, like Elevation, and 44 binary variables: 4 binary wilderness areas and 40 binary soil type variables). What caught our attention was how highly unbalanced the dataset was: classes 1 and 2 represented 80% of the training data.

What feature engineering and preprocessing techniques did you use?

Gaston: Our team added an extra layer to this competition: predicting the type of tree in a region as well as possible, with the purpose of minimizing fires. Even though we used the same loss for each type of misclassification – in other words, all trees are equally important – we decided to create new features. We created six new variables to try to capture features important to fire risk, and we applied a normalization to all 60 features on both the training and the test sets.
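
As an illustration of that kind of preprocessing (the six fire-risk variables are not spelled out in the interview, so the derived feature below is hypothetical, though the column names are the dataset’s real ones), a sketch in R:

```r
# Sketch of the preprocessing described above (illustrative, not the team's code).
# A derived feature of the kind one might build from the Forest Cover columns:
train$Hydrology_Road_Diff <- train$Horizontal_Distance_To_Hydrology -
                             train$Horizontal_Distance_To_Roadways
test$Hydrology_Road_Diff  <- test$Horizontal_Distance_To_Hydrology -
                             test$Horizontal_Distance_To_Roadways

# Normalize the features with statistics computed on the training set only,
# then apply the same transformation to the test set.
feature_cols <- setdiff(names(train), "Cover_Type")
mu  <- sapply(train[feature_cols], mean)
sdv <- sapply(train[feature_cols], sd)
train[feature_cols] <- scale(train[feature_cols], center = mu, scale = sdv)
test[feature_cols]  <- scale(test[feature_cols],  center = mu, scale = sdv)
```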

Tim: We included some difference and interaction terms. However, we didn’t scale the numerical features or use any unsupervised dimension reduction techniques. I briefly tried to do supervised feature learning with H2O Deep Learning – it gave me really impressive results in cross-validation, but broke down on the test set.

Editor’s note: L1/L2/Dropout regularization or fewer neurons can help avoid overfitting
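
In H2O’s R interface, the regularization the editor mentions can be switched on roughly like this (hyperparameter values are illustrative, not tuned):

```r
# Illustrative sketch: regularizing H2O Deep Learning to curb overfitting.
library(h2o)
h2o.init()

train$Cover_Type <- as.factor(train$Cover_Type)  # classification, not regression
train_hex <- as.h2o(train)

dl <- h2o.deeplearning(
  x = setdiff(names(train), "Cover_Type"),
  y = "Cover_Type",
  training_frame = train_hex,
  hidden = c(64, 64),                   # fewer neurons than a large default net
  activation = "RectifierWithDropout",  # enables hidden-layer dropout
  input_dropout_ratio = 0.1,
  hidden_dropout_ratios = c(0.3, 0.3),
  l1 = 1e-5, l2 = 1e-5,                 # L1/L2 weight penalties
  epochs = 20
)
```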

Which supervised learning algorithms did you try and to what success?

Tim: I tried H2O’s implementations of Gradient Boosting, Random Forest, and Deep Learning (MLP with stochastic gradient descent), and the standard R implementations of SVM and k-NN. k-NN performed poorly, and so did SVM – and Deep Learning overfit, as I already mentioned. The tree-based methods both performed very well in our initial tests. We finally settled on Random Forest, since it gave the best results and was faster to train than Gradient Boosting.

Gaston: We tried k-NN, SVM, and Random Forest, all from different packages, with not that great results. Finally we used H2O’s implementation of GBM – we ended up using this model because it allows a lot of freedom in the model design. The model we used had the following attributes: number of trees: 250; maximum depth: 18; minimum rows: 10; shrinkage: 0.1.
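
In H2O’s R interface, a GBM with those attributes would look roughly like this (a sketch reusing the hypothetical train frame from above, not the team’s original script):

```r
# Sketch: an H2O GBM with the attributes quoted above.
library(h2o)
h2o.init()

train$Cover_Type <- as.factor(train$Cover_Type)  # multinomial classification
train_hex <- as.h2o(train)

gbm <- h2o.gbm(
  x = setdiff(names(train), "Cover_Type"),
  y = "Cover_Type",
  training_frame = train_hex,
  ntrees     = 250,  # number of trees
  max_depth  = 18,   # maximum depth
  min_rows   = 10,   # minimum rows
  learn_rate = 0.1   # shrinkage
)
```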

What feature selection techniques did you try?

Tim: We didn’t try anything fancy (like LASSO) for this challenge. Instead, we decided to take advantage of the fact that random forests can compute feature importances. I used this to code my own recursive elimination procedure. At each iteration, a random forest was trained and cross-validated (ten-fold), the feature importances were computed, the worst two features were discarded, and the next iteration began with the remaining features. The resulting cross-validation errors at each stage made up a nice “textbook-like” curve, where the error first decreased with fewer features and at the end increased sharply again. We then chose the set of features that gave the second-best cross-validation error, so as not to overfit through feature selection.
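
A compact sketch of that elimination loop – my reconstruction, not Tim’s original code – using H2O’s random forest and its variable importances:

```r
# Sketch (a reconstruction) of the recursive feature elimination described above.
library(h2o)
h2o.init()

train$Cover_Type <- as.factor(train$Cover_Type)
train_hex <- as.h2o(train)
features  <- setdiff(names(train), "Cover_Type")
cv_errors <- data.frame(n_features = integer(), error = numeric())

while (length(features) > 2) {
  rf <- h2o.randomForest(
    x = features, y = "Cover_Type",
    training_frame = train_hex,
    ntrees = 100,
    nfolds = 10  # ten-fold cross-validation
  )
  # Record the cross-validated error for this feature set.
  cv_errors <- rbind(cv_errors, data.frame(
    n_features = length(features),
    error      = h2o.mean_per_class_error(rf, xval = TRUE)
  ))
  # Drop the two least important features and iterate.
  imp      <- as.data.frame(h2o.varimp(rf))  # sorted by importance, descending
  worst    <- tail(imp$variable, 2)
  features <- setdiff(features, worst)
}
# Inspect cv_errors and pick the feature set with the second-best error.
```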

Gaston: Actually, we did not do any feature selection other than removing the variables that had no variance – if I am not mistaken, there was one in the original dataset (before feature creation). Nor did we turn the binary variables into categorical ones (one for the wilderness areas and one for the soil types). We took the naïve approach of sticking with the fire-risk story no matter what; maybe next time we will change the approach.

Why did you use H2O and what were the major benefits?

Tim: We were constrained by our teachers in the sense that we could only use R – that forced me out of my scikit-learn comfort zone. So I looked for something just as accurate and fast. As an occasional Kaggler, I am familiar with Arno’s forum post, and so I decided to give H2O a shot – and I didn’t regret it at all. Apart from the nice R interface, the major benefit is the strong parallelization – this way we were able to make the most of our AWS academic grants.

Gaston: I came across H2O just by searching the web and reading about alternatives within R after the GBM package proved really untestable. Just to add to what Tim said, I think H2O will be my weapon of choice in the near future.

For a more detailed description of the methods used and results obtained, see the report of Gaston’s and Tim’s teams.