On how to use Spark the right way when using the bootstrap method

Photo by Joshua Chai on Unsplash

It’s more complicated than you think.

If you’re here, there’s a small chance that you found this article on one of my social media pages. More likely, you discovered how slow it is to compute a bootstrap, and since you have access to Spark, you thought, “why not use it to push this to warp speed” (or some other, less nerdy notion of fast)? So you googled it.

Well, there are a few obstacles, but it’s possible. Let’s go through the steps of bootstrapping and how to avoid using Spark naively when computing it.

The Naive Approach

First, the basics. I assume…

C’est ne pas un pipe

For many years, the caret package (Classification And REgression Training) was the main reference in machine learning for those rebels who still take the side of the R language in the data science community's most enduring dispute.

Now a new step is taking shape on this side of the trenches, with the launch in April of the site tidymodels.org. tidymodels aims to be an evolution of caret, presenting itself as a framework of different packages for machine learning modeling that, crucially, adhere to the principles of the tidyverse.

What the heck is the Tidyverse?

Briefly, for the uninitiated, the Tidyverse is a set of packages for data science…

Celso M. Cantanhede Silva

Data Scientist and Analyst
