My overall approach to coding, including data science and machine learning.

Programming languages

Mainly Python, although I’ve also played with ‣ and am curious about Julia.

Programming IDE

Visual Studio Code, in my opinion, has no real competition at the moment. It's just so good 😍 Even for notebooks, its IntelliSense autocomplete, extensions and proper debugging make it a more productive and practical option than JupyterLab. Combined with the option to use scripts as notebooks, it solves almost all of Joel Grus' famous complaints (from his talk "I Don't Like Notebooks").
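To illustrate the "scripts as notebooks" option: VS Code's Python extension treats `# %%` comments in a regular .py file as cell delimiters, so each block can be run on its own in the Interactive Window. Below is a minimal sketch; the sample data and variable names are made up for illustration.

```python
# %% Load data (hypothetical inline sample standing in for a real dataset)
import statistics

measurements = [2.5, 3.1, 4.7, 3.9]

# %% Compute summary statistics
summary = {
    "mean": statistics.mean(measurements),
    "stdev": statistics.stdev(measurements),
}

# %% Inspect the result
print(summary)
```

Since the file is still a plain Python script, it also runs top to bottom from the terminal and diffs cleanly in version control.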

I have a paid ‣ subscription, as I think it speeds up my coding, particularly for solving small coding tasks, writing comments and generating docstrings.

More on my IDE setup here: VS Code for data science

Data science frameworks

For most stuff, I use Pandas. When I have larger datasets, I should opt for either:

Machine learning

For deep learning models, PyTorch. For smaller and faster ML models, scikit-learn and XGBoost. For NLP, Hugging Face and spaCy.

TensorFlow can be a valid alternative to PyTorch. JAX also seems to be gaining steam, as do some Julia packages.

Comet is great for logging model training experiments and performing hyperparameter tuning.

SHAP is my go-to for model interpretability (by the way, I adapted it to RNN-type models, as you can see in my article Interpreting recurrent neural networks on multivariate time series and in my Master's Thesis Presentation).
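For intuition on what SHAP reports: it attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values by brute force in plain Python. It is not the SHAP library's API or its approximation algorithms, and `toy_model`, `x` and `baseline` are hypothetical names chosen for illustration. "Removing" a feature is assumed here to mean replacing it with its baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Enumerates every feature subset, so it is only practical for a
    handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features in the coalition take their real value,
                # all others fall back to the baseline.
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * features[0] - 1.0 * features[1] + 0.5 * features[0] * features[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, and the interaction term's contribution is split evenly between the two features involved.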

RaySGD (which is part of Ray) is a great tool for efficient distributed training. PyTorch Lightning is a good alternative.

GenAI