Vaex is a Python library for lazy out-of-core DataFrames (similar to Pandas), built to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid at up to a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero memory copy policy, and lazy computations for best performance (no memory wasted).

Why vaex

Performance: works with huge tabular data, processing up to a billion (10^9) rows/second.
Lazy / Virtual columns: compute on the fly, without wasting RAM.
Memory efficient: no memory copies when doing filtering/selections/subsets.
Visualization: directly supported, a one-liner is often enough.
User-friendly API: you only need to deal with the DataFrame object, and tab completion plus docstrings will help you out: ds.mean<tab>. It feels very similar to Pandas.
Lean: separated into multiple packages
- vaex-core: DataFrame and core algorithms, takes numpy arrays as input columns.
- vaex-hdf5: Provides memory mapped numpy arrays to a DataFrame.
- vaex-viz: Visualization based on matplotlib.
- vaex-jupyter: Interactive visualization based on Jupyter widgets / ipywidgets, bqplot, ipyvolume and ipyleaflet.
- vaex-astro: Astronomy related transformations and FITS file support.
- vaex-server: Provides a server to access a DataFrame remotely.
- vaex-distributed: (Deprecated) Now part of vaex-enterprise.
- vaex: Meta package that installs all of the above.
Jupyter integration: vaex-jupyter gives you interactive visualization and selection in the Jupyter Notebook and JupyterLab.

Installation

Using conda:

conda install -c conda-forge vaex

Using pip:

pip install --upgrade vaex

Getting started

We assume that you have installed vaex, and are running a Jupyter notebook server. We start by importing vaex and asking it to give us an example dataset.

Alternatively, you can download some larger datasets, or read in your own CSV file.

Using square brackets [] (vaex.dataframe.DataFrame.__getitem__), we can easily filter or get different views on the DataFrame.

df_negative = df[df.x < 0]  # easily filter your DataFrame, without making a copy
df_negative[:5][['x', 'y']]  # take the first five rows, and only the 'x' and 'y' column (no memory copy!)