Vaex is a Python library for lazy, out-of-core DataFrames (similar to Pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid at up to a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero memory copy policy, and lazy computations for best performance (no memory wasted).
ds.mean<tab>, feels very similar to Pandas.

- vaex-core: DataFrame and core algorithms, takes numpy arrays as input columns.
- vaex-hdf5: Provides memory mapped numpy arrays to a DataFrame.
- vaex-viz: Visualization based on matplotlib.
- vaex-jupyter: Interactive visualization based on Jupyter widgets / ipywidgets, bqplot, ipyvolume and ipyleaflet.
- vaex-astro: Astronomy related transformations and FITS file support.
- vaex-server: Provides a server to access a DataFrame remotely.
- vaex-distributed: (Deprecated) Now part of vaex-enterprise.
- vaex: Meta package that installs all of the above.

Using conda:
conda install -c conda-forge vaex
Using pip:
pip install --upgrade vaex
We assume that you have installed vaex, and are running a Jupyter notebook server. We start by importing vaex and asking it to give us an example dataset.
Alternatively, you can download some larger datasets, or read in your own CSV file.
Using `square brackets[] <api.rst#vaex.dataframe.DataFrame.__getitem__>`__, we can easily filter or get different views on the DataFrame.
df_negative = df[df.x < 0] # easily filter your DataFrame, without making a copy
df_negative[:5][['x', 'y']] # take the first five rows, and only the 'x' and 'y' column (no memory copy!)