Collection of datasets used in Vega and Vega-Lite examples. This data lives at https://github.com/vega/vega-datasets and https://cdn.jsdelivr.net/npm/vega-datasets ...
RedPajama-V2 is an open dataset for training large language models. The dataset includes over 100B text documents drawn from 84 CommonCrawl snapshots and processed with the CCNet pipeline. Out of ...
This powerful new capability eliminates the need for teams to toggle between multiple applications, streamlines access to up-to-date information, and ensures that teams are working from a unified data ...