Dennis Wuitz b728592a7a fix huggingface datasets		2024-11-16 16:39:56 +01:00

Nix AI

Look through options here

Introducing Nix AI

Enhancing GPU Utilization and Reproducibility in Deep Learning Projects

Nix AI is a project built on the Nix package manager and designed to optimize GPU utilization at runtime for deep learning tasks. By building on Nix, it aims to use GPU resources efficiently, improving the speed and efficiency of AI development and training.

One of the key features of Nix AI is its integration with Hydra, the Nix-based continuous build system, which Nix AI uses as a job scheduler. Hydra manages the training queues, allowing tasks to run in parallel and making effective use of computational resources, so that deep learning experiments are conducted in a timely and resource-efficient manner.

Furthermore, Nix AI emphasizes reproducibility through its declarative approach. By defining the entire computational environment—including software dependencies, configurations, and environment variables—in a declarative manner, Nix AI facilitates easy reproduction of experimental setups. This ensures that research findings can be validated and replicated with confidence, fostering scientific rigor and collaboration within the deep learning community.

In summary, Nix AI offers a robust solution for enhancing GPU utilization, optimizing job scheduling, and promoting reproducibility in deep learning projects. By combining the power of Nix and Hydra with a declarative approach, Nix AI provides researchers and practitioners with a reliable platform for advancing the state-of-the-art in artificial intelligence.

Here is a brief schema of the new training pipeline:

Key Features of Nix AI and Nix

Contextualizing Machine Learning on NixOS

Nix AI emerges as a specialized toolset tailored explicitly for NixOS, a Linux distribution distinguished by its declarative and reproducible package management system. By aligning with the Nix package manager, Nix AI endeavors to streamline the intricacies of installing, configuring, and deploying ML models within NixOS environments. This initiative addresses a critical need within the NixOS ecosystem, facilitating smoother integration and operationalization of ML workflows.

Operational Dynamics of Nix AI

Nix AI operates as an integrated software library and ecosystem meticulously engineered to facilitate the development of machine learning projects within NixOS environments. Fundamentally, Nix AI leverages the capabilities of the Nix package manager, a cornerstone of NixOS, to furnish a seamless and efficient platform for managing dependencies, environments, and workflows associated with ML endeavors.

Foundational Role of Nix Flakes in Reproducibility

At the heart of Nix AI lies the concept of Nix flakes, which offer a declarative and reproducible mechanism for defining package dependencies and configurations. Using flakes, Nix AI makes ML projects reproducible across different environments, so that teams can collaborate and experiment without worrying about inconsistent dependencies or divergent configurations.

Containerization for Isolation and Flexibility

When not using NixOS, Nix AI embraces containerization to afford isolated and reproducible environments for ML development and deployment. By encapsulating ML workflows within lightweight and portable containers, Nix AI facilitates seamless experimentation with diverse libraries, frameworks, and configurations, safeguarding system integrity and ensuring uniformity across the development lifecycle.

Automated Setup and Configuration

Nix AI streamlines the initialization and customization process for ML development via automated scripts and utilities. Whether provisioning Hydra clusters for distributed training or configuring build machines for continuous integration, Nix AI automates mundane tasks, enabling practitioners to channel their efforts towards model refinement and innovation.

Extensibility and Customization

Designed with modularity and extensibility in focus, Nix AI empowers practitioners to tailor ML environments to suit bespoke requirements. Whether integrating with external libraries, extending existing functionalities, or crafting custom workflows, Nix AI provides the flexibility and modularity requisite for addressing diverse use cases and evolving research imperatives.

Installation

By using Nix the Package Manager

Install Nix and Systemd Containers

apt install xz-utils systemd-container
sh <(curl -L https://nixos.org/nix/install) --daemon
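
Before running the later steps, it can help to confirm that the tools they rely on are present. The following is a minimal sketch only: the helper name `missing_cmds` is hypothetical and not part of Nix AI, and the list of commands in the example is an assumption drawn from the steps in this README.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nix AI): print every command from the
# argument list that cannot be found on PATH, one name per line.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Example call (prints nothing when all tools are available):
#   missing_cmds curl xz git machinectl nix
```

An empty result suggests the remaining installation steps can at least find their tools; it does not verify versions or configuration.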

Add or change the following lines in the nix.conf file (/etc/nix/nix.conf for the multi-user installation above):

experimental-features = nix-command flakes
substituters = https://attic.wavelens.io/main?priority=5&want-mass-query=true https://cache.nixos.org/
trusted-public-keys = main:3VVGDhOgY/x5hn7XIkVhqjEjHvOnU7o1cPlrWv91Mko= cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=
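
Editing nix.conf by hand is fine, but the "add or change" step can also be scripted so that repeated runs do not produce duplicate lines. The sketch below is one possible approach, not something shipped with Nix AI: the function name `set_conf_key` is hypothetical, and it assumes simple `key = value` lines as shown above.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a nix.conf-style file.
# An existing line for the key is replaced (further duplicates are dropped);
# if the key is absent, the line is appended.
set_conf_key() {
  conf_file=$1 key=$2 value=$3
  tmp_file=$(mktemp) || return 1
  [ -f "$conf_file" ] || : > "$conf_file"
  # Pass key and value through the environment so awk does not interpret
  # characters such as "&" or backslashes in the value.
  KEY=$key VALUE=$value awk '
    BEGIN { key = ENVIRON["KEY"]; value = ENVIRON["VALUE"]; done = 0 }
    {
      # Take the text before the first "=" and trim surrounding blanks.
      name = $0
      sub(/=.*/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      if (name == key && index($0, "=") > 0) {
        if (!done) print key " = " value
        done = 1
      } else {
        print
      }
    }
    END { if (!done) print key " = " value }
  ' "$conf_file" > "$tmp_file" && cat "$tmp_file" > "$conf_file"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}
```

For example, `set_conf_key /etc/nix/nix.conf experimental-features "nix-command flakes"` (run as root) would set the first of the three lines. The Nix daemon typically needs a restart before changed settings take effect.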

Install Nix Extra Containers

git clone https://github.com/erikarvstedt/extra-container
extra-container/util/install.sh

Clone Nix AI

git clone https://git.wavelens.io/public/nix-ai
cd nix-ai
nix flake check

Setup Nix AI Containers for Hydra

nix run .#buildContainer_hydra -- create --start
machinectl shell hydra /bin/sh
su hydra-queue-runner
cd ~

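Create or edit the ~/.ssh/config file (create the ~/.ssh directory first if it does not exist) to include the following lines, replacing [BUILDER_IP] with the address of the builder machine: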

Host [BUILDER_IP]
  HostName [BUILDER_IP]
  User builder
  Port 65535
  IdentityFile ~/.ssh/id_builder
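
The Host block above can also be generated from variables, which may be convenient when several builders are configured. This is a sketch under assumptions: the function name `ssh_host_block` is hypothetical, and the address `192.0.2.10` in the usage note is a documentation placeholder, not a real builder.

```shell
#!/bin/sh
# Hypothetical helper: print an ssh_config Host block.
# Arguments: host, user, port, identity file.
ssh_host_block() {
  printf 'Host %s\n  HostName %s\n  User %s\n  Port %s\n  IdentityFile %s\n' \
    "$1" "$1" "$2" "$3" "$4"
}
```

A call such as `ssh_host_block 192.0.2.10 builder 65535 '~/.ssh/id_builder' >> ~/.ssh/config` appends the same block as shown above. The git server entry used later in this guide has the same shape, with user `git` and port `22`.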

Create a new SSH key for the builder user:

mkdir -p ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/id_builder -C builder

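After adding the public key (~/.ssh/id_builder.pub) to the authorized_keys file on the builder machine (see "Setup Nix AI Containers for a BuildMachine" below), connect to the builder machine once to confirm that the login works, then switch to the hydra user: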

ssh [BUILDER_IP]
exit # exit the builder machine
exit # exit the hydra-queue-runner user
su hydra
cd ~

Create a new SSH key for the git server:

ssh-keygen -t ed25519 -f ~/.ssh/id_git -C git

Create the ~/.ssh/config file with the following content:

Host [GIT SERVER DOMAIN]
  HostName [GIT SERVER DOMAIN]
  User git
  Port 22
  IdentityFile ~/.ssh/id_git

Copy the public key to use it as a deploy key for the git repository:

cat ~/.ssh/id_git.pub
exit # exit the hydra user

Create an admin account for Hydra:

hydra-create-user admin --password-prompt --role admin

Setup Nix AI Containers for a BuildMachine

nix run .#buildContainer_builder -- create --start
machinectl shell builder@builder /bin/sh

Add the builder public key (the contents of ~/.ssh/id_builder.pub created in the Hydra container) to the authorized_keys file:

mkdir -p ~/.ssh
echo "[BUILDER PUBLIC KEY]" >> ~/.ssh/authorized_keys
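
Running the `echo ... >>` line twice adds the key twice. The sketch below shows one way to make this step repeatable and to set restrictive permissions at the same time. The function name `authorize_key` is hypothetical, and the permission modes are a common convention rather than something this README prescribes.

```shell
#!/bin/sh
# Hypothetical helper: add a public key to authorized_keys exactly once.
# Arguments: path to the .ssh directory, the public key line.
authorize_key() {
  ssh_dir=$1 pub_key=$2
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$ssh_dir/authorized_keys" &&
    chmod 600 "$ssh_dir/authorized_keys" || return 1
  # -x matches whole lines only, -F treats the key as a fixed string.
  grep -qxF -- "$pub_key" "$ssh_dir/authorized_keys" ||
    printf '%s\n' "$pub_key" >> "$ssh_dir/authorized_keys"
}
```

Inside the builder container this could be called as `authorize_key ~/.ssh "[BUILDER PUBLIC KEY]"`, with the placeholder replaced by the actual key.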

Finalize by setting up the Nix AI environment

Hydra should now be running at http://[HYDRA_IP]:4444. You can access the Hydra web interface by opening this URL in your browser.
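
On a headless server, the same check can be done from the command line. This is a small sketch rather than a Nix AI tool: the function names `hydra_url` and `hydra_is_up` are hypothetical, and port 4444 is taken from the URL above.

```shell
#!/bin/sh
# Hypothetical helpers for checking that the Hydra web interface answers.

# Build the base URL; the port defaults to 4444 as stated in this README.
hydra_url() {
  printf 'http://%s:%s\n' "$1" "${2:-4444}"
}

# Succeeds (exit status 0) when the URL answers without an HTTP error
# within five seconds.
hydra_is_up() {
  curl --silent --fail --max-time 5 --output /dev/null "$(hydra_url "$@")"
}
```

For example, `hydra_is_up [HYDRA_IP] && echo reachable` prints `reachable` only when the interface responds.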

Download the Nix AI Template for your ML project:

git clone https://git.wavelens.io/public/nix-ai-template

You can now start developing your ML project using Nix AI!

By using NixOS the Linux Distribution

[Coming Soon]

Updating

By using Nix the Package Manager

Step 1: Back up the current systemd container's /var/lib folder and the /etc/ssh folder

Step 2: Pull the latest changes from the Nix AI repository

Step 3: Run the following commands to update the Nix AI Containers

nix run .#buildContainer_hydra -- update

or, if you want to update the BuildMachine:

nix run .#buildContainer_builder -- update

Step 4: If necessary, restore the backup of the systemd container's /var/lib folder and the /etc/ssh folder

You have successfully updated your Nix AI Containers!
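
The backup in Step 1 and the optional restore in Step 4 can be done with any tool. A minimal tar-based sketch follows. The function names `backup_dirs` and `restore_dirs` are hypothetical, and where the container's files live on the host depends on your setup, so the root directory is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Hypothetical helpers: archive and restore folders relative to a root.

# Arguments: archive path, root directory, then paths relative to the root
# (for example: var/lib etc/ssh).
backup_dirs() {
  archive=$1 root=$2
  shift 2
  tar -czf "$archive" -C "$root" "$@"
}

# Arguments: archive path, root directory to unpack into.
restore_dirs() {
  tar -xzf "$1" -C "$2"
}
```

Stopping the container before taking the backup is likely the safer choice, since files under /var/lib may change while the archive is being written.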

By using NixOS the Linux Distribution

Run the following commands to update Nix AI:

cd [NixOS Configuration Folder]
nix flake update
nixos-rebuild switch --flake .#

You have successfully updated your NixOS System!