Developing with Phymes

Setting Up Your Build Environment

Install the Rust toolchain:

https://www.rust-lang.org/tools/install

An example bash script for installing the Rust toolchain on Linux is the following:

apt update
DEBIAN_FRONTEND=noninteractive apt install --assume-yes git clang curl libssl-dev llvm libudev-dev make pkg-config protobuf-compiler
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
rustup toolchain install stable --target x86_64-unknown-linux-gnu
rustup default stable
rustc --version

Also, make sure your Rust toolchain is up to date, because we always test this project against the latest stable version of Rust.

rustup update stable
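Before building, it can help to confirm that the tools installed above are actually on the PATH. The following is a minimal sketch rather than part of the Phymes repository: the helper name `missing_tools` is hypothetical, and the tool list is only a sample of the packages installed by the script above.

```shell
# Print the name of every requested tool that is NOT found on the PATH.
# `missing_tools` is a hypothetical helper, not something shipped with Phymes.
missing_tools() {
  local tool
  for tool in "$@"; do
    # `command -v` succeeds only if the shell can resolve the tool
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example: check the toolchain and a few of the build dependencies
missing="$(missing_tools rustup rustc cargo git clang protoc pkg-config)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All tools found"
fi
```

If anything is reported as missing, re-run the corresponding install step and source "$HOME/.cargo/env" again in the current shell.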

Setting up GPU acceleration with CUDA

Install CUDA for Linux:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

GPU acceleration with CUDA is currently supported only on Linux (including WSL2). An example bash script for installing CUDA on WSL2 is the following:

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda-repo-wsl-ubuntu-12-6-local_12.6.2-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-6-local_12.6.2-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt -y install cuda-toolkit-12-6

Please replace the repo and CUDA versions accordingly. Then check the CUDA installation:

nvcc --version
nvidia-smi --query-gpu=compute_cap --format=csv
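The compute capability reported by nvidia-smi has the form 8.6, while some CUDA build scripts expect the dot-free form 86. To our knowledge the Candle kernel build reads it from a CUDA_COMPUTE_CAP environment variable, but whether Phymes needs this set explicitly is an assumption. The sketch below shows the conversion; `compute_cap_to_env` is a hypothetical helper name.

```shell
# Convert a compute capability such as "8.6" into the dot-free form "86".
# `compute_cap_to_env` is a hypothetical helper, not part of Phymes or CUDA.
compute_cap_to_env() {
  printf '%s\n' "$1" | tr -d '.'
}

# Guarded so the snippet is harmless on machines without an NVIDIA driver
if command -v nvidia-smi >/dev/null 2>&1; then
  # Use the first GPU only; `noheader` drops the CSV header line
  cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
  # Assumption: the build picks up CUDA_COMPUTE_CAP when it is set
  export CUDA_COMPUTE_CAP="$(compute_cap_to_env "$cap")"
  echo "CUDA_COMPUTE_CAP=$CUDA_COMPUTE_CAP"
fi
```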

Install the CuDNN backend for Linux:

https://docs.nvidia.com/deeplearning/cudnn/installation/latest/linux.html

An example bash script for installing CuDNN on Linux is the following:

wget https://developer.download.nvidia.com/compute/cudnn/9.5.1/local_installers/cudnn-local-repo-ubuntu2404-9.5.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.5.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.5.1/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt -y install cudnn

Please replace the repo and CuDNN versions accordingly.

Setting up NVIDIA NIMs for local deployment

Obtain an NGC API key following the instructions.

Install the NVIDIA Container Toolkit following the instructions.

Check that the installation was successful by running the following:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

The NGC catalogue can be viewed using the NGC CLI. Install the NGC CLI following the instructions.

Alternatively, the NGC catalogue can be viewed online. For example, the open-source Llama3.2 model can be deployed locally following the instructions, or accessed via the NVIDIA NIMs API if available (see the NIMs LLM API reference at https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html for the OpenAPI schema).

Setting up WASM build environment

Add the following wasm32 compilation targets from the nightly Rust toolchain:

rustup update nightly
rustup target add wasm32-unknown-unknown --toolchain nightly
rustup target add wasm32-wasip2 --toolchain nightly

In addition, we recommend using wasmtime for running WASI components:

curl https://wasmtime.dev/install.sh -sSf | bash

Setting up Dioxus

The front-end application is built using Dioxus, which enables creating web, desktop, and mobile applications in Rust. Install the Dioxus CLI:

cargo install dioxus-cli

How to compile

This is a standard cargo project with workspaces. To build the different workspaces, you need to have Rust and cargo installed, and you need to specify the workspace using the -p, --package flag:

cargo build -p phymes-core

CPU, GPU, and WASM-specific compilation features are gated behind the feature flags wsl, wsl-gpu, and wasip2, respectively. The choice between embedded Candle and OpenAI API token services is gated behind the feature flag candle: enabling it selects the embedded Candle models.

The following will build the phymes-agents workspace with different configurations of CPU and GPU acceleration for Tensor and Token services:

# Native CPU for tensor operations and local/remote OpenAI API token services
cargo build -p phymes-agents --features wsl --release

# Native CPU for tensor operations and embedded Candle for token services
cargo build -p phymes-agents --features wsl,candle --release

# GPU support for tensor operations and local/remote OpenAI API token services
cargo build -p phymes-agents --features wsl-gpu --release

# GPU support for tensor operations and embedded Candle for token services
cargo build -p phymes-agents --features wsl-gpu,candle --release

Please ensure that all CUDA-related environment variables are set up correctly for GPU acceleration. Most errors about missing CUDA or CuDNN libraries are caused by missing environment variables, particularly on WSL2.

export PATH=$PATH:/usr/local/cuda/bin:/usr/lib/x86_64-linux-gnu/
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs
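Since a missing directory in these variables is easy to overlook, a quick check of each entry can save a failed build. This is a minimal sketch; `missing_path_entries` is a hypothetical helper, and the directory list simply mirrors the export above and may differ on your system.

```shell
# Print every entry of a colon-separated path list that is not an existing directory.
# `missing_path_entries` is a hypothetical helper, not part of Phymes.
missing_path_entries() {
  local entry
  local IFS=':'          # split the argument on colons only
  for entry in $1; do
    [ -d "$entry" ] || echo "$entry"
  done
}

# Check the library directories used in the LD_LIBRARY_PATH export above
missing_path_entries "/usr/lib/wsl/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs"
```

No output means every directory exists. Any path that is printed points at a CUDA component that is either not installed or installed elsewhere.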

The following will build the phymes-agents workspace as a WASIp2 component:

cargo build -p phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release

Combining target-specific features with a mismatched compilation target (for example, the wasip2 feature without --target wasm32-wasip2) will result in build errors.
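To avoid mismatched combinations, the build configurations documented in this section can be collected in a small lookup. The sketch below is illustrative only: `phymes_build_args` is a hypothetical helper, and it deliberately rejects any combination that does not appear in the commands above instead of guessing at it.

```shell
# Print the cargo arguments for a documented target/token-service combination.
# `phymes_build_args` is a hypothetical helper; the flags are copied from this guide.
#   $1: cpu | gpu | wasm
#   $2: openai | candle
phymes_build_args() {
  case "$1/$2" in
    cpu/openai)  echo "--features wsl --release" ;;
    cpu/candle)  echo "--features wsl,candle --release" ;;
    gpu/openai)  echo "--features wsl-gpu --release" ;;
    gpu/candle)  echo "--features wsl-gpu,candle --release" ;;
    wasm/candle) echo "--target wasm32-wasip2 --no-default-features --features wasip2,candle --release" ;;
    *) echo "combination not covered by this guide: $1/$2" >&2; return 1 ;;
  esac
}

# Example: print (not run) the command for a GPU build with embedded Candle
echo "cargo build -p phymes-agents $(phymes_build_args gpu candle)"
```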

You can also use Rust's official Docker image:

docker run --rm -v $(pwd):/phymes -it rust /bin/bash -c "cd /phymes && rustup component add rustfmt && cargo build -p phymes-core"

From here on, this is a pure Rust project and cargo can be used to run tests, benchmarks, docs and examples as usual.

Setting up the cache for running tests and examples

Many of the tests (and examples if running without the GPU or on WASM) depend upon a local cache of model assets to run. The following bash script can be used to prepare the local assets:

# ensure your HOME environment variable is set
echo $HOME

# make the cache directory
mkdir -p $HOME/.cache/hf

# copy over the cache files from the root of the GitHub repository
cp -a .cache/hf/. $HOME/.cache/hf/

# download the model assets manually from HuggingFace
curl -L -o "$HOME/.cache/hf/models--sentence-transformers--all-MiniLM-L6-v2/model.safetensors" "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/model.safetensors?download=true" -sSf
curl -L -o "$HOME/.cache/hf/models--sentence-transformers--all-MiniLM-L6-v2/pytorch_model.bin" "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/pytorch_model.bin?download=true" -sSf
curl -L -o "$HOME/.cache/hf/models--Qwen--Qwen2-0.5B-Instruct/qwen2.5-0.5b-instruct-q4_0.gguf" "https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_0.gguf?download=true" -sSf
curl -L -o "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/smollm2-135m-instruct-q4_k_m.gguf" "https://huggingface.co/Segilmez06/SmolLM2-135M-Instruct-Q4_K_M-GGUF/resolve/main/smollm2-135m-instruct-q4_k_m.gguf?download=true" -sSf
curl -L -o "$HOME/.cache/hf/models--Alibaba-NLP--gte-Qwen2-1.5B-instruct/gte-Qwen2-1.5B-instruct-Q4_K_M.gguf" "https://huggingface.co/tensorblock/gte-Qwen2-1.5B-instruct-GGUF/resolve/main/gte-Qwen2-1.5B-instruct-Q4_K_M.gguf?download=true" -sSf
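Because the tests fail in less obvious ways when an asset is absent, it may be worth verifying the cache after the downloads finish. The following is a minimal sketch: `missing_assets` is a hypothetical helper, and its file list only mirrors the curl downloads above (files copied from the repository's .cache/hf folder are not checked).

```shell
# Print the expected model files (relative to the cache root) that are missing or empty.
# `missing_assets` is a hypothetical helper; the list mirrors the downloads in this guide.
missing_assets() {
  local root="$1" rel
  for rel in \
    "models--sentence-transformers--all-MiniLM-L6-v2/model.safetensors" \
    "models--sentence-transformers--all-MiniLM-L6-v2/pytorch_model.bin" \
    "models--Qwen--Qwen2-0.5B-Instruct/qwen2.5-0.5b-instruct-q4_0.gguf" \
    "models--HuggingFaceTB--SmolLM2-135M-Instruct/smollm2-135m-instruct-q4_k_m.gguf" \
    "models--Alibaba-NLP--gte-Qwen2-1.5B-instruct/gte-Qwen2-1.5B-instruct-Q4_K_M.gguf"
  do
    # -s is true only for files that exist and are non-empty
    [ -s "$root/$rel" ] || echo "$rel"
  done
}

# Example: report anything still missing from the local cache
missing_assets "$HOME/.cache/hf"
```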

Setting up local OpenAI API endpoints

Instead of using token credits with remote OpenAI API endpoints, it is possible to run the tests and examples locally using self-hosted open-source NVIDIA NIMs. Modify the following code depending upon the model(s) to be locally deployed:

# Text Generation Inference with Llama 3.2 (terminal 1)
export NGC_API_KEY=nvapi-...
export LOCAL_NIM_CACHE=$HOME/.cache/nim
docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8000:8000 nvcr.io/nim/meta/llama-3.2-1b-instruct:1.8.6

# Text Embedding Inference with Llama 3.2 (terminal 2)
export NGC_API_KEY=nvapi-...
export LOCAL_NIM_CACHE=$HOME/.cache/nim
docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8001:8000 nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest

Note that the tests and examples assume that the local OpenAI API endpoints for NVIDIA NIMs are http://0.0.0.0:8000/v1 for Text Generation Inference (TGI, Chat) and http://0.0.0.0:8001/v1 for Text Embedding Inference (TEI, Embed). These defaults can be overridden by setting the environment variables for the TGI and TEI endpoints:

# URL of the local TGI NIMs deployment
export CHAT_API_URL=http://0.0.0.0:8000/v1

# URL of the local TEI NIMs deployment
export EMBED_API_URL=http://0.0.0.0:8001/v1
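Before running the tests against local deployments, it can be useful to confirm that both endpoints respond. The sketch below assumes that each deployment exposes the OpenAI-style model listing route (GET on /models under the base URL); the helper names `models_url` and `endpoint_ready` are hypothetical.

```shell
# Resolve the endpoints as described above: use the environment variable
# when it is set, otherwise fall back to the documented default.
chat_url="${CHAT_API_URL:-http://0.0.0.0:8000/v1}"
embed_url="${EMBED_API_URL:-http://0.0.0.0:8001/v1}"

# Build the model-listing URL for a base URL, tolerating one trailing slash.
models_url() {
  echo "${1%/}/models"
}

# Return success if the endpoint answers the model listing within 5 seconds.
# Assumption: the deployment serves an OpenAI-compatible /models route.
endpoint_ready() {
  curl -sSf --max-time 5 "$(models_url "$1")" >/dev/null 2>&1
}

for url in "$chat_url" "$embed_url"; do
  if endpoint_ready "$url"; then
    echo "ready:       $url"
  else
    echo "unreachable: $url"
  fi
done
```

The containers can take a while to download and load a model on first start, so an endpoint that is unreachable right after docker run may simply not be ready yet.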

Also, be sure to add your NGC_API_KEY to the environment variables before running tests or examples in a different terminal.

# NVIDIA API Key
export NGC_API_KEY=nvapi-...

Running the tests

Run tests using the Rust standard cargo test command:

# run all unit and integration tests with default features
cargo test

# run tests for the phymes-core crate with all features enabled
cargo test -p phymes-core --all-features

# run a specific test for the phymes-core crate with the wsl feature enabled
# and printing to the console
cargo test test_session_update_state -p phymes-core --features wsl -- --no-capture

# run the doc tests
cargo test --doc

You can find up-to-date information on the current CI tests in .github/workflows. The phymes-server, phymes-core, and phymes-agents crates have unit tests. Please note that many of the tests in the phymes-agents crate do not run on the CPU because of how long they take. To run all tests in the phymes-agents crate, either enable GPU acceleration with Candle using the --features wsl-gpu,candle feature flags, or use OpenAI API local/remote token services with the --features wsl,openai_api or --features wsl-gpu,openai_api feature flags, depending upon GPU availability.

# run tests for the phymes-core crate
cargo test --package phymes-core --features wsl --release

# run tests for the phymes-agents crate with GPU acceleration with Candle assets
cargo test --package phymes-agents --features wsl-gpu,candle --release
# or run tests for the phymes-agents crate on the CPU with OpenAI API token services
cargo test --package phymes-agents --features wsl --release

# run tests for the phymes-server crate
cargo test --package phymes-server --features wsl --release

The tests can also be run for WASM components. However, the WASM debug output is of little use, so we recommend debugging the tests natively before testing on WASM:

# build tests for the phymes-core crate
cargo test --package phymes-core --target wasm32-wasip2 --no-default-features --features wasip2 --no-run

# run tests for the phymes-core crate using wasmtime
# be sure to replace the -26200b790e92721b with your system's unique hash
wasmtime run target/wasm32-wasip2/debug/deps/phymes-core-26200b790e92721b.wasm

# build tests for the phymes-agents crate with embedded Candle
cargo test --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --no-run

# run tests for the phymes-agents crate using wasmtime
# be sure to replace the -9ce9c7c7142d7db7 with your system's unique hash
wasmtime --dir=$HOME/.cache/hf --env=HOME=$HOME target/wasm32-wasip2/debug/deps/phymes-agents-9ce9c7c7142d7db7.wasm

# build tests for the phymes-server crate
cargo test -p phymes-server --features wasip2-candle --no-default-features --target wasm32-wasip2 --no-run

# run tests for the phymes-server crate using wasmtime
# be sure to replace the -48a453bb50fd01da with your system's unique hash
wasmtime --dir=$HOME/.cache --env=HOME=$HOME target/wasm32-wasip2/debug/deps/phymes_server-48a453bb50fd01da.wasm
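Copying the hash by hand gets tedious because it changes between builds. cargo test --no-run prints the path of each test executable it builds, which is the most reliable source. As an alternative, the sketch below picks the most recently modified matching file. `latest_test_wasm` is a hypothetical helper, and the phymes_server prefix is taken from the command above.

```shell
# Print the most recently modified test binary for a crate in a deps directory.
# `latest_test_wasm` is a hypothetical helper; $2 is the file name prefix before the hash.
latest_test_wasm() {
  # `ls -t` sorts newest first; errors from an unmatched glob are discarded
  ls -t "$1"/"$2"-*.wasm 2>/dev/null | head -n 1
}

wasm="$(latest_test_wasm target/wasm32-wasip2/debug/deps phymes_server)"
if [ -n "$wasm" ]; then
  wasmtime --dir="$HOME/.cache" --env=HOME="$HOME" "$wasm"
else
  echo "no test binary found; build it first with cargo test --no-run"
fi
```

Stale binaries from earlier builds stay in the deps folder, which is why the helper selects by modification time rather than taking the first match.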

Running the examples

Run examples using the standard Rust cargo run command. A few simple examples are provided for the phymes-core and phymes-agents crates to give new users a starting point for building applications with the crates:

# run examples for the phymes-core crate
cargo run --package phymes-core --features wsl --release --example addrows

# run examples for the phymes-agents crate with GPU acceleration with Candle assets
cargo run --package phymes-agents --features wsl-gpu,candle --release --example chat -- --candle-asset SmoLM2-135M-chat
cargo run --package phymes-agents --features wsl-gpu,candle --release --example chatagent

# or run examples for the phymes-agents crate on the CPU with OpenAI API token services
cargo run --package phymes-agents --features wsl --release --example chat -- --openai-asset Llama-3.2-1b-instruct
cargo run --package phymes-agents --features wsl --release --example chatagent

The examples can also be run using WASM. However, unlike native builds, where we can rely on the HuggingFace API to download and cache models for us, all assets needed to run an example must be provided locally. The following bash script can be used to build the examples for WASM and run them using wasmtime:

# build examples for the phymes-core crate
cargo build --package phymes-core --target wasm32-wasip2 --no-default-features --features wasip2 --release --example addrows

# run the examples for the phymes-core crate
wasmtime run target/wasm32-wasip2/release/examples/addrows.wasm

# build the chat example for the phymes-agents crate
cargo build --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release --example chat

# run the chat example for the phymes-agents crate
wasmtime --dir="$HOME/.cache/hf" --env=HOME=$HOME target/wasm32-wasip2/release/examples/chat.wasm --weights-config-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/config.json" --weights-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/smollm2-135m-instruct-q4_k_m.gguf" --tokenizer-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/tokenizer.json" --tokenizer-config-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/tokenizer_config.json" --candle-asset "SmoLM2-135M-chat"

# build the chatagent example for the phymes-agents crate
cargo build --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release --example chatagent

# run the chatagent example for the phymes-agents crate
wasmtime --dir="$HOME/.cache/hf" --env=HOME=$HOME target/wasm32-wasip2/release/examples/chatagent.wasm

Clippy lints

We use clippy for checking lints during development, and CI runs clippy checks.

Run the following to check for clippy lints:

cargo clippy -p phymes-core --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-agents --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-server --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-app --tests --examples -- -D warnings

If you use Visual Studio Code with the rust-analyzer plugin, you can enable clippy to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.

One of the concerns with clippy is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a clippy lint, you may disable the lint and briefly justify it.

Search for allow(clippy:: in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.

  • If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
  • If you have several lints on a function or module, you may disable the lint on the function or module.
  • If a lint is pervasive across multiple modules, you may disable it at the crate level.
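The search for allowed lints can be turned into a short summary of which lints are suppressed and how often. This is a minimal sketch, assuming a grep that supports -r and --include (GNU and BSD grep do); `clippy_allows` is a hypothetical helper, and it only captures the first lint named in each allow attribute.

```shell
# List the clippy lints that are explicitly allowed in Rust sources under a directory,
# with a count of how often each one appears (most frequent first).
# `clippy_allows` is a hypothetical helper, not part of the Phymes tooling.
clippy_allows() {
  grep -rhoE --include='*.rs' 'allow\(clippy::[a-z_]+' "$1" 2>/dev/null \
    | sed 's/^allow(//' \
    | sort | uniq -c | sort -rn
}

# Example: summarise the allowed lints from the repository root
clippy_allows .
```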

Rustfmt Formatting

We use rustfmt for formatting during development, and CI runs rustfmt checks.

Run the following to check for rustfmt changes (before submitting a PR!):

cargo fmt --all -- --check

The individual workspaces can then be formatted using rustfmt:

cargo fmt -p phymes-core --all
cargo fmt -p phymes-agents --all
cargo fmt -p phymes-server --all
cargo fmt -p phymes-app --all

Rustdocs and mdBook for documentation

We use rustdoc (via cargo doc) for the API documentation hosted on crates.io, and mdBook for the guide and tutorial static website hosted on GitHub Pages. The mdbook-mermaid preprocessor is used to generate mermaid diagrams.

Run the following to create the API documentation using cargo doc:

cargo doc --document-private-items --no-deps -p phymes-core --features wsl
cargo doc --document-private-items --no-deps -p phymes-agents --features wsl
cargo doc --document-private-items --no-deps -p phymes-server --features wsl
cargo doc --document-private-items --no-deps -p phymes-app

Please visit the mdBook guide for installation and usage instructions. Also, please visit mdbook-mermaid for installation instructions. Run the following to create the guide and tutorials using mdBook:

mdbook build phymes-book

Running Benchmarks

In progress...

Running benchmarks is a good way to test the performance of a change. As benchmarks usually take a long time to run, we recommend running targeted benchmarks instead of the full suite.

# run all benchmarks
cargo bench

# run phymes-core benchmarks
cargo bench -p phymes-core

# run benchmark for the add_rows function within the phymes-core crate
cargo bench -p phymes-core --bench add_rows

To set the baseline for your benchmarks, use the --save-baseline flag:

git checkout main

cargo bench -p phymes-core --bench add_rows -- --save-baseline main

git checkout feature

cargo bench -p phymes-core --bench add_rows -- --baseline main

Running the CI locally

Continuous integration and deployment are orchestrated using GitHub Actions on each pull request (PR) to the main branch. Unfortunately, debugging the CI/CD can be quite difficult and time-consuming, so we recommend testing locally using a self-hosted runner. Since caching is not supported with act, alternative GitHub Actions files for downloading resources are provided in the .github.act folder.

First, follow the instructions for downloading, configuring, and using the self-hosted runner.

Second, be sure to change runs-on: ubuntu-latest to runs-on: self-hosted in the YAML for all workflow files for each job.
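Editing every job by hand is error-prone, so the substitution can be scripted. The sketch below is one possible approach and not part of the repository: `use_self_hosted` is a hypothetical helper, and it assumes the workflow files live in .github/workflows with a .yml or .yaml extension.

```shell
# Rewrite `runs-on: ubuntu-latest` to `runs-on: self-hosted` in one workflow file.
# `use_self_hosted` is a hypothetical helper; it writes through a temp file because
# in-place sed flags differ between GNU and BSD sed.
use_self_hosted() {
  local file="$1" tmp
  tmp="$(mktemp)" || return 1
  if sed 's/runs-on:[[:space:]]*ubuntu-latest/runs-on: self-hosted/' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite the contents, keep the original file's permissions
  fi
  rm -f "$tmp"
}

# Apply the change to every workflow file that exists
for wf in .github/workflows/*.yml .github/workflows/*.yaml; do
  if [ -f "$wf" ]; then
    use_self_hosted "$wf"
  fi
done
```

Remember to revert the change (for example with git checkout on the workflow files) before pushing, so that the hosted CI keeps using ubuntu-latest.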

Third, run the actions-runner. Now, when you open a PR, the CI will run locally on your machine.