Developing with Phymes
Setting Up Your Build Environment
Install the Rust tool chain:
https://www.rust-lang.org/tools/install
An example bash script for installing the Rust tool chain for Linux is the following:
apt update
DEBIAN_FRONTEND=noninteractive apt install --assume-yes git clang curl libssl-dev llvm libudev-dev make pkg-config protobuf-compiler
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
rustup toolchain install stable --target x86_64-unknown-linux-gnu
rustup default stable
rustc --version
Also, make sure your Rust tool chain is up-to-date, because we always use the latest stable version of Rust to test this project.
rustup update stable
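Before building, it can help to confirm that the tools installed above are actually on your PATH. The following is a minimal sketch, not part of the project: the helper name missing_tools is hypothetical, and the tool list simply mirrors the packages installed by the script above (protoc is the binary provided by protobuf-compiler).

```shell
# Print each named command that cannot be found on PATH, one per line.
# `missing_tools` is a hypothetical helper, not part of Phymes.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools installed by the setup script above.
missing=$(missing_tools git clang curl make pkg-config protoc rustc cargo rustup)
if [ -n "$missing" ]; then
  printf 'missing tools:\n%s\n' "$missing"
else
  echo "all build tools found"
fi
```

If anything is reported as missing, re-run the matching install step and open a new shell so that the updated PATH is picked up.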
Setting up GPU acceleration with CUDA
Install CUDA for Linux:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
GPU acceleration with CUDA is currently supported only on Linux (including WSL2). An example bash script for installing CUDA for WSL2 is the following:
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda-repo-wsl-ubuntu-12-6-local_12.6.2-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-6-local_12.6.2-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt -y install cuda-toolkit-12-6
Please replace the repo and CUDA versions to match your system. Then check the CUDA installation:
nvcc --version
nvidia-smi --query-gpu=compute_cap --format=csv
Install the cuDNN backend for Linux:
https://docs.nvidia.com/deeplearning/cudnn/installation/latest/linux.html
An example bash script for installing cuDNN for Linux is the following:
wget https://developer.download.nvidia.com/compute/cudnn/9.5.1/local_installers/cudnn-local-repo-ubuntu2404-9.5.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.5.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.5.1/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt -y install cudnn
Please replace the repo and cuDNN versions to match your system.
Setting up NVIDIA NIMs for local deployment
Obtain an NGC API key following the instructions.
Install the NVIDIA Container Toolkit following the instructions
Check that the installation was successful by running the following:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
The NGC catalogue can be viewed using the NGC CLI. Install the NGC CLI following the instructions.
Alternatively, the NGC catalogue can be viewed online. For example, the open-source Llama3.2 model can be deployed locally following the instructions, or accessed via the NVIDIA NIMs API if available (see the NIMs LLM API reference at https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html for the OpenAPI schema).
Setting up WASM build environment
Add the following wasm32 compilation targets from the nightly Rust toolchain:
rustup update nightly
rustup target add wasm32-unknown-unknown --toolchain nightly
rustup target add wasm32-wasip2 --toolchain nightly
In addition, we recommend using wasmtime for running WASI components:
curl https://wasmtime.dev/install.sh -sSf | bash
Setting up Dioxus
The front-end application is built using Dioxus, which enables creating web, desktop, and mobile applications using Rust. Install the Dioxus CLI with:
cargo install dioxus-cli
How to compile
This is a standard cargo project with workspaces. To build the different workspaces, you need to have rust and cargo installed, and you need to select the workspace using the -p, --package flag:
cargo build -p phymes-core
CPU, GPU, and WASM-specific compilation features are gated behind the feature flags wsl, wsl-gpu, and wasip2, respectively. The choice between embedded Candle and OpenAI API token services is gated behind the feature flag candle, which selects the embedded Candle models.
The following will build the phymes-agents workspace with different configurations of CPU and GPU acceleration for Tensor and Token services:
# Native CPU for tensor operations and local/remote OpenAI API token services
cargo build -p phymes-agents --features wsl --release
# Native CPU for tensor operations and embedded Candle for token services
cargo build -p phymes-agents --features wsl,candle --release
# GPU support for tensor operations and local/remote OpenAI API token services
cargo build -p phymes-agents --features wsl-gpu --release
# GPU support for tensor operations and embedded Candle for token services
cargo build -p phymes-agents --features wsl-gpu,candle --release
Please ensure that all CUDA-related environment variables are set up correctly for GPU acceleration. Most errors about missing CUDA or cuDNN libraries are caused by missing environment variables, particularly on WSL2.
export PATH=$PATH:/usr/local/cuda/bin:/usr/lib/x86_64-linux-gnu/
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs
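When a build still fails to find the CUDA or cuDNN libraries, one quick check is whether the directories named in these variables exist at all. The sketch below assumes nothing beyond a POSIX shell; the helper name missing_dirs is hypothetical, and the directory list is copied from the export above.

```shell
# Print every directory in a colon-separated list that does not exist.
# `missing_dirs` is a hypothetical helper, not part of Phymes.
missing_dirs() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r dir; do
    # skip empty entries, report entries that are not directories
    [ -z "$dir" ] || [ -d "$dir" ] || printf '%s\n' "$dir"
  done
}

# Report CUDA library directories that are absent on this machine.
missing_dirs "/usr/lib/wsl/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs"
```

An empty result means every directory is present; any path that is printed points to a CUDA component that is not installed where the variables expect it.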
The following will build the phymes-agents workspace as a WASIp2 component:
cargo build -p phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release
Combining target-specific features with a different compilation target (for example, wsl with --target wasm32-wasip2) will result in build errors.
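One way to avoid mismatched features and targets is to keep the known-good combinations from this section in a single place. The following sketch does that with a shell function; the function name phymes_build_args and the configuration names (cpu, gpu-candle, and so on) are hypothetical labels introduced here, while the cargo arguments are the ones listed above.

```shell
# Map a configuration name to the cargo arguments used in this guide.
# `phymes_build_args` and the configuration names are hypothetical.
phymes_build_args() {
  case "$1" in
    cpu)         echo "--features wsl --release" ;;
    cpu-candle)  echo "--features wsl,candle --release" ;;
    gpu)         echo "--features wsl-gpu --release" ;;
    gpu-candle)  echo "--features wsl-gpu,candle --release" ;;
    wasm-candle) echo "--target wasm32-wasip2 --no-default-features --features wasip2,candle --release" ;;
    *)           echo "unknown configuration: $1" >&2; return 1 ;;
  esac
}

# Show the command for one configuration without running it.
echo "cargo build -p phymes-agents $(phymes_build_args gpu-candle)"
```

Because an unknown name returns a non-zero status, a wrapper script can stop before cargo is invoked with an unsupported combination.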
You can also use Rust's official Docker image:
docker run --rm -v $(pwd):/phymes -it rust /bin/bash -c "cd /phymes && rustup component add rustfmt && cargo build -p phymes-core"
From here on, this is a pure Rust project and cargo can be used to run tests, benchmarks, docs and examples as usual.
Setting up the cache for running tests and examples
Many of the tests (and examples if running without the GPU or on WASM) depend upon a local cache of model assets to run. The following bash script can be used to prepare the local assets:
# ensure your home environmental variable is set
echo $HOME
# make the cache directory
mkdir -p $HOME/.cache/hf
# copy over the cache files from the root of the GitHub repository
cp -a .cache/hf/. $HOME/.cache/hf/
# download the model assets manually from HuggingFace
curl -L -o $HOME/.cache/hf/models--sentence-transformers--all-MiniLM-L6-v2/model.safetensors https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/model.safetensors?download=true -sSf
curl -L -o $HOME/.cache/hf/models--sentence-transformers--all-MiniLM-L6-v2/pytorch_model.bin https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/pytorch_model.bin?download=true -sSf
curl -L -o $HOME/.cache/hf/models--Qwen--Qwen2-0.5B-Instruct/qwen2.5-0.5b-instruct-q4_0.gguf https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_0.gguf?download=true -sSf
curl -L -o $HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/smollm2-135m-instruct-q4_k_m.gguf https://huggingface.co/Segilmez06/SmolLM2-135M-Instruct-Q4_K_M-GGUF/resolve/main/smollm2-135m-instruct-q4_k_m.gguf?download=true -sSf
curl -L -o $HOME/.cache/hf/models--Alibaba-NLP--gte-Qwen2-1.5B-instruct/gte-Qwen2-1.5B-instruct-Q4_K_M.gguf https://huggingface.co/tensorblock/gte-Qwen2-1.5B-instruct-GGUF/resolve/main/gte-Qwen2-1.5B-instruct-Q4_K_M.gguf?download=true -sSf
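Since a failed or interrupted download leaves the tests failing in less obvious ways, it may be worth verifying the cache before running them. The sketch below lists the files from the script above that are absent or empty; the helper name missing_assets is hypothetical, and the file list covers only the five downloads shown here, not the files copied from the repository.

```shell
# List expected model files that are absent (or empty) in a cache directory.
# `missing_assets` is a hypothetical helper; the paths are those used above.
missing_assets() {
  cache_dir=$1
  for rel in \
    "models--sentence-transformers--all-MiniLM-L6-v2/model.safetensors" \
    "models--sentence-transformers--all-MiniLM-L6-v2/pytorch_model.bin" \
    "models--Qwen--Qwen2-0.5B-Instruct/qwen2.5-0.5b-instruct-q4_0.gguf" \
    "models--HuggingFaceTB--SmolLM2-135M-Instruct/smollm2-135m-instruct-q4_k_m.gguf" \
    "models--Alibaba-NLP--gte-Qwen2-1.5B-instruct/gte-Qwen2-1.5B-instruct-Q4_K_M.gguf"
  do
    # -s is true only for files that exist and are non-empty
    [ -s "$cache_dir/$rel" ] || printf '%s\n' "$rel"
  done
}

missing_assets "$HOME/.cache/hf"
```

No output means all five downloads are in place.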
Setting up local OpenAI API endpoints
Instead of using token credits with remote OpenAI API endpoints, it is possible to run the tests and examples locally using self-hosted open-source NVIDIA NIMs. Modify the following code depending upon the model(s) to be locally deployed:
# Text Generation Inference with Llama 3.2 (terminal 1)
export NGC_API_KEY=nvapi-...
export LOCAL_NIM_CACHE=$HOME/.cache/nim
docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8000:8000 nvcr.io/nim/meta/llama-3.2-1b-instruct:1.8.6
# Text Embedding Inference with Llama 3.2 (terminal 2)
export NGC_API_KEY=nvapi-...
export LOCAL_NIM_CACHE=$HOME/.cache/nim
docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8001:8000 nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest
Note that the tests and examples assume that the local OpenAI API endpoints for NVIDIA NIMs are http://0.0.0.0:8000/v1 for Text Generation Inference (TGI, Chat) and http://0.0.0.0:8001/v1 for Text Embedding Inference (TEI, Embed), respectively. The defaults can be overridden by setting the environment variables for the TGI and TEI endpoints:
# URL of the local TGI NIMs deployment
export CHAT_API_URL=http://0.0.0.0:8000/v1
# URL of the local TEI NIMs deployment
export EMBED_API_URL=http://0.0.0.0:8001/v1
Also, be sure to add your NGC_API_KEY to the environment variables before running tests or examples in a different terminal.
# NVIDIA API Key
export NGC_API_KEY=nvapi-...
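Before running the tests against local endpoints, it can be useful to confirm that both containers are answering. The sketch below resolves the two URLs with the defaults named above and probes each one. It assumes the deployment serves the OpenAI-style model-listing route at /models under the base URL; the helper names chat_url, embed_url, and endpoint_ready are hypothetical.

```shell
# Resolve the endpoint URLs, falling back to the defaults named above.
chat_url()  { printf '%s\n' "${CHAT_API_URL:-http://0.0.0.0:8000/v1}"; }
embed_url() { printf '%s\n' "${EMBED_API_URL:-http://0.0.0.0:8001/v1}"; }

# Return success if a server answers on <base>/models.
# Assumption: the deployment exposes the OpenAI-style model-listing route.
endpoint_ready() {
  curl -sSf --max-time 5 "$1/models" >/dev/null 2>&1
}

for url in "$(chat_url)" "$(embed_url)"; do
  if endpoint_ready "$url"; then
    echo "ready: $url"
  else
    echo "not reachable: $url"
  fi
done
```

A NIM container can take a while to load its model after docker run returns, so a "not reachable" result shortly after start-up does not necessarily indicate a misconfiguration.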
Running the tests
Run tests using the Rust standard cargo test command:
# run all unit and integration tests with default features
cargo test
# run tests for the phymes-core crate with all features enabled
cargo test -p phymes-core --all-features
# run a specific test for the phymes-core crate with the wsl feature enabled
# and printing to the console
cargo test test_session_update_state -p phymes-core --features wsl -- --no-capture
# run the doc tests
cargo test --doc
You can find up-to-date information on the current CI tests in .github/workflows. The phymes-server, phymes-core, and phymes-agents crates have unit tests. Please note that many of the tests in the phymes-agents crate do not run on the CPU because of how long they take. To run all tests in the phymes-agents crate, either enable GPU acceleration with Candle using the --features wsl-gpu,candle feature flags, or use OpenAI API local/remote token services with the --features wsl,openai_api or --features wsl-gpu,openai_api feature flags, depending upon GPU availability.
# run tests for the phymes-core crate
cargo test --package phymes-core --features wsl --release
# run tests for the phymes-agents crate with GPU acceleration with Candle assets
cargo test --package phymes-agents --features wsl-gpu,candle --release
# or run tests for the phymes-agents crate on the CPU with OpenAI API token services
cargo test --package phymes-agents --features wsl --release
# run tests for the phymes-server crate
cargo test --package phymes-server --features wsl --release
The tests can also be run for WASM components. However, the WASM debug output is essentially useless, so it is recommended to debug the tests natively before testing on WASM:
# build tests for the phymes-core crate
cargo test --package phymes-core --target wasm32-wasip2 --no-default-features --features wasip2 --no-run
# run tests for the phymes-core crate using wasmtime
# be sure to replace the -26200b790e92721b with your system's unique hash
wasmtime run target/wasm32-wasip2/debug/deps/phymes_core-26200b790e92721b.wasm
# build tests for the phymes-agents crate with embedded Candle assets
cargo test --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --no-run
# run tests for the phymes-agents crate using wasmtime
# be sure to replace the -9ce9c7c7142d7db7 with your system's unique hash
wasmtime --dir=$HOME/.cache/hf --env=HOME=$HOME target/wasm32-wasip2/debug/deps/phymes_agents-9ce9c7c7142d7db7.wasm
# build tests for the phymes-server crate
cargo test -p phymes-server --features wasip2-candle --no-default-features --target wasm32-wasip2 --no-run
# run tests for the phymes-server crate using wasmtime
# be sure to replace the -48a453bb50fd01da with your system's unique hash
wasmtime --dir=$HOME/.cache --env=HOME=$HOME target/wasm32-wasip2/debug/deps/phymes_server-48a453bb50fd01da.wasm
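Copying the hash by hand after every rebuild is easy to get wrong. As an alternative, the sketch below picks the most recently modified test artifact that matches a given prefix. The helper name latest_test_wasm is hypothetical, and the prefix must be given as it appears in the deps directory (for example phymes_server).

```shell
# Print the most recently modified test artifact for a crate.
# `latest_test_wasm` is a hypothetical helper; $1 is the deps directory and
# $2 the artifact prefix as it appears on disk (e.g. phymes_server).
latest_test_wasm() {
  ls -t "$1"/"$2"-*.wasm 2>/dev/null | head -n 1
}

artifact=$(latest_test_wasm target/wasm32-wasip2/debug/deps phymes_server)
if [ -n "$artifact" ]; then
  wasmtime --dir="$HOME/.cache" --env=HOME="$HOME" "$artifact"
else
  echo "no test artifact found; build the tests with --no-run first"
fi
```

Sorting by modification time means stale artifacts from earlier builds are ignored as long as the tests were rebuilt most recently.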
Running the examples
Run examples using the Rust standard cargo run command. A few simple examples are provided for the phymes-core and phymes-agents crates to give new users a starting point for building applications using the crates:
# run examples for the phymes-core crate
cargo run --package phymes-core --features wsl --release --example addrows
# run examples for the phymes-agents crate with GPU acceleration with Candle assets
cargo run --package phymes-agents --features wsl-gpu,candle --release --example chat -- --candle-asset SmoLM2-135M-chat
cargo run --package phymes-agents --features wsl-gpu,candle --release --example chatagent
# or run examples for the phymes-agents crate on the CPU with OpenAI API token services
cargo run --package phymes-agents --features wsl --release --example chat -- --openai-asset Llama-3.2-1b-instruct
cargo run --package phymes-agents --features wsl --release --example chatagent
The examples can also be run using WASM. However, unlike native builds, where we can rely on the HuggingFace API to download and cache models for us, all assets needed to run an example must be provided locally. The following bash script can be used to build the examples in WASM and run them using wasmtime:
# build examples for the phymes-core crate
cargo build --package phymes-core --target wasm32-wasip2 --no-default-features --features wasip2 --release --example addrows
# run the examples for the phymes-core crate
wasmtime run target/wasm32-wasip2/release/examples/addrows.wasm
# build the chat example for the phymes-agents crate
cargo build --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release --example chat
# run the chat example for the phymes-agents crate
wasmtime --dir="$HOME/.cache/hf" --env=HOME=$HOME target/wasm32-wasip2/release/examples/chat.wasm --weights-config-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/config.json" --weights-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/smollm2-135m-instruct-q4_k_m.gguf" --tokenizer-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/tokenizer.json" --tokenizer-config-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/tokenizer_config.json" --candle-asset "SmoLM2-135M-chat"
# build the chatagent example for the phymes-agents crate
cargo build --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release --example chatagent
# run the chatagent example for the phymes-agents crate
wasmtime --dir="$HOME/.cache/hf" --env=HOME=$HOME target/wasm32-wasip2/release/examples/chatagent.wasm
Clippy lints
We use clippy for checking lints during development, and CI runs clippy checks.
Run the following to check for clippy lints:
cargo clippy -p phymes-core --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-agents --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-server --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-app --tests --examples -- -D warnings
If you use Visual Studio Code with the rust-analyzer plugin, you can enable clippy to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.
One of the concerns with clippy is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a clippy lint, you may disable the lint and briefly justify it.
Search for allow(clippy:: in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.
- If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
- If you have several lints on a function or module, you may disable the lint on the function or module.
- If a lint is pervasive across multiple modules, you may disable it at the crate level.
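To see which lints are already allowed and how often, the search described above can be summarised per lint name. This is a sketch that assumes a grep supporting -r and --include (as GNU grep does); the helper name clippy_allows is hypothetical, and it counts only the first lint named in each allow(...) attribute.

```shell
# Count `allow(clippy::...)` attributes per lint name under a source tree.
# `clippy_allows` is a hypothetical helper, not part of Phymes.
# Note: only the first lint inside each allow(...) is counted.
clippy_allows() {
  grep -rhoE --include='*.rs' 'allow\(clippy::[a-z_]+' "$1" \
    | sed 's/^allow(clippy:://' \
    | sort | uniq -c | sort -rn
}

# Usage (from the repository root): clippy_allows .
```

A lint that shows up many times across different modules may be a candidate for a crate-level allow, in line with the guidance above.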
Rustfmt Formatting
We use rustfmt for formatting during development, and CI runs rustfmt checks.
Run the following to check for rustfmt changes (before submitting a PR!):
cargo fmt --all -- --check
The individual workspaces can then be formatted using rustfmt:
cargo fmt -p phymes-core --all
cargo fmt -p phymes-agents --all
cargo fmt -p phymes-server --all
cargo fmt -p phymes-app --all
Rustdocs and mdBook for documentation
We use cargo doc for API documentation hosted on crates.io, and mdBook for the guide and tutorial static website hosted on GitHub Pages. The mdbook-mermaid preprocessor is used for generating mermaid diagrams.
Run the following to create the API documentation using cargo doc:
cargo doc --document-private-items --no-deps -p phymes-core --features wsl
cargo doc --document-private-items --no-deps -p phymes-agents --features wsl
cargo doc --document-private-items --no-deps -p phymes-server --features wsl
cargo doc --document-private-items --no-deps -p phymes-app
Please visit the mdBook guide for installation and usage instructions. Also, please visit mdbook-mermaid for installation instructions. Run the following to create the guide and tutorials using mdBook:
mdbook build phymes-book
Running Benchmarks
In progress...
Running benchmarks is a good way to test the performance of a change. As benchmarks usually take a long time to run, we recommend running targeted benchmarks instead of the full suite.
# run all benchmarks
cargo bench
# run phymes-core benchmarks
cargo bench -p phymes-core
# run benchmark for the add_rows function within the phymes-core crate
cargo bench -p phymes-core --bench add_rows
To set the baseline for your benchmarks, use the --save-baseline flag:
git checkout main
cargo bench -p phymes-core --bench add_rows -- --save-baseline main
git checkout feature
cargo bench -p phymes-core --bench add_rows -- --baseline main
Running the CI locally
Continuous integration and deployment are orchestrated using GitHub Actions on each pull request (PR) to the main branch. Unfortunately, debugging the CI/CD can be quite difficult and time-consuming, so we recommend testing locally using a self-hosted runner. Since caching is not supported with act, alternative GitHub Action files for downloading resources are provided in the .github.act folder.
First, follow the instructions for downloading, configuring, and using the self-hosted runner.
Second, be sure to change runs-on: ubuntu-latest to runs-on: self-hosted for each job in the YAML of all workflow files.
Third, run the actions-runner. Now, when you open a PR, the CI will run locally on your machine.
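The runs-on change in the second step can be applied mechanically rather than by hand. The sketch below rewrites each workflow file in place. The helper name use_self_hosted is hypothetical, and the loop assumes the workflow files live in .github/workflows with a .yml or .yaml extension.

```shell
# Rewrite `runs-on: ubuntu-latest` to `runs-on: self-hosted` in one file.
# `use_self_hosted` is a hypothetical helper; review the diff before committing.
use_self_hosted() {
  tmp=$(mktemp)
  sed 's/runs-on:[[:space:]]*ubuntu-latest/runs-on: self-hosted/' "$1" > "$tmp" \
    && cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Apply to every workflow file that exists.
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
  if [ -f "$f" ]; then
    use_self_hosted "$f"
  fi
done
```

Running git diff afterwards shows exactly which jobs were changed, and git checkout on the workflow files restores the hosted runners before the change is pushed.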