Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

PHYMES: Parallel HYpergraph MEssaging Streams

Introduction

🤔 What is PHYMES?

PHYMES (Parallel HYpergraph MEssaging Streams) is a subject-based message passing algorithm based on directed hypergraphs which provide the expressivity needed to model the heterogeneity and complexity of the real world. More details in the guide.

🤔 What can PHYMES do?

PHYMES can be used to build scalable Agentic AI workflows, (hyper)-graph algorithms, and world simulators. Examples for building a chat bot, a tool calling agent, and document RAG agent are provided using embedded token/tensor services or local/remote token/tensor services using OpenAI compatible APIs.

🤔 Why PHYMES?

🔐 written 100% in Rust for performance, safety, and security.
🌎 deployable on any platform (Linux, MacOs, Win, Android, and iOS) and in the browser (WebAssembly).
💪 scalable to massive data sets using columnar in memory format, parallel and stream native processing, and GPU acceleration.
🧩 interoperable with existing stacks by interfacing with cross-platform Arrow and WASM/WASI.
🔎 instrumented with tracing and customizable metrics to debug (hyper-)graph workflows faster.

🤔 Who and what inspired PHYMES?

PHYMES takes inspiration from real world networks including biological networks. The implementation of PHYMES takes inspiration from DataFusion, Pregel, and PyG.

🙏 PHYMES would not be possible if it were not for the amazing open-sources projects that it is built on top of including Arrow and Candle with full-stack support from Tokio, Dioxus, and [Wasmtime].

Repository

The phymes-core, phymes-agents, phymes-server, phymes-app crates form a full-stack application that can run Agentic AI workflows, (Hyper-)Graph algorithms, and/or Simulate complex real world networks at scale using a web, desktop, or mobile interface.

CrateDescriptionLatest API DocsREADME
phymes-coreCore hypergraph messaging functionalitydocs.rsREADME
phymes-agentsSupport for AI agents and GPU accelerated data analyticsdocs.rsREADME
phymes-serverServer that runs the Agentic AI hypergraph messaging servicesdocs.rsREADME
phymes-appFrontend UI for dynamically interacting with the Agentic AI hypergraph messaging servicesdocs.rsREADME

This book will introduce the core concepts behind phymes, tutorials for using the underlying libraries, and tutorials for building and running the full-stack application.

Getting started

This user guide will cover how to install the full-stack phymes application, develop with phymes, and deploy applications with phymes.

Installation of Phymes

Installation

Precompiled bundles for different Arch, OS, CUDA versions, and Token and Tensor services (e.g. for Agentic AI workflows) are provided on the releases page.

ArchOSCUDAToken service
x86_64-unknown-linux-gnuubuntu22.04, ubuntu24.0412.6.2, 12.9.1candle, openai_api
wasm32-wasip2n/an/acandle
wasm32-unknown-unknownn/an/acandle

Token services for agentic AI workflows can embedded in the application using candle or accessed locally e.g., self-hosted NVIDIA NIMs docker containers or remotely e.g., OpenAI, NVIDIA NIMs, etc. that adhere to the OpenAI API schema using openai_api. Tensor services are embedded in the application using candle with CPU vectorization and GPU acceleration support.

To install the phymes application, download the precompiled bundle that matches your system and needs, and unzip the bundle. Double click on phymes-server to start the server, then navigate to http://127.0.0.1:4000/ to view the web application.

Alternatively, you can make REST API requests against the server using e.g., curl.

# Sign-in and get our JWT token
curl -X POST -u EMAIL:PASSWORD http://localhost:4000/app/v1/sign_in
# mock response {"email":"EMAIL","jwt":"JWTTOKEN","session_plans":["Chat","DocChat","ToolChat"]}

# Chat request
# Be sure to replace EMAIL and JWTTOKEN with your actual email and JWT token!
# Note that the session_name = email + session_plan
curl -H "Content-Type: application/json" -H "Authorization: Bearer JWTTOKEN" -d '{"content": "Write a python function to count prime numbers", "session_name": "EMAILChat", "subject_name": "messages"}' http://localhost:4000/app/v1/chat

Before running the phymes-server, setup the environmental variables as needed to access the local or remote OpenAI API token service endpoints.

# OpenAI API Key
export OPENAI_API_KEY=sk-proj-...

# NVIDIA API Key
export NGC_API_KEY=nvapi-...

# URL of the local/remote TGI OpenAI or NIMs deployment
export CHAT_API_URL=http://0.0.0.0:8000/v1

# URL of the local/remote TEI OpenAI or NIMs deployment
export EMBED_API_URL=http://0.0.0.0:8001/v1

WASM builds of phymes-server can be ran as stateless functions for embedded application using wasmtime or serverless applications.

# Sign-in and get our JWT token
wastime phymes-server.wasm -- --route app/v1/sign_in --basic-auth EMAIL:PASSWORD
# mock response {"email":"EMAIL","jwt":"JWTTOKEN","session_plans":["Chat","DocChat","ToolChat"]}

# Chat request
# Be sure to replace EMAIL and JWTTOKEN with your actual email and JWT token!
# Note that the session_name = email + session_plan
wastime phymes-server.wasm curl -- --route app/v1/chat --bearer-auth JWTTOKEN --data '{"content": "Write a python function to count prime numbers", "session_name": "EMAILChat", "subject_name": "messages"}'

See [contributing] guide for detailed installation and build instructions.

Developing with Phymes

Setting Up Your Build Environment

Install the Rust tool chain:

https://www.rust-lang.org/tools/install

An example bash script for installing the Rust tool chain for Linux is the following:

apt update
DEBIAN_FRONTEND=noninteractive apt install --assume-yes git clang curl libssl-dev llvm libudev-dev make pkg-config protobuf-compiler
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
. "$HOME/.cargo/env"
rustup toolchain install stable --target x86_64-unknown-linux-gnu
rustup default stable
rustc --version

Also, make sure your Rust tool chain is up-to-date, because we always use the latest stable version of Rust to test this project.

rustup update stable

Setting up GPU acceleration with CUDA

Install CUDA for linux:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

GPU acceleration with CUDA is currently only support for Linux (including WSL2) at this time. An example bash script for installing CUDA for WSL2 is the following:

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda-repo-wsl-ubuntu-12-6-local_12.6.2-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-6-local_12.6.2-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt -y install cuda-toolkit-12-6

Please replace the repo and cuda versions accordingly. Check the Cuda installation

nvcc --version
nvidia-smi --query-gpu=compute_cap --format=csv

Install CuDNN backend for linux:

https://docs.nvidia.com/deeplearning/cudnn/installation/latest/linux.html

An example bash script for install CuDNN for Linux is the following:

wget https://developer.download.nvidia.com/compute/cudnn/9.5.1/local_installers/cudnn-local-repo-ubuntu2404-9.5.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.5.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.5.1/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt -y install cudnn

Please replace the repo and cuda versions accordingly.

Setting up NVIDIA NIMs for local deployment

Obtain an NGC API key following the instructions.

Install the NVIDIA Container Toolkit following the instructions

Check that the installation was successful by running the following:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

The NGC catalogue can be viewed using NGC CLI. Install NGC following the instructions

Alternatively, the NGC catalogue can be viewed online. For example, the open-source Llama3.2 model can be deployed locally following the instructions, and alternatively accessed via the NVIDIA NIMs API if available (see the NIMs LLM [API](NIMs LLM API https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html) for OpenAPI schema).

Setting up WASM build environment

Add the following wasm32 compilation targets from the nightly Rust toolchain:

rustup update nightly
rustup target add wasm32-unknown-unknown --toolchain nightly
rustup target add wasm32-wasip2 --toolchain nightly

In addition, we recommend using wasmtime for running wasi components

curl https://wasmtime.dev/install.sh -sSf | bash

Setting up Dioxus

The front-end application is built using dioxus to enable creating web, desktop, and mobile applications using Rust

cargo install dioxus-cli

How to compile

This is a standard cargo project with workspaces. To build the different workspaces, you need to have rust and cargo and you will need to specify workspaces using the using the -p, --project flag:

cargo build -p phymes-core

CPU, GPU, and WASM-specific compilation features are gated behind feature flags wsl, wsl-gpu, and wasip2 respectively. The use of embedded Candle or OpenAI API token services are gated behind the feature flag candle, which indicates to use embedded candle models.

The following will build the phymes-agents workspace with different configurations of CPU and GPU acceleration for Tensor and Token services:

# Native CPU for tensor operations and local/remote OpenAI API token services
cargo build -p phymes-agents --features wsl --release

# Native CPU for tensor operations and embedded Candle for token services
cargo build -p phymes-agents --features wsl,candle --release

# GPU support for tensor operations and local/remote OpenAI API token services
cargo build -p phymes-agents --features wsl-gpu --release

# GPU support for tensor operations and embedded Candle for token services
cargo build -p phymes-agents --features wsl-gpu,candle --release

Please ensure that all CUDA related environmental variables are setup correctly for GPU acceleration. Most errors related to missing CUDA or CuDNN libraries are related to missing environmental variables particularly on WSL2.

export PATH=$PATH:/usr/local/cuda/bin:/usr/lib/x86_64-linux-gnu/
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs

The following will build the phymes-agents workspace as a WASIp2 component:

cargo build -p phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release

Mixing and matching features that are compilation target specific and compilation targets will result in build errors.

You can also use rust's official docker image:

docker run --rm -v $(pwd):/phymes -it rust /bin/bash -c "cd /phymes && rustup component add rustfmt && cargo build -p phymes-core"

From here on, this is a pure Rust project and cargo can be used to run tests, benchmarks, docs and examples as usual.

Setting up the cache for running tests and examples

Many of the tests (and examples if running without the GPU or on WASM) depend upon a local cache of model assets to run. The following bash script can be used to prepare the local assets:

# ensure your home environmental variable is set
echo $HOME

# make the cache directory
mkdir -p $HOME/.cache/hf

# copy over the cache files from the root of the GitHub repository
cp -a .cache/hf/. $HOME/.cache/hf/

# download the model assets manually from HuggingFace
curl -L -o $HOME/.cache/hf/models--sentence-transformers--all-MiniLM-L6-v2/model.safetensors  https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/model.safetensors?download=true -sSf
curl -L -o $HOME/.cache/hf/models--sentence-transformers--all-MiniLM-L6-v2/pytorch_model.bin  https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/pytorch_model.bin?download=true -sSf
curl -L -o $HOME/.cache/hf/models--Qwen--Qwen2-0.5B-Instruct/qwen2.5-0.5b-instruct-q4_0.gguf  https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_0.gguf?download=true -sSf
curl -L -o $HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/smollm2-135m-instruct-q4_k_m.gguf  https://huggingface.co/Segilmez06/SmolLM2-135M-Instruct-Q4_K_M-GGUF/resolve/main/smollm2-135m-instruct-q4_k_m.gguf?download=true -sSf
curl -L -o $HOME/.cache/hf/models--Alibaba-NLP--gte-Qwen2-1.5B-instruct/gte-Qwen2-1.5B-instruct-Q4_K_M.gguf  https://huggingface.co/tensorblock/gte-Qwen2-1.5B-instruct-GGUF/resolve/main/gte-Qwen2-1.5B-instruct-Q4_K_M.gguf?download=true -sSf

Setting up local OpenAI API endpoints

Instead of using token credits with remote OpenAI API endpoints, it is possible to run the tests and examples locally using self-hosted open-source NVIDIA NIMs. Modify the following code depending upon the model(s) to be locally deployed:

# Text Generation Inference with Llama 3.2 (terminal 1)
export NGC_API_KEY=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
export LOCAL_NIM_CACHE=$HOME/.cache/nim
docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8000:8000 nvcr.io/nim/meta/llama-3.2-1b-instruct:1.8.6

# Text Embedding Inference with Llama 3.2 (terminal 2)
export NGC_API_KEY=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
export LOCAL_NIM_CACHE=$HOME/.cache/nim
docker run -it --rm --gpus all --shm-size=16GB -e NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8001:8000 nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:latest

Note that the tests and examples assume that the local OpenAI API endpoints for NVIDIA NIMs are http://0.0.0.0:8000/v1 for Text Generation Inference (TGI, Chat) and http://0.0.0.0:8001/v1 for Text Embedding Inference (TEI, Embed), respectively. The defaults can be overwritten by setting the environmental variables for the TEI and TGI endpoints, respectively.

# URL of the local TGI NIMs deployment
export CHAT_API_URL=http://0.0.0.0:8000/v1

# URL of the local TEI NIMs deployment
export EMBED_API_URL=http://0.0.0.0:8001/v1

Also, be sure to add your NGC_API_KEY to the environmental variables before running tests or examples in a different terminal.

# NVIDIA API Key
export NGC_API_KEY=nvapi-...

Running the tests

Run tests using the Rust standard cargo test command:

# run all unit and integration tests with default features
cargo test

# run tests for the phymes-core crate with all features enabled
cargo test -p phymes-core --all-features

# run a specific test for the phymes-core crate with the wsl feature enabled
# and printing to the console
cargo test test_session_update_state -p phymes-core --features wsl -- --no-capture

# run the doc tests
cargo test --doc

You can find up-to-date information on the current CI tests in .github/workflows. The phymes-server, phymes-core, and phymes-agents crates have unit tests. Please note that many of the tests will in the phymes-agents crate do not run on the CPU due to the amount of time that it takes to run them. To run all tests in the phymes-agents create, either enable GPU acceleration with Candle using --features wsl-gpu,candle feature flag, or with OpenAI API local/remote token services using --feature wsl,openai_api or --feature wsl-gpu,openai_api feature flags depending upon GPU availability.

# run tests for the phymes-core crate
cargo test --package phymes-core --features wsl --release

# run tests for the phymes-agents crate with GPU acceleration with Candle assets
cargo test --package phymes-agents --features wsl-gpu,candle --release
# or run tests for the phymes-agents crate on the CPU with OpenAI API token services
cargo test --package phymes-agents --features wsl --release

# run tests for the phymes-server crate
cargo test --package phymes-server --features wsl --release

The tests can also be ran for WASM components. However, the WASM debug output is essentially useless, so it is recommend to debug the tests natively before testing on WASM

# build tests for the phymes-core crate
cargo test --package phymes-core --target wasm32-wasip2 --no-default-features --features wasip2 --no-run

# build tests for the phymes-core crate using wasmtime
# be sure to replace the -26200b790e92721b with your systems unique hash
wasmtime run target/wasm32-wasip2/debug/deps/phymes-core-26200b790e92721b.wasm

# run tests for the phymes-agents crate with GPU acceleration
cargo test --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --no-run

# build tests for the phymes-agents crate using wasmtime
# be sure to replace the -9ce9c7c7142d7db7 with your systems unique hash
wasmtime --dir=$HOME/.cache/hf --env=HOME=$HOME target/wasm32-wasip2/debug/deps/phymes-agents-9ce9c7c7142d7db7.wasm

# run tests for the phymes-server crate
cargo test -p phymes-server --features wasip2-candle --no-default-features --target wasm32-wasip2 --no-run

# build tests for the phymes-server crate using wasmtime
# be sure to replace the -48a453bb50fd01da with your systems unique hash
wasmtime --dir=$HOME/.cache --env=HOME=$HOME target/wasm32-wasip2/debug/deps/phymes_server-48a453bb50fd01da.wasm

Running the examples

Run examples using the Rust standard cargo run command. A few simple examples are provided for the phymes-core and phymes-agents crates to provide new users a starting point for building application using the crates

# run examples for the phymes-core crate
cargo run --package phymes-core --features wsl --release --example addrows

# run examples for the phymes-agents crate with GPU acceleration with Candle assets
cargo run --package phymes-agents --features wsl-gpu,candle --release --example chat -- --candle-asset SmoLM2-135M-chat
cargo run --package phymes-agents --features wsl-gpu,candle --release --example chatagent

# or run examples for the phymes-agents crate on the CPU with OpenAI API token services
cargo run --package phymes-agents --features wsl --release --example chat -- --openai-asset Llama-3.2-1b-instruct
cargo run --package phymes-agents --features wsl --release --example chatagent

The examples can also be ran using WASM. However, all assets needed to run the example need to be provided locally unlike native where we can rely on the HuggingFace API to download and cache models for us. The following bash script can be used to build the examples in wasm and run the examples using wasmtime:

# build examples for the phymes-core crate
cargo build --package phymes-core --target wasm32-wasip2 --no-default-features --features wasip2 --release --example addrows

# run the examples for the phymes-core crate
wasmtime run target/wasm32-wasip2/release/examples/addrows.wasm

# build the chat example for the phymes-agents crate
cargo build --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release --example chat

# run the chat example for the phymes-agents crate
wasmtime --dir="$HOME/.cache/hf" --env=HOME=$HOME target/wasm32-wasip2/release/examples/chat.wasm --weights-config-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/config.json" --weights-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/smollm2-135m-instruct-q4_k_m.gguf" --tokenizer-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/tokenizer.json" --tokenizer-config-file "$HOME/.cache/hf/models--HuggingFaceTB--SmolLM2-135M-Instruct/tokenizer_config.json" --candle-asset "SmoLM2-135M-chat"

# build the chatagent example for the phymes-agents crate
cargo build --package phymes-agents --target wasm32-wasip2 --no-default-features --features wasip2,candle --release --example chatagent

# run the chatagent example for the phymes-agents crate
wasmtime --dir="$HOME/.cache/hf" --env=HOME=$HOME target/wasm32-wasip2/release/examples/chatagent.wasm

Clippy lints

We use clippy for checking lints during development, and CI runs clippy checks.

Run the following to check for clippy lints:

cargo clippy -p phymes-core --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-agents --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-server --tests --examples --features wsl -- -D warnings
cargo clippy -p phymes-app --tests --examples -- -D warnings

If you use Visual Studio Code with the rust-analyzer plugin, you can enable clippy to run each time you save a file. See https://users.rust-lang.org/t/how-to-use-clippy-in-vs-code-with-rust-analyzer/41881.

One of the concerns with clippy is that it often produces a lot of false positives, or that some recommendations may hurt readability. We do not have a policy of which lints are ignored, but if you disagree with a clippy lint, you may disable the lint and briefly justify it.

Search for allow(clippy:: in the codebase to identify lints that are ignored/allowed. We currently prefer ignoring lints on the lowest unit possible.

  • If you are introducing a line that returns a lint warning or error, you may disable the lint on that line.
  • If you have several lints on a function or module, you may disable the lint on the function or module.
  • If a lint is pervasive across multiple modules, you may disable it at the crate level.

Rustfmt Formatting

We use rustfmt for formatting during development, and CI runs rustfmt checks.

Run the following to check for rustfmt changes (before submitting a PR!):

cargo fmt --all -- --check

The individual workspaces can then be formatted using rustfmt:

cargo fmt -p phymes-core --all
cargo fmt -p phymes-agents --all
cargo fmt -p phymes-server --all
cargo fmt -p phymes-app --all

Rustdocs and mdBook for documentation

We use doc for API documentation hosted on crates.io and mdBook for the guide and tutorial static website with a mermaid preprocessor mdbook-mermaid is used for generating mermaid diagrams hosted on GitHub Pages.

Run the following to create the API documentation using doc:

cargo doc --document-private-items --no-deps -p phymes-core --features wsl
cargo doc --document-private-items --no-deps -p phymes-agents --features wsl
cargo doc --document-private-items --no-deps -p phymes-server --features wsl
cargo doc --document-private-items --no-deps -p phymes-app

Please visit the mdBook guide for installation and usage instructions. Also, please visit mdbook-mermaid for installation instructions. Run the following to create the the guide and tutorials using mdBook:

mdbook build phymes-book

Running Benchmarks

In progress...

Running benchmarks are a good way to test the performance of a change. As benchmarks usually take a long time to run, we recommend running targeted tests instead of the full suite.

# run all benchmarks
cargo bench

# run phymes-core benchmarks
cargo bench -p phymes-core

# run benchmark for the add_rows function within the phymes-core crate
cargo bench -p phymes-core --bench add_rows

To set the baseline for your benchmarks, use the --save-baseline flag:

git checkout main

cargo bench -p phymes-core --bench add_rows -- --save-baseline main

git checkout feature

cargo bench -p phymes-core --bench add_rows -- --baseline main

Running the CI locally

Continuous integration and deployment are orchestrated using GitHub actions on each pull request (PR) to the main branch. Unfortunately, debugging the CI/CD can be quite difficult and time consuming, so we recommend testing locally using a self-hosted runner. Since caching is not supported with act, alternative GitHub Action files for downloading resources is provided in the .github.act folder.

First, follow the instructions for downloading, configuring, and using the self-hosted runner.

Second, be sure to change runs-on: ubuntu-latest to runs-on: self-hosted in the YAML for all workflow files for each job.

Third, Run the actions-runner. Now, when you open a PR, the CI will run locally on your machine.

Deploying Phymes

The phymes-core, phymes-agents, phymes-server, phymes-app crates form a full-stack application that can run Agentic AI workflows, Graph algorithms, or Simulate networks at scale using a web, desktop, or mobile interface. Both the frontend and server need to be built in --release mode for improved performance and security.

Web

First, build the frontend application using dioxus

dx bundle -p phymes-app --release

Second, build the server with the desired features.

# GPU support and Candle token service
cargo build --package phymes-server --features wsl-gpu,candle --release

# Or OpenAI API
cargo build --package phymes-server --features wsl --release

Third, move the server executable to the same directory as the web assets

mv target/release/phymes-server target/dx/phymes-app/release/web/public/phymes-server

Fourth, run the application and navigate to http://127.0.0.1:4000/

cd target/dx/phymes-app/release/web/public
./phymes-server

Desktop

First, build the frontend application

cargo build -p phymes-app --features desktop --release

Second, build the phymes-server application with the desired features.

# GPU support and Candle token service
cargo build --package phymes-server --features wsl-gpu,candle --release

# Or OpenAI API
cargo build --package phymes-server --features wsl --release

Third, launch the phymes-app executable

./target/release/phymes-app

Fourth, launch the phymes-server executable

./target/release/phymes-server

Mobile

In progress...

WASM

First, build the phymes-server application with Candle token services.

cargo build --package phymes-server --no-default-features --features wasip2-candle --target wasm32-wasip2 --release

Second, iteratively query the phymes-server using wasmtime.

# Sign-in and get our JWT token
wastime --dir=$HOME/.cache phymes-server.wasm --route app/v1/sign_in --basic_auth EMAIL:PASSWORD
# mock response {"email":"EMAIL","jwt":"JWTTOKEN","session_plans":["Chat","DocChat","ToolChat"]}

# Get information about the different subjects
wastime --dir=$HOME/.cache phymes-server.wasm curl --route app/v1/subjects_info --bearer_auth JWTTOKEN --data '{"session_name":"myemail@gmail.comChat","subject_name":""}'

# Chat request
# Be sure to replace EMAIL and JWTTOKEN with your actual email and JWT token!
# Note that the session_name = email + session_plan
wastime --dir=$HOME/.cache phymes-server.wasm curl --route app/v1/chat --bearer_auth JWTTOKEN --data '{"content": "Write a python function to count prime numbers", "session_name": "EMAILChat", "subject_name": "messages"}'

Deploying with local or remote OpenAI API compatible token service endpoints

OpenAI API compatible token service endpoints are supported for local (e.g., NVIDIA NIMs) or remote (e.g., OpenAI or NVIDIA NIMs). Please see the guide for local NVIDIA NIMs token service deployment.

Before running the phymes-server, setup the environmental variables as needed to access the local or remote token service endpoint as described in the guide, or specify the endpoint urls in the CandleEmbedConfig and CandleChatConfig, respectively.

Developing and debugging the phymes application

We recommend debugging the application using two terminals: one for phymes-app and another for phymes-server. Dioxus provides a great development loop for front-end application development with nifty hot-reloading features, but requires it's own dedicated terminal to run. Tokio provides an industry grade server along with nifty security features. During development (specifically, debug mode), the server permissions are relaxed to enable iterative debugging of the application. The phymes-core, phymes-agents, and phymes-server all use the Tracing crates for tracing and logging functionality. The packages and verbosity of console logging can be specified on the command line using the RUST_LOG environmental variable.

In the first terminal:

dx serve -p phymes-app

In the second terminal:

# default log level
cargo run -p phymes-server --features wsl-gpu

# only INFO level logs
RUST_LOG=phymes_server=INFO cargo run -p phymes-server --features wsl-gpu

# debug level logs for phymes-server, phymes-core, and phymes-agents
RUST_LOG=phymes_server=DEBUG,phymes_core=DEBUG,phymes_agents=DEBUG cargo run -p phymes-server --features wsl-gpu

PHYMES Core

Synopsis

The PHYMES core crate implements the core algorithm for parallel and streaming hypergraph message passing. Similar to Pregel, messages are passed to computational nodes at each superstep. The messaging passing continues until a criteria is met or a maximum number of supersteps have been reached. Unlike Pregel, PHYMES does not operate over an undirected graph, but over a directed hypergraph using a publish-subscribe model where subjects are nodes and tasks are hyperedges and the directionality is determined by whether subjects are subscribed to or published on. In short, tasks subscribe to subjects, process subjects, and then publish their results to subjects.

Phymes algorithm

Hypergraphs are powerful representations for the complexities of the real world

Hypergraphs are a generalization of graphs that can better represent the complexities of the real world. For example, biochemical networks can be represented as a directed hypergraph where nodes are molecules and hyperedges are reactions that catalyze the conversion of molecules into other molecules. The directionalty of the reaction is lost when represented as an undirected graph and the grouping of molecules involved in the reaction is lost when using directed graphs.

Mathematically, the biochemical reaction hypergraph can be represented as an incidence matrix where the rows corresponding to molecules, the columns correspond to reactions, and the entries are positive or negative integers corresponding the the number of molecules consumed or produced by the reaction.

While simplistic, this representation is already useful for decision science whereby one can optimize the molecules flow through the network using (mixed integer)-(non)-linear programming, for simulation whereby one can reverse parameter fit the network to data and then forward simulate the network using numerical integration solvers, for exploratory data science whereby unsupervised methods such as PCA, KNN clustering, etc can be applied to learn about the network topological and dynamic properties, and for forecasting whereby once can fit a hypergraph neural network to make predictions about future node, hyperedge, or hypergraph properties.

Hypergraphs can be further decomposed into binary and unary hyper edges. This decomposition better aligns with the binary and unary operators of computational graphs. This decomposition also aligns well with biochemical computational graphs where reactions can be broken down into steps describing the binding of individual molecules to a catalyst, the rearrangement of atoms, and the unbinding of individual molecules. When the exact steps are not known, a set of possible hypergraph configurations can be constructed and then weighted by their probability or used as the basis for follow up experiments to determine the exact steps.

Phymes is a parallel hypergraph message streaming algorithm

Phymes is designed around the concept of a hypergraph where subjects are nodes and tasks are hyperedges. Phymes uses a publish-subscribe message passing scheme to implement the hypergraph. Specifically, tasks subscribe to and publish on subjects. The addition of processor metrics, tracing, and the subject tables themselves provides transparency on all hypergraph computation steps.

The execution of tasks is done in parallel and Asynchronously. Tasks can either always subscribe to a subject or conditionally subscribe when a subject is updated. Tasks run only when all of their subscriptions are ready. Conditional subscription on updated subjects enables the coding of first order logic to control when tasks execute. Tasks then publish the results of their computation to subjects. The execution of tasks is coordinated through supersteps where ready tasks are ran in parallel.

Phymes incorporates decomposition of hyperedges. Tasks are composed of processes that iteratively operate over streams of subject messages. The chaining of processors enables the representation of complex computational graphs. The streaming of messages enables the scaling to massive data sets that do not fit into memory on a single node. The computation over streaming messages enables parallel and Asynchronous execution of tasks at each superstep.

Phymes provides the algorithmic basis and computational framework for implementing efficient and scalable algorithms for decision science, simulation, data science, and machine learning. In particular, phymes provides a cross platform, secure, performant, and transparent Agentic AI framework with support for complex ETL and data science function calls.

Phymes glossary

Session

Manifestation of particular task hypergraph that a user can publish subjects to and stream subscribed subjects from. The SessionContext maintains the metrics, tasks, state, and runtime (discussed below). The SessionStream manages the subscribing and publishing of subjects by tasks by iteratively running a "super step" via SessionStreamStep which invokes tasks for which all of their subscribed subjects have been updated and then updates the subjects with the published results of each ran task.

State

The application state are subjects (i.e., tables) that implement the ArrowTableTrait. Subjects include data tables, configurations (e.g., single row tables where the schema defines the parameters and the first row defines the values for the parameters), and aggregated metrics. Subjects can be subscribed to and published to from multiple tasks, which results in a hypergraph structure.

Metrics

Recorded per ArrowProcessorTrait and aggregated at the query level. Examples of baseline metrics include runtime and processed table row counts. Additional metrics can be defined by the user.

Tasks

The unit of operation. Tasks are hyperedges in the task hypergraph. Tasks are often referenced as Agents or Tools in the agentic AI world or as execution plan partitions in the database world. The trait ArrowTaskTrait defines the behavior of the task through the run method which is responsible for subscribing to subjects, processing them by calling a series of structs implementing the ArrowProcessorTrait, and then publishing outgoing messages to subjects. The struct that implements the ArrowTaskTrait includes a name for the task, metrics to track the task processes, runtime information, access to the complete or a subset of the complete state, and a ArrowProcessorTrait. The processing of messages is implemented by the ArrowProcessorTrait, which recieves the incoming message, metrics, state, and runtime from the ArrowTaskTrait.

Process

The processing of messages is implemented by the ArrowProcessorTrait, which recieves the incoming message, metrics, state, and runtime from the ArrowTaskTrait. The processor operates over streams of RecordBatchs which enables scaling across multiple nodes, and allows for calling native or remote (via RPC) components or provider services. Tasks can call a series of structs implementing ArrowProcessorTrait forming a complex computational chain over subscribed messagings. Processes are intended to be quasi-stateless, and therefore all state and runtime artifacts are passed as parameters to the process method. For example, the configurations for the processor method are passed as state tables, the runtime artifacts are set in the runtime, and the metrics for recording the time and number of rows processed and other user defined criteria are all passed as input paramters. The only state maintained by the processors are the subscribed subjects, subjects to publish, and subjects to pass through to the next process in the chain.

Messages

The unit of communication between tasks. ArrowIncomingMessagees are sent to tasks in uncompressed tables or compressed IPC format; while ArrowOutgoingMessagees are sent from tasks in a streaming format to enable lazy and parallel execution of the task processor computational chain. The streaming format is used as both input and output for processors to enable scaling to massive data sizes that can be efficiently distributed across multiple nodes.

Tables

The unit of data. Tables implement the ArrowTableTrait and facilitate the manipulation of Arrow RecordBatches. RecordBatches are a highly efficient columnar data storage format that facilitate computation on the CPU and transfer to the GPU for accelerated computation. Tables can be seralized when sending over the wire, and converted to and from various formats including JSON as needed.

Runtime Env

Defines how Processes are ran. For AI/ML tasks, the RuntimeEnv may include model weights loaded into GPU memory. For ETL tasks, the RuntimeEnv may include CPU settings and memory or wall clock constraints. Similar to state, the RuntimeEnv artifacts are intended to be shared between different processes. For example, a single LLM model weights maybe used by multiple processors in parallel or re-used between queries to the same session without having to reload the weights into memory each time. The runtime environment can also be thought of as services that processes can share. For example, instead of hosting an LLM locally, the runtime can define an API to call to generate tokens remotely which provides flexibility and scalability depending upon the needs of the application.

Example: Single Source Shortest Paths (SSSP)

SSSP computes the shortest path distance from a source node to all reachable destination nodes. Loops are ignored.

In progress...

Example: Biochemical network simulator

A simple network describing the interconnection of molecules is simulated until a steady state is reached. The time axis is uniformly discretized and walked along until there is no further change in molecule amounts. If the change in any molecule amounts is greater than a given threshold, the step is repeated with a finer grained time delta. If the change in all molecules is below a given threshold, the time delta is increased on the next step.

In progress...

Example: Agentic AI

See phymes-agents crate for examples building a chat bot, tool calling agent, and document RAG workflow.

PHYMES Agents

Synopsis

The PHYMES agent crate implements the functionality to enable Agentic AI. Specifically, the crate implements functionality for native rust text generation inference, text embedding inference, and general tensor operations using the candle crates from HuggingFace. All functionality can be GPU accelerated using CUDA and CuDNN.

Model Assets

Synopsis

This tutorial describes how the Candle model assets are used to support text generation inference (TGI) and text embedding inference (TEI) services that are needed for agentic AI. In addition, this tutorial also describes how the Candle Tensor class can provide GPU accelerated ETL operations such as sorting, joining, and group aggregation to build powerful tools and complete ETL pipelines that can be integrated with agentic AI. A chat model that provides TGI via the command line is provided in the examples.

Tutorial

Assets

The assets that enable TGI and TEI include the model weights, configs, and tokenizer. Pytorch .bin, SafeTensor model.safetensor, and .gguf formats are supported. All assets can be downloaded using the HuggingFace API. phymes-agents provides a unified interface that hides away the nuances of different model architectures, quantizations, etc. to provide a more streamlined experience when working with different models similar to other agentic AI libraries.

Text generation inference (TGI)

The TGI model classes supported currently include Llama and Qwen and their quantized versions. Please reach out if other model classes are needed.

Text embedding inference (TEI)

The TEI model classes supported currently include BERT and QWEN and their quantized versions. Please reach out if other model classes are needed.

WASM compatibility

TGI, TEI, and Tensor operations are all supported in WASM with simd128 vectorization acceleration when supported by the CPU. Note that the SafeTensor format cannot be used with WASM. The maximum model weight memory cannot exceed 2 GB. The HuggingFace API can also not be used.

Tensor operations

The Tensor class combined with Arrow's Compute library provides the primitives for select, sort, join, and aggregate operations with CPU and GPU accelerated that can be combined into complete ETL pipelines. Custom operations such as document chunking required for document RAG can also be created. Operations are either Unary or Binary, and composed into complex execution graphs analogous to database query plans. All available functions are wrapped into a unified interface that supports tool calling with agents.

Session Plan: Chat Agent

Synopsis

This tutorial describes how the Chat Agent Session Plan uses the phymes-agent and phymes-core crates to build a simple chat agent. The chat agent is provided in the examples.

Tutorial

The simplest agentic AI architecture is that of a chat agent, which can be modeled as a static directed acyclic graph.

stateDiagram-v2
    direction LR
    [*] --> chat_agent: Query
    chat_agent --> [*]: Response

The session starts with a query to the chat_agent from the user, and the chat_agent runs text generation inference to respond back to the user.

Under the hood, the states of the application are determined by the subjects that are subscribed to and published on by the user and the chat_agent.

sequenceDiagram
    user->>messages: 1
    messages->>chat_agent: 2
    config->>chat_agent: 3
    chat_agent->>messages: 4
    messages->>user: 5

The sequence of actions are the following:

  1. The user publishes to messages subject
  2. The chat_agent subscribes to messages subject when there is a change to the messages subject table.
  3. The chat_agent subscribes to configs subject no matter if there is a change or not because the configs provide the parameters for running the chat_agent.
  4. The chat_agent performs text generation inference based on the messages subject content and publishes the results to the messages subject.
  5. The user subscribes to messages subject where there is a change to the messages subject table.

sign-in

The session ends because there are no further updates to the subjects. If the user publishes a follow-up message, the session will pick-up where it left off with the chat_agent responding to the updated message content.

Next steps

The Chat Agent Session Plan comes with a number of default configurations including the model, number of tokens to sample, temperature of sampling, etc. that can be modified by the user.

Session Plan: Tool Agent

Synopsis

This tutorial describes how the Tool Agent Session Plan uses the phymes-agent and phymes-core crates to build a tool calling agent.

Tutorial

The tool agent adds stochasticity to the agentic AI architecture of the chat agent by conditionally calling an external tool if needed, which can be modeled as a conditional directed cyclic graph.

stateDiagram-v2
    direction LR
    [*] --> chat_agent: Query
    chat_agent --> [*]: Response
    chat_agent --> tool_task: Invoke tool
    tool_task --> chat_agent: Tool call

The session starts with a query to the chat_agent from the user. Next, the chat_agent may call one or more tools to answer the user query. Next, the tool calls are executed in parallel by the tool_task. Finally, the results of the tool calls are provided to the chat_agent to ground the text generation inference to respond back to the user.

Under the hood, the states of the application are determined by the subjects that are subscribed to and published on by the user, tool_task and chat_agent.

sequenceDiagram
    user->>messages: 1
    messages-->>chat_agent: 2
    tools-->>chat_agent: 3a
    config->>chat_agent: 3b
    chat_agent->>tool_calls: 4
    tool_calls->>tool_task: 5
    data->>tool_task: 6
    tool_task->>messages: 7
    messages-->>chat_agent: 8
    tools-->>chat_agent: 9a
    config->>chat_agent: 9b
    chat_agent->>messages: 10
    messages->>user: 11

The sequence of actions are the following:

  1. The user publishes to messages subject
  2. The chat_agent subscribes to messages subject when there is a change to the messages subject table.
  3. The chat_agent subscribes to configs and tools subjects no matter if there is a change or not because the configs provide the parameters for running the chat_agent and the tools describes the schema for the tool calls.
  4. The chat_agent performs text generation inference based on the messages subject content and tool schemas, and publishes the results to either messages or tool_calls subject.
  5. The tool_task subscribes to the tool_calls subject when there is a change to the tool_calls subject table
  6. The tool_task subscribes to the data subject (one or more data tables needed to execute the tool call) no matter if there is a change or not because the data subjects provide the data tables needed for running the tool_task.
  7. The tool_task retrieves the needed data tables to execute the tool_calls, executes the tool_calls, and publishes the results to the messages subject.
  8. The chat_agent subscribes to messages subject when there is a change to the messages subject table, which has now been updated with the results of the tool_calls.
  9. The chat_agent subscribes to configs and tools subjects no matter if there is a change or not because the configs provide the parameters for running the chat_agent and the tools describes the schema for the tool calls.
  10. The chat_agent performs text generation inference based on the messages subject content, tool schemas, and results of the tool_calls, and publishes the results to either messages or tool_calls subject.
  11. The user subscribes to messages subject where there is a change to the messages subject table.

The session ends because there are no further updates to the subjects. If the user publishes a follow-up message the session will pick-up where it left off with the chat_agent responding to the updated message and tool_calls content.

Next steps

The Tool Agent Session Plan comes with a number of default configurations including the model, number of tokens to sample, temperature of sampling, etc. that can be modified by the user. A trivial use case is provided for sorting an array in a table that can be used as a starting point for creating more complex realistic use cases involving tools that manipulate (large) tabular data.

Seesion Plan: Retrieval Augmented Generation (RAG) Agent

Synopsis

This tutorial describes how the Document RAG Agent Session Plan uses the phymes-agent and phymes-core crates to build a tool calling agent.

Tutorial

The document RAG agent adds a complex document parsing, embedding, and retrieval ETL pipeline to the agentic AI architecture of the chat agent.

stateDiagram-v2
    direction LR
    [*] --> embed_task: Documents
    [*] --> embed_task: Query
    [*] --> chat_agent: Query
    embed_task --> vs_task: Search docs
    vs_task --> chat_agent: Top docs
    chat_agent --> [*]: Response

The session starts with an upload of documents to the embed_task to chunk and embed the documents, an upload of the query to the embed_task to embed the query, and a query to the chat_agent from the user. Next, a vector search is performed over the embedded documents to find the top K documents matching the query. Finally, the top K documents are provided to the chat_agent to ground the text generation inference to respond back to the user.

Under the hood, the states of the application are determined by the subjects that are subscribed to and published on by the user, embed_task, vs_task, and chat_agent.

sequenceDiagram
    user->>documents: 1
    user->>query: 2
    user->>messages: 3
    documents --> embed_doc_task: 4
    doc_embed_config --> embed_doc_task: 5
    embed_doc_task --> embedded_documents: 6
    query --> embed_query_task: 7
    query_embed_config --> embed_query_task: 8
    embed_query_task --> embedded_queries: 9
    embedded_documents --> vs_task: 10a
    embedded_queries --> vs_task: 10b
    vs_config --> vs_task: 11
    vs_task -> top_k_docs: 12
    messages-->>chat_agent: 13a
    top_k_docs-->>chat_agent: 13b
    config->>chat_agent: 14
    chat_agent->>messages: 15
    messages->>user: 16

The sequence of actions are the following:

  1. The user publishes to documents subject

sign-in

  1. The user publishes to query subject

sign-in

  1. The user publishes to messages subject
  2. The embed_doc_task subscribes to the documents subject when there is a change to the documents subject table
  3. The embed_doc_task subscribes to configs subject no matter if there is a change or not because the configs provide the parameters for running the chunk_processor.
  4. The embed_doc_task chunks the documents, embeds the chunks, and publishes the results to the embedded_documents subject.
  5. The embed_query_task subscribes to the documents subject when there is a change to the documents subject table
  6. The embed_query_task subscribes to configs subject no matter if there is a change or not because the configs provide the parameters for running the chunk_processor.
  7. The embed_query_task embeds the query and publishes the results to the embedded_queries subject.
  8. The vs_task subscribes to the embedded_documents and embedded_queries subjects when there is a change to the embedded_documents and embedded_queries subject tables
  9. The vs_task subscribes to configs subject no matter if there is a change or not because the configs provide the parameters for running the chunk_processor.
  10. The vs_task computes the relative similarity between the query and document embeddings, sorts the scores in descending order, retrieves the chunk text, formats the results for RAG, and publishes the results to the top_k_docs subject.
  11. The chat_agent subscribes to messages and top_k_docs subjects when there is a change to the messages and top_k_docs subject tables.
  12. The chat_agent subscribes to configs subject no matter if there is a change or not because the configs provide the parameters for running the chat_agent.
  13. The chat_agent performs text generation inference based on the messages subject content and retrieved Top K document chunks, and publishes the results to the messages subject.
  14. The user subscribes to messages subject where there is a change to the messages subject table.

sign-in

The session ends because there are no further updates to the subjects. If the user publishes a follow-up message or uploads new documents, the session will pick-up where it left off with the chat_agent responding to the updated message and top k document chunk content.

Next steps

The Document RAG Agent Session Plan comes with a number of default configurations including the model, number of tokens to sample, temperature of sampling, etc. that can be modified by the user. For production use cases, we recommend the NVIDIA RAG Blue Print.

PHYMES Server

Synopsis

The PHYMES server crate provides the handlers for querying the SessionContext and publishing on and subscribing to subjects for predefined SessionPlans and serves the web application via the tokio, axum, and tower-http stack.

Tutorial

Security recommendations

phymes-server should be built in --release mode for in production. When built in debug mode, the CORS security is relaxed to enable interactive debugging of the phymes-app.

Performance recommendations

It is strongly recommend to enable CUDA and CuDNN for NVIDIA GPU acceleration when running token and tensor services locally. If a GPU is not available for running token services, it is strongly recommended to use an API such as OpenAI, NVIDIA NIMs, etc with your API access key passed as an environmental variable instead. Native acceleration for Intel and Apple chipsets are enabled by default when detected. SIMD128 vector instructions for WASM runtimes are enabled by default.

WASM compatibility

Phymes-server can be built for the wasm32-wasip2 target (without support for serving HTML and without encryption but with support for APIs) by specifying the "tokio_unstable" feature flag. See discussion https://github.com/tokio-rs/tokio/discussions/6526.

RUSTFLAGS="--cfg tokio_unstable" cargo build --release -p phymes-server --features wasip2 --target wasm32-wasip2
wasmtime run -S tcp-listen=127.0.0.1:4000 ./wasi-server.wasm

Unfortunately, it appears that the latest version of Tokio does not build for target wasm32-wasip2.

WASI HTTP can be used using wasmtime serve to forward requests to Wasm components. See example https://github.com/sunfishcode/hello-wasi-http/. However, this has not been implemented yet.

Phymes-server can also be built for the wasm32-unknown-unknown target (ang without without support for serving HTML and without encryption but with support for APIs) for serverless applications. See example https://github.com/tokio-rs/axum/blob/main/examples/simple-router-wasm/src/main.rs.

PHYMES App

Synopsis

The PHYMES application crate implements the UI/UX for querying the SessionContext and publishing on and subscribing to subjects for predefined SessionPlans directly or through a chat interface using dioxus. Application builds for Web, Desktop (all OSs), and mobile (all OSs) are theoretically supported, but only Web and Desktop are optimized for currently. The application takes a full-stack approach whereby the frontend is handled by this crate using dioxus and the backend is handled by the phymes-server crate using tokio.

User Interface (UI)

Help

Description of menu items

Sign in

sign-in

User registration and sign in. Each account corresponds to a single email.

Session plans

session plans

A list of session plans available to the account. Each session is like a different app with different functionality and state. Only one session can be activated at a time.

Subjects

subjects

A list of subject associated with the active session plan. A table shows the schema of the subject tables along with the number if rows. The subject tables can be extended by uploading tables in comma deliminated CSV format with headers that match the subject. The subject tables can also be downloaded in comma deliminated CSV format. Note that all of the parameters for describing how processors process streaming messages are subject tables. Extending the subject tables for a processors parameters will update the processors parameters on the next run. Note that the message history is also a subject table. Extending the messages table is the equivalent of human in the loop.

Tasks

tasks

A list of tasks and subjects that the tasks subscribe to and publish on associated with the active session plan. The reaction between tasks and subjects are visualized as an incidence matrix where + indicates publish on and - indicates subscribe to. A toggle button is provided to expand the tasks to their individual processors and collapse the individual processors to their tasks.

Messaging

messaging

The message history for the active session plan. A chat interface is provided for users to publish messages to the messages subject and to receive subscriptions from the messages subject when the messages subject is updated.

Metrics

metrics

A list of metrics associated with the active session plan. Metrics are tracked per processor. A table in long form displays the values for the tracked metrics. Baseline metrics for row count, and processor start, stop, and total time in nanoseconds are provided. Note each row is approximately one token for text generation inference processors. Please submit a feature request issue if additional metrics are of interest.