
An assistant that improves
with every interaction

Your current assistant “learns” through system prompts and context tricks. CLaaS is an open-source prototype that turns interactions into weight updates, personalizing and improving your model over time.

Don't have a GPU? Sign up for updates on our hosted solution.

The learning gap

In-context learning doesn't scale. Every preference, instruction, and memory stuffed into your prompt is a token your assistant can't spend on the task that actually matters.

Base model

Every conversation starts from scratch.

[Chart: performance vs. available context]

Prompting and memory

The context window fills up with system prompts and memory files. But that's reminding, not learning.

[Chart: performance vs. available context]

Continual Learning

Your feedback changes the model weights themselves, rewiring how the model thinks. No context overhead. No forgetting.

[Chart: performance vs. available context]

How it works

Every conversation makes it smarter

Base Model

[Assistant response before CLaaS — generic, verbose output]

After CLaaS

[Assistant response after CLaaS — concise, personalized output]

01. Chat naturally

Use your personalized assistant on Telegram with an open-source model you control.

02. Give feedback

Give text feedback on any response to teach and improve the model.

03. Real-time weight update

Your feedback is distilled into weight updates. Your assistant improves without forgetting.

04. Better every message

Your next response comes from an updated assistant that’s slightly more yours.
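The four steps above can be sketched as a toy feedback loop. Everything here is illustrative: the class and method names are hypothetical stand-ins, and a single scalar replaces the LoRA adapter weights that CLaaS actually updates through self-distillation.

```python
# Toy sketch of the chat -> feedback -> weight-update loop.
# None of these names come from the real CLaaS API.

class ToyAssistant:
    def __init__(self):
        # Stand-in for model weights: one style parameter
        # (0.0 = verbose/generic, 1.0 = concise/personalized).
        self.conciseness = 0.0

    def respond(self, prompt: str) -> str:
        # Step 01: chat naturally, generating from current "weights".
        style = "concise" if self.conciseness > 0.5 else "verbose"
        return f"[{style}] reply to: {prompt}"

    def learn(self, feedback: str) -> None:
        # Steps 02-03: feedback is distilled into a weight update.
        # Here it is a tiny nudge instead of a real distillation step.
        if "shorter" in feedback:
            self.conciseness = min(1.0, self.conciseness + 0.3)

bot = ToyAssistant()
before = bot.respond("summarize this doc")
for _ in range(2):                  # two rounds of feedback
    bot.learn("please be shorter")
after = bot.respond("summarize this doc")   # step 04: a slightly updated assistant
print(before)  # [verbose] reply to: summarize this doc
print(after)   # [concise] reply to: summarize this doc
```

The point of the sketch is only the shape of the loop: generation reads the current weights, feedback writes them, and the very next response already reflects the change.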

Features

Your model, your infrastructure

Fully open source

Run entirely on open-source models you control. Your data stays on your infrastructure with no external API dependency.

Learns per-request

Every API call updates the model without redeployment. Your model evolves with each interaction in real time.

Telegram integration

Give feedback directly in your Telegram chat. Each piece of feedback triggers a model update in real time.

Flexible deployment

Single consumer GPU, Tinker SDK (no GPU), or Modal cloud. Docker Compose gets you running in minutes.

Coming soon

Hosted solution

No GPU needed. We run CLaaS on our secure infrastructure so you can personalize your assistant with zero overhead.

Coming soon

Implicit feedback

Every chat message and tool use is a training signal. Your assistant learns from how you interact, no explicit feedback needed.

Training Dashboard

CLaaS training dashboard showing request logs, samples, timing, and training metrics

Eval Harness Dashboard

CLaaS eval dashboard showing learning curves, rollouts, and preference compliance

Architecture

Hybrid engine for local deployment

Local deployment alternates between a serving mode and an update mode.

Serving mode

vLLM inference

  • Route traffic through vLLM
  • Low-latency generation with LoRA
  • Hot-reload updated adapters

Update mode

Self-distillation

  • Pause serving to free GPU memory
  • Single distillation step on adapter
  • Learn from feedback without forgetting

Deployment

Flexible backends

  • Local GPU (single consumer GPU)
  • Tinker SDK (no GPU required)
  • Modal (coming soon)
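The serve/update alternation above can be sketched as a small state machine. This is a minimal illustration, not the real implementation: the `HybridEngine` class and its methods are hypothetical, and vLLM serving plus the distillation step are reduced to bookkeeping.

```python
from enum import Enum

class Mode(Enum):
    SERVING = "serving"
    UPDATING = "updating"

class HybridEngine:
    """Toy model of the hybrid engine: one GPU, two modes."""

    def __init__(self):
        self.mode = Mode.SERVING
        self.adapter_version = 0

    def generate(self, prompt: str) -> str:
        # Serving mode: low-latency generation with the current adapter.
        assert self.mode is Mode.SERVING, "cannot serve during an update"
        return f"v{self.adapter_version}: {prompt}"

    def apply_feedback(self, feedback: str) -> None:
        # Pause serving to free GPU memory for training.
        self.mode = Mode.UPDATING
        # Stand-in for a single self-distillation step on the adapter.
        self.adapter_version += 1
        # Hot-reload the updated adapter and resume serving.
        self.mode = Mode.SERVING

engine = HybridEngine()
print(engine.generate("hello"))        # v0: hello
engine.apply_feedback("be more concise")
print(engine.generate("hello"))        # v1: hello
```

The design point the sketch captures: on a single consumer GPU, serving and training cannot run concurrently, so each feedback event briefly takes the engine through an update state before hot-reloading the new adapter.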

Don't want to self-host?

We're building a hosted version of CLaaS: a personalized assistant that learns from you, running on our infrastructure. No GPU required.

Join the waitlist to get early access.