Sovereign AI Inference for New Zealand.
OpenAI-compatible inference API, served from NZ-owned hardware. Flat-fee pricing, no token counting, and your data never leaves the country.
Sovereign by default
NZ-owned hardware in Auckland. Prompts and responses never cross the border.
Local latency
Auckland-hosted. Faster than US or EU round-trips for every NZ user.
OpenAI-compatible
Drop-in for any OpenAI SDK. Swap the base URL, keep your code.
Flat-fee pricing
One predictable line item per month. No per-token billing. No surprises.
OpenAI-compatible. Two lines to switch.
Override the base URL, drop in your AI Foundry key, keep the rest of your OpenAI SDK code. Streaming, multimodal, tools — all behave the same.
from openai import OpenAI

client = OpenAI(
    api_key="<your-aifoundry-key>",
    base_url="https://api.aifoundry.co.nz/v1",
)

resp = client.chat.completions.create(
    model="nemotron-omni-30b-a3b",  # or "qwen3-coder-next"
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

Flat-fee pricing. No token counting.
Two plans. One API key per subscription. 8 concurrent requests. Cancel anytime.
OpenClaw
Multimodal chat, vision, OCR, and custom agents.
Nemotron Omni 30B
128k tokens
nemotron-omni-30b-a3b
Everyday assistants
Hermes agents
Document scanning (OCR)
Custom agents
Slack and chat bots
Multimodal apps
Coder Pro
Long-context coding for agents, refactors, and code review.
Qwen3 Coder Next
256k tokens
qwen3-coder-next
Coding agents
Multi-file refactors
Codebase analysis
Code review and PRs
Test generation
Long-running tooling
Built for buyers who care where data lives.
Four reasons regulated NZ businesses and indie developers pick a sovereign inference provider.
01
Your data stays in NZ
Every other major inference provider routes through US, EU, or Australian regions. We don't. AI Foundry Inference runs on hardware we own, in a single Auckland facility, with NZ-controlled networking. Nothing crosses the border — not for failover, not for monitoring, not ever.
02
Local latency, no trans-Pacific tax
Auckland-hosted means your application talks to a model on the same island. No 200ms round-trip to Virginia. No regional routing surprises. The further your customers are from the model, the slower their experience — keep it here.
03
One flat fee, no token math
Subscription pricing in USD with NZ GST added at checkout — billed monthly through Stripe. No per-token charges, no surprise invoices, no spreadsheets to forecast usage. Token counts are reported for visibility, not billing.
04
NZ-owned, in for the long haul
We're a New Zealand company building local AI infrastructure. Subscriptions keep revenue in-country and fund more capacity here. If sovereignty matters to your buyers, it matters to ours too.
Two models. Pick what you build with.
Each subscription provisions one API key entitled to one model. Want both? Subscribe to both plans.
Nemotron Omni 30B
nemotron-omni-30b-a3b
128k
context
Strengths
Multimodal reasoning across text, images, audio, and video
Strong document understanding and OCR
Powers Hermes-style and custom agents
General-purpose chat and assistants
Qwen3 Coder Next
qwen3-coder-next
256k
context
Strengths
Specialised for code generation and refactoring
256k context for multi-file reasoning
Built for agent harnesses and long-running tasks
What’s in every subscription
One key, two halves of the offer.
/ Inference
OpenAI-compatible /v1/chat/completions endpoint
Streaming responses and tool use
GET /v1/models entitlement check for your key
8 concurrent requests per API key
Multi-turn chat, system prompts, temperature, max_tokens
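As a rough illustration of the entitlement check listed above, the sketch below builds a GET /v1/models request and pulls the model ids out of the reply, using only the Python standard library. Bearer-token authentication and the OpenAI-style list response shape (`{"data": [{"id": ...}]}`) are assumptions based on OpenAI compatibility, not something this page confirms. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.aifoundry.co.nz/v1"  # base URL given on this page


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models request."""
    # Assumption: Bearer auth, following the OpenAI convention.
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def entitled_models(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style list response (assumed shape)."""
    payload = json.loads(body)
    return [item["id"] for item in payload.get("data", [])]


def fetch_entitled_models(api_key: str) -> list[str]:
    """Send the request and return the model ids this key may call."""
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return entitled_models(resp.read().decode("utf-8"))
```

Calling `fetch_entitled_models("<your-aifoundry-key>")` at startup is one way to confirm which model a key covers before sending chat requests.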
/ Billing & support
Self-serve plan management via Stripe Customer Portal
Token counts reported in API responses (visibility, not billing)
Email support and status page
Your model. Our hardware. Private API.
Rent a dedicated GPU in our Auckland facility. Bring any model weights you have rights to — we load them and give you a private, OpenAI-compatible endpoint. Nothing shared, nothing logged, nothing offshore.
Dedicated 24GB
Entry tier for small-to-mid models and prototyping.
Contact for pricing
NVIDIA RTX PRO 4000 Blackwell
24 GB GDDR7
Dedicated · single tenant
Up to 14B dense models
Quantised 30B-class (Q4/Q5)
Fine-tune serving & LoRA adapters
Embedding and reranker workloads
Internal prototypes and pilots
Dedicated 96GB
Flagship for 70B-class models and long-context workloads.
Contact for pricing
NVIDIA RTX PRO 6000 Blackwell
96 GB GDDR7
Dedicated · single tenant
70B-class dense models
100B+ MoE (active params permitting)
Long-context coding and agent stacks
Vision-language and multimodal models
Production workloads at single-tenant latency
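To get a feel for which models suit each tier, a back-of-envelope estimate of weight memory can help. The sketch below counts weights only (parameters times bits per weight) and ignores KV cache, activations, and runtime overhead, so treat it as a lower bound. The 80% headroom factor and the function names are illustrative assumptions, not figures from AI Foundry.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights fit within a fraction of VRAM.

    headroom is an arbitrary illustrative allowance for KV cache and
    runtime overhead; real requirements depend on context length and
    the serving stack.
    """
    return weight_memory_gb(params_billions, bits_per_weight) <= vram_gb * headroom


if __name__ == "__main__":
    for label, params, bits, vram in [
        ("14B dense, 8-bit, 24 GB", 14, 8, 24),
        ("30B-class, ~4.5-bit, 24 GB", 30, 4.5, 24),
        ("70B dense, 8-bit, 96 GB", 70, 8, 96),
    ]:
        print(label, weight_memory_gb(params, bits), "GB weights, fits:",
              fits(params, bits, vram))
```

The result depends heavily on precision, so it is worth rerunning the estimate for the exact quantisation you plan to serve.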
Questions, answered.
The short version, with the receipts.
All inference runs on AI Foundry-owned hardware in Auckland, New Zealand, hosted in Datacom Datacentres. Prompts and responses never leave the country — no cloud-provider passthrough, no offshore failover.
Drop-in compatible. Point your existing OpenAI SDK (Python, JS, LangChain, LiteLLM, etc.) at https://api.aifoundry.co.nz/v1, pass your AI Foundry key, and your code works. Chat completions, streaming, multimodal messages, and the usage field all match OpenAI shapes.
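If you are not using an SDK, streamed responses can be read by hand. The sketch below assumes the stream follows the OpenAI server-sent-events convention (`data: {json}` lines ending with `data: [DONE]`), which is what the compatibility claim above implies. The function name is hypothetical, and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI convention
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    # Made-up sample lines standing in for a real HTTP response body.
    sample = [
        'data: {"choices": [{"delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "Kia "}}]}',
        'data: {"choices": [{"delta": {"content": "ora"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_content_deltas(sample)))
```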
Yes — subscribe to both plans. Each subscription provisions its own API key, each entitled to its own model. An OpenClaw key calling the Coder Pro model returns 403, so keys stay scoped to what they paid for.
Each API key allows up to 8 concurrent in-flight requests. Exceeding that returns HTTP 429 with a Retry-After header, the standard OpenAI convention that the official SDKs already handle. There are no customer-visible RPM or TPM caps. Need more concurrency? Subscribe to the same plan again for an additional key.
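For clients that do not rely on an SDK's built-in retries, the limit can be respected on the client side. This is a minimal sketch, assuming Retry-After carries a number of seconds. The `send` callable, its `(status, headers, body)` return shape, and the function name are hypothetical stand-ins for whatever HTTP client you use.

```python
import threading
import time
from typing import Callable, Mapping, Tuple

MAX_IN_FLIGHT = 8  # per-key concurrency limit stated above
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

# Hypothetical response shape for this sketch: (status, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Run send() under the concurrency cap, retrying while it returns 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        with _slots:  # blocks while 8 requests are already in flight
            response = send()
        status, headers, _body = response
        if status != 429 or attempt == max_attempts:
            return response
        # Assumes Retry-After holds seconds and the mapping uses this exact
        # key casing; anything unparsable falls back to default_wait.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(wait)
```

Sharing one semaphore across worker threads keeps a process at or below 8 in-flight calls, so 429 responses should be rare rather than routine.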
Plans are billed in USD via Stripe. For New Zealand customers, 15% GST is added automatically at checkout as a separate line item — not bundled into the headline price.
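As a worked example of the GST line item, the sketch below adds 15% to a plan price. The 100.00 USD price is a placeholder, since no plan price is stated here, and half-up rounding to the cent is an assumption; the amount Stripe actually charges may round differently.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_RATE = Decimal("0.15")  # NZ GST rate stated above


def invoice_lines(plan_price_usd: str) -> dict[str, Decimal]:
    """Return the plan price, the GST line item, and the total, in USD."""
    price = Decimal(plan_price_usd)
    # Assumption: GST is rounded half-up to the nearest cent.
    gst = (price * GST_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"plan": price, "gst": gst, "total": price + gst}


if __name__ == "__main__":
    # Placeholder price for illustration only.
    for name, amount in invoice_lines("100.00").items():
        print(f"{name:>5}: {amount} USD")
```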
You keep access until the end of the paid period. No pro-rated refunds for partial months. Cancellation revokes the API key, and you can resubscribe at any time. Manage everything self-serve through the Stripe Customer Portal.
No. Prompts and completions are not logged for training. Token counts are reported in API responses for your visibility, but the content of your requests stays yours.
Yes — that is our Models as a Service offer. Rent a dedicated NVIDIA Blackwell GPU in our Auckland facility (24GB or 96GB VRAM), bring any model weights you have rights to, and we hand back a private OpenAI-compatible endpoint. Single tenant, NZ sovereign, no shared infrastructure. See /services/maas.
Our subscription inference runs on AI Foundry-owned GPU hardware in Auckland. Models as a Service deployments use NVIDIA RTX PRO Blackwell GPUs (24GB or 96GB) per dedicated tenant. Everything sits in Datacom Datacentres on NZ-controlled networking.
Active subscriptions include email support and access to the status page. For enterprise volume, custom SLAs, or hardware partnerships, get in touch via the contact page.
The end of the GPU era. Titan is here.
Purpose-built inference silicon. Not graphics. Not training. Inference.
- 8 TB+ memory per system
- 16T parameters, single 4U box
- 10M+ token context
- 4,096 Titans per cluster
AI Foundry is Positron's exclusive APAC reseller. GPUs were a detour; this is the road forward.
Start building today.
Sign up, generate an API key, and send your first request in minutes. Cancel from the Stripe portal any time.