// inference.aifoundry.co.nz
online · auckland · 1 Gbps

Sovereign AI Inference for New Zealand.

OpenAI-compatible inference API, served from NZ-owned hardware. Flat-fee pricing, no token counting, and your data never leaves the country.
Instant API key · Cancel anytime · +15% NZ GST
// what_you_get
Sovereign by default

NZ-owned hardware in Auckland. Prompts and responses never cross the border.

Local latency

Auckland-hosted. Faster than US or EU round-trips for every NZ user.

OpenAI-compatible

Drop-in for any OpenAI SDK. Swap the base URL, keep your code.

Flat-fee pricing

One predictable line item per month. No per-token billing. No surprises.

// integrate

OpenAI-compatible. Two lines to switch.

Override the base URL, drop in your AI Foundry key, keep the rest of your OpenAI SDK code. Streaming, multimodal, tools — all behave the same.

app.py
from openai import OpenAI

client = OpenAI(
    api_key="<your-aifoundry-key>",
    base_url="https://api.aifoundry.co.nz/v1",
)

resp = client.chat.completions.create(
    model="nemotron-omni-30b-a3b",  # or "qwen3-coder-next"
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
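Streaming follows the same pattern. The sketch below assumes the endpoint emits OpenAI-style chunks (text in `choices[0].delta.content`); `collect_stream` and `main` are illustrative names, not part of any SDK.

```python
from typing import Any, Iterable

BASE_URL = "https://api.aifoundry.co.nz/v1"


def collect_stream(chunks: Iterable[Any]) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage chunk) may carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)


def main() -> None:
    # Imported here so collect_stream can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key="<your-aifoundry-key>", base_url=BASE_URL)
    stream = client.chat.completions.create(
        model="nemotron-omni-30b-a3b",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    collect_stream(stream)
```

Call `main()` with a real key to run it against the live API.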
// pricing

Flat-fee pricing. No token counting.

Two plans. One API key per subscription. 8 concurrent requests. Cancel anytime.

OpenClaw

Multimodal chat, vision, OCR, and custom agents.

$0 USD / month
USD · +15% NZ GST for NZ customers
Model

Nemotron Omni 30B

Context

128k tokens

API ID

nemotron-omni-30b-a3b

Text
Vision
Audio
OCR

Everyday assistants

Hermes agents

Document scanning (OCR)

Custom agents

Slack and chat bots

Multimodal apps

Subscribe to OpenClaw

Coder Pro

Developer Favorite

Long-context coding for agents, refactors, and code review.

$0 USD / month
USD · +15% NZ GST for NZ customers
Model

Qwen3 Coder Next

Context

256k tokens

API ID

qwen3-coder-next

Code reasoning
Multi-file refactor
Agent harness
Code review

Coding agents

Multi-file refactors

Codebase analysis

Code review and PRs

Test generation

Long-running tooling

Subscribe to Coder Pro
+15% NZ GST added at checkout for New Zealand customers. Want both models? Subscribe to both plans — one API key per subscription.
// why_ai_foundry

Built for buyers who care where data lives.

Four reasons regulated NZ businesses and indie developers pick a sovereign inference provider.

01

Your data stays in NZ

Every other major inference provider routes through US, EU, or Australian regions. We don't. AI Foundry Inference runs on hardware we own, in a single Auckland facility, with NZ-controlled networking. Nothing crosses the border — not for failover, not for monitoring, not ever.

02

Local latency, no trans-Pacific tax

Auckland-hosted means your application talks to a model on the same island. No 200ms round-trip to Virginia. No regional routing surprises. The further your customers are from the model, the slower their experience — keep it here.

03

One flat fee, no token math

Subscription pricing in USD with NZ GST added at checkout — billed monthly through Stripe. No per-token charges, no surprise invoices, no spreadsheets to forecast usage. Token counts are reported for visibility, not billing.

04

NZ-owned, in for the long haul

We're a New Zealand company building local AI infrastructure. Subscriptions keep revenue in-country and fund more capacity here. If sovereignty matters to your buyers, it matters to ours too.

// models

Two models. Pick what you build with.

Each subscription provisions one API key entitled to one model. Want both? Subscribe to both plans.

Nemotron Omni 30B

nemotron-omni-30b-a3b

128k

context

OpenClaw
Parameters
30B
Modalities
Text
Vision
Audio
OCR

Strengths

Multimodal reasoning across text, images, audio, and video

Strong document understanding and OCR

Powers Hermes-style and custom agents

General-purpose chat and assistants

Ideal for
Customer-facing chat
Hermes Agent
Document scanning (OCR)
Custom agents
Slack and Teams bots
Vision-enabled workflows

Qwen3 Coder Next

qwen3-coder-next

256k

context

Coder Pro
Parameters
0B
Modalities
Text
Code

Strengths

Specialised for code generation and refactoring

256k context for multi-file reasoning

Built for agent harnesses and long-running tasks

Ideal for
Coding agents
Multi-file refactors
Repository analysis
Developer tooling
// included

What’s in every subscription

One key, two halves of the offer.

/ Inference

OpenAI-compatible /v1/chat/completions endpoint

Streaming responses and tool use

GET /v1/models entitlement check for your key

8 concurrent requests per API key

Multi-turn chat, system prompts, temperature, max_tokens

/ Billing & support

Self-serve plan management via Stripe Customer Portal

Token counts reported in API responses (visibility, not billing)

Email support and status page
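A minimal sketch of the GET /v1/models entitlement check, using only the standard library. It assumes Bearer-token auth and an OpenAI-style list response (`{"data": [{"id": ...}]}`); the function names are illustrative.

```python
import json
import urllib.request
from typing import List

BASE_URL = "https://api.aifoundry.co.nz/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET /v1/models request (Bearer auth is an assumption)."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def entitled_models(payload: dict) -> List[str]:
    """Extract model IDs from an OpenAI-style model list payload."""
    return [model["id"] for model in payload.get("data", [])]


def fetch_entitled_models(api_key: str) -> List[str]:
    """Ask the API which models this key may call."""
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return entitled_models(json.load(resp))
```

Running `fetch_entitled_models("<your-aifoundry-key>")` at startup lets an app fail fast if the key does not cover the model it plans to call.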

// dedicated-gpu.aifoundry.co.nz

Your model. Our hardware. Private API.

Rent a dedicated GPU in our Auckland facility. Bring any model weights you have rights to — we load them and give you a private, OpenAI-compatible endpoint. Nothing shared, nothing logged, nothing offshore.

Dedicated 24GB

Entry tier for small-to-mid models and prototyping.

Pricing

Contact for pricing

Bespoke per workload · monthly or term commitments
GPU

NVIDIA RTX PRO 4000 Blackwell

VRAM

24 GB GDDR7

Tenancy

Dedicated · single tenant

Dedicated GPU
Private endpoint
NZ sovereign
Bring your own weights

Up to 14B dense models

Quantised 30B-class (Q4/Q5)

Fine-tune serving & LoRA adapters

Embedding and reranker workloads

Internal prototypes and pilots

See the spec

Dedicated 96GB

Flagship

Flagship for 70B-class models and long-context workloads.

Pricing

Contact for pricing

Bespoke per workload · monthly or term commitments
GPU

NVIDIA RTX PRO 6000 Blackwell

VRAM

96 GB GDDR7

Tenancy

Dedicated · single tenant

Dedicated GPU
Private endpoint
NZ sovereign
Bring your own weights

70B-class dense models

100B+ MoE (active params permitting)

Long-context coding and agent stacks

Vision-language and multimodal models

Production workloads at single-tenant latency

See the spec
Pricing is bespoke per workload — model size, expected concurrency, and retention all factor in.
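For a rough first pass at tier selection, weight memory can be estimated from parameter count and precision. This is a rule-of-thumb sketch, not AI Foundry's sizing method: the 0.8 headroom factor (leaving room for KV cache and activations) is an assumption, and real requirements depend on context length and concurrency.

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: float,
         vram_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` of the card's VRAM.

    `headroom` is a hypothetical safety margin, not a published figure.
    """
    return weight_gb(params_billion, bits_per_param) <= vram_gb * headroom


if __name__ == "__main__":
    for name, params, bits, vram in [
        ("14B @ 8-bit on 24 GB", 14, 8, 24),
        ("30B @ ~4.5-bit on 24 GB", 30, 4.5, 24),
        ("70B @ 8-bit on 96 GB", 70, 8, 96),
        ("70B @ 16-bit on 96 GB", 70, 16, 96),
    ]:
        print(f"{name}: {weight_gb(params, bits):.1f} GB, fits={fits(params, bits, vram)}")
```

Under these assumptions, the larger models listed for each tier fit at 8-bit or lower precision rather than at 16-bit.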
// faq

Questions, answered.

The short version, with the receipts.

All inference runs on AI Foundry-owned hardware in Auckland, New Zealand, hosted in Datacom Datacentres. Prompts and responses never leave the country — no cloud-provider passthrough, no offshore failover.

Drop-in compatible. Point your existing OpenAI SDK (Python, JS, LangChain, LiteLLM, etc.) at https://api.aifoundry.co.nz/v1, pass your AI Foundry key, and your code works. Chat completions, streaming, multimodal messages, and the usage field all match OpenAI shapes.

To use both models, subscribe to both plans. Each subscription provisions its own API key, each entitled to its own model. An OpenClaw key calling the Coder Pro model returns 403, so each key stays scoped to the plan it was issued for.
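With two subscriptions, an app needs to send each model's requests with the matching key to avoid the 403. One possible way to route them is sketched below; the environment variable names are hypothetical.

```python
import os
from typing import Mapping

# Hypothetical environment variable names, one per subscription.
KEY_ENV_BY_MODEL = {
    "nemotron-omni-30b-a3b": "AIFOUNDRY_OPENCLAW_KEY",
    "qwen3-coder-next": "AIFOUNDRY_CODER_PRO_KEY",
}


def key_for_model(model: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the API key entitled to `model`, or raise ValueError."""
    var = KEY_ENV_BY_MODEL.get(model)
    if var is None:
        raise ValueError(f"no subscription configured for model {model!r}")
    key = env.get(var)
    if not key:
        raise ValueError(f"set {var} to the key for {model!r}")
    return key
```

Pass the result as `api_key` when constructing the client for that model.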

Each API key allows 8 concurrent in-flight requests. Exceeding that returns HTTP 429 with a Retry-After header, the standard OpenAI convention that OpenAI SDKs already handle. No customer-visible RPM or TPM caps. Need more concurrency? Subscribe to the same plan again for an additional key.
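Batch jobs can stay under the limit by capping in-flight requests on the client side. The sketch below is transport-agnostic: `send` is any coroutine you supply that returns `(status, headers, body)`, and Retry-After is assumed to be a number of seconds.

```python
import asyncio
from typing import Awaitable, Callable, Mapping, Tuple

MAX_IN_FLIGHT = 8  # per-key concurrency limit stated above

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def retry_delay(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to wait before retrying; falls back if the header is absent or not numeric."""
    try:
        return max(0.0, float(headers.get("Retry-After", default)))
    except (TypeError, ValueError):
        return default


async def send_with_limit(
    send: Callable[[], Awaitable[Response]],
    gate: asyncio.Semaphore,
    max_attempts: int = 5,
) -> Response:
    """Run `send` under the shared gate, retrying on HTTP 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        async with gate:  # at most MAX_IN_FLIGHT coroutines pass at once
            response = await send()
        if response[0] != 429 or attempt == max_attempts:
            return response
        # Wait outside the gate so other requests can use the slot.
        await asyncio.sleep(retry_delay(response[1]))
    raise AssertionError("unreachable")
```

Create one `asyncio.Semaphore(MAX_IN_FLIGHT)` per API key inside your running event loop and share it across all tasks that use that key.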

Plans are billed in USD via Stripe. For New Zealand customers, 15% GST is added automatically at checkout as a separate line item — not bundled into the headline price.

If you cancel, you keep access until the end of the paid period. No pro-rated refunds for partial months. Cancellation revokes the API key, and you can resubscribe at any time. Manage everything self-serve through the Stripe Customer Portal.

No. Prompts and completions are not logged for training. Token counts are reported in API responses for your visibility, but the content of your requests stays yours.
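Since token counts come back in each response, usage can be tracked client-side. A minimal sketch, assuming the `usage` field carries OpenAI-style `prompt_tokens` and `completion_tokens`; `UsageTally` is an illustrative helper, not part of the API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class UsageTally:
    """Running totals of token counts reported by the API."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def add(self, resp: Any) -> None:
        """Record one chat completion response."""
        self.requests += 1
        usage = getattr(resp, "usage", None)
        if usage is None:  # tolerate responses without a usage field
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens
```

Call `tally.add(resp)` after each `client.chat.completions.create(...)` call to keep totals for your own dashboards.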

Bringing your own model is what our Models as a Service offer covers. Rent a dedicated NVIDIA Blackwell GPU in our Auckland facility (24GB or 96GB VRAM), bring any model weights you have rights to, and we hand back a private OpenAI-compatible endpoint. Single tenant, NZ sovereign, no shared infrastructure. See /services/maas.

Our subscription inference runs on AI Foundry-owned GPU hardware in Auckland. Models as a Service deployments use NVIDIA RTX PRO Blackwell GPUs (24GB or 96GB) per dedicated tenant. Everything sits in Datacom Datacentres on NZ-controlled networking.

Active subscriptions include email support and access to the status page. For enterprise volume, custom SLAs, or hardware partnerships, get in touch via the contact page.

the gpu killer · positron

The end of the GPU era. Titan is here.

Purpose-built inference silicon. Not graphics. Not training. Inference.

  • 8 TB+ memory per system
  • 16 T parameters, single 4U box
  • 10 M+ token context
  • 4,096 Titans per cluster

Positron's exclusive APAC reseller. GPUs were a detour — this is the road forward.

See Positron hardware
// ready_when_you_are
live now · plans from $7 USD/mo

Start building today.

Sign up, generate an API key, and send your first request in minutes. Cancel from the Stripe portal any time.