Mountain View, CA — April 5, 2025 — Groq, the pioneer in AI inference, has launched Meta’s Llama 4 Scout and Maverick models, now live on GroqCloud™. Developers and enterprises get day-zero access to the most advanced openly-available AI models.
That day-zero speed is possible because Groq controls the full stack, from its custom-built LPU to its vertically integrated cloud. The result: models go live with no delay, no tuning, and no bottlenecks, and run at the lowest cost per token in the industry with full performance.
“We built Groq to drive the cost of compute to zero,” said Jonathan Ross, CEO and Founder of Groq. “Our chips are designed for inference, which means developers can run models like Llama 4 faster, cheaper, and without compromise.”
With Llama 4 models live, developers can run cutting-edge multimodal workloads while keeping costs low and latency predictable.
See Groq pricing here.
Llama 4 is Meta’s latest openly-available model family, featuring a Mixture of Experts (MoE) architecture and native multimodality.
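To make the MoE idea concrete, here is a toy sketch of expert routing: a gating layer scores every expert, only the top-k experts actually run for a given token, and their outputs are mixed by softmax weights. This is an illustration of the general technique, not Llama 4's actual implementation; all shapes and names here are invented for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token vector x to its top_k experts and mix their outputs.

    Toy Mixture of Experts routing: a linear gate scores each expert,
    the top_k scores are softmaxed, and only those experts are evaluated.
    """
    scores = x @ gate_w                        # (num_experts,) gating logits
    top = np.argsort(scores)[-top_k:]          # indices of the top_k experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                   # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
gate_w = rng.normal(size=(d, num_experts))
# Each "expert" is just a linear map in this sketch.
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d)))
           for _ in range(num_experts)]
y = moe_forward(rng.normal(size=d), gate_w, experts)
print(y.shape)  # (8,)
```

Because only top_k of the experts run per token, an MoE model activates a fraction of its total parameters on each forward pass, which is what keeps inference cost low relative to model capacity.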
Llama 4 Scout and Maverick are accessible through:
Start building today at console.groq.com.
Free access is available, or upgrade for worry-free rate limits and higher throughput.
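As a starting point, here is a minimal sketch of calling a Llama 4 model on GroqCloud over its OpenAI-compatible chat completions endpoint, using only the Python standard library. The endpoint URL and model ID below are assumptions for illustration; confirm the exact Llama 4 model IDs in the GroqCloud console before use.

```python
import json
import os
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"
# Illustrative model ID -- check the GroqCloud console for the exact Llama 4 IDs.
MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"

def build_request(prompt, model=MODEL):
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt):
    """Send the request; requires GROQ_API_KEY in the environment."""
    req = urllib.request.Request(
        GROQ_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_request("Summarize Mixture of Experts in one sentence.")
print(payload["model"])
```

The same payload shape works with any OpenAI-compatible client library by pointing its base URL at GroqCloud.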