LLM Reference

LLM Reference helps tech leaders quickly find and compare the best AI models and providers for their specific project needs.

Published on: May 29, 2026

About LLM Reference

LLM Reference is a decision-support directory for engineers and technology leaders who need to navigate the rapidly expanding landscape of large language models. The platform tracks over 1,800 language models from more than 140 providers and 247 research labs. Model data is refreshed weekly with new releases, verified price changes, and benchmark updates.

The goal is to replace hunting through scattered sources with a single place to compare models side by side, see which provider offers the cheapest pricing for frontier output, and browse curated editors' picks for tasks including coding, agents, writing, research, image generation, and video creation. Whether you are building a coding assistant, an agentic workflow, a writing tool, or a research pipeline, the site is designed for fast triage: identify the right model for the job, determine the most cost-effective provider, and get back to building.

A Pulse feed highlights what changed each week, including new models, price cuts, and benchmark refreshes. The site is built by the Data Advantage project and is described as updated daily, while the underlying model data follows the weekly refresh cycle.

Features of LLM Reference

Comprehensive Model Directory

LLM Reference maintains an extensive database of 1,843 language models from 140 providers and 247 research labs. The directory is searchable by task type such as coding, RAG, agents, long context, vision, classification, and JSON or tool use. Each model entry includes detailed information on capabilities, benchmarks, pricing, and provider details, allowing you to filter and sort based on your specific requirements. The data is refreshed weekly to ensure you always have access to the most current information available in the market.
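As a rough illustration of the filter-and-sort workflow described above, the following Python sketch models a directory entry and a task filter. Everything in it is hypothetical: the `ModelEntry` fields, the `find_models` helper, and the placeholder records are assumptions made for illustration, not LLM Reference's actual schema or API, which this listing does not document.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical shape of one directory record (not the site's real schema)."""
    name: str
    provider: str
    tasks: FrozenSet[str]
    output_price_per_1m: float  # USD per 1M output tokens


def find_models(
    entries: List[ModelEntry],
    task: str,
    max_output_price: Optional[float] = None,
) -> List[ModelEntry]:
    """Return entries that support `task`, cheapest output price first."""
    matches = [
        e for e in entries
        if task in e.tasks
        and (max_output_price is None or e.output_price_per_1m <= max_output_price)
    ]
    return sorted(matches, key=lambda e: e.output_price_per_1m)


# Placeholder records with made-up names and prices, for illustration only.
CATALOG = [
    ModelEntry("model-a", "provider-x", frozenset({"coding", "agents"}), 12.0),
    ModelEntry("model-b", "provider-y", frozenset({"coding", "rag"}), 3.0),
    ModelEntry("model-c", "provider-z", frozenset({"vision"}), 1.5),
]

coding_models = find_models(CATALOG, "coding")
print([e.name for e in coding_models])  # prints ['model-b', 'model-a']
```

Sorting by output price after filtering mirrors the triage order the directory suggests: first narrow by task, then look for the most cost-effective option.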

Editors' Picks by Audience

The platform features expert-curated selections for specific use cases, organized by audience type. Developers can find top recommendations for coding, agents, tool use, open weights models, long context handling, and cheap options. Knowledge workers get picks for writing, research, summarization, docs Q&A, translation, and data or SQL tasks. Creatives get curated choices for image generation, video creation, voice TTS, transcription, music, and image editing. Each pick includes an excellence rating and detailed reasoning for why it was selected.

Pulse Feed and Weekly Updates

The Pulse section provides a weekly snapshot of what changed in the model market. It tracks new models, verified provider price cuts, and benchmark refreshes. Recent activity showed 177 new models added, 53 price cuts verified, and 368 benchmark refreshes processed. This feature removes the need to monitor multiple sources for updates by delivering all relevant changes in one centralized feed. The top frontier output pricing is also highlighted, with the current cheapest rate displayed prominently.

Side-by-Side Model Comparison

LLM Reference includes a dedicated comparison tool that allows you to evaluate two models directly against each other. This feature is essential for making informed decisions when choosing between competing options. You can compare performance across multiple benchmarks, pricing structures, and provider offerings. The comparison tool integrates with the broader directory, enabling you to quickly pull up any model from the database and see how it stacks up against alternatives in real time.
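A side-by-side view of this kind can be sketched in a few lines of Python. The sketch below is an assumption about how such a comparison might be laid out, not the site's implementation; `compare_models` and `format_table` are hypothetical helpers. The benchmark figures are the ones quoted elsewhere in this listing, and metrics the listing does not report for a model are shown as `n/a` rather than filled in.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[float], Optional[float]]


def compare_models(a: Dict[str, float], b: Dict[str, float]) -> List[Row]:
    """Align two metric dicts on the union of their metric names."""
    metrics = sorted(set(a) | set(b))
    return [(m, a.get(m), b.get(m)) for m in metrics]


def format_table(name_a: str, name_b: str, rows: List[Row]) -> str:
    """Render aligned rows as a fixed-width text table."""
    lines = [f"{'metric':<20}{name_a:>20}{name_b:>20}"]
    for metric, va, vb in rows:
        cell_a = "n/a" if va is None else f"{va:g}"
        cell_b = "n/a" if vb is None else f"{vb:g}"
        lines.append(f"{metric:<20}{cell_a:>20}{cell_b:>20}")
    return "\n".join(lines)


# Figures as quoted in this listing; unreported metrics are simply absent.
FABLE_5 = {"SWE-bench Pro": 80.3, "SWE-bench Verified": 96.0, "GDPval-AA ELO": 1932.0}
SONNET_4_6 = {"tau-bench": 87.5}

rows = compare_models(FABLE_5, SONNET_4_6)
print(format_table("Claude Fable 5", "Claude Sonnet 4.6", rows))
```

Keeping missing values explicit matters here: two models are only directly comparable on benchmarks that both have reported.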

Use Cases of LLM Reference

Selecting the Best Model for Coding Tasks

Engineering teams building coding assistants or developer tools can use LLM Reference to identify the most capable models for code generation, debugging, and software engineering. The platform tracks specialized benchmarks like SWE-bench Pro and SWE-bench Verified, and provides editors' picks for coding tasks. For example, the current top coding pick is Claude Fable 5, which achieves 80.3% on SWE-bench Pro and 96% on SWE-bench Verified, making it the best production coding choice for non-trivial engineering tasks.

Optimizing Costs for Frontier AI Usage

Technology leaders managing AI budgets can leverage the platform's pricing data to find the most cost-effective providers for frontier-level models. LLM Reference tracks verified price cuts weekly and displays the cheapest frontier output pricing prominently. The current cheapest frontier output is Hunyuan HY3 Preview via Tencent Cloud TI Platform at $0.260 per 1M output tokens. This allows teams to balance performance requirements with budget constraints without manually researching each provider's pricing page.
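To make the per-token rate concrete, here is a small worked calculation in Python. It is a sketch under stated assumptions: the $0.260 per 1M output tokens rate comes from the listing above, while the monthly volume of 50M output tokens is a hypothetical workload and `output_cost_usd` is an illustrative helper. Input-token pricing is not given in this listing, so the figure covers output tokens only.

```python
def output_cost_usd(output_tokens: int, price_per_1m_usd: float) -> float:
    """Cost of generating `output_tokens` at a per-1M-token output price."""
    if output_tokens < 0 or price_per_1m_usd < 0:
        raise ValueError("tokens and price must be non-negative")
    return output_tokens / 1_000_000 * price_per_1m_usd


HY3_OUTPUT_PRICE = 0.260        # USD per 1M output tokens, as quoted in the listing
monthly_tokens = 50_000_000     # hypothetical workload, for illustration only

monthly_cost = output_cost_usd(monthly_tokens, HY3_OUTPUT_PRICE)
print(f"${monthly_cost:.2f}")   # prints $13.00
```

The same function can be applied to any provider's quoted output rate to compare budgets at a fixed token volume.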

Building Agentic Workflows and Tool Loops

Developers creating autonomous agents and complex tool-using systems can rely on LLM Reference to find models optimized for agentic tasks. The platform tracks benchmarks like tau-bench and provides editors' picks for agents. The current top agent pick is Claude Sonnet 4.6, which achieves 87.5 on tau-bench and demonstrates strong self-correction capabilities across long tool loops. This helps teams select models that maintain reliability and accuracy in multi-step autonomous workflows.

Evaluating Models for Research and Knowledge Work

Researchers and knowledge workers can use LLM Reference to identify models excelling in analytical tasks, document analysis, and data processing. The platform tracks specialized benchmarks like GDPval-AA ELO and provides picks for research, summarization, translation, and data or SQL tasks. For instance, Claude Fable 5 ranks as the top research pick with a GDPval-AA ELO of 1932 and strong performance in finance, trading, and analytics. This enables teams to select models that deliver accurate and insightful results for complex knowledge work.

Frequently Asked Questions

How often is the data on LLM Reference updated?

Model data is refreshed weekly to include new model releases, verified price changes, and benchmark updates. The site itself, built by the Data Advantage project, is described as updated daily. The Pulse feed highlights exactly what changed each week, including the number of new models added, price cuts verified, and benchmark refreshes processed.

Can I compare two models side by side on LLM Reference?

Yes, LLM Reference includes a dedicated comparison tool that allows you to evaluate two models directly against each other. You can access this feature from the main navigation or by searching for specific models. The comparison integrates performance benchmarks, pricing data, and provider information to help you make informed decisions. Popular comparisons include Claude Fable 5 versus Claude Opus 4.8 and GPT-5.5 versus Gemini 3.1 Pro Preview.

What types of tasks does LLM Reference cover?

The platform covers a wide range of tasks organized by audience type. For developers, picks include coding, agents, tool use, open weights models, long context, and cheap options. For knowledge workers, picks include writing, research, summarization, docs Q&A, translation, and data or SQL tasks. For creatives, picks include image generation, video creation, voice TTS, transcription, music, and image editing. Each pick includes an excellence rating and detailed reasoning.

Is LLM Reference free to use?

Yes, LLM Reference is freely accessible as a decision-support directory. The platform is built by the Data Advantage project and provides all core features including model search, comparison tools, editors' picks, the Pulse feed, and pricing data at no cost. There are no subscription fees or paywalls for accessing the model directory, benchmark data, or weekly updates.

Similar to LLM Reference

Optimize your voice channels with Oravaa. Deploy conversational Voice AI to resolve 24/7 support calls, qualify web leads, and manage reminders.

AI copilot helps ace live remote interviews.

Receptri is an AI receptionist that answers calls and chats 24/7, manages bookings, and learns about your business effortlessly.

Avatai lets you create AI avatars that present information, answer questions, and interact with users in a humanized interface.

FX Radar uses AI to filter market noise and deliver real-time forex sentiment and key movers in seconds.

Personal Agent is your AI companion that seamlessly transforms thoughts into polished tasks across all your devices.

Prompt Builder lets you generate, optimize, and manage AI prompts effortlessly for all models, saving time and enhancing results.