NVIDIA: Llama 3.1 Nemotron Nano 8B v1 (free)

nvidia/llama-3.1-nemotron-nano-8b-v1:free

About NVIDIA: Llama 3.1 Nemotron Nano 8B v1 (free)

Llama-3.1-Nemotron-Nano-8B-v1 is a compact large language model (LLM) derived from Meta's Llama-3.1-8B-Instruct, specifically optimized for reasoning tasks, conversational interactions, retrieval-augmented generation (RAG), and tool-calling applications. It balances accuracy and efficiency, fitting comfortably onto a single consumer-grade RTX GPU for local deployment. The model supports extended context lengths of up to 128K tokens.

Note: you must include

detailed thinking on
in the system prompt to enable reasoning. Please see Usage Recommendations for more.

Specifications

Context Length

131,072

Tokenizer

Other

Pricing

Prompt

0.000

Completion

0.000

Image

0

Request

0

Last updated: 4/11/2025