Llama 3.1 Nemotron 70B Instruct: Follow and assert
Review
Nemotron is the smartest model in the room. If you want a model to follow your system prompt exactly as intended, this is the best option available right now. It also has excellent general-knowledge capabilities. Combined with the 32K-token context offered on the platforms below, it delivers a really good experience.
- Use Cases:
- General Knowledge: Answers questions accurately across diverse topics.
- Roleplay (RP): Creative and flexible for interactive storytelling.
- Content Creation: Assists in storywriting, idea generation, and more.
- Code Generation: Helps with coding tasks and debugging.
- Question Answering: Offers precise and helpful responses.
- Limitations:
- Positivity Bias: May exhibit overly positive responses if not configured correctly.
- Stability: Can become unstable depending on system prompt settings.
- Specialized Domains: Not optimized for advanced mathematics or niche fields.
- Strengths:
- Dependable for following prompts and generating detailed outputs.
- Highly versatile across various use cases.
Want to Try It?
Experience Llama 3.1 Nemotron 70B Instruct on the following platforms, both offering a 32K context window:
Recommended Settings for Llama 3.1 Nemotron 70B Instruct
For optimal performance, here are the recommended settings:
| Setting | Value |
|---|---|
| Format | ChatML |
| Tokenizer | Llama 3 |
| Temperature | 0.85 |
| Top K | -1 |
| Top P | 0.95 |
| Typical P | 1 |
| Min P | 0.02 |
| Top A | 0 |
| Repetition Penalty | 1 |
| Frequency Penalty | 0.5 |
| Presence Penalty | 0.3 |
| Response Tokens | 600 |
Pro Tips: To make the model more deterministic, decrease the temperature. To avoid incomplete sentences, enable the ‘Trim incomplete sentences’ option (if using Silly Tavern).
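If you are calling the model through an OpenAI-compatible API rather than Silly Tavern, the table above maps directly onto request parameters. Here is a minimal sketch, assuming a placeholder local endpoint and model id, and that your backend accepts extension fields such as `min_p` (support and naming vary by server):

```python
# Minimal sketch: applying the recommended sampler settings through an
# OpenAI-compatible chat endpoint. The URL, API key, and model name are
# placeholders; `min_p` is a backend-specific extension that not every
# server accepts. Repetition Penalty of 1 means "off", so it is omitted.
import requests

payload = {
    "model": "Llama-3.1-Nemotron-70B-Instruct",  # placeholder model id
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain RLHF in two sentences."},
    ],
    "temperature": 0.85,       # lower this for more deterministic output
    "top_p": 0.95,
    "min_p": 0.02,             # extension field, if your backend supports it
    "frequency_penalty": 0.5,
    "presence_penalty": 0.3,
    "max_tokens": 600,         # "Response Tokens" in the table above
}

resp = requests.post(
    "http://localhost:5000/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer sk-placeholder"},
    json=payload,
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```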
Are you using Silly Tavern?
Import the master settings from here: story formatting GERGE, Deterministic and uncreative GERGE
Additional Information
Performance Benchmarks
- Arena Hard: Score of 85.0, ranked #1 as of Oct 2024.
- AlpacaEval 2 LC: Score of 57.6, ranked #1 (verified tab).
- MT-Bench (GPT-4-Turbo): Score of 8.98, ranked #1 as of Oct 2024.
Chatbot Arena Leaderboard Rankings (Oct 2024)
- Elo Score: 1267 (±7).
- Overall Rank: 9.
- Style-Controlled Rank: 26.
Design Highlights
- Training Methodology: Built using RLHF with the REINFORCE algorithm (a minimal sketch of the update follows this list).
- Initial Policy: Derived from Llama-3.1-70B-Instruct.
- Training Framework: NVIDIA NeMo Aligner.
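For readers unfamiliar with REINFORCE: it is a policy-gradient method that scales the log-probability of sampled responses by a reward signal (here, scores from the reward model). Below is a minimal, self-contained illustration of the core update in PyTorch; the function name, tensors, and baseline are illustrative stand-ins, not NeMo Aligner's actual implementation:

```python
# Minimal sketch of a REINFORCE-style policy-gradient update as used in
# RLHF. Illustration only: the rewards and log-probs here are stand-ins
# for outputs of a reward model and a language-model policy.
import torch

def reinforce_loss(logprobs: torch.Tensor,  # (batch,) sum of log p(token) per sampled response
                   rewards: torch.Tensor,   # (batch,) scalar reward per response
                   baseline: float = 0.0) -> torch.Tensor:
    # Subtracting a baseline reduces gradient variance without changing
    # the expected gradient.
    advantage = rewards - baseline
    # REINFORCE maximizes E[advantage * log-prob]; negate so a standard
    # optimizer can minimize.
    return -(advantage.detach() * logprobs).mean()

# Toy usage: pretend these log-probs came from the policy with grad enabled.
logprobs = torch.tensor([-12.3, -8.7, -15.1], requires_grad=True)
rewards = torch.tensor([0.9, 0.2, 0.6])
loss = reinforce_loss(logprobs, rewards, baseline=rewards.mean().item())
loss.backward()
print(loss.item(), logprobs.grad)
```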
Conversion and Compatibility
- Model Format: Converted to HuggingFace Transformers as Llama-3.1-Nemotron-70B-Instruct-HF.
- Software Support: Compatible with Transformers v4.44.0 and torch v2.4.0.
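Because the HF conversion works with standard Transformers, loading it follows the usual pattern. A minimal sketch, assuming enough GPU memory for the 70B weights (the sampling values echo the recommended settings above; `device_map="auto"` shards the model across available devices):

```python
# Minimal sketch: loading the HF-converted checkpoint with Transformers
# (v4.44.0 and torch v2.4.0 per the compatibility note above). A 70B model
# needs multiple high-memory GPUs or quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling values mirror the recommended settings table above.
outputs = model.generate(
    inputs, max_new_tokens=600, do_sample=True, temperature=0.85, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```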
Training and Evaluation
Alignment Methodology
- Trained with:
- HelpSteer2-Preference prompts.
- Llama-3.1-Nemotron-70B-Reward.
- Dataset Size:
- 21,362 prompt-response pairs to improve alignment with human preferences.
- Split into 20,324 training pairs and 1,038 validation pairs.
Datasets
- Data Sources: Combines human-labeled and synthetic data for hybrid training.
- Focus Areas:
- Helpfulness.
- Factual correctness.
- Coherence.
- Customization for complexity and verbosity.
Source: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF on Hugging Face
Looking for Similar Models?
Explore alternatives:
- TheDrummer/Nautilus-70B-v0.1 (a finetune of NVIDIA's Llama 3.1 Nemotron 70B)
- Review of Sao10K/L3 70B Euryale v2.1: Click here
- Review of Infermatic/MN 12B Inferor v0.0: Click here
Want to Know More?
Have questions or want to explore settings, examples, or community experiences? Join the discussion on our Discord server! Click the button below to connect: