by @huggingface
This skill trains or fine-tunes language models on Hugging Face Jobs using TRL, with support for SFT, DPO, GRPO, reward modeling, and GGUF deployment.