LLM Preference Alignment

From PPO to DPO, KTO, ……

crabboss, July 21, 2024

Preference alignment for large models is currently a crowded field with many competing approaches, and I…

Preference Alignment Series – ORPO Training

crabboss, July 20, 2024

ORPO works as follows: From PPO to DPO, K…

Preference Alignment Series – CPO Training

crabboss, July 20, 2024

CPO works as follows: From PPO to DPO, KT…

Preference Alignment Series – KTO Training

crabboss, July 20, 2024

Compared with PPO, DPO has already simplified the training pipeline and training…

Preference Alignment Series – DPO Training

crabboss, July 20, 2024

Ever since Hugging Face's trl appeared…

A Deep Dive into PPO

crabboss, May 30, 2024

Currently, mainstream RLHF approaches fall into two broad categories: …

Black-Box Prompt Optimization: Aligning Large Language Models without Model Training

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Thoughts on RAG and Long-Context

How Can Large Models Mitigate Forgetting During Fine-Tuning?