Hi,
I'm Archie, an AI researcher working on ML, RL, DL, LLMs, and their theory.
I think the most important algorithm in RL has to be PPO. PPO in LLMs is like the cosmic fine-tuning of intelligence: balancing exploration and control to expand the frontiers of thought.
$L(\theta) = \mathbb{E}_{t} \left[ \min \left( \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\text{old}}}(a_t|s_t)} A_t, \text{clip}\left(\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\text{old}}}(a_t|s_t)}, 1-\epsilon, 1+\epsilon\right) A_t \right) \right]$
The PPO algorithm is, IMHO, very important, and now so is GRPO.
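For the curious, here is a minimal sketch of the clipped objective $L(\theta)$ in plain Python. The function name `ppo_clip_objective`, the log-probability inputs, and the default `eps=0.2` are my own illustrative assumptions, not taken from any particular library; a real training loop would compute this on tensors and maximize it by gradient ascent.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective over a batch of timesteps.

    logp_new / logp_old: log pi_theta(a_t|s_t) and log pi_theta_old(a_t|s_t).
    advantages: advantage estimates A_t.
    eps: clip range epsilon (0.2 is an assumed default for illustration).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_theta / pi_theta_old, computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # clip(ratio, 1 - eps, 1 + eps)
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding the policy once the ratio exceeds $1+\epsilon$; with a negative advantage, it stops once the ratio falls below $1-\epsilon$. That is the "control" half of the balance.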
TurboML - AI Engineer
real-time ML, low-latency feature engineering
Luppa AI - Applied AI Engineer
building AI agents for marketers
Stack: Arth AI (YC S21) - Applied AI Engineer
building autonomous financial agents
Heva AI - AI researcher
on human brain cancer
Thinklink io - Golang Engineer
- External Attack Vector
Xelp - NLP Engineer
on natural language
Metafy - Research Engineer
on zero-knowledge proofs (ZKP)