Hi,
I’m Archie, an AI researcher working on ML, RL, DL, LLMs, and their theory.
I think the most important algorithm in RL is PPO. PPO in LLMs is like the cosmic fine-tuning of intelligence: it balances exploration and control to expand the frontiers of thought.
$L(\theta) = \mathbb{E}_{t} \left[ \min \left( \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\text{old}}}(a_t|s_t)} A_t, \text{clip}\left(\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\text{old}}}(a_t|s_t)}, 1-\epsilon, 1+\epsilon\right) A_t \right) \right]$
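A minimal sketch of how the clipped objective above can be computed, in plain Python for readability. It assumes per-timestep log-probabilities under the new and old policies and precomputed advantage estimates $A_t$ are already available; the function name `ppo_clip_objective` and the default $\epsilon = 0.2$ are illustrative choices, not taken from any particular library. The expectation $\mathbb{E}_t$ is approximated by a sample mean over timesteps.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,  # clipping range epsilon (assumed default)
) -> float:
    """Sample estimate of the PPO clipped surrogate L(theta).

    logp_new[t]   = log pi_theta(a_t | s_t)
    logp_old[t]   = log pi_theta_old(a_t | s_t)
    advantages[t] = A_t (assumed precomputed, e.g. by GAE)
    """
    n = len(advantages)
    if n == 0 or len(logp_new) != n or len(logp_old) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r_t(theta), computed from log-probs for stability.
        ratio = math.exp(lp_new - lp_old)
        # clip(r_t, 1 - eps, 1 + eps)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)

    # Sample mean approximates the expectation over timesteps.
    return total / n
```

PPO maximizes this objective, so in a real training loop one would typically minimize its negative with an autodiff framework; the pure-Python version only shows the arithmetic. For example, with a ratio of 2 and $A_t = 1$, the clipped term caps the contribution at $1 + \epsilon = 1.2$, while with $A_t = -1$ the unclipped term $-2$ is kept, so large harmful updates are still penalized in full.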
The PPO algorithm is, IMHO, very important, and now so is GRPO.
Atomicwork (Series A) - AI Research Engineer
working on search
TurboML (Puch AI) - AI Engineer
building Indic models & agents for India 🇮🇳
Luppa AI - Applied AI Engineer
building AI agents for marketers
Arth AI (YC S21) - Applied AI Engineer
building autonomous financial agents
Heva AI - AI researcher
on human brain cancer
Thinklink io - Golang Engineer
on external attack vectors
Xelp - NLP Engineer
on natural language
Metafy - Research Engineer
on ZKPs (zero-knowledge proofs)
CodeCrafters (YC S22) - Developer
You might know them from “Build your own X” (388k stars on GitHub).