Hi,

I'm Archie, an AI researcher working on ML, RL, DL, LLMs, and their theory.

I think the most important algorithm in RL is PPO. For LLMs, PPO is like the cosmic fine-tuning of intelligence: it balances exploration and control to expand the frontiers of thought.

PPO Algorithm:

$L(\theta) = \mathbb{E}_{t} \left[ \min \left( \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\text{old}}}(a_t|s_t)} A_t, \text{clip}\left(\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\text{old}}}(a_t|s_t)}, 1-\epsilon, 1+\epsilon\right) A_t \right) \right]$

where $A_t$ is the advantage estimate at timestep $t$ and $\epsilon$ is the clipping range.
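A minimal pure-Python sketch of evaluating this objective on a batch, assuming the per-timestep log-probabilities under the new and old policies and the advantage estimates $A_t$ are already computed. The function name `ppo_clip_objective` and the default `eps=0.2` are illustrative choices, not taken from any particular library.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,  # hypothetical default clipping range
) -> float:
    """Empirical mean over timesteps of min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_theta(a|s) / pi_theta_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    old = [math.log(0.5)] * 4
    new = [math.log(0.75), math.log(0.25), math.log(0.75), math.log(0.25)]
    adv = [1.0, 1.0, -1.0, -1.0]
    # Per-step terms are 1.2, 0.5, -1.5, -0.8, so the mean is approximately -0.15.
    print(ppo_clip_objective(new, old, adv))
```

The clip only caps how far a single update can profit from moving the ratio: with a positive advantage the gain stops at $1+\epsilon$, and with a negative advantage it stops at $1-\epsilon$. In real training this value is maximized (usually by minimizing its negative with an autograd framework); the sketch only evaluates it.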

IMHO, PPO is a very important algorithm, and now so is GRPO.

Work

TurboML - AI Engineer working on real-time ML and low-latency feature engineering

Luppa AI - Applied AI Engineer building AI agents for marketers

Stack: Arth AI (YC S21) - Applied AI Engineer building autonomous financial agents

Heva AI - AI Researcher working on human brain cancer

Thinklink io - Golang Engineer - External Attack Vector

Xelp - NLP Engineer

Metafy - Research Engineer working on zero-knowledge proofs (ZKPs)


GitHub: