Hacker News — vinext + Cloudflare Workers

Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon (tridao.me)

24 points by jxmorris12 3 days ago | 4 comments

jnwatson 5 hours ago

Back in the olden days, I took a course called "Large Scale Scientific Computing". It was mostly about multiplying large matrices. I didn't think this was going to be remotely applicable to anything commercial.

Boy was I wrong.

cs702 6 hours ago

A superior alternative to standard Muon and AdamW optimizers for training large models.

Fantastic work, instantly valuable, immediately usable.

A big THANK YOU to the authors:

Jack Zhang, Noah Amsel, Berlin Chen, and Tri Dao

ainch 5 hours ago

Tri Dao's lab must have saved countless watts with FlashAttention. Great to see them continuing to open-source massive efficiency gains.

akoboldfrying 5 hours ago

Only read the first section but this sounds really impressive -- up to 50% of up to 17% of training time when using the Muon optimiser, so up to around 7% of basically pure improvement with no downside.
