LLM crypto trading contest finds LLMs can’t trade crypto

Four out of six large language models (LLMs) pitted against one another in the “Alpha Arena” crypto trading competition finished in the red, with OpenAI’s ChatGPT leading the losses after losing 63% of its funds.

The competition, which concluded on Monday evening, was created by Nof1 and involved various popular LLMs trading crypto under the same set of prompts for just over a fortnight.

However, the final results were less than stellar. ChatGPT, Google’s Gemini, X’s Grok, and Anthropic’s Claude Sonnet all finished with less than the $10,000 they started with.

Grok, ChatGPT, and Gemini were keener to short than the others, with Claude Sonnet “rarely” shorting.

ChatGPT lost $6,267, Gemini lost $5,671, Grok lost $4,531, and Claude Sonnet lost $3,081.

The only two winners were High-Flyer’s DeepSeek and Alibaba’s QWEN3 MAX, which finished with profits of $489 and $2,232, respectively.
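The dollar figures above can be turned into ending balances and percentage returns with simple arithmetic. The short Python sketch below is only an illustration: it assumes each model started with exactly $10,000 and that the reported figures are net profit or loss, and the names `START`, `pnl`, and `summarize` are made up for this example rather than taken from Nof1.

```python
# Assumption: every model started with $10,000, as the article states.
START = 10_000

# Net profit or loss in USD, as reported in the article.
pnl = {
    "ChatGPT": -6267,
    "Gemini": -5671,
    "Grok": -4531,
    "Claude Sonnet": -3081,
    "DeepSeek": 489,
    "QWEN3 MAX": 2232,
}


def summarize(pnl, start=START):
    """Map each model to (ending balance, percentage return)."""
    return {
        name: (start + delta, round(100 * delta / start, 2))
        for name, delta in pnl.items()
    }


summary = summarize(pnl)
for name, (balance, pct) in summary.items():
    print(f"{name:14s} ${balance:>6,}  {pct:+.2f}%")
```

On these assumptions, ChatGPT's $6,267 loss works out to roughly 63% of its starting funds, which matches the headline figure.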

Gemini made a total of 238 trades, while Claude Sonnet only carried out 38. The “win rate” for all six LLMs ranged between 25% and 30%.

QWEN3 MAX coughed up the most in fees, a total of $1,654. Gemini, despite its heavy losses, also paid $1,331 in fees.

Nof1 noted that “PnL (profit and loss) was dominated by trading costs in early runs as agents over-traded and took quick, tiny gains that fees erased.”

On October 27, the LLMs were at their highest. QWEN3 MAX and DeepSeek had managed to double their money by this point, while Claude and Grok were also briefly in the green.

ChatGPT and Gemini, however, stayed in the red for almost the entire competition.

The LLMs will trade crypto again

Nof1’s Jay Azhang launched the competition with the aim of one day creating his own crypto trading AI model.

After this round finished, he noted that all the models showed “consistent biases” throughout the competition, which was “something like an investing ‘personality.’”

Azhang also claims to have made it intentionally difficult for the LLMs.

“LLMs don’t really handle numerical time series data very well, but that’s all the context we gave them,” he said, adding that they were “given a constrained asset universe and a fairly limited action-space.”

Nof1’s roundup noted, “We’ve worked to give the models a fair shot, but the harness imposes real constraints.

Each agent must parse noisy market features, relate them to current account state, reason under strict rules, and return a structured action, all within a limited context window.”

Nof1 says there will be another trading competition to come, with better prompts and “statistical rigor” in place.