Hacker News

How do you trace your own mental loops?

Hacker News - Wed, 02/11/2026 - 4:16am

I keep catching myself in recursive thought patterns—debugging production issues at 2am while my brain debugs hypothetical conversations from 2019.

I've started treating these like performance bugs: tracing the call stack, finding the root trigger, watching the loop unwind.

Before I build anything: How do you "debug" your own cognition? Do you journal, use frameworks, white-knuckle it, or just accept the overhead?

Genuine curiosity. If there's a pattern in the responses, I'll share what I've been experimenting with.

Comments URL: https://news.ycombinator.com/item?id=46972710

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: We let GPT OSS 120B write and run Python and ARC AGI 2 jumped 400%

Hacker News - Wed, 02/11/2026 - 4:16am

Hi HN,

We are a team of independent researchers from Germany who have been working on ARC AGI 2 since last summer. The general opinion on open-weight models is that they are too weak for this fairly difficult benchmark and score at near-noise levels. We found that GPT OSS 120B is actually much more capable than previously thought once the interleaved thinking regime is stabilized. We let the model use a stateful, IPython-based REPL via function calling and patched vLLM so that the model can reliably do interleaved thinking. The score jumped by more than 400%.

Technical write-up: https://pivotools.github.io/posts/agentic_coding_arc_agi/

Code: https://github.com/gutfeeling/arc-agi-2-submission

Data: https://huggingface.co/datasets/arcagi2/arcagi2-agentic-codi...

For safety, we support sandboxed execution using IPyBox (local Docker) and Daytona (cloud), so others can reproduce this without running untrusted code locally.
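To make the "stateful REPL via function calling" idea concrete, here is a minimal sketch. It is not the authors' implementation: it uses plain `exec` instead of IPython, has no sandboxing at all (so it should only ever see trusted code), and the names `StatefulRepl`, `handle_tool_call`, and the `{"code": ...}` argument shape are hypothetical. The point it illustrates is only that variables defined in one tool call remain visible in the next.

```python
import contextlib
import io
import traceback


class StatefulRepl:
    """Toy stand-in for a stateful REPL tool: variables persist across calls."""

    def __init__(self):
        # One shared namespace is what makes the session stateful.
        self.namespace = {"__name__": "__repl__"}

    def run(self, code: str) -> str:
        """Execute code in the shared namespace and return captured output."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception:
            # Return the traceback as text so the caller (the model) can read it.
            buffer.write(traceback.format_exc())
        return buffer.getvalue()


def handle_tool_call(repl: StatefulRepl, arguments: dict) -> str:
    """Dispatch a hypothetical function call shaped like {"code": "..."}."""
    return repl.run(arguments["code"])


repl = StatefulRepl()
# First call defines a variable; the second call can still see it.
handle_tool_call(repl, {"code": "grid = [[1, 2], [3, 4]]"})
print(handle_tool_call(repl, {"code": "print(sum(map(sum, grid)))"}))
```

In a real setup the tool result string would be appended to the conversation as the function-call response, and execution would happen inside a sandbox such as the ones mentioned above.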

It gets more interesting: the effect seems to be general and translates seamlessly to other models without even changing prompts. We are not sure why agentic coding is so powerful on ARC AGI 2, which isn't traditionally thought of as an agentic benchmark. Perhaps code execution provides a stronger form of verification than chain-of-thought (CoT), or perhaps it encourages a qualitatively different form of thinking.
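One way to read the verification hypothesis above: a candidate rule written as code can be run against a task's training pairs and rejected mechanically if any output mismatches, whereas a rule stated only in prose cannot. The sketch below illustrates that idea under assumptions. The helper `passes_training_pairs`, the toy task, and the two candidate functions are hypothetical, and the `"input"`/`"output"` pair layout is assumed to follow the public ARC task JSON. Nothing here is taken from the authors' code.

```python
from typing import Callable, List

Grid = List[List[int]]


def passes_training_pairs(transform: Callable[[Grid], Grid], pairs) -> bool:
    """Return True only if transform reproduces every training output exactly."""
    for pair in pairs:
        try:
            if transform(pair["input"]) != pair["output"]:
                return False
        except Exception:
            # A crashing candidate counts as a failed hypothesis.
            return False
    return True


# Hypothetical toy task: the output is the input mirrored left to right.
train_pairs = [
    {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    {"input": [[3, 4, 5]], "output": [[5, 4, 3]]},
]


def flip_horizontal(grid: Grid) -> Grid:
    return [list(reversed(row)) for row in grid]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


print(passes_training_pairs(flip_horizontal, train_pairs))  # True
print(passes_training_pairs(transpose, train_pairs))        # False
```

Passing the training pairs does not guarantee the rule generalizes to the test input, but it does filter out hypotheses that are inconsistent with the examples.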

We will be around for a while and would be happy to hear ideas / feedback and discuss infra issues / interleaved thinking / GPT OSS / ARC AGI 2.

Comments URL: https://news.ycombinator.com/item?id=46972709

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: IntentCode

Hacker News - Wed, 02/11/2026 - 4:15am

Article URL: https://intentcode.dev

Comments URL: https://news.ycombinator.com/item?id=46972702

Points: 1

# Comments: 1

Categories: Hacker News

Web Tiles: composable docs and apps safe in any context

Hacker News - Wed, 02/11/2026 - 3:58am

Article URL: https://webtil.es/

Comments URL: https://news.ycombinator.com/item?id=46972570

Points: 1

# Comments: 0

Categories: Hacker News

Using YouTube as Cloud Storage

Hacker News - Wed, 02/11/2026 - 3:56am
Categories: Hacker News

In Memoriam Marijn Meijles

Hacker News - Wed, 02/11/2026 - 3:49am
Categories: Hacker News

Ask HN: How to Use `npx skills add` with On-Prem / Private Repos?

Hacker News - Wed, 02/11/2026 - 3:47am

As you know, we can install the desired skill using the following command:

npx skills add https://github.com/anthropics/skills --skill frontend-design

Is there a way to achieve the same setup in an on-premise environment without exposing the repository publicly?

Comments URL: https://news.ycombinator.com/item?id=46972491

Points: 1

# Comments: 0

Categories: Hacker News
