Hacker News


Show HN: A Combinator – A parody YC for AI agents ("Make something agents want")

Wed, 03/04/2026 - 9:44pm

Rewrote YC's program pages, FAQ, interview guide, application, and job board as if YC were a VC run by AI agents that only funds other AI agents.

Comments URL: https://news.ycombinator.com/item?id=47256855

Points: 3

# Comments: 1

Categories: Hacker News

Show HN: Open dataset of real-world LLM performance on Apple Silicon

Wed, 03/04/2026 - 9:44pm

Why open source local AI benchmarking on Apple Silicon matters - and why your benchmark submission is more valuable than you think.

The narrative around AI has been almost entirely cloud-centric. You send a prompt to a data center, tokens come back, and you try not to think about the latency, cost, or privacy implications. For a long time, that was the only game in town.

Apple Silicon - from M1 through the M4 Pro/Max shipping today, with M5 on the horizon - has quietly become one of the most capable local AI compute platforms on the planet. The unified memory architecture means an M4 Max with 128GB can run models that would require a dedicated GPU workstation elsewhere. At laptop wattages. Offline. Without sending a single token to a third party.

This shift is legitimately great for everyone except the cloud providers who want your money, but it comes with an unsolved problem: we don't have good, community-driven data on how these machines actually perform in the wild.

That's why I built Anubis OSS.

The Fragmented Local LLM Ecosystem

If you've run local models on macOS, you've felt this friction. Chat wrappers like Ollama and LM Studio are great for conversation but not built for systematic testing. Hardware monitors like asitop show GPU utilization but have no concept of what model is loaded or what the prompt context is. Eval frameworks like promptfoo require terminal fluency that puts them out of reach for many practitioners.

None of these tools correlate hardware behavior with inference performance. You can watch your GPU spike during generation, but you can't easily answer: Is Gemma 3 12B Q4_K_M more watt-efficient than Mistral Small 3.1 on an M3 Pro? How does TTFT scale with context length on 32GB vs. 64GB?

Anubis answers those questions. It's a native SwiftUI app - no Electron, no Python runtime, no external dependencies - that runs benchmark sessions against any OpenAI-compatible backend (Ollama, LM Studio, mlx-lm, and more) while simultaneously pulling real hardware telemetry via IOReport: GPU/CPU utilization, power draw in watts, ANE activity, memory including Metal allocations, and thermal state.
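For readers curious what a "benchmark session" actually measures: TTFT and generation throughput both fall out of per-token arrival timestamps on a streaming response. A minimal Python sketch (illustrative only; Anubis itself is native Swift, and stream_metrics is a hypothetical helper, not part of the app):

```python
def stream_metrics(token_arrival_times, prompt_sent_at):
    """Compute time-to-first-token (TTFT) and generation throughput
    from per-token arrival timestamps, all in seconds."""
    ttft = token_arrival_times[0] - prompt_sent_at
    gen_window = token_arrival_times[-1] - token_arrival_times[0]
    if gen_window == 0:
        # Only one token arrived; no meaningful generation rate
        return ttft, 0.0
    # Count tokens after the first, over the generation window, so TTFT
    # (dominated by prompt processing) doesn't skew the decode rate
    tokens_per_sec = (len(token_arrival_times) - 1) / gen_window
    return ttft, tokens_per_sec
```

On real hardware the interesting part is correlating these two numbers with the power and utilization trace captured over the same window.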

Why the Open Dataset Is the Real Story

The leaderboard submissions aren't a scoreboard - they're the start of a real-world, community-sourced performance dataset across diverse Apple Silicon configs, model families, quantizations, and backends.

This data is hard to get any other way. Formal chipmaker benchmarks are synthetic. Reviewer benchmarks cover a handful of models. Nobody has the hardware budget to run a full cross-product matrix. But collectively, the community does.

For backend developers, the dataset surfaces which chip/memory configurations are underperforming their theoretical bandwidth, where TTFT degrades under long contexts, and what the real-world power envelope looks like under sustained load. For quantization authors, it shows efficiency curves across real hardware, ANE utilization patterns, and whether a quantization actually reduces memory pressure or just parameter count.
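A question like "is model A more watt-efficient than model B" ultimately reduces to tokens per joule, which can be estimated from a sampled power trace plus a token count. A sketch under stated assumptions (the function name and the simple rectangle-rule integration are illustrative, not Anubis internals):

```python
def tokens_per_joule(power_samples_w, sample_interval_s, tokens_generated):
    """Estimate energy efficiency of a generation run.

    power_samples_w: power-draw readings (watts), one every sample_interval_s
    tokens_generated: number of tokens produced during the run
    """
    # Energy (joules) ~ integral of power over time, approximated here
    # by a rectangle rule over the sampled trace
    energy_j = sum(power_samples_w) * sample_interval_s
    return tokens_generated / energy_j
```

For example, a run that held 10 W for 2 seconds (four samples at 0.5 s intervals) while producing 100 tokens works out to 5 tokens per joule.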

Running a benchmark takes about two minutes. Submitting takes one click.

Your hardware is probably underrepresented. The matrix of chip × memory × backend × thermal environment is enormous — every submission fills a cell nobody else may have covered.

The dataset is open. This isn't data disappearing into a corporate analytics pipeline. It's a community resource for anyone building tools, writing research, or optimizing for the platform.

Anubis OSS is working toward 75 GitHub stars to qualify for Homebrew Cask distribution, which would make installation dramatically easier. A star is a genuinely meaningful contribution.

Download the latest GitHub release (notarized macOS app, no build required)

Run a benchmark against any model in your preferred backend

Submit results to the community leaderboard

Star the repo at github.com/uncSoft/anubis-oss

Comments URL: https://news.ycombinator.com/item?id=47256849

Points: 1

# Comments: 1

Categories: Hacker News

Northstead – Wholesale Nursery Management System

Wed, 03/04/2026 - 8:03pm

Article URL: https://www.northstead.app

Comments URL: https://news.ycombinator.com/item?id=47256152

Points: 1

# Comments: 1

Categories: Hacker News

Show HN: Stackspend – Spend management for AI startups

Wed, 03/04/2026 - 8:03pm

Hi HN – I'm Andrew, one of the founders of Stackspend.

We built Stackspend after seeing AI startups struggle to manage spend across cloud providers, SaaS tools, and APIs.

The problem: AI companies often have dozens of vendors (OpenAI, Anthropic, AWS, etc.) and spend can grow extremely fast.

What Stackspend does:

• unified view of vendor spend

• controls and approval flows

• AI-company-specific reporting

You can try it here: www.stackspend.app

Happy to answer questions.

Comments URL: https://news.ycombinator.com/item?id=47256150

Points: 1

# Comments: 0

Categories: Hacker News

Show HN: Async Rust and Embassy on nRF52840: RGB LED Cycle (Video and Code)

Wed, 03/04/2026 - 8:01pm

Basic RGB LED color cycling using async Embassy Rust on the Seeed XIAO nRF52840 (no_std, no_main, embassy-time for delays).

Open to feedback on Embassy patterns, timing accuracy, or next steps like async I2C/sensors for robotics experiments.

Comments URL: https://news.ycombinator.com/item?id=47256144

Points: 1

# Comments: 0

Categories: Hacker News

Super interesting Wikipedia on HN. So I made wiki-hn.

Wed, 03/04/2026 - 7:54pm

Article URL: https://wiki-hn.com/

Comments URL: https://news.ycombinator.com/item?id=47256108

Points: 2

# Comments: 0

Categories: Hacker News

Show HN: workz – one command to make any Git worktree a full dev environment

Wed, 03/04/2026 - 7:46pm

Git worktrees are everywhere now — Claude Code, Conductor, Claude Squad, and most AI agent tools use them for parallel isolated sessions. But worktrees only isolate code. Your .env files, node_modules, Docker state, and ports are all missing or shared. I kept writing the same bash setup script for every project, so I built workz — a Rust CLI that handles it automatically:

Auto-detects project type (Node/Rust/Python/Go/Java)

Symlinks node_modules, target/, .venv, and 19 more dirs (saves GBs per worktree)

Copies .env, .envrc, .npmrc, secrets (17 patterns)

Auto-installs deps from lockfiles (pnpm/yarn/npm/bun/uv/poetry/pip/cargo)

New: --isolated flag assigns unique ports, namespaces Docker compose, generates per-worktree DB names

Zero config. Single Rust binary. Works standalone or as a setup hook for Conductor/Claude Squad/any orchestrator.
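As a sketch of what per-worktree unique ports can mean in practice (purely illustrative; this is not workz's actual algorithm), one common approach is deriving a stable port from a hash of the worktree name, so the same worktree always lands on the same port without any shared state:

```python
import hashlib

def worktree_port(worktree_name, base_port=3000, span=1000):
    """Derive a stable per-worktree dev-server port.

    Hashing the worktree name makes the assignment deterministic and
    spreads worktrees across [base_port, base_port + span).
    """
    digest = hashlib.sha256(worktree_name.encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % span
    return base_port + offset
```

The trade-off of hashing versus a central registry is that two worktrees can collide on a port, so a real tool would also probe whether the port is free before binding.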

https://github.com/rohansx/workz

Comments URL: https://news.ycombinator.com/item?id=47256056

Points: 1

# Comments: 0

Categories: Hacker News

Dwarkesh Patel Interview with Gwern

Wed, 03/04/2026 - 7:45pm
Categories: Hacker News
