Hacker News

Detect control characters, quotes and backslashes efficiently using SWAR

Hacker News - Sun, 04/13/2025 - 6:55pm

Article URL: https://lemire.me/blog/2025/04/13/detect-control-characters-quotes-and-backslashes-efficiently-using-swar/

Comments URL: https://news.ycombinator.com/item?id=43676460

Points: 2

# Comments: 0

Categories: Hacker News

Show HN: H-1B salary search without fuss

Hacker News - Sun, 04/13/2025 - 6:52pm

The US Department of Labor releases base salary data for every H-1B visa application and has done so for the past 19 years. This data is often useful to people negotiating an offer, researching a company, or just curious about industry salaries.

h1bsalaries.fyi allows you to quickly search through this huge trove of salary data by company name, job title, and location without the obnoxious ads, pop-ups, and distracting elements that are characteristic of other similar websites.

In other words, just H-1B salary data. No bullsh*t.

Open to thoughts, comments, and feedback.

Comments URL: https://news.ycombinator.com/item?id=43676446

Points: 1

# Comments: 0

Categories: Hacker News

Ask HN: Can someone ELI5 how Google's A2A is different from MCP?

Hacker News - Sun, 04/13/2025 - 6:52pm

After reading some content about Google's new A2A, it still hasn't clicked for me: how is it different from MCP? Can someone please ELI5 with a good example?

In my mind, I am still designing things with MCP. Even the communication between two agents can be modeled as MCP with some capabilities.

Comments URL: https://news.ycombinator.com/item?id=43676444

Points: 1

# Comments: 0

Categories: Hacker News

Google Workspace adds new AI tools to Docs, Sheets, Chat and more

Hacker News - Sun, 04/13/2025 - 6:48pm

Article URL: https://blog.google/products/workspace/cloud-next-2025-workspace-gemini/

Comments URL: https://news.ycombinator.com/item?id=43676417

Points: 1

# Comments: 0

Categories: Hacker News

Make your code past tense

Hacker News - Sun, 04/13/2025 - 6:47pm

Article URL: https://github.com/rottytooth/PastTense

Comments URL: https://news.ycombinator.com/item?id=43676411

Points: 1

# Comments: 0

Categories: Hacker News

Introducing Presenter [video]

Hacker News - Sun, 04/13/2025 - 6:46pm

Article URL: https://www.youtube.com/watch?v=fNjdcSUFBJQ

Comments URL: https://news.ycombinator.com/item?id=43676400

Points: 1

# Comments: 0

Categories: Hacker News

Ask HN: Why don't we have a functional DSL for data+embedding+API pipelines?

Hacker News - Sun, 04/13/2025 - 6:45pm

I’ve been working on a pretty common problem:

- I have structured data in JSONL files (in.jsonl, out.jsonl)
- I match lines by a key
- I transform them into (text, embedding) pairs
- I optionally filter/map them
- I batch them (into chunks of 50)
- I push each batch into an external system (e.g. vector DB, Chroma)

That’s it. Sounds trivial. But it turns into ugly imperative Python code very quickly: nested for-loops, global indices, +=, manual batching, line-by-line handling, low-level JSON parsing.

Here’s what it usually looks like in Python:

```
import json

with open("in.jsonl", "r") as fin:
    with open("out.jsonl", "r") as fout:
        for in_line, out_line in zip(fin, fout):
            in_data = json.loads(in_line)
            out_data = json.loads(out_line)
            if in_data["custom_id"] != out_data["custom_id"]:
                raise Exception(...)  # message elided in the original post
            texts = in_data["body"]["input"]
            embeddings = [d["embedding"] for d in out_data["response"]["body"]["data"]]
            for i in range(len(texts)):
                doc = texts[i]
                emb = embeddings[i]
                metadata = {
                    "source": f"chunk-{global_ids}",
                    # ... the snippet in the original post is cut off here
                }
```

We’re in 2025, and this is how we’re wiring data into APIs.

---

Why do we tolerate this?

This is a declarative, streaming, data processing problem. Why aren’t we using something more elegant? Something more composable, like functional pipelines?

I'm asking myself: Why don’t we have a composable, streaming, functional DSL for this kind of task?

---

Why not build it like Unix pipes?

What I want is something that feels like:

In Lisp / Clojure:

    (->> (zip input output)
         (filter (= :custom_id))
         (mapcat (fn [[in out]] (zip (:input in) (:embedding out))))
         (partition-all 50)
         (map send-to-chroma))

---

In Elixir + Broadway:

And now, back to Python...

We’re stuck writing imperative soup or building hacky DSLs with things like:

    load_json_pairs() \
      | where(is_valid) \
      | select(to_embedding_record) \
      | batch(50) \
      | foreach(send_to_chroma)

...or, more realistically, writing thousands of lines of with open(...) as f.

And even though libraries like tf.data.Dataset, dask.bag, pandas, or pipe exist, none of them really solve this use case in a cohesive and expressive way. They all focus on either tabular data, or big data, or ML input pipelines – not this "structured data -> transform -> push to API" pattern.
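For reference, the steps listed at the top (match by key, turn into (text, embedding) pairs, batch) can at least be written as streaming generators with only the standard library. This is a minimal sketch, not the cohesive DSL being asked for: the function names `paired_records`, `text_embedding_pairs`, and `batched` are made up for illustration, and the record layout is assumed to match the snippet above.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def paired_records(in_lines: Iterable[str], out_lines: Iterable[str]) -> Iterator[tuple[dict, dict]]:
    """Parse two JSONL streams in lockstep and check that their keys match."""
    for in_line, out_line in zip(in_lines, out_lines):
        in_data, out_data = json.loads(in_line), json.loads(out_line)
        if in_data["custom_id"] != out_data["custom_id"]:
            raise ValueError("custom_id mismatch between input and output lines")
        yield in_data, out_data


def text_embedding_pairs(pairs: Iterable[tuple[dict, dict]]) -> Iterator[tuple[str, list]]:
    """Flatten each matched record pair into (text, embedding) tuples."""
    for in_data, out_data in pairs:
        texts = in_data["body"]["input"]
        embeddings = [d["embedding"] for d in out_data["response"]["body"]["data"]]
        yield from zip(texts, embeddings)


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of at most `size` items without materializing the whole stream."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk
```

Each stage is lazy, so `batched(text_embedding_pairs(paired_records(fin, fout)), 50)` reads the files line by line. What it does not give you is the left-to-right, declarative composition the Clojure example has.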

---

This is especially absurd now that everyone’s doing RAG

With Retrieval-Augmented Generation (RAG) becoming the norm, we’re all parsing files, extracting embeddings, enriching metadata, batching, and inserting into vector stores.

Why are we all writing the same low-level, ad-hoc code to do this?

Shouldn’t this entire category of work be addressed by a proper DSL or framework?

Wouldn’t it make sense to build...

- a functional DSL for JSON-to-embedding-to-API pipelines?
- or a Python library with proper map, filter, batch, pipe, sink semantics?
- or even a streaming runtime like Elixir Broadway or a minimal functional Rx-style graph?
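To make the second option concrete, here is a rough sketch of how `|`-style stages could be layered on plain generators. It is an assumption-laden illustration, not an existing library: the `Stage` class is hypothetical, and `where`, `select`, `batch`, and `foreach` simply borrow the names used in the pipeline example earlier in the post.

```python
from itertools import islice
from typing import Any, Callable, Iterable


class Stage:
    """Wraps a function over an iterable so that `source | stage` applies it."""

    def __init__(self, fn: Callable[[Iterable[Any]], Any]):
        self.fn = fn

    def __ror__(self, source: Iterable[Any]) -> Any:
        # Python falls back to __ror__ when the left operand cannot handle `|`.
        return self.fn(source)


def where(pred: Callable[[Any], bool]) -> Stage:
    """Keep only the items for which `pred` is true (lazy)."""
    return Stage(lambda items: (x for x in items if pred(x)))


def select(fn: Callable[[Any], Any]) -> Stage:
    """Transform each item with `fn` (lazy)."""
    return Stage(lambda items: (fn(x) for x in items))


def batch(size: int) -> Stage:
    """Group items into lists of at most `size` elements (lazy)."""
    def run(items: Iterable[Any]):
        it = iter(items)
        while chunk := list(islice(it, size)):
            yield chunk
    return Stage(run)


def foreach(sink: Callable[[Any], None]) -> Stage:
    """Terminal stage: push every item into `sink` and return the item count."""
    def run(items: Iterable[Any]) -> int:
        count = 0
        for x in items:
            sink(x)
            count += 1
        return count
    return Stage(run)


# Usage: a list's append method stands in for a real sink such as a vector DB client.
sent = []
n_batches = range(10) | where(lambda x: x % 2 == 0) | select(lambda x: x * x) | batch(2) | foreach(sent.append)
```

Nothing runs until `foreach` pulls items through the chain, so the intermediate stages stay streaming. Error handling, retries, and concurrency (the parts Broadway-style runtimes provide) are deliberately left out of this sketch.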

Even R with dplyr has more elegant ways to express transformation than what we do in Python for these jobs.

---

Am I missing something?

Is there a tool, a language, or a framework out there that actually solves this well?

Or is this just one of those gaps in the tooling ecosystem that no one has filled yet?

Ask HN: Any open source infrastructure projects under active development?

Hacker News - Sun, 04/13/2025 - 6:06pm

One thing that I really appreciate about the blockchain and AI communities is that there is a lot of open source activity in these spaces. It seems feasible to find an interesting project on GitHub, get a deep understanding of it, and take up an issue to start contributing to it.

I wonder if there are any good examples of open source projects in infrastructure that have a similar amount of activity? By "infrastructure" here, I mean things like pub-sub, event streaming, distributed coordination, and/or data distribution systems.

I've started to get the feeling that open source infrastructure is relatively static now given all of the work that went into it during the last two decades, so there isn't much going on outside of the largest companies that need to scale existing solutions.

Comments URL: https://news.ycombinator.com/item?id=43676206

Points: 1

# Comments: 1

Categories: Hacker News

Microsoft Prepares for New Round of Layoffs in May 2025

Hacker News - Sun, 04/13/2025 - 5:55pm

Article URL: https://www.thebridgechronicle.com/tech/microsoft-layoffs-may-2025

Comments URL: https://news.ycombinator.com/item?id=43676153

Points: 20

# Comments: 11

Categories: Hacker News

How addiction Hijacks the Teenage Brain

Hacker News - Sun, 04/13/2025 - 5:49pm

Article URL: https://www.rte.ie/lifestyle/living/2025/0409/1506653-dr-pennie-how-addiction-hijacks-the-teenage-brain/

Comments URL: https://news.ycombinator.com/item?id=43676112

Points: 1

# Comments: 0

Categories: Hacker News

Quick Primer on MCP Using Ollama and LangChain

Hacker News - Sun, 04/13/2025 - 5:43pm

Article URL: https://www.polarsparc.com/xhtml/MCP.html

Comments URL: https://news.ycombinator.com/item?id=43676084

Points: 2

# Comments: 0

Categories: Hacker News




Welcome to Joe Pearce's Home Page.

Web page offered by Joe Pearce © 2004 - 2025 - All rights reserved.

Thanks to the ETSU Computer and Information Sciences Department.

Thanks to the NSTCC Computer and Information Sciences and Computer Engineering Technologies Department.

