Hacker News
Detect control characters, quotes and backslashes efficiently using SWAR
Article URL: https://lemire.me/blog/2025/04/13/detect-control-characters-quotes-and-backslashes-efficiently-using-swar/
Comments URL: https://news.ycombinator.com/item?id=43676460
Points: 2
# Comments: 0
Show HN: H-1B salary search without fuss
The US Department of Labor releases base salary data for every H-1B visa application and has done so for the past 19 years. This data is often useful to people negotiating an offer, researching a company, or just curious about industry salaries.
h1bsalaries.fyi allows you to quickly search through this huge trove of salary data by company name, job title, and location without the obnoxious ads, pop-ups, and distracting elements that are characteristic of other similar websites.
In other words, just H-1B salary data. No bullsh*t.
Open to thoughts, comments, and feedback.
Comments URL: https://news.ycombinator.com/item?id=43676446
Points: 1
# Comments: 0
Ask HN: Can someone ELI5 how Google's A2A is different from MCP?
After reading some material about Google's new A2A, it still hasn't clicked for me: how is it different from MCP? Can someone please ELI5 with a good example?
In my mind, I am still designing things with MCP. Even the communication between two agents can be modeled as MCP with some capabilities.
Comments URL: https://news.ycombinator.com/item?id=43676444
Points: 1
# Comments: 0
Google Workspace adds new AI tools to Docs, Sheets, Chat and more
Article URL: https://blog.google/products/workspace/cloud-next-2025-workspace-gemini/
Comments URL: https://news.ycombinator.com/item?id=43676417
Points: 1
# Comments: 0
Make your code past tense
Article URL: https://github.com/rottytooth/PastTense
Comments URL: https://news.ycombinator.com/item?id=43676411
Points: 1
# Comments: 0
Introducing Presenter [video]
Article URL: https://www.youtube.com/watch?v=fNjdcSUFBJQ
Comments URL: https://news.ycombinator.com/item?id=43676400
Points: 1
# Comments: 0
Ask HN: Why don't we have a functional DSL for data+embedding+API pipelines?
I’ve been working on a pretty common problem:
- I have structured data in JSONL files (in.jsonl, out.jsonl)
- I match lines by a key
- I transform them into (text, embedding) pairs
- I optionally filter/map them
- I batch them (into chunks of 50)
- I push each batch into an external system (e.g. vector DB, Chroma)
That’s it. Sounds trivial. But it turns into ugly imperative Python code very quickly: nested for-loops, global indices, +=, manual batching, line-by-line handling, low-level JSON parsing.
Here’s what it usually looks like in Python:
```
with open("in.json", "r") as fin:
    with open("out.json", "r") as fout:
        for in_line, out_line in zip(fin, fout):
            in_data = json.loads(in_line)
            out_data = json.loads(out_line)
            if in_data["custom_id"] != out_data["custom_id"]:
                raise Exception...
            texts = in_data["body"]["input"]
            embeddings = [d["embedding"] for d in out_data["response"]["body"]["data"]]
            for i in range(len(texts)):
                doc = texts[i]
                emb = embeddings[i]
                metadata = {
                    "source": f"chunk-{global_ids}",
```
We’re in 2025, and this is how we’re wiring data into APIs.
---
Why do we tolerate this?
This is a declarative, streaming, data processing problem. Why aren’t we using something more elegant? Something more composable, like functional pipelines?
I'm asking myself: Why don’t we have a composable, streaming, functional DSL for this kind of task?
---
Why not build it like Unix pipes?
What I want is something that feels like:
  cat input.jsonl \
    | match output.jsonl on custom_id \
    | extract (text, embedding) \
    | filter not-empty \
    | batch 50 \
    | send-to-chroma
---
In Lisp / Clojure:
  (->> (zip input output)
       (filter (= :custom_id))
       (mapcat (fn [[in out]] (zip (:input in) (:embedding out))))
       (partition-all 50)
       (map send-to-chroma))
---
In Elixir + Broadway:
  Broadway
  |> read_stream("in.jsonl", "out.jsonl")
  |> match_on(:custom_id)
  |> map()
  |> batch_every(50)
  |> send_to_chroma()
---
And now, back to Python..
We’re stuck writing imperative soup or building hacky DSLs with things like:
  load_json_pairs() \
    | where(is_valid) \
    | select(to_embedding_record) \
    | batch(50) \
    | foreach(send_to_chroma)
...or, more realistically, writing thousands of lines of with open(...) as f.
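For reference, a rough sketch of how a `|`-style DSL like the one above can be wired up in plain Python. The stage names (where, select, batch, foreach) follow the pseudo-code in this post; the `Stage` class is a hypothetical helper, not an existing library, and the demo data is made up.

```python
from itertools import islice


class Stage:
    """Hypothetical helper: wraps an iterable -> result function so it composes with |."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, source):
        # Python falls back to __ror__ when the left operand does not handle |,
        # which is the case for ranges, lists, files and generators.
        return self.fn(source)


def where(pred):
    return Stage(lambda items: (x for x in items if pred(x)))


def select(fn):
    return Stage(lambda items: (fn(x) for x in items))


def batch(size):
    def chunks(items):
        it = iter(items)
        # Pull `size` items at a time until the source is exhausted.
        while chunk := list(islice(it, size)):
            yield chunk
    return Stage(chunks)


def foreach(action):
    def run(items):
        # Terminal stage: drains the stream and calls the sink on each item.
        for x in items:
            action(x)
    return Stage(run)


# Demo with made-up data; `sent.append` stands in for a real sink.
sent = []
range(1, 8) | where(lambda n: n % 2) | select(lambda n: n * 10) | batch(2) | foreach(sent.append)
print(sent)  # [[10, 30], [50, 70]]
```

Every stage is lazy except foreach, so nothing is read until the sink pulls. It works, but it is still operator-overloading trickery rather than a real pipeline abstraction, which is kind of my point.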
And even though libraries like tf.data.Dataset, dask.bag, pandas, or pipe exist, none of them really solve this use case in a cohesive and expressive way. They all focus on either tabular data, or big data, or ML input pipelines – not this "structured data -> transform -> push to API" pattern.
---
This is especially absurd now that everyone’s doing RAG
With Retrieval-Augmented Generation (RAG) becoming the norm, we’re all parsing files, extracting embeddings, enriching metadata, batching, and inserting into vector stores.
Why are we all writing the same low-level, ad-hoc code to do this?
Shouldn’t this entire category of work be addressed by a proper DSL or framework?
Wouldn’t it make sense to build...
- a functional DSL for JSON-to-embedding-to-API pipelines?
- or a Python library with proper map, filter, batch, pipe, sink semantics?
- or even a streaming runtime like Elixir Broadway or a minimal functional Rx-style graph?
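To make the second option concrete, here is a minimal sketch of those semantics using only standard-library generators. The field names (custom_id, body.input, response.body.data[].embedding) come from the snippet earlier in this post; the function names, the in-memory files, and the list used as a sink are assumptions for illustration only.

```python
import io
import json
from itertools import islice


def matched_pairs(fin, fout, key="custom_id"):
    """Yield (in_record, out_record) per line pair, checking that the keys agree."""
    for in_line, out_line in zip(fin, fout):
        in_rec, out_rec = json.loads(in_line), json.loads(out_line)
        if in_rec[key] != out_rec[key]:
            raise ValueError(f"key mismatch: {in_rec[key]!r} != {out_rec[key]!r}")
        yield in_rec, out_rec


def text_embedding_pairs(pairs):
    """Flatten each record pair into (text, embedding) tuples."""
    for in_rec, out_rec in pairs:
        texts = in_rec["body"]["input"]
        embeddings = [d["embedding"] for d in out_rec["response"]["body"]["data"]]
        yield from zip(texts, embeddings)


def batched(items, size):
    """Group a stream into lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


# In-memory stand-ins for in.jsonl / out.jsonl (made-up data).
fin = io.StringIO(json.dumps({"custom_id": "a", "body": {"input": ["x", "", "y"]}}) + "\n")
fout = io.StringIO(json.dumps({
    "custom_id": "a",
    "response": {"body": {"data": [{"embedding": [1.0]}, {"embedding": [2.0]}, {"embedding": [3.0]}]}},
}) + "\n")

pairs = text_embedding_pairs(matched_pairs(fin, fout))
non_empty = ((text, emb) for text, emb in pairs if text)  # the "filter not-empty" step
batches = list(batched(non_empty, 2))  # a real sink would push each batch to the vector DB
print(batches)  # [[('x', [1.0]), ('y', [3.0])]]
```

Each step streams one line at a time, so the whole thing stays lazy until the sink consumes it. What is missing is exactly the cohesive layer I am asking about: error handling, retries, and backpressure around the sink.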
Even R with dplyr has more elegant ways to express transformation than what we do in Python for these jobs.
---
Am I missing something?
Is there a tool, a language, or a framework out there that actually solves this well?
Or is this just one of those gaps in the tooling ecosystem that no one has filled yet?
Would love to hear what others are doing – and if anyone’s already working on a solution like this.
Thanks.
Comments URL: https://news.ycombinator.com/item?id=43676397
Points: 5
# Comments: 3
1 in Every 22 NYers Is a Millionaire
Article URL: https://secretnyc.co/nyc-wealthiest-city-in-the-world-2025/
Comments URL: https://news.ycombinator.com/item?id=43676389
Points: 4
# Comments: 0
Leonard vs. Pepsico, Inc
Article URL: https://en.wikipedia.org/wiki/Leonard_v._Pepsico,_Inc.
Comments URL: https://news.ycombinator.com/item?id=43676357
Points: 1
# Comments: 1
Five Takeaways from New Research About A.D.H.D
Article URL: https://www.nytimes.com/2025/04/13/magazine/adhd-children-research-takeaways.html
Comments URL: https://news.ycombinator.com/item?id=43676339
Points: 1
# Comments: 1
Venezuelan migrants' arrival in El Salvador: "They had no idea what was coming"
Article URL: https://www.cbsnews.com/news/photojournalist-witnesses-venezuelan-migrants-arrival-in-el-salvador-60-minutes/
Comments URL: https://news.ycombinator.com/item?id=43676323
Points: 3
# Comments: 0
How a Secretive Gambler Called 'The Joker' Took Down the Texas Lottery
You're Probably Breaking the Llama Community License
Article URL: https://notes.victor.earth/youre-probably-breaking-the-llama-community-license/
Comments URL: https://news.ycombinator.com/item?id=43676254
Points: 1
# Comments: 1
Distinguishing Human Journalists from AI Through Stylistic Fingerprints
Article URL: https://www.mdpi.com/2073-431X/13/12/328
Comments URL: https://news.ycombinator.com/item?id=43676241
Points: 1
# Comments: 0
The Flame Bearers (Short Story)
Article URL: https://brian.bearblog.dev/the-flame-bearers/
Comments URL: https://news.ycombinator.com/item?id=43676227
Points: 2
# Comments: 0
The US Is Turning a Blind Eye to Crypto Crimes
Article URL: https://www.wired.com/story/the-us-is-turning-a-blind-eye-to-crypto-crimes/
Comments URL: https://news.ycombinator.com/item?id=43676218
Points: 10
# Comments: 1
Ask HN: Any open source infrastructure projects under active development?
One thing that I really appreciate about the blockchain and AI communities is that there is a lot of open source activity in these spaces. It seems feasible to find an interesting project on GitHub, get a deep understanding of it, and take up an issue to start contributing to it.
I wonder if there are any good examples of open source projects in infrastructure that have a similar amount of activity? When I am talking about "infrastructure" here, I'm thinking of things like: pub-sub, event streaming, distributed coordination, and/or data distribution systems.
I've started to get the feeling that open source infrastructure is relatively static now given all of the work that went into it during the last two decades, so there isn't much going on outside of the largest companies that need to scale existing solutions.
Comments URL: https://news.ycombinator.com/item?id=43676206
Points: 1
# Comments: 1
Microsoft Prepares for New Round of Layoffs in May 2025
Article URL: https://www.thebridgechronicle.com/tech/microsoft-layoffs-may-2025
Comments URL: https://news.ycombinator.com/item?id=43676153
Points: 20
# Comments: 11
How Addiction Hijacks the Teenage Brain
Article URL: https://www.rte.ie/lifestyle/living/2025/0409/1506653-dr-pennie-how-addiction-hijacks-the-teenage-brain/
Comments URL: https://news.ycombinator.com/item?id=43676112
Points: 1
# Comments: 0
Quick Primer on MCP Using Ollama and LangChain
Article URL: https://www.polarsparc.com/xhtml/MCP.html
Comments URL: https://news.ycombinator.com/item?id=43676084
Points: 2
# Comments: 0