Hacker News

Which year: guess which year each photo was taken

Hacker News - Thu, 04/17/2025 - 6:42am

Article URL: https://whichyr.com/

Comments URL: https://news.ycombinator.com/item?id=43715024

Points: 3

# Comments: 0

Categories: Hacker News

Why I Support Privacy

Hacker News - Wed, 04/16/2025 - 1:23pm
Categories: Hacker News

Flatten Your Data

Hacker News - Wed, 04/16/2025 - 1:22pm
Categories: Hacker News

Ask HN: How do you raise your kids in the age of AI?

Hacker News - Wed, 04/16/2025 - 1:19pm

What do you tell your growing kids in this day and age? How do you give them some certainty in an age that lacks it?

Comments URL: https://news.ycombinator.com/item?id=43707939

Points: 2

# Comments: 4

Categories: Hacker News

How to align with user preference in a RAG system?

Hacker News - Wed, 04/16/2025 - 1:18pm

Current embedding-based RAG systems rely primarily on semantic similarity. Given a document and a query, the system usually retrieves multiple sections that appear semantically relevant. However, in domain-specific applications such as financial analysis or legal research, users often have their own preferences for which parts of a document to consult first. These preferences are usually driven by experience of where answers tend to be found or which sections are considered more trustworthy sources of information.

For example:

- When querying about financial performance metrics (e.g., earnings adjustments), experienced analysts typically look first at the Management’s Discussion and Analysis (MD&A) section or related financial statement footnotes.

- For questions about company risks, they usually prioritize the Risk Factors section before turning to broader disclosures.

These expert-driven navigation patterns are difficult to capture using embedding-based RAG alone. Fine-tuning embedding models to reflect such preferences is possible, but it tends to be costly and resource-intensive.

An alternative approach is to incorporate reasoning-based retrieval, which mimics how humans find information. For example, when reading a long document, a human typically starts by reviewing the table of contents to decide which sections to read first, based on the query and their own preferences. Similarly, one can build an LLM agent that analyzes the "table of contents" and then navigates through the document according to expert preferences. This can be achieved with few-shot prompting: sample user preference examples are provided in the prompt, and the system uses them to prioritize sections based on the user’s needs.
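
As a rough illustration of the few-shot idea, here is a minimal sketch of how such a prompt could be assembled. Everything in it is an assumption for illustration: the `PreferenceExample` class, the `build_navigation_prompt` function, the prompt wording, and the sample section titles are hypothetical, and the call to an actual LLM is left out.

```python
from dataclasses import dataclass


@dataclass
class PreferenceExample:
    """Hypothetical few-shot example: a query and the sections an expert reads first."""
    query: str
    preferred_sections: list[str]


def build_navigation_prompt(toc: list[str], examples: list[PreferenceExample], query: str) -> str:
    """Assemble a few-shot prompt asking an LLM which table-of-contents entries to read first."""
    lines = ["You are navigating a long document. Table of contents:"]
    lines += [f"{i}. {title}" for i, title in enumerate(toc, start=1)]
    lines.append("")
    lines.append("Examples of which sections experts consult first:")
    for ex in examples:
        lines.append(f"Query: {ex.query}")
        lines.append("Read first: " + "; ".join(ex.preferred_sections))
    lines.append("")
    # The prompt ends with an open slot for the model to complete.
    lines.append(f"Query: {query}")
    lines.append("Read first:")
    return "\n".join(lines)


# Illustrative inputs only; section titles are made up for this sketch.
toc = ["Business Overview", "Risk Factors", "MD&A", "Financial Statements", "Footnotes"]
examples = [
    PreferenceExample("How were earnings adjusted?", ["MD&A", "Footnotes"]),
    PreferenceExample("What are the main company risks?", ["Risk Factors"]),
]
prompt = build_navigation_prompt(toc, examples, "Why did operating margin change?")
print(prompt)
```

The resulting string would be sent to whichever LLM the agent uses; the model's answer then determines which sections are fetched and read next.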

To support this paradigm, we developed an open-source tool called PageIndex. It can transform any long document into an LLM-friendly "table-of-contents" tree index that LLM agents can navigate. With PageIndex, you can easily build RAG agents that align with user preferences and domain logic.
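
To make the notion of a "table-of-contents" tree concrete, here is a toy sketch of such a structure. It is not PageIndex's actual format or API: the `Node` class, `build_tree`, `render`, and the sample headings are all hypothetical, and the sketch assumes the headings with their nesting levels (1 = top level) have already been extracted from the document.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a section title plus its subsections."""
    title: str
    children: list["Node"] = field(default_factory=list)


def build_tree(headings: list[tuple[int, str]]) -> Node:
    """Build a tree from (level, title) pairs given in document order; levels start at 1."""
    root = Node("ROOT")
    stack = [(0, root)]  # (level, node) pairs along the path to the current section
    for level, title in headings:
        node = Node(title)
        # Pop back up until the top of the stack is a shallower section (the parent).
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((level, node))
    return root


def render(node: Node, depth: int = 0) -> list[str]:
    """Render the tree as an indented outline that could be pasted into a prompt."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + "- " + child.title)
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative headings only.
headings = [
    (1, "Risk Factors"),
    (1, "MD&A"),
    (2, "Results of Operations"),
    (2, "Liquidity"),
    (1, "Financial Statements"),
    (2, "Footnotes"),
]
tree = build_tree(headings)
outline = render(tree)
print("\n".join(outline))
```

An agent could show the model only the top level of such an outline first and expand a subtree when the model asks for it, which keeps the prompt short for very long documents.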

Would love any feedback, particularly thoughts on reasoning-based RAG or other potential applications of PageIndex.

Comments URL: https://news.ycombinator.com/item?id=43707928

Points: 7

# Comments: 1

Categories: Hacker News

Birth of Basic [video]

Hacker News - Wed, 04/16/2025 - 1:15pm
Categories: Hacker News
