Yoonchul Yi
โ† Back to daily insights

2026-03-19

📰 Daily Digest: 2026-03-19

4 items | AI, Business, Open Source


📋 Quick Summary

GPT 5.4 is a big step for Codex

Source: Interconnects (Substack) · Category: AI · Link: Original

  • Nathan Lambert argues agent evaluation should move beyond a single correctness score toward a 4-factor view: correctness, ease of use, speed, and cost.
  • In hands-on Codex usage (fast mode with high/xhigh effort), he describes GPT 5.4 as the first OpenAI agent that reliably handles messy end-to-end tasks with far fewer operational “hard edges.”
  • He also reports stronger context behavior and rate limits generous enough that he rarely hits them, while noting a weakness shared by GPT and Claude: occasional dropped TODO items in multi-task prompts.

์‹ค๋ฆฌ์ฝ˜๋ฐธ๋ฆฌ VC๊ฐ€ ์“ด VC์—๊ฒŒ ํˆฌ์ž๋ฐ›๋Š” ๋ฒ•

Source: Ian's Weekly Silicon Valley (이안의 주간실리콘밸리, Substack) · Category: Business · Link: Original

  • Ian Park frames fundraising as a game-theoretic equation of incentives and introduces a three-layer VC model plus a “genius vs. fool” psychology matrix.
  • The piece breaks down classic “2 and 20” math with concrete examples (e.g., a 10-billion (100억) fund, ~2 billion (20억) in fees over 10 years, ~8 billion (80억) of investable capital, and why many LPs seek ~3x outcomes).
  • Founder guidance focuses on selecting the right house/champion path by fund size, vintage year, check-size fit, and who can actually win internal IC debates.

Agent-to-Agent Communication Is Broken: Why an Email-like Inbox Model Works

Source: Medium · Category: AI · Link: Original

  • ⚠️ Fetch failed (403/security verification page).
  • The retrievable page only exposed Medium’s anti-bot interstitial (“Performing security verification”), not the article body.
  • Kept the original link for retry when direct content access is available.

Anthropic ‘81,000 people want from AI’: 669 classified quotes with occupation, country, region, topic, category, and sentiment

Source: GitHub Gist · Category: Open Source · Link: Original

  • The gist publishes a CSV of 669 classified quotes derived from Anthropic’s “81,000 people want from AI” material.
  • The file spans 84 countries and 13 world regions; top country counts include United States (148), South Korea (46), and Japan (43).
  • Sentiment labels split into light (375), shade (263), and mixed (31); frequent topics include Productivity (54), Learning & education (48), and Emotional support (40).

📝 Detailed Notes

1. GPT 5.4 is a big step for Codex

  1. The article argues current agent benchmarks underrepresent real-world usefulness.
    • Traditional leaderboards compress performance into one correctness score for interpretability.
    • Lambert says agent work quality depends on four axes: correctness, usability, speed, and cost.
    • He expects separate benchmark dimensions to mature over time rather than one scalar metric.
  2. Practical workflow reliability is presented as GPT 5.4’s main step-change.
    • The author describes prior OpenAI-agent usage as “death by a thousand cuts” in everyday operations.
    • He specifically calls out fewer failures around messy operational tasks such as packages, file ops, and git-adjacent flows.
    • In his Codex usage, GPT 5.4 fast mode plus higher reasoning effort feels robust across varied tasks.
  3. The write-up distinguishes model capability from harness experience and product feel.
    • Claude is characterized as warmer and more opinionated, which helps newcomers stick with the tool.
    • GPT 5.4 is described as meticulous and literal, better for tightly specified execution-heavy task lists.
    • The comparison frames two philosophies: intent-modeling assistant versus exact-instruction coordinator.
  4. Cost/rate-limit and context behavior are central to the author’s preference split.
    • He references paying for both Claude ($100/month) and ChatGPT ($200/month) plans in parallel.
    • He reports rarely reaching Codex limits in fast mode, while hitting Claude limits at times.
    • The post links this to reasoning efficiency and cites third-party CursorBench context for token-performance trade-offs.
  5. Remaining friction is small but non-trivial in multi-step prompting.
    • Both GPT 5.4 and Claude Opus 4.6 are reported to occasionally drop tasks from multi-TODO prompts.
    • Queueing extra follow-up messages during execution is described as risky outside simple cases.
    • Net assessment stays strongly positive: better “agentness” and usability, while model choice remains use-case and taste dependent.
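
The four-axis view from point 1 can be made concrete with a small sketch. This is only an illustration of keeping the axes separate instead of collapsing them into one scalar; it is not a method from Lambert's post. The `AgentScore` type, the `dominates` and `pareto_front` helpers, the agent names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScore:
    """One agent's results on the four axes (all values here are made up)."""
    name: str
    correctness: float  # higher is better
    ease_of_use: float  # higher is better
    speed: float        # higher is better
    cost: float         # lower is better

    def axes(self):
        # Negate cost so that "higher is better" holds on every axis.
        return (self.correctness, self.ease_of_use, self.speed, -self.cost)


def dominates(a, b):
    """True if a is at least as good as b on every axis and strictly better on one."""
    pairs = list(zip(a.axes(), b.axes()))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_front(scores):
    """Keep every agent that no other agent dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Hypothetical agents, not measurements of any real model.
agents = [
    AgentScore("agent_a", correctness=0.9, ease_of_use=0.6, speed=0.8, cost=2.0),
    AgentScore("agent_b", correctness=0.8, ease_of_use=0.9, speed=0.7, cost=1.0),
    AgentScore("agent_c", correctness=0.7, ease_of_use=0.5, speed=0.6, cost=3.0),
]
front = [s.name for s in pareto_front(agents)]
print(front)
```

In this made-up data, `agent_c` is worse than `agent_a` on every axis and drops out, while `agent_a` and `agent_b` both stay because each wins on a different axis. That is the sense in which model choice can remain use-case dependent even with clear measurements.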

2. ์‹ค๋ฆฌ์ฝ˜๋ฐธ๋ฆฌ VC๊ฐ€ ์“ด VC์—๊ฒŒ ํˆฌ์ž๋ฐ›๋Š” ๋ฒ•

  1. The core framing treats fundraising as an incentives equation, not a pitch-deck beauty contest.
    • Ian Park applies a utility-function lens from economics/game theory to founder–VC interactions.
    • He models VC behavior through three layers (fund economics, org-person incentives, IC politics).
    • He also introduces a psychological matrix around consensus safety versus contrarian “genius/fool” outcomes.
  2. Fund-level economics explain why VC behavior can look extreme from the outside.
    • The article uses the standard “2 and 20” structure: management fee and carried interest.
    • In the 100억 (10 billion) example, about 20억 goes to management fees over 10 years, leaving roughly 80억 for investment.
    • He highlights why LP expectations and dilution math push many funds toward power-law, home-run hunting.
  3. Carry design materially changes internal behavior and decision incentives.
    • Whole-fund carry aligns teammates around portfolio-level outcomes and shared upside.
    • Deal-by-deal carry can intensify internal competition because personal upside ties to individual deal wins.
    • The same valuation decision can be rational under one structure and irrational under the other.
  4. IC process is described as a political multiplayer game centered on a champion.
    • Founders present early, but internal debate is often mediated by the partner or investor champion.
    • The champion’s preparation, credibility, and coalition-building can decide whether objections are neutralized.
    • Park emphasizes this is not purely data adjudication; rhetoric, trust, and repeated-game dynamics matter.
  5. Founder strategy should optimize for fit and influence path before narrative polish.
    • He recommends screening target funds by size, vintage year, and expected check/exit math compatibility.
    • He advises prioritizing contacts who can actually carry the deal through IC, not just exploratory meetings.
    • He also notes first-contact path matters: direct partner access or strong non-VC intro can improve champion alignment.
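
The fee and carry arithmetic from points 2 and 3 can be sketched as follows. This is a simplified model under stated assumptions, not the article's own calculation: a flat 2% annual fee on committed capital for 10 years, 20% carry on profit, and no hurdle rate, fee step-down, recycling, or clawback. The function names and the two-deal example are hypothetical; amounts are in units of 억.

```python
def fund_economics(fund_size, fee_rate=0.02, years=10, carry_rate=0.20, lp_multiple=3.0):
    """Simplified 2-and-20 model: flat annual fee on committed capital, no hurdle."""
    total_fees = fund_size * fee_rate * years
    investable = fund_size - total_fees
    # LPs receive gross proceeds minus carry on fund-level profit:
    #   lp_proceeds = gross - carry_rate * (gross - fund_size)
    # Solve for the gross proceeds that hand LPs lp_multiple * fund_size.
    gross_needed = (lp_multiple * fund_size - carry_rate * fund_size) / (1 - carry_rate)
    return {
        "total_fees": total_fees,
        "investable": investable,
        "gross_needed": gross_needed,
        "multiple_on_invested": gross_needed / investable,
    }


def carry_paid(deals, carry_rate=0.20, whole_fund=True):
    """deals is a list of (invested, returned) pairs; fees and clawbacks are ignored."""
    if whole_fund:
        # Losses offset gains before any carry is paid.
        profit = sum(returned - invested for invested, returned in deals)
        return carry_rate * max(profit, 0)
    # Deal-by-deal: each winner pays carry on its own gain; losers pay nothing.
    return sum(carry_rate * max(returned - invested, 0) for invested, returned in deals)


econ = fund_economics(100)  # the 100억 fund from the article
print(econ)

# Hypothetical portfolio: one winner (10 in, 50 out) and one write-off (10 in, 0 out).
deals = [(10, 50), (10, 0)]
print(carry_paid(deals, whole_fund=True), carry_paid(deals, whole_fund=False))
```

Under these assumptions the 100억 fund pays 20억 in fees and invests 80억. Returning 3x to LPs requires gross proceeds of about 350억, or roughly 4.4x on the invested capital, which is one way to see why funds hunt for outsized winners. In the two-deal example, whole-fund carry is about 6 because the write-off offsets the winner, while deal-by-deal carry is about 8, so the same portfolio rewards the people involved differently depending on the structure.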

3. Anthropic ‘81,000 people want from AI’: 669 classified quotes with occupation, country, region, topic, category, and sentiment

  1. The gist packages a structured sample from a much larger response corpus.
    • The headline references 81,000 total responses while the shared CSV contains 669 labeled quotes.
    • The downloadable raw file is anthropic_81k_quotes.csv with one record per selected quote.
    • This turns narrative responses into a compact, queryable research artifact.
  2. Schema design supports cross-sectional analysis across demographics and themes.
    • Columns include occupation, country, world region, full quote text, topic, category, and sentiment.
    • The sample covers 84 countries and 13 world regions, enabling geographic comparisons.
    • Country frequency is concentrated, with the United States at 148 entries, followed by South Korea (46) and Japan (43).
  3. Sentiment labels show optimism-leading but mixed public perception.
    • The dataset contains 375 light, 263 shade, and 31 mixed entries.
    • That split suggests many respondents report benefits while a large minority emphasizes concerns.
    • The sentiment balance makes the file useful for studying coexistence of enthusiasm and anxiety narratives.
  4. Topic/category frequencies reveal where everyday value and risk perceptions cluster.
    • Top topics include Productivity (54), Learning & education (48), and Emotional support (40).
    • High-volume categories include Personal transformation (113) and Professional excellence (104).
    • Reliability & trust (56) and Job displacement (37) remain visible concern channels in the labeled data.
  5. Practical caveats matter when interpreting or reusing the sample.
    • Not every field is fully populated; occupation is blank in 154 rows.
    • It is a curated subset, so counts should be treated as directional within this file, not population estimates.
    • Still, the format is immediately usable for dashboards, qualitative coding, and prompt-grounded synthesis workflows.
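
The counts above are the kind of thing a few lines of standard-library Python could reproduce. The sketch below is hedged: the header names (`occupation`, `country`, `region`, `quote`, `topic`, `category`, `sentiment`) are assumptions and the real CSV may spell them differently. The sample rows, including their region names and quote text, are made-up placeholders, not data from the gist.

```python
import csv
import io
from collections import Counter

# Assumed header names; adjust to match the real file.
COUNT_COLUMNS = ("country", "sentiment", "topic")


def summarize(rows):
    """Count values per column, blank occupations, and sentiment shares."""
    rows = list(rows)
    counts = {
        col: Counter(r[col].strip() for r in rows if (r.get(col) or "").strip())
        for col in COUNT_COLUMNS
    }
    blank_occupation = sum(1 for r in rows if not (r.get("occupation") or "").strip())
    total = len(rows)
    shares = {k: v / total for k, v in counts["sentiment"].items()} if total else {}
    return {
        "rows": total,
        "counts": counts,
        "blank_occupation": blank_occupation,
        "sentiment_share": shares,
    }


# Made-up placeholder rows, only to show the shape of the data.
SAMPLE = """occupation,country,region,quote,topic,category,sentiment
Teacher,United States,North America,placeholder quote 1,Learning & education,Personal transformation,light
,South Korea,East Asia,placeholder quote 2,Productivity,Professional excellence,shade
Engineer,United States,North America,placeholder quote 3,Productivity,Professional excellence,light
Nurse,Japan,East Asia,placeholder quote 4,Emotional support,Personal transformation,mixed
"""

summary = summarize(csv.DictReader(io.StringIO(SAMPLE)))
print(summary["counts"]["country"].most_common(2))
print(summary["blank_occupation"], summary["sentiment_share"])

# Against the real file, the call would look like this:
# with open("anthropic_81k_quotes.csv", newline="", encoding="utf-8") as f:
#     summary = summarize(csv.DictReader(f))
```

For reference, the sentiment counts reported in the notes work out to about 56% light (375/669), 39% shade (263/669), and 5% mixed (31/669). As the caveats say, these are shares of a curated subset, not population estimates.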