2026-03-19

📰 Daily Digest — 2026-03-19

4 items | AI, Business, Open Source

📋 Quick Summary

GPT 5.4 is a big step for Codex

Source: Interconnects (Substack) · Category: AI · Link: Original

Nathan Lambert argues agent evaluation should move beyond a single correctness score toward a 4-factor view: correctness, ease of use, speed, and cost.
In hands-on Codex usage (fast mode with high/xhigh effort), he describes GPT 5.4 as the first OpenAI agent that reliably handles messy end-to-end tasks with far fewer operational “hard edges.”
He also reports stronger context behavior and rate limits that are generous relative to his own usage, while noting a weakness GPT and Claude share: occasionally dropping TODO items in multi-task prompts.

실리콘밸리 VC가 쓴 VC에게 투자받는 법

Source: 이안의 주간실리콘밸리 (Substack) · Category: Business · Link: Original

Ian Park frames fundraising as a game-theoretic equation of incentives and introduces a three-layer VC model plus a “genius vs. fool” psychology matrix.
The piece breaks down classic “2 and 20” math with concrete examples (e.g., a 100억 fund, ~20억 fees over 10 years, ~80억 investable capital, and why many LPs seek ~3x outcomes).
Founder guidance focuses on selecting the right house/champion path by fund size, vintage year, check-size fit, and who can actually win internal IC debates.

Agent-to-Agent Communication Is Broken: Why an Email-like Inbox Model Works

Source: Medium · Category: AI · Link: Original

⚠️ Fetch failed (403/security verification page).
The retrievable page only exposed Medium’s anti-bot interstitial (“Performing security verification”), not the article body.
Kept the original link for retry when direct content access is available.

Anthropic ‘81,000 people want from AI’ — 669 classified quotes with occupation, country, region, topic, category, and sentiment

Source: GitHub Gist · Category: Open Source · Link: Original

The gist publishes a CSV of 669 classified quotes derived from Anthropic’s “81,000 people want from AI” material.
The file spans 84 countries and 13 world regions; top country counts include United States (148), South Korea (46), and Japan (43).
Label distribution is light 375, shade 263, mixed 31, with frequent topics such as Productivity (54), Learning & education (48), and Emotional support (40).

📝 Detailed Notes

1. GPT 5.4 is a big step for Codex

The article argues current agent benchmarks underrepresent real-world usefulness.
- Traditional leaderboards compress performance into one correctness score for interpretability.
- Lambert says agent work quality depends on four axes: correctness, usability, speed, and cost.
- He expects separate benchmark dimensions to mature over time rather than one scalar metric.
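
One way to picture the four-axis view is a scorecard that keeps the axes separate instead of collapsing them into one number. The sketch below is only an illustration: the `AgentScore` type, its field names and units, and the example values are all hypothetical and do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScore:
    """One agent's results on the four axes (hypothetical units)."""
    correctness: float       # fraction of tasks solved, higher is better
    ease_of_use: float       # 0-1 usability rating, higher is better
    seconds_per_task: float  # lower is better
    dollars_per_task: float  # lower is better


def dominates(a: AgentScore, b: AgentScore) -> bool:
    """True if `a` is at least as good as `b` on every axis and differs on one."""
    at_least_as_good = (
        a.correctness >= b.correctness
        and a.ease_of_use >= b.ease_of_use
        and a.seconds_per_task <= b.seconds_per_task
        and a.dollars_per_task <= b.dollars_per_task
    )
    return at_least_as_good and a != b


# Made-up numbers: neither agent wins on every axis, so no single ranking exists.
agent_x = AgentScore(correctness=0.80, ease_of_use=0.6, seconds_per_task=40.0, dollars_per_task=0.50)
agent_y = AgentScore(correctness=0.75, ease_of_use=0.9, seconds_per_task=90.0, dollars_per_task=0.20)
print(dominates(agent_x, agent_y), dominates(agent_y, agent_x))
```

When neither agent dominates the other, the choice depends on which axes a given user weights most, which is the situation a single scalar score hides.
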
Practical workflow reliability is presented as GPT 5.4’s main step-change.
- The author describes prior OpenAI-agent usage as “death by a thousand cuts” in everyday operations.
- He specifically calls out fewer failures around messy operational tasks such as packages, file ops, and git-adjacent flows.
- In his Codex usage, GPT 5.4 fast mode plus higher reasoning effort feels robust across varied tasks.
The write-up distinguishes model capability from harness experience and product feel.
- Claude is characterized as warmer and more opinionated, which helps newcomers stick with the tool.
- GPT 5.4 is described as meticulous and literal, better for tightly specified execution-heavy task lists.
- The comparison frames two philosophies: intent-modeling assistant versus exact-instruction coordinator.
Cost/rate-limit and context behavior are central to the author’s preference split.
- He references paying for both Claude ($100/month) and ChatGPT ($200/month) plans in parallel.
- He reports rarely reaching Codex limits in fast mode, while hitting Claude limits at times.
- The post links this to reasoning efficiency and cites the third-party CursorBench as context for token-performance trade-offs.
Remaining friction is small but non-trivial in multi-step prompting.
- Both GPT 5.4 and Claude Opus 4.6 are reported to occasionally drop tasks from multi-TODO prompts.
- Queueing extra follow-up messages during execution is described as risky outside simple cases.
- Net assessment stays strongly positive: better “agentness” and usability, while model choice remains use-case and taste dependent.

2. 실리콘밸리 VC가 쓴 VC에게 투자받는 법

The core framing treats fundraising as an incentives equation, not a pitch-deck beauty contest.
- Ian Park applies a utility-function lens from economics/game theory to founder–VC interactions.
- He models VC behavior through three layers (fund economics, org-person incentives, IC politics).
- He also introduces a psychological matrix around consensus safety versus contrarian “genius/fool” outcomes.
Fund-level economics explain why VC behavior can look extreme from the outside.
- The article uses the standard “2 and 20” structure: management fee and carried interest.
- In the 100억 example, about 20억 is consumed as 10-year fees, leaving roughly 80억 for investment.
- He highlights why LP expectations and dilution math push many funds toward power-law, home-run hunting.
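
The arithmetic in the 100억 example can be reproduced in a few lines. This is a rough sketch under simplifying assumptions that are not stated in the digest: a flat 2% fee on committed capital for all 10 years, no hurdle rate, no fee recycling, and 20% carry charged only on profit above committed capital. The function name `two_and_twenty` and its parameters are illustrative.

```python
def two_and_twenty(fund_size, fee_rate=0.02, years=10, carry_rate=0.20, lp_multiple=3.0):
    """Back-of-the-envelope fund math. All amounts share one unit (here, 억 KRW).

    Assumptions (not from the article): a flat fee on committed capital every
    year, no hurdle rate, no fee recycling, carry only on profit above
    committed capital.
    """
    fees = fund_size * fee_rate * years   # total management fees
    investable = fund_size - fees         # capital left to invest
    lp_target = fund_size * lp_multiple   # what LPs want back, net of carry
    # LPs receive their capital back plus (1 - carry_rate) of the profit, so
    # solve fund_size + (1 - carry_rate) * (gross - fund_size) = lp_target.
    gross_needed = fund_size + (lp_target - fund_size) / (1 - carry_rate)
    return {
        "fees": fees,
        "investable": investable,
        "gross_needed": gross_needed,
        "multiple_on_invested": gross_needed / investable,
    }


result = two_and_twenty(100)
print(result)
```

For the 100억 fund this yields about 20억 in fees and 80억 investable. Under the stated assumptions, a 3x net return to LPs would require roughly 350억 in gross proceeds, or about 4.4x on the 80억 actually invested, which is one way to see why funds lean toward home-run outcomes.
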
Carry design materially changes internal behavior and decision incentives.
- Whole-fund carry aligns teammates around portfolio-level outcomes and shared upside.
- Deal-by-deal carry can intensify internal competition because personal upside ties to individual deal wins.
- The same valuation decision can be rational under one structure and irrational under the other.
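
A small worked example may help show how the two carry structures diverge. The sketch below assumes a simplified model with no hurdle rate and no clawback; the function names and the two-deal portfolio are hypothetical and not taken from the article.

```python
def whole_fund_carry(deal_profits, carry_rate=0.20):
    """Carry on the net profit of the whole portfolio (zero if the fund loses money)."""
    return carry_rate * max(sum(deal_profits), 0.0)


def deal_by_deal_carry(deal_profits, carry_rate=0.20):
    """Carry on each profitable deal separately; losses do not offset gains
    (no clawback assumed)."""
    return carry_rate * sum(p for p in deal_profits if p > 0)


# Hypothetical portfolio: one deal gains 50, another loses 30 (same unit).
profits = [50.0, -30.0]
print(whole_fund_carry(profits))    # carry on the net gain of 20
print(deal_by_deal_carry(profits))  # carry on the 50 winner alone
```

In this toy case the deal-by-deal structure pays carry on the winner regardless of the loser, so an individual investor's payoff depends on their own deal rather than on the portfolio as a whole.
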
IC process is described as a political multiplayer game centered on a champion.
- Founders present early, but internal debate is often mediated by the partner or investor champion.
- The champion’s preparation, credibility, and coalition-building can decide whether objections are neutralized.
- Park emphasizes this is not purely data adjudication; rhetoric, trust, and repeated-game dynamics matter.
Founder strategy should optimize for fit and influence path before narrative polish.
- He recommends screening target funds by size, vintage stage, and expected check/exit math compatibility.
- He advises prioritizing contacts who can actually carry the deal through IC, not just exploratory meetings.
- He also notes first-contact path matters: direct partner access or strong non-VC intro can improve champion alignment.

3. Anthropic ‘81,000 people want from AI’ — 669 classified quotes with occupation, country, region, topic, category, and sentiment

The gist packages a structured sample from a much larger response corpus.
- The headline references 81,000 total responses while the shared CSV contains 669 labeled quotes.
- The downloadable raw file is anthropic_81k_quotes.csv with one record per selected quote.
- This turns narrative responses into a compact, queryable research artifact.
Schema design supports cross-sectional analysis across demographics and themes.
- Columns include occupation, country, world region, full quote text, topic, category, and sentiment.
- The sample covers 84 countries and 13 world regions, enabling geographic comparisons.
- Country frequency is concentrated, with the United States at 148 entries, followed by South Korea (46) and Japan (43).
Sentiment labels lean optimistic, but overall perception is mixed.
- The dataset contains 375 light, 263 shade, and 31 mixed entries.
- That split suggests many respondents report benefits while a large minority emphasizes concerns.
- The sentiment balance makes the file useful for studying coexistence of enthusiasm and anxiety narratives.
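
The split is easier to read as shares of the 669 quotes. The snippet below only does arithmetic on the three counts reported above; the helper name `sentiment_shares` is illustrative.

```python
def sentiment_shares(counts):
    """Return each label's share of the total as a percentage."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Counts as reported for the gist's CSV.
counts = {"light": 375, "shade": 263, "mixed": 31}
shares = sentiment_shares(counts)
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

The three counts sum to 669, and the shares come out to roughly 56% light, 39% shade, and 5% mixed.
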
Topic/category frequencies reveal where everyday value and risk perceptions cluster.
- Top topics include Productivity (54), Learning & education (48), and Emotional support (40).
- High-volume categories include Personal transformation (113) and Professional excellence (104).
- Reliability & trust (56) and Job displacement (37) remain visible areas of concern in the labeled data.
Practical caveats matter when interpreting or reusing the sample.
- Not every field is fully populated; occupation is blank in 154 rows.
- It is a curated subset, so counts should be treated as directional within this file, not population estimates.
- Still, the format is immediately usable for dashboards, qualitative coding, and prompt-grounded synthesis workflows.
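
As a starting point for reuse, the file could be summarized with the standard library alone. This is a minimal sketch: the exact header names are not given here, so `sentiment` and `occupation` are assumed column names, and the in-memory sample rows are placeholders rather than real data from the gist.

```python
import csv
import io
from collections import Counter


def summarize(csv_file, sentiment_col="sentiment", occupation_col="occupation"):
    """Count sentiment labels and rows with a blank occupation.

    The column names are assumptions; adjust them to the real CSV header.
    """
    sentiments = Counter()
    blank_occupations = 0
    for row in csv.DictReader(csv_file):
        sentiments[row[sentiment_col]] += 1
        if not (row.get(occupation_col) or "").strip():
            blank_occupations += 1
    return sentiments, blank_occupations


# Placeholder rows for illustration only (not taken from the dataset).
sample = io.StringIO(
    "occupation,country,sentiment\n"
    "teacher,Exampleland,light\n"
    ",Exampleland,shade\n"
    "nurse,Otherland,light\n"
)
sentiments, blanks = summarize(sample)
print(sentiments, blanks)
```

To run it against the real file, open it with `open("anthropic_81k_quotes.csv", newline="", encoding="utf-8")` and pass the file object to `summarize`, after checking the header names against the assumed ones.
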

📰 Daily Digest: 2026-03-19

4 items | AI, Business, Open Source

📋 Quick Summary

GPT 5.4 is a big step for Codex

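Source: Interconnects (Substack) · Category: AI · Link: Original

Nathan Lambert argues that agents should be evaluated along four axes (correctness, usability, speed, and cost) rather than by a single accuracy score.
In hands-on Codex use (fast mode with high/xhigh effort), he rates GPT 5.4 as the first OpenAI agent to reliably handle a wide range of practical tasks.
He adds that context handling and usage limits have improved, but that GPT and Claude still share a weakness: both drop some tasks in multi-TODO prompts.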

실리콘밸리 VC가 쓴 VC에게 투자받는 법

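Source: 이안의 주간실리콘밸리 (Substack) · Category: Business · Link: Original

Ian Park treats fundraising less as a “pitching skill” and more as a game-theoretic equation for solving incentives, and presents a three-layer VC structure and a “genius/fool” psychology matrix.
The piece explains the “2 and 20” structure with numbers (e.g., a 100억 fund, roughly 20억 in management fees over 10 years, roughly 80억 of investable capital, and LPs’ 3x expectations) and dissects the background to VC decision-making.
As practical strategy for founders, it stresses fit on fund size, vintage, and check size, and choosing a champion path that can actually get the deal through IC.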

Agent-to-Agent Communication Is Broken: Why an Email-like Inbox Model Works

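Source: Medium · Category: AI · Link: Original

⚠️ Fetch failed (403/security verification page).
The retrievable page exposed only Medium’s anti-bot interstitial (“Performing security verification”), so the article body could not be checked.
The original link is kept for a later retry.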

Anthropic ‘81,000 people want from AI’ — 669 classified quotes with occupation, country, region, topic, category, and sentiment

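Source: GitHub Gist · Category: Open Source · Link: Original

This gist publishes a CSV of 669 classified quotes based on Anthropic’s “81,000 people want from AI” material.
The data covers 84 countries and 13 regions; the top countries are the United States (148), South Korea (46), and Japan (43).
The sentiment labels split into light 375, shade 263, and mixed 31, and the most frequent topics are Productivity (54), Learning & education (48), and Emotional support (40).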

📝 Detailed Notes

1. GPT 5.4 is a big step for Codex

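The article’s central concern is that existing agent benchmarks do not adequately reflect real-world value.
- Existing leaderboards compress performance into a single accuracy score for ease of interpretation.
- Lambert holds that agent quality is determined along four axes: correctness, usability, speed, and cost.
- He therefore expects multidimensional metrics, rather than a single score, to be needed going forward.
He names improved reliability in practical workflows as GPT 5.4’s biggest change.
- The author describes his past experience with OpenAI agents as “death by a thousand cuts.”
- He explains that failures have decreased on operational tasks such as package installation, file operations, and git-adjacent work.
- He judges the combination of fast mode and high reasoning effort in Codex to have been stable across varied tasks.
It also matters that the comparison separates model capability from harness and product experience.
- He sees Claude as warmer and more opinionated, which lowers the barrier to entry for new users.
- He says GPT 5.4 is more mechanical and faithful to instructions, which makes it strong on clearly specified execution tasks.
- The result is a difference in philosophy: an “intent-interpreting helper” versus an “exact-instruction executor.”
Cost, limits, and context management also weigh heavily on the actual choice.
- The author says he subscribes to both Claude ($100/month) and ChatGPT ($200/month).
- By his own usage, he almost never reaches Codex limits, while he occasionally hits Claude limits.
- He ties this to reasoning efficiency and cites CursorBench as context for the token-performance trade-off.
The remaining friction is small but still meaningful in multi-step prompting.
- He says both GPT 5.4 and Claude Opus 4.6 omit some tasks in multi-TODO instructions.
- He writes that stacking extra messages during execution is risky except in simple cases.
- Even so, the overall verdict is positive, and he concludes that model choice is a function of use case and user taste.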

2. 실리콘밸리 VC가 쓴 VC에게 투자받는 법

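This piece reads fundraising as an incentives equation rather than a matter of pitch aesthetics.
- Ian Park explains founder-VC interaction through the utility-function lens of economics and game theory.
- He models VC behavior as a three-layer structure: fund economics, organization-individual incentives, and IC politics.
- On top of this he overlays a “genius/fool” psychological axis of consensus safety versus contrarian bets.
It explains why fund-level economics make VC behavior look extreme.
- The piece takes the standard “2 and 20” structure (management fee plus carry) as its baseline.
- In the 100억 example, roughly 20억 in management fees over 10 years comes out first, leaving about 80억 of investable capital.
- He argues that LP return expectations and dilution force a home-run-centered, power-law strategy.
Differences in carry design change internal behavior and decision-making in practice.
- Whole-fund carry aligns portfolio-level performance more closely with the team’s upside.
- Deal-by-deal carry ties rewards directly to individual deal performance, which can increase internal competition.
- He points out that the same valuation decision can be judged rational or irrational depending on the carry structure.
IC is portrayed as a political multiplayer game centered on the champion.
- Founders present early on, but the internal debate is mostly carried out on their behalf by the champion investor.
- The power to overcome objections depends heavily on the champion’s preparation, credibility, and ability to build coalitions.
- He therefore sees IC not as a review of data alone but as an arena where rhetoric, trust, and repeated-game dynamics are at work.
For founders, optimizing fit and the path of influence comes before the story.
- He recommends first screening target funds by size, vintage stage, and check-size/exit fit.
- He advises prioritizing contact points and people who can actually get the deal through IC over exploratory meetings.
- He stresses that the quality of the first-contact path (direct partner contact, a strong non-VC introduction) determines champion alignment.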

3. Anthropic ‘81,000 people want from AI’ — 669 classified quotes with occupation, country, region, topic, category, and sentiment

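This gist extracts and provides a structured sample from a large body of responses.
- The headline mentions 81,000 total responses, and the published CSV contains 669 labeled quotes.
- The original download is named anthropic_81k_quotes.csv, with one quote recorded per row.
- The key point is that it turns free-text responses into research data that can be queried immediately.
The schema design enables cross-analysis by demographics and theme.
- The columns include occupation, country, region, quote text, topic, category, and sentiment label.
- The sample covers 84 countries and 13 regions, allowing regional comparison.
- By country, the United States is highest with 148 entries, followed by South Korea with 46 and Japan with 43.
The sentiment labels show that optimism leads but ambivalent perceptions coexist.
- The distribution is light 375, shade 263, mixed 31.
- In other words, responses reporting benefits are the majority, but narratives of concern are also present in large proportion.
- This balance makes the file suited to analysis that tracks expectation and anxiety together.
Topic and category frequencies show where people perceive value and risk.
- The top topics are Productivity (54), Learning & education (48), and Emotional support (40).
- The top categories are Personal transformation (113) and Professional excellence (104).
- Reliability & trust (56) and Job displacement (37) also appear clearly as axes of concern.
The caveats for interpretation on reuse are also clear.
- Not every field is complete; the occupation column is blank in 154 rows.
- The published file is a curated subset, so its figures should not be generalized as population estimates.
- Even so, the format can be used directly in dashboards, qualitative coding, and prompt-based synthesis pipelines.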