Is AGI really a milestone? Is ChatGPT really killing the planet? Huawei as hydra. Welcome to the future with a stagnant economy. Music & your brain. Which AI-powered deep research tool to use? 1980s synth techno musical coda.
The Distilled Spirit
Author’s note: We passed 250 subscribers! Welcome to the burst of recent subscribers, thank you for signing up. Come for the NotebookLM, stay for the Deep Research and more. Now on with today’s program.
Speculating About AI
🚀 Battlefield AI & Risk Tolerance (The Daily Upside)
The pressure to deploy AI before the other guy is bumping up against human capacity and management structures. What will shake out as the defense industry is disrupted?
🚵♂️ Why AGI Is Not A Milestone
What if AGI was just another click on the odometer, not some massive achievement that will be clear and dramatic? Diffusion and adoption will take time.
💬 Why ChatGPT Became Sycophantic
ChatGPT got so friendly OpenAI had to roll back the changes. What if this was a very intentional change revealing OpenAI’s goals?
🌳 Using ChatGPT Is Not Killing the Planet
A cheat sheet for refuting claims that ChatGPT is bad for the environment.
👷♀️ The Great Dislocation Will Be Painful
If AI can become better, faster, and cheaper than many workers, we will experience a painful dislocation as a society.
International
🐍 Huawei the Hydra
Huawei is a multi-headed hydra at the center of China’s industrial policy.
How China conquered the shipbuilding industry and how the US can take it back.
⚡ El Blackout
Frequency matters on a power grid and losing a lot of solar breaks that balance.
💃 Power Outage Street Party (Fast Company)
Mass power outage feels like the world is ending but Spain knows how to party.
Business & Economics
🍔 New Potential for Abundance
AI can drive down labor costs, solar can drive down power costs and bioengineering can do amazing things to power an abundant future if we let it.
🚫 Tariffs Are Taking Effect
Core GDP growth seems to have held on as other metrics went wild with pre-tariff stockpiling.
🛑 Economy Stuck In The Tracks
One review of the April economic data.
🤖 Welcome to the Future
We are living in the cyberpunk future that just leads to another future.
Interesting
🧠 deepculture
Many of the interesting articles you find here come from deepculture. It is a veritable gold mine of ideas. This is the other newsletter you should read on Tuesdays.
💵 Make Sure Your Little Girls Bet Too
Women are not worse investors than men; they are just brought up to be more risk averse.
💊 Texan King of MDMA (Texas Monthly)
Techno grew up in Dallas. So did the MDMA business.
🎧 Music & Your Brain (Fast Company)
Use music intentionally to improve your flow state.
🏹 Why Archers Didn’t Volley Fire (ACOUP)
Learn why real archers loosed arrows at will rather than in the coordinated volleys you see in the movies.
📷 Art Deco Tour (Wallpaper*)
Explore some lesser known Art Deco marvels as the style turns 100.
Deep Research Drag Race
Chatting with AI is fun, but using it to scan sources, plan research, and deliver meaningful results is where its real power lies. In recent months, all of the major AI tools have rolled out just such a feature built on their models and tooling. Deep Research is present in ChatGPT, Claude, Gemini, Grok, and Perplexity, and in each case it is explicitly engaged via a button. I tested each with two questions to see which tool worked best. The short version: implementation details matter more than the model itself, and Gemini is the best overall choice. The long version is a bit more nuanced and interesting.
Test Protocol
I asked two questions of my AI assistants. The first was a history question:
What happened at the Battle of the Golden Spurs? Why did it happen? What factors led to it being fought there?
What happened during the battle? Can you create a graphic showing the action on that day hour by hour?
This was meant to test general research ability, the ability to pull together a graphic from text sources, and the ability to weave a coherent story about a somewhat documented but fairly confusing event with nebulous causes and effects.
The second question was quite a bit more developed. I used ChatGPT to build it out, but the basic concept is: “Do a deep dive on $STOCK and suggest a few options strategies.” The idea was to have it look at current information and make judgments based on a wide set of data. You can see the full prompt here.
For each question, I looked at a few factors to determine what tool was best:
How long did the tool take to do the research?
How many sources were consulted? What is the quality of these sources?
How well did the tool follow the instructions?
How well did the tool answer the question?
How could I get the answer out of the tool? What can I do with the output?
Read on to see how each of these tools fared in this test and to learn which one to use where.
ChatGPT
OpenAI was unsurprisingly the first to launch a deep research tool. ChatGPT is widely considered strong, especially when paired with the more powerful o3 model. I was excited to see how well it fared.
ChatGPT was the slowest of the tools to get to output. To its credit, it was also the best at asking questions to clarify the request. In action, it is interesting to watch — there is a lot of visible thinking as the minutes tick past. Somewhat surprisingly, ChatGPT looked at comparatively few sources, though the ones it picked were high-quality enough to make up for it.
o3 was a tale of two outputs. The battle question was really well done — the answer was well written and well researched. The stock report, however, had at least one huge problem: after a very long research period of 33 minutes, the tool had a completely wrong price range for Palantir, thinking it was a $10-15 stock these days. Overall, the stock output leaned on a lot of 2023 data, which lines up with ChatGPT's built-in knowledge cut-off.
ChatGPT falls down as a tool on two major fronts: supply and extraction. Most paid plans get only 10 Deep Research queries per month, plus another 15 paid queries, making the tool dear. Beyond that, almost every other tool offers much better ways to get the long deep-research output out — copy and paste works, but there are definitely better options elsewhere.
✔Ratings
Speed: Slow, bordering on glacial.
Sources: Few but good quality. Answers reflected depth of research.
Instructions: Very well. Bonus points for clarification.
Answers: In-depth answers to questions, but it may not be so good at current events.
Tooling: Expensive, not great outside of copy and paste.
📃Use ChatGPT When
Use ChatGPT if you're doing long-form historical analysis, need deep synthesis, and have time to wait. Its answers are great but come at a cost of speed, price, and usability. The tool is slow, access is limited, and output extraction options are weak.
Claude
Anthropic is pushing hard into product. They are doing amazing things with MCP. Deep Research is another recent addition to the tool. Like most other things Anthropic does, it is pretty impressively implemented and worth a strong look.
Claude was noticeably faster than ChatGPT, even though there are reports of the tool taking 45 minutes. The number of sources struck an interesting balance — relatively few for the Golden Spurs question, but many rocks turned over for the stocks question. Overall, it did not seem to miss any important sources.
Claude’s responses were accurate and generally followed instructions well. The answers were well thought out, if a bit brief and to the point. It did not have OpenAI’s level of detail, but it did not need it.
Tooling-wise, Claude was pretty strong. You can use all of the platform’s features with your query, including MCPs. I did not get much chance to try this, but it could have some very interesting uses in combining deep research with your data. Claude also provides a PDF export for convenience.
✔Ratings
Speed: Not too fast, not too slow.
Sources: Few for the medieval battle, more for the stocks question. Good enough.
Instructions: Good job responding to the prompt.
Answers: Adequate, if brief.
Tooling: Integrations are interesting and potentially very powerful.
📃Use Claude When
Try Claude if you value speed, clean answers, and tight document integration — particularly if you want to leverage platform features like MCP integration and styles. A solid choice, with good research and good answers overall.
Gemini
Google was the second kid on the block with Deep Research. The feature was recently upgraded to use Gemini 2.5 Pro, which might be the best LLM on the market. Combine that with Google’s strong background in web crawling, and Gemini looks like a strong contender for best of breed.
Speed has long been an obsession at Google. Gemini holds its own here — it returned results in relatively short order. It used a decent number of sources for each query, clearly enough to come up with more than adequate answers.
Output was great — the answers were well written, well formatted, and included lots of helpful tables. The stock report was especially strong and full of quantitative analysis the other tools did not attempt. Listing the sources and including the model’s thoughts was also a very nice touch from Google.
Gemini’s tools are the best of the bunch. Google has ported in the audio overview feature from NotebookLM. You can export directly to a Google Doc or a Gmail, an amazing feature if you are using that platform for work.
✔Ratings
Speed: Pretty quick all things considered.
Sources: Good, and good pedigree.
Instructions: Followed them well.
Answers: Well written and well researched.
Tooling: Very good with features like audio overviews and Google integration.
⭐ Use Gemini When ⭐
Ideal for analysts working in Google Docs or Slides. Gemini is an amazing choice if you do a research-centric job inside Google Workspace. Beyond that, I would strongly recommend it for its speed, accuracy, and audio overviews.
Grok
One of Grok’s headline features is access to the real-time data in X for research. I thought it would be worth including since it is a free option.
Grok was impressively fast to do research and come up with answers. It did not use many sources for the queries.
Answers were a bit on the brief side. The output was somewhat superficial, especially on the history question, but that fits Grok’s billing. The stock answer was very sharp, perhaps sharper than the rest. On formatting, the tool ignored what I asked for on the stock side, but its chosen format made a lot of sense and I was not unhappy with it.
For tooling, everything is embedded in the X app, and there are not many good ways to get your content off of X save copy and paste.
✔Ratings
Speed: Very fast.
Sources: Has access to X’s graph, seems better at more real-time things.
Instructions: Has its own mind. Go figure.
Answers: Brief but well researched given the source note.
Tooling: Too embedded in X for my tastes.
📃Use Grok When
Use Grok for social media or trending-topic snapshots. Grok is fast and has some access to the real-time information the platform is known for, so it probably has some superpowers when used there. And the price is right.
Perplexity
Perplexity is different from all of the other tools listed — it does not make an LLM; it provides tooling that marries real-time information to your queries. To some extent that gives it an advantage here, as it has been building a deep research–style tool longer than the other companies listed.
That experience showed in the speed department: Perplexity was really fast at finding and processing information. Depth was great too, especially because you can explicitly tell the tool to include academic and social media sources.
Answers were a bit brief and very fact-based. Perplexity no longer discloses which LLM it uses, but the other options appeared to produce better output from a depth and readability perspective. That said, as a quicker fact sheet, Perplexity was more than adequate, and it did a good job following prescribed output formats.
Perplexity shines in the tooling department. Source options, as mentioned, are great. On the output side, you can easily export the content, and you can leverage the platform’s robust sharing features, like creating a nicely formatted page for distribution.
✔Ratings
Speed: Fast.
Sources: Great, especially given you can include academic and social media specifically.
Instructions: Follows them well enough.
Answers: A bit on the brief and fact-focused side.
Tooling: Great export and sharing options.
📃Use Perplexity When
Perplexity is fast and has really good sources — great for quicker fact-based checks that do not need a lot of nuance. The academic and social media checkboxes are nice, and the sharing tooling is great.
Final Verdict: Gemini 👆 ChatGPT 👇
If you're exploring AI for research, you're spoiled for choice. Each tool has a niche, and none are unusable. Gemini stands out for its integration, speed, and export features, especially if you're deep in the Google ecosystem. Grok and Perplexity stand out for speed and real-time information. Claude is a solid choice, especially as MCP takes hold. In a rare loss, ChatGPT brings up the rear — at least today; if past is prologue, this will change with a new release sometime next week.
Musical Coda
The Look
There will never be another Jamie Vardy.
tells you about his amazing career.

Did you enjoy reading this post? Hit the ♥ button above or below — it helps more people discover great Substacks like this one and it helps train your algorithm to get you more posts you like. Please share here or in your networks to help us grow!