How to Write Content That AI Agents Cite
2026-05-29 · 14 min read
You wrote a great article. Deep research. Clear structure. Useful to real humans. And when someone asks ChatGPT the same question your article answers, it cites someone else.
This happens constantly. And the reason is not that the other article was better written. The reason is structural. AI agents do not cite content the way humans cite content. They do not browse, compare, and pick the most eloquent source. They resolve entities, match structured data, and pull from the sources that are easiest to verify programmatically.
Good writing is necessary. But it is not sufficient. What makes content citable to an AI agent is a specific combination of structure, schema, cross-referencing, and original data that most content creators never think about.
I know this because I build these systems for a living. Across three companies, I have had to make our content visible not just to Google, but to the AI layer that now sits on top of it. The patterns are consistent. And they are not what most "AI SEO" guides tell you.
Why AI agents cite differently than search engines rank
A search engine ranks pages. It looks at backlinks, keyword density, domain authority, page speed, and about two hundred other signals. Then it orders a list. You click a blue link. The search engine's job is done.
An AI agent does something different. It needs to generate a confident answer. To do that, it has to resolve facts. And to cite a source alongside that answer, it needs to trust the source at a structural level.
As I covered in How AI Training Data Decides Who Gets Cited, what matters first is whether your content existed in training data at all. But for models with retrieval-augmented generation (RAG), real-time browsing, or knowledge graph access, what matters is whether your content is structurally parseable and verifiable.
71% of pages cited by ChatGPT use schema markup [1]. That is not a coincidence. It is architecture.
The anatomy of citable content
Here is the full picture. This is the structure that gets cited, from the data layer up to the content layer.
The layers stack in this order, each feeding the next:
- JSON-LD schema layer: Article, Person, Organization, FAQPage, sameAs properties
- Entity resolution layer: author linked to Wikidata/ORCID, org linked to official registries
- Content structure layer: semantic HTML with h1-h3 hierarchy, lists, tables, clear sections
- Original data layer: first-party data, methodology, named frameworks, unique findings
- Cross-reference layer: citations to authoritative sources, backlinks from institutions
- AI citation output: confident answer plus source attribution
Each layer is necessary. Skip one, and the system weakens. Let me walk through each.
Layer 1: JSON-LD schema markup
Schema markup is the single highest-leverage thing you can do for AI citability. It is metadata that tells machines what your content means, not just what it says.
At minimum, every article page needs:
- Article schema with author, datePublished, dateModified, publisher, headline, and description
- Person schema for the author with sameAs links to LinkedIn, Wikidata, ORCID, or other authority profiles
- Organization schema for the publisher with sameAs links to official registries and social profiles
- FAQPage schema for any FAQ section (this is what AI agents love most for direct answers)
The format matters. Google recommends JSON-LD, and AI systems prefer it too because it separates structured data from HTML. The machine can read the JSON-LD block without parsing your CSS, your layout, or your decorative elements.
Here is the thing most guides skip: the sameAs property is not optional for AI citation. It is the mechanism by which an AI agent confirms that "Ibrahim Anwar the author" is the same entity as "Ibrahim Anwar on LinkedIn" and "Ibrahim Anwar in Wikidata." Without it, you are just a string of characters. With it, you are a resolved entity.
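To make the minimum concrete, here is a small Python sketch that assembles an Article block with a nested Person and Organization and wraps it in a JSON-LD script tag. It is an illustration under assumptions, not a prescribed implementation: the function names, the publisher name, and every URL are hypothetical placeholders, and only the schema.org property names come from the list above.

```python
import json

def build_article_jsonld(headline, description, published, modified,
                         author_name, author_same_as,
                         publisher_name, publisher_same_as):
    """Return a schema.org Article as a plain dict ready for JSON-LD output."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "description": description,
        "datePublished": published,
        "dateModified": modified,
        "author": {
            "@type": "Person",
            "name": author_name,
            # sameAs ties the author string to external profiles.
            "sameAs": list(author_same_as),
        },
        "publisher": {
            "@type": "Organization",
            "name": publisher_name,
            "sameAs": list(publisher_same_as),
        },
    }

def to_script_tag(data):
    """Serialize a dict into a <script type="application/ld+json"> element."""
    body = json.dumps(data, indent=2)
    # Escape "</" so a string value can never close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

# All values below are placeholders for illustration only.
article = build_article_jsonld(
    headline="How to Write Content That AI Agents Cite",
    description="Why structure, schema, and original data drive AI citations.",
    published="2026-05-29",
    modified="2026-05-29",
    author_name="Ibrahim Anwar",
    author_same_as=[
        "https://www.linkedin.com/in/example-author",
        "https://www.wikidata.org/wiki/Q0",
    ],
    publisher_name="Example Publisher",
    publisher_same_as=["https://registry.example/org/123"],
)

print(to_script_tag(article))
```

Generating the block from one function keeps the author and publisher data identical on every page, which is the consistency the sameAs mechanism depends on.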
Layer 2: Entity resolution
This is where most content fails. The author is not connected to anything verifiable. The organization has no structured presence outside the website itself.
AI systems are built on knowledge graphs. Google's Knowledge Graph contains over 500 billion facts about 5 billion entities [2]. When your content's author and publisher are nodes in that graph, connected to other verified nodes, the AI can trust the source. When they are not, the AI has to guess. And AI systems that guess do not cite. They paraphrase.
Practical steps:
- Create or claim your Wikidata entry (yes, individuals can have them if notable)
- Ensure your Google Business Profile and LinkedIn match your schema exactly
- Use rel="me" links across all your profiles to close the identity loop
- Publish an author page on your site with structured Person schema that links everywhere
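One way to check whether the identity loop is actually closed is to compare the rel="me" links on your author page against the sameAs URLs in your Person schema. The sketch below does that with Python's standard html.parser. The helper names, the sample markup, and the URLs are assumptions made up for illustration, and comparing URLs after stripping a trailing slash is a simplification.

```python
from html.parser import HTMLParser

class RelMeParser(HTMLParser):
    """Collect href values from <a> and <link> tags whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "me" in rel_tokens and attr.get("href"):
            self.links.append(attr["href"])

def rel_me_links(html):
    parser = RelMeParser()
    parser.feed(html)
    return parser.links

def missing_from_page(html, same_as):
    """Return the sameAs URLs that the page does not also link with rel="me"."""
    linked = {url.rstrip("/") for url in rel_me_links(html)}
    return [url for url in same_as if url.rstrip("/") not in linked]

# Hypothetical author page and sameAs list.
author_page = """
<a rel="me" href="https://www.linkedin.com/in/example-author">LinkedIn</a>
<a rel="me noopener" href="https://social.example/@author">Social</a>
<a href="https://example.com/other">Not an identity link</a>
"""
same_as = [
    "https://www.linkedin.com/in/example-author/",
    "https://www.wikidata.org/wiki/Q0",
]

print(missing_from_page(author_page, same_as))
```

Anything the function returns is a profile your schema claims but your page does not link back to, which is a gap in the loop.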
Layer 3: Content structure
Semantic HTML is not a nice-to-have. It is how AI agents parse your content into discrete, extractable units.
The hierarchy matters. An h1 tells the machine "this is the main topic." An h2 says "this is a subtopic." An h3 says "this is a detail within that subtopic." Lists are parsed as enumerable items. Tables are parsed as structured comparisons. Paragraphs without headings are harder to attribute to specific claims.
I wrote about this at length in the case for long-form content. The issue is not word count. It is information density organized in a machine-parseable hierarchy. A 3,000-word article with clear heading structure will get cited over a 5,000-word article that reads like a stream of consciousness.
Question-style headings perform especially well. "What is schema markup?" directly mirrors how users prompt AI agents. When the AI encounters a heading that matches the query and a clear answer paragraph below it, that is the easiest extraction path. That is what gets cited.
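A quick way to see your page the way a parser might is to pull out the headings and check the nesting. The following sketch, again using only Python's standard library, flags skipped heading levels and lists the question-style headings. Treating "ends with a question mark" as the test for a question-style heading is my own simplification, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class HeadingParser(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buf = []

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buf).strip()))
            self._level = None

def audit_headings(html):
    """Return (problems, question_headings) for an HTML string."""
    parser = HeadingParser()
    parser.feed(html)
    problems = []
    previous = 0
    for level, text in parser.headings:
        # Going deeper by more than one level (h2 -> h4) breaks the hierarchy.
        if level > previous + 1:
            problems.append(
                f"h{level} {text!r} skips a level "
                f"(previous heading level: {previous or 'none'})"
            )
        previous = level
    questions = [text for _, text in parser.headings if text.endswith("?")]
    return problems, questions

sample = """
<h1>How to Write Content That AI Agents Cite</h1>
<h2>What is schema markup?</h2>
<h4>Details</h4>
"""
problems, questions = audit_headings(sample)
print(problems)
print(questions)
```

In the sample, the h4 directly under an h2 is reported as a skipped level, and the h2 is picked up as a question-style heading.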
Layer 4: Original data and methodology
This is the layer that separates citable content from commodity content.
AI agents can find a thousand articles explaining what schema markup is. They will not cite all of them. They will cite the ones with original data, unique frameworks, or first-party research that cannot be found elsewhere.
What counts as original data:
- Your own experiments and their results (with methodology)
- Named frameworks you created (the "Trust Chain" is mine, for example)
- Surveys you conducted with sample sizes and methodology disclosed
- Analyses of proprietary datasets
- Case studies with real numbers (not "a client" but specific, verifiable outcomes)
When you name a framework, define it clearly, and use it consistently across multiple pieces of content, AI models learn to associate that framework with you. This is how you stop being "one of many sources" and start being "the source" for a specific concept.
Layer 5: Cross-referencing and institutional backlinks
AI agents verify claims by checking whether other trustworthy sources say the same thing. This is cross-referencing. And it works in both directions.
Forward references: you cite authoritative sources (academic papers, government data, institutional reports) in your content. This signals to the AI that your claims are grounded in verifiable material.
Backward references: authoritative sources link to you. This is harder to get and more valuable. A mention in a university publication, an industry report, or a government database does more for AI citability than a thousand blog backlinks.
As I explored in the comparison between Perplexity and Google, different AI agents weight these signals differently. Perplexity leans heavily on real-time web results and favors pages with clear attribution. ChatGPT relies more on training data patterns and entity resolution. Gemini sits somewhere in between. But all of them reward content that is cross-referenced with institutional sources.
The citability checklist
Here is the full checklist. Score yourself honestly.
| Factor | What AI agents look for | Priority |
|---|---|---|
| Article schema (JSON-LD) | Headline, author, datePublished, dateModified, publisher, description | Critical |
| Person schema for author | Name, sameAs links (LinkedIn, Wikidata, ORCID), jobTitle, worksFor | Critical |
| Organization schema | Name, sameAs links, official URL, logo, founding date | Critical |
| FAQPage schema | Question-answer pairs matching long-tail queries | High |
| Semantic HTML hierarchy | Proper h1 > h2 > h3 nesting, lists, tables, not div soup | High |
| Question-style headings | Headings that mirror how users prompt AI agents | High |
| Original data or framework | First-party research, named methodology, unique findings | High |
| Authoritative citations | References to academic, government, or institutional sources | Medium |
| Institutional backlinks | Mentions from universities, industry bodies, or official databases | Medium |
| Entity cross-linking | sameAs, rel="me", consistent NAP (name, address, phone) across profiles | Medium |
| Content freshness | dateModified reflects actual updates, not fake refreshes | Medium |
| Clear answer-first structure | Definitive answer in first paragraph under each heading | Medium |
If you are missing the critical items, nothing else on this list matters. Start there.
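If you want to score yourself in a repeatable way, the checklist translates directly into a small data structure. The sketch below is one possible encoding: the factor keys are my own shorthand for the table rows, and the 3/2/1 weights are an assumption chosen only to make the priorities comparable. The one rule taken from the text is that missing critical items come first.

```python
# Shorthand keys for the table rows, mapped to the priority in the table.
CHECKLIST = {
    "article_schema": "critical",
    "person_schema": "critical",
    "organization_schema": "critical",
    "faqpage_schema": "high",
    "semantic_html": "high",
    "question_headings": "high",
    "original_data": "high",
    "authoritative_citations": "medium",
    "institutional_backlinks": "medium",
    "entity_cross_linking": "medium",
    "content_freshness": "medium",
    "answer_first": "medium",
}

# Assumed weights; the article gives priorities, not numbers.
WEIGHTS = {"critical": 3, "high": 2, "medium": 1}

def score_page(passed):
    """Score a set of passed factor keys against the checklist."""
    missing_critical = sorted(
        key for key, priority in CHECKLIST.items()
        if priority == "critical" and key not in passed
    )
    earned = sum(WEIGHTS[CHECKLIST[key]] for key in passed if key in CHECKLIST)
    maximum = sum(WEIGHTS[priority] for priority in CHECKLIST.values())
    return {
        "score": earned,
        "max": maximum,
        "missing_critical": missing_critical,
        # Any missing critical item means: fix those before anything else.
        "fix_critical_first": bool(missing_critical),
    }

result = score_page({"article_schema", "semantic_html", "content_freshness"})
print(result)
```

A page can collect points from the lower tiers and still be told to fix the critical items first, which mirrors how the list is meant to be read.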
Before and after: what structural authority looks like
Let me show you the difference. Same topic, same information, two completely different levels of AI citability.
Before:
<title>Schema Markup Guide</title>
<h1>Everything You Need to Know About Schema</h1>
<p>Schema markup is really important for SEO these days...</p>
<div>By: Marketing Team</div>
Problems: No JSON-LD. No author entity. No FAQ schema. Generic heading. No dates. No verifiable attribution. "Marketing Team" is not a resolvable entity. AI cannot verify anything.

After:
<script type="application/ld+json">
{ "@type": "Article", "author": { "@type": "Person", "name": "Ibrahim Anwar", "sameAs": ["wikidata", "linkedin"] }, "datePublished": "2026-05-29" }
</script>
<h1>How to Write Content That AI Agents Cite</h1>
<h2>What makes content citable to AI?</h2>
<p>AI agents cite content that is structurally verifiable...</p>
Result: Resolved author entity. Timestamped. Question-style heading. Answer-first structure. AI can verify, extract, and attribute confidently.
The content quality might be identical. The citability is not even close.
The schema implementation that actually matters
Let me get specific. I see a lot of schema guides that tell you to add Article schema and call it done. That is the minimum. Here is what actually moves the needle for AI citation.
The connected entity graph
Your schema should not be isolated blocks. It should be a connected graph. The Article references the Person. The Person references the Organization. The Organization references official registries. Each node has sameAs properties that point to external authority databases.
This is what AI agents use for entity resolution. When ChatGPT encounters your article, it does not just read your text. It checks whether the author entity in your schema connects to a known entity in its knowledge base. If it does, confidence goes up. If it does not, you are treated as an unverified source.
The practical difference: a verified entity gets cited with attribution. An unverified entity gets paraphrased without credit.
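One common way to express a connected graph in JSON-LD is a single @graph array in which nodes point at each other by @id. The sketch below builds such a graph in Python and checks that every reference resolves to a node. The site root, the @id values, the publisher name, and the sameAs URLs are all hypothetical, and the checker is a simplification that treats any object containing only an @id key as a reference.

```python
import json

SITE = "https://example.com"  # hypothetical site root

def build_graph():
    """Build Organization, Person, and Article nodes linked by @id."""
    org_id = f"{SITE}/#organization"
    person_id = f"{SITE}/about/#person"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": "Example Publisher",
                "url": SITE,
                "sameAs": ["https://registry.example/org/123"],
            },
            {
                "@type": "Person",
                "@id": person_id,
                "name": "Ibrahim Anwar",
                "worksFor": {"@id": org_id},
                "sameAs": ["https://www.linkedin.com/in/example-author"],
            },
            {
                "@type": "Article",
                "@id": f"{SITE}/blog/citable-content/#article",
                "headline": "How to Write Content That AI Agents Cite",
                "datePublished": "2026-05-29",
                "dateModified": "2026-05-29",
                "author": {"@id": person_id},
                "publisher": {"@id": org_id},
            },
        ],
    }

def dangling_references(doc):
    """Return @id references that do not match any node in the graph."""
    defined = {node["@id"] for node in doc["@graph"] if "@id" in node}
    dangling = []

    def walk(value):
        if isinstance(value, dict):
            # An object holding only "@id" is a pointer to another node.
            if set(value) == {"@id"} and value["@id"] not in defined:
                dangling.append(value["@id"])
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(doc["@graph"])
    return dangling

graph = build_graph()
print(json.dumps(graph, indent=2))
print(dangling_references(graph))
```

Defining each entity once and referencing it by @id keeps the Article, Person, and Organization from drifting apart as isolated blocks across pages.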
FAQ schema is AI's favorite format
This is not my opinion. It is observable behavior. FAQPage schema is the single most effective schema type for getting direct AI citations [3]. The reason is simple: FAQ schema provides question-answer pairs in a format that maps directly to how users prompt AI agents.
User asks: "What is entity infrastructure?"
If your page has an FAQ schema entry with that exact question and a 40-60 word answer, you are handing the AI a pre-packaged response with attribution. It does not need to summarize your 3,000-word article. It just needs to extract and cite.
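Keep FAQ answers between 40 and 60 words. Long enough to be substantive. Short enough to be extractable. And make sure the visible content on the page matches the schema exactly. Discrepancies work against you: Google's structured data guidelines expect markup to describe content that is actually visible on the page, and a mismatch undermines the verification the markup is supposed to enable.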
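Both rules in this section, the 40 to 60 word range and the match between schema and visible text, are easy to check mechanically. The sketch below builds a FAQPage block from question-answer pairs and tests each rule. The function names are hypothetical, counting words by splitting on whitespace is an approximation, and the "match" test only checks that each answer appears verbatim in the page text after whitespace is normalized.

```python
import json

def build_faqpage(pairs):
    """Build a schema.org FAQPage dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

def answers_out_of_range(pairs, low=40, high=60):
    """Return (question, word_count) for answers outside the word range."""
    flagged = []
    for question, answer in pairs:
        count = len(answer.split())
        if not low <= count <= high:
            flagged.append((question, count))
    return flagged

def answers_missing_from_page(faq, visible_text):
    """Return questions whose schema answer is not found in the page text."""
    normalized_page = " ".join(visible_text.split())
    missing = []
    for item in faq["mainEntity"]:
        answer = " ".join(item["acceptedAnswer"]["text"].split())
        if answer not in normalized_page:
            missing.append(item["name"])
    return missing

pairs = [
    (
        "What type of schema markup is most important for AI citations?",
        "FAQPage schema is the most effective for direct AI citations because "
        "it provides pre-packaged question-answer pairs that map to user "
        "prompts. Article schema with Person and Organization entities "
        "(connected via sameAs properties) is essential for author "
        "attribution and entity resolution. Implement both.",
    ),
    ("Is schema optional?", "No."),
]

faq = build_faqpage(pairs)
print(json.dumps(faq, indent=2))
print(answers_out_of_range(pairs))
```

Running the range check before publishing catches answers that are too thin to stand alone or too long to extract cleanly.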
Original data is your moat
I run three companies. PT Arsindo (industrial pumps), Hibrkraft (handmade craft), and Witanabe (digital infrastructure). Across all three, the content that gets cited by AI agents is never the generic explainer. It is always the content with original data.
At Witanabe, we documented the exact process for getting a client's Knowledge Graph panel to appear. We published the methodology, the timeline, the specific steps, and the measurable outcome. That piece gets cited by AI when people ask about Knowledge Graph optimization. Not because the writing is exceptional. Because the data is original and the methodology is transparent.
The same pattern holds across industries. AI agents are looking for what they cannot find elsewhere. If your article says the same thing as fifty other articles, you are competing on domain authority alone. If your article contains data that exists nowhere else, you are the only possible source.
This is not about making things up. It is about documenting what you actually do. Every practitioner has original data. Most just do not publish it.
The content structure that wins
Based on what I have seen work repeatedly, here is the content structure that maximizes AI citability:
- Question-style title that mirrors a real query ("How to..." or "What is...")
- Definitive answer in the first paragraph (not a teaser, not an "in this article we will...")
- Structured sections with h2 headings that are also questions or clear topic labels
- Original data or framework in at least one section
- Comparison table for any evaluative content
- FAQ section with 4-6 questions targeting long-tail variations of the main topic
- References to authoritative external sources, properly cited
- JSON-LD schema covering Article, Person, Organization, and FAQPage
Notice what is not on this list. Word count targets. Keyword density. Internal linking strategies. Those are SEO concerns. They matter for traditional search. For AI citation, structural authority is what counts.
What most guides get wrong
Most "AI SEO" content tells you to write clearly and add schema. That is correct but incomplete. Like telling someone to build a house by saying "use bricks and a roof."
Here is what they miss:
Schema without entity resolution is decoration. Adding Article schema with an author name but no sameAs links gives the AI nothing to verify. It is better than nothing, but not by much.
Content without original data is commodity. If your article summarizes what others have written, you are competing against every other summarizer. AI agents prefer primary sources.
Structure without freshness signals is unreliable. A well-structured article from 2019 with no dateModified tells the AI that the information might be outdated. Update your content and update the timestamp honestly.
Backlinks from blogs are not the same as institutional references. A hundred links from marketing blogs do less for AI citability than one mention in a university curriculum or government report. The quality threshold is different from traditional SEO.
The uncomfortable truth about AI citability
Here it is. Most content on the internet will never be cited by AI agents. Not because it is bad. Because it is structurally invisible.
Only about 12.4% of registered domains have implemented schema.org structured data [4]. That means roughly 88% of the web is missing the basic infrastructure that AI agents use to identify, verify, and cite sources.
This is actually good news if you are willing to do the work. The bar is not high. It is just specific. And most people are not willing to learn the specific requirements because they are not glamorous. Adding JSON-LD to your header is not as exciting as writing a viral thread. Connecting your author entity to Wikidata is not as fun as designing a new homepage.
But it is the work that compounds. Every page you publish with proper structural authority makes the next page more citable. Because entity authority is cumulative. The more connected and verifiable your presence becomes, the more AI agents trust everything you publish.
What to do this week
If you have read this far and want to actually improve your AI citability, here is a concrete starting point. No fluff.
- Audit your schema. Run your top 5 pages through Google's Rich Results Test. If you do not have Article schema with author attribution, that is your first fix.
- Connect your author entity. Add sameAs links in your Person schema to LinkedIn, Wikidata (create an entry if needed), and any other authority profiles.
- Add FAQ schema. Pick your highest-traffic article. Add 4-5 question-answer pairs that target how people actually ask about your topic. Add the corresponding FAQPage schema.
- Publish one piece with original data. Document something only you know. A process, a result, a methodology. Publish it with full schema and clear structure.
- Check your dateModified. If your articles do not have dateModified in schema, add it. Keep it honest.
This is not a weekend project. It is infrastructure, the same kind I build for clients through entity infrastructure work. The returns are not instant. But they compound in a way that "content marketing" alone never will.
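The first few items on this list can be partly automated as a local pre-check. The sketch below extracts JSON-LD blocks from an HTML string and reports the gaps named above: no Article schema, no author, no sameAs on the author, no dateModified, no FAQPage. It is a rough audit under assumptions, not a replacement for a validator: the function names and finding messages are my own, and it does not fetch pages or check anything beyond the presence of these fields.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks; None marks a block that is not valid JSON."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                self.blocks.append(None)

def iter_nodes(block):
    """Yield schema nodes from a block, unwrapping lists and @graph arrays."""
    if isinstance(block, list):
        for item in block:
            yield from iter_nodes(item)
    elif isinstance(block, dict):
        if "@graph" in block:
            yield from iter_nodes(block["@graph"])
        else:
            yield block

def has_type(node, name):
    declared = node.get("@type")
    return name in declared if isinstance(declared, list) else declared == name

def audit_page(html):
    """Return a list of human-readable findings for one HTML document."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    if not extractor.blocks:
        return ["no JSON-LD found"]
    findings = []
    if None in extractor.blocks:
        findings.append("a JSON-LD block is not valid JSON")
    nodes = [n for b in extractor.blocks if b is not None for n in iter_nodes(b)]
    articles = [n for n in nodes if has_type(n, "Article")]
    if not articles:
        findings.append("no Article schema")
    for article in articles:
        author = article.get("author")
        if not author:
            findings.append("Article has no author")
        elif isinstance(author, dict) and "@id" not in author and not author.get("sameAs"):
            findings.append("author has no sameAs links")
        if "dateModified" not in article:
            findings.append("Article has no dateModified")
    if not any(has_type(n, "FAQPage") for n in nodes):
        findings.append("no FAQPage schema")
    return findings

page = '<h1>Guide</h1><div>By: Marketing Team</div>'
print(audit_page(page))
```

An empty list means none of these specific gaps were found; it says nothing about whether the sameAs targets are correct or the dates are honest, so those still need a human look.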
Frequently Asked Questions
What type of schema markup is most important for AI citations?
FAQPage schema is the most effective for direct AI citations because it provides pre-packaged question-answer pairs that map to user prompts. Article schema with Person and Organization entities (connected via sameAs properties) is essential for author attribution and entity resolution. Implement both.
Can good writing alone get you cited by AI agents?
No. Good writing is necessary but not sufficient. AI agents cite based on structural authority: schema markup, entity resolution, original data, and cross-referencing with institutional sources. A mediocre article with proper schema and verified author entity will often be cited over a brilliant article with no structured data.
How long does it take for schema changes to affect AI citations?
Foundation schema work (Article, Person, Organization) takes 2-4 weeks to be crawled and processed. Full entity authority builds over 3-6 months as knowledge graphs update and models retrain. For RAG-based systems like Perplexity that use real-time retrieval, changes can surface within days.
Does schema markup matter for ChatGPT specifically?
Yes. Research shows 71% of pages cited by ChatGPT use schema markup [1]. ChatGPT relies heavily on entity resolution from its training data, which includes structured data from Common Crawl and knowledge graph databases. Proper schema makes your content more likely to be included and correctly attributed in training data.
What is the difference between AI citability and traditional SEO?
Traditional SEO optimizes for ranking in a list of links. AI citability optimizes for being selected as a source in a generated answer. SEO rewards keyword density, backlinks, and page speed. AI citation rewards entity verification, structured data, original research, and institutional cross-referencing. They overlap but are not the same.
References
[1] OutpaceSEO. "Schema Markup & Structured Data Guide: The 2026 Masterclass." OutpaceSEO, 2026.
[2] WPRiders. "Schema Markup: 8 Tactics to Boost AI Citations." WPRiders, 2025.
[3] Averi AI. "Schema Markup for AI Citations: The Technical Implementation Guide." Averi.ai, 2025.
[4] Averi AI. "Schema Markup for AI Citations: The Technical Implementation Guide." Averi.ai, 2025. As of 2025, more than 45 million web domains have implemented schema.org structured data, approximately 12.4% of all registered domains.
[5] Suso Digital. "Writing for Robots: Content Writing Best Practices for AI Visibility." Suso Digital, 2025.
The companies that show up in ChatGPT are the ones that bothered to be verifiable.