
Three weeks ago I posted about content-model-simulator at v0.3.0. The pitch was simple: stop testing Contentful content models blindly. Define schemas locally, preview them in a browser, catch the bad field design before editorial has 200 entries in it.
Today the package is at v0.6.1. That's four minor releases in three weeks, all pre-1.0, all shipped publicly the moment they were ready instead of bundled into one big drop. Here's what landed, what I learned about the cadence, and one specific thing I need from anyone reading this.
v0.4.0 — to-import closed the migration loop
The first big gap I hit after v0.3.0 was the silent one between "the simulation looks right" and "now actually run the migration in Contentful." cms-sim simulate produced a beautiful preview, but exporting that to a Contentful-importable shape was still on the user.
cms-sim to-import converts a simulation output directory into the JSON bundle that the official contentful-import CLI consumes. Validations, default values, link references, RichText, locales — all the bits Contentful's importer expects, generated from the simulator's already-validated model.
```bash
# Simulate locally
npx cms-sim --schemas=schemas/ --input=data/export.ndjson --output=output/

# Convert simulator output → contentful-import bundle
npx cms-sim to-import --input=output/ --schemas=schemas/ --output=bundle/

# Run the actual migration with Contentful's own tool (cms-sim never touches your space)
npx contentful-import --content-file=bundle/contentful.json --space-id=YOUR_SPACE --management-token=YOUR_CMA
```
The bundle gets validated against contentful-import's own Joi schema before it's written, so a simulation that passes locally produces a payload that passes the importer too. By default everything's marked as draft; pass --publish and the entries / assets ship as published.
This is the piece that turned cms-sim from "a preview tool" into "the preview half of a real migration workflow."
v0.5.0 — visual HTML diff + 8 fixes that only dogfood could surface
The next thing I needed was a way to see what changed between two simulations. I had cms-sim diff already, but it was JSON-only. Reading a diff JSON to compare two content models is a job for nobody.
cms-sim diff --html --open renders both schema-only and full-report diffs as a self-contained HTML page with KPI cards (added / removed / changed content types, entry-count deltas, error and warning gain or loss), collapsible per-content-type panels with field-level changes, color-coded badges for added / removed / changed / reordered fields, and an entry-count table with side-by-side bars. Zero external assets — open the file with file:// and it works.
The more interesting half of v0.5.0 was an unplanned batch of 8 UX fixes that came out of dogfooding the package against a production Contentful space with 23 content types and 9,246 entries. Some highlights:
- The simulator was silently filtering content types with zero entries from the report, which meant a freshly added empty CT vanished from the model graph the moment you ran `diff`. Fixed.
- The content browser was hardcoding the entry display name to `internalName > title > name > lblTitle > id` instead of reading the schema's declared `displayField`. With auto-generated IDs in `internalName`, the browser was showing `pillarpage-5x3cm7...` instead of the human-readable titles. Fixed.
- Safari and Chrome restore `<select>` and `<input>` form state across reloads of the same file path. A user who picked "Pillar Page" once saw an empty list on every subsequent load if the new dataset didn't have that CT. `autocomplete="off"` + explicit reset on init. Fixed.
- Synchronous render of thousands of entries blocked the browser before it could paint the loading state. Fixed by deferring with `requestAnimationFrame` × 2 — one frame to let the spinner paint, one to start the heavy work.
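The double-deferral fix is small enough to sketch. This is an illustrative version, not the package's actual code — and since Node has no `requestAnimationFrame`, the sketch falls back to `setTimeout` so it stays runnable outside a browser:

```javascript
// Illustrative double-deferral: frame 1 lets the spinner reach the screen,
// frame 2 starts the heavy synchronous work. setTimeout fallback for Node.
const raf = typeof requestAnimationFrame === 'function'
  ? requestAnimationFrame
  : (cb) => setTimeout(cb, 16);

const events = [];
const showSpinner = () => events.push('spinner');   // cheap DOM write
const renderEntries = () => events.push('render');  // expensive: thousands of nodes

showSpinner();
raf(() => {        // one frame: the browser flushes the spinner paint
  raf(() => {      // second frame: now it's safe to block the main thread
    renderEntries();
  });
});
```

A single `requestAnimationFrame` isn't enough because the callback fires *before* the next paint — the second hop is what guarantees the spinner actually hit the screen first.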
The pattern I noticed: every one of these bugs was a filter or default that "cleaned up" the output in a way that hid real information. Whenever I saw entryCount > 0 as a filter going forward, I started suspecting it.
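In miniature, that anti-pattern looks like this (hypothetical report shape, not the simulator's real one):

```javascript
// A "tidy" filter that silently destroys information: the freshly added,
// still-empty content type disappears from the output entirely.
const report = [
  { contentType: 'pillarPage', entryCount: 12 },
  { contentType: 'landingHero', entryCount: 0 }, // new CT, no entries yet
];

const buggyView = report.filter((ct) => ct.entryCount > 0); // hides landingHero
const fixedView = report;                                   // zero is data too

console.log(buggyView.length, fixedView.length); // 1 2
```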
v0.6.0 — pull-sanity (the harder source)
The roadmap I'd written had v0.6.0 listed as "Multi-CMS adapter — prove the architecture is CMS-agnostic." When I sat down to start it, I realized this was the wrong direction. The package is explicitly positioned as the offline Contentful simulator — the README literally says "Stop designing Contentful models blind" and "Who this is NOT for: Non-Contentful platforms." Building a multi-CMS adapter would have contradicted the whole product positioning.
The real gap was different. By then, the package already pulled from Contentful (cms-sim pull) and could preview migrations from WordPress XML or Sanity NDJSON — but only if the user wrote a transforms/ directory by hand to map their source data into Contentful's shape. That's exactly the work cms-sim pull automates for Contentful sources. It should automate it for non-Contentful sources too.
So v0.6.0 became cms-sim pull-sanity:
```bash
# Export your Sanity dataset (their CLI, not ours)
sanity dataset export production export.ndjson

# Convert to Contentful shape (offline, zero deps, read-only)
npx cms-sim pull-sanity --input=export.ndjson --output=pulled-sanity/

# Now everything downstream works the same as with `cms-sim pull` from a Contentful space
npx cms-sim --schemas=pulled-sanity/schemas/ --input=pulled-sanity/data/entries.ndjson --open
npx cms-sim to-import --input=output/ --schemas=pulled-sanity/schemas/ --output=bundle/
```
What this command does in one pass:
- Infers a content type schema from real document samples (Symbol / Text / Integer / Number / Boolean / Date / Object / RichText / Link Entry / Link Asset / Array variants), with `linkContentType` validations derived from the actual cross-document references in the corpus.
- Rewrites `_ref` references (single + nested in arrays) to Contentful's `Link Entry` sys shape.
- Rewrites image references (`{_type: 'image', asset: {_ref}}`) to `Link Asset` + emits an `assets/assets.json` index.
- Converts Portable Text (Sanity's structured rich-text format) into Contentful RichText documents — paragraph / heading / lists / blockquote / decorator marks / hyperlinks all map. Unknown marks emit explicit warnings while preserving the text.
- Detects locale-shaped values (`{en: '…', es: '…'}`) and fans each document out into one variant per locale, with the matching schema fields flagged `localized: true`.
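The `_ref` rewrite is the easiest of these to picture. Here's a minimal sketch of the idea — the function name and structure are mine, not the package's internals; what's real is that Sanity references look like `{_type: 'reference', _ref: '<id>'}` and Contentful entry links look like `{sys: {type: 'Link', linkType: 'Entry', id}}`:

```javascript
// Recursively rewrite Sanity _ref references (single + nested in arrays)
// into Contentful Link Entry sys shapes. Illustrative sketch only; image
// asset refs and Portable Text need their own passes.
function rewriteRefs(value) {
  if (Array.isArray(value)) return value.map(rewriteRefs);
  if (value && typeof value === 'object') {
    if (value._type === 'reference' && typeof value._ref === 'string') {
      return { sys: { type: 'Link', linkType: 'Entry', id: value._ref } };
    }
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, rewriteRefs(v)]),
    );
  }
  return value; // primitives pass through untouched
}

const doc = {
  author: { _type: 'reference', _ref: 'person-ana' },
  related: [{ _type: 'reference', _ref: 'post-2' }],
};
const out = rewriteRefs(doc);
console.log(out.author.sys.id, out.related[0].sys.linkType); // person-ana Entry
```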
End-to-end against the production Sanity export I use as my dogfood dataset (21 docs / 7 content types / 15 assets / 2 locales / Portable Text bodies): pipeline warnings dropped from 97 against the raw NDJSON to 0 after pull-sanity.
Running it against that dataset looks like this:
```
════════════════════════════════════════════════════════════════════
Content Model Simulator — Pull Sanity
════════════════════════════════════════════════════════════════════
Reading Sanity NDJSON export (offline)…
Read 21 document(s), 15 asset(s) from .../production.ndjson
1 inference warning(s):
  • [vehicle.availableFeatures] Contentful arrays only support Symbol/Link items — collapsed to Object.
Wrote 7 schema(s) + 37 entry/entries (2 locales) + 15 asset(s) to ./pulled-sanity
✓ 7 content type(s) detected
✓ 37 entry/entries written (21 docs × 2 locales, deduped where missing)
✓ 15 asset(s) detected
```
The resulting content browser shows a multi-locale post — English + Spanish bodies rendered as Contentful RichText nodes side-by-side, references resolved to actual entries, and the field-type badges showing RichText / Object / Date / Link Entry as inferred.
v0.6.1 — pull-wordpress (the larger audience)
A week later, the same approach landed for WordPress. cms-sim pull-wordpress reads a WXR XML export (wp-admin → Tools → Export → All content) and writes the same Contentful-shape output:
```bash
npx cms-sim pull-wordpress --input=wp-export.xml --output=pulled-wp/
npx cms-sim --schemas=pulled-wp/schemas/ --input=pulled-wp/data/entries.ndjson --open
```
WordPress is messier than Sanity because the references aren't structured the same way:
- Authors come from `<dc:creator>` as login strings, not refs.
- Categories / tags come from inline `<category>` elements with `nicename` slugs.
- Featured images live in `<wp:postmeta>` under the `_thumbnail_id` key as a post_id pointer.
- Bodies are Gutenberg-flavored HTML (block comments stripped, but the inner markup stays).
- Locales (if Polylang is installed) come from a `language` taxonomy on each item.
pull-wordpress handles all of those: post `categories[]` slugs → `Array<Link Entry>` with `linkContentType: ['category']` validations derived from the real refs, `_thumbnail_id` → `featuredImage: { sys: Link Asset }`, attachments extracted into `assets/assets.json`, body HTML converted to Contentful RichText via the existing `htmlToRichText` walker, and per-doc Polylang locale tags surfaced in `contentful-space.json`.
Ids use stable prefixes per content type — `wp_<postId>`, `wp_author_<login>`, `wp_category_<slug>`, `wp_tag_<slug>` — so cross-references resolve without lookup tables and the asset id in `entries.ndjson` matches the asset id in `assets.json`.
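A sketch of what that id scheme buys you (the helper names here are hypothetical, not the package's):

```javascript
// Deterministic ids mean a reference can be rebuilt from the raw WXR value
// alone -- no lookup table from source id to generated id is ever needed.
const wpId = {
  post:     (postId) => `wp_${postId}`,
  author:   (login)  => `wp_author_${login}`,
  category: (slug)   => `wp_category_${slug}`,
  tag:      (slug)   => `wp_tag_${slug}`,
};

// A post's <category nicename="engineering"> becomes a resolvable entry link:
const link = {
  sys: { type: 'Link', linkType: 'Entry', id: wpId.category('engineering') },
};
console.log(link.sys.id); // wp_category_engineering
```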
What I learned shipping four minors in three weeks
A few things stuck with me.
1. Ship per-source, not all at once
The original roadmap had Sanity and WordPress bundled as "v0.6.0 multi-CMS adapter." Splitting them into v0.6.0 + v0.6.1 was the right call. Real users of pull-sanity will surface bugs that no synthetic fixture catches, and those bugs feed back into the WordPress design instead of being baked into a single shipped release I can't unship.
Pre-1.0 SemVer is built for exactly this rhythm. MINOR bumps add features; PATCH bumps fix them. Bundling delays releases by weeks for no real benefit — there's no installed base getting "release fatigue" yet.
2. The simulator's own warning count is the best progress metric
For every milestone of pull-sanity and pull-wordpress, I tracked one number: how many warnings does the simulator emit against the real dataset? It started at 97 for the link-vehicles Sanity export and went to 0 across five milestones (schema inference → 13, ref rewriting → 13, RichText body → 13, asset linking → 0). That was more useful than any unit test for guiding what to build next.
3. Doc consistency is harder than code consistency
I shipped v0.6.0 with a badge hardcoded to claim 510 passing tests when the suite was actually at 601. Nobody noticed before the publish. I added a permanent rule to my own development docs: check the badge before every commit. The tools-don't-update-themselves problem is real even for small repos.
Help wanted — real WordPress data, especially with ACF
This is the special section.
pull-wordpress shipped in v0.6.1 with synthetic WXR fixtures only. The unit tests cover the shapes I could construct from scratch (Gutenberg + Classic bodies, nested categories, featured images, Polylang per-doc locales), and the example-wp-pull/ walk-through exercises the happy path against 12 synthetic documents. That's enough to prove the structure works, but it almost certainly misses edge cases that only show up in real production exports.
What I'd love to dogfood against:
- ACF (Advanced Custom Fields) field groups with repeaters, flexible content, post-object refs, gallery / image fields, conditional logic. ACF is the single biggest gap right now — the synthetic fixture doesn't exercise it at all, and ACF is in 60%+ of serious WordPress sites.
- Polylang or WPML with real translation grouping — the `_translations` post-meta key that links en/fr/es versions of the same post into one logical entry. The current adapter treats each translation as a separate entry tagged with its own locale, which is correct as a starting point but loses the multi-locale-on-one-entry shape Contentful prefers.
- WooCommerce product structures (product attributes, variations, custom taxonomies, gallery images).
- Custom post types + ACF combos.
- Gutenberg blocks beyond the basics — custom blocks, embeds, columns, reusable blocks.
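For the translation-grouping item specifically, the target shape is worth spelling out. This is a hypothetical sketch of the merge the adapter doesn't do yet — input shape and helper are mine; what's real is Contentful's multi-locale entry format, where each field maps locale codes to values:

```javascript
// Collapse a translation group (one doc per language) into a single entry
// with locale-keyed field values -- Contentful's multi-locale entry shape.
const translations = [
  { group: 'post-7', locale: 'en', fields: { title: 'Hello' } },
  { group: 'post-7', locale: 'es', fields: { title: 'Hola' } },
];

function mergeGroup(docs) {
  const fields = {};
  for (const doc of docs) {
    for (const [name, value] of Object.entries(doc.fields)) {
      fields[name] = fields[name] || {};
      fields[name][doc.locale] = value; // { title: { en: ..., es: ... } }
    }
  }
  return { sys: { id: docs[0].group }, fields };
}

const entry = mergeGroup(translations);
console.log(JSON.stringify(entry.fields.title)); // {"en":"Hello","es":"Hola"}
```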
If you maintain a WordPress site and can share the file from wp-admin → Tools → Export → All content, please drop it on the tracking issue:
https://github.com/JoshuaPozos/content-model-simulator/issues/15
Anonymization is fine and encouraged. Replace post bodies with lorem ipsum, scrub user emails and display names, swap your domain in <wp:base_site_url> and <wp:attachment_url>. What matters is the structure — the <wp:post_type> declarations, the <wp:postmeta> keys (ACF / WooCommerce / Polylang plugin conventions), the category and language taxonomy slugs, and the Gutenberg block markers inside <content:encoded>. The content can be whatever.
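If it helps, here's one rough regex-based pass at that scrubbing. It's a sketch, not an XML-aware tool, so eyeball the output before uploading — the element names are standard WXR, the replacement values are mine:

```javascript
// Rough WXR anonymization sketch: scrub author emails / display names and
// swap the site domain, leaving post types, postmeta keys, taxonomy slugs,
// and Gutenberg block markers intact (the structure that actually matters).
function scrubWxr(xml, realDomain) {
  return xml
    .replace(/<wp:author_email>[\s\S]*?<\/wp:author_email>/g,
             '<wp:author_email><![CDATA[user@example.com]]></wp:author_email>')
    .replace(/<wp:author_display_name>[\s\S]*?<\/wp:author_display_name>/g,
             '<wp:author_display_name><![CDATA[Redacted]]></wp:author_display_name>')
    .replace(new RegExp(realDomain.replace(/\./g, '\\.'), 'g'), 'example.com');
}

const sample =
  '<wp:author_email><![CDATA[jane@mysite.com]]></wp:author_email>' +
  '<wp:base_site_url>https://mysite.com</wp:base_site_url>';
console.log(scrubWxr(sample, 'mysite.com'));
```

Post bodies inside `<content:encoded>` would need a separate lorem-ipsum pass; the point of the sketch is only that the scrub targets values, never element names or meta keys.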
What you get back if you contribute:
- Your anonymized export becomes a checked-in test fixture under `src/wordpress/**/*.test.ts` or `example-wp-pull/data/`.
- The release notes for whatever patch fixes the edge cases you surface will credit your handle (unless you ask me not to).
- If your export uncovers a non-trivial bug, you get co-author credit on the fix commit.
I'm going to let v0.6.1 sit for a couple of weeks specifically to see what the community surfaces. Whatever lands in issue #15 informs what v0.6.2 ships.
Where the project stands today
- Current version: `v0.6.1` — `npm install content-model-simulator`
- Test suite: 659 passing, zero runtime dependencies
- Adapters shipped: Contentful (`pull`), Sanity (`pull-sanity`), WordPress (`pull-wordpress`), plus `from-migrations` for `contentful-migration` script replay
- Bridge to the real thing: `to-import` exports a `contentful-import`-ready bundle
- What's next: v0.7.x plugin system polish, then v0.9.0 API stabilization, then v1.0.0
The thing I'm still optimizing for is the same as at v0.3.0: reduce stupid risk before it reaches a real Contentful space. Every release in this cycle has been a step toward making more of the work you'd otherwise do against a live space doable offline first, with structured feedback you can actually read.
If you work with Contentful and any of this sounds relevant — pull or migration planning, validating a model before it ships, comparing two iterations of a content type — give it a try. Open an issue if it breaks. Drop a WXR on issue #15 if you can.
Repo: https://github.com/JoshuaPozos/content-model-simulator
Package: https://www.npmjs.com/package/content-model-simulator
WordPress dogfood issue: https://github.com/JoshuaPozos/content-model-simulator/issues/15