Experience Sitecore ! | Sitemaps in Sitecore XM Cloud: Automation, Customization, and SEO Best Practices

Experience Sitecore !

More than 200 articles about the best DXP by Martin Miles

Sitemaps in Sitecore XM Cloud: Automation, Customization, and SEO Best Practices

In Sitecore XM Cloud, sitemaps are generated and served via Experience Edge to inform search engines about all discoverable URLs. XM Cloud uses SXA’s built‑in sitemap features by default, storing the generated XML as media items in the CMS so they can be published to Experience Edge. Sitemap behavior is controlled by the Sitemap configuration item under /sitecore/content/<SiteCollection>/<Site>/Settings/Sitemap. There are few important fields - Refresh threshold which defines minimum time between regenerations, Cache expiration, Maximum number of pages per sitemap for splitting into a sitemap index, and Generate sitemap media items which must be enabled to publish via Edge. The Sitemap media items field of the Site item will list the generated sitemap(s) under /sitecore/media library/Project/<Site>/<Site>/Sitemaps/<Site>​, and the default link provider is used unless overridden. Tip: you can configure a custom provider via <linkManager> and choose its name in the Sitemap settings.

Automated Sitemap Generation Workflow

When content authors publish pages, XM Cloud schedules sitemap regeneration automatically based on the refresh threshold. Behind the scenes, an OnPublishEnd pipeline (often the SitemapCacheClearer.OnPublishEnd handler in SXA) checks each site’s sitemap settings. If enough time has elapsed since the last build, a Sitemap Refresh job runs. In this job, the old sitemap media item is deleted and a new one is generated and saved in the Media Library​. Once created, the new sitemap item is linked in the Sitemap media items field of the site and then published. This typically triggers two publish actions: one to publish the new media item (/sitecore/media library/Project/.../Sitemaps/<Site>/sitemap) and one to re-publish the Site item so Experience Edge sees the updated link.

For high-volume publishing, it’s best to set a reasonable refresh threshold to batch sitemap generation. For example, if you publish many pages daily, you might set the refresh threshold to 0 forcing a rebuild every time, or schedule a daily publish so the sitemap is updated once per day. Generating sitemaps can be resource-intensive especially for large sites, so avoid rebuilding on every small change unless necessary.

Sitemap Filtering: SXA provides pipeline processors to include or exclude pages. By default, items inheriting SXA’s base page templates have a Change frequency field. Setting it to "do not include" will exclude that page from the sitemap​. The SXA sitemap pipelines (sitemap.filterItem) include built‑in processors for base template filtering and change-frequency logic. To exclude a page, simply open it in Content Editor (or Experience Editor SEO dialog) and set Change frequency to "do not include"​.

GraphQL Sitemap Query: Once published, the XM Cloud GraphQL API provides access to the sitemap media URL. For example, the following query returns the sitemap XML URL for a given site name:

query SitemapQuery($site: String!) {
      site {
        siteInfo(site: $site) {
          sitemap
        }
      }
    }

This returns the Experience Edge URL of the generated sitemap media item. You can use this in headless code or debugging to verify the sitemap’s existence and freshness.

Sitemaps in Local Docker Containers

In a local XM Cloud Docker setup, the /sitemap.xml route often returns an empty file by default because the Experience Edge publish never occurs. There is no web database or Edge target, so the OnPublishEnd process never actually runs, leaving the empty sitemap item. Attempting to publish locally throws an exception (Invalid Authority connection string for Edge). To debug or test sitemap issues locally, you can manually trigger the SXA sitemap pipeline.

I really like the Sitemap Developer Utility approach suggested by Jeff L'Heureux: in your XM Cloud solution’s Docker files, create a page (e.g. generateSitemap.aspx) inside docker\deploy\platform with code that simulates a publish event. For example, one can invoke the SitemapCacheClearer.OnPublishEnd() method manually in C#.

// Simulate a publish event for the "Edge" target
    Database master = Factory.GetDatabase("master");
    List<string> targets = new List<string> {"Edge"};
    PublishOptions options = new PublishOptions(master, master, PublishMode.SingleItem, 
        Language.English, DateTime.Now, targets);
    Publisher publisher = new Publisher(options);
    SitecoreEventArgs args = new SitecoreEventArgs("OnPublishEnd", new object[] { publisher }, new EventResult());
    new SitemapCacheClearer().OnPublishEnd(null, args);
    

This code triggers the same sitemap build logic as a real publish​. Jeff's utility page provides buttons to run various steps (OnPublishEnd, the sitemap.generateSitemapJob pipeline, etc.) and shows output.

Once you run the utility and the cache job completes, the media item is regenerated. Then restart or refresh your Next.js site locally to see the updated sitemap at http://front-end-site.localhost/sitemap.xml. The browser will display the raw XML with <loc>, <lastmod>, <changefreq>, and <priority> entries as it normally should.

Sitemap Customization for Multi-Domain Sites

A common scenario is one XM Cloud instance serving multiple language or regional domains (say, www.siteA.com and www.siteA.fr) with one shared content tree. In SXA this is often handled by a Site Grouping with multiple hostnames. By default, SXA will generate a single sitemap based on the primary hostname. This leads to two issues: the same XML file is returned on both domains, and each page appears several times (once per language) under the same <loc>. For example, a bilingual site without customization might show both English and French URLs under the English domain, duplicating <url> entries.

To fix this, customize the Next.js API route (e.g. pages/api/sitemap.ts) that serves /sitemap.xml. The approach is: detect which host/domain the request is for, fetch the raw sitemap XML via GraphQL, and then filter and rewrite the entries accordingly. For instance, if the host header contains the French domain, only include the French URLs and update the <loc> and hreflang="fr" links to use the French hostname. Pseudocode for the filtering might look like:

if (lang === 'en') {
      // Filter out French URLs and fix alternate links
      urls = urls.filter(u => !u.loc[0].includes(FRENCH_PREFIX))
                 .map(updateFrenchAlternateLinks);
    } else if (lang === 'fr') {
      // Filter out English URLs and swap French loc to French domain
      urls = urls.filter(u => u.loc[0].includes(FRENCH_PREFIX))
                 .map(updateLocToFrenchDomain)
                 .map(updateFrenchAlternateLinks);
    }
    

Here, FRENCH_PREFIX is something like en.mysite.com/fr, and we replace it with the French hostname. In practice, the XML is parsed (e.g. via xml2js), then the result.urlset.url array is filtered and modified, and rebuilt to XML. There is a great solution suggested by Mike Payne which uses two helper functions filterUrlsEN and filterUrlsFR to drop unwanted entries and updateLoc/updateFrenchXhtmlURLs to replace URL prefixes​. Finally, the modified XML is sent in the HTTP response. This ensures that when a sitemap is requested from www.site.ca, all <loc> URLs and alternate links point to site.ca, and when requested from www.othersite.com, they point to www.othersite.com.

SEO Considerations and Best Practices

  • Include Alternate Languages (hreflang): XM Cloud (via SXA) automatically adds <xhtml:link rel="alternate" hreflang="..."> entries in the sitemap for multi-lingual pages. Ensure these are correct for your domains. After customizing for multiple hostnames, the <xhtml:link> URLs should also be updated to the appropriate domain​. This helps Google index the right language version for each region.

  • Set Change Frequency and Priority: Use SXA’s SEO dialog or Content Editor on the page item to set Change frequency and Priority for each page. For example, if a page is static, set a low change frequency. These values are written into <changefreq> and <priority> in the sitemap. Note: Pages can be excluded by setting frequency to "do not include".

  • Maximize Crawling via Sitemap Index: If your site has many pages, configure Maximum number of pages per sitemap so XM Cloud generates a sitemap index with multiple files. This avoids any single sitemap exceeding search engine limits and keeps crawlers from giving up on a very large file.

  • Robots.txt: SXA will append the sitemap link /sitemap.xml to the site’s robots.txt automatically​. Verify that your robots.txt in production references the correct sitemap and hostname.

  • Media Items and Edge: Always keep Generate sitemap media items enabled: without having this, XM Cloud cannot deliver the XML to the front-end. After a successful build, the sitemap XML is stored in a media item and served by Experience Edge. You can confirm the published sitemap exists by checking /sitecore/media library/Project/<Site>/<Site>/Sitemaps/<Site> or by running the GraphQL query mentioned above.

  • Link Provider Configuration: If your site uses custom URL routing (e.g. language segments or rewritten paths), you can override the link provider used for sitemap URLs. In a patch config, add something like:

    <linkManager defaultProvider="switchableLinkProvider">
          <providers>
            <add name="customSitemapLinkProvider" 
                 type="Sitecore.XA.Foundation.Multisite.LinkManagers.LocalizableLinkProvider, Sitecore.XA.Foundation.Multisite"
                 lowercaseUrls="true" .../>
          </providers>
        </linkManager>

    Don't forget to set the "Link provider name" field in the Sitemap settings to customeSitemapLinkProvider​ afterwards. This ensures the sitemap uses the correct domain and culture prefixes as needed.

Diagnostics and Troubleshooting

If the sitemap isn’t updating or the XML is wrong, check these:

  • Site Item Settings: On the site’s Settings/Sitemap item, confirm the refresh threshold and expiration are as expected. During debugging you can set threshold to 0 to force immediate rebuilds.

  • Was it published to Edge? Ensure the sitemap media item was published to Edge. You might need to publish the Site item or Media Library manually if it wasn’t picked up.

  • Cache Type: In the SXA Sitemap settings, the Cache Type can be set to "Inactive," "Stored in cache", or "Stored in file". For XM Cloud, the default "Stored in file" is typically used so the XML is persisted. If set to "Inactive", the sitemap generator will not run.

  • Inspect Job History: In the CM admin (/sitecore/admin/Jobs.aspx), look for the "Sitemap refresh" jobs to see if these succeeded or threw errors.

  • Next.js Route Errors: If your Next.js site’s /sitemap.xml endpoint returns an error, inspect its handler. The custom API route uses GraphQLSitemapXmlService.getSitemap(). Ensure the hostnames in your logic match your ENV variables, namely PUBLIC_EN_HOSTNAME. Add logging around the xml2js parsing if the output seems empty or malformed.

By following the above patterns - configuring SXA sitemap settings, automating generation on publish, and customizing for your site topology -you can ensure that XM Cloud serves up accurate, SEO‑friendly sitemaps. This helps search engines index your content fully and respects multi-lingual domain structures and refresh logic specific to a headless architecture.

References: one, two, three and four.

Comments are closed