
Experience Sitecore !

More than 200 articles about the best DXP by Martin Miles

Sitemaps in Sitecore XM Cloud: Automation, Customization, and SEO Best Practices

In Sitecore XM Cloud, sitemaps are generated and served via Experience Edge to inform search engines about all discoverable URLs. XM Cloud uses SXA’s built‑in sitemap features by default, storing the generated XML as media items in the CMS so they can be published to Experience Edge. Sitemap behavior is controlled by the Sitemap configuration item under /sitecore/content/<SiteCollection>/<Site>/Settings/Sitemap. There are a few important fields: Refresh threshold, which defines the minimum time between regenerations; Cache expiration; Maximum number of pages per sitemap, for splitting into a sitemap index; and Generate sitemap media items, which must be enabled to publish via Edge. The Sitemap media items field of the Site item lists the generated sitemap(s) under /sitecore/media library/Project/<Site>/<Site>/Sitemaps/<Site>, and the default link provider is used unless overridden. Tip: you can configure a custom provider via <linkManager> and choose its name in the Sitemap settings.

Automated Sitemap Generation Workflow

When content authors publish pages, XM Cloud schedules sitemap regeneration automatically based on the refresh threshold. Behind the scenes, a publish:end event handler (SXA's SitemapCacheClearer.OnPublishEnd) checks each site’s sitemap settings. If enough time has elapsed since the last build, a Sitemap Refresh job runs. In this job, the old sitemap media item is deleted and a new one is generated and saved in the Media Library. Once created, the new sitemap item is linked in the Sitemap media items field of the site and then published. This typically triggers two publish actions: one to publish the new media item (/sitecore/media library/Project/.../Sitemaps/<Site>/sitemap) and one to re-publish the Site item so Experience Edge sees the updated link.

For high-volume publishing, it’s best to set a reasonable refresh threshold so that sitemap generation is batched. For example, if you publish many pages daily, you can either set the refresh threshold to 0 to force a rebuild on every publish, or schedule a daily publish so the sitemap is updated once per day. Generating sitemaps can be resource-intensive, especially for large sites, so avoid rebuilding on every small change unless necessary.

Sitemap Filtering: SXA provides pipeline processors to include or exclude pages. By default, items inheriting SXA’s base page templates have a Change frequency field, and setting it to "do not include" excludes that page from the sitemap. The SXA sitemap pipelines (sitemap.filterItem) include built‑in processors for base template filtering and change-frequency logic. So to exclude a page, simply open it in the Content Editor (or the Experience Editor SEO dialog) and set Change frequency to "do not include".

GraphQL Sitemap Query: Once published, the XM Cloud GraphQL API provides access to the sitemap media URL. For example, the following query returns the sitemap XML URL for a given site name:

    query SitemapQuery($site: String!) {
      site {
        siteInfo(site: $site) {
          sitemap
        }
      }
    }

This returns the Experience Edge URL of the generated sitemap media item. You can use this in headless code or debugging to verify the sitemap’s existence and freshness.
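
For example, a minimal TypeScript sketch of calling this query against the public Experience Edge endpoint could look like the following. This is a hedged illustration: the SITECORE_API_KEY environment variable and site name are placeholders, and Node 18+ is assumed for the global fetch.

    // Minimal sketch - not production code. The endpoint is the public Experience Edge
    // GraphQL endpoint; the API key env variable and site name are placeholders.
    const EDGE_ENDPOINT = 'https://edge.sitecorecloud.io/api/graphql/v1';
    const EDGE_API_KEY = process.env.SITECORE_API_KEY ?? '';
    const SITE_NAME = 'my-site';

    async function fetchSitemapUrl(): Promise<string | undefined> {
      const response = await fetch(EDGE_ENDPOINT, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          sc_apikey: EDGE_API_KEY, // Edge typically expects the key in the sc_apikey header
        },
        body: JSON.stringify({
          query: `query SitemapQuery($site: String!) {
            site { siteInfo(site: $site) { sitemap } }
          }`,
          variables: { site: SITE_NAME },
        }),
      });

      const { data } = await response.json();
      const sitemap = data?.site?.siteInfo?.sitemap;
      // Depending on the schema version this may be a single URL or a list of sitemap URLs
      return Array.isArray(sitemap) ? sitemap[0] : sitemap;
    }

    fetchSitemapUrl().then((url) => console.log('Sitemap media URL:', url));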

Sitemaps in Local Docker Containers

In a local XM Cloud Docker setup, the /sitemap.xml route often returns an empty file by default because the publish to Experience Edge never occurs. There is no web database or Edge target, so the OnPublishEnd processing never actually runs, leaving the sitemap media item empty. Attempting to publish locally throws an exception (Invalid Authority connection string for Edge). To debug or test sitemap issues locally, you can manually trigger the SXA sitemap pipeline.

I really like the Sitemap Developer Utility approach suggested by Jeff L'Heureux: in your XM Cloud solution’s Docker files, create a page (e.g. generateSitemap.aspx) inside docker\deploy\platform with code that simulates a publish event. For example, you can invoke the SitemapCacheClearer.OnPublishEnd() method manually in C#:

    // Simulate a publish event for the "Edge" target.
    // Requires System.Collections.Generic plus the Sitecore.Configuration, Sitecore.Data,
    // Sitecore.Publishing, Sitecore.Globalization and Sitecore.Events namespaces,
    // and SXA's SitemapCacheClearer.
    Database master = Factory.GetDatabase("master");
    List<string> targets = new List<string> { "Edge" };

    // Pretend a single-item publish to the Edge target has just completed
    PublishOptions options = new PublishOptions(master, master, PublishMode.SingleItem,
        Language.English, DateTime.Now, targets);
    Publisher publisher = new Publisher(options);

    // Raise the publish:end arguments manually and hand them to the SXA handler
    SitecoreEventArgs args = new SitecoreEventArgs("OnPublishEnd", new object[] { publisher }, new EventResult());
    new SitemapCacheClearer().OnPublishEnd(null, args);

This code triggers the same sitemap build logic as a real publish. Jeff's utility page provides buttons to run various steps (OnPublishEnd, the sitemap.generateSitemapJob pipeline, etc.) and shows output.

Once you run the utility and the cache job completes, the media item is regenerated. Then restart or refresh your Next.js site locally to see the updated sitemap at http://front-end-site.localhost/sitemap.xml. The browser will display the raw XML with <loc>, <lastmod>, <changefreq>, and <priority> entries as it normally should.

Sitemap Customization for Multi-Domain Sites

A common scenario is one XM Cloud instance serving multiple language or regional domains (say, www.siteA.com and www.siteA.fr) with one shared content tree. In SXA this is often handled by a Site Grouping with multiple hostnames. By default, SXA generates a single sitemap based on the primary hostname. This leads to two issues: the same XML file is returned on both domains, and each page appears several times (once per language version) under the primary hostname. For example, a bilingual site without customization might show both English and French URLs under the English domain, duplicating <url> entries.

To fix this, customize the Next.js API route (e.g. pages/api/sitemap.ts) that serves /sitemap.xml. The approach is: detect which host/domain the request is for, fetch the raw sitemap XML via GraphQL, and then filter and rewrite the entries accordingly. For instance, if the host header contains the French domain, only include the French URLs and update the <loc> and hreflang="fr" links to use the French hostname. Pseudocode for the filtering might look like:

    if (lang === 'en') {
      // Filter out French URLs and fix alternate links
      urls = urls.filter(u => !u.loc[0].includes(FRENCH_PREFIX))
                 .map(updateFrenchAlternateLinks);
    } else if (lang === 'fr') {
      // Filter out English URLs and swap French loc to French domain
      urls = urls.filter(u => u.loc[0].includes(FRENCH_PREFIX))
                 .map(updateLocToFrenchDomain)
                 .map(updateFrenchAlternateLinks);
    }

Here, FRENCH_PREFIX is something like en.mysite.com/fr, and we replace it with the French hostname. In practice, the XML is parsed (e.g. via xml2js), then the result.urlset.url array is filtered and modified, and the XML is rebuilt. Mike Payne suggested a great solution that uses two helper functions, filterUrlsEN and filterUrlsFR, to drop unwanted entries, and updateLoc/updateFrenchXhtmlURLs to replace URL prefixes. Finally, the modified XML is sent in the HTTP response. This ensures that when the sitemap is requested from www.site.ca, all <loc> URLs and alternate links point to site.ca, and when it is requested from www.othersite.com, they point to www.othersite.com.
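
To make the shape of such a route more concrete, here is a simplified TypeScript sketch - my own illustration rather than Mike Payne's exact code. The hostname variables and the fetchRawSitemapXml helper are hypothetical, the hreflang/alternate-link rewriting is omitted for brevity, and xml2js is assumed to be installed.

    // pages/api/sitemap.ts - simplified multi-domain rewrite sketch
    import type { NextApiRequest, NextApiResponse } from 'next';
    import { parseStringPromise, Builder } from 'xml2js';

    const FRENCH_PREFIX = process.env.PUBLIC_FR_PREFIX ?? 'en.mysite.com/fr'; // hypothetical
    const FRENCH_HOST = process.env.PUBLIC_FR_HOSTNAME ?? 'www.mysite.fr';    // hypothetical

    export default async function handler(req: NextApiRequest, res: NextApiResponse) {
      const isFrench = (req.headers.host ?? '').includes(FRENCH_HOST);

      // Download the raw sitemap XML generated by SXA and published to Edge
      const rawXml = await fetchRawSitemapXml();
      const parsed = await parseStringPromise(rawXml);
      let urls: any[] = parsed.urlset.url ?? [];

      if (isFrench) {
        // Keep only French entries and point their <loc> at the French hostname
        urls = urls
          .filter((u) => u.loc[0].includes(FRENCH_PREFIX))
          .map((u) => ({ ...u, loc: [u.loc[0].replace(FRENCH_PREFIX, FRENCH_HOST)] }));
      } else {
        // Keep only English entries
        urls = urls.filter((u) => !u.loc[0].includes(FRENCH_PREFIX));
      }

      parsed.urlset.url = urls;
      const xml = new Builder().buildObject(parsed);

      res.setHeader('Content-Type', 'text/xml;charset=utf-8');
      res.send(xml);
    }

    // Placeholder: in a real route this would fetch the sitemap media item from Experience Edge
    async function fetchRawSitemapXml(): Promise<string> {
      throw new Error('Fetch the sitemap XML from Experience Edge here');
    }

In the actual solution the alternate <xhtml:link> entries are rewritten in the same pass, which is what the helper functions mentioned above take care of.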

SEO Considerations and Best Practices

  • Include Alternate Languages (hreflang): XM Cloud (via SXA) automatically adds <xhtml:link rel="alternate" hreflang="..."> entries in the sitemap for multi-lingual pages. Ensure these are correct for your domains. After customizing for multiple hostnames, the <xhtml:link> URLs should also be updated to the appropriate domain. This helps Google index the right language version for each region.

  • Set Change Frequency and Priority: Use SXA’s SEO dialog or Content Editor on the page item to set Change frequency and Priority for each page. For example, if a page is static, set a low change frequency. These values are written into <changefreq> and <priority> in the sitemap. Note: Pages can be excluded by setting frequency to "do not include".

  • Maximize Crawling via Sitemap Index: If your site has many pages, configure Maximum number of pages per sitemap so XM Cloud generates a sitemap index with multiple files. This avoids any single sitemap exceeding search engine limits and keeps crawlers from giving up on a very large file.

  • Robots.txt: SXA will append the sitemap link /sitemap.xml to the site’s robots.txt automatically. Verify that your robots.txt in production references the correct sitemap and hostname.

  • Media Items and Edge: Always keep Generate sitemap media items enabled: without it, XM Cloud cannot deliver the XML to the front-end. After a successful build, the sitemap XML is stored in a media item and served by Experience Edge. You can confirm the published sitemap exists by checking /sitecore/media library/Project/<Site>/<Site>/Sitemaps/<Site> or by running the GraphQL query mentioned above.

  • Link Provider Configuration: If your site uses custom URL routing (e.g. language segments or rewritten paths), you can override the link provider used for sitemap URLs. In a patch config, add something like:

    <linkManager defaultProvider="switchableLinkProvider">
      <providers>
        <add name="customSitemapLinkProvider"
             type="Sitecore.XA.Foundation.Multisite.LinkManagers.LocalizableLinkProvider, Sitecore.XA.Foundation.Multisite"
             lowercaseUrls="true" ... />
      </providers>
    </linkManager>

    Don't forget to set the "Link provider name" field in the Sitemap settings to customSitemapLinkProvider afterwards. This ensures the sitemap uses the correct domain and culture prefixes as needed.

Diagnostics and Troubleshooting

If the sitemap isn’t updating or the XML is wrong, check these:

  • Site Item Settings: On the site’s Settings/Sitemap item, confirm the refresh threshold and expiration are as expected. During debugging you can set threshold to 0 to force immediate rebuilds.

  • Was it published to Edge? Ensure the sitemap media item was published to Edge. You might need to publish the Site item or Media Library manually if it wasn’t picked up.

  • Cache Type: In the SXA Sitemap settings, the Cache Type can be set to "Inactive", "Stored in cache", or "Stored in file". For XM Cloud, the default "Stored in file" is typically used so the XML is persisted. If set to "Inactive", the sitemap generator will not run.

  • Inspect Job History: In the CM admin (/sitecore/admin/Jobs.aspx), look for the "Sitemap refresh" jobs to see whether they succeeded or threw errors.

  • Next.js Route Errors: If your Next.js site’s /sitemap.xml endpoint returns an error, inspect its handler. The custom API route uses GraphQLSitemapXmlService.getSitemap(). Ensure the hostnames in your logic match your environment variables (e.g. PUBLIC_EN_HOSTNAME). Add logging around the xml2js parsing if the output seems empty or malformed - a small sketch follows below.
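
    For example, a tiny TypeScript helper along these lines (my own hedged sketch - the function name and where you call it from are up to you; xml2js is assumed) can confirm whether the XML arrived at all and whether xml2js could parse it:

    // Hypothetical debugging helper: log what came back from Edge and how many
    // <url> entries survived parsing, before any filtering happens.
    import { parseStringPromise } from 'xml2js';

    export async function debugSitemapXml(rawXml: string): Promise<void> {
      console.log('Raw sitemap length:', rawXml.length);
      console.log('First 200 chars:', rawXml.slice(0, 200));
      try {
        const parsed = await parseStringPromise(rawXml);
        const urls = parsed?.urlset?.url ?? [];
        console.log('Parsed <url> entries:', urls.length);
      } catch (err) {
        console.error('xml2js failed to parse the sitemap XML:', err);
      }
    }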

By following the above patterns - configuring SXA sitemap settings, automating generation on publish, and customizing for your site topology - you can ensure that XM Cloud serves accurate, SEO‑friendly sitemaps. This helps search engines index your content fully and respects the multi-lingual domain structures and refresh logic specific to a headless architecture.

References: one, two, three and four.

Infrastructure-as-Code: best practices you have to comply with

Infrastructure as Code (IaC) is an approach that involves describing infrastructure as code and then applying it to make the necessary changes. IaC does not dictate how exactly to write the code, it just provides the tools. Good examples are Terraform, Ansible and Kubernetes itself, where you don't say what to do but rather declare what state you want your infrastructure to get into.

Keep the infrastructure code readable. Your colleagues should be able to easily understand it and, if necessary, extend or test it. While it looks like an obvious point, it is quite often forgotten, resulting in “write-only code” - code that can only be written, but cannot be read. Even its author is unlikely to understand what he wrote and figure out how it all works just a few days afterward.

An example of a good practice is keeping all variables in a separate file. This is convenient because you do not have to search for them throughout the code. Just open the file and immediately get what you need.


Adhere to a certain style of writing code. As a good example, you may want to keep the code line length between 80 and 120 characters. If the lines are very long, the editor starts wrapping them. Line breaks destroy the overall view and interfere with understanding the code. One has to spend a lot of time just figuring out where a line starts and where it ends.

It's nice to have the coding style check automated, at least by using the CI/CD pipeline for this. Such a pipeline could have a Lint step: a static analysis of what is written, helping to identify potential problems before the code is applied.


Utilize git repositories the same way developers do. By that I mean creating new branches, linking branches to tasks, reviewing what has already been written, sending Pull Requests before making changes, etc.

To a solo maintainer the listed actions may seem redundant - it is a common practice for people to just come and start committing. However, even if you have a small team, it could be difficult to understand who made some corrections, when, and why. As the project grows, such practices will increasingly help you understand what is happening and avoid messing up the work. Therefore, it is worth investing some time into adopting development practices for working with repositories.


Infrastructure as Code tools are typically associated with DevOps. We know DevOps specialists not only deal with maintenance but also help developers work: they set up pipelines, automate test runs, etc. - all of the above also applies to IaC.

In Infrastructure as Code, automation should be applied: Lint rules, testing, automatic releases, etc. Having repositories with, say, Ansible or Terraform code that is rolled out manually (by an engineer starting a task by hand) is not much good. Firstly, it is difficult to track who launched it, why, and at what moment. Secondly, it is impossible to understand how it worked out and draw conclusions.

With everything kept in the repository and controlled by an automatic CI/CD pipeline, we can always see when the pipeline was launched and how it performed. We can also control the parallel execution of pipelines, identify the causes of failures, quickly find errors, and much more.

You can often hear from maintainers that they do not test the code at all, or just run it first somewhere on dev. That is not the best practice, because it gives no guarantee that dev matches prod. In the case of Ansible or other configuration tools, the standard testing routine could look something like this:

  • launched a test on dev;
  • rolled it out on dev, but it crashed with an error;
  • fixed this error;
  • did not run the test again, because dev is already in the state they tried to bring it to.

It seems that the error has been corrected, and you can roll on prod. What will happen to prod? It is always a matter of luck - hit or miss. If something falls over again somewhere in the middle, the error will be corrected and everything will be restarted.

But infrastructure code can and should be tested. At the same time, even when specialists know about different testing methods, they often cannot apply them. The reason is that Ansible roles or Terraform files are written without any initial thought that they will need to be tested somehow.

In an ideal world, at the moment of writing code the developer is aware of what (else) needs to be tested. Accordingly, before starting to write the code, the developer plans how to test it - a practice commonly known as TDD. Untested code is low-quality code.

Exactly the same applies to infrastructure code: once written, it should be possible to test it. Decent testing reduces the number of errors and makes life easier for the colleagues who will later finalize your Ansible roles or Terraform files.


A few words about automation. A common situation when working with Ansible is that even when something could be tested, there is no automation around it. Usually, this is the case when someone creates a virtual machine, takes a role written by colleagues, and launches it. Afterwards that person realizes the need to add certain new things - appends them and launches the role again on the virtual machine. Then they realize that even more changes are required, and also that the current virtual machine has already been brought to some kind of state, so it needs to be killed, a new virtual machine instantiated, and the role rolled over it. If something does not work, this algorithm has to be repeated until all errors are eliminated.

Usually, the human factor comes into play: after the N-th repetition it becomes too tedious to delete the VM and re-create it again. Once everything seems to work exactly as it should (this time), one is tempted to freeze the changes and roll them into the prod environment. But in reality errors can still occur, which is why automation is needed. When everything runs through automated pipelines and Pull Requests are used, bugs are identified faster and prevented from re-appearing.

Things beginners get wrong about Kubernetes

When starting to play with Kubernetes, one may fall for one of the biggest delusions: assuming that K8S will work the same way in production as it does in a development or testing environment.

But it won't!

When it comes to containers in general and Kubernetes specifically, there is a big difference between occasional runs in lab-like conditions and a full production lifecycle. That is similar to the difference between just starting an app and running it long-term with full security and reliability enabled.

This is not a Kubernetes-exclusive problem but is true for the entire variety of containers and microservices. Spinning up a container is a relatively simple task, while scaling containers as containerized microservices in production turns out to be more complicated.

Although Kubernetes has alternatives, it has quickly become the de-facto standard for orchestration. However, there is a difference between launching K8S in a sandbox and running it in a full production environment.



Delusion #1. Running containers with Kubernetes in the development or testing environment ensures that your operational needs will be satisfied.

The truth: launching Kubernetes in a development or testing environment allows you to cut corners, simplify things and not bother with the operational load one faces when going live to Prod. Ops and security considerations are the major areas of difference between K8S running in prod and in development/testing environments. A cluster failing in lab conditions does not bring any losses.

For me it looks like a trade-off between agility and reliability: devs use containers for the flexibility they give while developing and testing apps, and for that purpose they serve well. Ops, in turn, need reliability, scaling, performance and security from a sustainable, industry-proven platform. They are looking for deployment automation for the clusters to ensure repeatability and consistency, which also helps when restoring the system.

Versioning is also critical for operations. As far as possible, you need to enable versioning everywhere, including service deployment configuration, policies and infrastructure (applying the infrastructure-as-code approach). That makes environments repeatable. As a good practice, avoid "latest" image versions in order to avoid the configuration drift effect.


Delusion #2. Kubernetes provides both reliability and security

In reality: when Kubernetes is used in non-production environments only, reliability and security are most likely not provided, at least initially. Do not get discouraged, you will get there: it's a matter of designing the architecture before switching to Prod.

Obviously, performance, scaling, availability and security requirements are much higher in prod environments. It is important to plan these requirements into the architecture of the K8S deployment, as well as build scaling and security settings into Helm charts, etc.

But how could running a cluster in dev/testing environments lead to false confidence?

It is common for non-production environments to have all network connections open. There it is acceptable that any service can talk to any other service: open connections are the Kubernetes default. However, such an approach is a bad practice for production environments and can lead to downtime. It also exposes a larger attack surface and increases threats to the business.

When it comes to containers/microservices, one needs to spend a bigger effort on creating a highly available and reliable system. Orchestration itself helps a lot but isn't a "silver bullet", and the same applies to security. You will have to work hard to protect Kubernetes and reduce the attack surface. It is very important to use RBAC with minimal privileges and enforce network policies, leaving open only those channels that services actually use.

Also, vulnerabilities in container images can rapidly put ops into a critical state, while in development/testing environments this danger may be absent altogether. Pay attention to the base images used for building your containers: as far as possible, use trusted official images or build your own. The last thing you want is your Kubernetes cluster helping someone mine crypto coins.

It is recommended to treat container security as a ten-level system covering the container stack (host and registries), as well as questions related to the container life cycle (for example, API management).


Delusion #3. Orchestration makes scaling a formality

Although Kubernetes is considered an absolutely necessary tool for scaling containers, it would be a delusion to think that orchestration immediately sorts out the scaling needs of the production environment. The volume of data in live environments is many times higher, and keep in mind that monitoring may also need scaling. With increasing volumes, everything changes.

It is impossible to ensure that all K8S components implement their interfaces correctly until you spin up prod: that Kubernetes is "working normally", and that the API server and other control plane components scale according to your needs.

As I said, development and testing environments are much easier. In local environments it is easy to skip basics like defining the right resource requests and limits. Skipping them can collapse your prod later on.

Scaling the cluster in both directions is a good example of a task that goes easily locally but is clearly complicated in production: scaling prod clusters is more difficult than scaling clusters for development/testing.

While Kubernetes makes horizontal scaling relatively simple, DevOps still need to keep some nuances in mind, especially when it comes to keeping services live while scaling the infrastructure. It is crucial to ensure that the main services, as well as system monitoring and security alerting, are distributed across the cluster nodes and work with stateful volumes so that data is not lost on scaling down.

Again, it all comes down to proper planning and available resources. You need to not just understand your scaling needs when planning but, most importantly, test them. Your production environment must be capable of handling much higher loads.


Delusion #4. Kubernetes works the same everywhere

In reality: the differences between environments can be as big as those between running Kubernetes on a developer's laptop and on a prod server. There may be serious differences depending on the vendor. Many believe that if K8S works locally, it will work in any operational environment.

Local environments commonly miss important components required by prod environments: monitoring, logging, certificate management and credentials. You need to keep that in mind, as that is another problem arising from the difference between prod and development/testing environments.

However, this isn't exclusive to Kubernetes but applies to containers/microservices in general, especially in multi-cloud and hybrid cloud setups. Such Kubernetes implementations are more complicated than they initially seem, as many of the mandatory services, like load balancing and firewalls, are proprietary. A container that works well locally may run unprotected (or may not start at all) in a cloud with a different set of tools. That is why service mesh technologies like Istio attract so much attention: they provide availability wherever your container runs, so you do not need to think about the infrastructure - which is the main reason for using containers.

I hope that, keeping the above in mind, you can build safer and more reliable production environments with Kubernetes!

Sitecore Cheat Sheets

While working with Sitecore, you have to keep plenty of things in mind, some of them not easy to remember. To simplify my life as a developer, I created a compilation of cheat sheets for various aspects of working with Sitecore. I found having them printed on standard A4 paper not quite convenient, so I decided to re-create them in a more convenient format. Then I thought: why not issue them as a portable book and share it with the hundreds of people it may help.



Table of Contents:

  • Sitecore Item API
  • URL parameters
  • Sitecore queries
  • Sitecore PowerShell
  • Content Search API
  • Admin folder pages
  • Security
  • Databases
  • Mongo
  • Configuration
  • Core database
  • Rules engine
  • Helix / Habitat
  • Razor view extensions
  • Glass Mapper
  • Azure
  • Going live
  • ReSharper most used hotkeys
  • IIS and ASP.NET
  • Icons in Sitecore
This is the bare minimum I have already completed; however, if there are more interesting topics to cover - please let me know.

    Sitecore Boilerplate - a repository of best practices all in one place

    I decided to create an ultimate "boilerplate" solution for Sitecore, implementing all the best Sitecore practices in one place, well documented and cross-linked with supporting posts on this blog.

    As a multi-language website with Experience Editor (formerly Page Editor) support, Glass Mapper, Lucene indexes, a test-driven codebase and much more working well together, it will be a perfect place for newbies to familiarize themselves with the Sitecore platform. It also aims to simplify the work of more senior Sitecore developers by letting them quickly find the desired features and grab them into their working solutions.

    The project originated from my R&D activities, as I decided it would be beneficial to share my work with the Sitecore community. Any suggestions, comments and criticism are highly welcome!

    List of the features I plan to include in the Sitecore boilerplate:

    • Support for Page Editor
    • Usage of Glass Mapper for ORM purposes
    • Unit testable code
    • Synchronization of user-editable content from CD environment to CM and further re-publish to the rest of CDs
    • Support for multi-language environment
    • Custom Lucene indexes
    • Custom personalisation of components and data
    • Workflows based on user permissions
    • Make everything mentioned above work together as a solid and stable website
    • Implement new Sitecore 8 marketing features on top of that

    .. for the moment I have planned and implemented several of the mentioned features as a starting point, so it is coming soon to GitHub along with further blog posts here.