Experience Sitecore! | All posts tagged 'SEO'

Experience Sitecore!

Martin Miles on Sitecore

Evolutional approach to Next.js and its modes

As you've might hear, Sitecore has chosen Next.js to be used along with its JSS SDK. But what makes Next that great tool for most of us switching to a new paradigm of development for Sitecore? In this blog post, I'll go through Sitecore development evolution, starting with a review of Sitecore development progressing with time.

Old school development

A decade ago, we used classical ASP.NET WebForms to render a page on a server and pass it to the client. The whole idea of WebForms was faulty as it tried to mimic the event-based model of desktop development to make web development feel familiar to them. That was at cost of ignoring the stateless nature of HTML, creating weird ugly abstractions (ie. ViewState and EventValidation).

It was later made obsolete with an MVC approach which turned ASP.NET web development to what it should be in a better world: no state and events abstractions, server controls, and Master pages. Proper separation of code and markup (which itself went better and readable with the introduction of Razor views). It all benefited from an MVC architecture, proven with other web technologies, such as Ruby on Rails. Moreover, the implementation allowed extensibility at always every lifecycle of web request while ASP.NET MVC going open-source allowed writing your code aligned with the exact implementation of the framework.

MVC made a great step ahead and stayed a default way of making sites with Sitecore for as long as 5-8 years. Being so close to a raw request was a great strength but at a cost of having a lot of repetitive activities.

The introduction of SXA fixed most of these issues by strongly relying on Sitecore PowerShell for addressing most things that should and were in fact automated. The overall developers and editors' experience has improved with SXA due to the introduction of Page and Partial Designs, powerful components adjustable with rendering variants, most popular grid systems support, flexible search and SEO tools.

SXA was great in most aspects, except the one but most important - it was still based on top of MVC. That means web pages were generated at the server by rendering content into HTML views. Or in other words, it was not headless...

Headless

Meanwhile, the world of front-end development has experienced massive growth and after half-a-decade craziness of JS frameworks appearing one after another, a triad of winners stood out: React, Angular, and Vue. Those good old days of using jQuery came to an end giving way to industry-proven frameworks with the bigger feature sets and revised architecture that suits modern web development.

With time It became even harder and harder to split work between back-end and front-end teams (as for full-stack guys most of them tend to choose either side). Even bigger efforts have been spent on unwanted work of merging FE and BE teams in sync, which could not last long as both sides were struggling from that situation.

The headless approach was the right answer resolving all those issues with JSS being Sitecore response for that.

With the release of JSS, it became possible to separate BE and FE in a way that page each side becomes responsible for only its own duties. Front-end becomes free of previous limitations and could use React / Vue / Angular as much as they wanted. They did not need to use a heavily loaded web server with Sitecore for generating HTML pages - a new component called Rendering Host did that job exclusively for them. The only interaction with the back-end left was receiving just the necessary data asynchronously thanks to Layout Service and GraphQL.

NOTE: Actually headless means anything can consume the data from back-end services, not just FE frameworks. That is well done in ASP.NET Core renderings as an alternative option for headless implementation for Sitecore.

Client-Side Rendering

With a typical non-Sitecore single-page application the webserver firstly sends the browser an HTML page being in some initial state. Once that page gets loaded, the browser executes its JavaScript code which raises an asynchronous request(s) to an API endpoint in order to get actual data. As the user progresses with this app, more requests are sent by the browser, which will partially update content on a page without the whole page reloading from a web server. This approach is known as Client-Side Rendering, CSR and it brings lots of advantages such as apps responding faster and reducing traffic between client and server.

What's wrong with single-page applications?

Since single-page apps only load an initial HTML page once, this is the same as what search engine bots get. They struggle to obtain follow-up data from APIs and cannot index the page. Also without page reload the URL reaming the same and it can vary by only appending a #-anchor to a page URL. Often these URLs cannot be correctly processed when called directly.

Next.js

To address the above we have Next.js - a framework for statically generated and server-rendered React applications that opens up a lot of possibilities for developers: creating ready-to-use, zero-configuration applications, code separation, static HTML exporting, better UX, faster performance, and more. You can see many of its features below:

Next.js will ensure SEO without any extra actions from users beyond creating an application. Just to make clear, that results not from Next.js specifically, but from server-side rendering.

Once can do some SEO reports with Lighthouse even at earlier stages as you begin building your application.

But that still wasn't that....

SSG challenge

The idea behind Jamstack is truly attractive: instead of serving webpages in real-time (even when taking those from a cache), the webpages are already pre-rendered and deployed to CDN being globally accessible immediately upon publishing. In a simple scenario, one does not even have to keep a running server up as the traffic never reaches it going to CDN. Static content is fast, resilient to downtime, and gets indexed immediately by crawlers.

This approach however has some issues.

Let's think about a huge site with millions of pages. Deploying such a site may last hours rather than minutes due to static pages generation and the number of files to process. An increasing amount of content means increasing generation time. It seems to be reasonable to re-generating only those pages been updated, but it is only a small part of the solution (deployment becomes complicated and even one character change in a common part like a header will still make you process all the pages).

ISR

That is where Incremental Static Regeneration (ISR) comes into play. ISR is a new evolution step for Jamstack. Next.js allows you to create or update static pages beyond you’ve built a site. Incremental Static Regeneration enables developers and content editors to use static-generation on a per-page basis, without needing to rebuild the entire site. With ISR, you can benefit best from both worlds while scaling to millions of pages.

The principle difference is that now Static pages could be generated on-demand at runtime. The developers' job is now deciding which portion of pages you pre-generate, i.e. well known 80/20 Pareto's Law where 80% of traffic is served by only 20% of pages, while the other 80% of pages get the remaining 20% of traffic.

So it makes good sense to pre-generate that heavily used 20 % of pages. How to know which pages or sections to go through? You've got an arsenal of tools like analytics, A/B testing, alternative metrics - in any case, you got the flexibility to make your own tradeoff on build times, as the image below compares:

With being given a choice now, developers can define options A or B and choose between them: Selecting option A build time gets faster, while option B generates more pages.

This becomes crucial when working on large eCommerce implementations or headless CMSs such as Sitecore.

How that works

ISR relies on that same API being used for static sites generation getStaticProps. The difference is that by setting revalidate parameter to 60 we make Next.js using ISR for a page. Here's how the request goes with ISR:

  1. With Next.js one can define a revalidation time per page (ie. 60 seconds)
  2. The initial request to a product page will return the cached page with the original price
  3. At this stage, someone makes changes into a product data, affected in the database changes
  4. All requests to the page after the initial request but before 60 seconds are returned immediately as are cached.
  5. After a given 60-second window, the following request will still show the cached (old) page. But Next.js triggers background regeneration of that page. Once completed, it will update a cache for that single page or keep an old cached page upon a background regeneration failure.

Finding a compromise

Since all the sites vary by volume, audience, purpose, and internal architecture - there's no a silver bullet to cover them all with a universal solution. That is why Next.js is end-user-centric, offering developers shifting between solutions without leaving the bounds of the framework. It's for you to choose the right tool for a project.

Edge caching

In certain cases, ISR is not the best option, like some apps where live data display is crucial. Those would be better handled with server rendering, with some option of own Cache-Control headers with surrogate keys to invalidate content. Server rendered pages could get cached at some edge servers. With a hybrid framework, one can make own tradeoff and still stay within the framework.

SSR with edge server caching may look similar to ISR (especially with stale-while-revalidate headers for cache control).

The major difference comes from the way of handling the first request. With ISR it returns a statically rendered page that ensures the user will see a page even in case of API connectivity loss or database failure. SSR allows setting the pages depending on the specific features of requests.

One thing to care about in that case is using SSR whiteout caching may affect the performance as every millisecond of wait is important. In addition SSR with no cache badly impacts the TTFB metric (Time to First Byte) being used by Lighthouse.

In addition to that, ISR is not beneficial for small websites. That is reasonable if build time for the whole site is times lower than the revalidation parameter - just use classic SSR instead.

ISP fallback options

This is an important parameter with two potential options. When working with data that is fast to retrieve it makes sense using fallback: blocking. In that case, you do not need to display using a temporal "in progress" page while the data retrieval. That will guarantee users see the right page regardless of it is cached or not.

For uncertain or slow loading data the above approach will affect UX badly, therefore setting fallback: true makes an immediate display of the "please wait" page while data is processed.

SEO is the cause

SEO (search engine optimization) is a set of techniques (and even unobvious tricks) for changing your site in order to attract higher traffic from search engines. In order to increase the site's search rate, one needs to keep in mind many of them, such as:

Visitors won’t wait an eternity until your page loads. Performance is actually a crucial factor for SEO and therefore should be the main concern when building an app. In addition to FTFB (mentioned previously), there is another important parameter abbreviated as FCP (First Contentful Paint). Google uses FCP as a key metric for performance - FCP directly affects SEO rating. You can read more about improving FCP.

With Next.js you can analyze FCP and LCP (Largest Contentful Paint - time used for major content shown) by creating App component with a reportWebVitals function:

// pages/_app.js
export function reportWebVitals(metric)
{
  console.log(metric)
}

Once these parameters get calculated reportWebVitals function is called with all the metrics for you to log and analyze. Follow this link for more details about measuring performance with Next.js

I hope this post gives an overall highlight on the rendering evolution from the old days till ISR and nuances choosing them with Next.js.

Yet another SXA rendering variant - Script Reference Tag coming to improve your SEO

Note! The code used in this post can be cloned from GitHib repository: SXA.Foundation.Variants

I previously wrote a post about having a rendering variant holding an inline JavaScript one might need along adding some basic JS functionality into your components. 

This is useful when you're early developing your pages and have no possibility or capacity of recompiling entire frontend and updating Creative Exchange package into your solution because of adding/changing few lines; however, given approach is not SEO-friendly as search engines penalize sites for excessive inline scripts and styles. So use it considering to be technical debt, that should be addressed prior to going to production.

The very minimal change one can do is to replace the inline script with a reference to that same script stored in Media Library - same that SXA does itself with themes. This blog post below reveals an approach:

Firstly, create a template:

Then reference given template IDs within Constants.cs file:

using Sitecore.Data;

namespace Platform.Foundation.Variants.Pipelines.VariantFields.ScriptReferenceTag
{
    public static partial class Constants
    {
        public static partial class RenderingVariants
        {
            public static partial class Templates
            {
                public static ID ScriptReferenceTag { get; } = new ID("{0EC036D7-384D-4CF6-AD1F-FE949E96126A}");
            }

            public static partial class Fields
            {
                public static class ScriptReferenceTag
                {
                    public static ID ScriptMedia { get; } = new ID("{F1497AF9-7DD3-4B38-BE22-5F092007F929}");
                }
            }
        }
    }
}
Model class, having just one property that stores a GUID of a referenced script from Media Library 
using Sitecore.Data.Items;
using Sitecore.XA.Foundation.RenderingVariants.Fields;

namespace Platform.Foundation.Variants.Pipelines.VariantFields.ScriptReferenceTag
{
    public class VariantScriptReferenceTag : RenderingVariantFieldBase
    {
        public string ScriptMedia { get; set; }

        public VariantScriptReferenceTag(Item variantItem) : base(variantItem)
        {
        }
    }
}
Parser:
using Sitecore.Data;
using Sitecore.XA.Foundation.Variants.Abstractions.Pipelines.ParseVariantFields;

namespace Platform.Foundation.Variants.Pipelines.VariantFields.ScriptReferenceTag
{
    public class ParseScriptReferenceTag : ParseVariantFieldProcessor
    {
        public override ID SupportedTemplateId =>  Constants.RenderingVariants.Templates.ScriptReferenceTag;
        
        public override void TranslateField(ParseVariantFieldArgs args)
        {
            ParseVariantFieldArgs variantFieldArgs = args;

            var variantHtmlTag = new VariantScriptReferenceTag(args.VariantItem) { Tag = "script" };
            variantHtmlTag.ScriptMedia = args.VariantItem[Constants.RenderingVariants.Fields.ScriptReferenceTag.ScriptMedia];
            variantFieldArgs.TranslatedField = variantHtmlTag;
        }
    }
}
Renderer:
using System;
using Sitecore.Data;
using System.Web.UI.HtmlControls;
using Sitecore.XA.Foundation.RenderingVariants.Pipelines.RenderVariantField;
using Sitecore.XA.Foundation.Variants.Abstractions.Pipelines.RenderVariantField;
using Sitecore.Resources.Media;

namespace Platform.Foundation.Variants.Pipelines.VariantFields.ScriptReferenceTag
{
    public class RenderScriptReferenceTag : RenderVariantField
    {
        public override Type SupportedType => typeof(VariantScriptReferenceTag);

        public override void RenderField(RenderVariantFieldArgs args)
        {
            var variantField = args.VariantField as VariantScriptReferenceTag;
            if (variantField != null)
            {
                var id = variantField?.ScriptMedia;
                if (string.IsNullOrWhiteSpace(id))
                {
                    return;
                }

                var scriptItem = Context.Database.GetItem(new ID(id));
                if(scriptItem == null)
                {
                    return;
                }

                var url = MediaManager.GetMediaUrl(scriptItem);

                var tag = new HtmlGenericControl(variantField.Tag);
                tag.Attributes.Add("type", "text/javascript");
                tag.Attributes.Add("defer", String.Empty);
                tag.Attributes.Add("src", url);

                args.ResultControl = tag;
                args.Result = RenderControl(args.ResultControl);
            }
        }
    }
}

Example of usage:

This rendering variant field generates the following output:

<script src="/-/media/Project/Platform/Other/Scripts/Header-script.js" type="text/javascript" defer="" ></script>

This approach works perfectly well. But once again for a second, have you ever considered moving such scripts into a Theme along with related component (if any) instead of leaving it like that? Hope this helps!

Creating XML Sitemap for the Helix solution

I am working on a solution that already has HTML sitemap as a part of Navigation feature. Now I got a request to add also a basic XML sitemap with common set requirements. Habitat ships with an interface template _Navigable, so let's extend this template by adding a checkbox field called

ShowInSitemap, stating whether a particular page will be shown in that sitemap:


In order to start, we need to create a handler. Having handlers in web.config is not the desired way of doing things, it will require also doing configuration transform for the deployments, so let's do things in a Sitecore way (Feature.Navigation.config file):

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
    <sitecore>
        <pipelines>
            <httpRequestBegin>
                <processor type="Platform.Feature.Navigation.Pipelines.SitemapHandler, Platform.Feature.Navigation"
                           patch:before="processor[@type='Sitecore.Pipelines.HttpRequest.CustomHandlers, Sitecore.Kernel']">
                </processor>
            </httpRequestBegin>
            <preprocessRequest>
                <processor type="Sitecore.Pipelines.PreprocessRequest.FilterUrlExtensions, Sitecore.Kernel">
                    <param desc="Allowed extensions">aspx, ashx, asmx, xml</param>
                </processor>
            </preprocessRequest>
        </pipelines>
    </sitecore>
</configuration>

We rely on httpRequestBegin pipeline and incline our new SitemapHandler from Navigation feature right before CustomHandlers processor.

SitemapHandler is an ordinary pipeline processor for httpRequestBegin pipeline, so is inherited from HttpRequestProcessor:

    public class SitemapHandler : HttpRequestProcessor
    {
        const string sitemapHandler = "sitemap.xml";

        private readonly INavigationRepository _navigationRepository;

        public SitemapHandler()
        {
            _navigationRepository = new NavigationRepository(RootItem);
        }

        public override void Process(HttpRequestArgs args)
        {
            if (Context.Site == null 
                || args == null
                || string.IsNullOrEmpty(Context.Site.RootPath.Trim()) 
                || Context.Page.FilePath.Length > 0 
                || !args.Url.FilePath.Contains(sitemapHandler))
            {
                return;
            }

            Response.ClearHeaders();
            Response.ClearContent();
            Response.ContentType = "text/xml";

            try
            {
                var navigationItems = _navigationRepository.GetSitemapItems(RootItem);
                string xml = new XmlSitemapService().BuildSitemapXML(flatItems);

                Response.Write(xml);
            }
            finally
            {
                Response.Flush();
                Response.End();
            }
        }

        private Item RootItem => Context.Site.GetRootItem();

        private HttpResponse Response => HttpContext.Current.Response;
    }

And XmlSitemapService code below:

    public class XmlSitemapService
    {
        public string CreateSitemapXml(IEnumerable<NavigationItem> items)
        {
            var doc = new XmlDocument();

            var declarationNode = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
            doc.AppendChild(declarationNode);

            var urlsetNode = doc.CreateElement("urlset");

            var xmlnsAttr = doc.CreateAttribute("xmlns");
            xmlnsAttr.Value = "http://www.sitemaps.org/schemas/sitemap/0.9";
            urlsetNode.Attributes.Append(xmlnsAttr);
            doc.AppendChild(urlsetNode);

            foreach (NavigationItem itm in items)
            {
                doc = CreateSitemapRecord(doc, itm);
            }
            return doc.OuterXml;
        }

        private XmlDocument CreateSitemapRecord(XmlDocument doc, NavigationItem item)
        {
            string link = item.Url;

            string lastModified = HttpUtility
             .HtmlEncode(item.Item.Statistics.Updated.ToString("yyyy-MM-ddTHH:mm:sszzz"));

            XmlNode urlsetNode = doc.LastChild;

            XmlNode url = doc.CreateElement("url");
            urlsetNode.AppendChild(url);

            XmlNode loc = doc.CreateElement("loc");
            url.AppendChild(loc);
            loc.AppendChild(doc.CreateTextNode(link));

            XmlNode lastmod = doc.CreateElement("lastmod");
            url.AppendChild(lastmod);
            lastmod.AppendChild(doc.CreateTextNode(lastModified));

            return doc;
        }
    }
Also, NavigationItem is a custom POCO:
 
public class NavigationItem
{
    public Item Item { get; set; }
    public string Title { get; set; }
    public string Url { get; set; }
    public bool IsActive { get; set; }
    public int Level { get; set; }
    public NavigationItems Children { get; set; }
    public string Target { get; set; }
    public bool ShowChildren { get; set; }
}


Few things to mention.
1. Since you are using LinkManager in order to generate the links, you need to make sure you have full URL path as required by protocol, not the site-root-relative path. So you'll need to pass custom options in that case:

2. Once deployed to production, you may face an unpleasant behavior of HTTPS links generated along with 443 port number (such as . That is thanks to LinkManager not being wise enough to predict such a case. However there is a setting that make LinkManager works as expected. Not obvious
var options = LinkManager.GetDefaultUrlOptions();
options.AlwaysIncludeServerUrl = true;
options.SiteResolving = true;
LinkManager.GetItemUrl(item, options);

or better option in Heliix to rely on Sitecore.Foundation.SitecoreExtensions:

item.Url(options) from


//TODO: Update the code with the recent



That's it!

Sitecore with SEO: overview and compare ways for managing duplicate content

In this blog post I decided to cover all ways of managing duplicate content in Sitecore and overview possible ways of dealing with that with emphasis on SEO. So, we have the following options to consider:

  1. Duplicates
  2. Clones
  3. Proxies
  4. Aliases
  5. IIS URL Rewrite module
  6. Sitecore Redirect Module
  7. External Reverse Proxy

1. Duplicates are commonly known and most straightforward way of creating duplicates (clones) of the items. The easiest way to perform that operation is to right click the item you'd want to copy, select Duplicate from context menu and specify new item's name.


This ends up with an entirely independent new item (and all its ancestors) located at the same level, including all field values, presentation details, permissions etc. Beware the locks and workflows - those also would be exact match of those original items have. After that, new item lives it own life and is no way synchronized to its original prototype (except Standard Values, for sure, as both new and duplicate items share the same template).

Also it's worth of mentioning Copy To - this brings similar behavior, but allows to create duplicates keeping same name but at other paths rather than original item. Copy To is available from the same context menu.


2. Clones are sort of similar to duplicates with the difference that no new item is created when using clones. To create a clone for a highlighted item from a Sitecore tree, select Configure tab, hit Clone and specify where you'd want to locate your clone.


Notice, that clones are displayed in content tree in a slightly light font color, I personally think that may create some future issues when business users may perform some actions on item without realizing that item is a clone. Why is that important to know? Let's view the way clones function on a lower level.

When you create a clone the item and the values are not physically copied. Instead, the inheritance similar to the one between Standard Values (that is sort of prototype item for a template) and real template item, is created (clone inherits not from s.v. but from original item). When you modify a filed value of original item, that would affect same field of cloned item. However the reverse process, when you modify a field value on cloned item, overrides that individual field value and it is no more tied to original item's field. Other fields of the same item will still keep the reference to their originals. Clones use the __Source field of the Advanced section from standard template to specify the cloned item:


Unlike duplicates, clones do not clone most of standard fields (those coming from standard template) like locks / workflows and statistics (created, updated, revision). But they do clone security settings, which, again, can be overridden for a clone item.

If you want to get rid of clone item - there are 2 ways to do that: just delete the clone (obvious) and unclone it. Uncloning turns cloned item into a normal item and copies field values from originals. Clones exist only in master database, when you perform publishing to CD servers - uncloning takes place there.

You also can do some crazy things like creating clones of clones - inheritances chain take place in this case; each field at each level can be overridden, for sure.

To get even more understanding on how clones work in Sitecore I recommend reading Cloning What Ifs article.


3. Proxies is another mechanism of creating and managing duplicate content in Sitecore. The are frequently used in cases similar when you have an item that you may want to be a child of multiple parent items. In order to use them you must ensure a config file setting called proxiesEnabled is true; then you create proxy items at /Sitecore/System/Proxies based on /System/Proxy template. However, proxies considered to be outdated in favor of Clones. Please do not use Proxies!


4. Aliases are the different beast. They are perfectly good for promos and campaigns as the normally specify a quick URL for campaign landing page. Aliases have out-of-box limitation that they are set only per root level and not multisite-friendly (however there is a link that explains how to implement that feature on your own).

Aliases are defined under /sitecore/system/Aliases based on the System/Alias template.

There is just one field in alias template that allows to select target item.

There are two more overheads when working with aliases - sometimes you may need to identify if an item is alias:

bool isAlias = Context.Database.Aliases.Exists(path);

Also you may need to set canonical URLs on them to improve SEO. Good way of doing that is:

public class AliasResolver : Sitecore.Pipelines.HttpRequest.AliasResolver
{
    public override void Process(HttpRequestArgs args)
    {
        base.Process(args);

        if (Context.Item != null)
        {
            args.Context.Items["CanonicalUrl"] = Context.Item.GetFullUrl(args.Context.Request.Url);
        }
    }
}

Also, do not forget to publish your aliases to content delivery databases, as they won't work until published.



5. IIS URL Rewrite module is probably most functional option, it is external to Sitecore, that means it happens before routing and before pipelines.

For the drawbacks of using IIS URL Rewrite I would mention that you'd need to have access to IIS Manager or web.config write permission on each of content delivery servers. I previously wrote a blog post IIS URL Rewrite module - few SEO tricks that can demonstrate how powerful it is.

Also I would beware you of some specific Sitecore URLs and create appropriate extensions (ex. for WebResource.axd - take from real code).



6. Sitecore Redirect Module is another good choice as it does perfect server side 301 redirect for both URLs and items. It is almost as powerful as IIS URL Rewrite, but because it is configured in Sitecore - you do not need to have CD environments access at all - just create and publish redirect rules (as you normally do with generic content) - they will take effect immediately! Module is transparent to multi-site configuration, it can do redirects from one site's URL or item to another.

One more advantage of the module - availability of source code, so functionality can be extended to any bespoke requirement, also it becomes compatible with new Sitecore versions by just rebuilding it with appropriate Sitecore.Kernel.dll and replacing updated module DLL in webroot bin folder.

The only drawback, probably, is that in default state it performs only 301 redirects (however you may implement whatever you require). Please remember, that 301 requests are cached by browser -so you you are testing it intensively - you may need to purge browser cache from time to time.


7. External Reverse Proxy can be another option. It can do not only rewrites to external websites, but also rewrite some requests to alternative internal URL and pass that to IIS as "given" and further down to Sitecore. I met such scenarios several times on projects I took part. By the way, did you know that IIS can also serve as reverse proxy?

Performing rewrites and URL resolving logic outside of Sitecore can be both advantageous and disadvantageous. What traps does it bring?

Well, imagine you are new developer who start working on a new working copy of source code. When you run locally you may have different URL patterns compared to those on production environment. Business users usually deal with external production URLs and do not know internal structure, so that is how they form tasks and change requests. If you are not enough lucky to have comprehensive documentation or senior colleagues who can explain how is that configured - you may end up in multiple puzzling hours of attempting to find and match URLs from different environments.

Also, SEO much relies on sitemaps, so if you are using dynamic sitemaps - you need to implement that custom URL resolving logic that you have on reverse proxy. Also Sitemap Module from the Market would not work for you in that case.


I hope this article helped you to understand you options are with their pros and cons and to pick up a proper implementation depending on exact scenario.

IIS URL Rewrite module - as reverse proxy with links rewrite

Not many people know that IIS itself can serve as Reverse Proxy, with rewriting URLs on-the-fly. We are going to take a look on how to configure that feature. Let's assume we have 2 websites - primary website that has URL http://test2/ and is a hosted by IIS, moreover there is an instance of Sitecore installed; and another external static website that has URL http://external/ and it has few static pages and resources. For this experiment I got external website hosted at the same IIS instance, while in reality it can be literary anything and anywhere.


Apart from having IIS, you will need the following prerequisite:

- URL Rewrite Module installed, version 2.0

- Application Request Routing version 2.0


The easiest way to get all the prerequisites is to install them through Web Platform Installer. It will install all of them so you'll just need to have IIS refreshed and get ready to start.



External website contains static.html file with the following code

<div>
    img/sitecore.png<br>
    <img src="img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<div>
    /img/sitecore.png<br>
    <img src="/img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<div>
    http://external/img/sitecore.png<br>
    <img src="http://external/img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<p>
    <a href="sitecore.zip">sitecore.zip</a><br>
    <a href="/sitecore.zip">/sitecore.zip</a><br>
    <a href="http://external/sitecore.zip">http://external/sitecore.zip</a><br>
</p>

This code has 3 images and 3 links to an archive file, each of them is either relative link (from the doc level, for sure) or absolute link (from web root) or fully qualified link including domain name and protocol. This HTML renders renders into the following screenshot:


Our objective is to have a "virtual" "folder" called ext on the test2 website so that it "mapped" to external website and also correctly "maps" and rewrites all the resources of external website on resulting page.

Example:

When we hit http://test2/ in browser - we get default Sitecore page as it is provided by Test2 website, as normally.

When we hit http://test/ext/static.html - we get the page at that URL but with the content of external/static.html page with all links and references rewritten to be test2/ext/*.* instead of external/*.*

So, to make IIS Rewrite work as reverse Proxy, let's do the following steps:


Make sure "Enable proxy"is checked, otherwise nothing will work.


In URL Rewrite section, click "Add Rule(s)" link, then from popup screen select "Reverse Proxy" and specify the rule. Also check outbound rules as the are rules that factually rewrite internal links. Please note that this function may add some overhead to your website performance.


After you specify the rules - one inbound and 2 outbound (they are shown below) - reverse proxy now functions and you may verify that by requesting the following ULR (as on the screenshot below):


Notice, that all links and images look correct, as the were before. To ensure they were rewritten correctly, let's view the source file of resulting page. Here is it:

<div>
    img/sitecore.png<br>
    <img src="img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<div>
    /img/sitecore.png<br>
    <img src="http://test2/ext/img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<div>
    http://external/img/sitecore.png<br>
    <img src="http://test2/ext/img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<p>
    <a href="sitecore.zip">sitecore.zip</a><br>
    <a href="http://test2/ext/sitecore.zip">/sitecore.zip</a><br>
    <a href="http://test2/ext/sitecore.zip">http://external/sitecore.zip</a><br>
</p>

As there were no need to rewrite relative URLs - they remain untouched. However root-folder URL and full URL were rewritten to satisfy new domain name and desired folder-path.

And finally, here is resulting configuration that makes it all work. Whatever we have previously configured is stored in the configuration file within system.webserver node in rewrite section:

<rewrite>
      <rules>
        <clear></clear>
          <rule name="ReverseProxyInboundRule2" stopprocessing="true">
            <match url="(ext)/(.*)?"></match>  
              <conditions>
                  <add input="{CACHE_URL}" pattern="^(https?)://"></add>
              </conditions>
              <action type="Rewrite" url="{C:1}://external/{R:2}"></action>
          </rule>
      </rules>
      <outboundrules>
        <rule name="ReverseProxyOutboundRule2" precondition="ResponseIsHtml1">
          <match filterbytags="A, Form, Img" pattern="^/(.*)" negate="false"></match>
          <action type="Rewrite" value="http://test2/ext/{R:1}"></action>
        </rule>
        <rule name="ReverseProxyOutboundRule1" precondition="ResponseIsHtml1">
          <match filterbytags="A, Form, Img" pattern="^http://external/(.*)?" negate="false"></match>
          <action type="Rewrite" value="http://test2/ext/{R:1}"></action>
        </rule>
        <preconditions>
              <precondition name="ResponseIsHtml1">
                  <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html"></add>
              </precondition>
          </preconditions>
      </outboundrules>
    </rewrite>

There is no need to use visual configurer at all, you may just drop this snippet on web.config into appropriate section and it will start working straight away!


IIS URL Rewrite module - few SEO tricks



1. Canonicals - do 301 permanent redirect (this also works with HTTPS).






    

  


2. Rewrite URL lowercase - one more rewrite rule aiming SEO improvement.






3. Append trailing slash - this is another SEO (sometimes arguable) improvement. It is believed that having trailing slash on your URLs (except when it deals with file names for sure) will improve search engines ranking.










4. Query string rewrite - the example of extracting query string parameters and rewriting it in your own manner






    



5. Redirect to HTTPS - forcely redirect all non-secure request to HTTPS









6. Prevent image hot-linking - disallow strangers of reusing images hosted on your website in order to protect them and traffic.