Experience Sitecore ! | All posts tagged 'Reverse-Proxy'

Experience Sitecore !

More than 200 articles about the best DXP by Martin Miles

What is a Reverse Proxy and what do you need one for?

There are a variety of Reverse Proxy solutions on the market. You may have already heard about some:

Major cloud providers also have their proprietary solutions:


But what is a Reverse Proxy? Why "Reverse"?
As Wikipedia says, a common type of proxy server that is accessible from the public network. Large websites and content delivery networks use reverse proxies - together with other techniques - to balance the load between internal servers. Reverse proxies can keep a cache of static content, which further reduces the load on these internal servers and the internal network. It is also common for reverse proxies to add features such as compression or TLS encryption to the communication channel between the client and the reverse proxy.

Reverse proxies are typically owned or managed by the web service, and they are accessed by clients from the public internet. In contrast, a forward proxy is typically managed by a client (or their company) who is normally restricted to a private, internal network. The client can, however, access the forward proxy, which then retrieves resources from the public internet on behalf of the client. Here's a reverse proxy in action from a very high:


What are typical scenarios for using a Reverse Proxy?

1. SSL Offload. Let's assume we've got a website which works at HTTP only, and for some reason (legacy, gone developers or being unable bringing changes into a running solution that may huge or any other) it is not possible to change the website itself - "If it works, don't touch it" paradigm in action. For compliance, we must add HTTPS support for that website.
With using a Reverse Proxy it comes to a really quick and easiy solution - we don't need developers at all. All we need is asking our Ops professional asking him to instantiate a proxy server with SSL Termination. (obviously, we'll also need SSL certificates for domain hostname(s) of a given website). Job done!


2. Load Balancer
. Next, we'd want horizontally scaling that website and even deployed two equal client facing copies of it. How do we "split" traffic to distributing it equally to both sites? In this case we introduce a Proxy Server functioning as a Load Balancer.
But what if one of websited dies or crashes half way down the road? Load Balancer needs somehow to know each of the "boxes" functions well and react the outages by re-distributing traffic to the rest of mchines functioning well. This is traditionally implemented by "pinging" a such called "HealthCheck" URL on each particular box. As soon as one of the healthchecks keeps failing, an alert is raised and the traffic is no longer routed to a faulty box (be careful with sticky sessions!).



3. Cybersecurity enforcement
. Sending specially formed packets hackers can undertake a Deny-of-Service attack, when sending a request comes times cheaper than serving it back. At some moment your servers won't cope with this parasitic workload and will fail.
In order to prevent that, dangerous traffic should not reach your servers, being filtered at a proxy. Namely, a Firewall with an adequate rule set that filters out all patters with anomalies, raises alerts and bypasses the legitimate requests.


4. Caching and compressing
. Even with purely legitimate traffic beyond the proxy, one may still get a large request payload. But how comes? Well, there may be different reasons, such as usage patterns where all users navigate to the same heavy-loaded area of a website; alternatively the website itself could be written by junior developers who did not enough care about the way if functions in the most optimal way once deployed. Regardless of the reason, we could still soften things up by identifying some of the popular endpoint that consume much of server's resources and cache it up right at the proxy level. Of course assuming that a given set of parameters always returns that same result, there is no longer a need spending expensive server resources on producing the results we're already got in past and have effectively cached at the proxy level and never reach servers at all. If we however must ensure this traffic reaching the end servers, we could at least compress / encode the "last mile" beyond the proxy.


5. Smooth automated deployments
? Why not, have you ever heard of Blue-Green Deployments? With that in action end users won't even realize that you're upgrading the solution while they're browsing your site.


6. A/B Testing
. As a result of previous point, it may be a case you've updated some but not the all end servers. You do not want updating them all, instead you'd like to perform an A/B Testing on both sets and based on a result decide to complete an update or rollback to the most recent version. This would be a pretty valid scenario that a reverse proxy can do for you.


7. URL and Links Rewriting
. What if you have a legacy website that functions perfectly well, but similar to a scenario (1) the is no way (and need) of maintaining it. The development team has gone and in any case there is no single reason of investing a lot into a smth to be dismissed at some stage. At the same time you got another website(s) that either could be successor(s) for a legacy one, or some additional areas, written in isolation by more modern tools and thus either incompatible or expensive to merge with an existing solution. However the business wants everything to function with the same main domain name, just in different "folders" under it, so that end users (and search robots!) see no difference between consisting parts and naturally experience them both as being a single solid website.
Achieving that is also possible with a Reverse Proxy by rewriting URLs. Please note that not just a external request coming to site.com/company1 will be rewritten to www.company1.com but also all the internal URLs within all the requested pages need to be rewritten as well. Please note that it becomes possible only in conjunction of SSL offload, otherwise the traffic gets encrypted and proxy becomes "a man in the middle".

Not just that - once 6 years ago I wrote a walkthrough how one can achieve that same result purely and entirely by the means of IIS on Windows


Conclusion.
This article give a high-level explanation on Reverse Proxies and their primary features. It intentionally does not focus on a specific implementation avoiding going deeper in technical details.

In real-world solutions of course you will meet Reverse proxy solutions implementing several of the features combined. This may be a typical workflow for processing inbound traffic with a Reverse Proxy:



In any case, I hope you found this helpful!

Sitecore with SEO: overview and compare ways for managing duplicate content

In this blog post I decided to cover all ways of managing duplicate content in Sitecore and overview possible ways of dealing with that with emphasis on SEO. So, we have the following options to consider:

  1. Duplicates
  2. Clones
  3. Proxies
  4. Aliases
  5. IIS URL Rewrite module
  6. Sitecore Redirect Module
  7. External Reverse Proxy

1. Duplicates are commonly known and most straightforward way of creating duplicates (clones) of the items. The easiest way to perform that operation is to right click the item you'd want to copy, select Duplicate from context menu and specify new item's name.


This ends up with an entirely independent new item (and all its ancestors) located at the same level, including all field values, presentation details, permissions etc. Beware the locks and workflows - those also would be exact match of those original items have. After that, new item lives it own life and is no way synchronized to its original prototype (except Standard Values, for sure, as both new and duplicate items share the same template).

Also it's worth of mentioning Copy To - this brings similar behavior, but allows to create duplicates keeping same name but at other paths rather than original item. Copy To is available from the same context menu.


2. Clones are sort of similar to duplicates with the difference that no new item is created when using clones. To create a clone for a highlighted item from a Sitecore tree, select Configure tab, hit Clone and specify where you'd want to locate your clone.


Notice, that clones are displayed in content tree in a slightly light font color, I personally think that may create some future issues when business users may perform some actions on item without realizing that item is a clone. Why is that important to know? Let's view the way clones function on a lower level.

When you create a clone the item and the values are not physically copied. Instead, the inheritance similar to the one between Standard Values (that is sort of prototype item for a template) and real template item, is created (clone inherits not from s.v. but from original item). When you modify a filed value of original item, that would affect same field of cloned item. However the reverse process, when you modify a field value on cloned item, overrides that individual field value and it is no more tied to original item's field. Other fields of the same item will still keep the reference to their originals. Clones use the __Source field of the Advanced section from standard template to specify the cloned item:


Unlike duplicates, clones do not clone most of standard fields (those coming from standard template) like locks / workflows and statistics (created, updated, revision). But they do clone security settings, which, again, can be overridden for a clone item.

If you want to get rid of clone item - there are 2 ways to do that: just delete the clone (obvious) and unclone it. Uncloning turns cloned item into a normal item and copies field values from originals. Clones exist only in master database, when you perform publishing to CD servers - uncloning takes place there.

You also can do some crazy things like creating clones of clones - inheritances chain take place in this case; each field at each level can be overridden, for sure.

To get even more understanding on how clones work in Sitecore I recommend reading Cloning What Ifs article.


3. Proxies is another mechanism of creating and managing duplicate content in Sitecore. The are frequently used in cases similar when you have an item that you may want to be a child of multiple parent items. In order to use them you must ensure a config file setting called proxiesEnabled is true; then you create proxy items at /Sitecore/System/Proxies based on /System/Proxy template. However, proxies considered to be outdated in favor of Clones. Please do not use Proxies!


4. Aliases are the different beast. They are perfectly good for promos and campaigns as the normally specify a quick URL for campaign landing page. Aliases have out-of-box limitation that they are set only per root level and not multisite-friendly (however there is a link that explains how to implement that feature on your own).

Aliases are defined under /sitecore/system/Aliases based on the System/Alias template.

There is just one field in alias template that allows to select target item.

There are two more overheads when working with aliases - sometimes you may need to identify if an item is alias:

bool isAlias = Context.Database.Aliases.Exists(path);

Also you may need to set canonical URLs on them to improve SEO. Good way of doing that is:

public class AliasResolver : Sitecore.Pipelines.HttpRequest.AliasResolver
{
    public override void Process(HttpRequestArgs args)
    {
        base.Process(args);

        if (Context.Item != null)
        {
            args.Context.Items["CanonicalUrl"] = Context.Item.GetFullUrl(args.Context.Request.Url);
        }
    }
}

Also, do not forget to publish your aliases to content delivery databases, as they won't work until published.



5. IIS URL Rewrite module is probably most functional option, it is external to Sitecore, that means it happens before routing and before pipelines.

For the drawbacks of using IIS URL Rewrite I would mention that you'd need to have access to IIS Manager or web.config write permission on each of content delivery servers. I previously wrote a blog post IIS URL Rewrite module - few SEO tricks that can demonstrate how powerful it is.

Also I would beware you of some specific Sitecore URLs and create appropriate extensions (ex. for WebResource.axd - take from real code).



6. Sitecore Redirect Module is another good choice as it does perfect server side 301 redirect for both URLs and items. It is almost as powerful as IIS URL Rewrite, but because it is configured in Sitecore - you do not need to have CD environments access at all - just create and publish redirect rules (as you normally do with generic content) - they will take effect immediately! Module is transparent to multi-site configuration, it can do redirects from one site's URL or item to another.

One more advantage of the module - availability of source code, so functionality can be extended to any bespoke requirement, also it becomes compatible with new Sitecore versions by just rebuilding it with appropriate Sitecore.Kernel.dll and replacing updated module DLL in webroot bin folder.

The only drawback, probably, is that in default state it performs only 301 redirects (however you may implement whatever you require). Please remember, that 301 requests are cached by browser -so you you are testing it intensively - you may need to purge browser cache from time to time.


7. External Reverse Proxy can be another option. It can do not only rewrites to external websites, but also rewrite some requests to alternative internal URL and pass that to IIS as "given" and further down to Sitecore. I met such scenarios several times on projects I took part. By the way, did you know that IIS can also serve as reverse proxy?

Performing rewrites and URL resolving logic outside of Sitecore can be both advantageous and disadvantageous. What traps does it bring?

Well, imagine you are new developer who start working on a new working copy of source code. When you run locally you may have different URL patterns compared to those on production environment. Business users usually deal with external production URLs and do not know internal structure, so that is how they form tasks and change requests. If you are not enough lucky to have comprehensive documentation or senior colleagues who can explain how is that configured - you may end up in multiple puzzling hours of attempting to find and match URLs from different environments.

Also, SEO much relies on sitemaps, so if you are using dynamic sitemaps - you need to implement that custom URL resolving logic that you have on reverse proxy. Also Sitemap Module from the Market would not work for you in that case.


I hope this article helped you to understand you options are with their pros and cons and to pick up a proper implementation depending on exact scenario.

IIS URL Rewrite module - as reverse proxy with links rewrite

Not many people know that IIS itself can serve as Reverse Proxy, with rewriting URLs on-the-fly. We are going to take a look on how to configure that feature. Let's assume we have 2 websites - primary website that has URL http://test2/ and is a hosted by IIS, moreover there is an instance of Sitecore installed; and another external static website that has URL http://external/ and it has few static pages and resources. For this experiment I got external website hosted at the same IIS instance, while in reality it can be literary anything and anywhere.


Apart from having IIS, you will need the following prerequisite:

- URL Rewrite Module installed, version 2.0

- Application Request Routing version 2.0


The easiest way to get all the prerequisites is to install them through Web Platform Installer. It will install all of them so you'll just need to have IIS refreshed and get ready to start.



External website contains static.html file with the following code

<div>
    img/sitecore.png<br>
    <img src="img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<div>
    /img/sitecore.png<br>
    <img src="/img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<div>
    http://external/img/sitecore.png<br>
    <img src="http://external/img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<p>
    <a href="sitecore.zip">sitecore.zip</a><br>
    <a href="/sitecore.zip">/sitecore.zip</a><br>
    <a href="http://external/sitecore.zip">http://external/sitecore.zip</a><br>
</p>

This code has 3 images and 3 links to an archive file, each of them is either relative link (from the doc level, for sure) or absolute link (from web root) or fully qualified link including domain name and protocol. This HTML renders renders into the following screenshot:


Our objective is to have a "virtual" "folder" called ext on the test2 website so that it "mapped" to external website and also correctly "maps" and rewrites all the resources of external website on resulting page.

Example:

When we hit http://test2/ in browser - we get default Sitecore page as it is provided by Test2 website, as normally.

When we hit http://test/ext/static.html - we get the page at that URL but with the content of external/static.html page with all links and references rewritten to be test2/ext/*.* instead of external/*.*

So, to make IIS Rewrite work as reverse Proxy, let's do the following steps:


Make sure "Enable proxy"is checked, otherwise nothing will work.


In URL Rewrite section, click "Add Rule(s)" link, then from popup screen select "Reverse Proxy" and specify the rule. Also check outbound rules as the are rules that factually rewrite internal links. Please note that this function may add some overhead to your website performance.


After you specify the rules - one inbound and 2 outbound (they are shown below) - reverse proxy now functions and you may verify that by requesting the following ULR (as on the screenshot below):


Notice, that all links and images look correct, as the were before. To ensure they were rewritten correctly, let's view the source file of resulting page. Here is it:

<div>
    img/sitecore.png<br>
    <img src="img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<div>
    /img/sitecore.png<br>
    <img src="http://test2/ext/img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<div>
    http://external/img/sitecore.png<br>
    <img src="http://test2/ext/img/sitecore.png" alt="sitecore" width="230" height="106">
</div>
<p>
    <a href="sitecore.zip">sitecore.zip</a><br>
    <a href="http://test2/ext/sitecore.zip">/sitecore.zip</a><br>
    <a href="http://test2/ext/sitecore.zip">http://external/sitecore.zip</a><br>
</p>

As there were no need to rewrite relative URLs - they remain untouched. However root-folder URL and full URL were rewritten to satisfy new domain name and desired folder-path.

And finally, here is resulting configuration that makes it all work. Whatever we have previously configured is stored in the configuration file within system.webserver node in rewrite section:

<rewrite>
      <rules>
        <clear></clear>
          <rule name="ReverseProxyInboundRule2" stopprocessing="true">
            <match url="(ext)/(.*)?"></match>  
              <conditions>
                  <add input="{CACHE_URL}" pattern="^(https?)://"></add>
              </conditions>
              <action type="Rewrite" url="{C:1}://external/{R:2}"></action>
          </rule>
      </rules>
      <outboundrules>
        <rule name="ReverseProxyOutboundRule2" precondition="ResponseIsHtml1">
          <match filterbytags="A, Form, Img" pattern="^/(.*)" negate="false"></match>
          <action type="Rewrite" value="http://test2/ext/{R:1}"></action>
        </rule>
        <rule name="ReverseProxyOutboundRule1" precondition="ResponseIsHtml1">
          <match filterbytags="A, Form, Img" pattern="^http://external/(.*)?" negate="false"></match>
          <action type="Rewrite" value="http://test2/ext/{R:1}"></action>
        </rule>
        <preconditions>
              <precondition name="ResponseIsHtml1">
                  <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html"></add>
              </precondition>
          </preconditions>
      </outboundrules>
    </rewrite>

There is no need to use visual configurer at all, you may just drop this snippet on web.config into appropriate section and it will start working straight away!