
RELAY2 - multi-site clustering
==============================

Author: Bela Ban
JIRA:   https://issues.jboss.org/browse/JGRP-1433


The architecture of RELAY2 is similar to RELAY: relaying between sites is done via a protocol (RELAY2). The biggest
differences between RELAY2 and RELAY are:

* RELAY2 does *not* expose a virtual view to the application, e.g. if we have LON={A,B,C} and SFO={X,Y,Z}, then the
  view in LON is {A,B,C}, and *not* {A,B,C,X',Y',Z'} (where the primes are remote nodes).

* Clustering between multiple sites. The routing between sites is static, and needs to be defined in a
  configuration file. Let's look at an example:

      --------                    --------                      --------
     |        |      backup      |        |      backup        |        |
     |   NYC  |  <-------------  |   LON  |   -------------->  |  SFO   |
     |        |                  |        |                    |        |
      --------                    --------                      --------

  Here we have the active site London (LON), which backs up its data to New York (NYC) and San Francisco (SFO). SFO and
  NYC are also connected, but that's not shown here.

  Every address is by default local, e.g. B in the LON site is a simple UUID. A *site address* is a subclass of UUID
  with a site suffix added, e.g. "sfo". So if we encounter a message M with destination address Y:sfo in LON, we know
  that we have to route M to the SFO relay master (X, the member doing the relaying), which then forwards it to Y.

  Multicasts are by default not relayed; they're always local. This can be changed in a configuration option. If a
  multicast message M *is* relayed, we forward M to all attached sites, which then in turn forward it to all of their
  attached sites (excluding the originating site, to avoid cycles). Using this routing algorithm, we can build trees
  of sites, in a hierarchical fashion.
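
  The forwarding rule for relayed multicasts can be sketched as follows. This is only an illustration, not RELAY2
  code: the class MulticastRelaySketch and the map of attached sites are hypothetical, and we assume that
  "originating site" means the site a message was received from and that the sites form a tree.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model (not JGroups API): each site knows the names of its directly attached sites.
final class MulticastRelaySketch {

    // Delivers a relayed multicast at 'site', then forwards it to every attached site
    // except the one it arrived from. Returns the sites in delivery order.
    // 'from' is null at the site where the multicast was first sent.
    static List<String> relay(Map<String, List<String>> attached, String site, String from,
                              List<String> delivered) {
        delivered.add(site);
        for (String next : attached.getOrDefault(site, List.of())) {
            if (!next.equals(from)) // never send the message back where it came from
                relay(attached, next, site, delivered);
        }
        return delivered;
    }
}
```

  With LON attached to NYC and SFO, and a further site LAX attached to SFO, a multicast relayed from LON is
  delivered in LON, NYC, SFO and LAX exactly once. The sketch only terminates on tree-shaped topologies.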

  Unicasts are relayed if the destination address is non-local, i.e. the address is a site address and its site
  suffix differs from the current site's name.
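
  The relay decision for unicasts can be sketched like this. The class SiteAddressSketch and the "member:site"
  string form are assumptions made for illustration; the real site address is a UUID subclass, not a string.

```java
import java.util.Objects;

// Hypothetical stand-in for a site address: a member name plus an optional site suffix.
final class SiteAddressSketch {
    final String member;
    final String site; // null for a plain local address

    SiteAddressSketch(String member, String site) {
        this.member = Objects.requireNonNull(member);
        this.site = site;
    }

    // Parses "Y:sfo" into member "Y" and site "sfo"; "B" has no site suffix.
    static SiteAddressSketch parse(String s) {
        int idx = s.indexOf(':');
        return idx < 0 ? new SiteAddressSketch(s, null)
                       : new SiteAddressSketch(s.substring(0, idx), s.substring(idx + 1));
    }

    // A unicast is relayed only if the destination carries a site suffix
    // that differs from the name of the current site.
    boolean needsRelay(String localSite) {
        return site != null && !site.equals(localSite);
    }
}
```

  In LON, a message to Y:sfo would be relayed, whereas messages to B or to B:lon would stay local.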

  There are 2 scenarios in which we send a unicast across sites: when a multicast is received from a different site and
  we want to send a response (e.g. the result of an RPC invocation), and when we use a 'special site address'.
  The only special site address in the first iteration is SiteMaster, e.g. SiteMaster("sfo"), a moniker that points
  to the current relay master of site "sfo" (X in the example above).

  There is a use case in Infinispan, where we'll invoke (in LON) an RPC on {B,C,SiteMaster("sfo")}. This will send
  2 local unicasts (to B and C) and a remote unicast to X in SFO. There's probably going to be a retry mechanism,
  so if X fails, we'll retry sending the unicast to Y, and if Y fails, to Z. The retry mechanism will be hidden
  from Infinispan.

  If we cannot reach the target, e.g. because a max retry count has been exceeded, or the entire SFO site is down, a
  HOST-UNREACHABLE(SiteMaster("sfo")) message will be sent back to the originator of the RPC, flagging the response
  from SiteMaster("sfo") as 'suspected'. (This is not yet settled, as an entire site may be down, e.g. because it
  hasn't yet been started.)
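
  Since the retry mechanism is not yet designed, the following is only one possible shape for it. All names
  (SiteMasterRetrySketch, sendToSiteMaster, the send predicate and the string result) are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical retry helper: tries the members of the target site in order (X, then Y, then Z)
// and reports the site master as unreachable if none of them accepts the message.
final class SiteMasterRetrySketch {
    static final String HOST_UNREACHABLE = "HOST-UNREACHABLE";

    // 'send' returns true if the unicast to the given member succeeded.
    // Returns the member that accepted the message, or a HOST-UNREACHABLE marker for the site.
    static String sendToSiteMaster(String site, List<String> candidates, int maxRetries,
                                   Predicate<String> send) {
        int attempts = 0;
        for (String member : candidates) {
            if (attempts++ >= maxRetries)
                break; // max retry count exceeded
            if (send.test(member))
                return member;
        }
        // Covers both an exhausted retry count and an empty (down or not yet started) site
        return HOST_UNREACHABLE + "(SiteMaster(\"" + site + "\"))";
    }
}
```

  The caller only ever sees either a successful delivery or the unreachable marker, so the retries stay hidden
  from the application.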

  The routing table in LON will have an entry for SFO and one for NYC; in this case, both entries point to a bridge
  (see RELAY.txt for the definition of a bridge). If there was a site LAX, attached to SFO, then the routing table
  in LON would also have an entry for LAX, pointing to SFO. This means that LAX is not directly accessible from LON,
  but needs to be routed via SFO.
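
  To illustrate how a message from LON would reach LAX, here is a small sketch that follows static per-site
  routing tables hop by hop. The class StaticRoutesSketch and the nested-map representation are assumptions;
  the actual format of the configuration file is not defined here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model: tables.get(site).get(dest) names the next site to hand a message to.
final class StaticRoutesSketch {

    // Follows the per-site routing tables from 'from' until 'to' is reached
    // and returns the list of sites visited, including both endpoints.
    static List<String> path(Map<String, Map<String, String>> tables, String from, String to) {
        List<String> hops = new ArrayList<>();
        String current = from;
        hops.add(current);
        while (!current.equals(to)) {
            String next = tables.getOrDefault(current, Map.of()).get(to);
            // A missing entry or a revisited site means the static configuration is broken
            if (next == null || hops.contains(next))
                throw new IllegalStateException("no route from " + current + " to " + to);
            hops.add(next);
            current = next;
        }
        return hops;
    }
}
```

  With a LON table of {sfo -> sfo, nyc -> nyc, lax -> sfo} and an SFO table that contains lax -> lax, the path
  from LON to LAX is lon, sfo, lax.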

  The routing table of NYC will also have 2 entries: 1 for LON and 1 for SFO, and the routing table in SFO is configured
  the same way.

  Routing as used by RELAY2 is similar to TCP/IP routing: messages to directly reachable hosts correspond to local
  messages, and networks correspond to bridges. Because we don't use IP addresses for routing, but UUIDs and
  SiteUUIDs instead, we have to build the routing tables based on the SiteUUID information.


* Routing

  When a site master creates the N channels which act as bridges, it uses the channels' view information to populate
  the routing table. E.g. if we're in LON and are connected to SFO and NYC, we'll have a view {xx:lon, yy:sfo, zz:nyc}.
  Members xx, yy and zz are the endpoints to which messages are to be sent for routing. From the above sample view, we
  can construct a routing table like the following:
  sfo: yy
  nyc: zz

  The suffixes of the view members are the routes and are used as keys in the routing table (internally we actually use
  IDs (shorts)).
  We omitted LON because we're in LON, and all messages to a LON destination are local.

  So when we want to send a message from LON to a member in SFO, we wrap the message and forward it to member yy, which
  in turn either forwards the message (not in this case), or sends it to a local member.
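
  Populating the routing table from the bridge view could look roughly like this. The sketch uses the string
  form "member:site" from the example and string keys instead of short IDs; BridgeViewRoutingSketch and build()
  are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: derives "site -> endpoint" routes from a bridge view.
final class BridgeViewRoutingSketch {

    // Builds the routing table from bridge view members written as "member:site",
    // omitting the local site because messages to it are never relayed.
    static Map<String, String> build(List<String> bridgeView, String localSite) {
        Map<String, String> routes = new LinkedHashMap<>();
        for (String addr : bridgeView) {
            int idx = addr.indexOf(':');
            if (idx < 0)
                continue; // not a site address, ignore
            String member = addr.substring(0, idx), site = addr.substring(idx + 1);
            if (!site.equals(localSite))
                routes.putIfAbsent(site, member); // keep the first endpoint seen per site
        }
        return routes;
    }
}
```

  For the view {xx:lon, yy:sfo, zz:nyc} seen in LON, this yields the two entries sfo: yy and nyc: zz.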




* Misc

- We used string suffixes in the examples above, but in reality we'll probably use shorts, which are mapped to site
  names in the configuration file. Thus, addresses always have the same (marshalled) length, and don't depend on the
  length of the suffix. Also, shorts are more efficient to do lookups on in a routing table.

- An RpcDispatcher flags messages to targets which are not in the current view as 'suspected'. To prevent RPCs to
  cross-site targets from being flagged as suspected, we need to change code in the RequestCorrelator
  (or perhaps Unicast/GroupRequest) slightly: we need to skip checking of site addresses.

  To nevertheless be able to terminate an RPC, we need to send HOST-UNREACHABLE back to the RPC originator if a
  cross-site target is unreachable, e.g. because one of the relay masters crashed, the retry count was exceeded, or the
  target crashed.
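
  The changed suspect check could be sketched as follows. This is not the RequestCorrelator code; the class
  SuspectCheckSketch is hypothetical and again uses the "member:site" string form to mark site addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of skipping site addresses when flagging targets as suspected.
final class SuspectCheckSketch {

    // Returns the targets to flag as 'suspected': members missing from the local view,
    // except site addresses, which are never expected to be in the local view.
    static List<String> suspected(List<String> targets, Set<String> localView) {
        List<String> result = new ArrayList<>();
        for (String target : targets) {
            boolean siteAddress = target.indexOf(':') >= 0;
            if (!siteAddress && !localView.contains(target))
                result.add(target);
        }
        return result;
    }
}
```

  With a local view {A,B,C} and targets B, C, D and X:sfo, only D is flagged; X:sfo has to be terminated by a
  HOST-UNREACHABLE message instead.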