Many businesses and agencies need to sync files between Linux servers in the same data center or between data centers around the globe. EnduraData EDpCloud Linux replication and synchronization tools help us achieve this.
In this blog, I cover a few configuration scenarios allowing us to synchronize data between Linux servers in Minneapolis, Chicago, London, and Rome. The Minneapolis Linux server replicates files to Chicago, London and Rome. Replication can be in real-time, on-demand or on a scheduled basis.
For example a financial institution located in Minneapolis can send all data changes that occure in Minneapolis to the other three sites.
Linux file synchronization
Our objective is to set up Linux file synchronization to mirror and synchronize data from the Linux server in Minneapolis with the other Linux servers in data centers in Chicago, London, and Rome. Therefore, any file changes in the Linux server in Minneapolis are replicated to the other Linux servers using incremental copies of only the delta file changes. This kind of replication is also known as delta or bloc file replication. The data is compressed and encrypted by default in transit.
The following are some options available to the system administrators to configure the Linux file synchronization tools and suite:
- Create one link or replication set for all the Linux servers with Minneapolis as the replication source and with the others as the Linux data replication targets, also known as destinations or receivers
- Create a file replication set or link for each Linux replication destination.
Contact An Engineer Now
1. Using one replication set to sync files securely between remote Linux servers
The following is an XML configuration that syncs data from the Linux server in Minneapolis to all the other Linux servers.
1.1 Replicating data using one link only
Figure 1: We store the following content in the eddist.cfg configuration file.
<?xml version="1.0" encoding="UTF-8"?>
<config name="Network_Configuration" rowsize="8192">
<link name="outgoing" isrealtime="1" workers="4" password="foo">
<sender hostname="mpls" alias="*" />
<receiver hostname="chicago" storepath="/home/incoming/chicago"/>
<receiver hostname="london" storepath="/home/incoming/london"/>
<receiver hostname="rome" storepath="/home/incoming/rome"/>
</link>
</config>
The directive and keywords in the configuration above instruct the Linux replication software to do the following:
- The replication set or link name is called outgoing
- The sending Linux server in Minneapolis is a host with mpls as its hostname
- host mpls replicates data in real time (mpls monitors changes to the file systems and synchronizes the file changes with the remote Linux servers; See section 1.2 for more)
- The Linux replication receiving server or called chicago stores data received from mpls in /home/incoming/chicago
- The Linux replication receiver called london stores data received from mpls in /home/incoming/london
- The Linux replication receiver called rome stores replicated data from mpls in /home/incoming/rome
Creating one replication set to sync files to multiple remote sites works but is not advised for reasons I list at the end of this post.
1.2 Creating a configuration for Linux real-time file system replication monitoring
The following is the content of the Linux real-time file replication file system monitor (edfsmonitor.cfg). The file replication monitor service will use this configuration file to decide which paths to monitor in real time for changes in the data or meta data.
Figure 2: Content of edfsmonitor.cfg (Linux real time file system monitor)
/home/code/svn
/home/users
/data/outgoing
/nfsserver/finacial/models/run
The content of edfsmonitor.cfg has a list of all directories monitored in real time, and any changes made to these file systems on the mpls Linux server are sent to all the other servers in London, Rome, and Chicago.
This configuration allows the file system monitor to capture data changes that the file synchronization services will queue for remote filesync.
2. Using multiple replication sets to sync files between Linux servers
This is the recommended configuration to replicate files between Linux servers described above.
This configuration will replicate data from mps to three locations: chicago, london and rome.
Figure 3: A better Linux file sync alternative to mirroring files from one site to multiple sites.
<?xml version="1.0" encoding="UTF-8"?>
<config name="Network_Configuration" rowsize="8192">
<link name="chicago" isrealtime="1" workers="4" password="foo">
<sender hostname="mpls" alias="*" />
<receiver hostname="chicago" storepath="/home/incoming/chicago"/>
</link>
<link name="london" isrealtime="1" workers="4" password="foo">
<sender hostname="mpls" alias="*" />
<receiver hostname="london" storepath="/home/incoming/london"/>
</link>
<link name="rome" isrealtime="1" workers="4" password="foo">
<sender hostname="mpls" alias="*" />
<receiver hostname="rome" storepath="/home/incoming/rome"/>
</link>
</config>
What we have done in figure 3 is to create one replication set or link for each Linux server in each city. This type of configuration has several advantages.
3. On-demand file replication and synchronization
The on demand replication can simply be issued from the command line, the scheduler, the GUI or other applications.
Example:
To replicate data under path name /data/minneapolis/out/financials/u187dld to receiver london, using replication set name london we use the following command:
edq -l london -r london -n /data/minneapolis/out/financials/u187dld
edmfq command is a more intelligent version of edq.
You can:
- pause replication sets independently of each other
- resume replication sets independently of each other
- perform initial file sync of all sets independently of each other
- monitor each Linux replication set separately
- troubleshoot each replication set alone
- generate replication history for each link in a different file
- cancel replication for each link or receiver
- use different include or exclude replications to restrict what is sent by the Linux server in Minneapolis and what data is received by the Linux servers in Chicago, London, or Rome
- replication takes advantage of parallelism for disk I/O, network I/O, and CPU scheduling.
See also:
https://www.enduradata.com/including-excluding-files-data-synchronization-online-backup/
Share this Post