status file structure
---------------------

Each host in the CAP is responsible for its own status file. This file
'lives' in /var/lib/lliurex-cap/CAPNAME/status/CAP_FQDN.cap

The status file of every host is propagated via csync2/rsync to the rest
of the CAP.

The status file is 5 lines long, and looks like this:

    NEW_TIMESTAMP    format is `date +%Y%m%d%H%M%S`[01]
    OLD_TIMESTAMP    format is `date +%Y%m%d%H%M%S`[01] (100000000000000 for new members)
    HOST_RANK        format is [:digit:]
    RND              format is [:digit:], generated with RANDOM=$$;$(($RANDOM%10))
    HOST_STATUS      format is [:digit:]

Currently, the defined HOST_STATUS values are:

    0: waiting
    1: standard

The "extra" digit of the timestamps is explained in the challenges
section.

The "computed weight" of a host is just the concatenation of the first 4
lines. The heaviest host becomes the CAP leader, so the leader is
expected to satisfy the following conditions:

1. it is updated with the last value of NEW_TIMESTAMP
2. it was up in the latest "challenge" (no TIMESTAMPs lost)
3. it has the highest rank available
4. he's a lucky guy :-)

challenges
----------

When an update in the status directory is detected (the detection is
triggered by the csync2 "action" statement), each host checks its own
status file and updates it to the NEW_TIMESTAMP if required. The
OLD_TIMESTAMP of the rest of the hosts is also tested, so every host can
detect "missing" challenges. In that case, the host becomes "waiting"
(for the new leader).

When a host starts a challenge, the last digit of NEW_TIMESTAMP is set
to 0. When a host "replies" to the challenge, it sets its NEW_TIMESTAMP
to the same value but replaces the final 0 with 1. This change can be
used to detect "accepted" challenges. In fact, zero-terminated
OLD_TIMESTAMPs are "ignored" (with the notable exception of newly joined
and "just started" hosts) because they represent "isolated host
challenges".
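Because every field is a fixed-width string of digits, leadership can be
decided by a plain lexicographic comparison of the concatenated fields.
A minimal sketch in shell, assuming the status directory layout
described above (the cap_weight helper name is illustrative, not part of
the real cap scripts):

    #!/bin/bash
    # Sketch only: pick the CAP leader from the propagated status files.
    # Assumes CAPNAME is set in the environment.
    STATUS_DIR="/var/lib/lliurex-cap/$CAPNAME/status"

    # The "computed weight" is the concatenation of the first 4 lines:
    # NEW_TIMESTAMP + OLD_TIMESTAMP + HOST_RANK + RND
    cap_weight() {
        head -n 4 "$1" | tr -d '\n'
    }

    leader="" best=""
    for f in "$STATUS_DIR"/*.cap; do
        w=$(cap_weight "$f")
        # fixed-width digit strings: string order == numeric order
        if [[ "$w" > "$best" ]]; then
            best="$w"
            leader="$f"
        fi
    done
    echo "leader: $(basename "$leader" .cap) (weight $best)"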
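The timestamp handling itself is equally small. A hedged sketch of the
two operations (function names are illustrative, and the shifting of the
previous NEW_TIMESTAMP into OLD_TIMESTAMP is an assumption drawn from
condition 2 above, not something the cap scripts are known to do this
way):

    #!/bin/bash
    # Sketch only: challenge timestamps in the local status file.
    MY_STATUS="/var/lib/lliurex-cap/$CAPNAME/status/$(hostname -f).cap"

    start_challenge() {
        # a new challenge: current time plus the "extra" digit set to 0
        new_ts="$(date +%Y%m%d%H%M%S)0"
        # assumed: the previous NEW_TIMESTAMP becomes OLD_TIMESTAMP
        old_ts=$(sed -n '1p' "$MY_STATUS")
        sed -i "1s/.*/$new_ts/;2s/.*/$old_ts/" "$MY_STATUS"
    }

    reply_challenge() {
        # accept a challenge: same value, trailing 0 replaced by 1
        new_ts="${1%0}1"
        sed -i "1s/.*/$new_ts/" "$MY_STATUS"
    }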
waiting for leader
------------------

A host enters this state as a result of one of these events:

1. at init (after propagating pending changes)
2. just after joining a new CAP
3. when a missed challenge is detected

In this state, the host waits "CAP_CHALLENGE_WAIT" time for challenge
responses. When the waiting host "wakes up", it computes the current
status files to determine leadership. The host initiates an rsync-data
request against the challenge's new leader (if any), goes to "standard"
status, and initiates a new challenge. This last challenge gives the
just-updated machine an opportunity to gain leadership.

cap-cron-script
---------------

Each CAP host runs a cron script every CAP_CRON_INTERVAL to check the
current time against the current CAP TIMESTAMP. If the elapsed time is
greater than CAP_TIME_TOLERANCE, the host waits a time inversely
proportional to its total weight and then initiates a new challenge (the
inverse proportionality gives "some advantage" to the leader).

If the host is in the waiting state, the script does not initiate a
challenge, but checks whether CAP_CHALLENGE_WAIT has been exceeded in
order to start the rsync-data request.

init considerations
-------------------

1. cap-cron-script is disabled at boot time
2. try to propagate all unsynced changes. TODO: maybe with -x (checking)
   or just with -u (dirty files only)
3. start a new challenge, become "waiting" and enable the cron script

rsync-data considerations
-------------------------

In the rsync-data transfers, local (but NOT remote) locks are excluded.

clones.cap
----------

The clones.cap file describes the "data" we want to clone across the
CAP. All the cloned dirs are relative to the CAP_CLONES_ROOT
configuration variable (/net by default). This directory will be created
(if required) on every CAP host.

Each 'clone' must be just ONE subdirectory of CAP_CLONES_ROOT; NO MORE
LEVELS are allowed, but every subdirectory of the clone will be
propagated. If you need to bypass this ONE LEVEL restriction, create a
directory under CAP_CLONES_ROOT and use the 'mount --bind ...' command
(much like the way NFS4 builds its export tree with bind mounts), as
shown in the sketch below.

The format of the file is just one 'clone' per line. Please avoid the
use of '/': use just the subdirectory name under CAP_CLONES_ROOT. Every
clone 'name' will be replicated across the CAP hosts as a directory
under CAP_CLONES_ROOT.
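For example (a hypothetical path: /srv/shared/projects is just an
illustration, and /net is the default CAP_CLONES_ROOT):

    # expose a deeper tree as a single-level clone under CAP_CLONES_ROOT
    mkdir -p /net/projects
    mount --bind /srv/shared/projects /net/projects
    # then add the single line "projects" to clones.cap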